Longhaul improvements #156

Open
tanvigour opened this issue Dec 12, 2022 · 2 comments

@tanvigour
Contributor

Describe the proposal

Longhaul infrastructure and monitoring need to be improved so that the community can regain trust in Longhaul tracking.

The proposed improvements in this space, in priority order:

  1. Be able to reproduce scenarios.
  2. Better documentation: Longhauls, dashboards, metrics, etc.
  3. Make Longhaul dashboards publicly accessible. Running Longhaul under a different subscription that doesn't have MSIT restrictions might help with this.
  4. Define the measurement goal for Longhaul.
  5. Shift the metrics in the Longhaul dashboards toward semantic correctness.
  6. Introduce chaos into Longhaul tracking to monitor behavior and track long-term stability.
  7. Provide a Terraform script for Longhaul so people can deploy their own version of the tests.
  8. Decide whether performance metrics need to be part of the Longhaul dashboards; if not, decide how else to track them well.
  9. Introduce a centralized data store for aggregating use cases.

cc: @halspang @johnewart @artursouza

@tanvigour
Contributor Author

dapr/proposals#17

@tmacam
Contributor

tmacam commented Nov 7, 2023

Some additional enhancements in light of #211 and #201.

  • Use Azure-hosted Prometheus and Grafana instead of the helm-installed ones, and retire/remove Prometheus and Grafana in this cluster.
  • Configure group-based auth for the hosted Grafana so members with access to the OSS subscription can authenticate using Microsoft Entra.
  • Configure DNS aliases for the hosted Grafana instances.
  • Configure logging for the AKS instances of the longhaul clusters.
  • Use AAD authentication instead of passwords.

Whenever possible, these should be done through Bicep updates so we can treat those clusters more like cattle and less like pets.
