Add "blocked/not blocked" in job count metrics #2945
It's very important because it drives the autoscaler: we don't want the number of pods to remain high while all the jobs are blocked.
Reference: #2279 (comment)
Pending (#2949 (comment)):
I think it's OK to wait for the next cron job run (every 10 minutes in prod), which is short compared to the duration of a dataset blockage (6 hours).
We still need to be able to filter out the blocked jobs from the charts in https://grafana.huggingface.tech/d/i7gwsO5Vz/global-view?orgId=1 (or show two curves: blocked / not blocked), because otherwise the charts are misleading.
Now that we block datasets, the job count metrics are somewhat misleading, because they still include the jobs of blocked datasets. We need to be able to filter these jobs out, since they are effectively out of the queue for as long as the blockage lasts.
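A minimal sketch of the requested split, assuming a hypothetical `Job` record and a plain set of blocked dataset names (neither is the project's actual schema or API). Counting jobs per `(job_type, status, blocked)` lets the charts show two curves, and lets the autoscaler read only the not-blocked count.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical minimal job record; the real queue stores more fields.
    job_type: str
    dataset: str
    status: str  # e.g. "waiting" or "started"


def count_jobs(jobs, blocked_datasets):
    """Count jobs per (job_type, status, blocked) so charts can filter on 'blocked'."""
    counts = Counter()
    for job in jobs:
        blocked = job.dataset in blocked_datasets
        counts[(job.job_type, job.status, blocked)] += 1
    return counts


def unblocked_waiting(counts):
    """Hypothetical autoscaler input: waiting jobs whose dataset is not blocked."""
    return sum(
        n for (_, status, blocked), n in counts.items()
        if status == "waiting" and not blocked
    )


jobs = [
    Job("type_a", "user/a", "waiting"),
    Job("type_a", "user/b", "waiting"),
    Job("type_b", "user/a", "waiting"),
]
counts = count_jobs(jobs, blocked_datasets={"user/a"})
```

Here only the `user/b` job counts toward scaling, since `user/a` is blocked. Keeping `blocked` as a separate dimension (rather than dropping blocked jobs) preserves the "blocked" curve for the dashboards.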