Update the ClearML health check to ensure that the autoscaler is operational #896

Open
pmachapman wants to merge 1 commit into main from autoscaler_health_check

Collaborator

@pmachapman pmachapman commented Apr 2, 2026

Fixes #584



@pmachapman pmachapman requested review from Enkidu93 and ddaspit April 2, 2026 00:33
@pmachapman pmachapman changed the title from "Add a health check to ensure that the autoscaler is operational" to "Update the ClearML health check to ensure that the autoscaler is operational" Apr 2, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 0% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.87%. Comparing base (7cf7238) to head (6cb2473).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rval.Machine.Shared/Services/ClearMLHealthCheck.cs | 0.00% | 52 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #896      +/-   ##
==========================================
- Coverage   67.98%   67.87%   -0.12%     
==========================================
  Files         386      386              
  Lines       21224    21256      +32     
  Branches     2740     2749       +9     
==========================================
- Hits        14429    14427       -2     
- Misses       5810     5844      +34     
  Partials      985      985              


Collaborator

@Enkidu93 Enkidu93 left a comment


Have you confirmed that this works - at least that the apps.get_instances call responds properly under normal conditions?

@Enkidu93 reviewed 1 file and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on ddaspit).

Collaborator Author

@pmachapman pmachapman left a comment


> Have you confirmed that this works - at least that the apps.get_instances call responds properly under normal conditions?

@Enkidu93 Yes, I stepped through it both when an e2e test was running (and so using the autoscaler), and when no e2e test was running (and so there were no workers for the autoscaler).

@pmachapman made 1 comment.

Collaborator

@Enkidu93 Enkidu93 left a comment


Great! Thank you.

@Enkidu93 made 1 comment.


Development

Successfully merging this pull request may close these issues.

ClearML health check failing with autoscalar only

3 participants