Implement node status synchronization in Controller, including downtime notifications and online/offline management by Paragrf · Pull Request #388 · apache/kvrocks-controller

Paragrf · 2026-04-10T03:23:38Z

Background
To improve cluster reliability and operational efficiency, the controller needs to keep the server informed of slave node status. Beyond reporting unexpected failures, it should also support planned maintenance, so operators can drain traffic from a node before performing updates or hardware swaps.

Key Changes

Downtime Push: The controller now monitors slave node health and proactively pushes "Downtime" alerts to the server upon detection of a crash or heartbeat timeout.
Manual Offline (Traffic Draining): Supports a proactive "Offline" command. This allows the server to drain/stop traffic to a specific slave node before any maintenance work begins, ensuring zero-impact operations.
Manual Online: Supports a "Ready-to-Serve" notification when a slave is back online and fully synchronized, allowing the server to safely re-enable traffic.
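
The three changes above can be read as transitions on a per-node status. The following is a minimal sketch of that idea, not the code in this PR: `NodeStatus`, `Node`, `CheckHeartbeat`, `SetOffline`, `SetOnline`, the `Synced` flag, and the 5-second `heartbeatTimeout` are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"errors"
	"time"
)

// NodeStatus is a hypothetical status enum; the PR's real types may differ.
type NodeStatus int

const (
	StatusOnline  NodeStatus = iota // serving traffic
	StatusOffline                   // drained manually ahead of maintenance
	StatusDown                      // failure detected by the controller
)

// heartbeatTimeout is an assumed value used only in this sketch.
const heartbeatTimeout = 5 * time.Second

// ErrNotSynced is returned when a node is asked to go online too early.
var ErrNotSynced = errors.New("node is not fully synchronized")

// Node holds the minimum state this sketch needs.
type Node struct {
	ID            string
	Status        NodeStatus
	LastHeartbeat time.Time
	Synced        bool // assumed flag: the slave has caught up with its master
}

// CheckHeartbeat marks an online node as down when no heartbeat arrived
// within timeout. It returns true when the status changed, which is the
// point where a "Downtime" push would be sent. Nodes that are already
// offline or down are left untouched.
func (n *Node) CheckHeartbeat(now time.Time, timeout time.Duration) bool {
	if n.Status != StatusOnline {
		return false
	}
	if now.Sub(n.LastHeartbeat) > timeout {
		n.Status = StatusDown
		return true
	}
	return false
}

// SetOffline drains the node before maintenance begins.
func (n *Node) SetOffline() {
	n.Status = StatusOffline
}

// SetOnline re-enables traffic only when the node reports being synchronized.
func (n *Node) SetOnline() error {
	if !n.Synced {
		return ErrNotSynced
	}
	n.Status = StatusOnline
	return nil
}
```

In this sketch a manually drained node is never flipped to down by the heartbeat check, so a planned maintenance window does not produce a downtime alert. Whether the PR makes the same choice is not stated in the description.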

Related Issues
Fixes #385

codecov-commenter · 2026-04-10T03:36:31Z

Codecov Report

❌ Patch coverage is 79.04762% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.09%. Comparing base (6c56470) to head (87937e4).
⚠️ Report is 105 commits behind head on unstable.

Files with missing lines	Patch %	Lines
controller/cluster.go	54.16%	7 Missing and 4 partials ⚠️
server/api/node.go	73.91%	4 Missing and 2 partials ⚠️
server/helper/helper.go	0.00%	2 Missing ⚠️
store/cluster.go	95.45%	1 Missing and 1 partial ⚠️
server/route.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #388      +/-   ##
============================================
+ Coverage     43.38%   50.09%   +6.70%     
============================================
  Files            37       45       +8     
  Lines          2971     3885     +914     
============================================
+ Hits           1289     1946     +657     
- Misses         1544     1724     +180     
- Partials        138      215      +77

Flag	Coverage Δ
unittests	`50.09% <79.04%> (+6.70%)`	⬆️

paragrf added 2 commits April 10, 2026 11:08

feat(api): add Online/Offline interfaces

81d0b5f

feat(ha): topology is also updated upon slave node failure

87937e4

git-hulk self-requested a review April 10, 2026 06:50

Paragrf wants to merge 2 commits into apache:unstable from Paragrf:ha
