Skip to content

Wait for all timeseries deletions to complete#10135

Merged
bnaecker merged 2 commits intomainfrom
ben/fix-replicated-expunge-timeseries
Mar 30, 2026
Merged

Wait for all timeseries deletions to complete#10135
bnaecker merged 2 commits intomainfrom
ben/fix-replicated-expunge-timeseries

Conversation

@bnaecker
Copy link
Copy Markdown
Collaborator

This is an attempt to address #10052, a flaky test deleting a timeseries by name. It's hard to generate a repro for that failure, since it relies on racing the deletion across all the ClickHouse nodes. But this should probably address it by ensuring we completely wait for the deletion to occur on all replicas before returning.

This does make the query slower but (1) we don't run this today and (2) we definitely don't run multinode ClickHouse, which is the only place this would matter anyway.

This is an attempt to address #10052, a flaky test deleting a timeseries
by name. It's hard to generate a repro for that failure, since it relies
on racing the deletion across all the ClickHouse nodes. But this should
probably address it by ensuring we completely wait for the deletion to
occur on all replicas before returning.

This does make the query slower but (1) we don't run this today and (2)
we definitely don't run multinode ClickHouse, which is the only place
this would matter anyway.
@bnaecker bnaecker requested a review from jmcarp March 24, 2026 17:32
@bnaecker
Copy link
Copy Markdown
Collaborator Author

Ping on this quick one @jmcarp :)

Copy link
Copy Markdown
Contributor

@jmcarp jmcarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. I have some questions, but I don't see any harm in merging.

  • Is this a regression, or has the test been flaky since we added it?
  • Can we add a comment explaining the query settings? I'm guessing the goal is to ensure that the delete is synchronous so that subsequent test assertions pass, but might as well write it down.
  • Nit, and I don't even know that I like this idea, but if we really didn't want to make the delete feel slow, we could leave the query as-is, and have the test poll clickhouse's system.mutations until there aren't any pending mutations. But it's not clear that we care if this function is slow, and I'm guessing it doesn't run often.

All of which is to say, LGTM.

@bnaecker
Copy link
Copy Markdown
Collaborator Author

* Is this a regression, or has the test been flaky since we added it?

I think this is a long-standing issue, but it seems to be worse after #10012.

* Can we add a comment explaining the query settings? I'm guessing the goal is to ensure that the delete is synchronous so that subsequent test assertions pass, but might as well write it down.

Yes, that's right. I'll add a note.

* Nit, and I don't even know that I like this idea, but if we really didn't want to make the delete feel slow, we could leave the query as-is, and have the test poll clickhouse's system.mutations until there aren't any pending mutations. But it's not clear that we care if this function is slow, and I'm guessing it doesn't run often.

That's true, but we don't care for a few reasons. We don't run this ever outside of tests, because no customers are running multinode ClickHouse yet. That worked stalled out. Second, it's during schema migrations, which we don't currently do in either single- or multi-node deployments.

@bnaecker bnaecker enabled auto-merge (squash) March 30, 2026 17:37
@bnaecker bnaecker merged commit 1bb7b11 into main Mar 30, 2026
16 checks passed
@bnaecker bnaecker deleted the ben/fix-replicated-expunge-timeseries branch March 30, 2026 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants