Skip to content

Fix flaky test_support_bundle_lifecycle#10181

Merged
smklein merged 4 commits intomainfrom
bundle-flake
Mar 30, 2026
Merged

Fix flaky test_support_bundle_lifecycle#10181
smklein merged 4 commits intomainfrom
bundle-flake

Conversation

@smklein
Copy link
Copy Markdown
Collaborator

@smklein smklein commented Mar 27, 2026

The sp_ereport_ingester background task runs on a 30-second timer and ingests ereports from simulated SPs into the database. The support bundle collection test asserts that zero ereports are found, but if the ingester fires before the bundle collector runs, it populates the DB with 11 ereports from the simulated SPs (configured in sp_sim_config.test.toml), causing the assertion to fail.

Fix this by disabling the sp_ereport_ingester in the test config. The ingester has its own dedicated tests that construct it directly, so no test coverage is lost. Comments are added at both assertion sites explaining the dependency on the ingester being disabled.

Fixes #10179

The sp_ereport_ingester background task runs on a 30-second timer and
ingests ereports from simulated SPs into the database. The support
bundle collection test asserts that zero ereports are found, but if the
ingester fires before the bundle collector runs, it populates the DB
with 11 ereports from the simulated SPs (configured in
sp_sim_config.test.toml), causing the assertion to fail
nondeterministically.

Fix this by disabling the sp_ereport_ingester in the test config. The
ingester has its own dedicated tests that construct it directly, so no
test coverage is lost. Comments are added at both assertion sites
explaining the dependency on the ingester being disabled.
@smklein smklein requested a review from hawkw March 27, 2026 22:07
@hawkw
Copy link
Copy Markdown
Member

hawkw commented Mar 27, 2026

The sp_ereport_ingester background task runs on a 30-second timer and ingests ereports from simulated SPs into the database. The support bundle collection test asserts that zero ereports are found, but if the ingester fires before the bundle collector runs, it populates the DB with 11 ereports from the simulated SPs (configured in sp_sim_config.test.toml), causing the assertion to fail.

Gah, sorry. This one's my fault. 😬

Copy link
Copy Markdown
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the change to lose the ereport ingester task's output in test_omdb_success_cases is a bit of a bummer, I kinda wonder if we could have those tests edit the config file to turn the task back on or something? But if that's too annoying, I'd definitely rather fix the flake than test the actual output...

Comment on lines -905 to -910
total ereports received: <TOTAL_EREPORTS_RECEIVED_REDACTED>
new ereports ingested: <NEW_EREPORTS_INGESTED_REDACTED>
total HTTP requests sent: <TOTAL_HTTP_REQUESTS_SENT_REDACTED>
total collection errors: <TOTAL_COLLECTION_ERRORS_REDACTED>
reporters with ereports: <REPORTERS_WITH_EREPORTS_REDACTED>
reporters with collection errors: <REPORTERS_WITH_COLLECTION_ERRORS_REDACTED>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, losing this is a bit of a shame, I'd really much rather see the more realistic output in the test output here...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can tweak the omdb test to explicitly enable this task - which will preserve this output

I do think "random ereports might be injected to your test" could cause some unexpected downstream effects, so disabling sp_ereport_ingester seems like a reasonable default, but the omdb tests seem totally clear to opt-into it.

webhook_deliverator.second_retry_backoff_secs = 20
read_only_region_replacement_start.period_secs = 999999
sp_ereport_ingester.period_secs = 30
sp_ereport_ingester.disable = true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would maybe like to see a comment here explaining why this task is disabled for tests.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +136 to +146
// Use a custom ControlPlaneBuilder so we can enable background tasks
// which might otherwise be disabled.
// We want it enabled here so the omdb is realistic.
let cptestctx =
nexus_test_utils::ControlPlaneBuilder::new("test_omdb_success_cases")
.with_extra_sled_agents(1)
.customize_nexus_config(&|config| {
config.pkg.background_tasks.sp_ereport_ingester.disable = false;
})
.start::<omicron_nexus::Server>()
.await;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me! I agree with your comment on matrix that in the long term, it would be better to just not have the simulated emit the same ereports i nevery test, but this seems good in the meantime.

@smklein smklein merged commit 2050ef3 into main Mar 30, 2026
16 checks passed
@smklein smklein deleted the bundle-flake branch March 30, 2026 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test failed in CI: test_support_bundle_lifecycle

2 participants