Speed up local parametric testing capabilities by 2-5x by bm1549 · Pull Request #6256 · DataDog/system-tests

bm1549 · 2026-02-09T22:01:51Z

Motivation

Iterating on parametric tests locally is slow because run.sh rebuilds the library Docker image on every invocation, even when nothing has changed. This is wasted time when only the test code is changing.

Changes

Adds --skip-parametric-build (and SKIP_PARAMETRIC_BUILD=1) to run.sh: when the flag is set and the library image already exists, the build step is skipped (~3–4s saved per run).

What was dropped from the earlier version of this PR:

The original PR also added a +1 flag that forced a single xdist worker and enabled library container reuse between tests. Per reviewer feedback, this was a blocker:

run.sh aims to stay close to bare pytest; the right way to set worker count is -n <int> (standard pytest CLI), not a custom +1 shorthand.
Parametric tests are designed to run with different library_env parameters — sequential container reuse causes test pollution failures, which is the primary use case for this scenario.

The --skip-parametric-build flag is the main benefit for local dev iteration and carries no correctness risk.

Example:

# Edit test code, then re-run quickly without rebuilding the image
TEST_LIBRARY=nodejs ./run.sh PARAMETRIC --skip-parametric-build tests/parametric/test_tracer.py

Workflow

⚠️ Create your PR as draft ⚠️
Work on you PR until the CI passes
Mark it as ready for review
- Test logic is modified? -> Get a review from RFC owner.
- Framework is modified, or non obvious usage of it -> get a review from R&P team

🚀 Once your PR is reviewed and the CI green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

Anything but tests/ or manifests/ is modified ? I have the approval from R&P team
A docker base image is modified?
- the relevant build-XXX-image label is present
A scenario is added, removed or renamed?
- Get a review from R&P team

…rary builds and reuse containers. Introduce new rules for parametric testing in documentation. Update related scripts and fixtures for improved performance during test runs.

github-actions · 2026-02-09T22:02:20Z

CODEOWNERS have been resolved as:

.cursor/rules/parametric-testing.mdc                                    @DataDog/system-tests-core
AGENTS.md                                                               @DataDog/system-tests-core
conftest.py                                                             @DataDog/system-tests-core
docs/understand/scenarios/parametric.md                                 @DataDog/system-tests-core
utils/_context/_scenarios/parametric.py                                 @DataDog/system-tests-core
utils/docker_fixtures/_test_agent.py                                    @DataDog/system-tests-core

datadog-datadog-prod-us1 · 2026-02-09T22:08:37Z

⚠️ Tests

✨ Fix all issues with BitsAI or with Cursor

⚠️ Warnings

🧪 36 Tests failed

tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic.test_ffe_eval_metric_basic[chi] from system_tests_suite

(Fix with Cursor)

AssertionError: Expected at least one feature_flag.evaluations metric for flag 'eval-metric-basic-flag', but found none. All eval metrics: []
assert 0 > 0
 +  where 0 = len([])

self = <tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic object at 0x7f40202c5670>

    def test_ffe_eval_metric_basic(self):
        """Test that flag evaluation produces a metric with correct tags."""
        assert self.r.status_code == 200, f"Flag evaluation failed: {self.r.text}"
    
...

tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic.test_ffe_eval_metric_basic[echo] from system_tests_suite

(Fix with Cursor)

AssertionError: Expected at least one feature_flag.evaluations metric for flag 'eval-metric-basic-flag', but found none. All eval metrics: []
assert 0 > 0
 +  where 0 = len([])

self = <tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic object at 0x7f94242d0ce0>

    def test_ffe_eval_metric_basic(self):
        """Test that flag evaluation produces a metric with correct tags."""
        assert self.r.status_code == 200, f"Flag evaluation failed: {self.r.text}"
    
...

tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic.test_ffe_eval_metric_basic[gin] from system_tests_suite

(Fix with Cursor)

AssertionError: Expected at least one feature_flag.evaluations metric for flag 'eval-metric-basic-flag', but found none. All eval metrics: []
assert 0 > 0
 +  where 0 = len([])

self = <tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic object at 0x7f328deacf80>

    def test_ffe_eval_metric_basic(self):
        """Test that flag evaluation produces a metric with correct tags."""
        assert self.r.status_code == 200, f"Flag evaluation failed: {self.r.text}"
    
...

View all

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: da87747 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!}

nccatoni

These changes could cause reliability issues. While it might be fine to speedup local development you should make sure that all parametric tests do indeed pass. You should also get a review from @cbeauchesne (OOO this week)

nccatoni · 2026-02-11T10:00:23Z

docs/scenarios/parametric.md


+### Making parametric runs faster
+
+- **Skip the build when the image already exists:** Use `--skip-parametric-build` (or set `SKIP_PARAMETRIC_BUILD=1`) when you are only changing test code. This avoids rebuilding the parametric library image on every run. When you change the Dockerfile or app code under `utils/build/docker/<lang>/parametric/`, run without this option so the image is rebuilt.


Are you sure that docker doesn't already cache the images in this case ?

Going based on empirical evidence, it saved an additional ~1-2 seconds per run since we avoid the layer cache validation

docs/scenarios/parametric.md

cbeauchesne · 2026-02-18T10:21:03Z

tests/parametric/conftest.py

 ) -> Generator[TestAgentAPI, None, None]:
+    session_agent = request.getfixturevalue("_session_test_agent")
+    if worker_id == "master" and not agent_env and session_agent is not None:
+        yield session_agent


I'm not sure to get how this works, what if agent_env/otlp ports change from one test to another, it won't be applied on session_agent, right ?

cbeauchesne

If I understand correctly, this option is only meant for local dev.

I usually strongly recommand to use nodeid among run.sh when running on local on parametric tests, it allows to run only the subset of test. I'm wondering what is the benefit of this change using this setup ?

bm1549 · 2026-03-09T18:03:10Z

@cbeauchesne Great question — I ran benchmarks to compare the two approaches on TEST_LIBRARY=nodejs with tests/parametric/test_tracer.py.

Single test re-run (the targeted use case)

Approach	Run 1	Run 2	Run 3	Avg wall time
Nodeid only (`./run.sh PARAMETRIC <nodeid>`)	12.2s	9.8s	10.5s	~10.5s
PR approach (`./run.sh PARAMETRIC +1 --skip-parametric-build <nodeid>`)	5.5s	6.0s	5.7s	~5.7s

~1.8x faster for a single test re-run. The savings come from:

--skip-parametric-build skips the Docker image build step (~3–4s saved). Without this flag, run.sh rebuilds/revalidates the parametric library image on every invocation, even when nothing changed.
+1 avoids spawning 10 xdist workers (gw0–gw9) for a single test.

Running a whole file (e.g. test_tracer.py, ~14 tests)

Approach	Wall time	Test failures
Nodeid only (`./run.sh PARAMETRIC tests/parametric/test_tracer.py`)	~20s	0
PR approach (`./run.sh PARAMETRIC +1 --skip-parametric-build tests/parametric/test_tracer.py`)	~70s	3 failures (container pollution)

For a whole file, nodeid-only is ~3.5x faster because xdist runs tests in parallel. The PR approach runs tests sequentially and the container reuse causes test failures via state pollution.

Summary

You're right that nodeid already narrows the test subset — and for running multiple tests, nodeid-only is both faster and more reliable. The benefit of +1 --skip-parametric-build is specifically for the common "edit → re-run one test → edit → re-run" loop, where the Docker image is never changing. The combination makes that cycle ~1.8x faster by skipping the image build each time.

The docs in this PR try to call this out (the "single-worker runs may cause failures" warning), but I agree the use case is narrower than I initially presented. Happy to adjust the recommendation to be more explicit that +1 is only for single-test re-runs.

Based on benchmark data: nodeid alone is faster for multi-test runs (~20s vs ~70s) due to xdist parallelism. +1 --skip-parametric-build is ~1.8x faster only for single-test repeated re-runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cbeauchesne · 2026-03-12T09:38:48Z

+1 avoids spawning 10 xdist workers (gw0–gw9) for a single test.

We try to keep run.sh as close as possible from a bare pytest call. It means we must stick to the pytest CLI, in that case, it'll be -n <int>.

The PR approach runs tests sequentially and the container reuse causes test failures via state pollution.

To me, that's a blocker (I think it's related to my point here). The full point of parametric test is to have different parameters for a single tests, so that's by design the main use case of this scenario.

WDYT of just keeping the "no rebuild" feature, since it'll be the main benefit for the local dev using one nodeid ?

The single-worker container reuse caused test pollution failures when running multiple tests, which is the primary use case for parametric tests. The +1 flag also diverged from the pytest CLI convention (run.sh should stay close to bare pytest, so -n <int> is the right form). Keep only --skip-parametric-build (and SKIP_PARAMETRIC_BUILD=1), which skips rebuilding the library image on each invocation and saves ~3-4s per run with no correctness risk. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bm1549 · 2026-03-19T21:01:09Z

WDYT of just keeping the "no rebuild" feature, since it'll be the main benefit for the local dev using one nodeid ?

Works for me! Just made the edits so it only does no rebuild

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

utils/_context/_scenarios/parametric.py

Consolidate env-var-to-option defaulting in pytest_configure alongside similar logic for other options, instead of inline in parametric.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Enhance parametric testing capabilities by adding options to skip lib…

ac44597

…rary builds and reuse containers. Introduce new rules for parametric testing in documentation. Update related scripts and fixtures for improved performance during test runs.

bm1549 added the ai-generated The pull request includes a significant amount of AI-generated code label Feb 9, 2026

bm1549 changed the title ~~Speed up local parametric testing capabilities~~ Speed up local parametric testing capabilities by 2-5x Feb 10, 2026

bm1549 mentioned this pull request Feb 10, 2026

Add tests for startup log behavior in tracer libraries #6241

Merged

5 tasks

nccatoni reviewed Feb 11, 2026

View reviewed changes

Document weird failures using single-worker runs

5cc6b09

cbeauchesne reviewed Feb 18, 2026

View reviewed changes

Merge origin/main: resolve conflicts in AGENTS.md and conftest.py

f41407a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bm1549 marked this pull request as ready for review March 19, 2026 21:24

bm1549 requested a review from a team as a code owner March 19, 2026 21:24

bm1549 requested a review from cbeauchesne March 19, 2026 21:24

cbeauchesne requested changes Mar 23, 2026

View reviewed changes

utils/_context/_scenarios/parametric.py Outdated Show resolved Hide resolved

bm1549 and others added 2 commits March 23, 2026 14:50

Move SKIP_PARAMETRIC_BUILD env var defaulting to conftest.py

8637eca

Consolidate env-var-to-option defaulting in pytest_configure alongside similar logic for other options, instead of inline in parametric.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'main' into brian.marks/faster-parametric

8ba222c

bm1549 requested a review from cbeauchesne March 23, 2026 18:57

cbeauchesne approved these changes Mar 23, 2026

View reviewed changes

Merge branch 'main' into brian.marks/faster-parametric

68f3585

bm1549 enabled auto-merge (squash) March 23, 2026 20:54

Merge branch 'main' into brian.marks/faster-parametric

acc65e4

gh-worker-dd-devflow-36fce6 bot added the mergequeue-status: waiting label Mar 24, 2026

Trigger CI re-run

da87747

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gh-worker-dd-devflow-36fce6 bot removed the mergequeue-status: waiting label Mar 25, 2026

gh-worker-dd-devflow-36fce6 bot added the mergequeue-status: removed label Mar 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up local parametric testing capabilities by 2-5x#6256

Speed up local parametric testing capabilities by 2-5x#6256
bm1549 wants to merge 10 commits intomainfrom
brian.marks/faster-parametric

bm1549 commented Feb 9, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 9, 2026 •

edited

Loading

Uh oh!

datadog-datadog-prod-us1 bot commented Feb 9, 2026 •

edited by datadog-prod-us1-4 bot

Loading

Uh oh!

nccatoni left a comment

Uh oh!

nccatoni Feb 11, 2026

Uh oh!

bm1549 Feb 11, 2026

Uh oh!

Uh oh!

cbeauchesne Feb 18, 2026

Uh oh!

cbeauchesne left a comment

Uh oh!

bm1549 commented Mar 9, 2026

Uh oh!

cbeauchesne commented Mar 12, 2026 •

edited

Loading

Uh oh!

bm1549 commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		### Making parametric runs faster

		- Skip the build when the image already exists: Use `--skip-parametric-build` (or set `SKIP_PARAMETRIC_BUILD=1`) when you are only changing test code. This avoids rebuilding the parametric library image on every run. When you change the Dockerfile or app code under `utils/build/docker/<lang>/parametric/`, run without this option so the image is rebuilt.

Conversation

bm1549 commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Changes

Workflow

Reviewer checklist

Uh oh!

github-actions bot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

datadog-datadog-prod-us1 bot commented Feb 9, 2026 • edited by datadog-prod-us1-4 bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

ℹ️ Info

Uh oh!

nccatoni left a comment

Choose a reason for hiding this comment

Uh oh!

nccatoni Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

bm1549 Feb 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cbeauchesne Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

cbeauchesne left a comment

Choose a reason for hiding this comment

Uh oh!

bm1549 commented Mar 9, 2026

Single test re-run (the targeted use case)

Running a whole file (e.g. test_tracer.py, ~14 tests)

Summary

Uh oh!

cbeauchesne commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bm1549 commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bm1549 commented Feb 9, 2026 •

edited

Loading

github-actions bot commented Feb 9, 2026 •

edited

Loading

datadog-datadog-prod-us1 bot commented Feb 9, 2026 •

edited by datadog-prod-us1-4 bot

Loading

cbeauchesne commented Mar 12, 2026 •

edited

Loading