Skip to content

Speed up local parametric testing capabilities by 2-5x#6256

Open
bm1549 wants to merge 10 commits intomainfrom
brian.marks/faster-parametric
Open

Speed up local parametric testing capabilities by 2-5x#6256
bm1549 wants to merge 10 commits intomainfrom
brian.marks/faster-parametric

Conversation

@bm1549
Copy link
Contributor

@bm1549 bm1549 commented Feb 9, 2026

Motivation

Iterating on parametric tests locally is slow because run.sh rebuilds the library Docker image on every invocation, even when nothing has changed. This is wasted time when only the test code is changing.

Changes

Adds --skip-parametric-build (and SKIP_PARAMETRIC_BUILD=1) to run.sh: when the flag is set and the library image already exists, the build step is skipped (~3–4s saved per run).

What was dropped from the earlier version of this PR:

The original PR also added a +1 flag that forced a single xdist worker and enabled library container reuse between tests. Per reviewer feedback, this was a blocker:

  • run.sh aims to stay close to bare pytest; the right way to set worker count is -n <int> (standard pytest CLI), not a custom +1 shorthand.
  • Parametric tests are designed to run with different library_env parameters — sequential container reuse causes test pollution failures, which is the primary use case for this scenario.

The --skip-parametric-build flag is the main benefit for local dev iteration and carries no correctness risk.

Example:

# Edit test code, then re-run quickly without rebuilding the image
TEST_LIBRARY=nodejs ./run.sh PARAMETRIC --skip-parametric-build tests/parametric/test_tracer.py

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on you PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from RFC owner.
    • Framework is modified, or non obvious usage of it -> get a review from R&P team

🚀 Once your PR is reviewed and the CI green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • Anything but tests/ or manifests/ is modified ? I have the approval from R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added, removed or renamed?

…rary builds and reuse containers. Introduce new rules for parametric testing in documentation. Update related scripts and fixtures for improved performance during test runs.
@bm1549 bm1549 added the ai-generated The pull request includes a significant amount of AI-generated code label Feb 9, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2026

CODEOWNERS have been resolved as:

.cursor/rules/parametric-testing.mdc                                    @DataDog/system-tests-core
AGENTS.md                                                               @DataDog/system-tests-core
conftest.py                                                             @DataDog/system-tests-core
docs/understand/scenarios/parametric.md                                 @DataDog/system-tests-core
utils/_context/_scenarios/parametric.py                                 @DataDog/system-tests-core
utils/docker_fixtures/_test_agent.py                                    @DataDog/system-tests-core

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Feb 9, 2026

⚠️ Tests

Fix all issues with BitsAI or with Cursor

⚠️ Warnings

🧪 36 Tests failed

tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic.test_ffe_eval_metric_basic[chi] from system_tests_suite   View in Datadog   (Fix with Cursor)
AssertionError: Expected at least one feature_flag.evaluations metric for flag 'eval-metric-basic-flag', but found none. All eval metrics: []
assert 0 > 0
 +  where 0 = len([])

self = <tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic object at 0x7f40202c5670>

    def test_ffe_eval_metric_basic(self):
        """Test that flag evaluation produces a metric with correct tags."""
        assert self.r.status_code == 200, f"Flag evaluation failed: {self.r.text}"
    
...
tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic.test_ffe_eval_metric_basic[echo] from system_tests_suite   View in Datadog   (Fix with Cursor)
AssertionError: Expected at least one feature_flag.evaluations metric for flag 'eval-metric-basic-flag', but found none. All eval metrics: []
assert 0 > 0
 +  where 0 = len([])

self = <tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic object at 0x7f94242d0ce0>

    def test_ffe_eval_metric_basic(self):
        """Test that flag evaluation produces a metric with correct tags."""
        assert self.r.status_code == 200, f"Flag evaluation failed: {self.r.text}"
    
...
tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic.test_ffe_eval_metric_basic[gin] from system_tests_suite   View in Datadog   (Fix with Cursor)
AssertionError: Expected at least one feature_flag.evaluations metric for flag 'eval-metric-basic-flag', but found none. All eval metrics: []
assert 0 > 0
 +  where 0 = len([])

self = <tests.ffe.test_flag_eval_metrics.Test_FFE_Eval_Metric_Basic object at 0x7f328deacf80>

    def test_ffe_eval_metric_basic(self):
        """Test that flag evaluation produces a metric with correct tags."""
        assert self.r.status_code == 200, f"Flag evaluation failed: {self.r.text}"
    
...
View all

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: da87747 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

@bm1549 bm1549 changed the title Speed up local parametric testing capabilities Speed up local parametric testing capabilities by 2-5x Feb 10, 2026
Copy link
Collaborator

@nccatoni nccatoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes could cause reliability issues. While it might be fine to speedup local development you should make sure that all parametric tests do indeed pass. You should also get a review from @cbeauchesne (OOO this week)


### Making parametric runs faster

- **Skip the build when the image already exists:** Use `--skip-parametric-build` (or set `SKIP_PARAMETRIC_BUILD=1`) when you are only changing test code. This avoids rebuilding the parametric library image on every run. When you change the Dockerfile or app code under `utils/build/docker/<lang>/parametric/`, run without this option so the image is rebuilt.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that docker doesn't already cache the images in this case ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going based on empirical evidence, it saved an additional ~1-2 seconds per run since we avoid the layer cache validation

) -> Generator[TestAgentAPI, None, None]:
session_agent = request.getfixturevalue("_session_test_agent")
if worker_id == "master" and not agent_env and session_agent is not None:
yield session_agent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to get how this works, what if agent_env/otlp ports change from one test to another, it won't be applied on session_agent, right ?

Copy link
Collaborator

@cbeauchesne cbeauchesne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this option is only meant for local dev.

I usually strongly recommand to use nodeid among run.sh when running on local on parametric tests, it allows to run only the subset of test. I'm wondering what is the benefit of this change using this setup ?

@bm1549
Copy link
Contributor Author

bm1549 commented Mar 9, 2026

@cbeauchesne Great question — I ran benchmarks to compare the two approaches on TEST_LIBRARY=nodejs with tests/parametric/test_tracer.py.

Single test re-run (the targeted use case)

Approach Run 1 Run 2 Run 3 Avg wall time
Nodeid only (./run.sh PARAMETRIC <nodeid>) 12.2s 9.8s 10.5s ~10.5s
PR approach (./run.sh PARAMETRIC +1 --skip-parametric-build <nodeid>) 5.5s 6.0s 5.7s ~5.7s

~1.8x faster for a single test re-run. The savings come from:

  1. --skip-parametric-build skips the Docker image build step (~3–4s saved). Without this flag, run.sh rebuilds/revalidates the parametric library image on every invocation, even when nothing changed.
  2. +1 avoids spawning 10 xdist workers (gw0–gw9) for a single test.

Running a whole file (e.g. test_tracer.py, ~14 tests)

Approach Wall time Test failures
Nodeid only (./run.sh PARAMETRIC tests/parametric/test_tracer.py) ~20s 0
PR approach (./run.sh PARAMETRIC +1 --skip-parametric-build tests/parametric/test_tracer.py) ~70s 3 failures (container pollution)

For a whole file, nodeid-only is ~3.5x faster because xdist runs tests in parallel. The PR approach runs tests sequentially and the container reuse causes test failures via state pollution.

Summary

You're right that nodeid already narrows the test subset — and for running multiple tests, nodeid-only is both faster and more reliable. The benefit of +1 --skip-parametric-build is specifically for the common "edit → re-run one test → edit → re-run" loop, where the Docker image is never changing. The combination makes that cycle ~1.8x faster by skipping the image build each time.

The docs in this PR try to call this out (the "single-worker runs may cause failures" warning), but I agree the use case is narrower than I initially presented. Happy to adjust the recommendation to be more explicit that +1 is only for single-test re-runs.

Based on benchmark data: nodeid alone is faster for multi-test runs
(~20s vs ~70s) due to xdist parallelism. +1 --skip-parametric-build
is ~1.8x faster only for single-test repeated re-runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cbeauchesne
Copy link
Collaborator

cbeauchesne commented Mar 12, 2026

+1 avoids spawning 10 xdist workers (gw0–gw9) for a single test.

We try to keep run.sh as close as possible from a bare pytest call. It means we must stick to the pytest CLI, in that case, it'll be -n <int>.

The PR approach runs tests sequentially and the container reuse causes test failures via state pollution.

To me, that's a blocker (I think it's related to my point here). The full point of parametric test is to have different parameters for a single tests, so that's by design the main use case of this scenario.

WDYT of just keeping the "no rebuild" feature, since it'll be the main benefit for the local dev using one nodeid ?

The single-worker container reuse caused test pollution failures when
running multiple tests, which is the primary use case for parametric
tests. The +1 flag also diverged from the pytest CLI convention (run.sh
should stay close to bare pytest, so -n <int> is the right form).

Keep only --skip-parametric-build (and SKIP_PARAMETRIC_BUILD=1), which
skips rebuilding the library image on each invocation and saves ~3-4s
per run with no correctness risk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bm1549
Copy link
Contributor Author

bm1549 commented Mar 19, 2026

WDYT of just keeping the "no rebuild" feature, since it'll be the main benefit for the local dev using one nodeid ?

Works for me! Just made the edits so it only does no rebuild

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bm1549 bm1549 marked this pull request as ready for review March 19, 2026 21:24
@bm1549 bm1549 requested a review from a team as a code owner March 19, 2026 21:24
@bm1549 bm1549 requested a review from cbeauchesne March 19, 2026 21:24
bm1549 and others added 2 commits March 23, 2026 14:50
Consolidate env-var-to-option defaulting in pytest_configure alongside
similar logic for other options, instead of inline in parametric.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bm1549 bm1549 requested a review from cbeauchesne March 23, 2026 18:57
@bm1549 bm1549 enabled auto-merge (squash) March 23, 2026 20:54
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-generated The pull request includes a significant amount of AI-generated code mergequeue-status: removed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants