[FFL-1942] feat(ffe): add eval metrics tests [python@0ce14d8addd8077787d4cd65d3a437af09530a41] [golang@sameerank/FFL-1972/fix-flag-evaluation-metrics-consistency-issues] by sameerank · Pull Request #6545 · DataDog/system-tests

sameerank · 2026-03-19T19:48:17Z

Motivation

Add FFE (Feature Flagging & Experimentation) evaluation metrics tests to verify the feature_flag.evaluations OTel counter metric works correctly across SDKs.

Jira: https://datadoghq.atlassian.net/browse/FFL-1942
dd-trace-py PR: feat(openfeature): add flag evaluation metrics dd-trace-py#17029
dd-trace-go PR: fix(openfeature): improve FFE eval metrics cross-tracer consistency dd-trace-go#4590

Changes

Test Infrastructure

Enable tests/ffe/test_flag_eval_metrics.py for Python and Go in manifest
Add OTEL_EXPORTER_OTLP_METRICS_PROTOCOL: "http/protobuf" to FFE scenario config
Add opentelemetry-exporter-otlp-proto-http==1.40.0 to Python weblogs

Test Coverage

Tests for OpenFeature evaluation reasons:

STATIC - catch-all allocation with no rules/shards
TARGETING_MATCH - rules match the context
SPLIT - shards determine variant
DEFAULT - rules don't match, fallback used
DISABLED - flag is disabled
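
As a rough illustration of how the five reasons above relate to a flag's configuration, here is a minimal sketch. The `Flag` and `Allocation` shapes and the `resolve_reason` helper are hypothetical and are not the real UFC schema or any SDK's API; rules are simplified to plain predicates, and the choice to report "split" whenever a matched allocation has shards is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

Context = dict[str, object]


@dataclass
class Allocation:
    key: str
    # Hypothetical simplification: each rule is a predicate over the context.
    rules: list[Callable[[Context], bool]] = field(default_factory=list)
    # True when shards (traffic splitting) decide the variant.
    shards: bool = False


@dataclass
class Flag:
    enabled: bool
    allocations: list[Allocation]


def resolve_reason(flag: Flag, context: Context) -> str:
    """Return a lowercase evaluation reason for the given flag and context."""
    if not flag.enabled:
        return "disabled"
    for alloc in flag.allocations:
        # An allocation with rules only applies when at least one rule matches.
        if alloc.rules and not any(rule(context) for rule in alloc.rules):
            continue
        if alloc.shards:
            return "split"
        if alloc.rules:
            return "targeting_match"
        # Catch-all allocation: no rules and no shards.
        return "static"
    # No allocation applied, so the caller's fallback value is used.
    return "default"
```

For example, a flag whose only allocation has a rule on `country` would yield "targeting_match" for a matching context and "default" otherwise.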

Tests for OpenFeature error codes:

FLAG_NOT_FOUND - config exists but flag missing
TYPE_MISMATCH - STRING→BOOLEAN, NUMERIC→INTEGER conversions
PARSE_ERROR - invalid regex pattern (Python only; Go validates at config load)
PROVIDER_NOT_READY - no config loaded
INVALID_CONTEXT - nested attributes (Python only)
TARGETING_KEY_MISSING - verifies it's NOT returned (JS excluded)
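
The first few error codes above can be sketched as a simple precedence check. This is only an illustration: `error_code_for` is a hypothetical helper, the `flags` mapping (flag key to declared variation type) is an assumed stand-in for a loaded config, and strict type equality is an assumption chosen to reflect that NUMERIC and INTEGER are treated as distinct types.

```python
from typing import Optional


def error_code_for(
    flags: Optional[dict[str, str]], flag_key: str, requested_type: str
) -> Optional[str]:
    """Return a lowercase error code, or None when evaluation can proceed.

    `flags` maps a flag key to its declared variation type
    (e.g. "STRING", "BOOLEAN", "NUMERIC", "INTEGER"); None means no
    config has been loaded yet.
    """
    if flags is None:
        return "provider_not_ready"
    if flag_key not in flags:
        return "flag_not_found"
    # Assumption: no implicit conversions, so NUMERIC requested as
    # INTEGER is a mismatch just like STRING requested as BOOLEAN.
    if flags[flag_key] != requested_type:
        return "type_mismatch"
    return None
```

Under these assumptions, `error_code_for({"price": "NUMERIC"}, "price", "INTEGER")` reports a type mismatch rather than a parse error.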

Cross-SDK Consistency

Lowercase reason/error values per OpenFeature telemetry conventions
SDK-specific @irrelevant decorators where behavior intentionally differs
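
A lowercase-consistency check could look roughly like the sketch below. The helper name and the tag keys passed to it are hypothetical; the actual attribute names on the metric are whatever the tests in this PR assert on.

```python
def non_lowercase_tags(
    attributes: dict[str, str], keys: tuple[str, ...]
) -> dict[str, str]:
    """Return the entries among `keys` whose values are not fully lowercase."""
    return {
        key: attributes[key]
        for key in keys
        if key in attributes and attributes[key] != attributes[key].lower()
    }
```

A test would then assert that the returned dict is empty for every data point, so a tracer emitting `TARGETING_MATCH` instead of `targeting_match` fails with the offending tag in the message.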

Reviewer checklist

Anything but tests/ or manifests/ is modified? I have the approval from R&P team
A docker base image is modified?
- the relevant build-XXX-image label is present

github-actions · 2026-03-19T19:48:46Z

CODEOWNERS have been resolved as:

manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
tests/ffe/test_flag_eval_metrics.py                                     @DataDog/feature-flagging-and-experimentation-sdk @DataDog/system-tests-core
utils/_context/_scenarios/__init__.py                                   @DataDog/system-tests-core
utils/_features.py                                                      @DataDog/system-tests-core
utils/build/docker/python/django-poc.Dockerfile                         @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/django-py3.13.Dockerfile                      @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/fastapi.Dockerfile                            @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/flask-poc.Dockerfile                          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/python3.12.Dockerfile                         @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/tornado.Dockerfile                            @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/uds-flask.Dockerfile                          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/uwsgi-poc.Dockerfile                          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core

sameerank · 2026-03-24T01:08:17Z

tests/ffe/test_flag_eval_metrics.py

+# INVALID_CONTEXT behavioral differences:
+#   - Python: Returns for nested dict/list attributes (PyO3 conversion failure)
+#   - Go: Flattens nested objects to dot notation instead
+#   - Ruby: Silently skips unsupported attribute types
+#   - Java: Returns only for null context, not nested attributes
+#   - .NET: Relies on native library; not yet standardized
+#   - JS: Does not use INVALID_CONTEXT at all


I am unclear on how to reconcile the variety of ways that the SDKs handle invalid evaluation contexts. For now I'm just noting it down in the system tests code, and hopefully we can chip away at the @irrelevant decorators as we keep working on the SDKs.

I like the Python approach of returning the default value with reason "error" and code "invalid context", but my hunch is that the varying ways of binding with the Rust evaluator might mean that this isn't straightforward in other languages.
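
To make the contrast concrete, here is a small sketch of the two behaviors described in the comment above, written in Python for both. It is not code from either SDK: the function names and the result dict shape are hypothetical, and how Go treats list values is not stated here, so the flattening helper leaves them untouched.

```python
from typing import Any, Optional


def reject_nested(attributes: dict[str, Any], default: Any) -> Optional[dict[str, Any]]:
    """Python-style handling: nested dict/list attributes short-circuit evaluation.

    Returns an error result carrying the default value, or None when the
    context is flat and evaluation can continue.
    """
    if any(isinstance(value, (dict, list)) for value in attributes.values()):
        return {"value": default, "reason": "error", "error_code": "invalid_context"}
    return None


def flatten(attributes: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Go-style handling: nested objects are flattened to dot notation."""
    flat: dict[str, Any] = {}
    for key, value in attributes.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            # Assumption: non-dict values (including lists) pass through as-is.
            flat[path] = value
    return flat
```

With a context such as `{"user": {"plan": "pro"}}`, the first helper yields an `invalid_context` error result while the second produces `{"user.plan": "pro"}`, which is why a single shared assertion does not fit both tracers.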

Add comprehensive system tests for FFE (Feature Flagging and Experimentation) flag evaluation metrics. These tests verify that tracers emit correct feature_flag.evaluations OTel metrics with proper tags for:
- Basic flag evaluation (flag key, variant, reason, allocation_key)
- Multiple evaluations (correct count aggregation)
- Different flags (separate metric series)
- All resolution reasons (static, targeting_match, split, default, disabled)
- Error codes (flag_not_found, type_mismatch, parse_error, provider_not_ready)
- Lowercase consistency for tag values

Also adds the feature_flags_eval_metrics feature declaration for tracer compatibility tracking.
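
The count-aggregation and separate-series expectations can be pictured with a small sketch that mimics how a counter groups increments by tag set. The helper and the tag names used with it are hypothetical and only loosely follow the tags listed in this commit message; this is not how the tests read metrics from the agent.

```python
from collections import Counter


def aggregate_evaluations(evaluations: list[dict[str, str]]) -> Counter:
    """Count evaluations per unique tag set, like a counter metric's series."""
    series: Counter = Counter()
    for tags in evaluations:
        # Sort items so that dicts with the same tags map to the same key.
        series[tuple(sorted(tags.items()))] += 1
    return series
```

Three evaluations of the same flag with identical tags should land in one series with a count of 3, while evaluating a second flag should create a second series rather than adding to the first.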

sameerank mentioned this pull request Mar 19, 2026

feat(openfeature): add flag evaluation metrics DataDog/dd-trace-py#17029

Open

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 4f56c52 to 9e9d075 Compare March 20, 2026 17:29

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@sameerank/FFL-1942/add-flag-eval-metrics]~~ feat(python): enable flag evaluation metrics tests [python@b29232996651286e7f0a8d860a11bfd0d96b3182] Mar 20, 2026

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@b29232996651286e7f0a8d860a11bfd0d96b3182]~~ feat(python): enable flag evaluation metrics tests [python@70eb5ba16394d6bf2697af7d10f71e7438b58b76] Mar 20, 2026

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch 5 times, most recently from 29265cb to ca9bf94 Compare March 21, 2026 02:22

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@70eb5ba16394d6bf2697af7d10f71e7438b58b76]~~ feat(python): enable flag evaluation metrics tests [python@b8c9f8e1f3] Mar 21, 2026

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@b8c9f8e1f3]~~ feat(python): enable flag evaluation metrics tests [python@b8c9f8e1f3aa961291f434f0c564f517c5d2c523] Mar 21, 2026

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@b8c9f8e1f3aa961291f434f0c564f517c5d2c523]~~ feat(python): enable flag evaluation metrics tests [python@5f47db9810450484a2e72d4831756368354fcba2] Mar 22, 2026

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@5f47db9810450484a2e72d4831756368354fcba2]~~ feat(python): enable flag evaluation metrics tests [python@e32fb7357d52c151817e1520b02d215ea98ad155] Mar 22, 2026

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@e32fb7357d52c151817e1520b02d215ea98ad155]~~ feat(python): enable flag evaluation metrics tests [python@2913cbefc109609c8bea2ea443333eacd9a4a02a] Mar 22, 2026

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@2913cbefc109609c8bea2ea443333eacd9a4a02a]~~ feat(python): enable flag evaluation metrics tests [python@746cff92e45a37eba94e3d1e6658ceafe5da5fe9] Mar 23, 2026

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@746cff92e45a37eba94e3d1e6658ceafe5da5fe9]~~ feat(python): enable flag evaluation metrics tests [python@810d4c88ae] Mar 23, 2026

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 64c1562 to bc391b7 Compare March 23, 2026 07:42

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@810d4c88ae]~~ feat(python): enable flag evaluation metrics tests [python@74f4110a68] Mar 23, 2026

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from bc391b7 to 8a1102e Compare March 23, 2026 07:55

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@74f4110a68]~~ feat(python): enable flag evaluation metrics tests [python@6dd59ec52a] Mar 23, 2026

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@6dd59ec52a]~~ feat(python): enable flag evaluation metrics tests [python@6dd59ec52a859aeb67a4a314d232bfa48e68ddac] Mar 23, 2026

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch 2 times, most recently from 6971663 to 9240286 Compare March 23, 2026 08:30

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@6dd59ec52a859aeb67a4a314d232bfa48e68ddac]~~ feat(python): enable flag evaluation metrics tests [python@0ce14d8add] Mar 24, 2026

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 11b5f02 to b825fd9 Compare March 24, 2026 01:00

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@0ce14d8add]~~ feat(python): enable flag evaluation metrics tests [python@b825fd92cb48613cb37fd5170f1d4833b45f936a] Mar 24, 2026

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from b825fd9 to 290334b Compare March 24, 2026 01:11

sameerank changed the title ~~feat(python): enable flag evaluation metrics tests [python@b825fd92cb48613cb37fd5170f1d4833b45f936a]~~ feat(python): enable flag evaluation metrics tests [python@0ce14d8addd8077787d4cd65d3a437af09530a41] Mar 24, 2026

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch 5 times, most recently from 4307145 to ffcec93 Compare March 24, 2026 07:17

sameerank added 3 commits March 24, 2026 08:21

feat(python): enable flag evaluation metrics tests

36bf7e3

chore(python): add OTel OTLP metrics exporter for FFE metrics

1ae128f

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from ffcec93 to adc1a8d Compare March 24, 2026 15:21

fix(ffe): use NUMERIC for float types in UFC fixture

96216e9

This was referenced Mar 24, 2026

fix(openfeature): improve FFE eval metrics cross-tracer consistency DataDog/dd-trace-go#4590

Open

[FFL-1972] fix(ffe): enable No_Config_Loaded test for Go [golang@sameerank/FFL-1972/fix-flag-evaluation-metrics-consistency-issues] #6577

Closed

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from d56dfe5 to 96216e9 Compare March 24, 2026 16:58

fix(ffe): rename Parse_Error test to Numeric_To_Integer and expect ty…

04f2264

…pe_mismatch NUMERIC and INTEGER are distinct types; evaluating a NUMERIC flag as INTEGER should return type_mismatch (not parse_error) to align with libdatadog FFE.

fix(ffe): enable No_Config_Loaded test for Go

5846f99

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 88fc527 to 5846f99 Compare March 24, 2026 18:23

test(ffe): add parse_error test for invalid regex

5bf6fb3

sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 6542ece to 5bf6fb3 Compare March 24, 2026 20:07

sameerank wants to merge 7 commits into main from sameerank/FFL-1942/add-flag-eval-metrics

sameerank commented Mar 19, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

sameerank Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sameerank commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Changes

Test Infrastructure

Test Coverage

Cross-SDK Consistency

Reviewer checklist

Uh oh!

github-actions bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

sameerank Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sameerank commented Mar 19, 2026 •

edited

Loading

github-actions bot commented Mar 19, 2026 •

edited

Loading