[FFL-1942] feat(ffe): add eval metrics tests [python@0ce14d8addd8077787d4cd65d3a437af09530a41] [golang@sameerank/FFL-1972/fix-flag-evaluation-metrics-consistency-issues]#6545

Draft
sameerank wants to merge 7 commits into main from sameerank/FFL-1942/add-flag-eval-metrics

@sameerank sameerank commented Mar 19, 2026

Motivation

Add FFE (Feature Flagging & Experimentation) evaluation metrics tests to verify the feature_flag.evaluations OTel counter metric works correctly across SDKs.

Related:

Changes

Test Infrastructure

  • Enable tests/ffe/test_flag_eval_metrics.py for Python and Go in manifest
  • Add OTEL_EXPORTER_OTLP_METRICS_PROTOCOL: "http/protobuf" to FFE scenario config
  • Add opentelemetry-exporter-otlp-proto-http==1.40.0 to Python weblogs

Test Coverage

Tests for OpenFeature evaluation reasons:

  • STATIC - catch-all allocation with no rules/shards
  • TARGETING_MATCH - rules match the context
  • SPLIT - shards determine variant
  • DEFAULT - rules don't match, fallback used
  • DISABLED - flag is disabled
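
As a rough illustration of how the five reasons above relate to flag configuration, here is a minimal sketch. The `Allocation` class and `resolve_reason` function are hypothetical and are not taken from any SDK; in particular, giving shards precedence over rules is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """Hypothetical, simplified allocation: targeting rules plus traffic shards."""
    key: str
    rules: list = field(default_factory=list)   # predicates over the context
    shards: list = field(default_factory=list)  # non-empty means traffic is split

    def matches(self, context: dict) -> bool:
        # An allocation with no rules matches every context.
        return all(rule(context) for rule in self.rules)


def resolve_reason(enabled: bool, allocations: list, context: dict) -> str:
    """Return the lowercase reason tag for one evaluation (illustrative only)."""
    if not enabled:
        return "disabled"
    for allocation in allocations:
        if not allocation.matches(context):
            continue
        if allocation.shards:
            return "split"            # assumption: shards take precedence over rules
        if allocation.rules:
            return "targeting_match"
        return "static"               # catch-all: no rules, no shards
    return "default"                  # nothing matched, so the fallback value is used
```

A flag whose only allocation has neither rules nor shards resolves to "static", while a flag whose rules all fail for the given context falls through to "default".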

Tests for OpenFeature error codes:

  • FLAG_NOT_FOUND - config exists but flag missing
  • TYPE_MISMATCH - STRING→BOOLEAN, NUMERIC→INTEGER conversions
  • PARSE_ERROR - invalid regex pattern (Python only; Go validates at config load)
  • PROVIDER_NOT_READY - no config loaded
  • INVALID_CONTEXT - nested attributes (Python only)
  • TARGETING_KEY_MISSING - verifies it's NOT returned (JS excluded)
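
The difference between PROVIDER_NOT_READY and FLAG_NOT_FOUND comes down to whether any configuration has been loaded. A minimal sketch of that ordering follows; the config shape (a dict with a "flags" key) and the function name are assumptions for illustration, not the SDKs' real data model.

```python
def resolve_error_code(config, flag_key):
    """Return a lowercase error code, or None when the flag can be evaluated.

    `config` is a hypothetical dict of the form {"flags": {...}},
    or None when no configuration has been received yet.
    """
    if config is None:
        return "provider_not_ready"   # no config loaded at all
    if flag_key not in config.get("flags", {}):
        return "flag_not_found"       # config exists, but this flag is missing
    return None
```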

Cross-SDK Consistency

  • Lowercase reason/error values per OpenFeature telemetry conventions
  • SDK-specific @irrelevant decorators where behavior intentionally differs
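
The lowercase requirement can be checked with a small helper along these lines. This is a sketch only: the helper name and the tag keys passed to it are hypothetical, and the actual assertions in tests/ffe/test_flag_eval_metrics.py may look different.

```python
def non_lowercase_values(tags: dict, keys: tuple) -> dict:
    """Return the subset of `tags` (restricted to `keys`) whose string values
    are not entirely lowercase. An empty result means the tags are consistent."""
    return {
        key: value
        for key, value in tags.items()
        if key in keys and isinstance(value, str) and value != value.lower()
    }
```

Only the keys named by the caller are checked, so values that are legitimately mixed-case (a flag key, for example) are not reported.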

Reviewer checklist

  • Anything but tests/ or manifests/ is modified? I have the approval from R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present

github-actions bot commented Mar 19, 2026

CODEOWNERS have been resolved as:

manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
tests/ffe/test_flag_eval_metrics.py                                     @DataDog/feature-flagging-and-experimentation-sdk @DataDog/system-tests-core
utils/_context/_scenarios/__init__.py                                   @DataDog/system-tests-core
utils/_features.py                                                      @DataDog/system-tests-core
utils/build/docker/python/django-poc.Dockerfile                         @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/django-py3.13.Dockerfile                      @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/fastapi.Dockerfile                            @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/flask-poc.Dockerfile                          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/python3.12.Dockerfile                         @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/tornado.Dockerfile                            @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/uds-flask.Dockerfile                          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/python/uwsgi-poc.Dockerfile                          @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core

@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 4f56c52 to 9e9d075 Compare March 20, 2026 17:29
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@sameerank/FFL-1942/add-flag-eval-metrics] feat(python): enable flag evaluation metrics tests [python@b29232996651286e7f0a8d860a11bfd0d96b3182] Mar 20, 2026
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@b29232996651286e7f0a8d860a11bfd0d96b3182] feat(python): enable flag evaluation metrics tests [python@70eb5ba16394d6bf2697af7d10f71e7438b58b76] Mar 20, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch 5 times, most recently from 29265cb to ca9bf94 Compare March 21, 2026 02:22
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@70eb5ba16394d6bf2697af7d10f71e7438b58b76] feat(python): enable flag evaluation metrics tests [python@b8c9f8e1f3] Mar 21, 2026
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@b8c9f8e1f3] feat(python): enable flag evaluation metrics tests [python@b8c9f8e1f3aa961291f434f0c564f517c5d2c523] Mar 21, 2026
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@b8c9f8e1f3aa961291f434f0c564f517c5d2c523] feat(python): enable flag evaluation metrics tests [python@5f47db9810450484a2e72d4831756368354fcba2] Mar 22, 2026
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@5f47db9810450484a2e72d4831756368354fcba2] feat(python): enable flag evaluation metrics tests [python@e32fb7357d52c151817e1520b02d215ea98ad155] Mar 22, 2026
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@e32fb7357d52c151817e1520b02d215ea98ad155] feat(python): enable flag evaluation metrics tests [python@2913cbefc109609c8bea2ea443333eacd9a4a02a] Mar 22, 2026
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@2913cbefc109609c8bea2ea443333eacd9a4a02a] feat(python): enable flag evaluation metrics tests [python@746cff92e45a37eba94e3d1e6658ceafe5da5fe9] Mar 23, 2026
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@746cff92e45a37eba94e3d1e6658ceafe5da5fe9] feat(python): enable flag evaluation metrics tests [python@810d4c88ae] Mar 23, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 64c1562 to bc391b7 Compare March 23, 2026 07:42
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@810d4c88ae] feat(python): enable flag evaluation metrics tests [python@74f4110a68] Mar 23, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from bc391b7 to 8a1102e Compare March 23, 2026 07:55
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@74f4110a68] feat(python): enable flag evaluation metrics tests [python@6dd59ec52a] Mar 23, 2026
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@6dd59ec52a] feat(python): enable flag evaluation metrics tests [python@6dd59ec52a859aeb67a4a314d232bfa48e68ddac] Mar 23, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch 2 times, most recently from 6971663 to 9240286 Compare March 23, 2026 08:30
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@6dd59ec52a859aeb67a4a314d232bfa48e68ddac] feat(python): enable flag evaluation metrics tests [python@0ce14d8add] Mar 24, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 11b5f02 to b825fd9 Compare March 24, 2026 01:00
Comment on lines +542 to +548
# INVALID_CONTEXT behavioral differences:
# - Python: Returns for nested dict/list attributes (PyO3 conversion failure)
# - Go: Flattens nested objects to dot notation instead
# - Ruby: Silently skips unsupported attribute types
# - Java: Returns only for null context, not nested attributes
# - .NET: Relies on native library; not yet standardized
# - JS: Does not use INVALID_CONTEXT at all

I am unclear on how to reconcile the variety of ways that the SDKs handle invalid evaluation contexts. For now I'm just noting it down in the system tests code, and hopefully we can chip away at the @irrelevant decorators as we keep working on the SDKs.

I like the Python approach of returning the default value with reason "error" and code "invalid context", but my hunch is that the varying ways of binding with the Rust evaluator might mean that this isn't straightforward in other languages.
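
To make two of the strategies from the code comment concrete, here is a minimal sketch contrasting "reject nested attributes" (the Python behavior described above) with "flatten to dot notation" (the Go behavior described above). Both functions are hypothetical illustrations, not SDK code. How Go treats lists is not stated here, so the flattening sketch leaves lists untouched.

```python
def find_nested_attributes(context: dict) -> list:
    """Return the top-level keys whose values are dicts or lists.

    A non-empty result is where a reject-style SDK would return the
    default value with reason "error" and an invalid-context error code.
    """
    return sorted(k for k, v in context.items() if isinstance(v, (dict, list)))


def flatten_context(context: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dot-notation keys; other values are kept as-is."""
    flat = {}
    for key, value in context.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_context(value, path))
        else:
            flat[path] = value
    return flat
```

With the context {"user": {"plan": "pro"}}, the first approach reports "user" as invalid, while the second evaluates against {"user.plan": "pro"}. That is why the same input can yield different metric tags across SDKs.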

@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@0ce14d8add] feat(python): enable flag evaluation metrics tests [python@b825fd92cb48613cb37fd5170f1d4833b45f936a] Mar 24, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from b825fd9 to 290334b Compare March 24, 2026 01:11
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@b825fd92cb48613cb37fd5170f1d4833b45f936a] feat(python): enable flag evaluation metrics tests [python@0ce14d8addd8077787d4cd65d3a437af09530a41] Mar 24, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch 5 times, most recently from 4307145 to ffcec93 Compare March 24, 2026 07:17
Add comprehensive system tests for FFE (Feature Flagging and Experimentation)
flag evaluation metrics. These tests verify that tracers emit correct
feature_flag.evaluations OTel metrics with proper tags for:

- Basic flag evaluation (flag key, variant, reason, allocation_key)
- Multiple evaluations (correct count aggregation)
- Different flags (separate metric series)
- All resolution reasons (static, targeting_match, split, default, disabled)
- Error codes (flag_not_found, type_mismatch, parse_error, provider_not_ready)
- Lowercase consistency for tag values

Also adds the feature_flags_eval_metrics feature declaration for tracer
compatibility tracking.
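
The "count aggregation" and "separate metric series" expectations can be pictured as a counter keyed by the full tag set. The sketch below uses only the standard library and hypothetical tag names; it models the expected behavior and is not how any tracer implements the OTel counter.

```python
from collections import Counter


def aggregate_evaluations(evaluations):
    """Count evaluations per unique tag set.

    Each evaluation is a dict of tags. Identical tag sets add to the same
    series; any differing tag value starts a new series.
    """
    counts = Counter()
    for tags in evaluations:
        series_key = tuple(sorted(tags.items()))  # order-independent, hashable key
        counts[series_key] += 1
    return counts
```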
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from ffcec93 to adc1a8d Compare March 24, 2026 15:21
…pe_mismatch

NUMERIC and INTEGER are distinct types; evaluating a NUMERIC flag as INTEGER
should return type_mismatch (not parse_error) to align with libdatadog FFE.
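
A minimal sketch of the rule described in this commit message, assuming the type names are compared as plain strings. The function name and return shape are hypothetical, not the libdatadog API.

```python
def evaluate_typed(variation_type: str, value, requested_type: str, default):
    """Return (value, error_code) for a typed flag evaluation.

    NUMERIC and INTEGER are treated as distinct types, so requesting one
    when the flag is declared as the other yields the default value and
    "type_mismatch" rather than attempting a conversion.
    """
    if variation_type != requested_type:
        return default, "type_mismatch"
    return value, None
```

Under this rule a NUMERIC flag holding 3.5 that is evaluated as INTEGER returns the caller's default with "type_mismatch", and no parsing is attempted.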
@sameerank sameerank changed the title feat(python): enable flag evaluation metrics tests [python@0ce14d8addd8077787d4cd65d3a437af09530a41] [FFL-1942] feat(ffe): add eval metrics tests [python@0ce14d8addd8077787d4cd65d3a437af09530a41] [golang@sameerank/FFL-1972/fix-flag-evaluation-metrics-consistency-issues] Mar 24, 2026
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 88fc527 to 5846f99 Compare March 24, 2026 18:23
@sameerank sameerank force-pushed the sameerank/FFL-1942/add-flag-eval-metrics branch from 6542ece to 5bf6fb3 Compare March 24, 2026 20:07