instrumentation telemetry: validate session id headers#6510

Draft
mabdinur wants to merge 21 commits into main from munir/test-stable-headers

@mabdinur
Contributor

@mabdinur mabdinur commented Mar 16, 2026

Motivation

Enable telemetry session ID header tests (DD-Session-ID, DD-Root-Session-ID, DD-Parent-Session-ID) across process forks per the Stable Service Instance Identifier RFC.

Changes

  • GET /spawn_child – New endpoint in weblogs (Python flask, Node.js express4, Ruby rails72, PHP, Go net-http, Java spring-boot, .NET poc). Params: sleep, crash, fork. Uses fork when supported, exec otherwise. Runtimes without fork (Java, Go, PHP, .NET) return 400 for fork=true.
  • Tests – test_session_id_headers_across_forks and test_session_id_headers_across_spawned validate the session ID headers in lifecycle telemetry. They use get_lifecycle_events() to skip the metrics and log events sent by libdatadog. Asserts: DD-Session-ID = runtime_id, one root per app instance, at least two runtimes (parent + child).
  • Library interface – get_lifecycle_events() added to filter lifecycle events.
  • Docs – Endpoint spec in docs/understand/weblogs/end-to-end_weblog.md.
  • Manifests – Enabled for Ruby rails72; missing_feature for other weblogs and non-fork runtimes.
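The parameter handling described for GET /spawn_child can be sketched as a pure decision function, separate from the actual process launch. This is a minimal Python sketch, not any weblog's real code: the name plan_spawn_child, the SpawnPlan type, the "true"/"1" flag convention, the default sleep of 0, treating fork=false as "use exec", and the 200/400 statuses for the success and malformed-input cases are all assumptions. Only the rule that runtimes without fork return 400 for fork=true comes from the description.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class SpawnPlan:
    status: int          # HTTP status the endpoint would return
    mode: Optional[str]  # "fork", "exec", or None when the request is rejected
    sleep_s: float       # how long the child should stay alive
    crash: bool          # whether the child should exit abnormally


def _flag(params: Mapping[str, str], name: str) -> bool:
    # Assumed convention: "true" or "1" (case-insensitive) turns a flag on.
    return params.get(name, "false").strip().lower() in ("true", "1")


def plan_spawn_child(params: Mapping[str, str], fork_supported: bool) -> SpawnPlan:
    """Decide how a hypothetical GET /spawn_child handler launches its child."""
    try:
        sleep_s = float(params.get("sleep", "0"))
    except ValueError:
        # Malformed sleep value: reject rather than guess.
        return SpawnPlan(400, None, 0.0, False)
    crash = _flag(params, "crash")
    if _flag(params, "fork"):
        if not fork_supported:
            # Runtimes without fork (Java, Go, PHP, .NET) reject fork=true.
            return SpawnPlan(400, None, sleep_s, crash)
        return SpawnPlan(200, "fork", sleep_s, crash)
    # Otherwise the child is a fresh process started via exec/spawn.
    return SpawnPlan(200, "exec", sleep_s, crash)
```

Keeping the decision separate from the launch makes the 400 path easy to test without starting any process.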

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? → Get a review from RFC owner.
    • Framework is modified, or non-obvious usage of it → Get a review from R&P team

🚀 Once your PR is reviewed and the CI is green, you can merge it!

🛟 #apm-shared-testing 🛟

SDK Implementations

Nodejs: DataDog/dd-trace-js#7821
Go: DataDog/dd-trace-go#4574
Java: DataDog/dd-trace-java#10914

@khanayan123
Contributor

khanayan123 commented Mar 17, 2026

As per https://dd.slack.com/archives/D032MDTSCR1/p1773765779731369

We need to assert:

  1. Headers present and valid for every telemetry event: DD-Session-ID is always present, and DD-Root-Session-ID is present when a child process is forked/spawned.
  2. Root stability across fork: DD-Session-ID regenerates per process, while DD-Root-Session-ID is inherited and never changes.
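Both requirements can be expressed as one check over the captured lifecycle requests. A minimal Python sketch, assuming each captured request is a dict with hypothetical request_type, runtime_id and headers keys (header names already normalized), and assuming an illustrative set of lifecycle request types; the real get_lifecycle_events() may select differently. Treating an absent DD-Root-Session-ID as "root = self" means a child that omits the header shows up as a second root and fails the check.

```python
from typing import Iterable, List

# Assumption: the request types treated as "lifecycle" here are illustrative;
# the real get_lifecycle_events() may select a different set.
LIFECYCLE_TYPES = {"app-started", "app-heartbeat", "app-extended-heartbeat", "app-closing"}


def lifecycle_events(events: Iterable[dict]) -> List[dict]:
    """Keep only lifecycle requests, dropping e.g. metrics and logs payloads."""
    return [e for e in events if e["request_type"] in LIFECYCLE_TYPES]


def check_session_headers(events: Iterable[dict]) -> str:
    """Check the session-ID invariants and return the shared root session ID."""
    roots, runtimes = set(), set()
    for e in lifecycle_events(events):
        headers = e["headers"]
        session_id = headers.get("DD-Session-ID")
        # Requirement 1: DD-Session-ID on every event, equal to the runtime_id.
        assert session_id is not None, f"missing DD-Session-ID on {e['request_type']}"
        assert session_id == e["runtime_id"], "DD-Session-ID != runtime_id"
        runtimes.add(session_id)
        # An absent DD-Root-Session-ID means "this process is its own root".
        roots.add(headers.get("DD-Root-Session-ID", session_id))
    # Requirement 2: a single root shared by every process...
    assert len(roots) == 1, f"expected one root session, got {sorted(roots)}"
    (root,) = roots
    # ...which is itself one of the runtimes that reported telemetry.
    assert root in runtimes, "root session ID does not match any runtime"
    # Session IDs regenerate per process: parent and child must differ.
    assert len(runtimes) >= 2, "expected at least two runtimes (parent + child)"
    return root
```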

Co-authored-by: Munir Abdinur <munir.abdinur@datadoghq.com>
@khanayan123
Contributor

khanayan123 commented Mar 18, 2026

I believe the remaining gaps are:

Gap 1: The test should assert that every event where DD-Session-ID != root_session_id (i.e. every child event) carries DD-Root-Session-ID, not just that at least one event has it globally.

Gap 2 (exec vs fork): The same validation function is used for both test cases, so exec propagation via env vars isn't tested distinctly.
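Gap 1 asks for a per-event assertion rather than a global one. A minimal sketch, with a hypothetical helper name and event shape (a dict holding a headers mapping), assuming the root session ID is known up front, for instance as the parent weblog's runtime_id:

```python
from typing import Iterable


def check_child_events_carry_root(events: Iterable[dict], root_session_id: str) -> None:
    """Per-event check: every non-root event must name the root explicitly."""
    for e in events:
        headers = e["headers"]
        session_id = headers["DD-Session-ID"]
        if session_id == root_session_id:
            # The root process may omit DD-Root-Session-ID (root = self).
            continue
        # Any other session is a child and must carry the root's ID.
        assert headers.get("DD-Root-Session-ID") == root_session_id, (
            f"child session {session_id} has DD-Root-Session-ID="
            f"{headers.get('DD-Root-Session-ID')!r}, expected {root_session_id!r}"
        )
```

Failing on the first offending event gives a message that names the child session, which a global "at least one event has the header" check cannot do.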

@mabdinur
Contributor Author

Both cases should be covered by the current tests. We can discuss them in our next sync.

@github-actions
Contributor

github-actions bot commented Mar 18, 2026

CODEOWNERS have been resolved as:

utils/build/docker/dotnet/weblog/Endpoints/SpawnChildEndpoint.cs        @DataDog/apm-dotnet @DataDog/asm-dotnet @DataDog/system-tests-core
utils/build/docker/golang/app/_shared/common/spawn_child.go             @DataDog/dd-trace-go-guild @DataDog/system-tests-core
utils/build/docker/nodejs/express/fork_child.js                         @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/php/common/spawn_child.php                           @DataDog/apm-php @DataDog/system-tests-core
docs/understand/weblogs/end-to-end_weblog.md                            @DataDog/system-tests-core
manifests/cpp.yml                                                       @DataDog/dd-trace-cpp
manifests/cpp_httpd.yml                                                 @DataDog/dd-trace-cpp
manifests/cpp_kong.yml                                                  @DataDog/system-tests-core
manifests/cpp_nginx.yml                                                 @DataDog/dd-trace-cpp
manifests/dotnet.yml                                                    @DataDog/apm-dotnet @DataDog/asm-dotnet
manifests/golang.yml                                                    @DataDog/dd-trace-go-guild
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java
manifests/nodejs.yml                                                    @DataDog/dd-trace-js
manifests/php.yml                                                       @DataDog/apm-php @DataDog/asm-php
manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
manifests/ruby.yml                                                      @DataDog/ruby-guild @DataDog/asm-ruby
tests/test_telemetry.py                                                 @DataDog/libdatadog-telemetry @DataDog/apm-sdk-capabilities @DataDog/system-tests-core
utils/build/docker/dotnet/weblog/Program.cs                             @DataDog/apm-dotnet @DataDog/asm-dotnet @DataDog/system-tests-core
utils/build/docker/golang/app/net-http/main.go                          @DataDog/dd-trace-go-guild @DataDog/system-tests-core
utils/build/docker/java/spring-boot/src/main/java/com/datadoghq/system_tests/springboot/App.java  @DataDog/apm-java @DataDog/asm-java @DataDog/system-tests-core
utils/build/docker/nodejs/express/app.js                                @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/nodejs/install_ddtrace.sh                            @DataDog/dd-trace-js @DataDog/system-tests-core
utils/build/docker/php/apache-mod/php.conf                              @DataDog/apm-php @DataDog/system-tests-core
utils/build/docker/php/php-fpm/php-fpm.conf                             @DataDog/apm-php @DataDog/system-tests-core
utils/build/docker/python/flask/app.py                                  @DataDog/apm-python @DataDog/asm-python @DataDog/system-tests-core
utils/build/docker/ruby/rails72/app/controllers/system_test_controller.rb  @DataDog/ruby-guild @DataDog/asm-ruby @DataDog/system-tests-core
utils/build/docker/ruby/rails72/config/routes.rb                        @DataDog/ruby-guild @DataDog/asm-ruby @DataDog/system-tests-core
utils/interfaces/_library/core.py                                       @DataDog/system-tests-core

Contributor

@khanayan123 khanayan123 left a comment

Thanks for addressing the comments; tests LGTM.

@datadog-datadog-prod-us1-2

datadog-datadog-prod-us1-2 bot commented Mar 18, 2026

⚠️ Tests

⚠️ Warnings

🧪 133 Tests failed

tests.appsec.test_span_tags_headers.Test_Headers_Event_Blocking.test_content_type_event_blocking[chi] from system_tests_suite
AssertionError: Expected content-length to be 225, got None
assert None == '225'

self = <tests.appsec.test_span_tags_headers.Test_Headers_Event_Blocking object at 0x7f3a95079ee0>

    def test_content_type_event_blocking(self):
        # Send a request that triggers a blocking security event - should have the content-type and content-length tags
        assert self.r.status_code == 403
        interfaces.library.assert_waf_attack(self.r, rule="arachni_rule")
        # content-length is optional on blocking response
...
tests.appsec.test_span_tags_headers.Test_Headers_Event_Blocking.test_content_type_event_blocking[echo] from system_tests_suite
AssertionError: Expected content-length to be 225, got None
assert None == '225'

self = <tests.appsec.test_span_tags_headers.Test_Headers_Event_Blocking object at 0x7efc90482ab0>

    def test_content_type_event_blocking(self):
        # Send a request that triggers a blocking security event - should have the content-type and content-length tags
        assert self.r.status_code == 403
        interfaces.library.assert_waf_attack(self.r, rule="arachni_rule")
        # content-length is optional on blocking response
...
tests.appsec.test_span_tags_headers.Test_Headers_Event_Blocking.test_content_type_event_blocking[gin] from system_tests_suite
AssertionError: Expected content-length to be 225, got None
assert None == '225'

self = <tests.appsec.test_span_tags_headers.Test_Headers_Event_Blocking object at 0x7fd16967aed0>

    def test_content_type_event_blocking(self):
        # Send a request that triggers a blocking security event - should have the content-type and content-length tags
        assert self.r.status_code == 403
        interfaces.library.assert_waf_attack(self.r, rule="arachni_rule")
        # content-length is optional on blocking response
...

ℹ️ Info

No other issues found

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: eab3a09

khanayan123 and others added 4 commits March 20, 2026 11:48
Add a standalone child binary that initializes dd-trace-go and emits
telemetry, replacing the previous shell-based approach. Simplify
spawn_child.go to use plain os/exec since the SDK now auto-propagates
DD_ROOT_GO_SESSION_ID. Update Dockerfile to build/copy the child binary.
Mark fork test as irrelevant for Go (no fork support).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of a separate child binary, re-exec the weblog itself with
DD_SYSTEM_TEST_CHILD_SLEEP env var. RunAsChildIfRequested() in common
handles child mode — starts tracer, sleeps, stops, exits. This avoids
a separate build target and Dockerfile changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match the SDK rename — underscore prefix signifies internal env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ifest

- SpawnChildEndpoint now re-execs the dotnet weblog (with CLR profiler)
  instead of spawning a bare shell for the exec path
- Returns 400 for fork=true since .NET doesn't support fork
- Manifest: fork test marked irrelevant, spawn test enabled for >=v3.4.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
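The re-exec approach from the second Go commit (the weblog relaunches itself, and an environment variable switches the new process into child mode) is not Go-specific. The Python sketch below mirrors it under stated assumptions: run_as_child_if_requested and spawn_self_as_child are hypothetical names modeled on RunAsChildIfRequested(), the env var name is taken as it appears in the commit message, and the sleep value is assumed to be a number of seconds.

```python
import os
import subprocess
import sys
import time
from typing import List, Mapping, Optional

# Env var name as it appears in the commit message; treat it as illustrative.
CHILD_SLEEP_ENV = "DD_SYSTEM_TEST_CHILD_SLEEP"


def run_as_child_if_requested(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True (after sleeping) when this process was started in child mode."""
    value = environ.get(CHILD_SLEEP_ENV)
    if value is None:
        return False
    # A real weblog would start the tracer here and stop it after the sleep,
    # so the child emits its own lifecycle telemetry before exiting.
    time.sleep(float(value))
    return True


def spawn_self_as_child(sleep_s: float, argv: Optional[List[str]] = None) -> subprocess.Popen:
    """Re-exec this program in child mode by adding one env var.

    The child env is a copy of os.environ, so anything the tracer exported in
    the parent (such as a root session ID variable) is inherited as well.
    """
    if argv is None:
        # Assumes the weblog was launched as `python app.py ...`.
        argv = [sys.executable, *sys.argv]
    env = dict(os.environ)
    env[CHILD_SLEEP_ENV] = str(sleep_s)
    return subprocess.Popen(argv, env=env)
```

In the entrypoint, `if run_as_child_if_requested(): sys.exit(0)` would run before the web server starts, so the child never tries to bind the weblog's port, and no separate child binary or build target is needed.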
gh-worker-dd-mergequeue-cf854d bot pushed a commit to DataDog/dd-trace-go that referenced this pull request Mar 20, 2026
## Summary

Implements the [Stable Service Instance Identifier RFC](https://docs.google.com/document/d/1ECKj9_NnwaKYtFqm3p3Rlpicx5d-OQcdj9kI2jvRqVU) for Go instrumentation telemetry.

- **`DD-Session-ID`**: always present on every telemetry request, set to the current `runtime_id`
- **`DD-Root-Session-ID`**: present only in child processes, inherited via `_DD_ROOT_GO_SESSION_ID` env var. Omitted when equal to the session ID; the backend infers root = self when absent
- **Auto-propagation**: `globalconfig.init()` sets `_DD_ROOT_GO_SESSION_ID` in `os.Environ()` so child processes spawned via `os/exec` inherit it automatically without any user-side calls
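The propagation scheme above (read the inherited variable, otherwise become the root and export it; omit the root header when it equals the session ID) can be sketched in Python for comparison. This is an analogy, not the dd-trace-go code: the function names are hypothetical, and only the variable name and the header rules come from the description.

```python
import os
from typing import Dict, MutableMapping

# Variable name from the dd-trace-go change; this Python version is only an analogy.
ROOT_ENV = "_DD_ROOT_GO_SESSION_ID"


def init_root_session(runtime_id: str, environ: MutableMapping[str, str] = os.environ) -> str:
    """Inherit the root session ID from the environment, or become the root."""
    root = environ.get(ROOT_ENV)
    if not root:
        root = runtime_id
        # Export it so any child started later (os/exec, subprocess) inherits it.
        environ[ROOT_ENV] = root
    return root


def session_headers(runtime_id: str, root_session_id: str) -> Dict[str, str]:
    """Build the telemetry session headers for this process."""
    headers = {"DD-Session-ID": runtime_id}
    # Omitted when equal to the session ID: the backend infers root = self.
    if root_session_id != runtime_id:
        headers["DD-Root-Session-ID"] = root_session_id
    return headers
```

Exporting through the process environment is what makes propagation automatic: both Go's os/exec and Python's subprocess pass the parent's environment to the child by default, so no user-side call is needed.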

## Changes

- `internal/globalconfig/globalconfig.go`: adds `rootSessionID` field, `init()` reads/sets `_DD_ROOT_GO_SESSION_ID` (internal env var, not in supported_configurations), `RootSessionID()` getter
- `internal/telemetry/internal/writer.go`: adds `DD-Session-ID` (always) and `DD-Root-Session-ID` (child processes only) to pre-baked telemetry headers
- Tests for both globalconfig (including cross-process propagation) and writer

## Related

- System-tests PR: DataDog/system-tests#6510
- Node.js PR: DataDog/dd-trace-js#7821
- dd-trace-py fork tracking: DataDog/dd-trace-py#16839
- dd-trace-py spawn tracking: DataDog/dd-trace-py#16842

Co-authored-by: ayan.khan <ayan.khan@datadoghq.com>