Skip to content

Pipeline/group/engine policy precedence: prevent against misuse of unresolved policies#2392

Merged
jmacd merged 10 commits intoopen-telemetry:mainfrom
jmacd:jmacd/policy_default
Mar 25, 2026
Merged

Pipeline/group/engine policy precedence: prevent against misuse of unresolved policies#2392
jmacd merged 10 commits intoopen-telemetry:mainfrom
jmacd:jmacd/policy_default

Conversation

@jmacd
Copy link
Copy Markdown
Contributor

@jmacd jmacd commented Mar 20, 2026

Change Summary

Like #2154 but for the other three policy fields. Make all fields Option types. Adds a ResolvedPolicies type which strips the Options after resolving. There was existing resolve code, but it was not used consistently: this was observed for the telemetry policy.

What issue does this PR close?

Fixes #2389.

How are these changes tested?

One new test. The configs/internal-telemetry.yaml configuration is modified to show the problem. Before the fix, no duration metrics. After the fix, duration metrics, as set by the top-level policy.

jmacd and others added 7 commits March 20, 2026 08:58
…config

When a pipeline defines a policies block with only channel_capacity (and not
telemetry), the telemetry field was deserialized with its default value
(channel_metrics: Basic), silently overriding a top-level channel_metrics:
detailed setting. This prevented node consumer/producer metrics (requiring
Normal or higher) and duration metrics (requiring Detailed) from being
reported.

Fix by making the telemetry field in Policies and EngineObservabilityPolicies
an Option<TelemetryPolicy>, matching the existing pattern used for the
resources field. When absent, the parent scope's telemetry policy is used
instead of the built-in default.

Adds effective_telemetry() helper method on Policies for consistent access.
Adds a focused regression test for the inheritance behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pacity and health config

Apply the same Option pattern previously used for resources (PR open-telemetry#2154) and
telemetry to the channel_capacity and health fields in Policies and
EngineObservabilityPolicies.

When a pipeline defines a policies: block that only sets (e.g.) telemetry,
channel_capacity and health no longer silently deserialize as their defaults
and shadow a top-level override. Instead, serde leaves them as None and the
resolve functions fall through to the parent scope.

Changes:
- Make channel_capacity and health fields Option in Policies and
  EngineObservabilityPolicies
- Update resolve_channel_capacity_policy and resolve_health_policy to use
  .and_then() instead of .map(), with .unwrap_or_default() fallback
- Same for resolve_observability_channel_capacity_policy and
  resolve_observability_health_policy
- Add effective_channel_capacity() and effective_health() helpers on Policies
- Update controller consumers to use .unwrap_or_default()
- Add regression test resolve_channel_capacity_and_health_inherit_from_parent_scope

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fields are now pub(crate) Option<T> — external callers use accessor
  methods (channel_capacity(), health(), telemetry(), resources())
  which return Cow with built-in default fallback.
- Add set_resources() for CLI override mutation.
- Add Policies::resolve(scopes) which iterates policy scopes in
  precedence order (most specific first), picking the first Some
  value per field.  This replaces seven individual resolve_*_policy()
  functions with a single generic loop.
- Rename effective_*() methods to short names matching the fields.
- Net reduction of ~215 lines.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add ResolvedPolicies struct with public non-optional fields for use
  after policy resolution.  Policies::resolve() now returns
  ResolvedPolicies, and ResolvedPipelineConfig.policies uses it.
- Consumers access resolved fields directly (no Cow, no into_owned).
- Clean up EngineObservabilityPolicies::into_policies() — pure field
  mapping with resources: None.  The 'observability uses default
  resources' policy is now explicit at the resolve call site.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the four Cow-returning accessor methods (channel_capacity(),
health(), telemetry(), resources()) with Policies::resolve() at all
call sites. The accessors had misleading doc comments and duplicated
logic now handled by ResolvedPolicies.

- validation_errors() now only validates explicitly configured fields
- Policy.rs tests use Policies::resolve() instead of accessors
- Move Policies import to test module in main.rs (unused in non-test)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jmacd jmacd requested a review from a team as a code owner March 20, 2026 20:02
@github-actions github-actions Bot added the rust Pull requests that update Rust code label Mar 20, 2026
@jmacd jmacd changed the title Pipeline/group/engine policy precedence: prevent against mis-used of unresolved policies Pipeline/group/engine policy precedence: prevent against misuse of unresolved policies Mar 20, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.06%. Comparing base (0dde9c1) to head (7ae1f34).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #2392    +/-   ##
========================================
  Coverage   88.06%   88.06%            
========================================
  Files         589      589            
  Lines      205392   205512   +120     
========================================
+ Hits       180878   180989   +111     
- Misses      23988    23997     +9     
  Partials      526      526            
Components Coverage Δ
otap-dataflow 90.11% <100.00%> (+<0.01%) ⬆️
query_abstraction 80.61% <ø> (ø)
query_engine 90.61% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 52.44% <ø> (ø)
quiver 91.91% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

}

#[test]
fn resolve_policies_inherit_from_parent_scope() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any tests for when inheritance comes partially from engine level and partially from group level? if not, it is worth adding.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@gouslu
Copy link
Copy Markdown
Contributor

gouslu commented Mar 23, 2026

If applicable, there should be a test where partial configration from different tiers are taken in a way that it validates the fact that they are merged at the lowest level. And there should also be a test that validates both they are merged and the inner value wins and overrides an outer value.

Copy link
Copy Markdown
Contributor

@lquerel lquerel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix. I hadn't realized the negative interaction between the default and the override.

jmacd and others added 2 commits March 24, 2026 22:54
Add a test covering the case where a pipeline inherits some policies
from the group level and others fall through to the engine (top) level.
This exercises the full three-level precedence chain (pipeline → group
→ engine) with partial coverage at each level.

Addresses review comment:
open-telemetry#2392 (comment)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jmacd jmacd enabled auto-merge March 25, 2026 00:39
@jmacd jmacd added this pull request to the merge queue Mar 25, 2026
Merged via the queue into open-telemetry:main with commit 2abee0d Mar 25, 2026
69 checks passed
@jmacd jmacd deleted the jmacd/policy_default branch March 25, 2026 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Pipeline-level policy overrides engine-level policy

3 participants