Pipeline/group/engine policy precedence: prevent against misuse of unresolved policies#2392
Conversation
…config When a pipeline defines a policies block with only channel_capacity (and not telemetry), the telemetry field was deserialized with its default value (channel_metrics: Basic), silently overriding a top-level channel_metrics: detailed setting. This prevented node consumer/producer metrics (requiring Normal or higher) and duration metrics (requiring Detailed) from being reported. Fix by making the telemetry field in Policies and EngineObservabilityPolicies an Option<TelemetryPolicy>, matching the existing pattern used for the resources field. When absent, the parent scope's telemetry policy is used instead of the built-in default. Adds effective_telemetry() helper method on Policies for consistent access. Adds a focused regression test for the inheritance behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pacity and health config Apply the same Option pattern previously used for resources (PR open-telemetry#2154) and telemetry to the channel_capacity and health fields in Policies and EngineObservabilityPolicies. When a pipeline defines a policies: block that only sets (e.g.) telemetry, channel_capacity and health no longer silently deserialize as their defaults and shadow a top-level override. Instead, serde leaves them as None and the resolve functions fall through to the parent scope. Changes: - Make channel_capacity and health fields Option in Policies and EngineObservabilityPolicies - Update resolve_channel_capacity_policy and resolve_health_policy to use .and_then() instead of .map(), with .unwrap_or_default() fallback - Same for resolve_observability_channel_capacity_policy and resolve_observability_health_policy - Add effective_channel_capacity() and effective_health() helpers on Policies - Update controller consumers to use .unwrap_or_default() - Add regression test resolve_channel_capacity_and_health_inherit_from_parent_scope Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fields are now pub(crate) Option<T> — external callers use accessor methods (channel_capacity(), health(), telemetry(), resources()) which return Cow with built-in default fallback. - Add set_resources() for CLI override mutation. - Add Policies::resolve(scopes) which iterates policy scopes in precedence order (most specific first), picking the first Some value per field. This replaces seven individual resolve_*_policy() functions with a single generic loop. - Rename effective_*() methods to short names matching the fields. - Net reduction of ~215 lines. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add ResolvedPolicies struct with public non-optional fields for use after policy resolution. Policies::resolve() now returns ResolvedPolicies, and ResolvedPipelineConfig.policies uses it. - Consumers access resolved fields directly (no Cow, no into_owned). - Clean up EngineObservabilityPolicies::into_policies() — pure field mapping with resources: None. The 'observability uses default resources' policy is now explicit at the resolve call site. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the four Cow-returning accessor methods (channel_capacity(), health(), telemetry(), resources()) with Policies::resolve() at all call sites. The accessors had misleading doc comments and duplicated logic now handled by ResolvedPolicies. - validation_errors() now only validates explicitly configured fields - Policy.rs tests use Policies::resolve() instead of accessors - Move Policies import to test module in main.rs (unused in non-test) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2392 +/- ##
========================================
Coverage 88.06% 88.06%
========================================
Files 589 589
Lines 205392 205512 +120
========================================
+ Hits 180878 180989 +111
- Misses 23988 23997 +9
Partials 526 526
🚀 New features to boost your workflow:
|
| } | ||
|
|
||
| #[test] | ||
| fn resolve_policies_inherit_from_parent_scope() { |
There was a problem hiding this comment.
are there any tests for when inheritance comes partially from engine level and partially from group level? if not, it is worth adding.
|
If applicable, there should be a test where partial configration from different tiers are taken in a way that it validates the fact that they are merged at the lowest level. And there should also be a test that validates both they are merged and the inner value wins and overrides an outer value. |
lquerel
left a comment
There was a problem hiding this comment.
Thanks for this fix. I hadn't realized the negative interaction between the default and the override.
…nto jmacd/policy_default
Add a test covering the case where a pipeline inherits some policies from the group level and others fall through to the engine (top) level. This exercises the full three-level precedence chain (pipeline → group → engine) with partial coverage at each level. Addresses review comment: open-telemetry#2392 (comment) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change Summary
Like #2154 but for the other three policy fields. Make all fields Option types. Adds a ResolvedPolicies type which strips the Options after resolving. There was existing resolve code, but it was not used consistently: this was observed for the
telemetrypolicy.What issue does this PR close?
Fixes #2389.
How are these changes tested?
One new test. The
configs/internal-telemetry.yamlconfiguration is modified to show the problem. Before the fix, no duration metrics. After the fix, duration metrics, as set by the top-level policy.