Findings
1. Re-enrollment decision compares incompatible URL formats and can trigger perpetual re-enrollment
Severity: High
Location:
- `internal/pkg/agent/cmd/container.go:1163-1166`
- `internal/pkg/agent/application/enroll/options.go:74-77`
- `internal/pkg/remote/client.go:64-73`
Evidence:
- The re-enrollment check calls `storedConfig.Fleet.Client.GetHosts()` and then `slices.Contains(storedFleetHosts, setupCfg.Fleet.URL)` in `container.go`.
- During enrollment, `EnrollOptions.RemoteConfig()` calls `remote.NewConfigFromURL(e.URL)`.
- `NewConfigFromURL` stores `c.Host = u.Host` and `c.Protocol = u.Scheme`, i.e., host-only storage (`fleet:8220`), while setup typically provides a full URL ((fleet/redacted) or (fleet/redacted)).
- This is a direct host-vs-full-URL string comparison, so equivalent endpoints can compare unequal.
Failure scenario (realistic):
A Kubernetes/container deployment restarts with `FLEET_URL=(fleet/redacted)`. The stored config from the previous successful enrollment contains host `fleet:8220`. `shouldFleetEnroll` returns true on every restart, so the agent re-enrolls repeatedly instead of reusing its existing enrollment.
Why it matters:
- Can cause repeated enrollment churn and unstable managed identity behavior.
- Can leave orphaned/stale agent records server-side and increase Fleet control-plane load.
- Directly impacts enrollment resilience during routine pod/node restarts and cluster migrations.
Suggested fix direction:
- Normalize both sides before comparison in `shouldFleetEnroll`:
  - Parse `setupCfg.Fleet.URL` and compare canonical host:port against stored hosts.
  - Normalize trailing slash and default ports (`443`/`80`) consistently.
  - Optionally include protocol comparison separately using canonicalized values.
Failing test to add:
- Package: `internal/pkg/agent/cmd`
- Test name: `TestShouldFleetEnroll_NormalizedURLDoesNotReenroll`
- Scenario: stored Fleet host is `fleet:8220` (with protocol `https` in stored client config), setup URL is (fleet/redacted)
- Expected: `shouldFleetEnroll(...) == false`.
- Current behavior: evaluates to `true` due to raw string mismatch.
Priority ranking
1. Unrecoverable / repeated enrollment state churn: URL normalization mismatch in re-enrollment gate (finding above).
Communication paths audited and found resilient in this pass
- Liveness `?failon=degraded` handling in `internal/pkg/agent/application/monitoring/liveness.go` correctly maps a degraded or failed state to HTTP 500 when the coordinator state is unhealthy.
- Check-in retry pacing uses bounded jitter backoff in the retrier path (`internal/pkg/fleetapi/acker/retrier/retrier.go`), avoiding tight retry loops.
Notes
I filtered out lower-confidence candidates and only reported the verified high-severity issue above.
Note
🔒 Integrity filtering filtered 2 items
Integrity filtering activated and filtered the following items during workflow execution.
This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.
- issue:elastic/elastic-agent#unknown (search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
- resource:list_label (list_label: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
From workflow: Sweeper: Fleet Enrollment and Communication Resilience
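The suggested fix direction in finding 1 (normalize both sides to a canonical host:port before comparing) can be sketched roughly as follows. This is a standalone illustration, not the agent's actual code: `canonicalHostPort` and `matchesStoredHost` are hypothetical helper names, the `https://fleet:8220` style setup URLs are assumed stand-ins for the redacted values, and the sketch assumes the stored protocol is available alongside the stored hosts.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// canonicalHostPort (hypothetical helper) reduces either a full URL or a bare
// "host[:port]" string to a lowercase "host:port". defaultScheme is applied
// when raw carries no scheme, and it also selects the default port.
func canonicalHostPort(raw, defaultScheme string) (string, error) {
	raw = strings.TrimSpace(raw)
	if !strings.Contains(raw, "://") {
		// Without a scheme, url.Parse would read "fleet:8220" as scheme "fleet".
		raw = defaultScheme + "://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", raw, err)
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	port := u.Port()
	if port == "" {
		switch u.Scheme {
		case "https":
			port = "443"
		case "http":
			port = "80"
		default:
			return "", fmt.Errorf("no default port for scheme %q", u.Scheme)
		}
	}
	// The path (including any trailing slash) is deliberately ignored.
	return net.JoinHostPort(host, port), nil
}

// matchesStoredHost (hypothetical helper) reports whether setupURL points at
// one of the stored hosts once both sides are canonicalized. Entries that
// cannot be parsed are treated as non-matching.
func matchesStoredHost(storedHosts []string, storedProtocol, setupURL string) bool {
	want, err := canonicalHostPort(setupURL, storedProtocol)
	if err != nil {
		return false
	}
	for _, h := range storedHosts {
		got, err := canonicalHostPort(h, storedProtocol)
		if err == nil && got == want {
			return true
		}
	}
	return false
}

func main() {
	stored := []string{"fleet:8220"} // host-only form, as described in the evidence
	for _, setupURL := range []string{
		"https://fleet:8220",  // assumed example value
		"https://fleet:8220/", // same endpoint with a trailing slash
		"https://other:8220",  // genuinely different endpoint
	} {
		fmt.Printf("%-22s matches stored host: %v\n",
			setupURL, matchesStoredHost(stored, "https", setupURL))
	}
}
```

Under these assumptions the first two setup URLs match the stored host and the third does not, so a gate built on such a comparison would skip re-enrollment for equivalent endpoints. Protocol is intentionally not compared here; per the fix direction it could be checked separately using canonicalized values.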