[fleet-enrollment-resilience] False re-enrollment loop due to Fleet URL host/full-URL mismatch #13301

## Findings

### 1. Re-enrollment decision compares incompatible URL formats and can trigger perpetual re-enrollment

**Severity:** High

**Location:**
- `internal/pkg/agent/cmd/container.go:1163-1166`
- `internal/pkg/agent/application/enroll/options.go:74-77`
- `internal/pkg/remote/client.go:64-73`

**Evidence:**
- Re-enroll gate uses:
  - `storedConfig.Fleet.Client.GetHosts()` and then `slices.Contains(storedFleetHosts, setupCfg.Fleet.URL)` in `container.go`.
- During enrollment, `EnrollOptions.RemoteConfig()` calls `remote.NewConfigFromURL(e.URL)`.
- `NewConfigFromURL` stores `c.Host = u.Host` and `c.Protocol = u.Scheme`. The host is therefore stored without a scheme (`fleet:8220`), while setup typically provides a full URL (`(fleet/redacted)` or `(fleet/redacted)`).

This is a direct host-vs-full-URL comparison, so equivalent endpoints can compare unequal.

**Failure scenario (realistic):**
A Kubernetes/container deployment restarts with `FLEET_URL=(fleet/redacted)`. The stored config from the previous successful enrollment contains host `fleet:8220`, so `shouldFleetEnroll` returns true on every restart and the agent re-enrolls instead of reusing its existing enrollment.

**Why it matters:**
- Can cause repeated enrollment churn and unstable managed identity behavior.
- Can leave orphaned/stale agent records server-side and increase Fleet control-plane load.
- Directly impacts enrollment resilience during routine pod/node restarts and cluster migrations.

**Suggested fix direction:**
Normalize both sides before comparison in `shouldFleetEnroll`:
- Parse `setupCfg.Fleet.URL` and compare canonical host:port against stored hosts.
- Normalize trailing slash and default ports (`443`/`80`) consistently.
- Optionally include protocol comparison separately using canonicalized values.
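One possible shape for this normalization is sketched below in Go. This is a sketch under assumptions, not a drop-in patch. `canonicalHostPort` and `fleetURLMatchesStored` are hypothetical helper names. Bare stored hosts are assumed to take the stored client protocol when resolving a default port. Paths and trailing slashes are ignored. Protocol itself is not compared; as the last bullet suggests, that would be a separate check.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"slices"
	"strings"
)

// defaultPort returns the implied port for a scheme (hypothetical helper).
func defaultPort(scheme string) string {
	switch strings.ToLower(scheme) {
	case "https":
		return "443"
	case "http":
		return "80"
	}
	return ""
}

// canonicalHostPort turns either a full URL ("https://fleet:8220/") or a bare
// host[:port] ("fleet:8220") into a lower-cased "host:port". When no port is
// present, the default port of the URL's scheme (or fallbackScheme for bare
// hosts) is filled in. Any path, including a trailing slash, is ignored.
func canonicalHostPort(raw, fallbackScheme string) (string, error) {
	s := strings.TrimSpace(raw)
	if !strings.Contains(s, "://") {
		s = fallbackScheme + "://" + s
	}
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	host := strings.ToLower(u.Hostname())
	port := u.Port()
	if port == "" {
		port = defaultPort(u.Scheme)
	}
	if port == "" {
		return host, nil
	}
	return net.JoinHostPort(host, port), nil
}

// fleetURLMatchesStored reports whether setupURL names one of the stored
// hosts once both sides are canonicalized (hypothetical helper name).
func fleetURLMatchesStored(setupURL string, storedHosts []string, storedProtocol string) bool {
	want, err := canonicalHostPort(setupURL, storedProtocol)
	if err != nil {
		return false
	}
	return slices.ContainsFunc(storedHosts, func(h string) bool {
		got, err := canonicalHostPort(h, storedProtocol)
		return err == nil && got == want
	})
}

func main() {
	stored := []string{"fleet:8220"}
	fmt.Println(fleetURLMatchesStored("https://fleet:8220", stored, "https"))  // true
	fmt.Println(fleetURLMatchesStored("https://fleet:8220/", stored, "https")) // true
	fmt.Println(fleetURLMatchesStored("https://other:8220", stored, "https"))  // false
}
```

In `shouldFleetEnroll`, a check along these lines could stand in for the raw `slices.Contains(storedFleetHosts, setupCfg.Fleet.URL)` call. The field that supplies the stored protocol is an assumption to verify against the real config types.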

**Failing test to add:**
- **Package:** `internal/pkg/agent/cmd`
- **Test name:** `TestShouldFleetEnroll_NormalizedURLDoesNotReenroll`
- **Scenario:** stored Fleet host is `fleet:8220` (with protocol `https` in the stored client config); setup URL is `(fleet/redacted)`.
- **Expected:** `shouldFleetEnroll(...) == false`.
- **Current behavior:** evaluates to `true` due to raw string mismatch.

## Priority ranking

1. **Unrecoverable / repeated enrollment state churn:** URL normalization mismatch in re-enrollment gate (finding above).

## Communication paths audited and found resilient in this pass

- Liveness `?failon=degraded` handling in `internal/pkg/agent/application/monitoring/liveness.go` correctly maps degraded/failed state to HTTP 500 when coordinator state indicates unhealthy.
- Check-in retry pacing uses bounded jitter backoff in the retrier path (`internal/pkg/fleetapi/acker/retrier/retrier.go`), avoiding tight retry loops.

## Notes

I filtered out lower-confidence candidates and only reported the verified high-severity issue above.




> [!NOTE]
> <details>
> <summary>🔒 Integrity filtering filtered 2 items</summary>
>
> Integrity filtering activated and filtered the following items during workflow execution.
> This happens when a tool call accesses a resource that does not meet the required integrity or secrecy level of the workflow.
>
> - issue:elastic/elastic-agent#unknown (`search_issues`: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
> - resource:list_label (`list_label`: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".)
>
> </details>


---
[What is this?](https://ela.st/github-ai-tools) | [From workflow: Sweeper: Fleet Enrollment and Communication Resilience](https://github.com/elastic/elastic-agent/actions/runs/23512338593)

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.
> - [x] expires on Mar 31, 2026, 9:25 PM UTC