Skip to content

Fix orchestration manifest clobber on external disk updates#375

Closed
cursor[bot] wants to merge 1 commit into
mainfrom
cursor/critical-correctness-bugs-17e2
Closed

Fix orchestration manifest clobber on external disk updates#375
cursor[bot] wants to merge 1 commit into
mainfrom
cursor/critical-correctness-bugs-17e2

Conversation

@cursor
Copy link
Copy Markdown
Contributor

@cursor cursor Bot commented May 27, 2026

Bug and impact

Orchestration manifest data loss (high)

When manifest.json on disk was updated externally (same runId, higher serverGeneration) while the service still held a stale in-memory snapshot, manifestPatch could pass the optimistic ifMatchEtag check and overwrite the newer file on disk. The file watcher could also reload the external manifest without the run mutex, racing with an in-flight persistManifest and briefly exposing then reverting newer state to subscribers.

Concrete trigger: Active orchestration run with cached manifest → git pull or another ADE instance advances manifest.json on the lane worktree → lead calls manifestPatch with the old etag still in memory → external manifest is lost.

Root cause

  • loadIntoRuntime short-circuits when memory is already hydrated, so patches can proceed against a stale etag while disk advanced.
  • handleExternalChange ran outside runtime.mutex, interleaving with persistManifest.
  • persistManifest did not compare serverGeneration against on-disk state before committing.

Fix

  • Run watcher debounced reloads inside runtime.mutex.
  • Reject persistManifest when on-disk serverGeneration is newer (returns etag_conflict from manifestPatch / heartbeat).
  • Re-check immediately before atomic rename via beforeCommit.

Validation

  • npx vitest run src/main/services/orchestration/orchestrationService.test.ts -t "etag_conflict instead of overwriting"
  • npx vitest run src/main/services/orchestration/orchestrationService.test.ts src/main/services/orchestration/patchPolicy.test.ts

Does not overlap open draft PR #365 (foreign runId / suspended bundle swap), #364 (plan approval bypass), or #371#374 (PTY/sync/Linear fixes).

Open in Web View Automation 

Serialize watcher reloads on the run mutex and reject in-flight writes
when manifest.json on disk has a newer serverGeneration than the snapshot
being persisted. Prevents stale in-memory patches from overwriting
externally advanced manifests (e.g. git pull on the lane worktree).

Co-authored-by: Arul Sharma <arul28@users.noreply.github.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
ade Ignored Ignored Preview May 27, 2026 9:15am

@arul28
Copy link
Copy Markdown
Owner

arul28 commented May 28, 2026

Closing in favor of #382. I validated the external manifest clobber issue and folded the strengthened conflict handling into the combined orchestration hardening lane.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants