Fix orchestration manifest clobber on external disk updates#375
Closed
cursor[bot] wants to merge 1 commit into
Serialize watcher reloads on the run mutex and reject in-flight writes when manifest.json on disk has a newer serverGeneration than the snapshot being persisted. Prevents stale in-memory patches from overwriting externally advanced manifests (e.g. git pull on the lane worktree).

Co-authored-by: Arul Sharma <arul28@users.noreply.github.com>
Owner
Closing in favor of #382. I validated the external manifest clobber issue and folded the strengthened conflict handling into the combined orchestration hardening lane.
Bug and impact
Orchestration manifest data loss (high)
When `manifest.json` on disk was updated externally (same `runId`, higher `serverGeneration`) while the service still held a stale in-memory snapshot, `manifestPatch` could pass the optimistic `ifMatchEtag` check and overwrite the newer file on disk. The file watcher could also reload the external manifest without the run mutex, racing with an in-flight `persistManifest` and briefly exposing then reverting newer state to subscribers.

Concrete trigger: Active orchestration run with cached manifest → `git pull` or another ADE instance advances `manifest.json` on the lane worktree → lead calls `manifestPatch` with the old etag still in memory → external manifest is lost.

Root cause
- `loadIntoRuntime` short-circuits when memory is already hydrated, so patches can proceed against a stale etag while disk advanced.
- `handleExternalChange` ran outside `runtime.mutex`, interleaving with `persistManifest`.
- `persistManifest` did not compare `serverGeneration` against on-disk state before committing.

Fix
- Serialize watcher reloads on `runtime.mutex`.
- Reject `persistManifest` when on-disk `serverGeneration` is newer (returns `etag_conflict` from `manifestPatch` / heartbeat).
- […] `beforeCommit`.

Validation
- `npx vitest run src/main/services/orchestration/orchestrationService.test.ts -t "etag_conflict instead of overwriting"`
- `npx vitest run src/main/services/orchestration/orchestrationService.test.ts src/main/services/orchestration/patchPolicy.test.ts`

Does not overlap open draft PR #365 (foreign `runId` / suspended bundle swap), #364 (plan approval bypass), or #371–#374 (PTY/sync/Linear fixes).
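
For readers without the diff at hand, the two guards described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the `Manifest` and `PersistResult` shapes, the `Mutex` and `Runtime` classes, the `diskIsNewer` helper, and the in-memory stand-in for `manifest.json` are all assumptions made for the example. Only the names `persistManifest`, `handleExternalChange`, `serverGeneration`, `runId`, and `etag_conflict` come from the description.

```typescript
// Hypothetical shapes: the real service's types are not shown in this PR.
interface Manifest {
  runId: string;
  serverGeneration: number;
  data: Record<string, unknown>;
}

type PersistResult =
  | { ok: true; manifest: Manifest }
  | { ok: false; error: "etag_conflict"; diskGeneration: number };

// True when the on-disk manifest for the same run has advanced past the
// snapshot that the pending write was built from.
function diskIsNewer(onDisk: Manifest, base: Manifest): boolean {
  return onDisk.runId === base.runId && onDisk.serverGeneration > base.serverGeneration;
}

// Minimal promise-chain mutex: a task starts only after the previous one settles.
class Mutex {
  private tail: Promise<unknown> = Promise.resolve();

  runExclusive<T>(task: () => T | Promise<T>): Promise<T> {
    const run = this.tail.then<T>(() => task());
    this.tail = run.catch(() => undefined); // keep the chain usable after a failure
    return run;
  }
}

class Runtime {
  readonly mutex = new Mutex();
  memory: Manifest;
  private disk: Manifest; // in-memory stand-in for manifest.json (assumption)

  constructor(initial: Manifest) {
    this.memory = initial;
    this.disk = initial;
  }

  // Simulates `git pull` or another instance rewriting manifest.json.
  externalWrite(next: Manifest): void {
    this.disk = next;
  }

  // Watcher path: reload under the mutex so it cannot interleave with a persist.
  handleExternalChange(): Promise<void> {
    return this.mutex.runExclusive<void>(() => {
      if (diskIsNewer(this.disk, this.memory)) this.memory = this.disk;
    });
  }

  // Write path: re-check the disk generation immediately before committing.
  persistManifest(base: Manifest, patch: Record<string, unknown>): Promise<PersistResult> {
    return this.mutex.runExclusive<PersistResult>(() => {
      if (diskIsNewer(this.disk, base)) {
        return { ok: false, error: "etag_conflict", diskGeneration: this.disk.serverGeneration };
      }
      const next: Manifest = {
        runId: base.runId,
        serverGeneration: base.serverGeneration + 1,
        data: { ...base.data, ...patch },
      };
      this.disk = next;
      this.memory = next;
      return { ok: true, manifest: next };
    });
  }
}
```

In this sketch a caller that receives `etag_conflict` would be expected to reload (for example via `handleExternalChange`) and rebuild its patch against the newer manifest rather than retry the stale write.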