Context
The message queue feature (#22) includes defensive mitigations for a race condition between the prompt_complete SSE event and the server-side session execute lock. The current test suite relies on static string matching, so it cannot validate actual runtime behavior. We need end-to-end tests that exercise the real queue lifecycle against a running amplifierd instance.
Proposal
Create a Playwright-based test recipe (or test suite) that spins up the chat plugin and validates the queue scenarios discovered during development and debugging.
Test scenarios
Core queue lifecycle:
1. Queue a message during execution → verify it appears in QueuePanel → verify it is sent after completion, following the 5s countdown
2. Queue multiple messages → verify FIFO drain order (A sends before B before C)
3. Queue a message with an image → verify image is preserved through drain and sent to the server
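The FIFO check in the multi-message scenario can be reduced to a small pure helper. The sketch below is illustrative only: `drainedInFifoOrder` is a hypothetical name, and it assumes the test has already captured the texts of the sent messages in order (for example by intercepting outgoing requests), which this issue does not specify.

```typescript
// Hypothetical helper: returns true when every queued message shows up in
// `sent` in the same relative order. Unrelated sends in between are tolerated.
function drainedInFifoOrder(queued: string[], sent: string[]): boolean {
  let cursor = 0;
  for (const text of queued) {
    // Search only from the position after the previous match.
    const idx = sent.indexOf(text, cursor);
    if (idx === -1) return false;
    cursor = idx + 1;
  }
  return true;
}
```

Keeping the ordering rule separate from the browser-driving code means the rule itself can be unit tested without a running amplifierd instance.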
Pause / Resume:
4. Stop execution while queue has items → verify queue enters paused state → click Resume → verify drain restarts
5. Stop during countdown → verify countdown cancels and queue pauses
Session switching:
6. Queue messages in session A → switch to session B → switch back to A → verify queue is preserved and drain resumes
7. Create new session → verify previous session's queue is cleared
8. Switch to a history session mid-countdown → verify countdown cancels (no leak to wrong session)
Race condition resilience:
9. Verify that if an execution_error containing "already executing" arrives, the drain retries after 2s instead of showing an error
10. Verify that on prompt_complete, any blocks stuck in streaming: true are cleaned up (no blinking cursor)
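One way to assert the retry behaviour in scenario 9 is to record a timeline during the test and check it afterwards. The following is a minimal sketch under assumptions: the `QueueEvent` shape, the event kind names `send_attempt` and `error_shown`, and the function name are all hypothetical, and how the timeline gets recorded is left open.

```typescript
// Hypothetical timeline entries recorded by the test harness.
type QueueEvent =
  | { kind: "execution_error"; atMs: number; message: string }
  | { kind: "send_attempt"; atMs: number }
  | { kind: "error_shown"; atMs: number };

// Returns true when the first "already executing" error is followed by a
// send attempt no earlier than `minDelayMs` later, with no error surfaced
// to the user in between or afterwards.
function retriedAfterBusyError(events: QueueEvent[], minDelayMs = 2000): boolean {
  const busy = events.find(
    (e) => e.kind === "execution_error" && e.message.includes("already executing"),
  );
  if (busy === undefined) return false; // the scenario was never triggered
  const busyAt = busy.atMs;
  const later = events.filter((e) => e.atMs > busyAt);
  if (later.some((e) => e.kind === "error_shown")) return false;
  const sends = later.filter((e) => e.kind === "send_attempt");
  if (sends.length === 0) return false; // no retry happened at all
  const firstRetryAt = Math.min(...sends.map((e) => e.atMs));
  return firstRetryAt - busyAt >= minDelayMs;
}
```

The 2000 ms default mirrors the 2s retry described above; a real test may want a small tolerance on top of it.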
Delegate sub-agents:
11. Execute a prompt that triggers a delegate → verify delegate's prompt_complete doesn't trigger premature drain → verify drain only fires after parent completes
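The delegate scenario comes down to an ordering check on the event stream. As a rough sketch, assuming each prompt_complete can be attributed to a session and that delegates run under a different session ID than the parent (neither is confirmed here), the check might look like this. `StreamEvent`, `drain_started`, and `drainWaitedForParent` are hypothetical names.

```typescript
// Hypothetical, simplified view of the events a test would observe.
type StreamEvent =
  | { kind: "prompt_complete"; sessionId: string }
  | { kind: "drain_started" };

// Returns true when the first drain starts only after the parent session's
// prompt_complete; a delegate's prompt_complete alone must not count.
function drainWaitedForParent(events: StreamEvent[], parentSessionId: string): boolean {
  const parentDone = events.findIndex(
    (e) => e.kind === "prompt_complete" && e.sessionId === parentSessionId,
  );
  const firstDrain = events.findIndex((e) => e.kind === "drain_started");
  if (parentDone === -1 || firstDrain === -1) return false;
  return firstDrain > parentDone;
}
```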
Implementation approach
- Use Playwright to drive a real browser against the chat plugin
- The recipe should start a test amplifierd instance (or use an existing one)
- Each scenario should be independent and idempotent
- Use Playwright's page.evaluate() to inspect Preact state when needed (e.g. check msgQueueRef contents)
- Screenshots on failure for debugging
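A sketch of the state inspection step follows. It rests on a loud assumption: a Preact ref is not reachable from page.evaluate() by itself, so the app would have to expose it in test builds. The `__chatTestHooks` global, the `QueuedMessage` fields, and `readQueue` are hypothetical, and `PageLike` is a minimal stand-in for the single Playwright `Page` method used, so the snippet does not need Playwright installed.

```typescript
// Guessed shape of a queued message; the real fields are not documented here.
interface QueuedMessage {
  text: string;
  hasImage: boolean;
}

// Minimal stand-in for the one Page method this helper calls.
interface PageLike {
  evaluate<T>(fn: () => T): Promise<T>;
}

// Reads the queue through a hypothetical debug hook that the chat plugin
// would need to expose, e.g. window.__chatTestHooks = { msgQueueRef }.
async function readQueue(page: PageLike): Promise<QueuedMessage[]> {
  return page.evaluate(() => {
    const hooks = (globalThis as any).__chatTestHooks;
    const current = hooks?.msgQueueRef?.current ?? [];
    // Map to plain objects so the result serializes across the browser boundary.
    return current.map((m: any) => ({
      text: String(m.text),
      hasImage: Boolean(m.image),
    }));
  });
}
```

In a scenario this would be called after queuing, for example `const queue = await readQueue(page);`, and the result compared against the messages the test typed.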
Depends on