test: end-to-end queue validation recipe with Playwright (#29)

## Context

The message queue feature (#22) includes defensive mitigations for a race condition between the `prompt_complete` SSE event and the server-side session execute lock. The current test suite relies on static string matching, so it cannot validate actual runtime behavior. We need end-to-end tests that exercise the real queue lifecycle against a running amplifierd instance.

## Proposal

Create a Playwright-based test recipe (or test suite) that spins up the chat plugin and validates the queue scenarios discovered during development and debugging.

### Test scenarios

**Core queue lifecycle:**
1. Queue a message during execution → verify it appears in QueuePanel → verify it is sent after the 5s countdown that follows completion
2. Queue multiple messages → verify FIFO drain order (A sends before B before C)
3. Queue a message with an image → verify image is preserved through drain and sent to the server
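Scenarios 1 to 3 reduce to checks on what actually reached the server. A minimal sketch, assuming the spec records each outgoing prompt (for example from a Playwright `page.on('request')` listener) into hypothetical `SentPrompt` records; the field names and the 1s timing slack are placeholders, not the real amplifierd payload or a documented tolerance.

```typescript
// Hypothetical shape of a prompt captured from the network; field names are
// placeholders, not the real amplifierd payload.
interface SentPrompt {
  text: string;
  images: string[]; // e.g. data URLs or attachment ids
  sentAtMs: number; // time the request left the browser
}

// Scenario 1: the send should land after the countdown, within some slack.
function assertCountdownRespected(
  completedAtMs: number,
  sent: SentPrompt,
  countdownMs = 5000,
  slackMs = 1000, // assumed tolerance for rendering and network jitter
): void {
  const elapsed = sent.sentAtMs - completedAtMs;
  if (elapsed < countdownMs || elapsed > countdownMs + slackMs) {
    throw new Error(`"${sent.text}" sent ${elapsed}ms after completion`);
  }
}

// Scenario 2: queued texts must reach the server in the same relative order.
function assertFifoDrain(queued: string[], sent: SentPrompt[]): void {
  const sentTexts = sent.map((p) => p.text);
  let last = -1;
  for (const text of queued) {
    // Search only past the previous match so out-of-order sends are caught.
    const idx = sentTexts.indexOf(text, last + 1);
    if (idx === -1) throw new Error(`"${text}" missing or out of order`);
    last = idx;
  }
}

// Scenario 3: attachments must survive the drain unchanged.
function assertImagesPreserved(expected: string[], sent: SentPrompt): void {
  const same =
    sent.images.length === expected.length &&
    sent.images.every((img, i) => img === expected[i]);
  if (!same) throw new Error(`images changed for "${sent.text}"`);
}
```

Keeping these as pure functions means the Playwright spec only has to collect data; the pass/fail logic can be unit-tested on its own.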

**Pause / Resume:**
4. Stop execution while queue has items → verify queue enters paused state → click Resume → verify drain restarts
5. Stop during countdown → verify countdown cancels and queue pauses
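The pause/resume scenarios are easier to assert against when the expected outcome is written down as a tiny reference model. The sketch below is hypothetical: it is driven by a manual clock instead of real timers, and it assumes Resume restarts the 5s countdown rather than sending immediately (adjust `resume()` if the plugin behaves otherwise).

```typescript
// Hypothetical reference model of pause/resume, driven by a manual clock.
type QueueState = "idle" | "executing" | "countdown" | "paused";

class QueueModel {
  state: QueueState = "idle";
  items: string[] = [];
  sent: string[] = [];
  private countdownLeftMs = 0;
  private readonly countdownMs: number;

  constructor(countdownMs = 5000) {
    this.countdownMs = countdownMs;
  }

  enqueue(text: string): void {
    this.items.push(text);
  }

  startExecution(): void {
    this.state = "executing";
  }

  // prompt_complete: begin the countdown if anything is waiting.
  complete(): void {
    if (this.items.length === 0) {
      this.state = "idle";
      return;
    }
    this.state = "countdown";
    this.countdownLeftMs = this.countdownMs;
  }

  // Stop cancels any countdown and pauses a non-empty queue.
  stop(): void {
    this.countdownLeftMs = 0;
    this.state = this.items.length > 0 ? "paused" : "idle";
  }

  // Assumption: Resume restarts the countdown rather than sending immediately.
  resume(): void {
    if (this.state !== "paused") return;
    this.state = "countdown";
    this.countdownLeftMs = this.countdownMs;
  }

  // Advance virtual time; an expired countdown sends the head of the queue.
  tick(ms: number): void {
    if (this.state !== "countdown") return;
    this.countdownLeftMs -= ms;
    if (this.countdownLeftMs <= 0) {
      this.sent.push(this.items.shift()!);
      this.state = "executing";
    }
  }
}
```

Scenario 5 then becomes: complete, tick partway, stop, tick well past 5s, and expect `paused` with nothing sent.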

**Session switching:**
6. Queue messages in session A → switch to session B → switch back to A → verify queue is preserved and drain resumes
7. Create a new session → verify the previous session's queue is cleared
8. Switch to a history session mid-countdown → verify the countdown cancels (no leak into the wrong session)
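For the session-switching scenarios, the key property is that a queued message never reaches a session other than the one it was queued in. A minimal sketch, assuming the spec can tag each captured request with its target session id (for example parsed from the request URL or body); `CapturedSend` and both helpers are hypothetical names.

```typescript
// Hypothetical record of a prompt captured from the network, tagged with the
// session id the request targeted.
interface CapturedSend {
  text: string;
  sessionId: string;
}

// Scenario 8: returns sends whose text was queued in one session but reached another.
// `queuedIn` maps each message text the spec queued to the session it was queued in.
function findCrossSessionLeaks(
  queuedIn: Map<string, string>,
  sends: CapturedSend[],
): CapturedSend[] {
  return sends.filter((s) => {
    const origin = queuedIn.get(s.text);
    return origin !== undefined && origin !== s.sessionId;
  });
}

// Scenarios 6 and 7: compare queue snapshots (read via page.evaluate()) taken
// before and after switching sessions.
function queuesEqual(before: readonly string[], after: readonly string[]): boolean {
  return before.length === after.length && before.every((t, i) => t === after[i]);
}
```

Scenario 6 expects `queuesEqual(before, after)` after returning to session A, scenario 7 expects the previous session's snapshot to be empty, and scenario 8 expects no leaks (and, since the countdown should cancel, no send of the queued text at all within the countdown window).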

**Race condition resilience:**
9. Verify that if `execution_error` with "already executing" arrives, the drain retries after 2s instead of showing an error
10. Verify that on `prompt_complete`, any blocks stuck in `streaming: true` are cleaned up (no blinking cursor)
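Both resilience checks can be phrased over data the spec already collects: a timeline of SSE events, execute requests, and visible error UI for scenario 9, and a snapshot of message blocks for scenario 10. The sketch below is hypothetical: the `TimelineEntry` kinds, the `Block` shape, and the 500ms slack are assumptions, and how the spec provokes the "already executing" error is left open.

```typescript
// Hypothetical timeline entry assembled from SSE events, captured requests, and the DOM.
type TimelineEntry =
  | { atMs: number; kind: "execute_request" }
  | { atMs: number; kind: "execution_error"; message: string }
  | { atMs: number; kind: "error_banner" };

// Scenario 9: after an "already executing" error, the drain should retry about
// 2s later and no error should be shown. Returns a list of problems found.
function checkAlreadyExecutingRetry(
  timeline: TimelineEntry[],
  retryMs = 2000,
  slackMs = 500, // assumed tolerance
): string[] {
  const problems: string[] = [];
  timeline.forEach((entry, i) => {
    if (entry.kind !== "execution_error" || !/already executing/i.test(entry.message)) return;
    const retry = timeline.slice(i + 1).find((e) => e.kind === "execute_request");
    if (!retry) {
      problems.push(`no retry after error at ${entry.atMs}ms`);
      return;
    }
    const delay = retry.atMs - entry.atMs;
    if (delay < retryMs || delay > retryMs + slackMs) {
      problems.push(`retry after ${delay}ms, expected ~${retryMs}ms`);
    }
  });
  if (timeline.some((e) => e.kind === "error_banner")) problems.push("error was shown");
  return problems;
}

// Scenario 10: ids of blocks still marked streaming after prompt_complete.
interface Block {
  id: string;
  streaming: boolean;
}
function stuckStreamingBlocks(blocks: Block[]): string[] {
  return blocks.filter((b) => b.streaming).map((b) => b.id);
}
```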

**Delegate sub-agents:**
11. Execute a prompt that triggers a delegate → verify the delegate's `prompt_complete` does not trigger a premature drain → verify the drain fires only after the parent completes
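Telling a delegate's `prompt_complete` apart from the parent's needs some per-event identity. A minimal sketch, assuming each recorded SSE event carries the id of the session that emitted it; `SseEvent` and `drainFiredPrematurely` are hypothetical names.

```typescript
// Hypothetical SSE record; assumes each event carries the emitting session's id.
interface SseEvent {
  atMs: number;
  type: string;
  sessionId: string;
}

// Scenario 11: true if the first queued send happened before the parent's
// prompt_complete (or the parent never completed).
function drainFiredPrematurely(
  events: SseEvent[],
  parentId: string,
  firstSendAtMs: number,
): boolean {
  const parentDone = events.find(
    (e) => e.type === "prompt_complete" && e.sessionId === parentId,
  );
  return parentDone === undefined || firstSendAtMs < parentDone.atMs;
}
```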

### Implementation approach

- Use Playwright to drive a real browser against the chat plugin
- The recipe should start a test amplifierd instance (or use an existing one)
- Each scenario should be independent and idempotent
- Use Playwright's `page.evaluate()` to inspect Preact state when needed (e.g. check `msgQueueRef` contents)
- Capture screenshots on failure for debugging
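`page.evaluate()` runs its callback inside the page and can only reach globals, so a component-local ref such as `msgQueueRef` is most likely invisible to it unless the plugin exposes it. A minimal sketch of such a test hook; the global name `__amplifierQueueHook`, the `{ text: string }` item shape, and when to register it (for example from a `useEffect` behind a test flag) are all assumptions.

```typescript
// Minimal stand-in for a Preact ref: useRef returns an object with a mutable `current`.
interface Ref<T> {
  current: T;
}

interface QueueTestHook {
  snapshot(): string[];
}

// Hypothetical: registers a read-only view of the queue on globalThis for tests.
function exposeQueueForTests(msgQueueRef: Ref<{ text: string }[]>): void {
  const hook: QueueTestHook = {
    // Read the ref lazily and return a copy, so tests see live contents
    // but cannot mutate component state.
    snapshot: () => msgQueueRef.current.map((m) => m.text),
  };
  (globalThis as unknown as { __amplifierQueueHook?: QueueTestHook }).__amplifierQueueHook = hook;
}
```

A spec could then read the queue with `await page.evaluate(() => (window as any).__amplifierQueueHook?.snapshot() ?? [])`, which returns a plain serialized array. For screenshots, Playwright Test's built-in `use: { screenshot: 'only-on-failure' }` option in `playwright.config.ts` covers the last bullet without custom code.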

### Depends on

- #28 (extract queue logic) would make some scenarios easier to test in isolation, but the E2E tests are independently valuable
