
test: end-to-end queue validation recipe with Playwright #29

@samueljklee

Context

The message queue feature (#22) includes defensive mitigations for a race condition between the prompt_complete SSE event and the server-side session execute lock. The current test suite relies on static string matching, so it cannot validate actual runtime behavior. We need end-to-end tests that exercise the real queue lifecycle against a running amplifierd instance.

Proposal

Create a Playwright-based test recipe (or test suite) that spins up the chat plugin and validates the queue scenarios discovered during development and debugging.

Test scenarios

Core queue lifecycle:

1. Queue a message during execution → verify it appears in QueuePanel → verify it is sent once execution completes and the 5s countdown elapses
2. Queue multiple messages → verify FIFO drain order (A sends before B before C)
3. Queue a message with an image → verify image is preserved through drain and sent to the server
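
For scenario 2, the ordering check itself can be kept separate from the browser driving. The sketch below assumes the test has already collected two arrays: the texts in the order they were queued and the texts in the order they were observed being sent. The function name `firstFifoViolation` is hypothetical and not part of the chat plugin.

```typescript
// Hypothetical test helper (not part of the plugin): compares the order in
// which messages were queued with the order in which they were observed
// being sent. Returns null when the drain was FIFO, otherwise a description
// of the first mismatch.
function firstFifoViolation(
  queued: readonly string[],
  sent: readonly string[],
): string | null {
  if (sent.length !== queued.length) {
    return `expected ${queued.length} sent messages, got ${sent.length}`;
  }
  for (let i = 0; i < queued.length; i++) {
    if (sent[i] !== queued[i]) {
      return `position ${i}: expected "${queued[i]}", got "${sent[i]}"`;
    }
  }
  return null;
}
```

Returning a description rather than a boolean keeps the failure message useful when the assertion trips in CI.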

Pause / Resume:
4. Stop execution while queue has items → verify queue enters paused state → click Resume → verify drain restarts
5. Stop during countdown → verify countdown cancels and queue pauses

Session switching:
6. Queue messages in session A → switch to session B → switch back to A → verify queue is preserved and drain resumes
7. Create new session → verify previous session's queue is cleared
8. Switch to a history session mid-countdown → verify countdown cancels (no leak to wrong session)

Race condition resilience:
9. Verify that if execution_error with "already executing" arrives, the drain retries after 2s instead of showing an error
10. Verify that on prompt_complete, any blocks stuck in streaming: true are cleaned up (no blinking cursor)
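
The behavior scenario 9 expects can be modeled as a small retry loop. This is only a sketch of the expected semantics, not the plugin's actual drain code: `sendWithLockRetry`, the `SendResult` shape, and the `maxAttempts` bound are assumptions introduced for illustration. The 2s delay and the "already executing" match come from the scenario text.

```typescript
// Hypothetical result shape for one send attempt.
type SendResult = { ok: true } | { ok: false; error: string };

interface RetryOptions {
  retryDelayMs?: number; // 2000 ms, per scenario 9
  maxAttempts?: number; // assumption: retries are bounded
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Retries a send while the server reports that the session is still
// executing; any other error is returned immediately.
async function sendWithLockRetry(
  send: () => Promise<SendResult>,
  opts: RetryOptions = {},
): Promise<SendResult> {
  const retryDelayMs = opts.retryDelayMs ?? 2000;
  const maxAttempts = opts.maxAttempts ?? 5;
  const sleep =
    opts.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let last: SendResult = { ok: false, error: "not attempted" };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await send();
    if (last.ok || !last.error.includes("already executing")) {
      return last;
    }
    if (attempt < maxAttempts) {
      await sleep(retryDelayMs);
    }
  }
  return last;
}
```

Injecting `sleep` lets a unit-level check run instantly, while the end-to-end test still observes the real 2s gap in the browser.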

Delegate sub-agents:
11. Execute a prompt that triggers a delegate → verify delegate's prompt_complete doesn't trigger premature drain → verify drain only fires after parent completes
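
One way to assert scenario 11 is to record a timeline of observed events during the test and check it afterwards. The sketch below assumes each prompt_complete event can be attributed to a session ID and that queue sends can be observed; the `TimelineEntry` shape and `drainWaitedForParent` are hypothetical and may not match the real event payloads.

```typescript
// Hypothetical timeline entries recorded by the test while it runs.
type TimelineEntry =
  | { kind: "prompt_complete"; sessionId: string }
  | { kind: "queue_send"; text: string };

// True only if a queued message was sent and the first such send came after
// the parent session's prompt_complete. A delegate completing earlier must
// not count as the trigger.
function drainWaitedForParent(
  timeline: readonly TimelineEntry[],
  parentSessionId: string,
): boolean {
  const parentDone = timeline.findIndex(
    (e) => e.kind === "prompt_complete" && e.sessionId === parentSessionId,
  );
  const firstSend = timeline.findIndex((e) => e.kind === "queue_send");
  if (firstSend === -1) {
    return false; // the drain never fired at all
  }
  return parentDone !== -1 && parentDone < firstSend;
}
```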

Implementation approach

  • Use Playwright to drive a real browser against the chat plugin
  • The recipe should start a test amplifierd instance (or use an existing one)
  • Each scenario should be independent and idempotent
  • Use Playwright's page.evaluate() to inspect Preact state when needed (e.g. check msgQueueRef contents)
  • Capture screenshots on failure for debugging
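
The points above could map onto a Playwright config roughly as follows. This is a sketch under assumptions: the `./e2e` directory, the base URL, and the start command are placeholders (the real amplifierd invocation and port are not stated in this issue), read here from hypothetical environment variables.

```typescript
import { defineConfig } from "@playwright/test";

// Placeholder values: the actual URL and start command for amplifierd are
// not specified in this issue, so both are assumptions.
const baseURL = process.env.CHAT_BASE_URL ?? "http://localhost:8000";
const startCommand = process.env.AMPLIFIERD_START_CMD ?? "amplifierd";

export default defineConfig({
  testDir: "./e2e", // hypothetical location for the queue specs
  fullyParallel: false, // assumption: scenarios share one server instance
  use: {
    baseURL,
    screenshot: "only-on-failure", // screenshots on failure for debugging
    trace: "retain-on-failure",
  },
  webServer: {
    command: startCommand,
    url: baseURL,
    reuseExistingServer: true, // "or use an existing one"
    timeout: 60_000,
  },
});
```

With `reuseExistingServer: true`, Playwright skips the start command when something already answers at `url`, which covers both the "start a test instance" and "use an existing one" cases.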

Depends on

Labels: enhancement (New feature or request), testing (Tests and test infrastructure)
