--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -35,7 +35,49 @@ Core engine, CLI, and all major subsystems are stable. Summary of shipped featur
 
 ## In Progress
 
-_(Currently empty)_
+### Tauri Desktop App — Mission Control for AI Agents
+
+**Priority: Urgent** — The desktop app is the primary interface for users to manage, monitor, and assist AI browsing agents.
+
+**Phase 1 — Semantic Tree Viewer + CAPTCHA Handoff (current)**
+- [ ] Semantic tree viewer panel — render the ARIA role tree with interactive nodes in the Tauri dashboard
+- [ ] Per-instance controls — URL bar, navigate, agent status (idle/running/waiting-challenge)
+- [ ] CAPTCHA handoff — when the agent hits a challenge, open an OS webview (WKWebView/WebKitGTK/WebView2) for the user to solve it, then sync cookies back to the headless browser via CDP `Network.setCookie`
+- [ ] Cookie bridge — `tokio-tungstenite` WebSocket client to inject cookies into the headless CDP server
+- [ ] Agent action log — real-time log of agent actions (navigate, click, type, wait) streamed from CDP events
+- [ ] Cross-platform — dashboard is pure HTML/CSS (no OS webview dependency for the primary view); CAPTCHA popup uses an OS webview only when needed
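The CAPTCHA handoff and cookie bridge items above could be sketched roughly as follows. This is a minimal, std-only illustration of building the CDP `Network.setCookie` command as a JSON text frame. `SolvedCookie`, `json_escape`, and `set_cookie_message` are hypothetical names that do not come from the project, and a real implementation would likely use a JSON library and send the string over the `tokio-tungstenite` connection.

```rust
/// A cookie captured from the OS webview after the user solves a challenge.
/// Hypothetical type for illustration; the fields mirror CDP `Network.setCookie` parameters.
pub struct SolvedCookie {
    pub name: String,
    pub value: String,
    pub domain: String,
    pub path: String,
    pub secure: bool,
    pub http_only: bool,
}

/// Escape a string so it can be embedded inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON command that asks the headless browser to store one cookie.
/// `id` is the caller-chosen CDP message id used to match the reply.
pub fn set_cookie_message(id: u64, c: &SolvedCookie) -> String {
    format!(
        r#"{{"id":{},"method":"Network.setCookie","params":{{"name":"{}","value":"{}","domain":"{}","path":"{}","secure":{},"httpOnly":{}}}}}"#,
        id,
        json_escape(&c.name),
        json_escape(&c.value),
        json_escape(&c.domain),
        json_escape(&c.path),
        c.secure,
        c.http_only
    )
}
```

Each string returned by `set_cookie_message` would be sent as one WebSocket text message, and the CDP reply carries the same `id`, so the bridge can confirm each cookie before the agent resumes.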
+
+**Phase 2 — Multi-Agent Dashboard**
+- [ ] Multiple concurrent agent instances — spawn/manage N agents in one window
+- [ ] Agent status grid — see all agents at a glance with status indicators (running, idle, stuck, CAPTCHA)
+- [ ] Live agent action streaming — watch each agent's actions in real time via the CDP event bus
+- [ ] Take-over button — pause the agent, let the user interact manually, then resume the agent
+- [ ] Agent conversation panel — show the LLM conversation alongside browser actions
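The statuses named in the list above (running, idle, stuck, CAPTCHA) and the take-over flow (pause, manual interaction, resume) suggest a small state machine per agent. The sketch below is one possible shape under that assumption. The `AgentStatus` and `AgentEvent` enums, the `UserControl` state, and the transition table are hypothetical and not taken from the project.

```rust
/// Status shown for one agent in the status grid (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentStatus {
    Idle,
    Running,
    WaitingChallenge, // agent hit a CAPTCHA and waits for the user
    Stuck,
    UserControl, // user pressed the take-over button
}

/// Events that may move an agent between statuses (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentEvent {
    Start,
    ChallengeDetected,
    ChallengeSolved,
    Stalled,
    TakeOver,
    Resume,
    Finished,
}

/// Return the next status, or `None` if the event is not valid in the current status.
pub fn next_status(current: AgentStatus, event: AgentEvent) -> Option<AgentStatus> {
    use AgentEvent::*;
    use AgentStatus::*;
    match (current, event) {
        (Idle, Start) => Some(Running),
        (Running, ChallengeDetected) => Some(WaitingChallenge),
        (WaitingChallenge, ChallengeSolved) => Some(Running),
        (Running, Stalled) => Some(Stuck),
        // The user may take over from any active state.
        (Running, TakeOver) | (Stuck, TakeOver) | (WaitingChallenge, TakeOver) => Some(UserControl),
        (UserControl, Resume) => Some(Running),
        (Running, Finished) => Some(Idle),
        _ => None,
    }
}
```

Returning `None` for invalid transitions lets the dashboard ignore stale events, such as a `Resume` for an agent the user never took over, instead of corrupting the grid state.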
+
+**Phase 3 — Rendered View (Optional)**
+- [ ] Rendered page tab — OS webview shows actual page pixels (WKWebView on macOS, WebKitGTK on Linux, WebView2 on Windows)
+- [ ] Split view — semantic tree on the left, rendered pixels on the right
+- [ ] Screenshot capture — use the pardus-core screenshot feature (chromiumoxide) for pixel-perfect captures
+
+**Architecture:**
+```
+┌─ Mission Control ──────────────────────────────────────┐
+│ ┌─ Agents ─────┐ ┌─ Semantic Tree ──────────────────┐ │