Wyse Browser 🚀

Wyse Browser is a powerful, multi-process runtime engine designed for executing automated flows within a browser environment. It provides a robust platform for creating, managing, and executing complex automation workflows through a comprehensive REST API.

Key Features 🌟

Powerful & Scalable Automation Core ✨: Built on NestJS and Playwright, Wyse Browser provides a reliable and efficient multi-process runtime engine. It orchestrates multiple sandboxed Chrome instances, enabling robust and scalable browser automation.
AI-Driven Workflow Orchestration 🧠: Designed to integrate seamlessly with LLMs and AI Agents, facilitating the creation, management, and execution of sophisticated, AI-driven automation workflows.
Modular & Extensible Worklets 🧩: Leverage Worklets as autonomous, reusable, and highly composable code blocks for specific tasks, allowing for flexible and extensible automation solutions.
Comprehensive REST API Control 🔗: Offers a full-featured REST API for programmatic control over every aspect of the browser environment, including sessions, pages, flows, and individual browser actions.
Parallel & Isolated Session Execution ⚡: Manages multiple independent browser sessions in parallel, each running in a sandboxed Chrome instance with isolated contexts (cookies, local storage), ensuring tasks run without interference.
Rich & Granular Action Space 🤖: Provides a wide array of built-in, low-level browser actions—from navigation and clicking to executing custom JavaScript—offering precise control over browser interactions.
Robust Security & Data Privacy 🔒: Prioritizes user safety with explicit consent mechanisms for data access, strong data privacy measures, and secure handling of worklets which involve arbitrary code execution.

Architecture 🏗️

The Wyse Browser protocol is built for distributed systems, enabling each engine to manage multiple workflow and worklet instances efficiently.

```mermaid
graph TD
    subgraph Intelligence Layer
        M[LLM]
        A[AI Agents]
    end
    M <--> A

    APIS -- Exposes --> APIList[/api/session/<br>/api/flow/<br>/api/browser/]

    subgraph Wyse Browser
        B(RunTime)
        C{Workflow}
        AS[Browser Actions]
        APIS[API Service]
    end

    subgraph "(Playwright)"
        BIs[Chrome Instances]
    end

    A <--> B
    B <--> APIS
    B --> C
    B --> AS

    B -- Manages --> BIs

    AL[visit, history, search, refresh_page<br/>click, click_full, double_click, text<br/>scroll_up, scroll_down, scroll_element_up, scroll_element_down<br/>scroll_to, wait, key_press, hover<br/>evaluate, init_js, content, create_tab<br/>switch_tab, close_tab, tabs_info, set_content<br/>select_option, drag, screenshot]
    AS --> AL

    AL --> W2[Website2]

    C --> D[Worklet1]
    C --> E[Worklet2]
    C --> F[Worklet3]
    C --> G[Worklet4]

    subgraph External Resource
        D
        E
        F
        G
    end

    D --> I[Filesystem]
    E --> J[Datasource1]
    F --> K[Website1]
    G --> L[External API]

    style M fill:#D9FFD9,stroke:#66CC66,stroke-width:2px,rx:5px,ry:5px;
    style A fill:#D9FFD9,stroke:#66CC66,stroke-width:2px,rx:5px,ry:5px;
    style B fill:#E0F2FF,stroke:#3399FF,stroke-width:2px;
    style C fill:#E0F2FF,stroke:#3399FF,stroke-width:1px,rx:5px,ry:5px;
    style AS fill:#E0F2FF,stroke:#3399FF,stroke-width:1px,rx:5px,ry:5px;
    style APIS fill:#E0F2FF,stroke:#3399FF,stroke-width:2px;
    style APIList fill:#FFF,stroke:#333,stroke-width:1px,rx:5px,ry:5px;
    style D fill:#FFD9D9,stroke:#FF6666,stroke-width:1px,rx:5px,ry:5px;
    style E fill:#FFD9D9,stroke:#FF6666,stroke-width:1px,rx:5px,ry:5px;
    style F fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style G fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style I fill:#FFD9D9,stroke:#FF6666,stroke-width:1px,rx:5px,ry:5px;
    style J fill:#FFD9D9,stroke:#FF6666,stroke-width:1px,rx:5px,ry:5px;
    style K fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style L fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style W2 fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style AL fill:#FFF,stroke:#333,stroke-width:1px,rx:5px,ry:5px;
    style BIs fill:#D5F5E3,stroke:#2ECC71,stroke-width:1px,rx:5px,ry:5px;
```

Core Concepts ✨

Session 🌐: A dedicated, isolated browser environment (a sandboxed Chrome instance) that provides a consistent context for executing workflows and browser actions. Each session manages its own cookies, local storage, and pages (tabs), ensuring that automated tasks run without interference from other operations.
Browser Actions 🤖: The fundamental building blocks for automation within a session. These are low-level, atomic operations executed on a browser page, such as visiting a URL, clicking an element, typing text, or taking a screenshot. They are exposed through the REST API, allowing granular control over browser interactions.
Workflow 🚀: Defines a precise sequence of worklets executed in a specific order. Workflows are designed and created by AI agents to automate complex multi-step tasks within the browser. Each workflow maintains isolated data connections and state, ensuring independent and reliable execution.
Worklet 🧩: A reusable, autonomous, and highly composable code block dedicated to performing a specific task. Worklets act as the modular units of automation, encapsulating logic for interactions with external resources or complex browser operations. They can be implemented in various languages and function as local processes or remote services, allowing for flexible and extensible automation.

Getting Started 🏁

Prerequisites 🛠️

Node.js (v20.x or later)
pnpm

Installation ⬇️

Clone the repository:

```
git clone https://github.com/wyse-work/wyse-browser.git
cd wyse-browser
```

Navigate to the browser engine directory and install dependencies:
```
cd browser
pnpm install
```
Build all worklets:
```
./build_worklets.sh
```
Run the API development server:
```
pnpm run start:dev
```
The API server will be running at http://127.0.0.1:13100.

Quick Start: Usage Example ⚡

Here's a quick example of how to use curl to create a session, navigate to a page, and take a screenshot.

Create a new session:

```
SESSION_ID=$(curl -s -X POST http://127.0.0.1:13100/api/session/create \
-H "Content-Type: application/json" \
-d '{}' | grep -o '"session_id":"[^"]*' | cut -d'"' -f4)

echo "Session created with ID: $SESSION_ID"
```

Perform a "visit" action:

```
curl -X POST http://127.0.0.1:13100/api/browser/action \
-H "Content-Type: application/json" \
-d '{
  "session_id": "'"$SESSION_ID"'",
  "action_name": "visit",
  "data": { "url": "https://www.google.com" }
}'
```

Take a screenshot:

```
curl -X GET http://127.0.0.1:13100/api/session/$SESSION_ID/screenshot
```
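
The same three calls can be sketched in TypeScript. This is a minimal sketch rather than the project's own client: `ApiRequest`, `Send`, and the helper names are hypothetical, and the response is treated as opaque text because its exact shape is not documented here. The session ID is pulled out of the response the same way the curl example does it.

```typescript
// Base URL and endpoints come from this README; all type and function names are illustrative.
const BASE_URL = "http://127.0.0.1:13100";

interface ApiRequest {
  method: "GET" | "POST";
  url: string;
  body?: string; // JSON-encoded body for POST requests
}

// POST /api/session/create with an empty JSON body, as in the curl example.
function createSessionRequest(): ApiRequest {
  return { method: "POST", url: `${BASE_URL}/api/session/create`, body: JSON.stringify({}) };
}

// POST /api/browser/action running the "visit" action.
function visitRequest(sessionId: string, url: string): ApiRequest {
  return {
    method: "POST",
    url: `${BASE_URL}/api/browser/action`,
    body: JSON.stringify({ session_id: sessionId, action_name: "visit", data: { url } }),
  };
}

// GET /api/session/:sessionId/screenshot
function screenshotRequest(sessionId: string): ApiRequest {
  return { method: "GET", url: `${BASE_URL}/api/session/${encodeURIComponent(sessionId)}/screenshot` };
}

// Finds the first "session_id" string field in the raw response text.
function extractSessionId(responseText: string): string {
  const id = /"session_id"\s*:\s*"([^"]*)"/.exec(responseText)?.[1];
  if (!id) throw new Error("session_id not found in response");
  return id;
}

// Hypothetical transport: sends one request and resolves with the response body as text.
type Send = (req: ApiRequest) => Promise<string>;

// Create a session, visit a page, then request a screenshot.
async function quickStart(send: Send): Promise<string> {
  const sessionId = extractSessionId(await send(createSessionRequest()));
  await send(visitRequest(sessionId, "https://www.google.com"));
  return send(screenshotRequest(sessionId));
}
```

Keeping the transport as a parameter means the request builders can be checked without a running server. How the screenshot is encoded in the response is not specified in this document, so a real `send` may need to handle binary data.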

API Reference 📚

The Wyse Browser exposes a rich set of API endpoints for programmatic control over browser automation tasks.

Base URL 💖

http://127.0.0.1:13100

Health Check

Method	Endpoint	Description	Parameters
`GET`	`/api/health`	Checks if the API server is running.	None

Metadata Management 🗃️

Method	Endpoint	Description	Parameters
`GET`	`/api/metadata/flow/:name`	Retrieves the manifest for a specific flow.	Path: `name` (string, required)
`GET`	`/api/metadata/worklet/:name`	Retrieves the manifest for a specific worklet.	Path: `name` (string, required)
`GET`	`/api/metadata/list/:type`	Lists all available metadata for a given type (`flow` or `worklet`).	Path: `type` (string, required) - `flow` or `worklet`
`POST`	`/api/metadata/save`	Saves or updates a flow manifest.	Body (`UpdateMetadataDto`): `metadata_type` (string, required); `name` (string, required); `data` (object, required)
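
As a rough illustration of the metadata endpoints, the sketch below builds the URLs and the save body. The helper names are hypothetical, and it assumes `metadata_type` accepts the same `flow` and `worklet` values as the `:type` path parameter, which the table does not state explicitly.

```typescript
const BASE_URL = "http://127.0.0.1:13100";

type MetadataType = "flow" | "worklet";

// Fields as listed for UpdateMetadataDto in the table above.
interface UpdateMetadataDto {
  metadata_type: MetadataType; // assumption: same values as the :type path parameter
  name: string;
  data: Record<string, unknown>;
}

// GET /api/metadata/flow/:name or /api/metadata/worklet/:name
function manifestUrl(type: MetadataType, name: string): string {
  return `${BASE_URL}/api/metadata/${type}/${encodeURIComponent(name)}`;
}

// GET /api/metadata/list/:type
function metadataListUrl(type: MetadataType): string {
  return `${BASE_URL}/api/metadata/list/${type}`;
}

// JSON body for POST /api/metadata/save
function saveMetadataBody(dto: UpdateMetadataDto): string {
  return JSON.stringify(dto);
}
```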

Session Management 📈

Method	Endpoint	Description	Parameters
`POST`	`/api/session/create`	Creates a new browser session.	Body (`CreateSessionDto`): `session_context` (object, optional); `session_id` (string, optional)
`POST`	`/api/session/:sessionId/add_init_script`	Adds an initialization script to the session.	Path: `sessionId` (string, required). Body (`AddInitScriptDto`): `script` (string, required)
`GET`	`/api/session/:sessionId`	Retrieves details for a specific session.	Path: `sessionId` (string, required)
`GET`	`/api/session/:sessionId/context`	Gets the context (cookies, local storage) of a session.	Path: `sessionId` (string, required)
`GET`	`/api/session/:sessionId/release`	Closes and cleans up a session.	Path: `sessionId` (string, required)
`GET`	`/api/sessions/list`	Lists all active sessions.	None
`GET`	`/api/session/:sessionId/screenshot`	Takes a screenshot of the current page in a session.	Path: `sessionId` (string, required)
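
Because a session holds a sandboxed Chrome instance until it is released, one reasonable client-side pattern is to release it in a `finally` block. The sketch below assumes a caller-supplied transport (`Send`) and uses hypothetical helper names; only the paths and the `script` field come from the table above.

```typescript
const BASE_URL = "http://127.0.0.1:13100";

// Paths from the Session Management table, keyed by illustrative operation names.
const sessionPaths = {
  details: (id: string) => `/api/session/${encodeURIComponent(id)}`,
  context: (id: string) => `/api/session/${encodeURIComponent(id)}/context`,
  release: (id: string) => `/api/session/${encodeURIComponent(id)}/release`,
  addInitScript: (id: string) => `/api/session/${encodeURIComponent(id)}/add_init_script`,
};

// AddInitScriptDto has a single required `script` field.
function addInitScriptBody(script: string): string {
  return JSON.stringify({ script });
}

// Hypothetical transport supplied by the caller.
type Send = (method: "GET" | "POST", url: string, body?: string) => Promise<string>;

// Runs `work` against an existing session and always releases the session afterwards,
// even if `work` throws.
async function withSession<T>(
  send: Send,
  sessionId: string,
  work: (id: string) => Promise<T>,
): Promise<T> {
  try {
    return await work(sessionId);
  } finally {
    await send("GET", BASE_URL + sessionPaths.release(sessionId));
  }
}
```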

Browser Actions 🎬

Method	Endpoint	Description	Parameters
`POST`	`/api/browser/action`	Executes a single browser action (e.g., `click`, `text`).	Body (`BrowserActionDto`): `session_id` (string, required); `page_id` (number, optional, default: 0); `action_name` (string, required); `data` (object, required)
`POST`	`/api/browser/batch_actions`	Executes a batch of browser actions sequentially.	Body (`BatchActionsDto`): `session_id` (string, required); `page_id` (number, optional, default: 0); `actions` (array, required), each item with `action_name` (string, required) and `data` (object, required)
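
A batch body might be assembled as in the following sketch. The interface and function names are illustrative, the empty-array check is a client-side guard rather than documented server behaviour, and passing `{}` as `data` for actions without parameters is an assumption based on `data` being marked required.

```typescript
// One entry of the `actions` array.
interface ActionStep {
  action_name: string;
  data: Record<string, unknown>;
}

// Fields as listed for BatchActionsDto in the table above.
interface BatchActionsDto {
  session_id: string;
  page_id?: number;
  actions: ActionStep[];
}

function buildBatch(sessionId: string, actions: ActionStep[], pageId?: number): BatchActionsDto {
  if (actions.length === 0) throw new Error("actions must not be empty"); // client-side guard
  const dto: BatchActionsDto = { session_id: sessionId, actions };
  if (pageId !== undefined) dto.page_id = pageId; // omit to fall back to the documented default of 0
  return dto;
}

// Example: open a page, pause for one second, then capture it.
const exampleBatch = buildBatch("my-session", [
  { action_name: "visit", data: { url: "https://example.com" } },
  { action_name: "wait", data: { time: 1 } },
  { action_name: "screenshot", data: {} },
]);
```

The result of `JSON.stringify(exampleBatch)` would be sent as the body of `POST /api/browser/batch_actions`.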

Page Management 📄

Method	Endpoint	Description	Parameters
`POST`	`/api/session/:sessionId/page/create`	Creates a new page (tab) in a session.	Path: `sessionId` (string, required)
`GET`	`/api/session/:sessionId/page/:pageId/switch`	Switches the active page in a session.	Path: `sessionId` (string, required); `pageId` (number, required)
`GET`	`/api/session/:sessionId/page/:pageId/release`	Closes a specific page in a session.	Path: `sessionId` (string, required); `pageId` (number, required)

Flow Management 🌊

Method	Endpoint	Description	Parameters
`POST`	`/api/flow/create`	Creates a new flow instance from a predefined manifest.	Body (`CreateFlowDto`): `flow_name` (string, required); `session_id` (string, optional); `is_save_video` (boolean, optional); `extension_names` (string[], optional)
`POST`	`/api/flow/deploy`	Deploys a new flow using an inline JSON definition.	Body (`DeployFlowDto`): `flow` (object, required); `session_id` (string, optional); `is_save_video` (boolean, optional); `extension_names` (string[], optional)
`POST`	`/api/flow/fire`	Executes an action within a running flow instance.	Body (`FireFlowDto`): `flow_instance_id` (string, required); `action_name` (string, optional, default: `action_flow_start`); `data` (object, required)
`GET`	`/api/flow/list`	Lists all active flow instances.	None
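
The create and fire bodies could be built as sketched below. The field names follow the table above; the helper names and the flow name `demo` used in the usage note are hypothetical. How `flow_instance_id` is returned by the create or deploy call is not documented here, so the sketch takes it as an input.

```typescript
// Fields as listed for CreateFlowDto in the table above.
interface CreateFlowDto {
  flow_name: string;
  session_id?: string;
  is_save_video?: boolean;
  extension_names?: string[];
}

// Fields as listed for FireFlowDto in the table above.
interface FireFlowDto {
  flow_instance_id: string;
  action_name?: string;
  data: Record<string, unknown>;
}

// Documented default for `action_name` when firing a flow.
const DEFAULT_FLOW_ACTION = "action_flow_start";

// Body for POST /api/flow/create.
function createFlowBody(flowName: string, options: Omit<CreateFlowDto, "flow_name"> = {}): CreateFlowDto {
  return { flow_name: flowName, ...options };
}

// Body for POST /api/flow/fire; the default action name is spelled out for readability.
function fireFlowBody(
  flowInstanceId: string,
  data: Record<string, unknown>,
  actionName: string = DEFAULT_FLOW_ACTION,
): FireFlowDto {
  return { flow_instance_id: flowInstanceId, action_name: actionName, data };
}
```

For instance, `createFlowBody("demo", { is_save_video: true })` yields `{ flow_name: "demo", is_save_video: true }`.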

File Management 📁

Method	Endpoint	Description	Parameters
`POST`	`/api/sessions/:sessionId/files`	Uploads one or more files to a session, checking against session storage limits.	Path: `sessionId` (string, required). Body: `filePath` (string, optional), the target path for the file; defaults to the original name.
`GET`	`/api/sessions/:sessionId/files`	Lists all files stored within a specific session.	Path: `sessionId` (string, required)
`GET`	`/api/sessions/:sessionId/files/*`	Downloads a specific file from a session.	Path: `sessionId` (string, required); `filePath` (string, required)
`HEAD`	`/api/sessions/:sessionId/files/*`	Retrieves metadata (headers) for a specific file in a session without downloading the content.	Path: `sessionId` (string, required); `filePath` (string, required)
`GET`	`/api/sessions/:sessionId/files.zip`	Downloads all files from a session as a ZIP archive.	Path: `sessionId` (string, required)
`DELETE`	`/api/sessions/:sessionId/files/*`	Deletes a specific file from a session.	Path: `sessionId` (string, required); `filePath` (string, required)
`DELETE`	`/api/sessions/:sessionId/files`	Deletes all files associated with a specific session.	Path: `sessionId` (string, required)
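
Since the file endpoints take the file path as a wildcard suffix, a client has to encode each path segment while keeping the `/` separators. The sketch below shows one way to do that; the function names are hypothetical, and rejecting `.` and `..` segments is a client-side precaution, not documented server behaviour.

```typescript
const BASE_URL = "http://127.0.0.1:13100";

// URL for GET, HEAD, or DELETE on /api/sessions/:sessionId/files/*
function sessionFileUrl(sessionId: string, filePath: string): string {
  // Drop empty segments (leading or doubled slashes) and encode the rest individually.
  const segments = filePath.split("/").filter((segment) => segment.length > 0);
  if (segments.length === 0 || segments.some((segment) => segment === "." || segment === "..")) {
    throw new Error(`invalid filePath: ${filePath}`);
  }
  const encodedPath = segments.map((segment) => encodeURIComponent(segment)).join("/");
  return `${BASE_URL}/api/sessions/${encodeURIComponent(sessionId)}/files/${encodedPath}`;
}

// URL for GET /api/sessions/:sessionId/files.zip
function sessionZipUrl(sessionId: string): string {
  return `${BASE_URL}/api/sessions/${encodeURIComponent(sessionId)}/files.zip`;
}
```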

Action Space 🚀

The BrowserAction module provides a comprehensive set of low-level actions that can be executed on a page within a session. These actions are the fundamental building blocks for creating complex automation flows.

Action	Description	Parameters
`url`	Gets the URL of the current page.	None
`visit`	Navigates the page to a specified URL.	`url`: The URL to visit.
`history`	Navigates forward or backward in the browser history.	`num`: A positive number to go forward, a negative number to go back.
`search`	Performs a Google search.	`search_key`: The text to search for.
`refreshpage`	Reloads the current page.	None
`click`	Clicks an element or a point on the page.	`element_id` or (`x`, `y` coordinates).
`clickfull`	A more comprehensive click action.	`element_id` or (`x`, `y` coordinates). Optional: `hold` (seconds), `button` ("left", "right", "middle").
`doubleclick`	Double-clicks an element or a point on the page.	`element_id` or (`x`, `y` coordinates).
`text`	Enters text into an element or at the current cursor position.	`text`: The text to type. Optional: `element_id`, `press_enter` (boolean), `delete_existing_text` (boolean), or (`x`, `y` coordinates).
`scrollup`	Scrolls the page up.	None
`scrolldown`	Scrolls the page down.	None
`scrollelementup`	Scrolls an element's container up.	`element_id`, `page_number`: Number of pages to scroll.
`scrollelementdown`	Scrolls an element's container down.	`element_id`, `page_number`: Number of pages to scroll.
`scrollto`	Scrolls to make an element visible.	`element_id`: The ID of the element to scroll to.
`wait`	Pauses execution for a specified duration.	`time`: The number of seconds to wait.
`keypress`	Simulates key presses.	`keys`: A string or array of strings of keys to press (e.g., 'Enter', 'Control+A').
`hover`	Hovers over an element or a point on the page.	`element_id` or (`x`, `y` coordinates).
`evaluate`	Executes a JavaScript snippet in the page context.	`script`: The JavaScript code to execute.
`initjs`	Injects initialization JavaScript into the page.	None
`waitforloadstate`	Waits for the page to reach a specific load state.	None
`content`	Gets the full HTML content of the page.	None
`createtab`	Creates a new browser tab.	Optional: `url`: The URL to open in the new tab.
`switchtab`	Switches to a different tab.	`tab_index`: The index of the tab to switch to.
`closetab`	Closes a browser tab.	`tab_index`: The index of the tab to close.
`tabsinfo`	Retrieves information about all open tabs.	None
`cleanupanimations`	Removes animations from the page to stabilize tests.	None
`previewaction`	Highlights an element to preview an action without executing it.	`element_id`: The ID of the element to preview.
`setcontent`	Sets the HTML content of the page.	`content`: The HTML content to set.
`ensurepageready`	Ensures the page is fully loaded and ready for interaction.	None
`selectoption`	Selects an option from a dropdown or custom select component.	`element_id` or (`x`, `y` coordinates).
`drag`	Performs a drag-and-drop operation.	`drag_path`: A JSON string or array of points `{x, y}` representing the drag path.
`screenshot`	Takes a screenshot of the current page.	None
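
Several actions accept either an `element_id` or a pair of `x`, `y` coordinates. A union type can make that choice explicit on the client side, as in this sketch. The helper names are illustrative, and typing `element_id` as a string is an assumption because the table does not give its type.

```typescript
// Either an element reference or a point on the page.
type Target = { element_id: string } | { x: number; y: number };

// Shape of one action as sent to /api/browser/action (without session_id and page_id).
interface BrowserAction {
  action_name: string;
  data: Record<string, unknown>;
}

function click(target: Target): BrowserAction {
  return { action_name: "click", data: { ...target } };
}

function hover(target: Target): BrowserAction {
  return { action_name: "hover", data: { ...target } };
}

// `text` with its optional flags from the table above.
function typeText(
  text: string,
  options: { element_id?: string; press_enter?: boolean; delete_existing_text?: boolean } = {},
): BrowserAction {
  return { action_name: "text", data: { text, ...options } };
}

// `evaluate` runs a JavaScript snippet in the page context.
function evaluate(script: string): BrowserAction {
  return { action_name: "evaluate", data: { script } };
}
```

With this typing, a call such as `click({ element_id: "12", x: 5 })` is rejected by the compiler's excess-property check, because the object literal mixes an element reference with an incomplete coordinate pair.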

Security and Safety 🔒

Security and user safety are paramount in Wyse Browser:

User Consent and Control: Users must explicitly consent to and fully understand all data access and operations.
Data Privacy: Applications must obtain explicit user consent before exposing any user data to external servers.
Worklet Safety: Worklets involve arbitrary code execution and must be handled with extreme caution. Hosts must obtain explicit user consent before invoking any worklet.
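
One way a host application might enforce the worklet rule is to route every invocation through a consent check. The sketch below is purely illustrative: `ConsentPrompt` and `invokeWithConsent` are hypothetical names, not part of Wyse Browser, and the prompt is asked on every call rather than cached.

```typescript
// Asks the user whether the named worklet may run; returns true only on explicit approval.
type ConsentPrompt = (workletName: string) => boolean;

// Runs `run` only after the user has approved this specific invocation.
function invokeWithConsent<T>(workletName: string, askUser: ConsentPrompt, run: () => T): T {
  if (!askUser(workletName)) {
    throw new Error(`User declined to run worklet "${workletName}"`);
  }
  return run();
}
```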

Contributing 🤝

Contributions are welcome! Please feel free to submit a pull request.

Fork the repository.
Create your feature branch (`git checkout -b feature/AmazingFeature`).
Commit your changes (`git commit -m 'Add some AmazingFeature'`).
Push to the branch (`git push origin feature/AmazingFeature`).
Open a Pull Request.

License 📄

This project is licensed under the MIT License. See the LICENSE file for details.
