Skip to content

WyseOS/wyse-browser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

68 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Wyse Browser 🚀

English 简体中文

Build Status License: MIT PRs Welcome

Wyse Browser is a powerful, multi-process runtime engine designed for executing automated flows within a browser environment. It provides a robust platform for creating, managing, and executing complex automation workflows through a comprehensive REST API.

Key Features 🌟

  • Powerful & Scalable Automation Core ✨: Built on NestJS and Playwright, Wyse Browser provides a reliable and efficient multi-process runtime engine. It orchestrates multiple sandboxed Chrome instances, enabling robust and scalable browser automation.
  • AI-Driven Workflow Orchestration 🧠: Designed to integrate seamlessly with LLMs and AI Agents, facilitating the creation, management, and execution of sophisticated, AI-driven automation workflows.
  • Modular & Extensible Worklets 🧩: Leverage Worklets as autonomous, reusable, and highly composable code blocks for specific tasks, allowing for flexible and extensible automation solutions.
  • Comprehensive REST API Control 🔗: Offers a full-featured REST API for programmatic control over every aspect of the browser environment, including sessions, pages, flows, and individual browser actions.
  • Parallel & Isolated Session Execution ⚡: Manages multiple independent browser sessions in parallel, each running in a sandboxed Chrome instance with isolated contexts (cookies, local storage), ensuring tasks run without interference.
  • Rich & Granular Action Space 🤖: Provides a wide array of built-in, low-level browser actions—from navigation and clicking to executing custom JavaScript—offering precise control over browser interactions.
  • Robust Security & Data Privacy 🔒: Prioritizes user safety with explicit consent mechanisms for data access, strong data privacy measures, and secure handling of worklets which involve arbitrary code execution.

Architecture 🏗️

The Wyse Browser protocol is built for distributed systems, enabling each engine to manage multiple workflow and worklet instances efficiently.

graph TD
    subgraph Intelligence Layer
        M[LLM]
        A[AI Agents]
    end
    M <--> A
    
    APIS -- Exposes --> APIList[/api/session/<br>/api/flow/<br>/api/browser/]

    subgraph Wyse Browser
        B(RunTime)
        C{Workflow}
        AS[Browser Actions]
        APIS[API Service]
    end

    subgraph "(Playwright)"
        BIs[Chrome Instances]
    end
  
    A <--> B
    B <--> APIS
    B --> C
    B --> AS
    
    B -- Manages --> BIs

    AL[visit, history, search, refresh_page<br/>click, click_full, double_click, text<br/>scroll_up, scroll_down, scroll_element_up, scroll_element_down<br/>scroll_to, wait, key_press, hover<br/>evaluate, init_js, content, create_tab<br/>switch_tab, close_tab, tabs_info, set_content<br/>select_option, drag, screenshot]
    AS --> AL

    AL --> W2[Website2]

    C --> D[Worklet1]
    C --> E[Worklet2]
    C --> F[Worklet3]
    C --> G[Worklet4]
    
    subgraph External Resource
        D
        E
        F
        G
    end

    D --> I[Filesystem]
    E --> J[Datasource1]
    F --> K[Website1]
    G --> L[External API]

    style M fill:#D9FFD9,stroke:#66CC66,stroke-width:2px,rx:5px,ry:5px;
    style A fill:#D9FFD9,stroke:#66CC66,stroke-width:2px,rx:5px,ry:5px;
    style B fill:#E0F2FF,stroke:#3399FF,stroke-width:2px;
    style C fill:#E0F2FF,stroke:#3399FF,stroke-width:1px,rx:5px,ry:5px;
    style AS fill:#E0F2FF,stroke:#3399FF,stroke-width:1px,rx:5px,ry:5px;
    style APIS fill:#E0F2FF,stroke:#3399FF,stroke-width:2px;
    style APIList fill:#FFF,stroke:#333,stroke-width:1px,rx:5px,ry:5px;
    style D fill:#FFD9D9,stroke:#FF6666,stroke-width:1px,rx:5px,ry:5px;
    style E fill:#FFD9D9,stroke:#FF6666,stroke-width:1px,rx:5px,ry:5px;
    style F fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style G fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style I fill:#FFD9D9,stroke:#FF6666,stroke-width:1px,rx:5px,ry:5px;
    style J fill:#FFD9D9,stroke:#FF6666,stroke-width:1px,rx:5px,ry:5px;
    style K fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style L fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style W2 fill:#E6E6FA,stroke:#9370DB,stroke-width:1px,rx:5px,ry:5px;
    style AL fill:#FFF,stroke:#333,stroke-width:1px,rx:5px,ry:5px;
    style BIs fill:#D5F5E3,stroke:#2ECC71,stroke-width:1px,rx:5px,ry:5px;
Loading

Core Concepts ✨

  • Session 🌐: A dedicated, isolated browser environment (a sandboxed Chrome instance) that provides a consistent context for executing workflows and browser actions. Each session manages its own cookies, local storage, and pages (tabs), ensuring that automated tasks run without interference from other operations.
  • Browser Actions 🤖: The fundamental building blocks for automation within a session. These are low-level, atomic operations that can be executed on a browser page, such as visit a URL, click an element, type text, or take a screenshot. These actions are exposed through a comprehensive API, allowing for granular control over browser interactions.
  • Workflow 🚀: Defines a precise sequence of worklets executed in a specific order. Workflows are designed and created by AI agents to automate complex multi-step tasks within the browser. Each workflow maintains isolated data connections and state, ensuring independent and reliable execution.
  • Worklet 🧩: A reusable, autonomous, and highly composable code block dedicated to performing a specific task. Worklets act as the modular units of automation, encapsulating logic for interactions with external resources or complex browser operations. They can be implemented in various languages and function as local processes or remote services, allowing for flexible and extensible automation.

Getting Started 🏁

Prerequisites 🛠️

  • Node.js (v20.x or later)
  • pnpm

Installation ⬇️

  1. Clone the repository:

    git clone https://github.com/wyse-work/wyse-browser.git
    cd wyse-browser
  2. Navigate to the browser engine directory and install dependencies:

    cd browser
    pnpm install
  3. Build all worklets:

    ./build_worklets.sh
  4. Run the API development server:

    pnpm run start:dev

    The API server will be running at http://127.0.0.1:13100.

Quick Start: Usage Example ⚡

Here's a quick example of how to use curl to create a session, navigate to a page, and take a screenshot.

  1. Create a new session:

    SESSION_ID=$(curl -s -X POST http://127.0.0.1:13100/api/session/create \
    -H "Content-Type: application/json" \
    -d '{}' | grep -o '"session_id":"[^"]*' | cut -d'"' -f4)
    
    echo "Session created with ID: $SESSION_ID"
  2. Perform a "visit" action:

    curl -X POST http://127.0.0.1:13100/api/browser/action \
    -H "Content-Type: application/json" \
    -d '{
      "session_id": "'"$SESSION_ID"'",
      "action_name": "visit",
      "data": { "url": "https://www.google.com" }
    }'
  3. Take a screenshot:

    curl -X GET http://127.0.0.1:13100/api/session/$SESSION_ID/screenshot

API Reference 📚

The Wyse Browser exposes a rich set of API endpoints for programmatic control over browser automation tasks.

Base URL 💖

http://127.0.0.1:13100

Health Check

Method Endpoint Description Parameters
GET /api/health Checks if the API server is running. None

Metadata Management 🗃️

Method Endpoint Description Parameters
GET /api/metadata/flow/:name Retrieves the manifest for a specific flow. Path: name (string, required)
GET /api/metadata/worklet/:name Retrieves the manifest for a specific worklet. Path: name (string, required)
GET /api/metadata/list/:type Lists all available metadata for a given type (flow or worklet). Path: type (string, required) - flow or worklet
POST /api/metadata/save Saves or updates a flow manifest. Body: UpdateMetadataDto
- metadata_type (string, required)
- name (string, required)
- data (object, required)

Session Management 📈

Method Endpoint Description Parameters
POST /api/session/create Creates a new browser session. Body: CreateSessionDto
- session_context (object, optional)
- session_id (string, optional)
POST /api/session/:sessionId/add_init_script Adds an initialization script to the session. Path: sessionId (string, required)
Body: AddInitScriptDto
- script (string, required)
GET /api/session/:sessionId Retrieves details for a specific session. Path: sessionId (string, required)
GET /api/session/:sessionId/context Gets the context (cookies, local storage) of a session. Path: sessionId (string, required)
GET /api/session/:sessionId/release Closes and cleans up a session. Path: sessionId (string, required)
GET /api/sessions/list Lists all active sessions. None
GET /api/session/:sessionId/screenshot Takes a screenshot of the current page in a session. Path: sessionId (string, required)

Browser Actions 🎬

Method Endpoint Description Parameters
POST /api/browser/action Executes a single browser action (e.g., click, text). Body: BrowserActionDto
- session_id (string, required)
- page_id (number, optional, default: 0)
- action_name (string, required)
- data (object, required)
POST /api/browser/batch_actions Executes a batch of browser actions sequentially. Body: BatchActionsDto
- session_id (string, required)
- page_id (number, optional, default: 0)
- actions (array, required):
- action_name (string, required)
- data (object, required)

Page Management 📄

Method Endpoint Description Parameters
POST /api/session/:sessionId/page/create Creates a new page (tab) in a session. Path: sessionId (string, required)
GET /api/session/:sessionId/page/:pageId/switch Switches the active page in a session. Path:
- sessionId (string, required)
- pageId (number, required)
GET /api/session/:sessionId/page/:pageId/release Closes a specific page in a session. Path:
- sessionId (string, required)
- pageId (number, required)

Flow Management 🌊

Method Endpoint Description Parameters
POST /api/flow/create Creates a new flow instance from a predefined manifest. Body: CreateFlowDto
- flow_name (string, required)
- session_id (string, optional)
- is_save_video (boolean, optional)
- extension_names (string[], optional)
POST /api/flow/deploy Deploys a new flow using an inline JSON definition. Body: DeployFlowDto
- flow (object, required)
- session_id (string, optional)
- is_save_video (boolean, optional)
- extension_names (string[], optional)
POST /api/flow/fire Executes an action within a running flow instance. Body: FireFlowDto
- flow_instance_id (string, required)
- action_name (string, optional, default: action_flow_start)
- data (object, required)
GET /api/flow/list Lists all active flow instances. None

File Management 📁

Method Endpoint Description Parameters
POST /api/sessions/:sessionId/files Uploads one or more files to a session, checking against session storage limits. Path: sessionId (string, required)
Body: filePath (string, optional) - target path for the file, defaults to original name.
GET /api/sessions/:sessionId/files Lists all files stored within a specific session. Path: sessionId (string, required)
GET /api/sessions/:sessionId/files/* Downloads a specific file from a session. Path:
- sessionId (string, required)
- filePath (string, required)
HEAD /api/sessions/:sessionId/files/* Retrieves metadata (headers) for a specific file in a session without downloading the content. Path:
- sessionId (string, required)
- filePath (string, required)
GET /api/sessions/:sessionId/files.zip Downloads all files from a session as a ZIP archive. Path: sessionId (string, required)
DELETE /api/sessions/:sessionId/files/* Deletes a specific file from a session. Path:
- sessionId (string, required)
- filePath (string, required)
DELETE /api/sessions/:sessionId/files Deletes all files associated with a specific session. Path: sessionId (string, required)

Action Space 🚀

The BrowserAction module provides a comprehensive set of low-level actions that can be executed on a page within a session. These actions are the fundamental building blocks for creating complex automation flows.

Action Description Parameters
url Gets the URL of the current page. None
visit Navigates the page to a specified URL. url: The URL to visit.
history Navigates forward or backward in the browser history. num: A positive number to go forward, a negative number to go back.
search Performs a Google search. search_key: The text to search for.
refreshpage Reloads the current page. None
click Clicks an element or a point on the page. element_id or (x, y coordinates).
clickfull A more comprehensive click action. element_id or (x, y coordinates). Optional: hold (seconds), button ("left", "right", "middle").
doubleclick Double-clicks an element or a point on the page. element_id or (x, y coordinates).
text Enters text into an element or at the current cursor position. text: The text to type. Optional: element_id, press_enter (boolean), delete_existing_text (boolean), or (x, y coordinates).
scrollup Scrolls the page up. None
scrolldown Scrolls the page down. None
scrollelementup Scrolls an element's container up. element_id, page_number: Number of pages to scroll.
scrollelementdown Scrolls an element's container down. element_id, page_number: Number of pages to scroll.
scrollto Scrolls to make an element visible. element_id: The ID of the element to scroll to.
wait Pauses execution for a specified duration. time: The number of seconds to wait.
keypress Simulates key presses. keys: A string or array of strings of keys to press (e.g., 'Enter', 'Control+A').
hover Hovers over an element or a point on the page. element_id or (x, y coordinates).
evaluate Executes a JavaScript snippet in the page context. script: The JavaScript code to execute.
initjs Injects initialization JavaScript into the page. None
waitforloadstate Waits for the page to reach a specific load state. None
content Gets the full HTML content of the page. None
createtab Creates a new browser tab. Optional: url: The URL to open in the new tab.
switchtab Switches to a different tab. tab_index: The index of the tab to switch to.
closetab Closes a browser tab. tab_index: The index of the tab to close.
tabsinfo Retrieves information about all open tabs. None
cleanupanimations Removes animations from the page to stabilize tests. None
previewaction Highlights an element to preview an action without executing it. element_id: The ID of the element to preview.
setcontent Sets the HTML content of the page. content: The HTML content to set.
ensurepageready Ensures the page is fully loaded and ready for interaction. None
selectoption Selects an option from a dropdown or custom select component. element_id or (x, y coordinates).
drag Performs a drag-and-drop operation. drag_path: A JSON string or array of points {x, y} representing the drag path.
screenshot Takes a screenshot of the current page. None

Security and Safety 🔒

Security and user safety are paramount in Wyse Browser:

  1. User Consent and Control: Users must explicitly consent to and fully understand all data access and operations.
  2. Data Privacy: Applications must obtain explicit user consent before exposing any user data to external servers.
  3. Worklet Safety: Worklets involve arbitrary code execution and must be handled with extreme caution. Hosts must obtain explicit user consent before invoking any worklet.

Contributing 🤝

Contributions are welcome! Please feel free to submit a pull request.

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/AmazingFeature).
  3. Commit your changes (git commit -m 'Add some AmazingFeature').
  4. Push to the branch (git push origin feature/AmazingFeature).
  5. Open a Pull Request.

License 📄

This project is licensed under the MIT License. See the LICENSE file for details.