BrowserWing 1.0.0 Release Notes 🎉

v1.0.1-beta.2

New Features

LLM BaseURL Support: LLM client now supports custom BaseURL for integrating with OpenAI-compatible services
Version Endpoint: Added /version endpoint to check current version info
NoSandbox Configuration: Browser supports NoSandbox mode for containerized deployment
Immediate Task Execution: Scheduler supports immediate task execution with result saving

Improvements

Panic Recovery: Added panic recovery for iframe script injection and navigation operations

Installation

npm install -g browserwing@beta

新增功能

AI 控制模式: 新增 AI 驱动的浏览器控制，支持临时会话和适配器接口
多浏览器实例管理: 支持创建、管理多个独立的浏览器实例
定时任务系统: 支持脚本的定时执行和任务管理
XHR/Fetch 捕获: 录制和回放时捕获网络请求
Cookie 管理: 新增 Cookie 的查看、单个删除和批量删除功能
国际化 (i18n): 支持中英文界面切换
MCP 服务管理: 支持外部 MCP 服务的 CRUD 和工具发现
脚本变量系统: 支持脚本参数化和外部变量覆盖
条件执行: 基于变量的条件判断执行
键盘动作: 支持键盘输入录制和回放
滚动动作: 支持页面滚动录制和回放
截图动作: 支持视口/全页/区域截图
AI Explorer: AI 驱动的浏览器探索和脚本生成
自定义 AI 提示词: 可自定义 AI 操作的提示词系统
RefID 系统: 语义化元素选择，提升自动化稳定性
代理支持: 浏览器配置支持 HTTP/SOCKS 代理
认证系统: 支持 JWT 和 API Key 认证

详见 CHANGELOG.md

@e2

BrowserWing 1.0.0 Release Notes 🎉

Release Date: 2026-01-25
Version: 1.0.0
License: MIT

🌟 Overview

BrowserWing 1.0.0 is the first official release, delivering a complete browser automation platform with deep AI integration. A powerful tool for developers, QA engineers, data analysts, and AI application developers, making browser automation simple, intelligent, and efficient.

✨ Core Features

1. 🤖 Built-in AI Agent

Conversational Browser Control

Multi-LLM Support: Compatible with OpenAI, Claude, DeepSeek, Gemini, and more
Natural Language Control: Describe tasks in plain language, AI automates browser operations
Intelligent Task Planning: Automatically evaluates task complexity and selects optimal execution strategy
Real-time Streaming: See execution progress as it happens
Session Management: Support multiple parallel sessions, each with independent model configuration
Performance Optimized:
- Startup time reduced by 89% (4.5s → 0.5s)
- Memory usage reduced by 97% (800MB → 24MB)
- Simple query response time improved by 56% (4.5s → 2s)

Example Tasks:

"Open GitHub, search for 'browser automation', extract names and star counts of top 10 projects"
"Log in to Twitter, post a tweet: 'Hello from BrowserWing!'"
"Monitor product price on Amazon, notify me when price drops below $50"

2. 🔌 Universal AI Tool Integration

Three Integration Methods for All AI Tools

Method 1: MCP Server (Recommended)

Standard Protocol: Full Model Context Protocol (MCP) implementation
Zero Configuration: One-line JSON config to integrate with any MCP-compatible AI tool
Rich Tool Set: 26+ browser control tools including navigation, interaction, extraction, screenshots

{
  "mcpServers": {
    "browserwing": {
      "url": "http://localhost:8080/api/v1/mcp/message"
    }
  }
}

Method 2: Skills File

Plug and Play: Download SKILL.md, import into Cursor, Windsurf, and other Skills-compatible tools
Auto Discovery: AI tools automatically recognize available browser control capabilities
Custom Export: Export recorded scripts as custom Skills files

Method 3: HTTP API

26+ RESTful Endpoints: Complete programmatic browser control
OpenAPI Documentation: Standardized API specs for easy integration
Batch Operations: Multi-step atomic execution support

3. 🎬 Visual Script Recording & Playback

WYSIWYG Automation Workflows

Recording Features

One-Click Recording: Automatically captures clicks, inputs, selections, navigation
Semantic Recording: Stable element location based on ARIA roles and accessibility tree
Smart Waits: Auto-detects page loading, element appearance wait conditions
Cross-Tab Support: Records operations across multiple tabs

Editing Features

Visual Editor: Drag to reorder, delete, modify recorded steps
Parameter Adjustment: Modify URLs, selectors, input text
Logic Addition: Insert waits, conditionals, loops
Variable System: Support script variables and extraction variables

Playback Features

Precise Reproduction: High-fidelity replay of recorded sequences
Step Debugging: Execute step-by-step, observe each operation's effect
Error Recovery: Auto-retry or skip on failures
Batch Execution: Sequential or parallel execution of multiple scripts

Export Features

Export as MCP Commands: Convert to MCP tool call sequences for AI tools
Export as Skills: Generate custom SKILL.md files
Export as Code: Generate Python, JavaScript code (planned)

4. 🎯 Intelligent Data Extraction

LLM-Driven Semantic Extraction

Natural Language Description: Describe data to extract in plain language, no selectors needed
Structured Output: Auto-convert unstructured pages to JSON data
Multiple Extraction Methods:
- CSS/XPath selector extraction
- LLM semantic understanding extraction
- Batch multi-field extraction
- Auto-pagination extraction
Screenshot Annotation: Pair extraction with screenshots for visual reference

Example:

# Traditional: Precise CSS selectors required
curl -X POST 'http://localhost:8080/api/v1/executor/extract' \
  -d '{"selector": "div.product > h2.title", "multiple": true}'

# LLM: Natural language description
curl -X POST 'http://localhost:8080/api/v1/executor/extract-semantic' \
  -d '{"instructions": "Extract all product names, prices, and ratings"}'

5. 🔐 Session Management & Authentication

Stable and Reliable Browser Sessions

Multiple Browser Instances: Manage multiple independent browser instances simultaneously
User Data Persistence: Cookies, LocalStorage, SessionStorage auto-saved
Login State Retention: Login state automatically restored after reopening
Proxy Support: Each instance can configure independent proxy
Custom Configuration: User-Agent, window size, language, etc.
Headless Mode: Support headless/headed mode switching for different scenarios

6. 📸 Screenshots & Debugging

Powerful Debugging and Monitoring

Full Page Screenshots: Capture complete page including scroll areas
Element Screenshots: Precisely capture specific elements
Multiple Formats: PNG, JPEG, etc.
Auto Save: Screenshots auto-saved to specified directory, returns file path
Accessibility Snapshot: Get page accessibility tree, quickly locate elements
RefID System: Assigns stable reference IDs to each interactive element (@e1, @e2...)

7. 🚀 High-Performance Architecture

Optimized for Production

Go Backend: High performance, low latency, concurrency-friendly
React + TypeScript Frontend: Modern, responsive UI
Lazy Loading: Create Agent instances on-demand, save resources
Connection Pooling: Efficient browser connection reuse
Error Recovery: Auto-handle Chrome crashes, connection drops
Graceful Shutdown: Auto-cleanup resources on Ctrl+C, close browsers

📋 Complete Feature List

Browser Control

✅ Navigation: Forward, back, refresh, jump to URL
✅ Tab Management: New, close, switch, list
✅ Window Management: Maximize, minimize, resize
✅ Element Interaction: Click, type, select, hover, drag
✅ Keyboard Operations: Keys, shortcuts, combinations
✅ File Operations: Upload files, monitor downloads
✅ Scroll Control: Scroll to element, scroll page
✅ JavaScript Execution: Run custom JS code
✅ Cookie Management: Get, set, delete cookies
✅ Storage Management: LocalStorage, SessionStorage operations

Data Extraction

✅ Text Extraction: innerText, textContent
✅ HTML Extraction: innerHTML, outerHTML
✅ Attribute Extraction: href, src, data-*, etc.
✅ Multi-Element Extraction: Batch extract list data
✅ LLM Semantic Extraction: Natural language extraction requirements
✅ Table Extraction: Auto-recognize table structure
✅ Pagination Extraction: Auto-paginate and aggregate data

AI Capabilities

✅ Multi-LLM Support: OpenAI, Claude, DeepSeek, etc.
✅ Streaming Response: Real-time view of AI execution
✅ Task Evaluation: Intelligently determine if tool calls needed
✅ Direct Response: Fast response for simple questions (56% faster)
✅ Tool Calling: Auto-select appropriate browser tools
✅ Error Handling: Auto-retry or report failures
✅ Session Management: Parallel sessions, saved history

Scripts & Automation

✅ One-Click Recording: Auto-capture all operations
✅ Visual Editing: Drag to adjust step order
✅ Variable Support: Script variables and extraction variables
✅ Conditional Execution: Conditions based on extracted data
✅ Batch Playback: Sequential execution of multiple scripts
✅ Export as Skills: Generate SKILL.md files
✅ Export as MCP: Generate MCP tool call sequences

Integration & Extension

✅ MCP Server: Standard MCP protocol implementation
✅ Skills File: Cursor, Windsurf compatible
✅ HTTP API: 26+ RESTful endpoints
✅ OpenAPI Spec: Standardized API documentation
✅ Webhooks: Event notifications (planned)
✅ Plugin System: Extend with custom features (planned)

🎯 Use Cases

1. RPA (Robotic Process Automation)

Data Entry: Auto-fill forms, submit orders
Report Generation: Periodically scrape data, generate Excel reports
Email Processing: Auto-login to email, categorize and process
Invoice Processing: Batch download invoices, extract information

2. Data Collection & Monitoring

Price Monitoring: Monitor e-commerce price changes, price alerts
Competitor Analysis: Collect competitor information, comparative analysis
Sentiment Monitoring: Monitor social media, analyze user feedback
Property Monitoring: Real-time property listings, immediate notifications

3. Automated Testing

E2E Testing: End-to-end functional testing
Regression Testing: Batch replay test cases
UI Testing: Screenshot comparison, visual regression testing
Performance Testing: Page load time monitoring

4. AI Agent Development

Intelligent Assistant: Provide browser operation capabilities for AI assistants
Information Retrieval: AI auto-search and extract web information
Task Execution: AI-driven complex task automation
Knowledge Base Building: Auto-collect and organize knowledge content

5. Content Creation & Management

Auto Publishing: Batch publish blog, social media content
Content Collection: Collect quality content, assist creati...

Overview

BrowserWing enables AI agents to control browsers via MCP commands instead of slow, token-heavy step-by-step interactions.

By moving browser execution behind a command boundary, agents can operate faster with significantly fewer LLM calls.

Features

Define browser automation as reusable MCP commands
Execute browser actions with minimal LLM interaction
Local browser control for agent-driven workflows
Simple demo showcasing command-based automation

Motivation

Current browser agents rely on frequent LLM interactions for DOM inspection and reasoning, resulting in high latency and token usage.

BrowserWing reduces this overhead by letting LLMs focus on intent, while execution is handled by predefined commands.

Notes

Early-stage release; APIs may change
Limited command set in this version
Focused on local experimentation

Next Steps

Command composition and chaining
Improved authoring tools
More real-world automation examples

Feedback

Issues, discussions, and early feedback are highly appreciated.

Releases: browserwing/browserwing

v1.0.1-beta.2

v1.0.1-beta.2

New Features

Improvements

Installation

Uh oh!

v1.0.1-beta.1

新增功能

Uh oh!

Release v1.0.0

BrowserWing 1.0.0 Release Notes 🎉

🌟 Overview

✨ Core Features

1. 🤖 Built-in AI Agent

2. 🔌 Universal AI Tool Integration

Method 1: MCP Server (Recommended)

Method 2: Skills File

Method 3: HTTP API

3. 🎬 Visual Script Recording & Playback

Recording Features

Editing Features

Playback Features

Export Features

4. 🎯 Intelligent Data Extraction

5. 🔐 Session Management & Authentication

6. 📸 Screenshots & Debugging

7. 🚀 High-Performance Architecture

📋 Complete Feature List

Browser Control

Data Extraction

AI Capabilities

Scripts & Automation

Integration & Extension

🎯 Use Cases

1. RPA (Robotic Process Automation)

2. Data Collection & Monitoring

3. Automated Testing

4. AI Agent Development

5. Content Creation & Management

Contributors

Uh oh!

v0.0.1 — Initial Public Release

Overview

Features

Motivation

Notes

Next Steps

Feedback

Uh oh!