feat(browser): browser automation skill via playwright-mcp integration #2186
Description
Summary
Zeph's current web capabilities are limited to static HTTP scraping (web_scrape / fetch). This covers about 50% of real-world use cases. Modern SPAs (React/Vue/Next.js), authenticated workflows, dynamic content, and form interactions require a real browser.
This issue proposes integrating browser automation via @playwright/mcp as an optional MCP server, complemented by a new browser skill that teaches the LLM a cost-aware escalation strategy.
Current state
WebScrapeExecutor (crates/zeph-tools/src/scrape.rs):
- Static HTML + CSS selector extraction
- HTTPS-only + SSRF protection
- Fast and token-efficient
Gaps:
- No JS rendering — SPAs return empty/broken HTML
- No interaction — cannot click, type, authenticate
- Blocked by Cloudflare and bot-detection
Proposed solution
Phase 1 — MCP config + skill (zero new Rust code)
Add @playwright/mcp as an optional pre-configured [[mcp.servers]] entry:
[[mcp.servers]]
id = "browser"
transport = "stdio"
command = "npx"
args = ["@playwright/mcp@latest", "--headless"]
Or via Docker (headless, no Node.js on host):
docker run -d -p 8931:8931 mcr.microsoft.com/playwright/mcp cli.js \
--headless --browser chromium --no-sandbox --port 8931
Create .zeph/skills/browser/SKILL.md with a decision tree:
| Scenario | Tool |
|---|---|
| Static HTML | web_scrape (fast, no overhead) |
| SPA / JS-rendered page | browser_navigate + browser_snapshot |
| Form fill / login flow | browser_click + browser_type |
| Visual capture | browser_take_screenshot |
| JS data extraction | browser_evaluate |
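The decision tree above could also be expressed in code, for example to unit-test the skill's routing. This is only a sketch: the `Need` enum and both function names are hypothetical and not part of Zeph; the tool names are the ones from the table.

```rust
/// What the agent needs from a page (hypothetical enum, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Need {
    StaticHtml,
    JsRendered,
    FormOrLogin,
    VisualCapture,
    JsDataExtraction,
}

/// Maps each scenario from the table to the tool names listed for it.
fn tools_for(need: Need) -> &'static [&'static str] {
    match need {
        Need::StaticHtml => &["web_scrape"],
        Need::JsRendered => &["browser_navigate", "browser_snapshot"],
        Need::FormOrLogin => &["browser_click", "browser_type"],
        Need::VisualCapture => &["browser_take_screenshot"],
        Need::JsDataExtraction => &["browser_evaluate"],
    }
}

/// Only the static-HTML path avoids starting the browser MCP server.
fn needs_browser(need: Need) -> bool {
    need != Need::StaticHtml
}
```

Calling `tools_for(Need::JsRendered)` yields the navigate-then-snapshot pair, and `needs_browser` gives a cheap check for whether escalation is required at all.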
Key playwright-mcp tools to expose (core group only, 19 tools):
- browser_navigate, browser_snapshot (accessibility tree, token-efficient)
- browser_click, browser_type, browser_hover, browser_select_option
- browser_evaluate (arbitrary JS execution)
- browser_take_screenshot, browser_console_messages, browser_wait_for
- Tab management: browser_new_tab, browser_close_tab, browser_tab_list
Phase 2 — BrowserConfig + init wizard
- Add [browser] config section to zeph-tools/src/config.rs
- Wire into --init wizard: detect Node.js/Docker, offer auto-config
- Wire into --migrate-config for adding [browser] defaults
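A rough sketch of how the wizard's runtime detection might look, assuming it is enough to check that `npx --version` or `docker --version` exits successfully. The `BrowserRuntime` enum, both function names, and the preference for npx over Docker are assumptions made for illustration, not existing Zeph code.

```rust
use std::process::{Command, Stdio};

/// How the wizard might launch the browser MCP server (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum BrowserRuntime {
    Npx,
    Docker,
    Unavailable,
}

/// Returns true if `program --version` can be spawned and exits successfully.
fn is_on_path(program: &str) -> bool {
    Command::new(program)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Picks a runtime, preferring npx over Docker (an assumed ordering).
/// `probe` is injected so the choice can be tested without either tool installed.
fn detect_runtime(probe: impl Fn(&str) -> bool) -> BrowserRuntime {
    if probe("npx") {
        BrowserRuntime::Npx
    } else if probe("docker") {
        BrowserRuntime::Docker
    } else {
        BrowserRuntime::Unavailable
    }
}
```

The wizard would call `detect_runtime(is_on_path)` and offer the matching auto-config, or skip the browser section when the result is `Unavailable`.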
Proposed config schema:
[browser]
enabled = false
transport = "stdio" # "stdio" | "http"
command = "npx"
args = ["@playwright/mcp@latest", "--headless"]
url = "" # for http transport
caps = [] # optional: ["vision", "pdf"]
max_tabs = 5
Phase 3 — Native BrowserExecutor (optional)
If MCP latency or the Node.js dependency proves unacceptable, implement crates/zeph-tools/src/browser.rs as a native ToolExecutor using a Rust WebDriver/CDP crate. Pursue this only if Phase 1 proves insufficient.
Why playwright-mcp
- Maintained by Microsoft (Playwright team); GitHub Copilot and Claude Code use it natively
- Both stdio and HTTP/SSE transports — directly compatible with Zeph's rmcp client
- Accessibility snapshot mode (default): structured refs, ~4x fewer tokens than screenshot approach
- Official Docker image for headless deployment
- Apache 2.0 license
Alternatives evaluated:
- @modelcontextprotocol/server-puppeteer: deprecated (archived May 2025)
- browserbase: paid cloud, vendor lock-in
- browsermcp.io: desktop Chrome extension only, not headless
- rust-browser-mcp: community project, immature, WebDriver limitations
Acceptance criteria
- [[mcp.servers]] example for playwright-mcp in config.toml.example / docs
- .zeph/skills/browser/SKILL.md with escalation decision tree and tool usage guide
- [browser] config section in BrowserConfig struct
- --init wizard detects Node.js/Docker and offers browser auto-config
- --migrate-config adds [browser] defaults
- Docs: docs/src/configuration.md browser section
- Live session test: navigate to a JS-rendered page, extract content via browser_snapshot
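For the BrowserConfig criterion, the struct might look roughly like the sketch below. Field names and defaults mirror the proposed [browser] schema from Phase 2; the `Transport` enum, the `validate` method, and its error messages are hypothetical. A real implementation would presumably also derive the deserialization traits already used in zeph-tools/src/config.rs, which are left out here to keep the example dependency-free.

```rust
/// Transport used to reach the browser MCP server ("stdio" | "http" in the schema).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Transport {
    Stdio,
    Http,
}

/// Mirrors the proposed [browser] section (sketch only).
#[derive(Debug, Clone, PartialEq, Eq)]
struct BrowserConfig {
    enabled: bool,
    transport: Transport,
    command: String,
    args: Vec<String>,
    url: String,       // only meaningful for the http transport
    caps: Vec<String>, // optional capabilities, e.g. "vision", "pdf"
    max_tabs: usize,
}

impl Default for BrowserConfig {
    /// Defaults copied from the proposed schema.
    fn default() -> Self {
        Self {
            enabled: false,
            transport: Transport::Stdio,
            command: "npx".to_string(),
            args: vec!["@playwright/mcp@latest".to_string(), "--headless".to_string()],
            url: String::new(),
            caps: Vec::new(),
            max_tabs: 5,
        }
    }
}

impl BrowserConfig {
    /// Hypothetical sanity check; a disabled section is always accepted.
    fn validate(&self) -> Result<(), String> {
        if !self.enabled {
            return Ok(());
        }
        if self.max_tabs == 0 {
            return Err("max_tabs must be at least 1".to_string());
        }
        match self.transport {
            Transport::Stdio if self.command.is_empty() => {
                Err("stdio transport requires a command".to_string())
            }
            Transport::Http if self.url.is_empty() => {
                Err("http transport requires a url".to_string())
            }
            _ => Ok(()),
        }
    }
}
```

With this shape, --migrate-config could insert `BrowserConfig::default()` when the section is missing, and the wizard could run `validate` before writing the file.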
Open questions
- Should browser state (cookies, localStorage) persist across agent turns?
- Screenshots: inline base64 in ToolOutput or written to .local/ + referenced by path?
- Token budget: should browser_snapshot output be summarized before feeding to LLM on large pages?
- SSRF: the browser can access the internal network. Should the same SSRF rules from WebScrapeExecutor apply via skill constraints?
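To make the SSRF question concrete, here is a minimal sketch of a pre-navigation host check built only on the standard library. It does not reproduce WebScrapeExecutor's actual rules, which are not shown in this issue; the function names and the exact set of blocked ranges are assumptions.

```rust
use std::net::IpAddr;

/// Returns true for loopback, private, link-local, and unspecified addresses.
fn is_internal_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            // Treat IPv4-mapped addresses (::ffff:a.b.c.d) like their IPv4 form.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_internal_ip(IpAddr::V4(v4));
            }
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

/// Checks a literal host before it is passed to browser_navigate.
/// Hostnames other than "localhost" pass through here; a real check would
/// resolve them and test every resulting address.
fn host_allowed(host: &str) -> bool {
    if host.eq_ignore_ascii_case("localhost") {
        return false;
    }
    match host.parse::<IpAddr>() {
        Ok(ip) => !is_internal_ip(ip),
        Err(_) => true,
    }
}
```

A check like this only covers the initial URL. Redirects and requests issued by page scripts inside the browser would bypass it, which is part of why enforcement via skill constraints alone may not be sufficient.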
Research notes
Full research report: .local/reports/browser-skill-research.md