[Feature] Add Agent.execute Hybrid CUA + DOM mode #1428
🦋 Changeset detected
Latest commit: 0d9f8e4
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
can this also be made an arg to |
9 issues found across 19 files
Prompt for AI agents (all 9 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="packages/core/lib/v3/agent/tools/v3-dragAndDrop.ts">
<violation number="1" location="packages/core/lib/v3/agent/tools/v3-dragAndDrop.ts:12">
P2: Schema allows arrays of any length but code assumes exactly 2 elements. Use `z.tuple([z.number(), z.number()])` to enforce the expected structure and provide better validation errors if the AI model sends malformed input.</violation>
</file>
<file name="packages/core/lib/v3/agent/tools/v3-type.ts">
<violation number="1" location="packages/core/lib/v3/agent/tools/v3-type.ts:24">
P2: The `coordinates` schema allows arrays of any length but the code expects exactly 2 elements. Consider using `z.tuple([z.number(), z.number()])` to enforce the expected structure and get proper TypeScript typing.</violation>
</file>
<file name="packages/core/lib/v3/agent/tools/v3-scroll.ts">
<violation number="1" location="packages/core/lib/v3/agent/tools/v3-scroll.ts:71">
P2: The `coordinates` array schema doesn't enforce length. If fewer than 2 elements are provided, `coordinates[1]` will be `undefined`, causing `processCoordinates` to return NaN values. Consider using `z.tuple([z.number(), z.number()])` to ensure exactly 2 coordinates are provided.</violation>
</file>
<file name="packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts">
<violation number="1" location="packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts:47">
P2: The `clickAndHold` tool is available for hybrid mode (per the tools/index.ts registration) but is not documented in the hybrid mode tools section. The agent won't be aware this tool exists.</violation>
</file>
<file name="packages/core/lib/v3/agent/tools/v3-fillFormVision.ts">
<violation number="1" location="packages/core/lib/v3/agent/tools/v3-fillFormVision.ts:75">
P2: Missing Google provider delay after click. The `v3-type.ts` tool adds a 1000ms delay for Google models after clicking before typing, but this logic is missing here. This could cause typing issues when using Google models with the fillFormVision tool.</violation>
</file>
<file name="packages/core/lib/v3/tests/agent-hybrid-mode.spec.ts">
<violation number="1" location="packages/core/lib/v3/tests/agent-hybrid-mode.spec.ts:234">
P2: Test assertion doesn't match the comment and instruction. The comment says 'Should include screenshot' and the instruction explicitly asks the agent to 'Take a screenshot', but the assertion only verifies `close` was called. Consider adding an assertion to verify screenshot was used, or update the comment if screenshot verification is intentionally omitted.</violation>
</file>
<file name="packages/core/lib/v3/agent/tools/v3-clickAndHold.ts">
<violation number="1" location="packages/core/lib/v3/agent/tools/v3-clickAndHold.ts:19">
P2: The coordinates schema allows arrays of any length but the code assumes exactly 2 elements. Consider using `.length(2)` to validate the array size, or use a tuple schema for stronger typing: `z.tuple([z.number(), z.number()])`.</violation>
</file>
<file name="packages/core/lib/v3/agent/tools/v3-click.ts">
<violation number="1" location="packages/core/lib/v3/agent/tools/v3-click.ts:23">
P2: The coordinates schema `z.array(z.number())` doesn't enforce exactly 2 elements. If an LLM returns an empty or single-element array, `coordinates[1]` will be `undefined`, causing issues in `processCoordinates`. Use `z.tuple([z.number(), z.number()])` to enforce the expected (x, y) pair.</violation>
</file>
<file name="packages/core/lib/v3/agent/tools/index.ts">
<violation number="1" location="packages/core/lib/v3/agent/tools/index.ts:50">
P1: Missing `delete filtered.act;` in hybrid mode. According to the PR description, `act` is a DOM-only tool and should not be available in hybrid mode. Currently only `fillForm` is removed, leaving `act` incorrectly available.</violation>
</file>
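Several of the findings above share one root cause: a coordinates array with fewer than two elements makes `coordinates[1]` `undefined`, which then turns into `NaN` downstream. The following is a minimal dependency-free sketch of that failure and of a length-enforcing guard, roughly what `z.tuple([z.number(), z.number()])` would give at the schema level. The names `isCoordinatePair`, `scaleCoordinates`, and `safeScale` are hypothetical and do not come from the PR; `scaleCoordinates` is only a simplified stand-in for the real `processCoordinates`, whose implementation is not shown here.

```typescript
// Hypothetical illustration; none of these names come from the PR.
type CoordinatePair = [number, number];

// Type guard that accepts only an (x, y) pair of finite numbers.
function isCoordinatePair(value: unknown): value is CoordinatePair {
  return (
    Array.isArray(value) &&
    value.length === 2 &&
    value.every((n) => typeof n === "number" && Number.isFinite(n))
  );
}

// Simplified stand-in for a coordinate-processing step (assumption:
// the real processCoordinates does more than multiply by a factor).
function scaleCoordinates(
  coords: number[],
  factor: number,
): { x: number; y: number } {
  return { x: coords[0] * factor, y: coords[1] * factor };
}

// An unconstrained number[] lets a single-element array through,
// so the y value silently becomes NaN.
const malformed: number[] = [120];
const unchecked = scaleCoordinates(malformed, 2);

// Validating first turns the silent NaN into an explicit error.
function safeScale(value: unknown, factor: number): { x: number; y: number } {
  if (!isCoordinatePair(value)) {
    throw new Error("coordinates must be an [x, y] pair of finite numbers");
  }
  return scaleCoordinates(value, factor);
}
```

Rejecting malformed input at the boundary also gives the model a readable error to recover from, instead of a click at an undefined position.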
    case "windows":
      return "Meta";
    case "ctrl":
    case "control":
I think it's worth adding support for `ControlOrMeta` like Playwright does. LLMs know to use it for cross-platform macOS/Linux/Windows keypresses because it appears often in Playwright scripts.
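A rough sketch of what that could look like, assuming the key mapping is a `switch` over lowercased aliases as in the diff above. The function name `normalizeModifier`, the `Platform` type, and the aliases other than `windows`, `ctrl`, and `control` are assumptions for illustration, not the PR's code. The `ControlOrMeta` branch follows Playwright's documented behavior: `Control` on Windows and Linux, `Meta` on macOS.

```typescript
// Hypothetical helper; the name and the platform parameter are assumptions.
type Platform = "darwin" | "linux" | "win32";

function normalizeModifier(key: string, platform: Platform): string {
  switch (key.toLowerCase()) {
    case "cmd": // assumed alias
    case "command": // assumed alias
    case "meta":
    case "windows":
      return "Meta";
    case "ctrl":
    case "control":
      return "Control";
    case "controlormeta":
      // Mirrors Playwright: Meta on macOS, Control elsewhere.
      return platform === "darwin" ? "Meta" : "Control";
    default:
      // Leave non-modifier keys (e.g. "Enter") untouched.
      return key;
  }
}
```

Passing the platform in explicitly, instead of reading `process.platform` inside the function, keeps the mapping easy to test and lets a remote browser's platform be used when it differs from the host.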
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/[email protected]

### Patch Changes

- [#1461](#1461) [`0f3991e`](0f3991e) Thanks [@tkattkat](https://github.com/tkattkat)! - Move hybrid mode out of experimental
- [#1433](#1433) [`e0e22e0`](e0e22e0) Thanks [@tkattkat](https://github.com/tkattkat)! - Put hybrid mode behind experimental
- [#1456](#1456) [`f261051`](f261051) Thanks [@shrey150](https://github.com/shrey150)! - Invoke page.hover for agent move action
- [#1473](#1473) [`e021674`](e021674) Thanks [@shrey150](https://github.com/shrey150)! - Add safety confirmation support for OpenAI + Google CUA
- [#1399](#1399) [`6a5496f`](6a5496f) Thanks [@tkattkat](https://github.com/tkattkat)! - Ensure cua agent is killed when stagehand.close is called
- [#1436](#1436) [`fea1700`](fea1700) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix auto-load key for act/extract/observe parametrized models on api
- [#1439](#1439) [`5b288d9`](5b288d9) Thanks [@tkattkat](https://github.com/tkattkat)! - Remove base64 from agent actions array ( still present in messages object )
- [#1408](#1408) [`e822f5a`](e822f5a) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow for act() cache hit when variable values change
- [#1472](#1472) [`638efc7`](638efc7) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: agent cache not refreshed on action failure
- [#1424](#1424) [`a890f16`](a890f16) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: "Error: -32000 Failed to convert response to JSON: CBOR: stack limit exceeded"
- [#1418](#1418) [`934f492`](934f492) Thanks [@miguelg719](https://github.com/miguelg719)! - Cleanup handlers and bus listeners on close
- [#1430](#1430) [`bd2db92`](bd2db92) Thanks [@shrey150](https://github.com/shrey150)! - Fix CUA model coordinate translation
- [#1465](#1465) [`51e0170`](51e0170) Thanks [@miguelg719](https://github.com/miguelg719)! - Add media resolution high provider option to gemini 3 hybrid agent
- [#1431](#1431) [`05f5580`](05f5580) Thanks [@tkattkat](https://github.com/tkattkat)! - Update the cache handling for agent
- [#1432](#1432) [`f56a9c2`](f56a9c2) Thanks [@tkattkat](https://github.com/tkattkat)! - Deprecate cua: true in favor of mode: "cua"
- [#1406](#1406) [`b40ae11`](b40ae11) Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for hovering with coordinates ( page.hover )
- [#1407](#1407) [`0d2b398`](0d2b398) Thanks [@tkattkat](https://github.com/tkattkat)! - Clean up page methods
- [#1412](#1412) [`cd01f29`](cd01f29) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: load GOOGLE_API_KEY from .env
- [#1462](#1462) [`a734fca`](a734fca) Thanks [@shrey150](https://github.com/shrey150)! - fix: correctly pass userDataDir to chrome launcher
- [#1466](#1466) [`b342acf`](b342acf) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move playwright to optional dependencies
- [#1440](#1440) [`2987cd1`](2987cd1) Thanks [@tkattkat](https://github.com/tkattkat)! - [Feature] support excluding tools from agent
- [#1455](#1455) [`dfab1d5`](dfab1d5) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update aisdk client to better enforce structured output with deepseek models
- [#1428](#1428) [`4d71162`](4d71162) Thanks [@tkattkat](https://github.com/tkattkat)! - Add "hybrid" mode to stagehand agent

## @browserbasehq/[email protected]

### Minor Changes

- [#1459](#1459) [`abb3469`](abb3469) Thanks [@monadoid](https://github.com/monadoid)! - Added building of binaries
- [#1457](#1457) [`5fc1281`](5fc1281) Thanks [@monadoid](https://github.com/monadoid)! - First changeset for stagehand-server
- [#1469](#1469) [`d634d45`](d634d45) Thanks [@monadoid](https://github.com/monadoid)! - Bump to test binary builds

### Patch Changes

- Updated dependencies \[[`0f3991e`](0f3991e), [`e0e22e0`](e0e22e0), [`f261051`](f261051), [`e021674`](e021674), [`6a5496f`](6a5496f), [`fea1700`](fea1700), [`5b288d9`](5b288d9), [`e822f5a`](e822f5a), [`638efc7`](638efc7), [`a890f16`](a890f16), [`934f492`](934f492), [`bd2db92`](bd2db92), [`51e0170`](51e0170), [`05f5580`](05f5580), [`f56a9c2`](f56a9c2), [`b40ae11`](b40ae11), [`0d2b398`](0d2b398), [`cd01f29`](cd01f29), [`a734fca`](a734fca), [`b342acf`](b342acf), [`2987cd1`](2987cd1), [`dfab1d5`](dfab1d5), [`4d71162`](4d71162)]:
  - @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- [#1373](#1373) [`cadd192`](cadd192) Thanks [@tkattkat](https://github.com/tkattkat)! - Update screenshot collector in agent evals cli
- Updated dependencies \[[`0f3991e`](0f3991e), [`e0e22e0`](e0e22e0), [`f261051`](f261051), [`e021674`](e021674), [`6a5496f`](6a5496f), [`fea1700`](fea1700), [`5b288d9`](5b288d9), [`e822f5a`](e822f5a), [`638efc7`](638efc7), [`a890f16`](a890f16), [`934f492`](934f492), [`bd2db92`](bd2db92), [`51e0170`](51e0170), [`05f5580`](05f5580), [`f56a9c2`](f56a9c2), [`b40ae11`](b40ae11), [`0d2b398`](0d2b398), [`cd01f29`](cd01f29), [`a734fca`](a734fca), [`b342acf`](b342acf), [`2987cd1`](2987cd1), [`dfab1d5`](dfab1d5), [`4d71162`](4d71162)]:
  - @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Hybrid Mode & Agent Tooling Improvements
Introduces a new `hybrid` mode for agents that uses coordinate-based interactions (click, type, drag) alongside the existing DOM-based `dom` mode (act, fillForm). Also adds Brave Search integration and improves keyboard event handling.

What's New

Hybrid Mode Tools
- `click`, `type`, `dragAndDrop`, `clickAndHold`, `fillFormVision` - coordinate-based interactions

Universal Tools (both modes)
- `keys` - keyboard input (type text or press keys)
- `think` - internal reasoning/planning
- `search` - Brave web search (only enabled when `BRAVE_API_KEY` is provided in env)

Other Improvements
- `key`/`code`/`keyCode` events for better site compatibility
- `AgentToolCall`, `AgentToolResult`, `AgentToolTypesMap`

Tool Availability by Mode
- DOM mode: `act`, `fillForm`
- Hybrid mode: `click`, `type`, `dragAndDrop`, `clickAndHold`, `fillFormVision`
- Common: `ariaTree`, `screenshot`, `extract`, `goto`, `scroll`, `wait`, `navback`, `close`, `keys`, `think`, `search`

Tests
20 new tests for hybrid mode functionality
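The tool availability breakdown in the description can be read as a small selection rule. The sketch below is one way to express it, assuming the tool names listed there; `toolsForMode`, the constant names, and the `env` parameter are hypothetical and not the PR's actual `tools/index.ts` logic. It also reflects the review finding that `act` and `fillForm` should both be absent in hybrid mode, and the note that `search` is only enabled when `BRAVE_API_KEY` is set.

```typescript
// Hypothetical sketch; not the PR's tools/index.ts implementation.
type AgentMode = "dom" | "hybrid";

const DOM_ONLY: readonly string[] = ["act", "fillForm"];
const HYBRID_ONLY: readonly string[] = [
  "click", "type", "dragAndDrop", "clickAndHold", "fillFormVision",
];
const COMMON: readonly string[] = [
  "ariaTree", "screenshot", "extract", "goto", "scroll", "wait",
  "navback", "close", "keys", "think", "search",
];

// Returns the tool names an agent would see for a given mode.
function toolsForMode(
  mode: AgentMode,
  env: Record<string, string | undefined>,
): string[] {
  const modeTools = mode === "hybrid" ? HYBRID_ONLY : DOM_ONLY;
  const tools = [...COMMON, ...modeTools];
  // `search` is only offered when a Brave API key is configured.
  return env["BRAVE_API_KEY"] ? tools : tools.filter((t) => t !== "search");
}
```

For example, `toolsForMode("hybrid", {})` yields the common tools minus `search`, plus the five coordinate-based tools, and never includes `act` or `fillForm`.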
Summary by cubic
Introduces a new hybrid agent mode with reliable coordinate-based interactions alongside the existing DOM mode, plus Brave Search support and improved keyboard handling for better site compatibility.
Written for commit 89cb6af. Summary will update automatically on new commits.