@tkattkat
Collaborator

@tkattkat tkattkat commented Dec 17, 2025

Hybrid Mode & Agent Tooling Improvements

Introduces a new hybrid mode for agents that uses coordinate-based interactions (click, type, drag) alongside the existing DOM-based mode, dom (act, fillForm). Also adds Brave Search integration and improves keyboard event handling.

Usage

// DOM mode (default) - structured DOM interactions
const domAgent = stagehand.agent({ mode: "dom" });

// Hybrid mode - coordinate/screenshot-based interactions
const hybridAgent = stagehand.agent({ mode: "hybrid" });

What's New

Hybrid Mode Tools

  • click, type, dragAndDrop, clickAndHold, fillFormVision - coordinate-based interactions

Universal Tools (both modes)

  • keys - keyboard input (type text or press keys)
  • think - internal reasoning/planning
  • search - Brave web search (only enabled when BRAVE_API_KEY is provided in env)

Other Improvements

  • Dynamic system prompts - system prompt is dynamically created based on which mode is used
  • Enhanced keyboard handling - full key/code/keyCode events for better site compatibility
  • Coordinate normalization - handles Google model's 0-1000 coordinate space
  • Better types - exported AgentToolCall, AgentToolResult, AgentToolTypesMap
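As a rough illustration of the coordinate normalization item, the sketch below maps a point from a 0-1000 grid onto viewport pixels. The names `Viewport` and `denormalize` are hypothetical, and the clamping step is an assumption; this is not the code from the PR.

```typescript
// Hypothetical helper, not the actual Stagehand implementation.
interface Viewport {
  width: number;
  height: number;
}

// Size of the normalized grid described in the PR (0-1000 on both axes).
const NORMALIZED_MAX = 1000;

// Convert a model-space point to viewport pixels.
// Clamping out-of-range values is an assumption made for this sketch.
function denormalize(
  [x, y]: [number, number],
  viewport: Viewport,
): [number, number] {
  const clamp = (v: number): number => Math.min(Math.max(v, 0), NORMALIZED_MAX);
  return [
    Math.round((clamp(x) / NORMALIZED_MAX) * viewport.width),
    Math.round((clamp(y) / NORMALIZED_MAX) * viewport.height),
  ];
}

// The center of the grid lands on the center of a 1280x720 viewport.
console.log(denormalize([500, 500], { width: 1280, height: 720 })); // [ 640, 360 ]
```

Models that already emit pixel coordinates would skip this step, which is presumably why the handler needs to know the provider.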

Tool Availability by Mode

| DOM Mode | Hybrid Mode |
| --- | --- |
| act, fillForm | click, type, dragAndDrop, clickAndHold, fillFormVision |

Common: ariaTree, screenshot, extract, goto, scroll, wait, navback, close, keys, think, search
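One way to read the availability rules above is as a small selection function. The sketch below is an assumption about the shape of that logic, with tool names copied from this description; `availableTools` and its `env` parameter are hypothetical and do not come from the PR's code.

```typescript
type AgentMode = "dom" | "hybrid";

// Tool names as listed in the PR description.
const DOM_ONLY = ["act", "fillForm"];
const HYBRID_ONLY = ["click", "type", "dragAndDrop", "clickAndHold", "fillFormVision"];
const COMMON = [
  "ariaTree", "screenshot", "extract", "goto", "scroll", "wait",
  "navback", "close", "keys", "think", "search",
];

// Hypothetical helper: pick the mode-specific tools, add the common ones,
// and drop search unless a Brave API key is present in the given environment.
function availableTools(
  mode: AgentMode,
  env: Record<string, string | undefined>,
): string[] {
  const modeTools = mode === "hybrid" ? HYBRID_ONLY : DOM_ONLY;
  const tools = [...modeTools, ...COMMON];
  return env.BRAVE_API_KEY ? tools : tools.filter((name) => name !== "search");
}

console.log(availableTools("hybrid", {}).includes("act")); // false
console.log(availableTools("dom", { BRAVE_API_KEY: "key" }).includes("search")); // true
```

Passing the environment in as an argument keeps the sketch easy to test; real code would more likely read it from the process environment.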

Tests

20 new tests for hybrid mode functionality


Summary by cubic

Introduces a new hybrid agent mode with reliable coordinate-based interactions alongside the existing DOM mode, plus Brave Search support and improved keyboard handling for better site compatibility.

  • New Features

    • Hybrid mode: click, type, dragAndDrop, clickAndHold, fillFormVision with Google-friendly coordinate normalization and optional cursor overlay.
    • Mode-aware tools: DOM keeps act/fillForm; hybrid swaps in vision scroll; universal keys and think; search enabled when BRAVE_API_KEY is set.
    • Dynamic system prompts per mode, with optional Browserbase captcha guidance.
    • Typed exports for agent tool calls/results: AgentTools, AgentToolTypesMap, AgentUITools, AgentToolCall, AgentToolResult.
  • Refactors

    • Centralized prompt builder and tool filtering; handler passes mode/provider to tools.
    • Keyboard events now include key, code, and Windows virtual key codes for broader compatibility.
    • Added hybrid-mode tests and public type tests.
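To make the keyboard change concrete, here is a minimal sketch of a lookup that produces the three fields mentioned above for a key name. The `KeyDefinition` shape and `describeKey` function are hypothetical; only the idea of sending key, code, and a Windows virtual key code together comes from the PR.

```typescript
interface KeyDefinition {
  key: string;     // logical key value, e.g. "Enter" or "a"
  code: string;    // physical key code, e.g. "Enter" or "KeyA"
  keyCode: number; // Windows virtual key code
}

// A few named keys with their standard virtual key codes.
const NAMED_KEYS = new Map<string, KeyDefinition>([
  ["Enter", { key: "Enter", code: "Enter", keyCode: 13 }],
  ["Tab", { key: "Tab", code: "Tab", keyCode: 9 }],
  ["Escape", { key: "Escape", code: "Escape", keyCode: 27 }],
  ["Backspace", { key: "Backspace", code: "Backspace", keyCode: 8 }],
  ["ArrowLeft", { key: "ArrowLeft", code: "ArrowLeft", keyCode: 37 }],
  ["ArrowUp", { key: "ArrowUp", code: "ArrowUp", keyCode: 38 }],
  ["ArrowRight", { key: "ArrowRight", code: "ArrowRight", keyCode: 39 }],
  ["ArrowDown", { key: "ArrowDown", code: "ArrowDown", keyCode: 40 }],
]);

// Hypothetical helper: resolve a key name to its full definition.
function describeKey(name: string): KeyDefinition | undefined {
  const named = NAMED_KEYS.get(name);
  if (named) return named;
  if (/^[a-zA-Z]$/.test(name)) {
    // Letter keys share the virtual key code of their uppercase ASCII value.
    const upper = name.toUpperCase();
    return { key: name, code: `Key${upper}`, keyCode: upper.charCodeAt(0) };
  }
  if (/^[0-9]$/.test(name)) {
    return { key: name, code: `Digit${name}`, keyCode: name.charCodeAt(0) };
  }
  return undefined; // unknown keys are left to the caller
}

console.log(describeKey("a")); // { key: 'a', code: 'KeyA', keyCode: 65 }
```

Sites that still read the legacy keyCode property ignore events that only carry key, which is the compatibility gap this kind of table is meant to close.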

Written for commit 89cb6af. Summary will update automatically on new commits.

@changeset-bot

changeset-bot bot commented Dec 17, 2025

🦋 Changeset detected

Latest commit: 0d9f8e4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server | Patch |

@pirate
Member

pirate commented Dec 17, 2025

can this also be made an arg to agent.execute('...', {hybrid: true | false})? would be easier to expose via API if we don't rely on agent init params

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

9 issues found across 19 files

Prompt for AI agents (all 9 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/lib/v3/agent/tools/v3-dragAndDrop.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-dragAndDrop.ts:12">
P2: Schema allows arrays of any length but code assumes exactly 2 elements. Use `z.tuple([z.number(), z.number()])` to enforce the expected structure and provide better validation errors if the AI model sends malformed input.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-type.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-type.ts:24">
P2: The `coordinates` schema allows arrays of any length but the code expects exactly 2 elements. Consider using `z.tuple([z.number(), z.number()])` to enforce the expected structure and get proper TypeScript typing.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-scroll.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-scroll.ts:71">
P2: The `coordinates` array schema doesn't enforce length. If fewer than 2 elements are provided, `coordinates[1]` will be `undefined`, causing `processCoordinates` to return NaN values. Consider using `z.tuple([z.number(), z.number()])` to ensure exactly 2 coordinates are provided.</violation>
</file>

<file name="packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts">

<violation number="1" location="packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts:47">
P2: The `clickAndHold` tool is available for hybrid mode (per the tools/index.ts registration) but is not documented in the hybrid mode tools section. The agent won't be aware this tool exists.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-fillFormVision.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-fillFormVision.ts:75">
P2: Missing Google provider delay after click. The `v3-type.ts` tool adds a 1000ms delay for Google models after clicking before typing, but this logic is missing here. This could cause typing issues when using Google models with the fillFormVision tool.</violation>
</file>

<file name="packages/core/lib/v3/tests/agent-hybrid-mode.spec.ts">

<violation number="1" location="packages/core/lib/v3/tests/agent-hybrid-mode.spec.ts:234">
P2: Test assertion doesn't match the comment and instruction. The comment says 'Should include screenshot' and the instruction explicitly asks the agent to 'Take a screenshot', but the assertion only verifies `close` was called. Consider adding an assertion to verify screenshot was used, or update the comment if screenshot verification is intentionally omitted.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-clickAndHold.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-clickAndHold.ts:19">
P2: The coordinates schema allows arrays of any length but the code assumes exactly 2 elements. Consider using `.length(2)` to validate the array size, or use a tuple schema for stronger typing: `z.tuple([z.number(), z.number()])`.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-click.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-click.ts:23">
P2: The coordinates schema `z.array(z.number())` doesn't enforce exactly 2 elements. If an LLM returns an empty or single-element array, `coordinates[1]` will be `undefined`, causing issues in `processCoordinates`. Use `z.tuple([z.number(), z.number()])` to enforce the expected (x, y) pair.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/index.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/index.ts:50">
P1: Missing `delete filtered.act;` in hybrid mode. According to the PR description, `act` is a DOM-only tool and should not be available in hybrid mode. Currently only `fillForm` is removed, leaving `act` incorrectly available.</violation>
</file>
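
Several of the findings above come down to the same failure: an array with fewer than two numbers reaches arithmetic that assumes an (x, y) pair. The sketch below shows that failure and a dependency-free guard that approximates the shape check a tuple schema would perform. `scale` is a hypothetical stand-in for the `processCoordinates` helper named in the review, and `isPoint` is illustrative only, not a replacement for the suggested `z.tuple` schema.

```typescript
type Point = [number, number];

// Hypothetical guard: accepts only arrays of exactly two finite numbers.
function isPoint(value: unknown): value is Point {
  return (
    Array.isArray(value) &&
    value.length === 2 &&
    value.every((n) => typeof n === "number" && Number.isFinite(n))
  );
}

// Hypothetical stand-in for coordinate processing that trusts its input.
function scale(coords: number[], factor: number): number[] {
  return [coords[0] * factor, coords[1] * factor];
}

// With a one-element array, coords[1] is undefined and the result is NaN.
console.log(Number.isNaN(scale([120], 2)[1])); // true

// The guard rejects the malformed input before any arithmetic happens.
console.log(isPoint([120]));      // false
console.log(isPoint([120, 340])); // true
```

Rejecting the input at the schema boundary also gives the model a validation error it can react to, instead of a click at an undefined position.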


@pirate changed the title from "Hybrid" to "[Feature] Add Agent.execute Hybrid CUA + DOM mode" on Dec 17, 2025
case "windows":
return "Meta";
case "ctrl":
case "control":
Member

I think it's worth adding support for ControlOrMeta like Playwright does. LLMs know to use it for cross-platform macOS/Linux/Windows keypresses because it's used often in Playwright scripts.
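
For reference, a minimal sketch of what that could look like, extending the switch shown in the diff context above. In Playwright, ControlOrMeta resolves to Meta on macOS and to Control on Windows and Linux. The function name, the `Platform` type, and the extra aliases (cmd, command) are assumptions for illustration and are not taken from the PR.

```typescript
type Platform = "mac" | "windows" | "linux";

// Hypothetical normalizer for modifier names produced by a model.
function normalizeModifier(name: string, platform: Platform): string {
  switch (name.toLowerCase()) {
    case "cmd":
    case "command":
    case "meta":
    case "windows":
      return "Meta";
    case "ctrl":
    case "control":
      return "Control";
    case "controlormeta":
      // Same resolution rule Playwright documents for ControlOrMeta.
      return platform === "mac" ? "Meta" : "Control";
    default:
      // Anything unrecognized is passed through unchanged.
      return name;
  }
}

console.log(normalizeModifier("ControlOrMeta", "mac"));   // Meta
console.log(normalizeModifier("ControlOrMeta", "linux")); // Control
```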

@tkattkat tkattkat merged commit 4d71162 into main Dec 18, 2025
28 of 29 checks passed
miguelg719 pushed a commit that referenced this pull request Dec 27, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/[email protected]

### Patch Changes

- [#1461](#1461)
[`0f3991e`](0f3991e)
Thanks [@tkattkat](https://github.com/tkattkat)! - Move hybrid mode out
of experimental

- [#1433](#1433)
[`e0e22e0`](e0e22e0)
Thanks [@tkattkat](https://github.com/tkattkat)! - Put hybrid mode
behind experimental

- [#1456](#1456)
[`f261051`](f261051)
Thanks [@shrey150](https://github.com/shrey150)! - Invoke page.hover for
agent move action

- [#1473](#1473)
[`e021674`](e021674)
Thanks [@shrey150](https://github.com/shrey150)! - Add safety
confirmation support for OpenAI + Google CUA

- [#1399](#1399)
[`6a5496f`](6a5496f)
Thanks [@tkattkat](https://github.com/tkattkat)! - Ensure cua agent is
killed when stagehand.close is called

- [#1436](#1436)
[`fea1700`](fea1700)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix auto-load key
for act/extract/observe parametrized models on api

- [#1439](#1439)
[`5b288d9`](5b288d9)
Thanks [@tkattkat](https://github.com/tkattkat)! - Remove base64 from
agent actions array ( still present in messages object )

- [#1408](#1408)
[`e822f5a`](e822f5a)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow for
act() cache hit when variable values change

- [#1472](#1472)
[`638efc7`](638efc7)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: agent
cache not refreshed on action failure

- [#1424](#1424)
[`a890f16`](a890f16)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix:
"Error: -32000 Failed to convert response to JSON: CBOR: stack limit
exceeded"

- [#1418](#1418)
[`934f492`](934f492)
Thanks [@miguelg719](https://github.com/miguelg719)! - Cleanup handlers
and bus listeners on close

- [#1430](#1430)
[`bd2db92`](bd2db92)
Thanks [@shrey150](https://github.com/shrey150)! - Fix CUA model
coordinate translation

- [#1465](#1465)
[`51e0170`](51e0170)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add media
resolution high provider option to gemini 3 hybrid agent

- [#1431](#1431)
[`05f5580`](05f5580)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update the cache
handling for agent

- [#1432](#1432)
[`f56a9c2`](f56a9c2)
Thanks [@tkattkat](https://github.com/tkattkat)! - Deprecate cua: true
in favor of mode: "cua"

- [#1406](#1406)
[`b40ae11`](b40ae11)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
hovering with coordinates ( page.hover )

- [#1407](#1407)
[`0d2b398`](0d2b398)
Thanks [@tkattkat](https://github.com/tkattkat)! - Clean up page methods

- [#1412](#1412)
[`cd01f29`](cd01f29)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: load
GOOGLE_API_KEY from .env

- [#1462](#1462)
[`a734fca`](a734fca)
Thanks [@shrey150](https://github.com/shrey150)! - fix: correctly pass
userDataDir to chrome launcher

- [#1466](#1466)
[`b342acf`](b342acf)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move
playwright to optional dependencies

- [#1440](#1440)
[`2987cd1`](2987cd1)
Thanks [@tkattkat](https://github.com/tkattkat)! - [Feature] support
excluding tools from agent

- [#1455](#1455)
[`dfab1d5`](dfab1d5)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update
aisdk client to better enforce structured output with deepseek models

- [#1428](#1428)
[`4d71162`](4d71162)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add "hybrid" mode to
stagehand agent

## @browserbasehq/[email protected]

### Minor Changes

- [#1459](#1459)
[`abb3469`](abb3469)
Thanks [@monadoid](https://github.com/monadoid)! - Added building of
binaries

- [#1457](#1457)
[`5fc1281`](5fc1281)
Thanks [@monadoid](https://github.com/monadoid)! - First changeset for
stagehand-server

- [#1469](#1469)
[`d634d45`](d634d45)
Thanks [@monadoid](https://github.com/monadoid)! - Bump to test binary
builds

### Patch Changes

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- [#1373](#1373)
[`cadd192`](cadd192)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update screenshot
collector in agent evals cli

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
