[Feature] Add Agent.execute Hybrid CUA + DOM mode #1428

tkattkat · 2025-12-17T21:44:21Z

Hybrid Mode & Agent Tooling Improvements

Introduces a new hybrid mode for agents that uses coordinate-based interactions (click, type, drag) alongside the existing DOM-based mode, dom (act, fillForm). Also adds Brave Search integration and improves keyboard event handling.

Usage

// DOM mode (default): structured DOM interactions
const domAgent = stagehand.agent({ mode: "dom" });

// Hybrid mode: coordinate/screenshot-based interactions
const hybridAgent = stagehand.agent({ mode: "hybrid" });

What's New

Hybrid Mode Tools

click, type, dragAndDrop, clickAndHold, fillFormVision - coordinate-based interactions

Universal Tools (both modes)

keys - keyboard input (type text or press keys)
think - internal reasoning/planning
search - Brave web search (only enabled when BRAVE_API_KEY is provided in env)

Other Improvements

Dynamic system prompts - system prompt is dynamically created based on which mode is used
Enhanced keyboard handling - full key/code/keyCode events for better site compatibility
Coordinate normalization - handles the 0-1000 coordinate space used by Google models
Better types - exported AgentToolCall, AgentToolResult, AgentToolTypesMap
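To make the coordinate normalization item concrete, here is a minimal sketch of how a point in a 0-1000 space could be mapped onto viewport pixels. The names `normalizeCoordinates`, `Viewport`, and the `provider` string are hypothetical and chosen for illustration; they are not taken from the PR, whose own helper may behave differently.

```typescript
// Hypothetical helper for illustration; not the PR's actual implementation.
interface Viewport {
  width: number;
  height: number;
}

type Point = [x: number, y: number];

// The PR description states that Google models use a 0-1000 coordinate space.
const GOOGLE_COORDINATE_RANGE = 1000;

function normalizeCoordinates(
  [x, y]: Point,
  viewport: Viewport,
  provider: string,
): Point {
  if (provider !== "google") {
    // Assumption: other providers already return viewport pixels.
    return [Math.round(x), Math.round(y)];
  }
  // Clamp to the documented range so out-of-range values stay on screen.
  const clamp = (v: number): number =>
    Math.min(Math.max(v, 0), GOOGLE_COORDINATE_RANGE);
  return [
    Math.round((clamp(x) / GOOGLE_COORDINATE_RANGE) * viewport.width),
    Math.round((clamp(y) / GOOGLE_COORDINATE_RANGE) * viewport.height),
  ];
}
```

For a 1280x720 viewport, the point (500, 250) in the 0-1000 space maps to pixel (640, 180).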

Tool Availability by Mode

DOM Mode	Hybrid Mode
`act`, `fillForm`	`click`, `type`, `dragAndDrop`, `clickAndHold`, `fillFormVision`

Common: ariaTree, screenshot, extract, goto, scroll, wait, navback, close, keys, think, search
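The availability rules above can be sketched as a small selection function. This is an illustration under assumptions: `availableTools` and the constant names are hypothetical, and the real filtering in tools/index.ts operates on tool objects rather than names.

```typescript
// Hypothetical sketch of mode-aware tool selection; names are illustrative only.
type AgentMode = "dom" | "hybrid";

const COMMON_TOOLS: readonly string[] = [
  "ariaTree", "screenshot", "extract", "goto", "scroll",
  "wait", "navback", "close", "keys", "think",
];
const DOM_ONLY_TOOLS: readonly string[] = ["act", "fillForm"];
const HYBRID_ONLY_TOOLS: readonly string[] = [
  "click", "type", "dragAndDrop", "clickAndHold", "fillFormVision",
];

function availableTools(
  mode: AgentMode,
  env: Record<string, string | undefined>,
): string[] {
  const modeTools = mode === "hybrid" ? HYBRID_ONLY_TOOLS : DOM_ONLY_TOOLS;
  const tools: string[] = [...COMMON_TOOLS, ...modeTools];
  // Per the PR description, search is only enabled when BRAVE_API_KEY is set.
  if (env.BRAVE_API_KEY) {
    tools.push("search");
  }
  return tools;
}
```

Building the list from explicit per-mode sets, rather than deleting entries from one shared set, makes it harder to leave a DOM-only tool such as act available in hybrid mode by accident.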

Tests

20 new tests for hybrid mode functionality

Summary by cubic

Introduces a new hybrid agent mode with reliable coordinate-based interactions alongside the existing DOM mode, plus Brave Search support and improved keyboard handling for better site compatibility.

New Features
- Hybrid mode: click, type, dragAndDrop, clickAndHold, fillFormVision with Google-friendly coordinate normalization and optional cursor overlay.
- Mode-aware tools: DOM keeps act/fillForm; hybrid swaps in vision scroll; universal keys and think; search enabled when BRAVE_API_KEY is set.
- Dynamic system prompts per mode, with optional Browserbase captcha guidance.
- Typed exports for agent tool calls/results: AgentTools, AgentToolTypesMap, AgentUITools, AgentToolCall, AgentToolResult.

Refactors
- Centralized prompt builder and tool filtering; handler passes mode/provider to tools.
- Keyboard events now include key, code, and Windows virtual key codes for broader compatibility.
- Added hybrid-mode tests and public type tests.
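As a rough illustration of the keyboard change, the sketch below derives the key, code, and Windows virtual key code for a small set of keys. The field names mirror the Chrome DevTools Protocol `Input.dispatchKeyEvent` parameters; `describeKey` and the `NAMED_KEYS` table are hypothetical and cover far fewer keys than a real implementation would.

```typescript
// Hypothetical sketch; the PR's real key table is not shown in this thread.
interface KeyEventFields {
  key: string;
  code: string;
  windowsVirtualKeyCode: number;
}

const NAMED_KEYS: Record<string, KeyEventFields> = {
  Enter: { key: "Enter", code: "Enter", windowsVirtualKeyCode: 13 },
  Tab: { key: "Tab", code: "Tab", windowsVirtualKeyCode: 9 },
  Escape: { key: "Escape", code: "Escape", windowsVirtualKeyCode: 27 },
  Backspace: { key: "Backspace", code: "Backspace", windowsVirtualKeyCode: 8 },
  ArrowLeft: { key: "ArrowLeft", code: "ArrowLeft", windowsVirtualKeyCode: 37 },
  ArrowUp: { key: "ArrowUp", code: "ArrowUp", windowsVirtualKeyCode: 38 },
  ArrowRight: { key: "ArrowRight", code: "ArrowRight", windowsVirtualKeyCode: 39 },
  ArrowDown: { key: "ArrowDown", code: "ArrowDown", windowsVirtualKeyCode: 40 },
};

function describeKey(key: string): KeyEventFields {
  const named = NAMED_KEYS[key];
  if (named) {
    return named;
  }
  if (/^[a-zA-Z]$/.test(key)) {
    // Virtual key codes for letters match the uppercase ASCII code (A = 65).
    const upper = key.toUpperCase();
    return { key, code: `Key${upper}`, windowsVirtualKeyCode: upper.charCodeAt(0) };
  }
  if (/^[0-9]$/.test(key)) {
    // Virtual key codes for top-row digits match their ASCII code (0 = 48).
    return { key, code: `Digit${key}`, windowsVirtualKeyCode: key.charCodeAt(0) };
  }
  throw new Error(`Unsupported key in this sketch: ${key}`);
}
```

Sites that read the legacy keyCode property, instead of key or code, are the likely reason all three fields matter for compatibility.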

Written for commit 89cb6af. Summary will update automatically on new commits.

changeset-bot · 2025-12-17T21:44:24Z

🦋 Changeset detected

Latest commit: 0d9f8e4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@browserbasehq/stagehand	Patch
@browserbasehq/stagehand-evals	Patch
@browserbasehq/stagehand-server	Patch


pirate · 2025-12-17T21:47:02Z

Can this also be made an arg to agent.execute('...', {hybrid: true | false})? It would be easier to expose via the API if we don't rely on agent init params.

cubic-dev-ai

9 issues found across 19 files

Prompt for AI agents (all 9 issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/lib/v3/agent/tools/v3-dragAndDrop.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-dragAndDrop.ts:12">
P2: Schema allows arrays of any length but code assumes exactly 2 elements. Use `z.tuple([z.number(), z.number()])` to enforce the expected structure and provide better validation errors if the AI model sends malformed input.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-type.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-type.ts:24">
P2: The `coordinates` schema allows arrays of any length but the code expects exactly 2 elements. Consider using `z.tuple([z.number(), z.number()])` to enforce the expected structure and get proper TypeScript typing.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-scroll.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-scroll.ts:71">
P2: The `coordinates` array schema doesn't enforce length. If fewer than 2 elements are provided, `coordinates[1]` will be `undefined`, causing `processCoordinates` to return NaN values. Consider using `z.tuple([z.number(), z.number()])` to ensure exactly 2 coordinates are provided.</violation>
</file>

<file name="packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts">

<violation number="1" location="packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts:47">
P2: The `clickAndHold` tool is available for hybrid mode (per the tools/index.ts registration) but is not documented in the hybrid mode tools section. The agent won't be aware this tool exists.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-fillFormVision.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-fillFormVision.ts:75">
P2: Missing Google provider delay after click. The `v3-type.ts` tool adds a 1000ms delay for Google models after clicking before typing, but this logic is missing here. This could cause typing issues when using Google models with the fillFormVision tool.</violation>
</file>

<file name="packages/core/lib/v3/tests/agent-hybrid-mode.spec.ts">

<violation number="1" location="packages/core/lib/v3/tests/agent-hybrid-mode.spec.ts:234">
P2: Test assertion doesn't match the comment and instruction. The comment says 'Should include screenshot' and the instruction explicitly asks the agent to 'Take a screenshot', but the assertion only verifies `close` was called. Consider adding an assertion to verify screenshot was used, or update the comment if screenshot verification is intentionally omitted.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-clickAndHold.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-clickAndHold.ts:19">
P2: The coordinates schema allows arrays of any length but the code assumes exactly 2 elements. Consider using `.length(2)` to validate the array size, or use a tuple schema for stronger typing: `z.tuple([z.number(), z.number()])`.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/v3-click.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/v3-click.ts:23">
P2: The coordinates schema `z.array(z.number())` doesn't enforce exactly 2 elements. If an LLM returns an empty or single-element array, `coordinates[1]` will be `undefined`, causing issues in `processCoordinates`. Use `z.tuple([z.number(), z.number()])` to enforce the expected (x, y) pair.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/index.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/index.ts:50">
P1: Missing `delete filtered.act;` in hybrid mode. According to the PR description, `act` is a DOM-only tool and should not be available in hybrid mode. Currently only `fillForm` is removed, leaving `act` incorrectly available.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
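Several of the findings above share one root cause: an array-of-numbers schema accepts any length, so a short array yields `undefined` for the second element. The review's suggested fix is `z.tuple([z.number(), z.number()])`. The dependency-free sketch below spells out the checks such a tuple schema would perform; `parseCoordinatePair` is a hypothetical name and is not part of the PR.

```typescript
// Hypothetical hand-written validator illustrating what a two-number tuple
// schema enforces; the review itself recommends zod's z.tuple for this.
type CoordinatePair = [x: number, y: number];

function parseCoordinatePair(input: unknown): CoordinatePair {
  if (!Array.isArray(input) || input.length !== 2) {
    throw new Error(`Expected [x, y], received ${JSON.stringify(input)}`);
  }
  const [x, y] = input;
  if (
    typeof x !== "number" ||
    typeof y !== "number" ||
    !Number.isFinite(x) ||
    !Number.isFinite(y)
  ) {
    throw new Error(`Expected two finite numbers, received ${JSON.stringify(input)}`);
  }
  return [x, y];
}
```

Rejecting malformed input at the schema boundary gives the model a clear validation error to retry against, rather than letting NaN coordinates reach the click or scroll call.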

packages/core/lib/v3/agent/tools/dragAndDrop.ts

packages/core/lib/v3/agent/tools/type.ts

packages/core/lib/v3/agent/tools/scroll.ts

packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts

packages/core/lib/v3/agent/tools/fillFormVision.ts

packages/core/lib/v3/tests/agent-hybrid-mode.spec.ts

packages/core/lib/v3/agent/tools/clickAndHold.ts

packages/core/lib/v3/agent/tools/click.ts

packages/core/lib/v3/agent/tools/index.ts

packages/core/lib/v3/handlers/v3AgentHandler.ts

pirate · 2025-12-17T23:29:57Z

packages/core/lib/v3/understudy/page.ts

      case "windows":
        return "Meta";
      case "ctrl":
+      case "control":


I think it's worth adding support for ControlOrMeta like Playwright does. LLMs know to use it for cross-platform macOS/Linux/Windows keypresses because it's used often in Playwright scripts.
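A minimal sketch of what that could look like, assuming a modifier-normalization switch similar to the one in the diff above. In Playwright, ControlOrMeta resolves to Meta on macOS and to Control on Windows and Linux. The `resolveModifier` and `resolveCombo` names, the `Platform` type, and the extra aliases are hypothetical, and the return value for "ctrl"/"control" is assumed since the diff is cut off before it.

```typescript
// Hypothetical sketch; only the "windows" -> "Meta" mapping and the
// "ctrl"/"control" aliases appear in the diff, the rest is assumed.
type Platform = "mac" | "windows" | "linux";

function resolveModifier(name: string, platform: Platform): string {
  switch (name.toLowerCase()) {
    case "controlormeta":
      // Mirrors Playwright's behavior: Meta on macOS, Control elsewhere.
      return platform === "mac" ? "Meta" : "Control";
    case "cmd":
    case "command":
    case "meta":
    case "windows":
      return "Meta";
    case "ctrl":
    case "control":
      return "Control";
    default:
      // Not a known modifier alias: pass the key through unchanged.
      return name;
  }
}

function resolveCombo(combo: string, platform: Platform): string {
  return combo
    .split("+")
    .map((part) => resolveModifier(part.trim(), platform))
    .join("+");
}
```

With this in place, a model-issued "ControlOrMeta+A" would select all on any platform without the model needing to know the host OS.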

packages/core/lib/v3/agent/tools/search.ts

@tkattkat

tkattkat added 4 commits December 17, 2025 12:06

hybrid

af08c17

key changes

c91a02b

cleanup some code

21fcbde

update public method tests

9fbdaed

changeset

9bac946

format

27764a0

cubic-dev-ai bot reviewed Dec 17, 2025

pirate reviewed Dec 17, 2025

packages/core/lib/v3/handlers/v3AgentHandler.ts (outdated, resolved)

tkattkat added 2 commits December 17, 2025 14:14

cubic / pirate comments

dfa501e

update test

ff41003

pirate changed the title ~~Hybrid~~ [Feature] Add Agent.execute Hybrid CUA + DOM mode Dec 17, 2025

pirate reviewed Dec 17, 2025

packages/core/lib/v3/agent/tools/search.ts (resolved)

tkattkat added 3 commits December 17, 2025 15:45

update tool naming

20268f6

format

89cb6af

move mode back to stagehand agent definition

0d9f8e4

pirate approved these changes Dec 18, 2025

tkattkat merged commit 4d71162 into main Dec 18, 2025
28 of 29 checks passed

This was referenced Dec 18, 2025

Version Packages #1414

Merged

Version Packages CloudEngineHub/stagehand#1

Open

Version Packages kenchikuliu/stagehand#1

Open

Version Packages shaneholloman/stagehand#1

Open

3 participants
