Collaborator

@tkattkat tkattkat commented Dec 18, 2025

why

Add cache support for keys actions.

what changed

Save click and type actions in the act cache for the agent.

test plan

Ran the sign-in example with click, type, and keys actions, then re-ran it using the cache.


Summary by cubic

Makes agent actions deterministic and cacheable by recording XPath-based “act” steps for clicks, typing, drag-and-drop, click-and-hold, and form fills, and adds replay support for key events. This improves cache hit rates and makes replays more reliable.

  • New Features
    • Record tools as “act” steps with Action objects (method, selector, args) using XPath captured via returnXpath.
    • Add AgentReplayKeysStep and AgentCache logic to replay type/press (with repeat support).
    • Introduce ensureXPath utility and use it across tools and v3CuaAgentHandler to normalize selectors.

Written for commit f62c537. Summary will update automatically on new commits.
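
The repeat handling for cached keys steps might look roughly like the sketch below. The `KeysStep` shape and the `expandKeysStep` helper are assumptions made for illustration; they are not the actual `AgentReplayKeysStep` type or `AgentCache` logic from this PR.

```typescript
// Hypothetical shape for a cached keyboard step (not the PR's actual type).
interface KeysStep {
  type: "keys";
  method: "type" | "press"; // "type" sends text, "press" sends a single key
  value: string;            // text to type, or a key name such as "Enter"
  repeat?: number;          // how many times to send it (defaults to 1)
}

interface KeyCall {
  method: "type" | "press";
  value: string;
}

// Expand one cached step into the flat list of keyboard calls to replay.
function expandKeysStep(step: KeysStep): KeyCall[] {
  const requested = step.repeat ?? 1;
  // Guard against zero, negative, fractional, or non-finite repeat counts.
  const times = Number.isFinite(requested) ? Math.max(1, Math.floor(requested)) : 1;
  const calls: KeyCall[] = [];
  for (let i = 0; i < times; i++) {
    calls.push({ method: step.method, value: step.value });
  }
  return calls;
}

const tabs = expandKeysStep({ type: "keys", method: "press", value: "Tab", repeat: 3 });
const typed = expandKeysStep({ type: "keys", method: "type", value: "hello" });
```

On replay, each `KeyCall` would be dispatched to whatever keyboard API the page exposes, in order.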

changeset-bot bot commented Dec 18, 2025

🦋 Changeset detected

Latest commit: 29d57a3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server | Patch |

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 10 files

Prompt for AI agents (all 2 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/lib/v3/agent/tools/type.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/type.ts:65">
P2: The replay step is silently skipped when xpath is not available. Previously, the step was always recorded. This could cause incomplete replay caches with no logging or warning. Consider adding an else clause to log when xpath cannot be determined, or fall back to recording with coordinates.</violation>
</file>

<file name="packages/core/lib/v3/agent/tools/click.ts">

<violation number="1" location="packages/core/lib/v3/agent/tools/click.ts:61">
P1: Silent failure when xpath is unavailable - click succeeds but no replay step is recorded. The old code always recorded a replay step, but now if `xpath` is null/undefined/empty, caching is silently skipped. Consider adding a fallback to record using coordinates when xpath is unavailable, or at least log a warning.</violation>
</file>

Contributor

greptile-apps bot commented Dec 18, 2025

Greptile Summary

This PR enhances agent cache handling by converting coordinate-based actions (click, type, dragAndDrop, clickAndHold, fillFormVision) to XPath-based deterministic actions for reliable cache replay. The implementation extracts returnXpath from page interactions and normalizes them using a new ensureXPath utility function.

Key changes:

  • Created packages/core/lib/v3/agent/utils/xpath.ts utility to normalize XPath selectors
  • Modified agent tools to capture XPaths during execution and record as "act" steps instead of coordinate-based steps
  • Added AgentReplayKeysStep type and replay handler to support keyboard action caching
  • Refactored v3CuaAgentHandler to use the extracted ensureXPath utility
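
As a rough illustration of what the normalization step does, here is a minimal sketch. The body is an assumption based only on the description above (add an `xpath=` prefix, return null for unusable input); the real utility in `packages/core/lib/v3/agent/utils/xpath.ts` may differ.

```typescript
// Sketch only: normalizes a raw XPath into an "xpath=" prefixed selector.
// Returns null when there is nothing usable to normalize.
function ensureXPath(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const trimmed = value.trim();
  if (trimmed.length === 0) return null;
  // Already prefixed: keep the selector as it is.
  if (/^xpath=/i.test(trimmed)) return trimmed;
  return `xpath=${trimmed}`;
}

const bare = ensureXPath("/html/body/form/button[1]");
const prefixed = ensureXPath("xpath=/html/body/input[1]");
const empty = ensureXPath("   ");
```

Returning null rather than throwing lets each tool decide how to handle a missing selector.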

Architecture:
The changes shift from coordinate-based replay (brittle, viewport-dependent) to selector-based replay (deterministic, resilient to layout changes). When recording, tools now capture the XPath of interacted elements and store them as Action objects with selectors. During replay, takeDeterministicAction uses these selectors to locate and interact with elements reliably.
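
The record-then-replay flow described here can be sketched as follows. The `Action` and `ActStep` shapes and both helper functions are hypothetical stand-ins, not the types or methods defined in the PR.

```typescript
// Hypothetical shapes, loosely following "Action objects (method, selector, args)".
interface Action {
  selector: string;    // normalized selector, e.g. "xpath=/html/body/button[1]"
  method: string;      // e.g. "click" or "type"
  arguments: string[]; // method arguments, empty for a plain click
  description: string;
}

interface ActStep {
  type: "act";
  instruction: string;
  actions: Action[];
}

// Recording: wrap the XPath returned by the click in an "act" step.
function toClickStep(selector: string, instruction: string): ActStep {
  return {
    type: "act",
    instruction,
    actions: [{ selector, method: "click", arguments: [], description: instruction }],
  };
}

// Replay: hand each action to a performer that locates the element by selector.
function replayActStep(step: ActStep, perform: (action: Action) => void): number {
  for (const action of step.actions) perform(action);
  return step.actions.length;
}

const step = toClickStep("xpath=/html/body/form/button[1]", "click the sign in button");
const seen: string[] = [];
replayActStep(step, (a) => seen.push(`${a.method} ${a.selector}`));
```

Because the step stores a selector instead of coordinates, the same step can be replayed at a different viewport size.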

Trade-offs:
Actions silently skip cache recording if XPath extraction fails (returns null/empty), which could lead to cache misses without clear debugging signals. For fillFormVision, partial XPath failures result in incomplete action lists being cached.

Confidence Score: 4/5

  • Safe to merge with minor observability considerations
  • The implementation is sound and follows good patterns (utility extraction, proper type definitions, graceful fallbacks). The logic correctly handles XPath normalization and recording. Score of 4 (not 5) due to silent failures when XPath extraction fails - while this prevents crashes, it could make debugging cache misses difficult. The fillFormVision partial recording behavior could also cause subtle issues on replay.
  • Pay close attention to packages/core/lib/v3/agent/tools/fillFormVision.ts - the partial action recording behavior could cause incomplete form fills on cache replay if some XPath extractions fail
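
One way to make the partial-recording concern visible is to track which fields lacked an XPath and only cache when none are missing. This is a sketch of that idea under assumed names (`FilledField`, `planFormReplay`); it is not what `fillFormVision.ts` currently does.

```typescript
// Hypothetical result of filling one form field during the first run.
interface FilledField {
  description: string;  // e.g. "email"
  value: string;        // text that was typed
  xpath: string | null; // normalized selector, or null if extraction failed
}

interface FormAction {
  selector: string;
  method: "type";
  arguments: string[];
  description: string;
}

interface FormReplayPlan {
  actions: FormAction[];
  missing: string[];  // fields that could not be recorded
  cacheable: boolean; // true only when every field has a selector
}

function planFormReplay(fields: FilledField[]): FormReplayPlan {
  const actions: FormAction[] = [];
  const missing: string[] = [];
  for (const field of fields) {
    if (field.xpath) {
      actions.push({
        selector: field.xpath,
        method: "type",
        arguments: [field.value],
        description: `type into ${field.description}`,
      });
    } else {
      missing.push(field.description);
    }
  }
  return { actions, missing, cacheable: missing.length === 0 };
}

const plan = planFormReplay([
  { description: "email", value: "a@example.com", xpath: "xpath=/html/body/form/input[1]" },
  { description: "password", value: "secret", xpath: null },
]);
```

A caller could log `plan.missing` and skip caching when `plan.cacheable` is false, so an incomplete form fill is never replayed silently.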

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/lib/v3/agent/utils/xpath.ts | New utility function to normalize XPath selectors with proper xpath= prefix, extracted from v3CuaAgentHandler for reusability |
| packages/core/lib/v3/cache/AgentCache.ts | Added replayAgentKeysStep method to handle replaying cached keyboard actions with proper text/key repetition logic |
| packages/core/lib/v3/agent/tools/click.ts | Modified to capture XPath from click action and record as deterministic "act" step for cache replay; only records if xpath is valid |
| packages/core/lib/v3/agent/tools/type.ts | Modified to capture XPath from click before typing and record as deterministic "act" step; silently skips recording if xpath is invalid |
| packages/core/lib/v3/agent/tools/fillFormVision.ts | Modified to capture XPaths for each form field and record as single "act" step with multiple actions; may record partial actions if some xpaths fail |

Sequence Diagram

sequenceDiagram
    participant Agent as Agent Tool (click/type/etc)
    participant Page as Page (Understudy)
    participant XPath as ensureXPath Util
    participant Recording as v3.recordAgentReplayStep
    participant Cache as AgentCache
    participant Replay as Cache Replay

    Note over Agent,Replay: Recording Phase (First Execution)
    Agent->>Page: click(x, y, {returnXpath: true})
    Page-->>Agent: xpath string
    Agent->>XPath: ensureXPath(xpath)
    XPath-->>Agent: normalized xpath with "xpath=" prefix (or null)
    alt xpath is valid
        Agent->>Recording: recordAgentReplayStep({type: "act", actions: [{selector: xpath, method: "click", ...}]})
        Recording->>Cache: store step with xpath-based Action
    else xpath is null/invalid
        Note over Agent,Recording: Skip recording (silent failure)
    end

    Note over Agent,Replay: Replay Phase (Cache Hit)
    Cache->>Replay: replayAgentActStep(step)
    Replay->>Page: takeDeterministicAction(action with xpath selector)
    Page->>Page: Locate element by xpath and execute action
    Note over Page: Action replayed deterministically using selector

Contributor

@greptile-apps greptile-apps bot left a comment

9 files reviewed, 2 comments

Member

@pirate pirate left a comment

Approved, but I would recommend recording the typing step even if we can't get a normalized xpath; typing into the page without a specific focus is still better than not typing at all.
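
A minimal sketch of that fallback, assuming a hypothetical `ReplayStep` union and `buildTypeReplayStep` helper (neither is taken from the merged code): when no selector is available, record an unfocused keys step and emit a warning instead of skipping the step.

```typescript
// Hypothetical replay step shapes for a typing action.
type ReplayStep =
  | { type: "act"; selector: string; method: "type"; text: string }
  | { type: "keys"; method: "type"; text: string };

// Prefer a selector-based step; otherwise fall back to typing without focus.
function buildTypeReplayStep(
  selector: string | null,
  text: string,
  warn: (message: string) => void,
): ReplayStep {
  if (selector) {
    return { type: "act", selector, method: "type", text };
  }
  warn("type: no XPath available, recording an unfocused keys step instead");
  return { type: "keys", method: "type", text };
}

const warnings: string[] = [];
const focused = buildTypeReplayStep("xpath=/html/body/input[1]", "alice", (m) => warnings.push(m));
const unfocused = buildTypeReplayStep(null, "alice", (m) => warnings.push(m));
```

The warning also addresses the bot reviews' point that skipped steps currently leave no debugging signal.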

@tkattkat tkattkat merged commit 05f5580 into main Dec 18, 2025
26 checks passed
miguelg719 pushed a commit that referenced this pull request Dec 27, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/[email protected]

### Patch Changes

- [#1461](#1461)
[`0f3991e`](0f3991e)
Thanks [@tkattkat](https://github.com/tkattkat)! - Move hybrid mode out
of experimental

- [#1433](#1433)
[`e0e22e0`](e0e22e0)
Thanks [@tkattkat](https://github.com/tkattkat)! - Put hybrid mode
behind experimental

- [#1456](#1456)
[`f261051`](f261051)
Thanks [@shrey150](https://github.com/shrey150)! - Invoke page.hover for
agent move action

- [#1473](#1473)
[`e021674`](e021674)
Thanks [@shrey150](https://github.com/shrey150)! - Add safety
confirmation support for OpenAI + Google CUA

- [#1399](#1399)
[`6a5496f`](6a5496f)
Thanks [@tkattkat](https://github.com/tkattkat)! - Ensure cua agent is
killed when stagehand.close is called

- [#1436](#1436)
[`fea1700`](fea1700)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix auto-load key
for act/extract/observe parametrized models on api

- [#1439](#1439)
[`5b288d9`](5b288d9)
Thanks [@tkattkat](https://github.com/tkattkat)! - Remove base64 from
agent actions array ( still present in messages object )

- [#1408](#1408)
[`e822f5a`](e822f5a)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow for
act() cache hit when variable values change

- [#1472](#1472)
[`638efc7`](638efc7)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: agent
cache not refreshed on action failure

- [#1424](#1424)
[`a890f16`](a890f16)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix:
"Error: -32000 Failed to convert response to JSON: CBOR: stack limit
exceeded"

- [#1418](#1418)
[`934f492`](934f492)
Thanks [@miguelg719](https://github.com/miguelg719)! - Cleanup handlers
and bus listeners on close

- [#1430](#1430)
[`bd2db92`](bd2db92)
Thanks [@shrey150](https://github.com/shrey150)! - Fix CUA model
coordinate translation

- [#1465](#1465)
[`51e0170`](51e0170)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add media
resolution high provider option to gemini 3 hybrid agent

- [#1431](#1431)
[`05f5580`](05f5580)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update the cache
handling for agent

- [#1432](#1432)
[`f56a9c2`](f56a9c2)
Thanks [@tkattkat](https://github.com/tkattkat)! - Deprecate cua: true
in favor of mode: "cua"

- [#1406](#1406)
[`b40ae11`](b40ae11)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
hovering with coordinates ( page.hover )

- [#1407](#1407)
[`0d2b398`](0d2b398)
Thanks [@tkattkat](https://github.com/tkattkat)! - Clean up page methods

- [#1412](#1412)
[`cd01f29`](cd01f29)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: load
GOOGLE_API_KEY from .env

- [#1462](#1462)
[`a734fca`](a734fca)
Thanks [@shrey150](https://github.com/shrey150)! - fix: correctly pass
userDataDir to chrome launcher

- [#1466](#1466)
[`b342acf`](b342acf)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move
playwright to optional dependencies

- [#1440](#1440)
[`2987cd1`](2987cd1)
Thanks [@tkattkat](https://github.com/tkattkat)! - [Feature] support
excluding tools from agent

- [#1455](#1455)
[`dfab1d5`](dfab1d5)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update
aisdk client to better enforce structured output with deepseek models

- [#1428](#1428)
[`4d71162`](4d71162)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add "hybrid" mode to
stagehand agent

## @browserbasehq/[email protected]

### Minor Changes

- [#1459](#1459)
[`abb3469`](abb3469)
Thanks [@monadoid](https://github.com/monadoid)! - Added building of
binaries

- [#1457](#1457)
[`5fc1281`](5fc1281)
Thanks [@monadoid](https://github.com/monadoid)! - First changeset for
stagehand-server

- [#1469](#1469)
[`d634d45`](d634d45)
Thanks [@monadoid](https://github.com/monadoid)! - Bump to test binary
builds

### Patch Changes

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- [#1373](#1373)
[`cadd192`](cadd192)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update screenshot
collector in agent evals cli

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

3 participants