@shrey150 shrey150 commented Dec 18, 2025

why

  • Google, Microsoft, and OpenAI CUA clicks miss their targets when advanced stealth mode is enabled on Browserbase
  • Advanced stealth scales screenshots to 1288×711, but the browser reports a 2560×1305 viewport
  • GoogleCUAClient (and the other clients) scaled coordinates directly to viewport dimensions (for Google, from its 0-1000 range) without accounting for the screenshot→viewport size mismatch.

This forced users to disable advanced stealth mode (losing bot protection) to get accurate clicks.

what changed

CUA Clients:

  • Added actualScreenshotSize field to track screenshot dimensions
  • Added setScreenshotSize() method to update dimensions
  • Fixed normalizeCoordinates() with two-step scaling:
    1. Convert 0-1000 range → screenshot space
    2. Scale screenshot space → viewport space (handles different X/Y scales)
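
The two steps above can be sketched as a standalone function. This is a minimal illustration, not the code from the PR: the function name `normalizeGoogleCoordinates` and the `Size`/`Point` types are hypothetical, and the sizes are the ones quoted in this description.

```typescript
// Hypothetical standalone sketch of the two-step scaling described above.
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

function normalizeGoogleCoordinates(
  x: number,
  y: number,
  screenshot: Size,
  viewport: Size,
): Point {
  // Step 1: 0-1000 model range -> screenshot pixels.
  const screenshotX = (x / 1000) * screenshot.width;
  const screenshotY = (y / 1000) * screenshot.height;

  // Step 2: screenshot pixels -> viewport pixels, one scale factor per axis.
  const scaleX = viewport.width / screenshot.width;
  const scaleY = viewport.height / screenshot.height;

  return {
    x: Math.floor(screenshotX * scaleX),
    y: Math.floor(screenshotY * scaleY),
  };
}

// Sizes taken from the PR description.
const screenshot: Size = { width: 1288, height: 711 };
const viewport: Size = { width: 2560, height: 1305 };

// A model click at the center of its 0-1000 grid.
const center = normalizeGoogleCoordinates(500, 500, screenshot, viewport);
console.log(center);
```

Because the result is floored after floating-point multiplication, the last pixel can differ by one from the exact arithmetic value.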

v3CuaAgentHandler.ts:

  • Added getPNGDimensions() helper to read PNG dimensions from buffer (header-only, no decode)
  • Updated screenshot provider to detect actual screenshot size and call setScreenshotSize()
  • Applied same logic to captureAndSendScreenshot()
  • Graceful fallback if dimension reading fails
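
A header-only PNG size read, as described above, might look roughly like the following. This is a sketch under assumptions: the name `getPngDimensions`, the `null` return on failure, and the `fakePngHeader` helper are all illustrative and not taken from the PR. It relies only on the PNG layout (8-byte signature, then the IHDR chunk with big-endian width at byte 16 and height at byte 20).

```typescript
// Hypothetical sketch: read width/height from a PNG header without decoding.
interface Size { width: number; height: number; }

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function getPngDimensions(bytes: Uint8Array): Size | null {
  // Signature (8) + chunk length (4) + "IHDR" (4) + width (4) + height (4).
  if (bytes.length < 24) return null;
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const width = view.getUint32(16, false); // big-endian
  const height = view.getUint32(20, false); // big-endian
  if (width === 0 || height === 0) return null;
  return { width, height };
}

// Illustration only: build the first 24 bytes of a PNG with a given size.
function fakePngHeader(width: number, height: number): Uint8Array {
  const bytes = new Uint8Array(24);
  bytes.set(PNG_SIGNATURE, 0);
  const view = new DataView(bytes.buffer);
  view.setUint32(8, 13, false); // IHDR data length
  bytes.set([0x49, 0x48, 0x44, 0x52], 12); // "IHDR"
  view.setUint32(16, width, false);
  view.setUint32(20, height, false);
  return bytes;
}

// Assumed fallback: keep using the viewport size when the read fails.
const viewportSize: Size = { width: 2560, height: 1305 };
const detected = getPngDimensions(fakePngHeader(1288, 711)) ?? viewportSize;
const fallback = getPngDimensions(new Uint8Array(4)) ?? viewportSize;
console.log(detected, fallback);
```

A Node.js `Buffer` is a `Uint8Array`, so a screenshot buffer could be passed to a function of this shape directly.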

test plan

Wrote test scripts locally and ensured that coordinate positioning is now correct


Summary by cubic

Fixes mis-scaled CUA clicks in advanced stealth by mapping model coordinates through the actual screenshot size to the viewport, and prevents corrupted images by encoding screenshots correctly.

  • Bug Fixes
    • Track screenshot dimensions and add setScreenshotSize() in GoogleCUAClient, OpenAICUAClient, and MicrosoftCUAClient.
    • Scale coordinates screenshot → viewport (Google: 0–1000 → screenshot → viewport; OpenAI/Microsoft: screenshot → viewport with separate X/Y).
    • Read PNG width/height from the buffer header; set screenshot size when available with safe fallback, and pass base64 to captureScreenshot() to avoid corrupted images.
    • Fix event name to agent_screenshot_taken_event and emit the screenshot buffer via the bus.

Written for commit cabead9. Summary will update automatically on new commits.

@shrey150 shrey150 requested a review from miguelg719 December 18, 2025 00:05

changeset-bot bot commented Dec 18, 2025

🦋 Changeset detected

Latest commit: cabead9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server | Patch |

greptile-apps bot commented Dec 18, 2025

Greptile Summary

Fixes CUA coordinate mis-scaling in advanced stealth mode by tracking actual screenshot dimensions separately from viewport dimensions and implementing two-step coordinate transformation.

Key Changes:

  • Added actualScreenshotSize field to GoogleCUAClient, OpenAICUAClient, and MicrosoftCUAClient to track actual screenshot dimensions
  • Added setScreenshotSize() method to all three CUA clients
  • Implemented getPNGDimensions() helper in v3CuaAgentHandler.ts to read PNG dimensions from buffer headers
  • Updated coordinate normalization logic:
    • Google: 0-1000 range → screenshot space → viewport space
    • OpenAI/Microsoft: screenshot space → viewport space (with separate X/Y scaling)
  • Fixed typo in event name: agent_screensot_taken_event → agent_screenshot_taken_event
  • Removed unused imageResize() function from utils.ts
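
For the OpenAI/Microsoft case in the list above, where the model already answers in screenshot pixels, the per-axis scaling can be sketched as below. The function name `screenshotToViewport` and the use of `Math.round` are assumptions for illustration. The example also shows why one shared scale factor would not be enough with the sizes from this PR.

```typescript
// Hypothetical sketch: screenshot pixels -> viewport pixels with separate X/Y scales.
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

function screenshotToViewport(point: Point, screenshot: Size, viewport: Size): Point {
  const scaleX = viewport.width / screenshot.width;
  const scaleY = viewport.height / screenshot.height;
  return {
    x: Math.round(point.x * scaleX),
    y: Math.round(point.y * scaleY),
  };
}

const shot: Size = { width: 1288, height: 711 };
const view: Size = { width: 2560, height: 1305 };

// 2560/1288 is about 1.99 while 1305/711 is about 1.84, so the axes differ.
const scaled = screenshotToViewport({ x: 644, y: 355 }, shot, view);

// Reusing the X scale for Y (a naive single-factor approach) lands too low on the page.
const naiveY = Math.round(355 * (view.width / shot.width));

console.log(scaled, naiveY);
```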

Potential Gap:

  • AnthropicCUAClient also handles coordinates through the computer tool but was not updated with the same scaling fix. If Anthropic CUA is used with advanced stealth mode, it may experience the same coordinate mis-scaling issue.

Confidence Score: 4/5

  • Safe to merge with minor style improvements possible
  • Core logic is sound and addresses the coordinate scaling issue correctly. PNG dimension reading is lightweight (header-only). Graceful fallback on dimension reading failure. One style suggestion for code simplification. No consideration given to AnthropicCUAClient which may need the same fix.
  • Check if AnthropicCUAClient needs the same coordinate scaling fix for consistency

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/lib/v3/agent/GoogleCUAClient.ts | Added actualScreenshotSize tracking and two-step coordinate scaling (0-1000 → screenshot → viewport) |
| packages/core/lib/v3/agent/OpenAICUAClient.ts | Added actualScreenshotSize tracking and coordinate scaling in convertComputerCallToAction() from screenshot to viewport space |
| packages/core/lib/v3/agent/MicrosoftCUAClient.ts | Added actualScreenshotSize tracking and coordinate scaling in convertFunctionCallToAction() from screenshot to viewport space |
| packages/core/lib/v3/handlers/v3CuaAgentHandler.ts | Added getPNGDimensions() helper and calls to setScreenshotSize() in screenshot provider and captureAndSendScreenshot(); fixed typo in event name |

Sequence Diagram

sequenceDiagram
    participant Handler as v3CuaAgentHandler
    participant Page as Browser Page
    participant Client as CUA Client<br/>(Google/OpenAI/Microsoft)
    participant Model as AI Model
    
    Note over Handler,Model: Screenshot Capture & Size Detection
    Handler->>Page: page.screenshot()
    Page-->>Handler: PNG Buffer (e.g., 1288x711)
    Handler->>Handler: getPNGDimensions(buffer)
    Note right of Handler: Reads PNG header<br/>bytes 16-23 for dimensions
    Handler->>Client: setScreenshotSize(1288, 711)
    Note right of Client: Stores actualScreenshotSize<br/>separate from viewport
    Handler->>Client: setViewport(2560, 1305)
    Note right of Client: Stores currentViewport<br/>(browser reports larger size)
    Handler->>Model: Send base64 screenshot
    
    Note over Handler,Model: Model Returns Coordinates
    Model-->>Client: Coordinates in model space<br/>(Google: 0-1000, Others: screenshot space)
    
    Note over Handler,Model: Two-Step Coordinate Scaling
    Client->>Client: normalizeCoordinates()
    Note right of Client: Step 1: Model → Screenshot<br/>Google: (x/1000) * 1288<br/>Others: x (already in screenshot space)
    Note right of Client: Step 2: Screenshot → Viewport<br/>(screenshotX * 2560/1288)<br/>(screenshotY * 1305/711)
    Client->>Handler: Scaled coordinates (viewport space)
    Handler->>Page: click(x, y)
    Note right of Page: Click at correct position<br/>in 2560x1305 viewport
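
The state the diagram attributes to the client can be sketched as a small class. The class name `SketchCuaClient`, the `toViewport` method, the default viewport value, and the fallback to the viewport size when no screenshot size has been set are assumptions for illustration; only `setScreenshotSize`, `setViewport`, `actualScreenshotSize` and `currentViewport` are names from the PR.

```typescript
// Hypothetical sketch of a client that tracks screenshot size separately from viewport.
interface Size { width: number; height: number; }

class SketchCuaClient {
  // Placeholder default; overwritten via setViewport() below.
  private currentViewport: Size = { width: 1288, height: 711 };
  private actualScreenshotSize: Size | null = null;

  setViewport(width: number, height: number): void {
    this.currentViewport = { width, height };
  }

  setScreenshotSize(width: number, height: number): void {
    this.actualScreenshotSize = { width, height };
  }

  // Maps a point given in screenshot pixels to viewport pixels.
  toViewport(x: number, y: number): { x: number; y: number } {
    // Assumed fallback: with no known screenshot size, treat it as equal to the viewport.
    const shot = this.actualScreenshotSize ?? this.currentViewport;
    return {
      x: Math.round(x * (this.currentViewport.width / shot.width)),
      y: Math.round(y * (this.currentViewport.height / shot.height)),
    };
  }
}

const client = new SketchCuaClient();
client.setViewport(2560, 1305);
const beforeSize = client.toViewport(644, 355); // no screenshot size yet: unchanged
client.setScreenshotSize(1288, 711);
const afterSize = client.toViewport(644, 355); // now scaled into the larger viewport
console.log(beforeSize, afterSize);
```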

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 1 comment

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files

Prompt for AI agents (all 1 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/lib/v3/handlers/v3CuaAgentHandler.ts">

<violation number="1" location="packages/core/lib/v3/handlers/v3CuaAgentHandler.ts:607">
P2: Breaking change: Event name was changed from `agent_screensot_taken_event` to `agent_screenshot_taken_event`, but existing listeners in `onlineMind2Web.ts` still subscribe to the old misspelled name. This will cause those listeners to miss events from this handler.</violation>
</file>
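
The failure mode flagged here can be reproduced in a few lines. This sketch assumes the bus behaves like a Node.js `EventEmitter`, which the thread does not confirm; the listener bodies are placeholders.

```typescript
import { EventEmitter } from "node:events";

// Assumption: the event bus behaves like a plain Node.js EventEmitter.
const bus = new EventEmitter();
const received: string[] = [];

// A listener still subscribed to the old, misspelled event name.
bus.on("agent_screensot_taken_event", () => received.push("old listener"));

// A listener subscribed to the corrected event name.
bus.on("agent_screenshot_taken_event", () => received.push("new listener"));

// After the rename, only the corrected name is emitted.
bus.emit("agent_screenshot_taken_event", new Uint8Array([0x89, 0x50]));

console.log(received); // ["new listener"]
```

Listeners on the old name are not notified and no error is raised, so a rename like this has to be applied to every subscriber at the same time.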

@miguelg719
Collaborator

is this issue isolated on google cua or does it extend to other cua providers? @shrey150

@shrey150
Contributor Author

> is this issue isolated on google cua or does it extend to other cua providers? @shrey150

@miguelg719 this doesn't affect Microsoft CUA, but I'll investigate with both Anthropic and OpenAI CUA

@shrey150
Contributor Author

@greptileai

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 2 comments

@shrey150 shrey150 changed the title from "fix: Google CUA coordinating scaling issue" to "fix: CUA coordinating scaling issue" Dec 18, 2025
@shrey150 shrey150 requested a review from tkattkat December 18, 2025 19:37
@shrey150
Contributor Author

@greptileai

@greptile-apps greptile-apps bot left a comment

7 files reviewed, 1 comment

Convert screenshotBuffer to base64 string before passing to captureScreenshot().
Previously, Buffer was passed directly which caused UTF-8 encoding instead of
base64, resulting in corrupted screenshot data sent to LLM APIs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
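
The difference between the two encodings in this commit message can be shown on the 8-byte PNG signature. This is an illustration only: it uses Node.js `Buffer.toString()` to stand in for whatever implicit string conversion happened in the original code path, which the thread does not show.

```typescript
import { Buffer } from "node:buffer";

// The 8-byte PNG signature stands in for a real screenshot buffer.
const screenshotBuffer = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Default toString() decodes the bytes as UTF-8; 0x89 is not valid there,
// so it is replaced and the original byte is lost.
const asUtf8 = screenshotBuffer.toString();

// Base64 represents every byte and can be decoded back exactly.
const asBase64 = screenshotBuffer.toString("base64");

const roundTripUtf8 = Buffer.from(asUtf8, "utf8");
const roundTripBase64 = Buffer.from(asBase64, "base64");

console.log(asBase64); // "iVBORw0KGgo="
console.log(roundTripBase64.equals(screenshotBuffer)); // true
console.log(roundTripUtf8.equals(screenshotBuffer)); // false
```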

@shrey150
Contributor Author

@greptileai

@greptile-apps greptile-apps bot left a comment

7 files reviewed, 1 comment

Comment on lines +917 to +921
const screenshotX = (x / 1000) * this.actualScreenshotSize.width;
const screenshotY = (y / 1000) * this.actualScreenshotSize.height;
const scaleX = this.currentViewport.width / this.actualScreenshotSize.width;
const scaleY =
this.currentViewport.height / this.actualScreenshotSize.height;

style: redundant intermediate variables - can simplify to direct calculation

Suggested change
- const screenshotX = (x / 1000) * this.actualScreenshotSize.width;
- const screenshotY = (y / 1000) * this.actualScreenshotSize.height;
- const scaleX = this.currentViewport.width / this.actualScreenshotSize.width;
- const scaleY =
-   this.currentViewport.height / this.actualScreenshotSize.height;
+ const scaleX = this.currentViewport.width / this.actualScreenshotSize.width;
+ const scaleY =
+   this.currentViewport.height / this.actualScreenshotSize.height;
+
+ return {
+   x: Math.floor((x / 1000) * this.actualScreenshotSize.width * scaleX),
+   y: Math.floor((y / 1000) * this.actualScreenshotSize.height * scaleY),
+ };

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Path: packages/core/lib/v3/agent/GoogleCUAClient.ts
Line: 917:921

Member

@pirate pirate left a comment

nice typo catch too

@shrey150 shrey150 merged commit bd2db92 into main Dec 19, 2025
16 checks passed
miguelg719 pushed a commit that referenced this pull request Dec 27, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/[email protected]

### Patch Changes

- [#1461](#1461)
[`0f3991e`](0f3991e)
Thanks [@tkattkat](https://github.com/tkattkat)! - Move hybrid mode out
of experimental

- [#1433](#1433)
[`e0e22e0`](e0e22e0)
Thanks [@tkattkat](https://github.com/tkattkat)! - Put hybrid mode
behind experimental

- [#1456](#1456)
[`f261051`](f261051)
Thanks [@shrey150](https://github.com/shrey150)! - Invoke page.hover for
agent move action

- [#1473](#1473)
[`e021674`](e021674)
Thanks [@shrey150](https://github.com/shrey150)! - Add safety
confirmation support for OpenAI + Google CUA

- [#1399](#1399)
[`6a5496f`](6a5496f)
Thanks [@tkattkat](https://github.com/tkattkat)! - Ensure cua agent is
killed when stagehand.close is called

- [#1436](#1436)
[`fea1700`](fea1700)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix auto-load key
for act/extract/observe parametrized models on api

- [#1439](#1439)
[`5b288d9`](5b288d9)
Thanks [@tkattkat](https://github.com/tkattkat)! - Remove base64 from
agent actions array ( still present in messages object )

- [#1408](#1408)
[`e822f5a`](e822f5a)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow for
act() cache hit when variable values change

- [#1472](#1472)
[`638efc7`](638efc7)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: agent
cache not refreshed on action failure

- [#1424](#1424)
[`a890f16`](a890f16)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix:
"Error: -32000 Failed to convert response to JSON: CBOR: stack limit
exceeded"

- [#1418](#1418)
[`934f492`](934f492)
Thanks [@miguelg719](https://github.com/miguelg719)! - Cleanup handlers
and bus listeners on close

- [#1430](#1430)
[`bd2db92`](bd2db92)
Thanks [@shrey150](https://github.com/shrey150)! - Fix CUA model
coordinate translation

- [#1465](#1465)
[`51e0170`](51e0170)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add media
resolution high provider option to gemini 3 hybrid agent

- [#1431](#1431)
[`05f5580`](05f5580)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update the cache
handling for agent

- [#1432](#1432)
[`f56a9c2`](f56a9c2)
Thanks [@tkattkat](https://github.com/tkattkat)! - Deprecate cua: true
in favor of mode: "cua"

- [#1406](#1406)
[`b40ae11`](b40ae11)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
hovering with coordinates ( page.hover )

- [#1407](#1407)
[`0d2b398`](0d2b398)
Thanks [@tkattkat](https://github.com/tkattkat)! - Clean up page methods

- [#1412](#1412)
[`cd01f29`](cd01f29)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: load
GOOGLE_API_KEY from .env

- [#1462](#1462)
[`a734fca`](a734fca)
Thanks [@shrey150](https://github.com/shrey150)! - fix: correctly pass
userDataDir to chrome launcher

- [#1466](#1466)
[`b342acf`](b342acf)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move
playwright to optional dependencies

- [#1440](#1440)
[`2987cd1`](2987cd1)
Thanks [@tkattkat](https://github.com/tkattkat)! - [Feature] support
excluding tools from agent

- [#1455](#1455)
[`dfab1d5`](dfab1d5)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update
aisdk client to better enforce structured output with deepseek models

- [#1428](#1428)
[`4d71162`](4d71162)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add "hybrid" mode to
stagehand agent

## @browserbasehq/[email protected]

### Minor Changes

- [#1459](#1459)
[`abb3469`](abb3469)
Thanks [@monadoid](https://github.com/monadoid)! - Added building of
binaries

- [#1457](#1457)
[`5fc1281`](5fc1281)
Thanks [@monadoid](https://github.com/monadoid)! - First changeset for
stagehand-server

- [#1469](#1469)
[`d634d45`](d634d45)
Thanks [@monadoid](https://github.com/monadoid)! - Bump to test binary
builds

### Patch Changes

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- [#1373](#1373)
[`cadd192`](cadd192)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update screenshot
collector in agent evals cli

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Bastaff commented Jan 12, 2026

We're still experiencing issues with this while using a responsive live view container.
Agent clicks at the wrong coordinates.

Package versions:
"@browserbasehq/sdk": "^2.6.0",
"@browserbasehq/stagehand": "^3.0.7",

Agent config:
modelName: "google/gemini-2.5-computer-use-preview-10-2025",

Stagehand config:
modelName: "google/gemini-2.5-flash",

@miguelg719
Collaborator

hey @Bastaff can you share your stagehand config, including the browserbaseSessionCreateParams? I'll take a look

Bastaff commented Jan 12, 2026

Here you go @miguelg719
Changing the resolution in this code snippet does not change the results.
Changing the resolution of the iframe that shows the live view also doesn't change the results.

async function initStageHand(workflow, config) {
    const logger = createLogger(workflow);

    const stagehandConfig = {
        env: "BROWSERBASE",
        verbose: process.env.NODE_ENV === "development" ? 2 : 1,
        logger: (logLine) => logger.log(logLine),
        model: {
            modelName: "google/gemini-2.5-flash",
            apiKey: process.env.GOOGLE_API_KEY
        },
        domSettleTimeout: 30_000,
        experimental: true,
        useApi: false,
    };
    
    if (config.provider === "BROWSERBASE") {
        stagehandConfig.env = "BROWSERBASE";
        stagehandConfig.apiKey = process.env.BROWSERBASE_API_KEY;
        stagehandConfig.projectId = process.env.BROWSERBASE_PROJECT_ID;
        stagehandConfig.browserbaseSessionID = config.sessionId;
    } 
    
    const stagehand = new Stagehand(stagehandConfig);
    await stagehand.init();
    return stagehand;
}


async function createContext(userId = "anon") {
    try {
        const context = await bb.contexts.create({
            projectId: process.env.BROWSERBASE_PROJECT_ID,
        });
        return context;
    } catch (e) {
        console.error("Error creating Browserbase context:", e);
        throw e;
    }
}

async function createSession(userId, contextId) {
    if (!contextId) {
        const ctx = await createContext(userId);
        contextId = ctx.id;
    }

    const session = await bb.sessions.create({
        projectId: process.env.BROWSERBASE_PROJECT_ID,
        userMetadata: { userId },
        browserSettings: {
            solveCaptchas: true,
            viewport: { width: 1288, height: 711 },
            context: {
                id: contextId,
                persist: true
            },
            advancedStealth: true,
        },
        timeout: 10 * 60,
        keepAlive: true,
        proxies: true,
    });

    await sleep(3000);
    return session;
}
