fix: CUA coordinating scaling issue #1430
🦋 Changeset detected
Latest commit: cabead9
The changes in this PR will be included in the next version bump.
This PR includes changesets to release 3 packages
Greptile Summary
Fixes CUA coordinate mis-scaling in advanced stealth mode by tracking actual screenshot dimensions separately from viewport dimensions and implementing a two-step coordinate transformation.
Key Changes:
Potential Gap:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Handler as v3CuaAgentHandler
participant Page as Browser Page
participant Client as CUA Client<br/>(Google/OpenAI/Microsoft)
participant Model as AI Model
Note over Handler,Model: Screenshot Capture & Size Detection
Handler->>Page: page.screenshot()
Page-->>Handler: PNG Buffer (e.g., 1288x711)
Handler->>Handler: getPNGDimensions(buffer)
Note right of Handler: Reads PNG header<br/>bytes 16-23 for dimensions
Handler->>Client: setScreenshotSize(1288, 711)
Note right of Client: Stores actualScreenshotSize<br/>separate from viewport
Handler->>Client: setViewport(2560, 1305)
Note right of Client: Stores currentViewport<br/>(browser reports larger size)
Handler->>Model: Send base64 screenshot
Note over Handler,Model: Model Returns Coordinates
Model-->>Client: Coordinates in model space<br/>(Google: 0-1000, Others: screenshot space)
Note over Handler,Model: Two-Step Coordinate Scaling
Client->>Client: normalizeCoordinates()
Note right of Client: Step 1: Model → Screenshot<br/>Google: (x/1000) * 1288<br/>Others: x (already in screenshot space)
Note right of Client: Step 2: Screenshot → Viewport<br/>(screenshotX * 2560/1288)<br/>(screenshotY * 1305/711)
Client->>Handler: Scaled coordinates (viewport space)
Handler->>Page: click(x, y)
Note right of Page: Click at correct position<br/>in 2560x1305 viewport
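The two-step scaling in the diagram can be sketched as below. This is a minimal illustration and not the code from the PR: `Size`, `Point`, `modelToScreenshot`, and `screenshotToViewport` are hypothetical names, and the 1288x711 / 2560x1305 sizes are the example values from the diagram.

```typescript
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

// Step 1: model space -> screenshot pixels.
// `modelRange` is 1000 for a model that answers on a 0-1000 grid (Google in
// the diagram). Leave it undefined when the model already answers in
// screenshot pixels.
function modelToScreenshot(p: Point, screenshot: Size, modelRange?: number): Point {
  if (modelRange === undefined) return { x: p.x, y: p.y };
  return {
    x: (p.x / modelRange) * screenshot.width,
    y: (p.y / modelRange) * screenshot.height,
  };
}

// Step 2: screenshot pixels -> viewport pixels.
// Multiplying before dividing keeps exact integer ratios exact in floating point.
function screenshotToViewport(p: Point, screenshot: Size, viewport: Size): Point {
  return {
    x: Math.floor((p.x * viewport.width) / screenshot.width),
    y: Math.floor((p.y * viewport.height) / screenshot.height),
  };
}

const screenshot: Size = { width: 1288, height: 711 };
const viewport: Size = { width: 2560, height: 1305 };

// A 0-1000 model pointing at the middle of the screenshot.
const googleClick = screenshotToViewport(
  modelToScreenshot({ x: 500, y: 500 }, screenshot, 1000),
  screenshot,
  viewport,
);

// A model that already answers in screenshot pixels.
const pixelClick = screenshotToViewport(
  modelToScreenshot({ x: 644, y: 355 }, screenshot),
  screenshot,
  viewport,
);

console.log(googleClick, pixelClick);
```

When the screenshot and the viewport have the same size, step 2 is the identity, so the split only matters in setups like advanced stealth where the two differ.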
2 files reviewed, 1 comment
1 issue found across 2 files
Prompt for AI agents (all 1 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="packages/core/lib/v3/handlers/v3CuaAgentHandler.ts">
<violation number="1" location="packages/core/lib/v3/handlers/v3CuaAgentHandler.ts:607">
P2: Breaking change: Event name was changed from `agent_screensot_taken_event` to `agent_screenshot_taken_event`, but existing listeners in `onlineMind2Web.ts` still subscribe to the old misspelled name. This will cause those listeners to miss events from this handler.</violation>
</file>
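The mismatch flagged above can be reproduced in isolation. The sketch below uses Node's built-in `EventEmitter` as a stand-in for the project's event bus, which is an assumption; only the two event names come from the review comment.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for the real bus (assumption: the project may use a different emitter).
const bus = new EventEmitter();

let received = 0;

// A listener still subscribed under the old, misspelled name.
bus.on("agent_screensot_taken_event", () => {
  received += 1;
});

// The handler now emits under the corrected name.
// emit() returns false when no listener is registered for that name.
const hadListeners = bus.emit("agent_screenshot_taken_event", Buffer.alloc(0));

console.log({ hadListeners, received });
```

Renaming the event and its subscribers in the same change, or emitting under both names for one release, would avoid the silent miss.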
is this issue isolated on google cua or does it extend to other cua providers? @shrey150
@miguelg719 this doesn't affect Microsoft CUA, but I'll investigate with both Anthropic and OpenAI CUA
3 files reviewed, 2 comments
7 files reviewed, 1 comment
Convert screenshotBuffer to base64 string before passing to captureScreenshot(). Previously, Buffer was passed directly which caused UTF-8 encoding instead of base64, resulting in corrupted screenshot data sent to LLM APIs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
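The difference between the two encodings is easy to see on the first eight bytes of any PNG. This is a small sketch using Node's `Buffer`; the variable names are illustrative and `captureScreenshot()` itself is not reproduced here.

```typescript
// The 8-byte PNG signature, standing in for a full screenshot buffer.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decoding binary data as UTF-8 (the default for toString()) is lossy:
// 0x89 is not valid UTF-8 on its own and is replaced with U+FFFD.
const asUtf8 = pngSignature.toString();
const utf8RoundTrip = Buffer.from(asUtf8, "utf8");

// base64 is a reversible text encoding of the same bytes.
const asBase64 = pngSignature.toString("base64");
const base64RoundTrip = Buffer.from(asBase64, "base64");

console.log(asBase64);
console.log(utf8RoundTrip.equals(pngSignature), base64RoundTrip.equals(pngSignature));
```

Any code path that turns the buffer into a string without naming an encoding will therefore hand the model API bytes that no longer decode as an image.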
7 files reviewed, 1 comment
const screenshotX = (x / 1000) * this.actualScreenshotSize.width;
const screenshotY = (y / 1000) * this.actualScreenshotSize.height;
const scaleX = this.currentViewport.width / this.actualScreenshotSize.width;
const scaleY =
  this.currentViewport.height / this.actualScreenshotSize.height;
style: redundant intermediate variables - can simplify to direct calculation
- const screenshotX = (x / 1000) * this.actualScreenshotSize.width;
- const screenshotY = (y / 1000) * this.actualScreenshotSize.height;
- const scaleX = this.currentViewport.width / this.actualScreenshotSize.width;
- const scaleY =
-   this.currentViewport.height / this.actualScreenshotSize.height;
+ const scaleX = this.currentViewport.width / this.actualScreenshotSize.width;
+ const scaleY =
+   this.currentViewport.height / this.actualScreenshotSize.height;
+ return {
+   x: Math.floor((x / 1000) * this.actualScreenshotSize.width * scaleX),
+   y: Math.floor((y / 1000) * this.actualScreenshotSize.height * scaleY),
+ };
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/lib/v3/agent/GoogleCUAClient.ts
Line: 917:921
Comment:
**style:** redundant intermediate variables - can simplify to direct calculation
```suggestion
const scaleX = this.currentViewport.width / this.actualScreenshotSize.width;
const scaleY =
this.currentViewport.height / this.actualScreenshotSize.height;
return {
x: Math.floor((x / 1000) * this.actualScreenshotSize.width * scaleX),
y: Math.floor((y / 1000) * this.actualScreenshotSize.height * scaleY),
};
```
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
pirate left a comment
nice typo catch too
This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/[email protected]

### Patch Changes

- [#1461](#1461) [`0f3991e`](0f3991e) Thanks [@tkattkat](https://github.com/tkattkat)! - Move hybrid mode out of experimental
- [#1433](#1433) [`e0e22e0`](e0e22e0) Thanks [@tkattkat](https://github.com/tkattkat)! - Put hybrid mode behind experimental
- [#1456](#1456) [`f261051`](f261051) Thanks [@shrey150](https://github.com/shrey150)! - Invoke page.hover for agent move action
- [#1473](#1473) [`e021674`](e021674) Thanks [@shrey150](https://github.com/shrey150)! - Add safety confirmation support for OpenAI + Google CUA
- [#1399](#1399) [`6a5496f`](6a5496f) Thanks [@tkattkat](https://github.com/tkattkat)! - Ensure cua agent is killed when stagehand.close is called
- [#1436](#1436) [`fea1700`](fea1700) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix auto-load key for act/extract/observe parametrized models on api
- [#1439](#1439) [`5b288d9`](5b288d9) Thanks [@tkattkat](https://github.com/tkattkat)! - Remove base64 from agent actions array ( still present in messages object )
- [#1408](#1408) [`e822f5a`](e822f5a) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow for act() cache hit when variable values change
- [#1472](#1472) [`638efc7`](638efc7) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: agent cache not refreshed on action failure
- [#1424](#1424) [`a890f16`](a890f16) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: "Error: -32000 Failed to convert response to JSON: CBOR: stack limit exceeded"
- [#1418](#1418) [`934f492`](934f492) Thanks [@miguelg719](https://github.com/miguelg719)! - Cleanup handlers and bus listeners on close
- [#1430](#1430) [`bd2db92`](bd2db92) Thanks [@shrey150](https://github.com/shrey150)! - Fix CUA model coordinate translation
- [#1465](#1465) [`51e0170`](51e0170) Thanks [@miguelg719](https://github.com/miguelg719)! - Add media resolution high provider option to gemini 3 hybrid agent
- [#1431](#1431) [`05f5580`](05f5580) Thanks [@tkattkat](https://github.com/tkattkat)! - Update the cache handling for agent
- [#1432](#1432) [`f56a9c2`](f56a9c2) Thanks [@tkattkat](https://github.com/tkattkat)! - Deprecate cua: true in favor of mode: "cua"
- [#1406](#1406) [`b40ae11`](b40ae11) Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for hovering with coordinates ( page.hover )
- [#1407](#1407) [`0d2b398`](0d2b398) Thanks [@tkattkat](https://github.com/tkattkat)! - Clean up page methods
- [#1412](#1412) [`cd01f29`](cd01f29) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: load GOOGLE_API_KEY from .env
- [#1462](#1462) [`a734fca`](a734fca) Thanks [@shrey150](https://github.com/shrey150)! - fix: correctly pass userDataDir to chrome launcher
- [#1466](#1466) [`b342acf`](b342acf) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move playwright to optional dependencies
- [#1440](#1440) [`2987cd1`](2987cd1) Thanks [@tkattkat](https://github.com/tkattkat)! - [Feature] support excluding tools from agent
- [#1455](#1455) [`dfab1d5`](dfab1d5) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update aisdk client to better enforce structured output with deepseek models
- [#1428](#1428) [`4d71162`](4d71162) Thanks [@tkattkat](https://github.com/tkattkat)! - Add "hybrid" mode to stagehand agent

## @browserbasehq/[email protected]

### Minor Changes

- [#1459](#1459) [`abb3469`](abb3469) Thanks [@monadoid](https://github.com/monadoid)! - Added building of binaries
- [#1457](#1457) [`5fc1281`](5fc1281) Thanks [@monadoid](https://github.com/monadoid)! - First changeset for stagehand-server
- [#1469](#1469) [`d634d45`](d634d45) Thanks [@monadoid](https://github.com/monadoid)! - Bump to test binary builds

### Patch Changes

- Updated dependencies \[[`0f3991e`](0f3991e), [`e0e22e0`](e0e22e0), [`f261051`](f261051), [`e021674`](e021674), [`6a5496f`](6a5496f), [`fea1700`](fea1700), [`5b288d9`](5b288d9), [`e822f5a`](e822f5a), [`638efc7`](638efc7), [`a890f16`](a890f16), [`934f492`](934f492), [`bd2db92`](bd2db92), [`51e0170`](51e0170), [`05f5580`](05f5580), [`f56a9c2`](f56a9c2), [`b40ae11`](b40ae11), [`0d2b398`](0d2b398), [`cd01f29`](cd01f29), [`a734fca`](a734fca), [`b342acf`](b342acf), [`2987cd1`](2987cd1), [`dfab1d5`](dfab1d5), [`4d71162`](4d71162)]:
  - @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- [#1373](#1373) [`cadd192`](cadd192) Thanks [@tkattkat](https://github.com/tkattkat)! - Update screenshot collector in agent evals cli
- Updated dependencies \[[`0f3991e`](0f3991e), [`e0e22e0`](e0e22e0), [`f261051`](f261051), [`e021674`](e021674), [`6a5496f`](6a5496f), [`fea1700`](fea1700), [`5b288d9`](5b288d9), [`e822f5a`](e822f5a), [`638efc7`](638efc7), [`a890f16`](a890f16), [`934f492`](934f492), [`bd2db92`](bd2db92), [`51e0170`](51e0170), [`05f5580`](05f5580), [`f56a9c2`](f56a9c2), [`b40ae11`](b40ae11), [`0d2b398`](0d2b398), [`cd01f29`](cd01f29), [`a734fca`](a734fca), [`b342acf`](b342acf), [`2987cd1`](2987cd1), [`dfab1d5`](dfab1d5), [`4d71162`](4d71162)]:
  - @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
We're still experiencing issues with this while using a responsive live view container. Package versions: Agent config: Stagehand config:
hey @Bastaff can you share your stagehand config, including the
Here you go @miguelg719

async function initStageHand(workflow, config) {
  const logger = createLogger(workflow);
  const stagehandConfig = {
    env: "BROWSERBASE",
    verbose: process.env.NODE_ENV === "development" ? 2 : 1,
    logger: (logLine) => logger.log(logLine),
    model: {
      modelName: "google/gemini-2.5-flash",
      apiKey: process.env.GOOGLE_API_KEY
    },
    domSettleTimeout: 30_000,
    experimental: true,
    useApi: false,
  };
  if (config.provider === "BROWSERBASE") {
    stagehandConfig.env = "BROWSERBASE";
    stagehandConfig.apiKey = process.env.BROWSERBASE_API_KEY;
    stagehandConfig.projectId = process.env.BROWSERBASE_PROJECT_ID;
    stagehandConfig.browserbaseSessionID = config.sessionId;
  }
  const stagehand = new Stagehand(stagehandConfig);
  await stagehand.init();
  return stagehand;
}

async function createContext(userId = "anon") {
  try {
    const context = await bb.contexts.create({
      projectId: process.env.BROWSERBASE_PROJECT_ID,
    });
    return context;
  } catch (e) {
    console.error("Error creating Browserbase context:", e);
    throw e;
  }
}

async function createSession(userId, contextId) {
  if (!contextId) {
    const ctx = await createContext(userId);
    contextId = ctx.id;
  }
  const session = await bb.sessions.create({
    projectId: process.env.BROWSERBASE_PROJECT_ID,
    userMetadata: { userId },
    browserSettings: {
      solveCaptchas: true,
      viewport: { width: 1288, height: 711 },
      context: {
        id: contextId,
        persist: true
      },
      advancedStealth: true,
    },
    timeout: 10 * 60,
    keepAlive: true,
    proxies: true,
  });
  await sleep(3000);
  return session;
}
why
This forced users to disable advanced stealth mode (losing bot protection) to get accurate clicks.
what changed
CUA Clients:
- `actualScreenshotSize` field to track screenshot dimensions
- `setScreenshotSize()` method to update dimensions
- `normalizeCoordinates()` with two-step scaling:
  - Step 1: model → screenshot
  - Step 2: screenshot → viewport
v3CuaAgentHandler.ts:
- `getPNGDimensions()` helper to read PNG dimensions from buffer (header-only, no decode)
- `setScreenshotSize()`, `captureAndSendScreenshot()`
test plan
Wrote test scripts locally and ensured that coordinate positioning is now correct
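For reference, a header-only PNG size read like the `getPNGDimensions()` helper listed under "what changed" could look like the sketch below. It is not the PR's implementation: `readPngDimensions` is a hypothetical name, and the only facts relied on are the PNG layout (8-byte signature, then the IHDR chunk with big-endian width at bytes 16-19 and height at bytes 20-23).

```typescript
// Hypothetical helper: reads width and height from the PNG header without
// decoding any pixel data. Accepts any Uint8Array, including a Node Buffer.
function readPngDimensions(bytes: Uint8Array): { width: number; height: number } {
  const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  const looksLikePng =
    bytes.length >= 24 && PNG_SIGNATURE.every((b, i) => bytes[i] === b);
  if (!looksLikePng) {
    throw new Error("Not a PNG: signature missing or header truncated");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    width: view.getUint32(16, false), // big-endian
    height: view.getUint32(20, false), // big-endian
  };
}

// Build a minimal 24-byte header for a 1288x711 image to exercise the helper.
const header = new Uint8Array(24);
header.set([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], 0); // signature
header.set([0x00, 0x00, 0x00, 0x0d], 8); // IHDR data length (13)
header.set([0x49, 0x48, 0x44, 0x52], 12); // "IHDR"
const headerView = new DataView(header.buffer);
headerView.setUint32(16, 1288, false);
headerView.setUint32(20, 711, false);

const dims = readPngDimensions(header);
console.log(dims);
```

Reading 24 bytes avoids pulling in an image decoder just to learn the size the model actually saw.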
Summary by cubic
Fixes mis-scaled CUA clicks in advanced stealth by mapping model coordinates through the actual screenshot size to the viewport, and prevents corrupted images by encoding screenshots correctly.
Written for commit cabead9. Summary will update automatically on new commits.