@tkattkat
Collaborator

@tkattkat tkattkat commented Dec 5, 2025

why

  • when transitioning to v3, we did not pick up the latest version of the screenshot collector
  • the screenshot collector currently fails because v3 pages lack page.on and page.off support for the load and domcontentloaded events

what changed

  • added latest version of screenshot collector

test plan

  • ran evals in the CLI with additional logging to verify that everything works as expected

Summary by cubic

Updated the evals CLI screenshot collector to the latest version, adding image-diff filtering and a V3 event bus that emits agent screenshots. This reduces duplicate screenshots and stabilizes capture on v3 pages where navigation events are disabled.

  • New Features

    • Skip similar screenshots using MSE/SSIM thresholds with sharp.
    • Event bus integration: agents emit screenshots; collector can ingest them.
    • Non-blocking initial/final captures and safer interval capture with error handling.
  • Dependencies

    • Added sharp ^0.34.5 for image processing (evals and core).
    • Patch bump via changeset for @browserbasehq/stagehand-evals.
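
The event bus integration listed under New Features can be pictured roughly as follows. This is a minimal sketch of the publish/subscribe pattern only: `ScreenshotBus` and `MiniCollector` are hypothetical names invented for illustration, not the Stagehand V3 bus API, and the only identifier taken from the PR is the collector's `addScreenshot()` method.

```typescript
// All names here (ScreenshotBus, MiniCollector) are hypothetical and only
// illustrate the pattern; they are not the Stagehand V3 bus API.
type ScreenshotListener = (image: Uint8Array) => void;

class ScreenshotBus {
  private listeners: ScreenshotListener[] = [];

  // Register a listener and return a function that removes it again.
  on(listener: ScreenshotListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Deliver one screenshot to every current listener.
  emit(image: Uint8Array): void {
    this.listeners.forEach((listener) => listener(image));
  }
}

class MiniCollector {
  private screenshots: Uint8Array[] = [];
  private unsubscribe: (() => void) | undefined;

  // Begin ingesting screenshots that agents publish on the bus.
  start(bus: ScreenshotBus): void {
    this.unsubscribe = bus.on((image) => this.addScreenshot(image));
  }

  // Manual entry point. This sketch stores every image; per the review
  // summary, the real method also applies deduplication.
  addScreenshot(image: Uint8Array): void {
    this.screenshots.push(image);
  }

  // Detach from the bus and hand back everything collected so far.
  stop(): Uint8Array[] {
    if (this.unsubscribe) this.unsubscribe();
    this.unsubscribe = undefined;
    return this.screenshots;
  }
}

// Usage: the "agent" side only needs a reference to the bus.
const bus = new ScreenshotBus();
const collector = new MiniCollector();
collector.start(bus);
bus.emit(new Uint8Array([1, 2, 3]));
console.log(collector.stop().length); // 1
```

Routing agent screenshots through a bus like this means the collector does not depend on page navigation events, which the summary says are disabled on v3 pages.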

Written for commit f4e90f8. Summary will update automatically on new commits.

@changeset-bot

changeset-bot bot commented Dec 5, 2025

🦋 Changeset detected

Latest commit: f4e90f8

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand-evals | Patch |

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 4 files

@greptile-apps
Contributor

greptile-apps bot commented Dec 5, 2025

Greptile Overview

Greptile Summary

Updates the ScreenshotCollector utility for V3 compatibility and adds intelligent screenshot deduplication.

  • Switched from generic ScreenshotCapablePage interface to the concrete V3 Page type from @browserbasehq/stagehand
  • Added sharp library for image processing to enable screenshot deduplication via MSE (Mean Squared Error) and SSIM (Structural Similarity Index) comparisons
  • Disabled navigation event listeners (page.on("load"), etc.) since V3 pages don't support these yet - documented with TODO comment
  • Changed default captureOnNavigation from true to false to reflect current V3 limitations
  • Added new addScreenshot() method for manual screenshot additions with deduplication logic
  • Improved error handling with .catch() for async operations instead of fire-and-forget calls
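
To make the two metrics named above concrete, here is a dependency-free sketch of MSE and a simplified, single-window SSIM. It assumes both images have already been decoded and resized to equally sized grayscale buffers (the PR uses sharp for that step, which is omitted here). The function names and the use of one global window are assumptions for illustration; the collector's actual computation may differ.

```typescript
// Guard shared by both metrics: buffers must be comparable pixel-for-pixel.
function assertComparable(a: Uint8Array, b: Uint8Array): void {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("buffers must be non-empty and of equal length");
  }
}

// Mean squared error over grayscale pixels in the range 0..255.
// 0 means identical; larger values mean more pixel-level change.
function mse(a: Uint8Array, b: Uint8Array): number {
  assertComparable(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum / a.length;
}

// Simplified SSIM computed over the whole image as a single window.
// 1 means identical; lower values mean less structural similarity.
function ssim(a: Uint8Array, b: Uint8Array): number {
  assertComparable(a, b);
  const n = a.length;
  let meanA = 0;
  let meanB = 0;
  for (let i = 0; i < n; i++) {
    meanA += a[i];
    meanB += b[i];
  }
  meanA /= n;
  meanB /= n;

  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= n;
  varB /= n;
  cov /= n;

  // Standard stabilising constants for 8-bit data: (0.01 * 255)^2 and (0.03 * 255)^2.
  const c1 = (0.01 * 255) * (0.01 * 255);
  const c2 = (0.03 * 255) * (0.03 * 255);
  return ((2 * meanA * meanB + c1) * (2 * cov + c2)) /
    ((meanA * meanA + meanB * meanB + c1) * (varA + varB + c2));
}

// Usage: a uniform brightness shift of 10 gives an MSE of 10 * 10.
const flat = new Uint8Array([100, 100, 100, 100]);
const shifted = new Uint8Array([110, 110, 110, 110]);
console.log(mse(flat, shifted)); // 100
```

MSE is cheap but reacts to any pixel change, while SSIM compares means, variances, and covariance, which makes it a reasonable second opinion on whether two frames really differ.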

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations around the async final screenshot behavior
  • The changes are well-structured and address real V3 compatibility issues. The image deduplication logic is sound. One minor issue exists where the final screenshot in stop() may not be captured before returning, but this is unlikely to cause problems in practice since the collector is typically used for incremental captures.
  • packages/evals/utils/ScreenshotCollector.ts - review async final screenshot handling in stop() method
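
The concern about the final screenshot can be illustrated with a small sketch. It shows one possible way to handle it, namely making `stop()` asynchronous and awaiting the last capture, while keeping interval captures fire-and-forget with a `.catch()`. This is not the PR's implementation: `PageLike` and `AwaitingCollector` are hypothetical names, and the real collector takes the V3 `Page` type.

```typescript
// Hypothetical minimal page shape used only for this sketch.
interface PageLike {
  screenshot(): Promise<Uint8Array>;
}

class AwaitingCollector {
  private page: PageLike;
  private intervalMs: number;
  private screenshots: Uint8Array[] = [];
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(page: PageLike, intervalMs: number = 5000) {
    this.page = page;
    this.intervalMs = intervalMs;
  }

  start(): void {
    // Periodic captures are not awaited, but rejections are still handled
    // so a failed screenshot cannot surface as an unhandled rejection.
    this.timer = setInterval(() => {
      this.capture().catch((err) => console.error("interval capture failed:", err));
    }, this.intervalMs);
  }

  // Awaiting the final capture guarantees it is in the returned array.
  async stop(): Promise<Uint8Array[]> {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
    try {
      await this.capture();
    } catch (err) {
      console.error("final capture failed:", err);
    }
    return this.screenshots;
  }

  private async capture(): Promise<void> {
    this.screenshots.push(await this.page.screenshot());
  }
}
```

The trade-off is that callers must now `await collector.stop()`. A non-blocking `stop()` returns sooner but may hand back the array before the final screenshot has been added.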

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| packages/evals/utils/ScreenshotCollector.ts | 4/5 | Major update adding image deduplication via MSE/SSIM, switching to V3 Page type, and adding sharp dependency for image comparison. Minor issue with async final screenshot not being awaited in stop(). |
| packages/evals/package.json | 5/5 | Added sharp ^0.34.5 dependency for image processing used in screenshot deduplication. |
| .changeset/beige-taxes-punch.md | 5/5 | Standard changeset for patch update to stagehand-evals package. |
| pnpm-lock.yaml | 5/5 | Lock file updates for sharp 0.34.5 and its native dependencies across platforms. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant ScreenshotCollector
    participant Page
    participant Sharp

    Caller->>ScreenshotCollector: start()
    ScreenshotCollector->>Page: screenshot() [initial]
    Page-->>ScreenshotCollector: Buffer
    ScreenshotCollector->>ScreenshotCollector: store screenshot
    
    loop Every interval (5000ms default)
        ScreenshotCollector->>Page: screenshot()
        Page-->>ScreenshotCollector: Buffer
        ScreenshotCollector->>Sharp: resize & compare (MSE)
        Sharp-->>ScreenshotCollector: MSE value
        alt MSE >= threshold
            ScreenshotCollector->>Sharp: calculate SSIM
            Sharp-->>ScreenshotCollector: SSIM value
            alt SSIM < threshold (different enough)
                ScreenshotCollector->>ScreenshotCollector: store screenshot
            end
        end
    end

    Caller->>ScreenshotCollector: stop()
    ScreenshotCollector->>Page: screenshot() [final, async]
    ScreenshotCollector-->>Caller: Buffer[]
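
The decision logic inside the loop of the diagram above can be sketched as a small pure function. The metrics are passed in as parameters so the example stays self-contained. `shouldStore`, `Metric`, and `Thresholds` are hypothetical names, the threshold values in the usage lines are placeholders rather than the collector's defaults, and comparing against the previous screenshot is an assumption.

```typescript
// A metric compares two equally sized image buffers and returns a score.
type Metric = (previous: Uint8Array, next: Uint8Array) => number;

interface Thresholds {
  mse: number;  // below this, frames are treated as near-identical
  ssim: number; // at or above this, frames are treated as structurally similar
}

// Two-stage gate: run the cheap MSE check first, and compute SSIM only
// when MSE suggests the frame changed.
function shouldStore(
  previous: Uint8Array | undefined,
  next: Uint8Array,
  mseMetric: Metric,
  ssimMetric: Metric,
  thresholds: Thresholds,
): boolean {
  // Nothing to compare against yet, so the initial screenshot is kept.
  if (previous === undefined) return true;
  // Stage 1: small pixel error means a duplicate; skip without running SSIM.
  if (mseMetric(previous, next) < thresholds.mse) return false;
  // Stage 2: keep the frame only if it is structurally different enough.
  return ssimMetric(previous, next) < thresholds.ssim;
}

// Usage with stub metrics and placeholder thresholds.
const frame = new Uint8Array([0, 0, 0, 0]);
const limits: Thresholds = { mse: 30, ssim: 0.75 };
console.log(shouldStore(frame, frame, () => 5, () => 0.99, limits));  // false
console.log(shouldStore(frame, frame, () => 400, () => 0.4, limits)); // true
```

Ordering the checks this way keeps the common case, where the page has not changed between intervals, on the cheaper path.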

Contributor

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 1 comment

@miguelg719 miguelg719 force-pushed the update-screenshot-collector branch 2 times, most recently from d98af74 to 43a1cab on December 13, 2025 18:03

});

const agent = v3.agent({
  cua: true,
Collaborator

I will address conditional loading in a follow-up PR

@pirate
Member

pirate commented Dec 15, 2025

LGTM

@miguelg719 miguelg719 force-pushed the update-screenshot-collector branch from 9cd72db to 43a1cab on December 15, 2025 19:29
@miguelg719 miguelg719 force-pushed the update-screenshot-collector branch from 43a1cab to f037f6b on December 15, 2025 19:43
@miguelg719 miguelg719 merged commit cadd192 into main Dec 15, 2025
15 checks passed
miguelg719 pushed a commit that referenced this pull request Dec 27, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/[email protected]

### Patch Changes

- [#1461](#1461)
[`0f3991e`](0f3991e)
Thanks [@tkattkat](https://github.com/tkattkat)! - Move hybrid mode out
of experimental

- [#1433](#1433)
[`e0e22e0`](e0e22e0)
Thanks [@tkattkat](https://github.com/tkattkat)! - Put hybrid mode
behind experimental

- [#1456](#1456)
[`f261051`](f261051)
Thanks [@shrey150](https://github.com/shrey150)! - Invoke page.hover for
agent move action

- [#1473](#1473)
[`e021674`](e021674)
Thanks [@shrey150](https://github.com/shrey150)! - Add safety
confirmation support for OpenAI + Google CUA

- [#1399](#1399)
[`6a5496f`](6a5496f)
Thanks [@tkattkat](https://github.com/tkattkat)! - Ensure cua agent is
killed when stagehand.close is called

- [#1436](#1436)
[`fea1700`](fea1700)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix auto-load key
for act/extract/observe parametrized models on api

- [#1439](#1439)
[`5b288d9`](5b288d9)
Thanks [@tkattkat](https://github.com/tkattkat)! - Remove base64 from
agent actions array ( still present in messages object )

- [#1408](#1408)
[`e822f5a`](e822f5a)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow for
act() cache hit when variable values change

- [#1472](#1472)
[`638efc7`](638efc7)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: agent
cache not refreshed on action failure

- [#1424](#1424)
[`a890f16`](a890f16)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix:
"Error: -32000 Failed to convert response to JSON: CBOR: stack limit
exceeded"

- [#1418](#1418)
[`934f492`](934f492)
Thanks [@miguelg719](https://github.com/miguelg719)! - Cleanup handlers
and bus listeners on close

- [#1430](#1430)
[`bd2db92`](bd2db92)
Thanks [@shrey150](https://github.com/shrey150)! - Fix CUA model
coordinate translation

- [#1465](#1465)
[`51e0170`](51e0170)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add media
resolution high provider option to gemini 3 hybrid agent

- [#1431](#1431)
[`05f5580`](05f5580)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update the cache
handling for agent

- [#1432](#1432)
[`f56a9c2`](f56a9c2)
Thanks [@tkattkat](https://github.com/tkattkat)! - Deprecate cua: true
in favor of mode: "cua"

- [#1406](#1406)
[`b40ae11`](b40ae11)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
hovering with coordinates ( page.hover )

- [#1407](#1407)
[`0d2b398`](0d2b398)
Thanks [@tkattkat](https://github.com/tkattkat)! - Clean up page methods

- [#1412](#1412)
[`cd01f29`](cd01f29)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: load
GOOGLE_API_KEY from .env

- [#1462](#1462)
[`a734fca`](a734fca)
Thanks [@shrey150](https://github.com/shrey150)! - fix: correctly pass
userDataDir to chrome launcher

- [#1466](#1466)
[`b342acf`](b342acf)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move
playwright to optional dependencies

- [#1440](#1440)
[`2987cd1`](2987cd1)
Thanks [@tkattkat](https://github.com/tkattkat)! - [Feature] support
excluding tools from agent

- [#1455](#1455)
[`dfab1d5`](dfab1d5)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update
aisdk client to better enforce structured output with deepseek models

- [#1428](#1428)
[`4d71162`](4d71162)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add "hybrid" mode to
stagehand agent

## @browserbasehq/[email protected]

### Minor Changes

- [#1459](#1459)
[`abb3469`](abb3469)
Thanks [@monadoid](https://github.com/monadoid)! - Added building of
binaries

- [#1457](#1457)
[`5fc1281`](5fc1281)
Thanks [@monadoid](https://github.com/monadoid)! - First changeset for
stagehand-server

- [#1469](#1469)
[`d634d45`](d634d45)
Thanks [@monadoid](https://github.com/monadoid)! - Bump to test binary
builds

### Patch Changes

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- [#1373](#1373)
[`cadd192`](cadd192)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update screenshot
collector in agent evals cli

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>