Engineering a Playwright-Native Developer Experience: One Flag, Three Strategies

Advanced Topics, Product — Published March 19, 2026

Hello everyone! I’m Noam, an SDK developer on the Applitools JS-SDKs team. While my day-to-day focus is on core engineering, I work closely with our field teams and occasionally join technical deep-dive sessions with customers.

In these conversations, we frequently encounter questions about performance and the engineering philosophy behind our integration. Specifically, customers often ask how to make visual testing feel more “Playwright-native” and natural to developers.

In this post, I’ll share the design logic behind these architectural choices so you can apply these patterns in your own CI pipelines in a way that fits your organization’s needs.

Adding unresolved to Playwright

Integrating visual regression testing into Playwright requires combining two different status models: Playwright’s binary Pass/Fail and the visual testing concept of unresolved.

In visual testing, in addition to the passed and failed states, there’s a third state: unresolved. This state indicates that a difference was detected, but a human decision is required to determine whether it is a bug or a valid change that should be approved as the new baseline.
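To make the mismatch between the two status models concrete, here is a minimal TypeScript sketch. The `VisualStatus` type and the `toNativeOutcome` helper are hypothetical names used only for illustration; they are not part of Playwright or the Applitools SDK. The sketch shows that a purely binary runner has to collapse `unresolved` into one of its two states.

```typescript
// Playwright's own outcome for an assertion is binary.
type PlaywrightOutcome = "passed" | "failed";

// Visual testing adds a third state that needs a human decision.
// VisualStatus is a hypothetical name used only for this illustration.
type VisualStatus = "passed" | "failed" | "unresolved";

// A binary runner has no slot for "unresolved", so any detected
// difference has to be reported as a failure.
function toNativeOutcome(status: VisualStatus): PlaywrightOutcome {
  return status === "passed" ? "passed" : "failed";
}

console.log(toNativeOutcome("unresolved")); // prints: failed
```

This collapse is exactly what produces the maintenance cycle described next: every intentional UI change surfaces as a red test.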

Playwright doesn’t support this third state out of the box. Maintaining visual tests with Playwright’s native toHaveScreenshot API forces the developer into a cumbersome cycle of three separate test runs:

  1. First, the developer runs the tests to see the failure.
  2. Then, they run them again with the --update-snapshots flag to create new baseline images.
  3. Finally, most developers run a third time to confirm that everything works with the updated baseline. That isn’t always the case, because Playwright’s native comparison method (pixelmatch) tends to be very flaky, unlike Visual AI.

After this local cycle, the developer must commit the new baseline images to the repository (bloating the git history) and wait for a new CI run to provide final feedback. For dev-centered organizations that focus on feedback-loop velocity, this workflow is… suboptimal. Personally, I believe that’s one of the reasons visual testing isn’t as popular as it should be among Playwright users.

When we engineered the Applitools fixture, one of our goals was to support this unresolved state natively, without disrupting Playwright’s core lifecycle, specifically its Worker Processes and Retry mechanisms.

The solution rests on two key engineering decisions: moving rendering to the background (async architecture) and giving developers control over the exit signal and performance tradeoffs (failTestsOnDiff).

We don’t block test execution when Applitools is rendering

The core value of visual testing lies in two capabilities: AI-based comparison, which eliminates false positives, and multi-platform rendering.

Architecturally, these processes are cloud-native services.

  • AI-as-a-Service: Just like massive LLMs or other generative models, the Visual AI engine runs on specialized cloud infrastructure optimized for heavy inference. It cannot simply be “installed” on a lightweight CI agent.
  • Platform Constraints: Authentic cross-platform rendering (e.g., iOS Safari on a Linux CI agent) is physically impossible on a single local machine.

Since these operations inherently occur remotely, performing them synchronously would force the local test runner to idle while waiting for network round-trips and cloud processing.

To solve this, we designed the fixture around an asynchronous architecture:

  • Instant Capture: When eyes.check() is called, we synchronously capture the DOM and CSS resources (instead of a rasterized image). This operation is extremely fast.
  • Immediate Release: We deliberately use soft assertions. We release the Playwright test thread immediately so the functional logic can proceed to the next step or test case without blocking.
  • Background Heavy Lifting: The heavy work—uploading assets, rendering across different browsers and operating systems, and performing the AI comparison in the Applitools cloud—starts immediately in the background, managed by the Worker process.
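The three steps above can be sketched as a small capture, release, and drain pattern. This is a hypothetical TypeScript model written for illustration, not the Eyes SDK source: `BackgroundChecks`, `check`, and `drain` are assumed names, and the "cloud" is a fake function that resolves after a short delay. The point is that `check()` starts the remote work and returns without awaiting it, while a single `drain()` at the end collects every result.

```typescript
// Hypothetical model of the capture/release/background pattern.
// Names are illustrative; this is not the Eyes SDK source.
type VisualStatus = "passed" | "failed" | "unresolved";

interface Snapshot {
  name: string;
  dom: string; // serialized DOM; the real flow also collects CSS resources
}

type RenderAndCompare = (snapshot: Snapshot) => Promise<VisualStatus>;

class BackgroundChecks {
  private pending: Promise<VisualStatus>[] = [];
  private readonly renderAndCompare: RenderAndCompare;

  constructor(renderAndCompare: RenderAndCompare) {
    this.renderAndCompare = renderAndCompare;
  }

  // From the test's point of view this is synchronous: capture, start the
  // remote work, and return without awaiting it (a "soft" check).
  check(name: string, dom: string): void {
    this.pending.push(this.renderAndCompare({ name, dom }));
  }

  get queued(): number {
    return this.pending.length;
  }

  // Awaited once, after the worker has finished running its tests.
  async drain(): Promise<VisualStatus[]> {
    const results = await Promise.all(this.pending);
    this.pending = [];
    return results;
  }
}

// Usage: a fake "cloud" that takes 50 ms per snapshot.
const checks = new BackgroundChecks(
  (s) =>
    new Promise<VisualStatus>((resolve) =>
      setTimeout(() => resolve(s.dom.includes("new-banner") ? "unresolved" : "passed"), 50),
    ),
);
checks.check("home", "<main>welcome</main>");
checks.check("promo", "<main>new-banner</main>");
// Both check() calls returned immediately; the work is still in flight.
checks.drain().then((results) => console.log(results)); // [ 'passed', 'unresolved' ]
```

Because the remote work starts inside `check()` rather than inside `drain()`, rendering overlaps with the remaining functional steps, which is where the speedup comes from.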

The “Draining Queue” Effect

This architecture explains why the Playwright Worker sometimes remains active after the final test completes.

The background tasks are limited only by your account’s concurrency settings and the screenshot size. For example, when rendering a 10,000 px page on a small mobile device, the rendering infrastructure might need time for scrolling and stitching. If your functional tests execute faster than the background workers can process the queue (rendering and comparing), the Worker process stays alive at the end solely to “drain the queue” and ensure data integrity.

This design lets your test logic run at maximum speed by offloading the processing cost to the background. However, it can cause friction and frustration, because developers see workers “hanging” after the tests have completed. If you run into this, our support team is here to advise and assist: we can investigate execution logs and, if needed, make custom suggestions to tailor Eyes-Playwright to your needs.
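To see why concurrency and page size decide how long a worker lingers, here is a back-of-the-envelope TypeScript model. It is a hypothetical FIFO simulation with a fixed number of concurrent slots, not how the Applitools service actually schedules work. `lingerAfterTests`, `RenderJob`, and the 10-second render time are assumptions chosen for illustration.

```typescript
// Hypothetical model of the "draining queue" effect; not the real scheduler.
interface RenderJob {
  submittedAt: number; // seconds since the run started, when eyes.check() fired
  durationSec: number; // assumed render + compare time (longer for tall pages)
}

function lingerAfterTests(jobs: RenderJob[], concurrency: number, testsEndAt: number): number {
  // When each concurrent slot becomes free again.
  const slotFreeAt: number[] = [];
  for (let i = 0; i < concurrency; i++) slotFreeAt.push(0);

  let lastDone = 0;
  const ordered = [...jobs].sort((a, b) => a.submittedAt - b.submittedAt);
  for (const job of ordered) {
    // First-in, first-out: take the slot that frees up earliest.
    let slot = 0;
    for (let i = 1; i < slotFreeAt.length; i++) {
      if (slotFreeAt[i] < slotFreeAt[slot]) slot = i;
    }
    const start = Math.max(job.submittedAt, slotFreeAt[slot]);
    slotFreeAt[slot] = start + job.durationSec;
    lastDone = Math.max(lastDone, slotFreeAt[slot]);
  }
  // Time the worker stays alive after its last test just to drain the queue.
  return Math.max(0, lastDone - testsEndAt);
}

// Four checks fired at t = 1..4 s, each needing 10 s; tests end at t = 5 s.
const jobs: RenderJob[] = [1, 2, 3, 4].map((t) => ({ submittedAt: t, durationSec: 10 }));
console.log(lingerAfterTests(jobs, 2, 5)); // 17
console.log(lingerAfterTests(jobs, 4, 5)); // 9
```

Doubling the concurrent slots roughly halves the tail in this toy example, while longer per-screenshot render times stretch it; both knobs match the two limits named above.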

Solving the Matrix Problem

Standard Playwright documentation recommends defining multiple projects in playwright.config.ts to cover different browsers (Chromium, Firefox, WebKit) and various viewport sizes.

While this ensures coverage, it introduces a linear performance penalty (O(N), where N is the number of browser and viewport combinations). To test three browsers across two viewports, your CI must execute the functional logic (clicks, waits, navigation) six times, which means six times the load on the CI machine and on the testing environment.
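The matrix above can be made explicit in TypeScript. The sketch below builds one Playwright project entry per browser and viewport pair. `browserName` and `viewport` are real Playwright `use` options, but the types are redeclared locally (a subset of the real config types) so the sketch runs without dependencies, and `buildProjectMatrix` is a hypothetical helper.

```typescript
// Subset of the fields a Playwright project entry uses, redeclared locally.
type BrowserName = "chromium" | "firefox" | "webkit";

interface Viewport {
  width: number;
  height: number;
}

interface ProjectEntry {
  name: string;
  use: { browserName: BrowserName; viewport: Viewport };
}

// One project per browser/viewport pair: N = browsers x viewports.
function buildProjectMatrix(browsers: BrowserName[], viewports: Viewport[]): ProjectEntry[] {
  const projects: ProjectEntry[] = [];
  for (const browserName of browsers) {
    for (const viewport of viewports) {
      projects.push({
        name: `${browserName}-${viewport.width}x${viewport.height}`,
        use: { browserName, viewport },
      });
    }
  }
  return projects;
}

const projects = buildProjectMatrix(
  ["chromium", "firefox", "webkit"],
  [
    { width: 1280, height: 720 },
    { width: 390, height: 844 },
  ],
);
// Every test's functional logic now runs once per project.
console.log(projects.length); // 6
```

In a real playwright.config.ts, an array like this would be passed as `projects` to `defineConfig` from `@playwright/test`.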

We recommend shifting this workload to the Ultrafast Grid (UFG).

In this mode, you execute the Playwright test once, typically on Chromium. We upload the DOM state, and our cloud infrastructure renders that state across all configured browsers and viewports in parallel.

This transforms an O(N) execution problem into an O(1) execution problem, significantly shortening the feedback loop.
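A tiny cost model shows what the O(N) to O(1) change means for the CI machine. This is a hypothetical sketch that counts only local executions of the functional logic. It assumes a 10-minute suite and ignores cloud rendering time and any sharding across CI agents. `localExecutions` and `localCiMinutes` are illustrative names.

```typescript
// Hypothetical cost model: how often the functional logic runs locally.
interface Matrix {
  browsers: number;
  viewports: number;
}

function localExecutions(m: Matrix, useUltrafastGrid: boolean): number {
  // Native projects: one full run per combination (O(N)).
  // Ultrafast Grid: one run; the N renders happen in the cloud (O(1) locally).
  return useUltrafastGrid ? 1 : m.browsers * m.viewports;
}

function localCiMinutes(suiteMinutes: number, m: Matrix, useUltrafastGrid: boolean): number {
  return suiteMinutes * localExecutions(m, useUltrafastGrid);
}

const matrix: Matrix = { browsers: 3, viewports: 2 };
console.log(localCiMinutes(10, matrix, false)); // 60
console.log(localCiMinutes(10, matrix, true)); // 10
```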

The Strategy: failTestsOnDiff

Since the actual comparison happens asynchronously and may complete after the test logic finishes, we need a mechanism to map the visual result back to a Playwright status.

This is controlled by the failTestsOnDiff flag. It’s not just a boolean; it’s a strategic choice for your CI pipeline.

Strategy A: failTestsOnDiff: false

  • The Logic: This is the configuration our own Front-End team uses. We believe that Visual Change ≠ Test Failure.
  • Behavior: The Playwright test passes (Green). The unresolved status is reported externally via our SCM integration (GitHub/GitLab).
  • Why: Retrying a visual test is computationally wasteful: the pixels won’t change on the second run. By keeping the test “Green,” we avoid triggering Playwright’s retry mechanism. The decision moves to the Pull Request, where it belongs.

Read more about SCM integration or hop directly to our GitHub, Bitbucket, GitLab, or Azure DevOps articles.

Strategy B: failTestsOnDiff: 'afterAll'

  • The Logic: You need a “Red” pipeline to block deployment, but you want to avoid the noise of retries and gain a significant performance improvement.
  • Behavior: Individual tests pass, but the Worker Process exits with a failure code if any diffs were found in the suite.
  • Why: This provides a hard gatekeeper for the build status. The Eyes rendering farms keep processing visual test results in the background without blocking the execution thread, so the worker can move on and handle more tests efficiently.

Strategy C: failTestsOnDiff: 'afterEach'

  • The Logic: Immediate feedback loop.
  • Behavior: Fails the test immediately in the afterEach hook.
  • Why: Best for local development, where you want to see the failure immediately in the console. It is also useful if you use Playwright’s trace: 'retain-on-failure' setting, because it ensures traces are preserved for unresolved visual assertions. Not recommended for CI because of the retry loops described above.
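The three strategies can be summarized as a mapping from visual results to what Playwright and the pipeline see. The TypeScript below is a hypothetical model built only from the behavior described above; it is not the fixture’s source code. `mapOutcome` and `pipelineRed` are illustrative names. Where the flag is actually set in your configuration should be checked against the Applitools documentation.

```typescript
// Hypothetical model of the three failTestsOnDiff strategies.
type VisualStatus = "passed" | "failed" | "unresolved";
type FailTestsOnDiff = false | "afterAll" | "afterEach";
type TestStatus = "passed" | "failed";

interface RunOutcome {
  testStatuses: { [title: string]: TestStatus };
  pipelineRed: boolean; // does the run end with a non-zero exit code?
}

function mapOutcome(mode: FailTestsOnDiff, visual: { [title: string]: VisualStatus }): RunOutcome {
  const titles = Object.keys(visual);
  const hasDiff = titles.some((t) => visual[t] !== "passed");
  const testStatuses: { [title: string]: TestStatus } = {};
  for (const t of titles) {
    // Only afterEach fails the individual test, which is what triggers
    // Playwright retries and retain-on-failure traces.
    testStatuses[t] = mode === "afterEach" && visual[t] !== "passed" ? "failed" : "passed";
  }
  // afterEach: red because a test failed. afterAll: tests stay green, but the
  // worker exits with a failure code. false: green; review happens in the PR.
  const pipelineRed = mode !== false && hasDiff;
  return { testStatuses, pipelineRed };
}

const visual: { [title: string]: VisualStatus } = { home: "passed", checkout: "unresolved" };
console.log(mapOutcome(false, visual).pipelineRed); // false
console.log(mapOutcome("afterAll", visual).pipelineRed); // true
console.log(mapOutcome("afterEach", visual).testStatuses["checkout"]); // failed
```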

TL;DR – When to use each setting

failTestsOnDiff: 'afterEach'
  • Performance: Less performant. The Playwright worker waits after each test for all renders to complete and for Visual AI to compare the results.
  • Observability: Best. The Applitools reporter shows all statuses correctly; other reporters treat unresolved tests as failing.
  • Best fit: Local testing.

failTestsOnDiff: 'afterAll'
  • Performance: Best performance. The Playwright workers collect the resources and manage the rendering and Visual AI comparisons in the background.
  • Observability: Good. The Applitools reporter shows all statuses correctly; other reporters treat unresolved tests as passing. You will get a failure of the worker process, and other reporters won’t link it to a specific test case.
  • Best fit: Local testing AND CI environments without SCM integration.

failTestsOnDiff: false
  • Performance: Best performance, similar to afterAll.
  • Observability: Great in the pull request (if SCM integration is enabled). The Applitools reporter reflects the tests perfectly; other reporters treat unresolved tests as passing.
  • Best fit: CI environments with SCM integration.
CI environments with SCM integration

Closing the Visibility Gap: The Custom Reporter

If you adopt Strategy A (false) or Strategy B (afterAll), you introduce a secondary challenge: Visibility.

Since Playwright technically marks these tests as Passed to avoid retries, the standard Playwright HTML Report will show them as “Green,” potentially masking unresolved visual differences that require attention.

To bridge this gap without forcing developers to switch context, we developed a Custom Applitools Reporter.

This reporter extends the standard Playwright HTML report. It injects the actual visual status (Passed, Failed, or Unresolved) directly into the test results view.

  • True Status: You see which tests have visual diffs, even if the Playwright exit code was successful.
  • Direct Links: It provides a direct link from the test report to the specific batch results in the Applitools Dashboard.
  • Context: It enriches the report with UFG render status and batch information.

This ensures you get the best of both worlds: the optimization of a “Green” CI run (no retries) with the transparency of a report that highlights exactly where manual review is needed.
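The gap the reporter closes can be illustrated in a few lines: find tests that Playwright reports as green but whose visual status still needs a human decision. This is a hypothetical TypeScript sketch, not the Applitools reporter’s implementation. A real Playwright reporter would receive per-test results through the `onTestEnd(test, result)` hook of the `Reporter` interface in `@playwright/test/reporter`; here the visual status is simply passed in as data, and `needsReview` is an illustrative name.

```typescript
// Hypothetical sketch of the "green but needs review" check.
type VisualStatus = "passed" | "failed" | "unresolved";

interface ReportRow {
  title: string;
  playwrightStatus: "passed" | "failed";
  visualStatus: VisualStatus;
}

// Tests that look green to Playwright but still need a human decision.
function needsReview(rows: ReportRow[]): string[] {
  return rows
    .filter((r) => r.playwrightStatus === "passed" && r.visualStatus !== "passed")
    .map((r) => r.title);
}

const rows: ReportRow[] = [
  { title: "home", playwrightStatus: "passed", visualStatus: "passed" },
  { title: "checkout", playwrightStatus: "passed", visualStatus: "unresolved" },
  { title: "login", playwrightStatus: "failed", visualStatus: "passed" },
];
console.log(needsReview(rows)); // [ 'checkout' ]
```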

Summary

The Applitools Playwright fixture is designed to be non-blocking and scalable. By leveraging an asynchronous architecture and the Applitools Ultrafast Grid, we offload the heavy lifting from your CI. By correctly configuring failTestsOnDiff, you ensure that your pipeline reflects your team’s engineering culture, whether that’s strict gating or modern, PR-based visual review.

Quick Answers

What is visual regression testing in Playwright?

Visual regression testing in Playwright verifies that changes to an application’s UI do not introduce unintended visual differences. Playwright can perform basic visual regression checks using screenshot comparisons like toHaveScreenshot, while dedicated visual testing tools (such as Applitools Eyes) extend this by detecting meaningful UI changes, managing baselines, and enabling review workflows for approving visual updates.

What is the best way to do visual testing in Playwright?

Playwright supports basic visual testing through screenshot comparisons such as toHaveScreenshot, but this approach can become difficult to maintain at scale. Dedicated visual testing tools, like Applitools Eyes, extend Playwright by adding Visual AI comparison, cross-browser rendering, and review workflows that allow teams to detect visual regressions without maintaining large sets of screenshot baselines.

How does Playwright screenshot testing (toHaveScreenshot) compare to visual regression testing tools?

Playwright’s toHaveScreenshot performs pixel-by-pixel image comparisons against stored baseline images. While this works for simple cases, it often requires updating and maintaining many snapshots. Visual regression testing tools like Applitools Eyes use Visual AI to detect meaningful UI changes while ignoring insignificant rendering differences, provide review workflows to approve or reject visual changes, and allow custom match levels for different regions of the screen.

Can Playwright run visual tests across multiple browsers and devices?

Yes, but with a limited scope. Natively, Playwright supports three browser engines (Chromium, Firefox, and WebKit), but it does not execute tests across different real operating systems or mobile devices. This lack of OS-level rendering limits coverage and creates a risk of missing platform-specific visual bugs. For example, see how a frontend team caught a visual bug specific to Mac Retina screens that a standard engine check would miss.

How can you run cross-browser visual tests in Playwright without running tests multiple times?

Normally, cross-browser testing requires executing the same tests separately for each browser configuration. Tools like Applitools Ultrafast Grid allow tests to run once while visual rendering is executed across multiple browsers and viewport combinations in parallel. This removes the need to multiply test execution across the full browser matrix.

Why is cross-browser testing in Playwright so slow?

Natively, cross-browser testing introduces a significant performance penalty. Playwright must execute the entire test logic (clicks, waits, network requests) separately for every browser and viewport configuration. Modern visual testing tools (e.g., Applitools Ultrafast Grid) eliminate this overhead by executing the test logic just once locally, performing the cross-browser rendering and visual comparison in parallel in the cloud.
