At a glance

Test lifecycle

  • Taskcluster schedules talos jobs
  • Taskcluster runs a Talos job on a hardware machine when one is available - this is bootstrapped by mozharness
  • Treeherder displays a green (all OK) status and has a link to Perfherder
  • 13 pushes later, analyze_talos.py is run, which compares your push to the previous 12 pushes and the next 12 pushes to look for a regression

Test types

There are two different species of Talos tests:

  • #Startup: Start up the browser and wait for either the load event or the paint event and exit, measuring the time
  • #Page load: Load a manifest of pages

In addition we have some variations on existing tests:

  • #Heavy: Run tests with the heavy user profile instead of a blank one
  • #Web extension: Run tests with a web extension to see the perf impact extensions have
  • #Real-world WebExtensions: Run tests with a set of 5 popular real-world WebExtensions installed and enabled.

Some tests measure different things:

  • #Paint: These measure events from the browser like moz_after_paint, etc.
  • #ASAP: These tests go really fast and typically measure how many frames we can render in a time window
  • #Benchmarks: These are benchmarks that measure specific items and report a summarized score


Startup

Startup tests launch Firefox and measure the time to the onload or paint events. We run this in a series of cycles (20 by default) to generate a full set of data. Tests that currently are startup tests are:

Page load

Many of the talos tests use the page loader to load a manifest of pages. These are tests that load a specific page and measure the time it takes to load the page, scroll the page, draw the page etc. In order to run a page load test, you need a manifest of pages to run. The manifest is simply a list of URLs of pages to load, separated by carriage returns, e.g.:


Example: svgx.manifest

Manifests may also specify that a test computes its own data by prepending a % in front of the line:

% https://www.mozilla.org
% https://www.mozilla.com

Example: v8.manifest
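A minimal sketch of how such a manifest could be read, assuming one entry per line and an optional leading `%` as described above. The function name `parse_manifest` and the tuple shape it returns are hypothetical and are not taken from the pageloader code.

```python
def parse_manifest(text):
    """Return a list of (url, computes_own_data) tuples from manifest text.

    Hypothetical helper for illustration; not the pageloader's own parser.
    """
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        computes_own_data = line.startswith("%")
        if computes_own_data:
            line = line[1:].strip()  # drop the '%' marker
        entries.append((line, computes_own_data))
    return entries


manifest = """\
% https://www.mozilla.org
% https://www.mozilla.com
"""
entries = parse_manifest(manifest)
for url, own_data in entries:
    print(url, own_data)
```

Keeping the `%` flag next to the URL lets a caller decide per page whether to time the load itself or to wait for the page to report its own numbers.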

The file you created should be referenced in your test config inside of test.py. For example, open test.py, and look for the line referring to the test you want to run:

tpmanifest = '${talos}/page_load_test/svgx/svgx.manifest'
tpcycles = 1 # run a single cycle
tppagecycles = 25 # load each page 25 times before moving onto the next page


Heavy

All our testing is done with empty blank profiles, which is not ideal for finding issues that affect end users. We recently undertook a task to create a daily update to a profile so it is modern and relevant. It browses a variety of web pages and has history and cache, giving us a more realistic scenario.

The toolchain is documented on GitHub and was added to Talos in bug 1407398.

Currently we have issues with this on Windows (it takes too long to unpack the files from the profile), so we have turned this off there. Our goal is to run this on basic pageload and startup tests.

Web extension

Web Extensions are what Firefox has switched to, and they use different code paths and APIs than addons. Historically we have not tested with addons (other than our test addons), so we have been missing common slowdowns. In 2017 we started running some startup and basic pageload tests with a web extension in the profile (bug 1398974). We have updated the extension to be more real-world and will continue to do that.

Real-world WebExtensions

We've added a variation on our test suite that automatically downloads, installs and enables 5 popular WebExtensions. This is used to measure things like the impact of real-world WebExtensions on start-up time.

Currently, the following extensions are installed:

  • Adblock Plus (3.5.2)
  • Cisco Webex Extension (1.4.0)
  • Easy Screenshot (3.67)
  • NoScript (10.6.3)
  • Video DownloadHelper (7.3.6)

Note that these add-ons and versions are "pinned" by being held in a compressed file that's hosted in an archive by our test infrastructure and downloaded at test runtime. To update the add-ons in this set, one must provide a new ZIP file to someone on the test automation team. See this comment in Bugzilla.


Paint

Paint tests measure the time to receive both the MozAfterPaint and OnLoad events instead of just the OnLoad event. Most tests now look for this unless they are an ASAP test or an internal benchmark.


ASAP

We have a variety of tests which we now run in ASAP mode, where we render as fast as possible (disabling vsync and letting the rendering iterate as fast as it can using `requestAnimationFrame`). In fact, we have replaced some original tests with their 'x' versions, which measure this way.

ASAP tests are:


Benchmarks

Many tests have internal benchmarks which we report as accurately as possible. These are the exceptions to the general rule of calculating the suite score as a geometric mean of the subtest values (which are median values of the raw data from the subtests).

Tests which are imported benchmarks are:

Row major vs. column major

To get more stable numbers, tests are run multiple times. There are two ways that we do this: row major and column major. Row major means each test is run multiple times and then we move to the next test (and run it multiple times). Column major means that each test is run once one after the other and then the whole sequence of tests is run again.

More background information about these approaches can be found in Joel Maher's Reducing the Noise in Talos blog post.
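The two orderings can be sketched as follows. This is an illustration only: the function names and page names are hypothetical, and the snippet produces the visiting order rather than running any test. In the configuration shown earlier, `tppagecycles` (load each page N times before moving on) appears to correspond to the row-major repeat count.

```python
def row_major(pages, repeats):
    """Visit each page `repeats` times before moving on to the next page."""
    return [page for page in pages for _ in range(repeats)]


def column_major(pages, repeats):
    """Visit every page once, then repeat the whole sequence `repeats` times."""
    return [page for _ in range(repeats) for page in pages]


pages = ["a.html", "b.html", "c.html"]  # hypothetical page names
print(row_major(pages, 2))
# ['a.html', 'a.html', 'b.html', 'b.html', 'c.html', 'c.html']
print(column_major(pages, 2))
# ['a.html', 'b.html', 'c.html', 'a.html', 'b.html', 'c.html']
```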

Page sets

We run our tests 100% offline, but serve pages via a webserver. Because of this, we need to store and make available the offline pages we use for testing.


Some tests make use of a set of 50 "real world" pages, known as the tp5n set. These pages are not part of the talos repository, but without them the tests which use them won't run.

  • To add these pages to your local setup, download the latest tp5n zip from tooltool, and extract it such that `tp5n` ends up as `testing/talos/talos/tests/tp5n`. You can also obtain it by running a talos test locally to get the zip into `testing/talos/talos/tests/`, e.g. `./mach talos-test --suite damp`
  • See also the tp5 test.

Test definitions

Please keep these in alphabetical order


Test Name

  • reporting: test time in ms (lower is better)

This test ensures basic a11y tables and permutations do not cause performance regressions.

Example data


Test Name

[TODO] add test details


Test Name

  • contact: :jaws
  • source: [1]
  • type: PageLoader
  • measuring: first-non-blank-paint
  • data: We load 5 urls 1 time each, and repeat for 25 cycles; collecting 25 sets of 5 data points
  • summarization
  • reporting: test time in ms (lower is better)

This test measures the performance of the Firefox about:preferences page. It differs from other pageload tests in that we load one page (about:preferences) but also test the loading of that same page's subcategories/panels (e.g. about:preferences#home).

As expected, simply changing the page's panel/category does not cause a new onload event; therefore we had to introduce loading the 'about:blank' page in between each page category, which forces the entire page to reload with the specified category panel activated.

For that reason, when new panels/categories are added to the 'about:preferences' page, it can be expected that a performance regression may be introduced, even if a subtest hasn't been added for that new page category yet.

This test should only ever have 1 pagecycle consisting of the main about-preferences page and each category separated by an about:blank between. Then repeats are achieved by using 25 cycles (instead of pagecycles).
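A minimal sketch of the page sequence described above, assuming the sequence is simply the test URLs with `about:blank` interleaved. The helper `build_page_cycle` is hypothetical, and only the two URLs named in the text are used (the real test loads 5). The two settings at the end mirror the `tpcycles` / `tppagecycles` style shown earlier.

```python
def build_page_cycle(urls, separator="about:blank"):
    """Interleave `separator` between consecutive URLs (hypothetical helper)."""
    sequence = []
    for index, url in enumerate(urls):
        if index > 0:
            sequence.append(separator)  # forces a full reload of the next panel
        sequence.append(url)
    return sequence


# Only these two URLs are named in the text; the other panels are omitted here.
urls = ["about:preferences", "about:preferences#home"]
page_cycle = build_page_cycle(urls)
print(page_cycle)
# ['about:preferences', 'about:blank', 'about:preferences#home']

tpcycles = 25     # repeat the whole sequence 25 times
tppagecycles = 1  # visit each entry once per cycle
```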

Example data


Test Name

  • contact: :jandem
  • source: ARES-6
  • type: PageLoader
  • data: 6 cycles of the entire benchmark
  • Lower is better
  • unit: geometric mean / benchmark score

Basic compositor video

Test Name

  • contact: :davidb
  • source: video
  • type: PageLoader
  • data: 12 cycles of the entire benchmark, each subtest will have 12 data points (see below)
  • summarization:
  • Lower is better
  • unit: ms/frame
Example data


Test Name

  • contact: :mconley
  • measuring: Time from opening a new tab (which creates a new content process) to having that new content process be ready to load URLs.
  • source: cpstartup
  • type: PageLoader
  • bug: bug 1336389
  • data: 20 cycles of the entire benchmark
  • Lower is better
  • unit: ms
Example data


Test Name

  • contact: :ochameau
  • source: damp
  • type: PageLoader
  • measuring: Developer Tools toolbox startup, shutdown, and reload performance
  • reporting: intervals in ms (lower is better) - see below for details
  • data: there are 36 reported subtests from DAMP which we load 25 times, resulting in 36 sets of 25 data points.
  • summarization:

To run this locally, you'll need to pull down the tp5 page set and run it in a local web server. See the tp5 section.

Example data


Test Name


This measures the amount of time it takes to render a page after changing its display list. The page has a large number of display list items (10,000), and mutates one every frame. The goal of the test is to make displaylist construction a bottleneck, rather than painting or other factors, and thus improvements or regressions to displaylist construction will be visible. The test runs in ASAP mode to maximize framerate, and the result is how quickly the test was able to mutate and re-paint 600 items, one during each frame.


Dromaeo suite of tests for JavaScript performance testing. See the Dromaeo wiki for more information.

This suite is divided into several sub-suites.

Each sub-suite is divided into tests, and each test is divided into sub-tests. Each sub-test takes some (in theory) fixed piece of work and measures how many times that piece of work can be performed in one second. The score for a test is then the geometric mean of the runs/second numbers for its sub-tests. The score for a sub-suite is the geometric mean of the scores for its tests.

Dromaeo CSS

Test Name

  • contact: :bz
  • source: css.manifest
  • type: PageLoader
  • reporting: speed in test runs per second (higher is better)
  • data: Dromaeo has 6 subtests which run internal benchmarks; each benchmark reports about 180 raw data points


  • subtest: Dromaeo is a custom benchmark which has a lot of micro tests inside each subtest; because of this, we use a custom dromaeo filter to summarize the subtest. Each micro test produces 5 data points, and for each group of 5 data points we take the mean, leaving 36 data points to represent the subtest (assuming 180 points). These 36 micro test means are then run through a geometric_mean to produce a single number for the dromaeo subtest. source: filter.py
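The summarization just described can be sketched as below. This is an illustration under stated assumptions, not the code in filter.py: `dromaeo_summary` is a hypothetical name, a plain geometric mean is used (the real `geometric_mean` filter may differ in detail), and the input data is synthetic.

```python
import math


def dromaeo_summary(values, chunk_size=5):
    """Mean of each chunk of `chunk_size` points, then geometric mean of the means."""
    if not values or len(values) % chunk_size != 0:
        raise ValueError("expected a non-empty multiple of chunk_size data points")
    chunk_means = [
        sum(values[i:i + chunk_size]) / chunk_size
        for i in range(0, len(values), chunk_size)
    ]
    # Geometric mean computed through logs to avoid overflowing a long product.
    log_sum = sum(math.log(mean) for mean in chunk_means)
    return math.exp(log_sum / len(chunk_means))


# Synthetic data: 180 raw points give 36 micro test means.
values = [100.0] * 90 + [400.0] * 90
score = dromaeo_summary(values)
print(len(values) // 5)  # 36
print(round(score, 6))   # 200.0
```

With half of the means at 100 and half at 400, the geometric mean is 200, lower than the arithmetic mean of 250; a geometric mean keeps a few very fast micro tests from dominating the score.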

Each page in the manifest is part of the dromaeo css benchmark. Each page measures the performance of searching the DOM for nodes matching various CSS selectors, using different libraries for the selector implementation (jQuery, Dojo, Mootools, ExtJS, Prototype, and Yahoo UI).

Example data

Dromaeo DOM (Linux64 only)

Test Name

  • contact: :bz
  • source: dom.manifest
  • type: PageLoader
  • data: see Dromaeo DOM
  • reporting: speed in test runs per second (higher is better)

Each page in the manifest is part of the dromaeo dom benchmark. These are the specific areas that Dromaeo DOM covers:

DOM Attributes

Measures performance of getting and setting a DOM attribute, both via getAttribute and via a reflecting DOM property. Also throws in some expando getting/setting for good measure.

DOM Modification

Measures performance of various things that modify the DOM tree: creating element and text nodes and inserting them into the DOM.

DOM Query

Measures performance of various methods of looking for nodes in the DOM: getElementById, getElementsByTagName, and so forth.

DOM Traversal

Measures performance of various accessors (childNodes, firstChild, etc) that would be used when doing a walk over the DOM tree.

Please see #Dromaeo CSS for examples of data.


Test Name

  • contact: :jgilbert
  • source: glterrain
  • type: PageLoader
  • data: we load the perftest.html page (which generates 4 metrics to track) 25 times, resulting in 4 sets of 25 data points
  • summarization: Measures average frames interval while animating a simple WebGL scene

This test animates a simple WebGL scene (static textured landscape, one moving light source, rotating viewport) and measures the frame throughput (expressed as average interval) over 100 frames. It runs in ASAP mode (vsync off) and measures the same scene 4 times, once for each combination of antialiasing and alpha. It reports the results as 4 values, one for each combination. Lower results are better.

Example data


Test Name

  • contact: :jgilbert
  • source: glvideo
  • type: PageLoader
  • data: 5 cycles of the entire benchmark, each subtest will have 5 data points (see below)
  • summarization: WebGL video texture update with 1080p video. Measures mean tick time across 100 ticks.
  • Lower is better
  • unit: ms/100 ticks
Example data
0;Mean tick time across 100 ticks: ;54.6916;49.0534;51.21645;51.239650000000005;52.44295

This test plays back a video file and asks WebGL to draw video frames as WebGL textures for 100 ticks. It collects the mean tick time across 100 ticks to measure how much time is spent uploading a video frame into a WebGL texture (gl.texImage2D). We run it 5 times and ignore the first round. Lower results are better.


Test Name

  • contact: :jandem
  • source: jetstream.manifest and jetstream.zip from tooltool
  • type: PageLoader
  • measuring: JavaScript performance
  • reporting: geometric mean from the benchmark
  • data: internal benchmark

This is the JetStream JavaScript benchmark, taken verbatim and slightly modified to fit into our pageloader extension and talos harness.


Test Name

  • contact: :sdetar
  • source: kraken.manifest
  • type: PageLoader
  • measuring: JavaScript performance
  • reporting: Total time for all tests, in ms (lower is better)
  • data: there are 14 subtests in kraken, each subtest is an internal benchmark and generates 10 data points, or 14 sets of 10 data points.
  • summarization:
    • subtest: For all of the 10 data points, we take the mean to report a single number.
    • suite: geometric mean of the 14 subtest results.

This is the Kraken JavaScript benchmark, taken verbatim and slightly modified to fit into our pageloader extension and talos harness.

Example data


Test Name

  • contact: :davidb
  • source: source manifests
  • type: PageLoader
  • measuring: benchmark measuring the time to animate complex scenes
  • summarization:
    • subtest: FPS from the subtest, each subtest is run for 15 seconds, repeat this 5 times and report the median value
    • suite: we take a geometric mean of all the subtests (9 for animometer, 11 for html suite)


Test Name

[TODO] add test details


Test Name

  • contact: :jgilbert
  • source: source manifest
  • type: PageLoader
  • measuring: Draw call performance in WebGL
  • summarization:
    • subtest: FPS from the subtest, each subtest is run once for 15 seconds, report the average FPS over that time.
    • suite: identical to subtest


Test Name

  • contact: :bdahl
  • source:
  • type: PageLoader
  • reporting: time from performance.timing.navigationStart to pagerendered event in ms (lower is better)
  • data: load a PDF 20 times


Test Name

  • contact: :bholley
  • source: perf-reftest
  • type: PageLoader
  • reporting: intervals in ms (lower is better)
  • data: each test loads 25 times
  • summarization:

Important note: This test now requires an 'opt' build. If perf-reftest is run on a non-opt build, it will time out (more specifically on innertext-1.html, and possibly others in the future).

Style system performance test suite. The perf-reftest suite is a unique talos suite where each subtest loads two different test pages: a 'base' page (e.g. bloom_basic) and a 'reference' page (e.g. bloom_basic_ref), and then compares the page load times against each other to determine the difference.

Talos runs each of the two pages as if they are stand-alone tests, and then calculates and reports the difference; the test output 'replicates' reported from bloom_basic are actually the comparisons between the 'base' and 'reference' pages for each page load cycle. The suite contains multiple subtests, each of which contains a base page and a reference page.

If you wish to see the individual 'base' and 'reference' page results instead of just the reported difference, the 'base_replicates' and 'ref_replicates' can be found in the PERFHERDER_DATA log file output, and in the 'local.json' talos output file when running talos locally. In production, both of the page replicates are also archived in the perfherder-data.json file. The perfherder-data.json file is archived after each run in production, and can be found on the Treeherder Job Details tab when the perf-reftest job symbol is selected.

This test suite was ported over from the style-perf-tests (https://github.com/heycam/style-perf-tests).

Example data
"replicates": [1.185, 1.69, 1.22, 0.36, 11.26, 3.835, 3.315, 1.355, 3.185, 2.485, 2.2, 1.01, 0.9, 1.22, 1.9,
0.285, 1.52, 0.31, 2.58, 0.725, 2.31, 2.67, 3.295, 1.57, 0.3], "value": 1.7349999999999999, "unit": "ms",

"base_replicates": [166.94000000000003, 165.16, 165.64000000000001, 165.04000000000002, 167.355, 165.175,
165.325, 165.11, 164.175, 164.78, 165.555, 165.885, 166.83499999999998, 165.76500000000001, 164.375, 166.825,
167.13, 166.425, 169.22500000000002, 164.955, 165.335, 164.45000000000002, 164.85500000000002, 165.005, 166.035]}],

"ref_replicates": [165.755, 166.85000000000002, 166.85999999999999, 165.4, 178.615, 169.01, 168.64, 166.465,
167.36, 167.265, 167.75500000000002, 166.895, 167.735, 166.985, 166.275, 166.54000000000002, 165.61, 166.115,
166.64499999999998, 165.68, 167.64499999999998, 167.12, 168.15, 166.575, 166.335], 
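The relationship between the three lists in the example data can be checked with a short sketch. It assumes each replicate is the absolute difference between the base and reference load times for the same cycle, which matches the sample numbers. The reported "value" of about 1.735 is also consistent with dropping the first five replicates and taking the median of the rest; that is an observation from this sample, not a documented rule. The function name `compare` is hypothetical.

```python
import statistics

# First five cycles copied from the example data above.
base_replicates = [166.94000000000003, 165.16, 165.64000000000001, 165.04000000000002, 167.355]
ref_replicates = [165.755, 166.85000000000002, 166.85999999999999, 165.4, 178.615]


def compare(base, ref):
    """Per-cycle absolute difference between base and reference load times."""
    return [round(abs(b - r), 3) for b, r in zip(base, ref)]


first_five = compare(base_replicates, ref_replicates)
print(first_five)  # [1.185, 1.69, 1.22, 0.36, 11.26]

# All 25 reported replicates from the example data.
replicates = [1.185, 1.69, 1.22, 0.36, 11.26, 3.835, 3.315, 1.355, 3.185, 2.485,
              2.2, 1.01, 0.9, 1.22, 1.9, 0.285, 1.52, 0.31, 2.58, 0.725,
              2.31, 2.67, 3.295, 1.57, 0.3]
value = statistics.median(replicates[5:])  # assumption: first 5 cycles ignored
print(round(value, 3))  # 1.735
```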


Test Name


Individual style system performance tests. The perf-reftest-singletons suite runs each perf-reftest 'base' page (e.g. bloom_basic) individually, and reports the values for that single test page alone, NOT the comparison of two different pages. There are multiple subtests in this suite, each just containing the base page on its own.

This test suite was ported over from the style-perf-tests (https://github.com/heycam/style-perf-tests).

Example data


Test Name


This page animates some complex gradient patterns in a requestAnimationFrame callback. However, it also churns the CPU during each callback, spinning an empty loop for 14ms each frame. The intent is that, if we consider the rasterization costs to be 0, then the animation should run close to 60fps. Otherwise it will lag. Since rasterization costs are not 0, the lower we can get them, the faster the test will run. The test runs in ASAP mode to maximize framerate.

The test runs for 10 seconds, and the resulting score is how many frames we were able to render during that time. Higher is better. Improvements (or regressions) to general painting performance or gradient rendering will affect this benchmark.


Test Name


This page animates some complex SVG patterns in a requestAnimationFrame callback. However, it also churns the CPU during each callback, spinning an empty loop for 14ms each frame. The intent is that, if we consider the rasterization costs to be 0, then the animation should run close to 60fps. Otherwise it will lag. Since rasterization costs are not 0, the lower we can get them, the faster the test will run. The test runs in ASAP mode to maximize framerate. The result is how quickly the browser is able to render 600 frames of the animation.

Improvements (or regressions) to general painting performance or SVG are likely to affect this benchmark.


Test Name

  • contact: :mikedeboer, :mconley, :felipe
  • source: talos/sessionrestore
  • bug: bug 936630, bug 1331937, bug 1531520
  • type: Startup
  • measuring: time spent reading and restoring the session.
  • reporting: interval in ms (lower is better).
  • data: we load the session restore index page 10 times to collect 1 set of 10 data points.
  • summarization:

Three tests measure the time spent reading and restoring the session from a valid sessionstore.js. Time is counted from the process start until the sessionRestored event.

In sessionrestore, this is tested with a configuration that requires the session to be restored. In sessionrestore_no_auto_restore, this is tested with a configuration that requires the session to not be restored. Both of the above tests use a sessionstore.js file that contains one window and roughly 89 tabs. In sessionrestore_many_windows, this is tested with a sessionstore.js that contains 3 windows and 130 tabs. The first window contains 50 tabs; the remaining 80 tabs are divided equally between the second and the third window.

Example data
[2362.0, 2147.0, 2171.0, 2134.0, 2116.0, 2145.0, 2141.0, 2141.0, 2136.0, 2080.0]


Test Name


See #sessionrestore.


Test Name


See #sessionrestore.


Test Name

  • contact: :mconley
  • source: [2]
  • type: Startup
  • measuring: The time from process start to the point where the about:home page reports that it has painted the Top Sites.
  • data: we restart the browser 20 times, and collect a timestamp for each run.
  • reporting: test time in ms (lower is better)
Example data
[1503.0, 1497.0, 1523.0, 1536.0, 1511.0, 1485.0, 1594.0, 1580.0, 1531.0, 1471.0, 1502.0, 1520.0, 1488.0, 1533.0, 1531.0, 1502.0, 1486.0, 1489.0, 1487.0, 1475.0]


Test Name

[TODO] add test details


Test Name

  • contact: :mconley
  • source: [3]
  • type: Startup, Real-world WebExtensions
  • measuring: The time from process start to the point where the about:home page reports that it has painted the Top Sites when 5 popular, real-world WebExtensions are installed and enabled.
  • data: we install the 5 real-world WebExtensions, then load and restart the browser 20 times, and collect a timestamp for each run.
  • reporting: test time in ms (lower is better)
Example data
[1503.0, 1497.0, 1523.0, 1536.0, 1511.0, 1485.0, 1594.0, 1580.0, 1531.0, 1471.0, 1502.0, 1520.0, 1488.0, 1533.0, 1531.0, 1502.0, 1486.0, 1489.0, 1487.0, 1475.0]


Test Name



Test Name

  • contact: :mconley
  • source: tabpaint
  • bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1253382
  • type: PageLoader
  • measuring:
    • The time it takes to paint the content of a newly opened tab when the tab is opened from the parent (ex: by hitting Ctrl-T)
    • The time it takes to paint the content of a newly opened tab when the tab is opened from content (ex: by clicking on a target="_blank" link)
  • NOT measuring:
    • The time it takes to animate the tabs. That's the responsibility of the TART test. tabpaint is strictly concerned with the painting of the web content.
  • data: we load the tabpaint trigger page 20 times, each run produces two values (the time it takes to paint content when opened from the parent, and the time it takes to paint content when opened from content), resulting in 2 sets of 20 data points.
    • Example:
Example data
1; tabpaint-from-content;129;68;72;72;70;78;86;85;82;79;120;92;76;80;74;82;76;89;77;85
  • summarization:
    • subtest: ignore first data point, then take the median of the remaining 19 data points
    • suite: geometric_mean(subtests)
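The summarization above can be sketched against the example data line. The helper names are hypothetical, the semicolon-separated layout is assumed from the sample line, and the second subtest value passed to the suite summary is made up for illustration.

```python
import math
import statistics


def parse_result_line(line):
    """Split 'index; name; v1; v2; ...' into (name, [values])."""
    fields = [field.strip() for field in line.split(";")]
    return fields[1], [float(value) for value in fields[2:]]


def summarize_subtest(values):
    """Ignore the first data point, then take the median of the rest."""
    return statistics.median(values[1:])


def summarize_suite(subtest_values):
    """Geometric mean of the subtest summaries."""
    return math.exp(sum(math.log(v) for v in subtest_values) / len(subtest_values))


line = ("1; tabpaint-from-content;129;68;72;72;70;78;86;85;82;79;"
        "120;92;76;80;74;82;76;89;77;85")
name, values = parse_result_line(line)
from_content = summarize_subtest(values)
print(name, from_content)  # tabpaint-from-content 79.0

from_parent = 100.0  # hypothetical value for the other subtest
print(summarize_suite([from_content, from_parent]))
```

Dropping the first point removes the 129 ms outlier from the first load, and the median keeps the single 120 ms spike from shifting the result.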


Test Name

Example data


Test Name

  • contact: :mconley
  • source: tart
  • type: PageLoader
  • measuring: Desktop Firefox UI animation speed and smoothness
  • reporting: intervals in ms (lower is better) - see below for details
  • data: there are 30 reported subtests from TART which we load 25 times, resulting in 30 sets of 25 data points.
  • summarization:

TART is the Tab Animation Regression Test.

TART tests tab animation on these cases:

  • Simple: single new tab of about:blank open/close without affecting (shrinking/expanding) other tabs.
  • icon: same as above with favicons and long title instead of about:blank.
  • Newtab: newtab open with thumbnails preview - without affecting other tabs, with and without preload.
  • Fade: opens a tab, then measures fadeout/fadein (tab animation without the overhead of opening/closing a tab).
    • Case 1 is tested with DPI scaling of 1.
    • Case 2 is tested with DPI scaling of 1.0 and 2.0.
    • Case 3 is tested with the default scaling of the test system.
    • Case 4 is tested with DPI scaling of 2.0 with the "icon" tab (favicon and long title).
    • Each animation produces 3 test results:
      • error: difference between the designated duration and the actual completion duration from the trigger.
      • half: average frame interval over the 2nd half of the animation.
      • all: average frame interval over all recorded intervals.
      • And the run logs also include the explicit intervals from which these 3 values were derived.
Example data


Test Name

[TODO] add test details


Note that the tp5 test no longer exists (only talos-tp5o) though many tests still make use of this pageset. Here, we provide an overview of the tp5 pageset and some information about how data using the tp5 pageset might be used in various suites.

Test Name

  • contact: :davehunt
  • source: tp5n.zip
  • type: PageLoader
  • data: we load each of the 51 tp5o pages 25 times, resulting in 51 sets of 25 data points.
  • To run it locally, you'd need tp5n.zip.
  • summarization: tp5 with limited pageset (48 pages as others have too much noise)

Tests the time it takes Firefox to load the tp5 web page test set. The web set was culled from the Alexa top 500 on April 8th, 2011 and consists of 100 pages in tp5n and 51 in tp5o. Some suites use a subset of these, e.g. 48 of the 51 pages, to reduce noise; check with the owner of the test suite that uses the pageset to find out whether this difference applies there.

Here are the broad steps we use to create the test set:

  1. Take the Alexa top 500 sites list
  2. Remove all sites with questionable or explicit content
  3. Remove duplicate sites (e.g. the many Google search front pages)
  4. Manually select to keep interesting pages (such as pages in different locales)
  5. Select a more representative page from any site presenting a simple search/login/etc. page
  6. Deal with Windows 255 char limit for cached pages
  7. Limit test set to top 100 pages

Note that the above steps did not eliminate all outside network access, so we had to take further action to scrub all the pages so that there are 0 outside network accesses (this is done so that the tp test is as deterministic a measurement of our rendering/layout/paint process as possible).

Example data

File IO

File IO is tested using the tp5 test set in the #xperf test.

Possible regression causes
  • nonmain_startup_fileio opt (with or without e10s) windows7-32, bug 1274018: This test seems to consistently report a higher result for mozilla-central compared to Try, even for an identical revision, due to extension signing checks. In other words, if you are comparing Try and mozilla-central you may see a false-positive regression on Perfherder. Graphs: non-e10s, e10s

Xres (X Resource Monitoring)

A memory metric tracked during tp5 test runs. This metric is sampled every 20 seconds and is collected on Linux only.

xres man page.


CPU usage tracked during tp5 test runs. This metric is sampled every 20 seconds and is collected on Windows only.


contact: :jimm, :overholt

Measures the delay for the event loop to process a tracer event. For more details, see bug 631571.

The score on this benchmark is proportional to the sum of squares of all event delays that exceed a 20ms threshold. Lower is better.

We collect 8000+ data points from the browser during the test and apply this formula to the results:

return sum([float(x)*float(x) / 1000000.0 for x in val_list])
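Wrapped in a function, the formula above can be tried on a few sample delays. This is a sketch only: the name `responsiveness_score` is hypothetical, the delays are made up, and the input is assumed to already contain only the delays that exceeded the threshold.

```python
def responsiveness_score(val_list):
    """Sum of squared delays, scaled down by 1,000,000 (same formula as above)."""
    return sum(float(x) * float(x) / 1000000.0 for x in val_list)


one_long_delay = responsiveness_score([100])      # one 100 ms delay
two_short_delays = responsiveness_score([50, 50])  # two 50 ms delays
print(one_long_delay, two_short_delays)
```

Because each delay is squared, a single 100 ms stall scores twice as much as two 50 ms stalls even though the total delay is the same, so the metric penalizes long pauses more than many short ones.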


Test Name

  • contact: :kats
  • source: tp5n.zip
  • type: PageLoader
  • data: we load each of the 51 tp5o pages 12 times, resulting in 51 sets of 12 data points.
  • To run it locally, you'd need tp5n.zip.
  • summarization: Measures average frames interval while scrolling the pages of the tp5o set

This test is identical to tscrollx, but it scrolls the 50 pages of the tp5o set (rather than 6 synthetic pages which tscrollx scrolls). There are two variants for each test page. The "regular" variant waits 500ms after the page load event fires, then iterates 100 scroll steps of 10px each (or until the bottom of the page is reached - whichever comes first), then reports the average frame interval. The "CSSOM" variant is similar, but uses APZ's smooth scrolling mechanism to do compositor scrolling instead of main-thread scrolling. So it just requests the final scroll destination and the compositor handles the scrolling and reports frame intervals.
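The step limit of the "regular" variant can be sketched as follows. This only models the arithmetic described above (100 steps of 10px, or fewer if the bottom is reached first, then an average over the recorded frame intervals); the function names, the page heights, and the interval values are hypothetical.

```python
import math


def scroll_steps(scrollable_px, step_px=10, max_steps=100):
    """Number of scroll steps taken: max_steps, or fewer if the bottom comes first."""
    steps_to_bottom = math.ceil(scrollable_px / step_px)
    return min(max_steps, steps_to_bottom)


def average_frame_interval(intervals_ms):
    """Mean of the recorded frame intervals, in ms."""
    return sum(intervals_ms) / len(intervals_ms)


print(scroll_steps(5000))  # 100 (long page: capped at 100 steps)
print(scroll_steps(250))   # 25 (short page: bottom reached first)
print(average_frame_interval([4.0, 6.0, 5.0]))  # 5.0
```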

Example data

Possible regression causes

Some examples of things that cause regressions in this test are:

  • Increased displayport size (which causes a larger display list to be built)
  • Slowdown in the building of display list
  • Slowdown in rasterization of content
  • Slowdown in composite times


Test Name

[TODO] add test details


Warning: This test no longer exists
  • contact: :davidb
  • source: tpaint-window.html
  • type: Startup
  • data: we load the tpaint test window 20 times, resulting in 1 set of 20 data points.
  • summarization:
Talos test name    Description
tpaint             twinopen, but measuring the time after we receive the MozAfterPaint and OnLoad events.

Tests the amount of time it takes to open a new window. This test does not include startup time. Multiple test windows are opened in succession; the results reported are the average amount of time required to create and display a window in the running instance of the browser. (Measures ctrl-n performance.)

Example data
[209.219, 222.180, 225.299, 225.970, 228.090, 229.450, 230.625, 236.315, 239.804, 242.795, 244.5, 244.770, 250.524, 251.785, 253.074, 255.349, 264.729, 266.014, 269.399, 326.190]

Possible regression causes

  • None listed yet. If you fix a regression for this test and have some tips to share, this is a good place for them.


  • contact: :jimm
  • source: tresize-test.html
  • type: StartupTest
  • measuring: Time to do XUL resize, in ms (lower is better).
  • data: we run the tresize test page 20 times, resulting in 1 set of 20 data points.
  • summarization:

A purer form of paint measurement than tpaint. This test opens a single window positioned at 10,10 and sized to 300,300, then resizes the window outward |max| times measuring the amount of time it takes to repaint each resize. Dumps the resulting dataset and average to stdout or logfile.

In bug 1102479 tresize was rewritten to work in e10s mode which involved a full rewrite of the test.

To run tresize locally without talos, install the addon.

Example data
[23.2565333333333, 23.763383333333362, 22.58369999999999, 22.802766666666653, 22.304050000000025, 23.010383333333326, 22.865466666666677, 24.233716666666705, 24.110983333333365, 22.21390000000004, 23.910333333333316, 23.409816666666647, 19.873049999999992, 21.103966666666686, 20.389749999999978, 20.777349999999984, 20.326283333333365, 22.341616666666667, 20.29813333333336, 20.769600000000104]

Possible regression causes

  • slowdown in the paint pipeline
  • resizes also trigger a rendering flush so bugs in the flushing code can manifest as regressions
  • introduction of more spurious MozAfterPaint events - see bug 1471961


Test Name

  • contact: :davidb
  • source: tspaint_test.html
  • Perfomatic: "Ts, Paint"
  • type: Startup
  • data: 20 times we start the browser and time how long it takes to paint the startup test page, resulting in 1 set of 20 data points.
  • summarization:

Starts the browser to display tspaint_test.html with the start time in the url, waits for MozAfterPaint and onLoad to fire, then records the end time and calculates the time to startup.

Example data
[1666.0, 1195.0, 1139.0, 1198.0, 1248.0, 1224.0, 1213.0, 1194.0, 1229.0, 1196.0, 1191.0, 1230.0, 1247.0, 1169.0, 1217.0, 1184.0, 1196.0, 1192.0, 1224.0, 1192.0]

Possible regression causes

  • ts_paint (and/or maybe tpaint?) will regress if a new <panel> element is added to the browser window (e.g. browser.xul) and its frame gets created. Fix this by ensuring it's display:none by default.


Test Name

[TODO] add test details


Test Name


ts_paint test run against a heavy user profile.

[TODO] add test details


Test Name

[TODO] add test details


Test Name


This test scrolls several pages, each of which represents a different known "hard" case to scroll (* needinfo), and measures the average frame interval (1/FPS) on each. The ASAP test (tscrollx) iterates in unlimited frame-rate mode, thus reflecting the maximum scroll throughput per page. To turn on ASAP mode, we set these preferences:

preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}

See also the tp5o_scroll entry which has relevant information for this test.
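
A minimal sketch of how the ASAP preferences and the reported metric relate. It assumes the frame interval is reported in milliseconds. The helper names and the base preference used in the usage lines are hypothetical, not part of Talos.

```python
# The two preferences quoted above, which switch on ASAP mode.
ASAP_PREFS = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}


def with_asap_mode(base_prefs):
    """Return a copy of base_prefs with the ASAP preferences applied."""
    merged = dict(base_prefs)
    merged.update(ASAP_PREFS)
    return merged


def fps_from_interval_ms(interval_ms):
    """Convert an average frame interval (assumed to be in ms) to frames per second."""
    return 1000.0 / interval_ms


# Hypothetical base preference and sample interval, for illustration only.
prefs = with_asap_mode({'example.hypothetical.pref': True})
print(prefs['layout.frame_rate'])
print(fps_from_interval_ms(16.0))
```

Because the test reports an interval rather than a rate, a lower number means a higher scroll throughput.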

Example data


Test Name

  • contact: :jwatt, :dholbert, :neerja
  • source: svg_static
  • type: PageLoader
  • data: we load the 5 svg pages 25 times, resulting in 5 sets of 25 data points
  • summarization: An svg-only number that measures SVG rendering performance of some complex (but static) SVG content.
Example data


Test Name


An svg-only number that measures SVG rendering performance for dynamic content only.

[TODO] add test details


Test Name

  • contact: :jwatt, :dholbert
  • source: [4]
  • type: PageLoader
  • data: we load the 2 svg opacity pages 25 times, resulting in 2 sets of 25 data points
  • summarization: Row Major and 25 cycles/page.
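
The sketch below illustrates the load order implied by "Row Major and 25 cycles/page". It assumes that row major means each page is loaded for all of its cycles before moving on to the next page, and that column major means the whole page list is loaded once per cycle. The function names and page names are hypothetical.

```python
def row_major_order(pages, cycles_per_page):
    """Load each page cycles_per_page times in a row, then move to the next page."""
    return [page for page in pages for _ in range(cycles_per_page)]


def column_major_order(pages, cycles):
    """Load every page once per cycle, repeating the whole list each cycle."""
    return [page for _ in range(cycles) for page in pages]


# Two hypothetical page names; 25 cycles each gives 2 sets of 25 data points.
pages = ["page_a.svg", "page_b.svg"]
order = row_major_order(pages, 25)
print(len(order), order[0], order[24], order[25])
```

Under these assumptions, either ordering yields the same number of data points per page; only the sequence of loads differs.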

Renders many semi-transparent, partially overlapping SVG rectangles, and measures time to completion of this rendering.

Note that this test also tends to reflect changes in network efficiency and navigation bar rendering issues:

  • Most of the page load tests measure from before the location is changed, until onload + mozafterpaint, therefore any changes in chrome performance from the location change, or network performance (the pages load from a local web server) would affect page load times. SVG opacity is rather quick by itself, so any such chrome/network/etc performance changes would affect this test more than other page load tests (relatively, in percentages).
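
To make the "relatively, in percentages" point concrete, here is a small worked example. The timings are invented for illustration and are not measured Talos values.

```python
def relative_change_percent(baseline_ms, overhead_ms):
    """Percentage increase caused by adding a fixed overhead to a baseline time."""
    return 100.0 * overhead_ms / baseline_ms


# The same hypothetical 5 ms chrome/network slowdown applied to a quick
# test and to a slower page load test.
quick_test = relative_change_percent(baseline_ms=50.0, overhead_ms=5.0)
slow_test = relative_change_percent(baseline_ms=500.0, overhead_ms=5.0)
print(quick_test, slow_test)
```

The fixed overhead is identical in both cases, but it moves the quick test ten times as much in percentage terms.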
Example data


Test Name


An svg-only number that measures SVG rendering performance, with animations or iterations of rendering. This is an ASAP test -- i.e. it iterates in unlimited frame-rate mode thus reflecting the maximum rendering throughput of each test. The reported value is the overall duration the sequence/animation took to complete. To turn on ASAP mode, we set these preferences:

preferences = {'layout.frame_rate': 0, 'docshell.event_starvation_delay_hint': 1}
Example data

Possible regression causes

  • Did you change the dimensions of the content area? Even a little? The tsvgx test seems to be sensitive to changes like this. See bug 1375479, for example. Usually, these sorts of "regressions" aren't real regressions - they just mean that we need to re-baseline our expectations from the test.


Test Name

 twinopen ext+twinopen:twinopen.html
  • contact: :bdahl, :jimm
  • source: twinopen
  • type: Startup
  • data: we open a new browser window 20 times, resulting in 1 set of 20 data points.
  • summarization: Time from calling OpenBrowserWindow until the chrome of the new window has painted.

Tests the amount of time it takes to open a new window from a currently open browser. This test does not include startup time. Multiple test windows are opened in succession; the results reported are the average amount of time required to create and display a window in the running instance of the browser. (Measures ctrl-n performance.)

Example data
[209.219, 222.180, 225.299, 225.970, 228.090, 229.450, 230.625, 236.315, 239.804, 242.795, 244.5, 244.770, 250.524, 251.785, 253.074, 255.349, 264.729, 266.014, 269.399, 326.190]

xperf (tp5n)

  • contact: [email protected]
  • source: xperf instrumentation
  • type: Pageloader (tp5n) / Startup
  • measuring: IO counters from Windows (currently, only startup IO is in scope)
  • reporting: Summary of read/write counters for disk, network (lower is better)

These tests only run on Windows builds. See this active-data query for an updated set of platforms that xperf can be found on. If the query is not found, use the following on the query page:


Talos will turn orange for 'x' jobs on Windows 7 if your changeset accesses files which are not predefined in the whitelist during startup; specifically, before the "sessionstore-windows-restored" Firefox event. If your job turns orange, you will see a list of files in Treeherder (or in the log file) which have been accessed unexpectedly (similar to this):

* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 16904, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\cert8.db' was accessed and we were not expecting it. DiskReadCount: 4, DiskWriteCount: 0, DiskReadBytes: 33288, DiskWriteBytes: 0
* TEST-UNEXPECTED-FAIL : xperf: File 'c:\$logfile' was accessed and we were not expecting it. DiskReadCount: 0, DiskWriteCount: 2, DiskReadBytes: 0, DiskWriteBytes: 32768

In the case that these files are expected to be accessed during startup by your changeset, then we can add them to the whitelist.
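
When triaging an orange job, it can help to pull the file names and counters out of the log. The sketch below parses lines in the format shown above. The regular expression and function name are illustrative and are not part of the xperf harness.

```python
import re

# Matches the failure line format shown above.
UNEXPECTED_RE = re.compile(
    r"TEST-UNEXPECTED-FAIL : xperf: File '(?P<path>[^']+)' "
    r"was accessed and we were not expecting it\. "
    r"DiskReadCount: (?P<read_count>\d+), DiskWriteCount: (?P<write_count>\d+), "
    r"DiskReadBytes: (?P<read_bytes>\d+), DiskWriteBytes: (?P<write_bytes>\d+)"
)


def parse_unexpected_files(log_text):
    """Return one dict per unexpected file access found in log_text."""
    entries = []
    for match in UNEXPECTED_RE.finditer(log_text):
        entry = {"path": match.group("path")}
        for key in ("read_count", "write_count", "read_bytes", "write_bytes"):
            entry[key] = int(match.group(key))
        entries.append(entry)
    return entries


sample = (
    r"* TEST-UNEXPECTED-FAIL : xperf: File '{profile}\secmod.db' was accessed "
    r"and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, "
    r"DiskReadBytes: 16904, DiskWriteBytes: 0"
)
for entry in parse_unexpected_files(sample):
    print(entry["path"], entry["read_bytes"])
```

The resulting list of paths is what you would review when deciding whether a file belongs in the whitelist.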

Xperf runs tp5 while collecting xperf metrics for disk IO and network IO. The providers we listen for are:

The values we collect during stackwalk are:


Build metrics

These are not part of the Talos code, but like Talos they are benchmarks that record data using the graphserver and are analyzed by the same scripts for regressions.

Number of constructors (num_ctors)

This test runs at build time and measures the number of static initializers in the compiled code. Reducing this number is helpful for startup optimizations.

Platform microbenchmark

IsASCII and IsUTF8 gtest microbenchmarks

Tests whose names start with PerfIsASCII test the performance of the XPCOM string IsASCII function with ASCII inputs of different lengths.

Tests whose names start with PerfIsUTF8 test the performance of the XPCOM string IsUTF8 function with ASCII inputs of different lengths.

Possible regression causes

  • The --enable-rust-simd option accidentally getting turned off in automation.
  • Changes to encoding_rs internals.
  • LLVM optimizations regressing between updates to the copy of LLVM included in the Rust compiler.


  • contact: :bholley
  • source: MozGTestBench.cpp
  • type: Custom GTest micro-benchmarking
  • data: Time taken for a GTest function to execute
  • summarization: Not a Talos test. This suite provides a way to add low-level platform performance regression tests for things that are not suited to be tested by Talos. See the Microbench Sheriffing Policy for some notes on how to treat regressions.

PerfStrip Tests

PerfStripWhitespace - call StripWhitespace() on 5 different test cases 20k times (each)

PerfStripCharsWhitespace - call StripChars("\f\t\r\n") on 5 different test cases 20k times (each)

PerfStripCRLF - call StripCRLF() on 5 different test cases 20k times (each)

PerfStripCharsCRLF - call StripChars("\r\n") on 5 different test cases 20k times (each)

Stylo gtest microbenchmarks

  • contact: :bholley, :SimonSapin
  • source: [6]
  • type: Microbench
  • reporting: intervals in ms (lower is better)
  • data: each test is run and measured 5 times
  • summarization: take the median of the 5 data points; source: MozGTestBench.cpp
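
The summarization step above can be sketched as follows. The sample intervals are invented for illustration. The median is used so that a single slow run does not dominate the reported value.

```python
from statistics import median


def summarize_runs(intervals_ms):
    """Return the median of the measured intervals (in ms)."""
    return median(intervals_ms)


# Five hypothetical measurements, one of which is an outlier.
runs = [3.2, 3.0, 3.5, 3.1, 9.0]
print(summarize_runs(runs))
```

With these sample values, the mean would be pulled up by the 9.0 ms outlier, while the median stays at one of the typical runs.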

Servo_StyleSheet_FromUTF8Bytes_Bench parses a sample stylesheet 20 times with Stylo’s CSS parser that is written in Rust. It starts from an in-memory UTF-8 string, so that I/O or UTF-16-to-UTF-8 conversion is not measured.

Gecko_nsCSSParser_ParseSheet_Bench does the same with Gecko’s previous CSS parser that is written in C++, for comparison.

Servo_DeclarationBlock_SetPropertyById_Bench parses the string "10px" with Stylo’s CSS parser and sets it as the value of a property in a declaration block, a million times. This is similar to animations that are based on JavaScript code modifying Element.style instead of using CSS @keyframes.

Servo_DeclarationBlock_SetPropertyById_WithInitialSpace_Bench is the same, but with the string " 10px" with an initial space. That initial space is less typical of JS animations, but is almost always there in stylesheets or full declarations like "width: 10px". This microbenchmark was used to test the effect of some specific code changes. Regressions here may be acceptable if Servo_StyleSheet_FromUTF8Bytes_Bench is not affected.

History of tp tests

The original tp test created by Mozilla to test browser page load time. Cycled through 40 pages. The pages were copied from the live web during November, 2000. Pages were cycled by loading them within the main browser window from a script that lived in content.


The same tp test but loading the individual pages into a frame instead of the main browser window. Still used the old 40 page, year 2000 web page test set.


An update to both the page set and the method by which pages are cycled. The page set is now 393 pages from December, 2006. The pageloader is re-built as an extension that is pre-loaded into the browser chrome/components directories.


Updated web page test set to 100 pages from February 2009.


This is a smaller pageset (21 pages) designed for mobile Firefox. This is a blend of regular and mobile friendly pages.

We landed this on April 18th, 2011 in bug 648307. This runs for Android and Maemo mobile builds only.


Updated web page test set to 100 pages from April 8th, 2011. Effort was made for the pages to no longer be splash screens/login pages/home pages but to be pages that better reflect the actual content of the site in question. There are two test page data sets for tp5 which are used in multiple tests (i.e. awsy, xperf, etc.): (i) an optimized data set called tp5o, and (ii) the standard data set called tp5n.


Created June 2017 with pages recorded via mitmproxy from modern Google, Amazon, YouTube, and Facebook. Ideally this will contain more realistic user accounts that have full content; in addition, we would have more than 4 sites, up to the top 10 or maybe top 20.

These were migrated to Raptor between 2018 and 2019.