
Create new test type aamtest for accessibility API testing#57696

Open
spectranaut wants to merge 27 commits into master from acacia-wdspec-style-tests

Conversation

@spectranaut
Contributor

@spectranaut spectranaut commented Feb 11, 2026

This PR adds a new test type to test accessibility APIs exposed by browsers, as defined by the ARIA, Core-AAM and HTML-AAM specifications. The RFC can be found here.

This is a replacement for: #53733

Instead of extending testharness, I added a new test type (aamtest) that is similar to wdspec tests and uses a lot of the same infrastructure. This idea came from @foolip in this comment on the RFC, and I think it looks good!

Here is a PR that adds documentation for this test type: #58632

To run tests on Linux:

On Debian distros:
apt install libatspi2.0-dev libcairo2-dev libgirepository1.0-dev

# Chrome
./wpt run chrome core-aam/aamtests/role/blockquote.py

# Chromium
./wpt run --binary <chromiumbinary> chromium core-aam/aamtests/role/blockquote.py

# Firefox (needs --no-headless explicitly set)
./wpt run --no-headless firefox core-aam/aamtests/role/blockquote.py

On Mac:

Run Chrome tests with --no-headless. Safari does not yet support this test type.

@spectranaut spectranaut force-pushed the acacia-wdspec-style-tests branch 4 times, most recently from 5aacc60 to 2016a87 Compare February 11, 2026 19:47
@community-tc-integration

Uh oh! Looks like an error!

Client ID static/taskcluster/github does not have sufficient scopes and is missing the following scopes:

{
  "AnyOf": [
    "queue:rerun-task:taskcluster-github/RBJqfU0pQ82PBMqZMXxGYw/RqnO5x7vQu-LPaJ5z8Ue5w",
    "queue:rerun-task-in-project:none",
    {
      "AllOf": [
        "queue:rerun-task",
        "assume:scheduler-id:taskcluster-github/RBJqfU0pQ82PBMqZMXxGYw"
      ]
    }
  ]
}

This request requires the client to satisfy the following scope expression:

{
  "AnyOf": [
    "queue:rerun-task:taskcluster-github/RBJqfU0pQ82PBMqZMXxGYw/RqnO5x7vQu-LPaJ5z8Ue5w",
    "queue:rerun-task-in-project:none",
    {
      "AllOf": [
        "queue:rerun-task",
        "assume:scheduler-id:taskcluster-github/RBJqfU0pQ82PBMqZMXxGYw"
      ]
    }
  ]
}

  • method: rerunTask
  • errorCode: InsufficientScopes
  • statusCode: 403
  • time: 2026-02-11T20:24:29.184Z

@spectranaut spectranaut force-pushed the acacia-wdspec-style-tests branch 2 times, most recently from 3101e53 to 7bda3f7 Compare February 11, 2026 21:13
@spectranaut spectranaut requested a review from jcsteh February 11, 2026 21:35
@spectranaut
Contributor Author

spectranaut commented Feb 11, 2026

@jcsteh -- I'd love your early feedback on this completely new direction for adding AAM tests; the tests are like wpt's webdriver spec tests, all in Python!

Look at the blockquote test.

The APIs are passed to the test as arguments ("fixtures" in pytest speak -- defined in wai-aria-aam/support/fixtures_a11y_api.py). The atspi argument is an AtspiWrapper object, the axapi argument is an AxapiWrapper object, and the ia2 argument is an Ia2Wrapper object.
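As a rough illustration (not the actual fixtures_a11y_api.py code), a platform-gated fixture like atspi could be structured so that it yields a wrapper only where the API exists and None elsewhere; the class and helper names here are hypothetical:

```python
import sys


class AtspiWrapper:
    """Stand-in for the real AT-SPI wrapper class."""


def get_atspi():
    """Return an AT-SPI wrapper on Linux, None elsewhere.

    In the real test suite this would be exposed through a pytest
    fixture, so tests receive it as an `atspi` argument and can
    bail out early when it is None.
    """
    if sys.platform.startswith("linux"):
        return AtspiWrapper()
    return None
```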

You can see these tests already in the wpt.fyi for this PR: https://wpt.fyi/results/?label=pr_head&max-count=1&pr=57696

@jcsteh
Contributor

jcsteh commented Feb 12, 2026

@spectranaut Thanks for the early ping and for your work on this. This looks really neat!

I haven't looked at this in-depth yet, but here are some early thoughts:

  1. I notice that this moves away from the declarative approach and is more imperative. On one hand, that's what I was advocating for, so that's nice for me. :) On the other hand, I recall you feeling strongly about declarative tests, so I'd love to understand why you feel the imperative approach works with this Python framework, but didn't fit for the TestDriver framework. I totally understand if you just had a change of heart and incorporated that here, but if there's a different and/or another reason you think it makes more sense here, that understanding might guide other thinking and future possibilities.
  2. This more or less flips the flow. Instead of writing web stuff and calling out to Python to test it, we now call out to the browser from Python to load web stuff and then test it in Python. At the risk of stating the obvious, while there were challenges with the former approach for complex cases (e.g. needing to potentially send Python code to be evaluated), there are also challenges with this latter approach for complex cases (e.g. testing mutations will require sending JS to be evaluated). It's probably fair to say that we need to run more Python than we do JS for these tests, so driving them with Python will reduce the amount of ugly cross-language shenanigans, but I just want to flag that we're not going to escape this altogether; we will absolutely need to test many kinds of mutations going forward. I reckon most of the obscurest browser engine bugs I have to fix end up being related to mutations in some way or another. :)
  3. I'm not super familiar with this framework, so just to double check, is it definitely possible for us to evaluate whatever JS we need to run via the session object?
  4. Can we await results from JS too; e.g. await some DOM event before executing something in Python? We'll mostly want to wait for accessibility events, not DOM events, but there are complex cases where being able to wait for some DOM event can be useful.
  5. Speaking of accessibility events, I do think this will make supporting those simpler. We could have done that by sending Python from JS, having the Python block and then return the result to JS, which is what the Gecko IA2/ATK/UIA tests do. However, that might have been a bit tricky/ugly with TestDriver, whereas it's cleaner and more straightforward if we can keep it all in Python. FWIW, I wrote Python helpers to wait for specific IA2, ATK and UIA events for Gecko tests, so that should hopefully be helpful when we get to that point.
  6. It'd be nice if we could avoid the if not atspi: return style boilerplate at the top of every test, but we probably can't. I thought about using a decorator that could handle this for us, but I suspect the "magic" used by fixtures wouldn't like that much and it only really reduces the boilerplate by 1 or 2 lines anyway (a decorator still means a line of code).
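The decorator idea from point 6 might look something like the following sketch (the requires helper is hypothetical; as noted above, pytest's fixture introspection may not cooperate with wrapped test signatures):

```python
import functools


def requires(api_name):
    """Skip the test body when the named accessibility-API
    fixture argument is None (unavailable on this platform)."""
    def decorator(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            if kwargs.get(api_name) is None:
                return None  # API not available; nothing to check
            return test(*args, **kwargs)
        return wrapper
    return decorator


@requires("atspi")
def test_role(atspi=None):
    return "ran"
```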

@spectranaut
Contributor Author

@jcsteh thanks as always for the thoughts! :)

On 1, imperative vs declarative -- tbh I never had a strong preference either way, maybe a slight one. :) I think the declarative approach aligns with the way the mappings in Core-AAM are presented: they are somewhat simplified and kind of have their own language for describing the APIs. Plus we could reuse all the manual tests Joanie maintained. But I think I've been convinced by you that tests closer to the API (imperative tests) will get us better results -- and make a better and more flexible test suite in the long run.

On 2, the Python vs HTML+JS flip -- yeah, I see the tradeoffs! The tests in this PR all have inline HTML, but for more complicated tests we can create a separate HTML file to open. If we are going to write imperative tests (and I've been convinced we should), I think we should write them in Python and accept those tradeoffs.

On 3, on session objects/executing javascript -- the session object is an implementation of webdriver maintained in wpt here: https://github.com/web-platform-tests/wpt/tree/master/tools/webdriver These tests have all of webdriver available to them, including the ability to send in javascript to execute, or sending clicks, keys, etc.

On 4, DOM events -- in WebDriver classic you can't wait on DOM events, you can only poll for changes, which is probably good enough? There is a way to wait for things with WebDriver BiDi, but Safari doesn't support BiDi yet.

On 5, accessibility events -- awesome, yes, that will be helpful, and I think accessibility event testing will be easier here too.

On 6, the `if not atspi: return` boilerplate -- it's not great and I'll keep an eye out for options. I'm not sure fixtures can help, but maybe some other pytest mechanism can. I'd really like a "not applicable" concept that can be applied to subtests, but I haven't dug into that.
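The polling mentioned under point 4 could be factored into a small helper along these lines (a sketch, not the PR's actual implementation):

```python
import time


def poll_until(condition, timeout=5.0, interval=0.1):
    """Re-evaluate `condition` until it returns a truthy value or the
    timeout expires; returns whatever the last call produced."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)
```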

@spectranaut spectranaut force-pushed the acacia-wdspec-style-tests branch from a09c749 to 8765a3a Compare February 19, 2026 19:29
@spectranaut
Contributor Author

Hi @jcsteh -- I'm noticing that these tests are flaky on Firefox on Linux, and I wonder if you know why or can think of an easy fix. The flakes were caught by the Community-TC Integration / wpt-firefox-nightly-stability job and are easy to reproduce locally.

Basically, the nodes all appear in the tree, but not all the correct attributes are set by the time we query for them.

In the code, before we run the test, we (1) load the webpage, then (2) find the correct tab (role: document web), then (3) wait until "busy" is not set.

But when you run the test immediately after that, finding the node by DOM ID sometimes fails -- the blockquote node does not always have a DOM ID attribute yet. I added a poll to try to work around this, but it doesn't seem like a great solution, and I'm now getting flaky failures while looking for another attribute in another test, as you can see in this CI report.

Am I waiting for "busy" on the wrong thing? Or is this a bug in Firefox?

@jcsteh
Contributor

jcsteh commented Feb 19, 2026

Ah, this is due to caching granularity. By default, we only enable a small set of cached attributes to improve memory usage and performance, since a lot of clients don't need everything. When a client first requests something that isn't in the cache, we asynchronously enable it from that point forward. You can work around this by setting the pref accessibility.enable_all_cache_domains to true, the same way you set the accessibility.force_disabled pref.
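For reference, prefs like these are typically passed to Firefox through geckodriver's moz:firefoxOptions capability; a sketch of the shape (the pref names are real Gecko prefs, but exactly how wptrunner injects them may differ):

```python
# Hedged sketch: setting Gecko accessibility prefs via WebDriver
# capabilities. wptrunner's actual plumbing may differ.
FIREFOX_CAPABILITIES = {
    "moz:firefoxOptions": {
        "prefs": {
            # -1 prevents accessibility from being force-disabled.
            "accessibility.force_disabled": -1,
            # Cache all attribute domains up front, so attribute
            # queries don't race against asynchronous enabling of
            # cache domains.
            "accessibility.enable_all_cache_domains": True,
        }
    }
}
```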

@spectranaut
Contributor Author

Bad news, @jcsteh 😢
I turned on caching and I still see the flake. I confirmed the setting was on in about:config. See the flake report with polling for the DOM ID removed and the flake report with polling enabled -- it's essentially the same as if the setting were not set.

@jcsteh
Contributor

jcsteh commented Feb 23, 2026

Very odd. I'll need to get this running locally so I can shove some logging into Gecko and see what's going on. What's really strange is that we have a whole bunch of Gecko tests which cover exactly this behaviour.

@jcsteh
Contributor

jcsteh commented Mar 2, 2026

@spectranaut, are you far enough along with Windows or Mac testing to know whether this flake shows up for Firefox on either of those platforms? That is, is this just a Linux flake at this stage or is that not conclusive yet?

@spectranaut spectranaut force-pushed the acacia-wdspec-style-tests branch 2 times, most recently from 1fd1df6 to 3f7b977 Compare March 17, 2026 17:18
@cookiecrook
Contributor

It wouldn't be a problem to use the existing directories, however. I could make each -aam directory have an aamtests folder that contains the Python tests, have html-aam point to the core-aam/aamtests/support/ directory, and detect the aamtest test type when tests are contained in an aamtests folder.

Yes, that would align well with the WPT pattern that tests should be in a directory that uses the same name as the spec. That's why the ARIA directory is /wai-aria instead of just /aria, and the /accessibility tests are just crash tests that aren't tied to a particular spec.

@spectranaut spectranaut force-pushed the acacia-wdspec-style-tests branch from 3f7b977 to 61be1c5 Compare March 17, 2026 18:38
Contributor


Please document the path requirement if you haven't yet

Contributor Author


Oh you mean documentation like what shows up in https://web-platform-tests.org/? I think I'll do that in a separate PR, unless you think I should do it in this one.

Contributor Author


Added a PR with documentation: #58632

@@ -0,0 +1,210 @@
# type: ignore
Contributor


Could we put all the pure library code (i.e. everything that's not one of the pytest fixtures) under tools? That matches the organization of the rest of the repo more closely and has the advantage that lints and unittests will automatically cover this code.

Contributor Author


I forgot to mention -- I moved this to "third_party_modified": https://github.com/web-platform-tests/wpt/pull/57696/changes#diff-0d2c8b6952aab7e1d99eaf6f96101ea52337da802caa86528d83b27cf3822a5c

But then I saw the only thing in there was recently moved. Is it OK there, or do you have a better place in mind? The tlb file is copied in; the ia2.py file contains helpers I wrote.

}

@pytest.mark.parametrize("test_html", TEST_HTML.values(), ids=TEST_HTML.keys())
def test_atspi(atspi, session, inline, test_html):
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I love having a totally different test function per AT API (if I'm understanding what's happening here correctly). I guess the problem is that the tuple (platform, browser, AT API) is more unique than just (platform, browser) i.e. we can have specific platform/browser combinations with >1 AT, so the fact that wpt.fyi (for example) is currently parameterized by (platform, browser, channel) means we need different AT APIs to show up as different tests rather than different configurations?

Contributor Author

@spectranaut spectranaut Mar 25, 2026


I'm not sure I fully understand this comment/question -- but in general, there is one accessibility API per platform, except on Windows, where there are two. One of those APIs (IA2) is supposed to be defunct but is more fully featured and absolutely necessary for screen readers to interact with web content; the other (UIA) is newer, and Microsoft is actively trying to get support for it and extend its capabilities. So I guess the answer is yes, we do need the tuple (platform, browser, accessibility API) to understand the results on Windows.

The goal of the different test functions per API is to easily see whether there is accessibility support for a web feature for a given (platform, browser) combination without having to think about APIs. You just look at the test results for role=blockquote (and ignore the "PRECONDITION_FAILED" results). So it's not exactly to solve the tuple problem; it's for understanding the test results and comparing support across different (platform, browser) combinations.

Contributor Author

@spectranaut spectranaut Mar 25, 2026


Oh I guess you could have just one single function per concept, no subtests per API...

def test_blockquote(atspi, axapi, ia2, uia, session, inline):
    session.url = inline(TEST_HTML)

    if atspi:
        node = atspi.find_node("test", session.url)
        assert atspi.Accessible.get_role(node) == atspi.Role.BLOCK_QUOTE

    if axapi:
        node = axapi.find_node("test", session.url)
        role = axapi.AXUIElementCopyAttributeValue(node, "AXRole", None)[1]
        assert role == "AXGroup"
        subrole = axapi.AXUIElementCopyAttributeValue(node, "AXSubrole", None)[1]
        assert subrole == "AXUnknown"

    if ia2:
        node = ia2.find_node("test", session.url)
        assert ia2.get_role(node) == "IA2_ROLE_BLOCK_QUOTE"
        assert ia2.get_msaa_role(node) == "ROLE_SYSTEM_GROUPING"

    if uia:
        ...

But then you are right, you can't see easily if IA2 or UIA is failing. So it is ultimately about solving the (platform, browser, AT API) problem, after all.

Contributor


Yeah, exactly. If we wrote all tests as test_foo_{browser}_{os}_{channel} and produced PRECONDITION_FAILED for all the cases not corresponding to the current browser/platform/channel, it would be really hard to interpret the results.

Of course doing it for one property that is expected to have 4 different values isn't so bad, but it still makes it harder to compare the results.

We could make the AT-API a property of the run (although it would mean you'd have to have separate runs for each AT API on platforms with > 1). That would also require a bit of work on the wpt.fyi side to allow filtering the runs appropriately, but it might be better overall?

Contributor


Also for what it's worth, if I was writing this with one test function I'd define a single at fixture returning a class representing the AT API and use it something like:

def test_blockquote(at, session, inline):
    session.url = inline(TEST_HTML)

    if isinstance(at, AtSpi):
        node = at.find_node("test", session.url)
        assert at.Accessible.get_role(node) == at.Role.BLOCK_QUOTE
    elif isinstance(at, AxApi):
        ...  # Fill in other cases here
    else:
        raise PreconditionError(f"Test not supported for {at}")

One could also consider moving that to a match block once we have Python 3.10 support.

Contributor Author

@spectranaut spectranaut Mar 30, 2026


The suggestion to make the accessibility API a property of the run is interesting (by the way, in general we call it an "accessibility API", not an "AT API").

But I do have a concern about going this direction -- right now, to support all assistive technologies on Windows, you need to support both accessibility APIs, and it will likely be that way for a long time. So I think it would be ideal if any time anyone ran these tests on Windows (or ran the full wpt test suite), they got results for both APIs, since both are relevant to making an accessible browser on Windows. In this ideal scenario, no consumers of wpt (and no one sending results to wpt.fyi) would have to change anything -- you would get the full picture of Windows support just by running the test suite as before.

If we added an accessibility API property to the run, would that mean that in order for wpt.fyi to get results for both APIs, whoever maintains and sends the Edge results would have to add infrastructure to run the tests in these directories twice? Do you think there is a way around this, so that consumers don't have to change anything but still get a full report (the results for both APIs)? Somehow wptrunner would have to handle running tests of type "aamtest" twice on Windows, and the results of the run would have to be reported correctly to wpt.fyi. It sounds complicated, but if you think it's really worth it, I could look into it.

On the other hand, if we made it easy to filter out "PRECONDITION_FAILED" subtests in wpt.fyi (web-platform-tests/wpt.fyi#4672), which I planned on picking up next, I think that would resolve the difficulty in reading/understanding the results.

Contributor Author

@spectranaut spectranaut Apr 2, 2026


I could also potentially do different test files per API, and then use the test file status PRECONDITION_FAILED when it doesn't apply to the platform.

But I don't like this, because (1) you can't easily see the support for a specific feature across platforms, and (2) for each feature, you have to make a test file per platform. I already have 300 tests to add for core-aam alone; it would become 1200 if we did this for every platform. Alternatively, we could combine tests for many different roles into one test file -- but all roles in one file would be huge, and otherwise I'm not sure what a clean division would be.

@spectranaut
Contributor Author

@whimboo I just noticed your changes to tests/support/fixtures.py and I thought I'd let you know that this change is coming :) We are using some of the wdspec test infrastructure/fixtures for a new test type, aamtest, similarly written in Python -- to test the accessibility APIs exposed by the browser.

@whimboo
Contributor

whimboo commented Apr 1, 2026

@whimboo I just noticed your changes to tests/support/fixtures.py and I thought I'd let you know that this change is coming :) We are using some of the wdspec test infrastructure/fixtures for a new test type, aamtest, similarly written in Python -- to test the accessibility APIs exposed by the browser.

Thank you for pointing me to this PR. I do have some concerns about reusing our fixtures outside the WebDriver tests directory, since that makes it harder to change them safely when we may not be aware of effects on tests elsewhere in the repository.

If duplication is not desirable and we genuinely need higher-level shared fixtures (for example, for creating WebDriver sessions) we should probably identify a more appropriate location for them. I would also expect similar needs to arise for other Python-based tests in different parts of the tree.

That said, this likely needs broader discussion with people who have a more complete view of the repository structure and testing strategy, such as @jgraham.

@spectranaut
Contributor Author

@whimboo thanks for so quickly taking a look!!

If duplication is not desirable and we genuinely need higher-level shared fixtures (for example, for creating WebDriver sessions) we should probably identify a more appropriate location for them. I would also expect similar needs to arise for other Python-based tests in different parts of the tree.

I would really like to avoid duplication -- would it be OK if I opened another PR to move the fixtures we need into the tools/ directory, referenced by both sets of tests, and we could talk about it there? So far these are the only Python-based tests: the webdriver tests and the accessibility API tests.

That said, this likely needs broader discussion with people who have a more complete view of the repository structure and testing strategy, such as @jgraham.

@jgraham is already reviewing this patch :) and we have been in discussion about it for a while -- there is an RFC for this change that has just been merged.
