
CodeA11y: Making AI Coding Assistants Useful for Accessible Web Development

Peya Mowar, Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA ([email protected])
Yi-Hao Peng, Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA ([email protected])
Jason Wu, Apple, Seattle, Washington, USA ([email protected])
Aaron Steinfeld, Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA ([email protected])
Jeffrey P. Bigham, Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA ([email protected])
Figure 1: CodeA11y is a GitHub Copilot Extension for Accessible Web Development. CodeA11y addresses accessibility limitations of Copilot observed in our study with developers through three features: (1) accessibility-by-default code suggestions, (2) automatic identification of relevant accessibility errors, and (3) reminders to replace placeholders in generated code. Integrating these features directly into AI coding assistants would improve the accessibility of the user interfaces (UIs) developers create.
Abstract
A persistent challenge in accessible computing is ensuring developers produce web UI code that supports assistive technologies. Despite numerous specialized accessibility tools, novice developers often remain unaware of them, leading to ~96% of web pages containing accessibility violations. AI coding assistants, such as GitHub Copilot, could offer potential by generating accessibility-compliant code, but their impact remains uncertain [52]. Our formative study with 16 developers without accessibility training revealed three key issues in AI-assisted coding: failure to prompt the AI for accessibility, omission of crucial manual steps like replacing placeholder attributes, and the inability to verify compliance. To address these issues, we developed CodeA11y, a GitHub Copilot Extension that suggests accessibility-compliant code and displays manual validation reminders. We evaluated it through a controlled study with another 20 novice developers. Our findings demonstrate its effectiveness in guiding novice developers by reinforcing accessibility practices throughout interactions, representing a significant step towards integrating accessibility into AI coding assistants.

CCS Concepts
• Human-centered computing → Accessibility design and evaluation methods; Interactive systems and tools; • Software and its engineering → Development frameworks and environments.

Keywords
AI Coding Assistants, Web Accessibility, Coding Agents, AI Agents

ACM Reference Format:
Peya Mowar, Yi-Hao Peng, Jason Wu, Aaron Steinfeld, and Jeffrey P. Bigham. 2025. CodeA11y: Making AI Coding Assistants Useful for Accessible Web Development. In CHI Conference on Human Factors in Computing Systems (CHI '25), April 26–May 01, 2025, Yokohama, Japan. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3706598.3713335

This work is licensed under a Creative Commons Attribution 4.0 International License. CHI '25, Yokohama, Japan. © 2025 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-1394-1/25/04. https://doi.org/10.1145/3706598.3713335

1 Introduction
Most websites contain extensive accessibility errors [87], despite decades of investment in standards and guidelines [13, 18], tools [8, 81, 82], advocacy [48, 57, 78], and policy. According to a recent analysis by WebAIM [87], the homepages of the top million websites each contain 57 accessibility errors on average, including (but not limited to) missing alt-text for images [9, 28], inadequate color contrast [69], incorrect or missing labels for forms and links [36], and improper use of heading levels [12]. As a result, many people with disabilities will find it difficult to use these websites effectively and may not be able to use them at all.

Front-end web developers ultimately determine the accessibility (or inaccessibility) of the UI code that they write. Getting front-end developers to write more accessible code has proven exceptionally difficult. As Jonathan Lazar et al. wrote twenty years ago in 2004, "Since tools and guidelines are available to help designers and webmasters make their web sites accessible, it is unclear why so many sites remain inaccessible." [40]. A survey of webmasters at the time indicated that they generally would like to make their web pages accessible but cited a number of reasons they do not: "lack of time, lack of training, lack of managerial support, lack of client support, inadequate software tools, and confusing accessibility guidelines." Sixteen years later, Patel et al. reported remarkably similar results in their 2020 survey of 77 technology professionals [58]. Few developers had received formal accessibility training, implementing accessibility was considered confusing, and advocating for accessible development was in conflict with other business goals. Clearly, what we have done so far is not working.

We argue that AI coding assistants (e.g., GitHub Copilot [1]) could offer an opportunity to make UI code more accessible. AI coding assistants are already widely adopted, which means that developers do not need to be convinced to use them or to install a specialized tool for accessibility. They produce a wide variety of UI code and are capable enough both to reflect on the quality of arbitrary code and to prompt developers to fix what they are unable to do. This paper explores how AI coding assistants currently help developers create UI code and what problems remain, and presents a system called CodeA11y that shows that AI coding assistants can be made better at enabling developers to improve the accessibility of their UI code.

To explore this potential opportunity, we first conducted a user study (Section 3) with 16 developers not trained in accessibility to understand how current tools (GitHub Copilot) impact the production of accessible UI code. Our findings (Section 4) show that while Copilot may potentially improve the accessibility of UI code, three barriers prevent realization of those improvements: (1) developers may need to explicitly prompt the assistants for accessible code and thus not benefit if they fail to do so, (2) developers may overlook critical manual steps suggested by Copilot, such as replacing placeholders in alternative text for images, and (3) developers may not be able to verify whether they fully implemented more complex accessibility enhancements properly. The formative study showed the potential of AI coding assistants to improve the accessibility of UI code, but revealed several gaps that led us to design goals (Section 5) for improving AI coding assistants to support accessibility.

We then built CodeA11y (Section 6, Figure 1), a GitHub Copilot Extension that addresses the observed gaps by consistently reinforcing accessible development practices throughout the conversational interaction. We evaluated CodeA11y (Section 7) with 20 developers, assessing its effectiveness in supporting accessible UI development and gathering insights for further refinement. We found that developers using CodeA11y are significantly more likely to produce accessible UI code. Finally, we reflect on the broader implications of integrating AI coding assistants into accessibility workflows, including the balance between automation and developer education, and the potential for AI tools to shape long-term developer behavior toward accessibility-conscious practices (Section 8).

The contributions of our paper are:
• We conducted a study with 16 developers that uncovered both benefits and limitations of current AI coding assistants for authoring accessible UI code.
• CodeA11y¹: a GitHub Copilot Extension that generates accessible UI code, identifies existing issues, and reminds developers to perform manual validation.

¹ The source code for CodeA11y is available at https://github.com/peyajm29/codea11y/.

2 Related Work
Our research builds upon (i) Assessing Web Accessibility, (ii) End-User Accessibility Repair, and (iii) Developer Tools for Accessibility.

2.1 Assessing Web Accessibility
From the earliest attempts to set standards and guidelines, web accessibility has been shaped by a complex interplay of technical challenges, legal imperatives, and educational campaigns. Over the past 25 years, stakeholders have sought to improve digital inclusion by establishing foundational standards [13, 18], enforcing legal obligations [77, 90], and promoting a broader culture of accessibility awareness among developers [48, 57, 78]. Despite these longstanding efforts, systemic accessibility issues persist. According to the 2024 WebAIM Million report [87], 95.9% of the top one million home pages contained detectable WCAG violations, averaging nearly 57 errors per page. These errors take many forms: low color contrast makes text difficult to read for individuals with color deficiency or low vision; missing alternative text leaves users relying on screen readers without crucial visual context; and unlabeled form inputs or empty links and buttons hinder people who navigate with assistive technologies from completing basic tasks. Together, these accessibility issues not only limit user access to critical online resources such as healthcare, education, and employment but also result in significant legal risks and lost opportunities for businesses to engage diverse audiences. Addressing these pervasive issues requires systematic methods to identify, measure, and prioritize accessibility barriers, which is the first step toward achieving meaningful improvements.

Prior research has introduced methods blending automation and human evaluation to assess web accessibility. Hybrid approaches like SAMBA combine automated tools with expert reviews to measure the severity and impact of barriers, enhancing evaluation reliability [11]. Quantitative metrics, such as Failure Rate and the Unified Web Evaluation Methodology, support large-scale monitoring and comparative analysis, enabling cost-effective insights [49, 85]. However, automated tools alone often detect less than half of WCAG violations and generate false positives, emphasizing the need for human interpretation [25, 86]. Recent progress with large pretrained models like Large Language Models (LLMs) [5, 24] and Large Multimodal Models (LMMs) [6, 44] offers a promising step forward, automating complex checks like non-text content evaluation and link purposes, achieving higher detection rates than traditional tools [20, 45]. Yet these large models face challenges, including dependence on training data, limited contextual judgment, and the inability to simulate real user experiences. These limitations underscore the necessity of combining models with human oversight for reliable, user-centered evaluations [11, 20, 86].

Our work builds on these prior efforts and recent advancements by leveraging the capabilities of large pretrained models while addressing their limitations through a developer-centric approach. CodeA11y integrates LLM-powered accessibility assessments, tailored accessibility-aware system prompts, and a dedicated accessibility checker directly into GitHub Copilot, one of the most widely used coding assistants. Unlike standalone evaluation tools, CodeA11y actively supports developers throughout the coding process by reinforcing accessibility best practices, prompting critical manual validations, and embedding accessibility considerations into existing workflows.

2.2 End-User Accessibility Repair
In addition to detecting accessibility errors and measuring web accessibility, significant research has focused on fixing these problems. Since end-users are often the first to notice accessibility problems and have a strong incentive to address them, systems have been developed to help them report or fix these problems.

Collaborative, or social, accessibility [76, 83] enabled these end-user contributions to be scaled through crowd-sourcing. AccessMonkey [10] and Accessibility Commons [37] were two examples of repositories that store accessibility-related scripts and metadata, respectively. Other work has developed browser extensions that leverage crowd-sourced databases to automatically correct reading order, alt-text, color contrast, and interaction-related issues [31, 75].

One drawback of collaborative accessibility approaches is that they cannot fix problems for an "unseen" web page on demand, so many projects aim to automatically detect and improve interfaces without the need for an external source of fixes. A large body of research has focused on making specific web media (e.g., images [26–28, 30, 41], design [43, 61, 65, 67], and videos [32, 59, 60, 63]) accessible through a combination of machine learning (ML) and user-provided fixes. Other work has focused on applying more general fixes across all websites.

Opportunity accessibility addressed a common accessibility problem of most websites: by default, content is often hard to see for people with visual impairments, and many users, especially older adults, do not know how to adjust or enable content zooming [7]. To this end, a browser script (oppaccess.js) was developed that automatically adjusted the browser's content zoom to maximally enlarge content without introducing adverse side-effects (e.g., content overlap). While oppaccess.js primarily targeted zoom-related accessibility, recent work aimed to enable larger types of changes by using LLMs to modify the source code of web pages based on user questions or directives [42].

Several efforts have focused on improving access to desktop and mobile applications, which present additional challenges due to the unavailability of app source code (e.g., HTML). Prefab is an approach that allows graphical UIs to be modified at runtime by detecting existing UI widgets and then replacing them [22]. Interaction Proxies used these runtime modification strategies to "repair" Android apps by replacing inaccessible widgets with improved alternatives [92, 93]. The widget detection strategies used by these systems previously relied on a combination of heuristics and system metadata (e.g., the view hierarchy), which are incomplete or missing in inaccessible apps. To this end, ML has been employed to better localize [15] and repair UI elements [14, 62, 89, 91].

In general, end-user solutions to repairing application accessibility are limited due to the lack of underlying code and of knowledge of the semantics of the intended content.

2.3 Developer Tools for Accessibility
Ultimately, the best solution for ensuring an accessible experience lies with front-end developers. Many efforts have focused on building adequate tooling and support to help developers ensure that their UI code complies with accessibility standards. Numerous automated accessibility testing tools have been created to help developers identify accessibility issues in their code:

i) static analysis tools, such as the IBM Equal Access Accessibility Checker [35] or Microsoft Accessibility Insights [51], scan the UI code's compliance with predefined rules derived from accessibility guidelines; and ii) dynamic or runtime accessibility scanners, such as Chrome DevTools [29] or the axe-Core Accessibility Engine [21], perform real-time testing on user interfaces to detect interaction issues not identifiable from the code structure. While these tools greatly reduce the manual effort required for accessibility testing, they are often criticized for their limited coverage. Thus, experts often recommend manually testing with assistive technologies to uncover more complex interaction issues. Prior studies have created accessibility crawlers that either assist in developer testing [79, 80] or simulate how assistive technologies interact with UIs [72–74].

Similar to end-user accessibility repair, research has focused on generating fixes to remediate accessibility issues in the UI source code. Initial attempts developed heuristic-based algorithms for fixing specific issues, for instance, by replacing text or background color attributes [94]. More recent work has suggested that the code-understanding capabilities of LLMs allow them to suggest more targeted fixes. For example, a study demonstrated that prompting ChatGPT to fix identified WCAG compliance issues in source code could automatically resolve a significant number of them [55]. Researchers have sought to leverage this capability by employing a multi-agent LLM architecture to automatically identify and localize issues in source code and suggest potential code fixes [50].

While the approaches mentioned above focus on assessing the UI accessibility of already-authored code (i.e., fixing existing code), there is potential for more proactive approaches. For example, LLMs are often used by developers to generate UI source code from natural language descriptions or tab completions [1, 16, 33, 46, 71, 95], but LLMs frequently produce inaccessible code by default [4, 52], leading to inaccessible output when used by developers without sufficient accessibility awareness. The primary focus of this paper is to design a more accessibility-aware coding assistant that both produces more accessible code without manual intervention (e.g., specific user prompting) and gradually enables developers to implement and improve the accessibility of automatically-generated code through IDE UI modifications (e.g., reminder notifications).

3 Formative Study Methods
We conducted a formative study to assess the implications of AI coding assistants on web accessibility. We recruited novice developers and tasked them with editing real-world websites using GitHub Copilot. Our goal was to better understand how the use of Copilot affected the accessibility of the user interface code they produced.

Tasks. The participants completed tasks in the codebases for two open-source websites, Kubernetes [68] and BBC News [54]. Both websites received over 2 million monthly visits worldwide [3] and belong to different categories in the IAB Content Taxonomy [2]. These websites were developed using different web development frameworks (Hugo and React, respectively). To choose the four specific tasks used in this formative study (Table 1), we sampled actual issues from each website's repository. We chose issues for which accessibility needed to be considered to complete them correctly, but accessibility was not explicitly mentioned as a requirement in either the task description given to participants or on the issue description on the website's code repository, as illustrated in Figure 2. Correctly performing the tasks required the consideration of several common web accessibility issues: color contrast, alternative text, link labels, and form labeling [87]. The goal was to mirror the kinds of specifications that developers often receive that do not explicitly mention accessibility.

Protocol. Our within-subjects user study had two conditions: (1) a control condition where participants received no AI assistance, and (2) a test condition where participants used GitHub Copilot. Each participant was assigned to edit two distinct websites, each with two tasks. To counterbalance order effects, participants were evenly and randomly assigned to one of four user groups (Table 2), balanced by website order and control/test conditions. Further, to simulate real-world scenarios, we concealed the true purpose of the study from participants. Participants were informed that the study was about the usability of AI pair programmers in web development tasks but were not explicitly instructed to make their web components accessible. This allowed us to observe how developers naturally handle accessibility when it is not explicitly emphasized, reflecting typical developer behavior. The research protocol was reviewed and approved by the Institutional Review Board (IRB) at our university.

Participants. We employed convenience sampling and snowball sampling methods to recruit our participants. Our study was advertised on university bulletin boards, social media, and shared communication channels (Twitter, Slack, and mailing groups). Our recruitment criteria stipulated that participants must be over 18 years of age, live in the United States, and have self-assessed familiarity with web development. Further, we required the participants to be physically present on our university campus for the duration of the study. To avoid priming during participant recruitment, we did not stipulate awareness of web accessibility as an eligibility criterion. We chose university-specific avenues for recruiting CS students, who reflect a typical novice developer cohort.

Our study enlisted 16 participants (7 female and 9 male; ages ranged from 22 to 29). Almost all of our participants were students and had multiple years of coding experience. Most (n=10) had multi-year industrial programming experience (e.g., full-time or intern experience at a company). Nearly all participants (except one) had previously used AI coding assistants. GitHub Copilot and OpenAI ChatGPT were the most popular (n=10). Others preferred Tabnine (n=6) and AWS CodeWhisperer (n=2). Twelve participants had self-described substantial experience with HTML and CSS, 10 were proficient in JavaScript, and 7 were proficient in React.js. Despite this expertise, the majority (14 participants) were unfamiliar with the Web Content Accessibility Guidelines (WCAG). Only 2 participants knew about these guidelines, but they had not actively engaged in creating accessible web user interfaces or received formal training on the subject (details are provided in Table 3).

Procedure. The study was conducted in person at our lab, where participants performed programming tasks on a MacBook Pro laptop equipped with IntelliJ IDEA with the GitHub Copilot plugin preinstalled. Before starting the study, we explained the study procedure to the participants and took their informed consent. The participants then watched a 5-minute instructional video explaining Copilot's features, such as code autocompletion and the Copilot chat².

Figure 2: Examples of task descriptions and visual references given to our participants: (a) Task 2 was to implement a new
contact form for subscribing to a mailing list, and (b) Task 3 was to add a ‘Top Stories’ section with linked articles. Successfully
completing them required proper labeling of the form elements and links, but this was not explicitly stated in the instructions.

Table 1: Our formative study included four tasks. Each task was not primarily about accessibility but included an accessibility issue that was required to complete the task successfully. We adopt the scales of Unacceptable, Average, and Good from prior work [66]. Uninformative attributes are those that merely reflect the field, such as 'alt' as alt-text or 'click here' as link description, without providing more meaningful or descriptive content [70]. Tasks are ranked from easy to difficult based on the time taken and success rates observed in our pilot studies.

Task | Difficulty | Accessibility Issue | Evaluation Criteria

(T1) Button Visibility | Easy | Color Contrast
  Unacceptable: contrast ratio of < 4.5:1 for normal text and < 3:1 for large text
  Average: WCAG level AA in default state (contrast ratio of >= 4.5:1 for normal text)
  Good: WCAG level AA in all states (default, hover, active, focus, etc.)

(T2) Form Element | Moderate | Form Labeling
  Unacceptable: Missing form labels and keyboard navigation
  Average: One of form labels and keyboard navigation
  Good: Both form labeling and keyboard navigation

(T3) Add Section | Moderate | Link Labeling
  Unacceptable: Missing link descriptions
  Average: Uninformative link descriptions
  Good: Descriptive link descriptions

(T4) Enhance Image for SEO | Difficult | Adding alt-text
  Unacceptable: Missing or uninformative alt-text
  Average: Added alt-text with < 3 required descriptors [47]
  Good: Added alt-text with >= 3 out of 4 required descriptors
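The contrast thresholds in T1's criteria come directly from WCAG 2.x, which defines contrast ratio in terms of relative luminance. For reference, the published formula transcribes directly into code; the sketch below is our own illustration (the function names are ours, not the study's tooling):

    // Relative luminance of an sRGB color (channels 0-255), per WCAG 2.x.
    function relativeLuminance(r: number, g: number, b: number): number {
      const lin = (c: number) => {
        const s = c / 255;
        return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
      };
      return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
    }

    // Contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color.
    function contrastRatio(fg: [number, number, number], bg: [number, number, number]): number {
      const [l1, l2] = [relativeLuminance(...fg), relativeLuminance(...bg)];
      return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
    }

    // White text on #767676 sits right at T1's 4.5:1 boundary for normal text.
    console.log(contrastRatio([255, 255, 255], [118, 118, 118]).toFixed(2)); // ~4.54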

Table 2: Participant User Groups. Each group is assigned a specific order of tasks and testing conditions. Participants are evenly and randomly distributed among these groups.

# | Order 1, Testing Condition | Order 2, Testing Condition
1 | Kubernetes, With AI Assistance | BBC News, No AI Assistance
2 | Kubernetes, No AI Assistance | BBC News, With AI Assistance
3 | BBC News, With AI Assistance | Kubernetes, No AI Assistance
4 | BBC News, No AI Assistance | Kubernetes, With AI Assistance

Participants were assigned tasks related to two selected websites, with a total of four tasks to complete in 90 minutes. They were required to work on one website with and the other without GitHub Copilot. Further, they were allowed to access the web for task exploration or code documentation through traditional search engines like Google Search, but with generative results turned off.

During the coding session, a researcher observed silently, offering help with tasks, tool usage, or debugging only if participants were stuck, and asked them to move on after 30 minutes, without giving any accessibility-related hints. Based on our observations from pilot studies, we set time limits ranging from 15 to 30 minutes per task. Finally, after completing the coding tasks, participants were asked to complete a 10-15 minute survey on their experience in AI-assisted programming and web accessibility, their development expertise, and open-ended feedback. In the end, the participants were compensated with a gift voucher worth 30 USD.

Data Collection and Analysis. We collected both quantitative and qualitative data for a mixed-method analysis. For quantitative data, we used an IntelliJ IDEA plugin [23] that tracked user actions — such as keyboard input (typing, backspace), IDE commands (copy, paste, undo), and interactions with GitHub Copilot (accepting suggestions, opening the Copilot Chat window) — and recorded their timestamps.

² https://www.youtube.com/watch?v=jXp5D5ZnxGM
CHI ’25, April 26–May 01, 2025, Yokohama, Japan Peya Mowar, Yi-Hao Peng, Jason Wu, Aaron Steinfeld, and Jefrey P Bigham

Table 3: The distribution of participants' opinions on AI-powered programming tools and their awareness of web accessibility. Percentages indicate the proportion of participants who either disagree (including both 'strongly disagree' and 'disagree') or agree (including both 'strongly agree' and 'agree') with each statement.

Statement | Disagree | Agree
"I trust the accuracy of AI programming tools." | 13% | 25%
"I am proficient in web accessibility." | 75% | 19%
"I am familiar with the web accessibility standards, such as WCAG 2.0." | 88% | 12%
"I am familiar with ARIA roles, states, and properties." | 69% | 25%
(Responses were collected on a scale from Strongly Disagree to Strongly Agree.)

Additionally, we employed the axe-Core Accessibility Engine to gather accessibility violation metrics, including the type and count of WCAG failures, for each code submission, a method proven reliable in previous studies [56]. We also collected AI usage, programming language and framework preferences, and expertise in web accessibility via a post-task survey.

On the qualitative side, we captured the entire study sessions through screen recordings, resulting in a total of 18.73 hours of video data. We complemented this with observational notes taken during the sessions, documenting verbal comments made by participants. The participants' interactions with Copilot Chat were also recorded for further analysis between prompts and the final code. The analysis of this data was carried out using open coding and thematic analysis [19]. Some themes that emerged were: 'visual enhancement', 'recalling syntax', 'feature request', and 'code understanding'. For accessibility evaluation, we manually inspected the websites created during the study and evaluated their accessibility on a qualitative scale of "Unacceptable," "Average," and "Good" adopted from prior work [66]. The criteria for these evaluations were developed per best practices identified in prior research published in CHI and ASSETS, detailed further in Table 1.
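As a point of reference, the kind of automated measurement described above can be scripted against axe-core's public API. The sketch below is our reconstruction of such a check under that assumption, not the study's actual instrumentation:

    import axe from 'axe-core';

    // Tally WCAG A/AA failures by rule id for the currently rendered page.
    // Assumes a browser (or jsdom) environment where `document` exists.
    async function countWcagFailures(): Promise<Record<string, number>> {
      const results = await axe.run(document, {
        runOnly: { type: 'tag', values: ['wcag2a', 'wcag2aa'] },
      });
      const counts: Record<string, number> = {};
      for (const violation of results.violations) {
        counts[violation.id] = violation.nodes.length; // offending elements per rule
      }
      return counts;
    }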
4 Formative Findings
Our formative study revealed that while existing AI coding assistants can produce accessible code, developers still need accessibility expertise to use them effectively. Otherwise, (1) the accessibility introduced is likely not to be applied comprehensively, (2) the advanced features recommended by the assistant are unlikely to be implemented, and (3) the accessibility errors introduced by the assistant are unlikely to be caught.

Developer Behavior. In the study, participants spent slightly more time on tasks without Copilot, averaging 30.84 minutes (SD = 11.95) compared to 28.94 minutes (SD = 8.57) with Copilot. Copilot also facilitated a greater volume of code edits (13.28 lines of code, SD = 9.02, vs. 10.41 lines of code, SD = 5.87), indicating that AI-assisted workflows encouraged iterative coding practices. However, even with Copilot, participants spent approximately 39.84% of their task time (11.91 minutes, SD = 8.00) away from the IDE, browsing the web or checking the rendered HTML, highlighting the importance of traditional validation methods. The study also found fewer backspace key presses, an indicator of post-paste corrections, without Copilot (average = 92.62, SD = 68.27) than with the AI assistant (average = 104.50, SD = 91.91). Further, code pasting was slightly higher when participants solely browsed the web for exploration, averaging 12.68 times (SD = 8.09), compared to 11.43 times (SD = 5.35) with access to Copilot Chat. Participants dedicated about 7.39% (2.14 minutes, SD = 1.72) of their task time to typing in the GitHub Copilot chat window, while they also accepted Copilot's code auto-complete suggestions around 5.44 times (SD = 5.00) on average.

AI Usage and Prompting Strategies. Participants mainly used the autocomplete feature only when they had a clear mental model of the desired code structure and sought to accelerate the code typing process. In contrast, they relied heavily on the conversational interface for syntax inquiries, conceptual understanding, and the generation of code templates. We noticed that our participants wrote brief, task-oriented prompts that focused on immediate code solutions or specific interface modifications, often disregarding broader architectural considerations. Their prompting style was iterative and reactive, frequently requesting small incremental changes, fixes to previous outputs, or refinements to their vague prompts.

Furthermore, none of the participants, including the two who were familiar with web accessibility, prompted with accessibility in mind. Instead, our participants' prompts centered around visual and functional attributes (e.g., "add a gray background to the subscription form" (P4) or "add a grey patch" (P1)). Consequently, the AI assistant's suggestions often failed to incorporate accessibility best practices automatically. Occasionally, our participants prompted for enhancements that indirectly aligned with accessibility requirements, and Copilot provided relevant accessibility suggestions, as shown in Table 4. However, participants' overreliance on AI assistance often led them to assume that Copilot's code output was correct and complete. For instance, despite additional explanations from Copilot advising manual adjustments to image descriptions, participants directly pasted the code, resulting in code submissions with empty alt attributes.

Implications for Web Accessibility. Our study showed mixed results of AI assistants in considering accessibility issues, with no statistically significant difference between the experimental conditions, as shown in Figure 3. Notably, Copilot could (sporadically) generate accessible components by utilizing patterns from other parts of a website. For example, it might automatically include proper labels for form fields, such as <label for="email"> Email: </label> in a signup form. However, there were also instances where Copilot inadvertently introduced new accessibility issues. For example, when adding new button components with hover effects, it failed to ensure adequate contrast between the hover color and background.
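To make the hover-contrast failure concrete, consider an illustrative button style (the colors are our own example, not taken from the study). Keeping white text at or above the 4.5:1 AA ratio requires checking the hover color as well as the default color:

    // Both states keep white text WCAG AA compliant: #767676 is ~4.5:1 and
    // #595959 is ~7:1 against white. A hover color chosen purely for looks
    // (say, a light gray) could silently drop below the 4.5:1 threshold.
    const buttonCss = `
      .subscribe-btn       { background: #767676; color: #ffffff; }
      .subscribe-btn:hover { background: #595959; color: #ffffff; }
    `;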

Table 4: The (partial) chat history revealed that directly copying Copilot's code suggestions would be incomplete, as developers overlooked the additional recommended steps. (Format: user query, followed by Copilot's generated code and its associated text explanation.)

User 1: "I want to label the images or add relevant image descriptors"
Copilot: alt=imgAlt // Add this line
  "Remember to replace imageAlt, <...> with your actual image attributes. In this example, imageAlt is the alt-text for the image, and <...>. You will need to replace these with the actual values."

User 2: "Can you optimise the images on this article page for search engine optimization. Refactor the code accordingly by labeling the images or adding relevant image descriptions."
Copilot: imgAlt=imageDescription // Add this line
  "Sure, to optimize images for SEO, you can add alt attributes to your img tags. The alt attribute provides alternative information for an image if a user for some reason cannot view it (because of slow connection, an error in the src attribute, or if the user uses a screen reader)."
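The manual step that participants skipped amounts to replacing the stub with a meaningful value. An illustration of the difference (our own example, not a participant's submission):

    // As pasted from the assistant: a placeholder the developer must replace.
    const asGenerated = `<img src="headline.jpg" alt="imgAlt">`;

    // After the recommended manual step: alt-text that describes the image.
    const asCompleted = `<img src="headline.jpg"
      alt="Rescue workers carrying supplies across a flooded street">`;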

Further, the effectiveness of AI assistants was limited by the need for more sophisticated accessibility knowledge. Since our participants had limited awareness of these accessibility features, they would often ignore such suggestions by blindly accepting alt = "" // Add your text here or by manually deleting the <label> tag. Some errors, such as providing blank alt-texts for informative images, were not even flagged by automated accessibility checkers, because they interpret the image as decorative and consider this deliberate. This is particularly problematic as it implies that AI assistance might increase the risk of accessibility oversights, allowing critical errors to go unnoticed and uncorrected.

Figure 3: Mean Accessibility Evaluation Scores by Tasks and Copilot Usage. Higher scores indicate success.

5 Design Requirements
Our formative study identified three limitations in novice developers' interactions with AI assistants: (1) failing to prompt for accessibility considerations explicitly, (2) uncritically accepting incomplete code suggestions from Copilot, and (3) struggling to detect potential accessibility issues in their code. These shortcomings indicate possible directions to support accessibility through three design goals (G1-G3):

G1: Integrate System Prompts for Accessibility Awareness. Without explicit prompting, the AI assistant rarely produced accessibility-compliant code, reflecting the accessibility issues prevalent in its training data. However, it occasionally suggested accessibility features when participants indirectly prompted them, demonstrating its ability to recall accessibility practices from training data upon instruction. AI assistants should have a system prompt tuned towards following accessibility guidelines by default, for consistent generation of accessibility-compliant code, even when developers do not mention accessibility specifically. Further, the system prompt should also direct the assistant to suggest accessibility-focused iterative refinements.

G2: Support Identification of Accessibility Issues. Due to their unfamiliarity with accessibility standards, our participants were unable to identify compliance issues in the existing and modified code. They primarily prompted changes to individual components (such as buttons and forms), hardly addressing broader page-level accessibility concerns (such as heading structure or landmark regions). AI assistants should not only automatically generate accessibility-compliant code, but also provide real-time feedback to detect and resolve accessibility violations within the codebase. In addition, AI assistants and automated accessibility checkers should work in tandem to ensure that incomplete or incorrect implementations of the AI-suggested code are always flagged by the latter.

G3: Encourage Developers to Complete AI-Generated Code. Our observations revealed that accessibility implementation in AI-assisted coding workflows commonly required critical manual intervention to complete and validate AI-generated code. This involved replacing placeholder attributes, such as labels and alt-texts, with meaningful values, and verifying color contrast ratios. However, we found that participants blindly copy-pasted code and proceeded further if there were no apparent errors. This behavior of deferring thought to suggestions has also been documented in previous work [53]. To mitigate this, AI assistants should proactively remind developers to ensure that all necessary accessibility features – such as contrast ratios or keyboard navigation support – are fully implemented and verified.

6 CodeA11y
Guided by the design goals identified through our user study, we built CodeA11y, a GitHub Copilot Extension for the Visual Studio IDE. In this section, we present the interactions that it supports and its system architecture.

CodeA11y has three primary features (F1-F3, aligned to G1-G3, respectively): (F1) it produces user interface code that better complies with accessibility standards, (F2) it prompts the developer to resolve existing accessibility errors in their website, and (F3) it reminds the developer to complete any AI-generated code that requires manual intervention. CodeA11y is integrated into Visual Studio Code as a GitHub Copilot Extension³, enabling CodeA11y to act as a chat participant within the GitHub Copilot Chat window panes. While we implemented this as an extension, it could be integrated directly into Copilot in the future.

Multi-Agent Architecture. CodeA11y has three LLM agents (Figure 4): the Responder Agent, the Correction Agent, and the Reminder Agent. We provide their prompt instruction highlights in Table 5. These agents facilitate each of the above features (F1-F3) as follows:
• Responder Agent (for F1): This agent generates relevant code suggestions based on the developer's prompt. It assumes that the developer is unfamiliar with accessibility standards and automatically generates accessible code. The prompt instruction for this agent is adapted from GitHub's recommended user prompt for accessibility.⁴
• Correction Agent (for F2): This agent parses the accessibility error logs produced by an automated accessibility checker (axe DevTools Accessibility Linter⁵) to hint the developer at making additional accessibility fixes in the component or page currently being discussed in the chat context.
• Reminder Agent (for F3): This agent reviews the Responder Agent's suggestions and identifies the manual steps required to complete their implementation. It accordingly sends reminder notifications to the developer through the Visual Studio IDE infrastructure.

Figure 4: CodeA11y Architecture: Multi-agent workflow

The agents are provided with several different sources of context:
• Code Context: the 100 lines of code centered around the cursor position in the active files.
• Chat Context: the current active chat window interaction.
• Accessibility Linter Logs: automated axe DevTools Accessibility Linter error logs, refreshed periodically.
• Project Context: code context from the README and index files, which contain information about the web framework being used, the project structure, and other key configuration details.

Due to the constraints in the context window, we optimized our prompts and filtered this context when it exceeded 4000 characters. The agents use GPT-4o as the back-end model, from the same OpenAI GPT family of models that powers GitHub Copilot.

User Interaction. Developers invoke⁶ CodeA11y in the GitHub Copilot Chat window panes (including Quick Chat and Chat View) using @CodeA11y. When a developer prompts CodeA11y, an internal chat_context state is established, storing the latest user prompts and agent responses. The get_relevant_context() function is called, which passes the source code and project context to the Responder Agent. The agent generates code suggestions, accessibility explanations, and links to additional resources, and updates chat_context. The get_log_context() function is called, which passes the accessibility linter logs to the Correction Agent. This agent suggests additional fixes and displays the responses in the chat pane. Lastly, the updated chat_context state is forwarded to the Reminder Agent, which generates and sends reminder notifications. Figure 5 illustrates a typical interaction between a developer and CodeA11y, showing how it compares to baseline assistants like GitHub Copilot.
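As a minimal sketch, a pipeline like the one above can be wired up as a VS Code chat participant roughly as follows. The helper names mirror the description in this section, but their signatures, the agent-calling code, and the participant id are our assumptions rather than CodeA11y's exact implementation:

    import * as vscode from 'vscode';

    // Hypothetical helpers mirroring the pipeline described above.
    declare function get_relevant_context(): Promise<string>; // source code + project context
    declare function get_log_context(): Promise<string>;      // accessibility linter logs
    declare function callAgent(instructions: string, context: string, prompt: string): Promise<string>;
    declare const RESPONDER_PROMPT: string, CORRECTION_PROMPT: string, REMINDER_PROMPT: string;

    export function activate(context: vscode.ExtensionContext) {
      const participant = vscode.chat.createChatParticipant(
        'codea11y.chat', // invoked as @CodeA11y in the Copilot Chat panes
        async (request, _chatContext, stream, _token) => {
          // F1: Responder Agent answers with accessibility-by-default code.
          const answer = await callAgent(RESPONDER_PROMPT, await get_relevant_context(), request.prompt);
          stream.markdown(answer);

          // F2: Correction Agent surfaces linter errors relevant to this exchange.
          const fixes = await callAgent(CORRECTION_PROMPT, await get_log_context(), answer);
          stream.markdown(fixes);

          // F3: Reminder Agent flags required manual steps via an IDE notification.
          const reminder = await callAgent(REMINDER_PROMPT, answer, request.prompt);
          if (!/no reminders needed/i.test(reminder)) {
            vscode.window.showWarningMessage(reminder);
          }
        }
      );
      context.subscriptions.push(participant);
    }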
³ https://docs.github.com/en/copilot/using-github-copilot/using-extensions-to-integrate-external-tools-with-copilot-chat
⁴ https://github.blog/developer-skills/github/prompting-github-copilot-chat-to-become-your-personal-ai-assistant-for-accessibility/
⁵ https://www.deque.com/axe/devtools/linter/
⁶ In the long term, the goal is for GitHub Copilot to invoke CodeA11y automatically during frontend development tasks.

7 User Evaluation
We conducted a within-subjects user study with 20 new participants to evaluate CodeA11y's effectiveness in guiding novice developers toward adhering to accessibility standards, as compared to Copilot.

7.1 Methodology
We made the following revisions to our formative study protocol (Section 3). First, the experimental conditions were updated as follows: (1) the control condition involved using the baseline AI assistant (GitHub Copilot), and (2) the test condition had the participants use CodeA11y. Second, we changed the post-task survey to a brief semi-structured interview to get more nuanced insights about the usability of our system. We analyzed interview responses to better understand the factors shaping participants' assistant preferences and their perceptions of any new coding practices introduced during the study.

Table 5: Prompt instructions for the three LLM agents in CodeA11y

Responder Agent:
• I am unfamiliar with accessibility and need to write code that conforms with WCAG 2.1 level AA criteria.
• Be an accessibility coach that makes me account for all accessibility requirements.
• Use reputable sources such as w3.org, webaim.org and provide links and references for additional learning.
• Don't give placeholder variables but tell me where to give meaningful values.
• Prioritise my current request and don't mention accessibility if I give a generic request like "Hi".

Correction Agent:
• Review the accessibility checker log and provide feedback to fix errors relevant to the current chat context.
• If a log error relevant to the current chat context occurs, provide a code snippet to fix it.

Reminder Agent:
• Is there an additional step required by the developer to meet accessibility standards after pasting code?
• Reminders should be a single line. Be conservative in your response; if not needed, say "No reminders needed."
• For example, remind the developer to replace the placeholder attributes with meaningful values or labels, or to visually inspect the element for colour contrast when needed.
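As a sketch of how such instruction highlights can be combined with retrieved context before each model call (the message format and names here are our assumptions; the 4,000-character cutoff is the one reported in Section 6):

    // Assemble the Responder Agent's request from its instruction highlights
    // (Table 5) plus truncated code/project context.
    const RESPONDER_INSTRUCTIONS = [
      'I am unfamiliar with accessibility and need to write code that conforms with WCAG 2.1 level AA criteria.',
      'Be an accessibility coach that makes me account for all accessibility requirements.',
      "Don't give placeholder variables but tell me where to give meaningful values.",
    ].join('\n');

    const MAX_CONTEXT_CHARS = 4000; // truncation threshold reported in Section 6

    function buildResponderPrompt(userPrompt: string, codeContext: string, projectContext: string): string {
      const context = `${codeContext}\n${projectContext}`.slice(0, MAX_CONTEXT_CHARS);
      return `${RESPONDER_INSTRUCTIONS}\n\n[Context]\n${context}\n\n[Request]\n${userPrompt}`;
    }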

Figure 5: Contrasting responses for the same task across AI assistants, showing differences in workflows. Developers had access to both the code and the rendered user interface. (a) and (d) represent conversations with the baseline assistant and CodeA11y, respectively. (b) and (e) show the buttons generated by each assistant in their default state. (c) and (f) display the buttons when hovered over, illustrating the differences in button color contrast.

Third, we used Visual Studio Code as the IDE interface (which had received advanced AI updates since the formative study, regarding both model performance and the introduction of new features such as Inline Chat). Finally, we recruited 20 new participants for this subsequent study, with no prior exposure to the formative study, to evaluate CodeA11y's performance on the same UI development tasks.

Table 6: The distribution of opinions on AI-powered programming tools and awareness of web accessibility, based on the responses from participants in the evaluation study. Percentages indicate the proportion of participants who either disagree (including 'strongly disagree', 'disagree', and 'slightly disagree') or agree (including 'strongly agree', 'agree', and 'slightly agree') with each statement.

Statement | Disagree | Agree
"I trust the accuracy of AI programming tools." | 15% | 70%
"I am proficient in web accessibility." | 70% | 25%
"I am familiar with the web accessibility standards, such as WCAG 2.0." | 80% | 15%
"I am familiar with ARIA roles, states, and properties." | 85% | 10%
(Responses were collected on a 7-point scale from Strongly Disagree to Strongly Agree.)

These participants were of the same demographic as our formative participants (students; multi-year programming experience; 6 female and 14 male; ages ranged from 22 to 30). Again, most participants were unfamiliar with the web accessibility standards (Table 6), but most (90%) had experience using AI programming tools. The IRB approved all our modifications.

To avoid biasing participants towards adhering to accessibility guidelines, we did not disclose the specific purpose of the CodeA11y plugin. For the duration of the study, we renamed the assistant "Codally" and described it as a general-purpose chat assistant for website editing. We assumed the interface would be intuitive, similar to widely used assistants, and therefore briefed participants only on basic AI assistant usage (e.g., Copilot), deliberately withholding explanations of error pop-ups to prevent influencing their behavior before the main study tasks. However, during the course of our study, we realized that VS Code was dismissing popup boxes created by our plugin more rapidly than expected, causing some participants to miss them. After 8 participants, we switched from floating popups to modals (which prevent the IDE's auto-dismissal) due to this technical limitation. Both notification strategies do not require users to address errors, making them valid design choices. In our baseline comparison, we aggregate data from all users and include anecdotal observations of user behavior with each strategy. We acknowledge that such UI design choices may introduce variability and plan to investigate this further in future work.
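For context, the two strategies differ in a single argument of the VS Code notification API; the message text below is our own example:

    import * as vscode from 'vscode';

    const reminder = 'Replace the placeholder alt-text with a meaningful description.';

    // Floating popup ("toast"): can be auto-dismissed by the IDE, so it is easy to miss.
    vscode.window.showWarningMessage(reminder);

    // Modal dialog: stays up until the developer explicitly dismisses it.
    vscode.window.showWarningMessage(reminder, { modal: true }, 'OK');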
to be unrelated to accessibility:
7.2 Results
Here, we present the results of our subsequent evaluation study.

Accessibility Improvements. We implemented the accessibility assessments using the same measures outlined in our formative study (Table 1). Notably, our participants demonstrated a marked improvement in generating accessible web components and resolving accessibility issues with CodeA11y (Figure 6). CodeA11y facilitated the automatic addition of form labels and ensured contrasting colors for button states, leading to statistically significant enhancements in accessibility outcomes. Specifically, participants performed better at adding form labels (μ = 1.5, σ = 0.85) compared to GitHub Copilot (μ = 0.5, σ = 0.85; t = 2.63, p < 0.05) and in ensuring contrasting button colors (μ = 1.3, σ = 0.67 vs. μ = 0.7, σ = 0.82; t = 1.78, p < 0.05). We also observed improvements in adding alt-texts with CodeA11y (μ = 0.7, σ = 0.95 vs. μ = 0.1, σ = 0.32; t = 1.9, p < 0.05). Though we did not find any statistical improvements in labeling links (perhaps because GitHub Copilot did a decent job at this task itself), all participants who used CodeA11y successfully completed this task (μ = 2, σ = 0 vs. μ = 1.7, σ = 0.67; t = 1.9, p = 0.09).

Developers' Perspectives. Overall, participants reported no statistically significant difference in satisfaction (μ = 5.15, σ = 1.75 vs. μ = 4.95, σ = 1.67; t = 0.37, p = 0.36) and ease of use (μ = 5.8, σ = 1.47 vs. μ = 5.4, σ = 1.67; t = 0.8, p = 0.21) between CodeA11y and GitHub Copilot, respectively, as illustrated in Table 7.

During the post-study interviews, participants provided additional reasoning for their preferences. Most (n=16) participants did not have a specific preference between the two assistants, which is consistent with the conclusion of our statistical analysis. Others did indicate a preference (n=3) but provided reasoning that was based on the complexity of the task rather than assistant features: "I liked the first assistant (CodeA11y) better, maybe because of the tasks. The second one (GitHub Copilot) required me to understand the code, and the first directly gave me the code. That's the difference." (P18)

We asked our participants if they were introduced to any new coding practices by either of the assistants. To our surprise, only 4 participants mentioned accessibility, demonstrating CodeA11y's effectiveness in "silently" improving the accessibility of our participants' UI code. These participants noted that they had not had these considerations before. However, some mentioned either not paying attention to them or subconsciously rejecting them, as they were primarily focused on completing the tasks, which they perceived to be unrelated to accessibility:

"I did not find any difference (between the assistants). When I was prompting CodeA11y, it was hinting at me to use alt texts, which was not happening in Copilot. It didn't come to me by default, so that was good ... But I don't think I implemented that." (P27)

Still, they appreciated CodeA11y for emphasizing best practices for accessibility. For instance, P27 continued: "I did see a few of the popups, and they did mention some interesting points like you need to consider the color of the button when you add a new button because if people are color blind, they might not be able to notice it." Further, P19, familiar with web accessibility but not proficient, realized that although he did not learn anything new from CodeA11y's color contrast suggestion, he noticed a visible difference in user experience after accepting it. Our sole participant who claimed proficiency in accessibility valued support for a specific framework:

"I am familiar with accessibility coding practices but not in the React Native environment; I don't know if I would have needed that help in HTML, but I liked that it tried to highlight accessibility practices in React Native." (P21)

Table 7: The distribution of participants' opinions on GitHub Copilot and CodeA11y, as well as their ease of completing tasks with these tools. Percentages indicate the proportion of participants who disagree or agree, on a 7-point scale from Strongly Disagree (1) to Strongly Agree (7).

Statement | Disagree | Agree
"I am satisfied with the code suggestions provided by":
  GitHub Copilot | 20% | 75%
  CodeA11y | 15% | 75%
"I found it easy to complete the coding tasks with":
  GitHub Copilot | 15% | 80%
  CodeA11y | 10% | 90%

[Figure 6: Mean Accessibility Evaluation Scores by Tasks and AI Assistant: Higher scores indicate success.]

8 Discussion

This paper has explored how AI coding assistants currently contribute to UI code that is accessible to people with disabilities. While these tools offer a new opportunity for achieving accessibility, we have revealed the remaining challenges and shown how they could be addressed with changes to the way coding assistants operate.

Which comes first: Adoption or Awareness? Adoption is a perennial challenge faced by most accessibility technologies, even when the technology could lead to substantial improvements in user experience. One reason adoption is low is that awareness is low: people who could benefit from access technology do not know about it. For example, a survey found that only 10% of older adults knew what the term "accessibility" meant and therefore did not enable any useful settings [64, 88]. Similarly, developers benefit in many ways from tools that improve the accessibility of their code (e.g., linters, scanners), resulting in better-designed applications that reach more users. Unfortunately, many developers are unaware of these tools or unwilling to adopt new practices that require changing their original workflow. For example, while AI assistants like Copilot are capable of generating accessible code, our formative study found that developers were unaware of this capability or unwilling to prompt for it explicitly.

One goal of our work was to investigate whether developers could increase the adoption of accessibility technology and development practices independently or in tandem with awareness. For example, prior work [7] found that by "opportunistically" zooming into web pages and configuring settings, users could automatically benefit from improved accessibility. Our motivation is similar: we aimed to improve code accessibility while introducing minimal changes to the existing AI-assistant developer workflow, i.e., GitHub Copilot. According to the Visual Studio Marketplace, GitHub Copilot has been installed over 20 million times (as of the time of writing), suggesting that many developers are already familiar with the plugin's interactions, tooling, and interface. Our results show that CodeA11y significantly improved code accessibility while maintaining a similar (slightly improved) ease of use relative to Copilot. This suggests that if GitHub Copilot included our set of features, or if developers were willing to use a similar plugin without substantial deviation from their existing workflows, millions of developers could start writing more accessible code immediately. The sketch below illustrates how such a feature could hook into the chat tooling developers already have installed.
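As a sketch of how small the required workflow change could be, the snippet below registers an accessibility-focused participant with VS Code's chat extension API, so guidance appears in the same chat view developers already use with Copilot. The participant id, model selection, and system instruction are illustrative assumptions, not CodeA11y's actual code.

```typescript
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  // Hypothetical participant id; a real extension also declares the
  // participant in its package.json contribution points.
  const participant = vscode.chat.createChatParticipant(
    'codea11y.reviewer',
    async (request, _chatContext, stream, token) => {
      // Reuse whatever Copilot chat model is already available.
      const [model] = await vscode.lm.selectChatModels({ vendor: 'copilot' });
      if (!model) {
        stream.markdown('No chat model is available.');
        return;
      }
      const messages = [
        vscode.LanguageModelChatMessage.User(
          'Generate accessibility-compliant HTML/ARIA by default and ' +
            'flag any attribute the developer must fill in manually.'
        ),
        vscode.LanguageModelChatMessage.User(request.prompt),
      ];
      const response = await model.sendRequest(messages, {}, token);
      for await (const chunk of response.text) {
        stream.markdown(chunk); // Stream the reply into the familiar chat view.
      }
    }
  );
  context.subscriptions.push(participant);
}
```

A developer would then invoke the participant from the chat view they already use (e.g., via an @-mention), rather than learning a separate accessibility tool.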

AI-Assisted but Developer-Completed. Even when developers adopt accessibility tools, additional expertise is required to maximize their utility. This is especially true for AI assistants, which are incapable of generating entirely correct or accessible code. Our work offers some insight into the manual effort needed to write accessible code. Both our formative and evaluation studies underscore the necessity for developers to manually intervene with AI-generated code to effectively implement accessibility features. A recurrent challenge in AI-assisted coding is the generation of incomplete or boilerplate code, which often requires developers to take additional steps for completion and validation. Our findings reveal that novice developers tend to critically evaluate AI outputs in areas they prioritize, such as visual enhancements, while overlooking aspects they are less familiar with, like accessibility.

One of the tensions of this work is that while we aim to increase the adoption of AI-driven accessibility tools among users with little expertise or awareness, some degree of understanding is required to use these tools effectively. CodeA11y and other tools can employ strategies to scaffold this interaction, e.g., asking a user "can you describe what's in this image?" instead of asking them directly for "alternative text." However, developers ultimately need to be willing to expend additional effort and manually implement the more challenging aspects of this work. Thus, while our work suggests that it is possible to "silently" improve the accessibility of developer-written code, it is ultimately not a replacement for better accessibility awareness, development practices, and education. Nevertheless, CodeA11y could help gradually improve awareness by slowly introducing and explaining accessibility concepts to users after they have found benefits from using the tool.

Limitations & Future Work. We describe several limitations in the current scope of the study and identify avenues for future work to build upon our findings:

First, the utility of the CodeA11y plugin was limited by the constraints of our target development environment (Visual Studio Code). Because CodeA11y was implemented as a Copilot plugin, we could only access the few APIs available to standard VS Code IDE plugins. The Copilot plugin infrastructure was limiting because it restricted the amount of source code that could be passed to the model (i.e., the context window length). Our implementation contained some mechanisms for heuristically determining the most relevant files, but it ultimately serves as a proof of concept of what would be possible in a future, well-integrated version (e.g., built into Copilot). These factors affected the code generation of our system.

Second, although our user study provides statistical evidence that CodeA11y helps developers write more accessible website code, we acknowledge certain limitations of AI coding assistants that could affect their overall effectiveness and reliability. One well-documented issue, particularly with proactive AI assistants [17], is their potential to provide untimely or irrelevant guidance. For instance, the models may occasionally suggest fixes for problems that do not exist (i.e., false positives), such as recommending changes to code that is already fully compliant with accessibility standards. While our study did not surface such occurrences—likely because CodeA11y's suggestions were tied directly to verified issues from an accessibility checker, rather than the tool identifying issues on its own—this risk becomes more salient as AI assistants evolve to more proactively identify and address accessibility problems. Still, prior research suggests that developers often tolerate false positives more readily than false negatives [39], reasoning that overly cautious guidance from an assistant is less harmful than failing to flag genuine accessibility issues. Indeed, even standalone accessibility checkers, which are also known for producing false positives [34], have been widely adopted due to their overall beneficial effect on UI quality. Moreover, the occasional presence of false positives does not necessarily negate the value of employing such tools. By raising awareness and prompting developers to consider accessibility from the outset, AI assistants can help cultivate a proactive mindset toward inclusive design. In this sense, the technology does not need to achieve perfect accuracy to have a net positive effect. As the underlying models and APIs improve, and assistants become better integrated with real-world workflows, their precision and utility in improving accessibility are likely to increase.

Third, while our study demonstrates the potential of CodeA11y to encourage developers to adopt accessibility practices, we acknowledge that the broader impact of AI coding assistants on long-term learning and behavior change (e.g., for accessibility awareness) remains underexplored in the current scope of the study. Research on how real-time AI tools can help developers internalize new practices, such as accessibility, or foster long-term behavioral changes would strengthen the case for CodeA11y's instructional components. For instance, prior work has shown that AI coding tools can enhance immediate task performance but may not consistently lead to deeper learning or sustained skill retention [38]. Similarly, studies on meta-cognitive demands in AI-assisted workflows emphasize the importance of tools promoting reflective learning and adaptive strategies, particularly as developers integrate them into their daily practices [84]. Although our findings suggest that CodeA11y has the potential to raise awareness of accessibility issues through direct integration with verified accessibility checks, further research is needed to understand whether such tools can foster a lasting commitment from developers to accessibility or similar best practices. Additionally, exploring how these tools affect broader developer workflows, collaboration habits, and the ability to generalize learned behaviors across contexts would provide a more comprehensive view of their instructional value. By examining these dimensions, future studies could better elucidate the role of AI coding assistants in shaping not just productivity, but also the culture of inclusive and responsible software development. While CodeA11y focuses on improving accessibility, its approach could extend to other non-functional requirements, such as privacy and security. Investigating how specialized copilots could be seamlessly invoked within mainstream coding assistants for high-stakes scenarios—such as leveraging CodeA11y for front-end development tasks—represents a promising direction.

Finally, we see opportunities to iterate on and refine CodeA11y's design. Our conservative approach adhered closely to Copilot's existing interface to minimize friction during adoption. Future work could explore how developers respond to new features and interactions, identifying areas where innovation could enhance usability and functionality without compromising adoption. By addressing current limitations and exploring broader applications, tools like CodeA11y can refine how developers approach accessibility and other critical non-functional requirements. Beyond technical improvements, such advancements hold the potential to redefine AI's role in shaping more inclusive, secure, and efficient coding practices.
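To ground the checker-verified approach discussed above (tying suggestions only to issues an accessibility checker actually reported), the sketch below runs the open-source axe checker [21] against a rendered page and folds its verified violations into a prompt. The URL, prompt wording, and use of `@axe-core/puppeteer` are our assumptions for illustration, not CodeA11y's actual pipeline.

```typescript
import puppeteer from 'puppeteer';
import { AxePuppeteer } from '@axe-core/puppeteer';

// Run the axe checker against a rendered page and turn its verified
// violations into prompt context for a coding assistant, so suggested
// fixes stay tied to issues the checker actually reported.
async function buildAccessibilityPrompt(url: string): Promise<string> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);

  // analyze() groups results into passes, violations, incomplete, etc.
  const results = await new AxePuppeteer(page).analyze();
  await browser.close();

  const issues = results.violations.map(
    (v) => `- ${v.id}: ${v.help} (e.g., ${v.nodes[0]?.html})`
  );

  // Hypothetical prompt framing: only checker-verified issues are listed,
  // which limits the assistant's opportunity to "invent" false positives.
  return ['Fix only these verified accessibility violations:', ...issues].join('\n');
}

// Example usage against a hypothetical local dev server:
// buildAccessibilityPrompt('http://localhost:3000').then(console.log);
```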
9 Conclusion

Our work bridges decades of accessibility efforts with AI coding assistants, offering a novel solution to persistent web accessibility challenges. Through a formative user study, we identify shortcomings in how current AI-assisted development workflows handle accessibility implementation. Accordingly, we develop CodeA11y, a GitHub Copilot Extension, and demonstrate that novice developers using it are significantly more likely to create accessible interfaces. By focusing on integrating accessibility improvements seamlessly into everyday development workflows, this work marks a first step toward fostering accessibility-conscious practices in human-AI collaborative UI development.

References
[1] 2024. GitHub Copilot. https://github.com/features/copilot. Accessed: 2024-04-22.
[2] 2024. IAB Website Categories. https://docs.webshrinker.com/v3/iab-website-categories.html#iab-categories. Accessed: 2024-04-22.
[3] 2024. SimilarWeb - Website Traffic & Market Intelligence. http://similarweb.com. Accessed: 2024-04-22.
[4] Wajdi Aljedaani, Abdulrahman Habib, Ahmed Aljohani, Marcelo Eler, and Yunhe Feng. 2024. Does ChatGPT Generate Accessible Code? Investigating Accessibility Challenges in LLM-Generated Source Code. In Proceedings of the 21st International Web for All Conference (Singapore, Singapore) (W4A '24). Association for Computing Machinery, New York, NY, USA, 165–176. doi:10.1145/3677846.3677854
[5] Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. 2023. Qwen technical report. arXiv preprint arXiv:2309.16609 (2023).
[6] Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. 2023. Qwen-VL: A frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966 (2023).
[7] Jeffrey P Bigham. 2014. Making the web easier to see with opportunistic accessibility improvement. In Proceedings of the 27th annual ACM symposium on User interface software and technology. 117–122.
[8] Jeffrey P Bigham, Jeremy T Brudvik, and Bernie Zhang. 2010. Accessibility by demonstration: enabling end users to guide developers to web accessibility solutions. In Proceedings of the 12th international ACM SIGACCESS conference on Computers and accessibility. 35–42.
[9] Jeffrey P. Bigham, Ryan S. Kaminsky, Richard E. Ladner, Oscar M. Danielsson, and Gordon L. Hempton. 2006. WebInSight: making web images accessible. In Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility (Portland, Oregon, USA) (Assets '06). Association for Computing Machinery, New York, NY, USA, 181–188. doi:10.1145/1168987.1169018
[10] Jeffrey P Bigham and Richard E Ladner. 2007. Accessmonkey: a collaborative scripting framework for web users and developers. In Proceedings of the 2007 international cross-disciplinary conference on Web accessibility (W4A). 25–34.
[11] Giorgio Brajnik and Raffaella Lomuscio. 2007. SAMBA: a semi-automatic method for measuring barriers of accessibility. In Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility. 43–50.
[12] Jeremy T. Brudvik, Jeffrey P. Bigham, Anna C. Cavender, and Richard E. Ladner. 2008. Hunting for headings: sighted labeling vs. automatic classification of headings. In Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility (Halifax, Nova Scotia, Canada) (Assets '08). Association for Computing Machinery, New York, NY, USA, 201–208. doi:10.1145/1414471.1414508
[13] Ben Caldwell, Michael Cooper, Loretta Guarino Reid, Gregg Vanderheiden, Wendy Chisholm, John Slatin, and Jason White. 2008. Web content accessibility guidelines (WCAG) 2.0. WWW Consortium (W3C) 290, 1-34 (2008), 5–12.
[14] Jieshan Chen, Chunyang Chen, Zhenchang Xing, Xiwei Xu, Liming Zhu, Guoqiang Li, and Jinshui Wang. 2020. Unblind your apps: Predicting natural-language labels for mobile gui components by deep learning. In Proceedings of the ACM/IEEE 42nd international conference on software engineering. 322–334.
[15] Jieshan Chen, Mulong Xie, Zhenchang Xing, Chunyang Chen, Xiwei Xu, Liming Zhu, and Guoqiang Li. 2020. Object detection for graphical user interface: Old fashioned or deep learning or a combination?. In Proceedings of the 28th ACM joint meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1202–1214.
[16] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).
[17] Valerie Chen, Alan Zhu, Sebastian Zhao, Hussein Mozannar, David Sontag, and Ameet Talwalkar. 2024. Need Help? Designing Proactive AI Assistants for Programming. arXiv preprint arXiv:2410.04596 (2024).
[18] Wendy Chisholm, Gregg Vanderheiden, and Ian Jacobs. 2001. Web content accessibility guidelines 1.0. Interactions 8, 4 (2001), 35–54.
[19] Victoria Clarke and Virginia Braun. 2017. Thematic analysis. The Journal of Positive Psychology 12, 3 (2017), 297–298.
[20] Giovanni Delnevo, Manuel Andruccioli, and Silvia Mirri. 2024. On the Interaction with Large Language Models for Web Accessibility: Implications and Challenges. In 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC). IEEE, 1–6.
[21] Deque Systems. 2024. axe: Accessibility Testing Tools and Software. https://www.deque.com/axe/ Accessed: 2024-12-10.
[22] Morgan Dixon and James Fogarty. 2010. Prefab: implementing advanced behaviors using pixel-based reverse engineering of interface structure. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1525–1534.
[23] dkandalov. 2024. activity-tracker. https://github.com/dkandalov/activity-tracker.
[24] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024).
[25] André P Freire, Renata PM Fortes, Marcelo AS Turine, and Debora MB Paiva. 2008. An evaluation of web accessibility metrics based on their attributes. In Proceedings of the 26th annual ACM international conference on Design of communication. 73–80.
[26] Cole Gleason, Amy Pavel, Himalini Gururaj, Kris Kitani, and Jeffrey Bigham. 2020. Making GIFs Accessible. In Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility. 1–10.
[27] Cole Gleason, Amy Pavel, Xingyu Liu, Patrick Carrington, Lydia B Chilton, and Jeffrey P Bigham. 2019. Making memes accessible. In Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility. 367–376.
[28] Cole Gleason, Amy Pavel, Emma McCamey, Christina Low, Patrick Carrington, Kris M. Kitani, and Jeffrey P. Bigham. 2020. Twitter A11y: A Browser Extension to Make Twitter Images Accessible. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI '20). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/3313831.3376728
[29] Google. 2024. Chrome DevTools Documentation. https://developer.chrome.com/docs/devtools Accessed: 2024-12-10.
[30] Darren Guinness, Edward Cutrell, and Meredith Ringel Morris. 2018. Caption crawler: Enabling reusable alternative text descriptions using reverse image search. In Proceedings of the 2018 CHI conference on human factors in computing systems. 1–11.
[31] Yun Huang, Brian Dobreski, Bijay Bhaskar Deo, Jiahang Xin, Natã Miccael Barbosa, Yang Wang, and Jeffrey P Bigham. 2015. CAN: Composable accessibility infrastructure via data-driven crowdsourcing. In Proceedings of the 12th International Web for All Conference. 1–10.
[32] Mina Huh, Saelyne Yang, Yi-Hao Peng, Xiang 'Anthony' Chen, Young-Ho Kim, and Amy Pavel. 2023. AVscript: Accessible Video Editing with Audio-Visual Scripts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–17.
[33] Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. 2024. Qwen2.5-Coder technical report. arXiv preprint arXiv:2409.12186 (2024).
[34] Syed Fatiul Huq, Abdulaziz Alshayban, Ziyao He, and Sam Malek. 2023. #A11yDev: Understanding Contemporary Software Accessibility Practices from Twitter Conversations. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–18.
[35] IBM. 2024. IBM Equal Access Toolkit. https://www.ibm.com/able/toolkit Accessed: 2024-12-10.
[36] Muhammad Asiful Islam, Yevgen Borodin, and I. V. Ramakrishnan. 2010. Mixture model based label association techniques for web accessibility. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (New York, New York, USA) (UIST '10). Association for Computing Machinery, New York, NY, USA, 67–76. doi:10.1145/1866029.1866041
[37] Shinya Kawanaka, Yevgen Borodin, Jeffrey P Bigham, Darren Lunn, Hironobu Takagi, and Chieko Asakawa. 2008. Accessibility commons: a metadata infrastructure for web accessibility. In Proceedings of the 10th international ACM SIGACCESS conference on Computers and accessibility. 153–160.
[38] Majeed Kazemitabaar, Justin Chow, Carl Ka To Ma, Barbara J Ericson, David Weintrop, and Tovi Grossman. 2023. Studying the effect of AI code generators on supporting novice learners in introductory programming. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–23.
[39] Rafal Kocielnik, Saleema Amershi, and Paul N Bennett. 2019. Will you accept an imperfect AI? Exploring designs for adjusting end-user expectations of AI systems. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–14.
[40] Jonathan Lazar, Alfreda Dudley-Sponaugle, and Kisha-Dawn Greenidge. 2004. Improving web accessibility: a study of webmaster perceptions. Computers in Human Behavior 20, 2 (2004), 269–288.
[41] Jaewook Lee, Yi-Hao Peng, Jaylin Herskovitz, and Anhong Guo. 2021. Image Explorer: Multi-layered touch exploration to make images accessible. In Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility. 1–4.
[42] Amanda Li, Jason Wu, and Jeffrey P Bigham. 2023. Using LLMs to customize the UI of webpages. In Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–3.
[43] Jingyi Li, Son Kim, Joshua A Miele, Maneesh Agrawala, and Sean Follmer. 2019. Editing spatial layouts through tactile templates for people with visual impairments. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–11.
[44] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2024. Visual instruction tuning. Advances in Neural Information Processing Systems 36 (2024).
[45] Juan-Miguel López-Gil and Juanan Pereira. 2024. Turning manual web accessibility success criteria into automatic: an LLM-based approach. Universal Access in the Information Society (2024), 1–16.
[46] Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, et al. 2024. StarCoder 2 and The Stack v2: The next generation. arXiv preprint arXiv:2402.19173 (2024).
[47] Kelly Mack, Edward Cutrell, Bongshin Lee, and Meredith Ringel Morris. 2021. Designing tools for high-quality alt text authoring. In Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility. 1–14.
[48] Lilu Martin, Catherine Baker, Kristen Shinohara, and Yasmine N Elglaly. 2022. The Landscape of Accessibility Skill Set in the Software Industry Positions. In Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility. 1–4.
[49] Beatriz Martins and Carlos Duarte. 2024. Large-scale study of web accessibility metrics. Universal Access in the Information Society 23, 1 (2024), 411–434.
[50] Forough Mehralian, Titus Barik, Jeff Nichols, and Amanda Swearngin. 2024. Automated Code Fix Suggestions for Accessibility Issues in Mobile Apps. arXiv preprint arXiv:2408.03827 (2024).
[51] Microsoft. 2024. Accessibility Insights. https://accessibilityinsights.io Accessed: 2024-12-10.
[52] Peya Mowar, Yi-Hao Peng, Aaron Steinfeld, and Jeffrey P Bigham. 2024. Tab to Autocomplete: The Effects of AI Coding Assistants on Web Accessibility. In Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility. 1–6.
[53] Hussein Mozannar, Gagan Bansal, Adam Fourney, and Eric Horvitz. 2024. Reading between the lines: Modeling user behavior and costs in AI-assisted programming. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–16.
[54] BBC News. 2024. BBC Home - Breaking News, World News, US News, Sports ... https://www.bbc.com/
[55] Achraf Othman, Amira Dhouib, and Aljazi Nasser Al Jabor. 2023. Fostering websites accessibility: A case study on the use of the Large Language Models ChatGPT for automatic remediation. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments. 707–713.
[56] Luís P. Carvalho, Tiago Guerreiro, Shaun Lawson, and Kyle Montague. 2023. Towards real-time and large-scale web accessibility. In Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility. 1–9.
[57] Maulishree Pandey and Tao Dong. 2023. Blending Accessibility in UI Framework Documentation to Build Awareness. In Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility. 1–12.
[58] Rohan Patel, Pedro Breton, Catherine M. Baker, Yasmine N. El-Glaly, and Kristen Shinohara. 2020. Why Software is Not Accessible: Technology Professionals' Perspectives and Challenges. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI EA '20). Association for Computing Machinery, New York, NY, USA, 1–9. doi:10.1145/3334480.3383103
[59] Amy Pavel, Gabriel Reyes, and Jeffrey P Bigham. 2020. Rescribe: Authoring and automatically editing audio descriptions. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology. 747–759.
[60] Yi-Hao Peng, Jeffrey P Bigham, and Amy Pavel. 2021. Slidecho: Flexible non-visual exploration of presentation videos. In Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility. 1–12.
[61] Yi-Hao Peng, Peggy Chi, Anjuli Kannan, Meredith Ringel Morris, and Irfan Essa. 2023. Slide Gestalt: Automatic Structure Extraction in Slide Decks for Non-Visual Access. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–14.
[62] Yi-Hao Peng, Faria Huq, Yue Jiang, Jason Wu, Xin Yue Li, Jeffrey P Bigham, and Amy Pavel. 2025. DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation. In European Conference on Computer Vision. Springer, 466–485.
[63] Yi-Hao Peng, JiWoong Jang, Jeffrey P Bigham, and Amy Pavel. 2021. Say it all: Feedback for improving non-visual presentation accessibility. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–12.
[64] Yi-Hao Peng, Muh-Tarng Lin, Yi Chen, TzuChuan Chen, Pin Sung Ku, Paul Taele, Chin Guan Lim, and Mike Y Chen. 2019. PersonalTouch: Improving touchscreen usability by personalizing accessibility settings based on individual user's touchscreen interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–11.
[65] Yi-Hao Peng, Jason Wu, Jeffrey Bigham, and Amy Pavel. 2022. Diffscriber: Describing Visual Design Changes to Support Mixed-Ability Collaborative Presentation Authoring. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 1–13.
[66] Athira Pillai, Kristen Shinohara, and Garreth W Tigwell. 2022. Website builders still contribute to inaccessible web design. In Proceedings of the 24th International ACM SIGACCESS Conference on Computers and Accessibility. 1–4.
[67] Venkatesh Potluri, Tadashi Grindeland, Jon E Froehlich, and Jennifer Mankoff. 2019. AI-assisted UI design for blind and low-vision creators. In the ASSETS'19 Workshop: AI Fairness for People with Disabilities.
[68] Kubernetes Project. 2024. Kubernetes. https://kubernetes.io/
[69] Katharina Reinecke, David R. Flatla, and Christopher Brooks. 2016. Enabling Designers to Foresee Which Colors Users Cannot See. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI '16). Association for Computing Machinery, New York, NY, USA, 2693–2704. doi:10.1145/2858036.2858077
[70] Anne Spencer Ross, Xiaoyi Zhang, James Fogarty, and Jacob O Wobbrock. 2018. Examining image-based button labeling for accessibility in Android apps through large-scale analysis. In Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility. 119–130.
[71] Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, et al. 2023. Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950 (2023).
[72] Navid Salehnamadi, Abdulaziz Alshayban, Jun-Wei Lin, Iftekhar Ahmed, Stacy Branham, and Sam Malek. 2021. Latte: Use-Case and Assistive-Service Driven Automated Accessibility Testing Framework for Android. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI '21). Association for Computing Machinery, New York, NY, USA, Article 274, 11 pages. doi:10.1145/3411764.3445455
[73] Navid Salehnamadi, Ziyao He, and Sam Malek. 2023. Assistive-Technology Aided Manual Accessibility Testing in Mobile Apps, Powered by Record-and-Replay. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 73, 20 pages. doi:10.1145/3544548.3580679
[74] Navid Salehnamadi, Forough Mehralian, and Sam Malek. 2023. Groundhog: An Automated Accessibility Crawler for Mobile Apps. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (Rochester, MI, USA) (ASE '22). Association for Computing Machinery, New York, NY, USA, Article 50, 12 pages. doi:10.1145/3551349.3556905
[75] Daisuke Sato, Masatomo Kobayashi, Hironobu Takagi, and Chieko Asakawa. 2009. What's next? A visual editor for correcting reading order. In Human-Computer Interaction–INTERACT 2009: 12th IFIP TC 13 International Conference, Uppsala, Sweden, August 24-28, 2009, Proceedings, Part I 12. Springer, 364–377.
[76] Daisuke Sato, Masatomo Kobayashi, Hironobu Takagi, and Chieko Asakawa. 2010. Social accessibility: the challenge of improving web accessibility through collaboration. In Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A). 1–2.
[77] Brian Sierkowski. 2002. Achieving web accessibility. In Proceedings of the 30th annual ACM SIGUCCS conference on User services. 288–291.
[78] David Sloan, Andy Heath, Fraser Hamilton, Brian Kelly, Helen Petrie, and Lawrie Phipps. 2006. Contextual web accessibility-maximizing the benefit of accessibility guidelines. In Proceedings of the 2006 international cross-disciplinary workshop on Web accessibility (W4A): Building the mobile web: rediscovering accessibility? 121–131.
[79] Amanda Swearngin, Jason Wu, Xiaoyi Zhang, Esteban Gomez, Jen Coughenour, Rachel Stukenborg, Bhavya Garg, Greg Hughes, Adriana Hilliard, Jeffrey P Bigham, et al. 2024. Towards Automated Accessibility Report Generation for Mobile Apps. ACM Transactions on Computer-Human Interaction 31, 4 (2024), 1–44.
[80] Maryam Taeb, Amanda Swearngin, Eldon Schoop, Ruijia Cheng, Yue Jiang, and Jeffrey Nichols. 2024. AXNav: Replaying accessibility tests from natural language. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–16.
[81] Hironobu Takagi, Chieko Asakawa, Kentarou Fukuda, and Junji Maeda. 2003. Accessibility designer: visualizing usability for the blind. SIGACCESS Access. Comput. 77–78 (Sep 2003), 177–184. doi:10.1145/1029014.1028662
[82] Hironobu Takagi, Chieko Asakawa, Kentarou Fukuda, and Junji Maeda. 2003. Accessibility designer: visualizing usability for the blind. ACM SIGACCESS Accessibility and Computing 77-78 (2003), 177–184.
[83] Hironobu Takagi, Shinya Kawanaka, Masatomo Kobayashi, Daisuke Sato, and Chieko Asakawa. 2009. Collaborative web accessibility improvement: challenges and possibilities. In Proceedings of the 11th international ACM SIGACCESS conference on Computers and accessibility. 195–202.
[84] Lev Tankelevitch, Viktor Kewenig, Auste Simkute, Ava Elizabeth Scott, Advait Sarkar, Abigail Sellen, and Sean Rintel. 2024. The metacognitive demands and opportunities of generative AI. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–24.
[85] Markel Vigo, Myriam Arrue, Giorgio Brajnik, Raffaella Lomuscio, and Julio Abascal. 2007. Quantitative metrics for measuring web accessibility. In Proceedings of the 2007 international cross-disciplinary conference on Web accessibility (W4A). 99–107.
[86] Markel Vigo, Justin Brown, and Vivienne Conway. 2013. Benchmarking web accessibility evaluation tools: measuring the harm of sole reliance on automated tests. In Proceedings of the 10th international cross-disciplinary conference on web accessibility. 1–10.
[87] WebAIM. 2024. The WebAIM Million - The 2024 report on the accessibility of the top 1,000,000 home pages. https://webaim.org/projects/million/. Accessed: 2024-04-22.
[88] Jason Wu, Gabriel Reyes, Sam C White, Xiaoyi Zhang, and Jeffrey P Bigham. 2021. When can accessibility help? An exploration of accessibility feature recommendation on mobile devices. In Proceedings of the 18th international web for all conference. 1–12.
[89] Jason Wu, Siyan Wang, Siman Shen, Yi-Hao Peng, Jeffrey Nichols, and Jeffrey P Bigham. 2023. WebUI: A dataset for enhancing visual UI understanding with web semantics. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–14.
[90] Yeliz Yesilada, Giorgio Brajnik, Markel Vigo, and Simon Harper. 2012. Understanding web accessibility and its drivers. In Proceedings of the international cross-disciplinary conference on web accessibility. 1–9.
[91] Xiaoyi Zhang, Lilian De Greef, Amanda Swearngin, Samuel White, Kyle Murray, Lisa Yu, Qi Shan, Jeffrey Nichols, Jason Wu, Chris Fleizach, et al. 2021. Screen recognition: Creating accessibility metadata for mobile applications from pixels. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–15.
[92] Xiaoyi Zhang, Anne Spencer Ross, Anat Caspi, James Fogarty, and Jacob O Wobbrock. 2017. Interaction proxies for runtime repair and enhancement of mobile application accessibility. In Proceedings of the 2017 CHI conference on human factors in computing systems. 6024–6037.
[93] Xiaoyi Zhang, Anne Spencer Ross, and James Fogarty. 2018. Robust annotation of mobile application interfaces in methods for accessibility repair and enhancement. In Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology. 609–621.
[94] Yuxin Zhang, Sen Chen, Lingling Fan, Chunyang Chen, and Xiaohong Li. 2023. Automated and Context-Aware Repair of Color-Related Accessibility Issues for Android Apps. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (San Francisco, CA, USA) (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 1255–1267. doi:10.1145/3611643.3616329
[95] Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, et al. 2023. CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. arXiv preprint arXiv:2303.17568 (2023).
