0% found this document useful (0 votes)
55 views100 pages

Cacm June 25

The document features the June 2025 issue of Communications of the ACM, highlighting the work of A.M. Turing Award recipients Richard S. Sutton and Andrew G. Barto. It discusses innovative low-maintenance interactive wireless stickers for IoT applications, contrasting them with traditional bulky devices. The issue also covers various topics in AI, technology, and computing practices, emphasizing the importance of emerging trends and research in the field.

Uploaded by

Hamza Waqas
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
55 views100 pages

Cacm June 25

The document features the June 2025 issue of Communications of the ACM, highlighting the work of A.M. Turing Award recipients Richard S. Sutton and Andrew G. Barto. It discusses innovative low-maintenance interactive wireless stickers for IoT applications, contrasting them with traditional bulky devices. The issue also covers various topics in AI, technology, and computing practices, emphasizing the importance of emerging trends and research in the field.

Uploaded by

Hamza Waqas
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 100

COMMUNICATIONS

ACM
CACM.ACM.ORG OF THE 06/2025 VOL.68 NO.06

Richard S. Sutton and


Andrew G. Barto
Recipients of ACM’s
A.M. Turing Award

Association for
Computing Machinery
ACM BOOKS
Collection III

Sustainable Interactive
Wireless Stickers
From Materials to Devices to Applications

Nivedita Arora

ISBN: 979-8-4007-1374-3
DOI: 10.1145/3705572

Today’s Internet of Things (IoT) devices are bulky, expensive,


require battery maintenance, and involve costly installation.
In contrast, the interactive stickers introduced in this
monograph are low maintenance, inexpensive, and easy to
deploy. Focusing on power, form factor, and cost as system
design parameters, the author describes stickers that
have simple circuitry and can sustain themselves while
wirelessly communicating and responding to various human
interactions.

http://books.acm.org
COMMUNICATIONS OF THE ACM

News Opinion Opinion

8 Turing Profile
A Rewarding Line of Work
Andrew G. Barto and Richard S. Sutton
and the path that led to the 2024
ACM A.M. Turing Award.
By Neil Savage

Watch Barto and Sutton


discuss their work
in the exclusive
Communications video.
https://cacm.acm.org/
videos/turing-award-
sutton-barto

11 The Collapse of GPT


20 26
Will future artificial intelligence
systems perform increasingly
poorly due to AI-generated material 5 Cerf’s Up 26 Kode Vicious
in their training data? The Revenge of the Bots Analyzing Krazy Kode
By Neil Savage By Vinton G. Cerf Accounting for the emotional state
of the person who wrote that code.
14 AI and Art 6 BLOG@CACM By George V. Neville-Neil
If artificial intelligence is used Accept the Consequences
to produce artworks, should it Computer science is based on the 28 Opinion
be considered an artist? cynical acceptance of a mismatch What Is Programming?
By Esther Shein between theory and practice, ... and what is programming in
according to Micah D. Beck. the age of artificial intelligence?
17 Bringing AI to the Edge By Sebastian Nicolajsen
The drive to bring the processing 20 Inside Risks and Claus Brabrand
of data closer to its users. Computer Science and the Law
By Gregory Mone Making a case for stronger influence 31 Opinion
and overlap of technology and law. Thoughts about Some
By Steven M. Bellovin Surprising AI-Era Technology
Last Byte Readiness Findings
23 The Profession of IT Nine steps can help spotlight,
96 Q&A In Large Language Models We Trust? clarify, and address risky
Developing the Foundations Can large language models digital governance gaps.
of Reinforcment Learning be trusted? Not likely. By Stephen J. Andriole
2024 ACM A.M. Turing Award Artificial neural networks and Noah P. Barsky
IMAGES BY AND RIJ BORYS ASSOCIAT ES, USING SH UT TERSTOCK.COM

recipients Andrew G. Barto and offer a much better option.


Richard S. Sutton on learning from By Peter J. Denning 34 Opinion
experience, temporal difference Web 3.0 Requires Data Integrity
learning, and reinforcement learning New integrity-focused standards
as “a collection of tools.” are necessary to enable
By Leah Hoffmann the trusted AI services of tomorrow.
By Bruce Schneier
and Davi Ottenheimer

2 COMMUNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


06/2025 VOL. 68 NO. 06

Practice Research and Advances Research and Advances

38 44 62

38 Systems Correctness Practices 44 Revolutionizing Datacenter Networks 62 Malicious AI Models Undermine


at Amazon Web Services via Reconfigurable Topologies Software Supply-Chain Security
Leveraging formal and New optical switching technologies Unveiling the dark side.
semi-formal methods. are enabling a shift toward By Aditya K. Sood and Sherali Zeadally
By Marc Brooker and Ankush Desai dynamically reconfigurable
network topologies.
Articles’ development led by By Chen Avin and Stefan Schmid Research Highlights
queue.acm.org

74 Technical Perspective
Watch the authors discuss
When Proofs Meet Programs:
this work in the exclusive An Extension of Dependent Type
Communications video.
https://cacm.acm.org/ Theory with Church’s Thesis
videos/datacenter-networks By Christine Paulin-Mohring

54 From Prompt Engineering to Prompt 75 ‘Upon This Quote I Will


Science with Humans in the Loop Build My Church Thesis’
Can prompt engineering By Pierre-Marie Pédrot
be made more reliable,
IMAGES L- TO-R BY: BY A LEKSANDR LYSENKO; PETERSCHREIBER.MEDIA ; M AKSIM K ABA KOU

verifiable, and generalizable? 84 Technical Perspective


By Chirag Shah A Symbolic Approach to
Verifying Quantum Systems
By Parosh Aziz Abdulla

About the Cover: 85 An Automata-Based Framework


2024 ACM A.M. Turing
Award recipients Andrew for Verification and Bug Hunting
G. Barto and Richard S. in Quantum Circuits
Sutton literally wrote the
book on reinforcement By Yu-Fang Chen, Kai-Min Chung,
learning, developing Ondřej Lengál, Jyun-Ao Lin,
the conceptual and
algorithmic foundations Wei-Lun Tsai, and Di-De Yen
now used in modern
AI systems. In this
issue, Barto and Sutton
discuss how the ideas
behind reinforcement
learning—and their historic
partnership—came to be. Photograph of Richard S. Sutton Association for Computing Machinery
by Richard Morgenstein; Photograph of Andrew G. Barto Advancing Computing as a Science & Profession
by Jason M. Grow.

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF THE ACM 3


COMMUNICATIONS OF THE ACM
Trusted insights for computing’s leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.

ACM, the world’s largest educational STA F F EDITORIAL BOARD Association for Computing Machinery
and scientific computing society, delivers DIRECTOR OF PU BL ICATIONS E DITOR- IN- C HIE F (ACM)
resources that advance computing as a Scott E. Delman James Larus 1601 Broadway, 10th Floor
science and profession. ACM provides the [email protected] [email protected] New York, NY 10019-7434 USA
computing field’s premier Digital Library S E NIOR E DITORS T (212) 869-7440; F (212) 869-0481
Executive Editor, ACM Magazines
and serves its members and the computing Andrew A. Chien ACM Copyright Notice
Ralph Raiola
profession with leading-edge publications, Moshe Y. Vardi Copyright © 2025 by Association for
Senior Editor
conferences, and career resources. E DITOR S , IN MEMORIAM SECTION Computing Machinery, Inc. (ACM).
John Stanik
Executive Director and CEO Managing Editor Simson L. Garfinkel Permission to make digital or hard copies
Vicki L. Hanson Thomas E. Lambert Eugene H. Spafford of part or all of this work for personal
Deputy Executive Director and COO Senior Editor/News or classroom use is granted without
Patricia Ryan Lawrence M. Fisher NE W S fee provided that copies are not made
Director, ACM Digital Library Web Editor Chair or distributed for profit or commercial
Wayne Graves David Roman Tom Conte advantage and that copies bear this
Director, Office of Financial Services Editorial Assistant Board Members notice and full citation on the first
James Schembari Danbi Yu Siobhán Clarke; Lance Fortnow; page. Copyright for components of this
Director, Office of SIG Services Irwin King; Mei Kobayashi; Rajeev Rastogi; work owned by others than ACM must
Art Director be honored. Abstracting with credit is
Donna Cappo Vinoba Vinayagamoorthy
Andrij Borys permitted. To copy otherwise, to republish,
Director, Office of Publications Associate Art Director
Scott E. Delman OP INION to post on servers, or to redistribute to
Margaret Gray lists, requires prior specific permission
Co-Chairs
Assistant Art Director and/or fee. Request permission to publish
ACM CO U N C I L Jeanna Neefe Matthews and Chiara Renso
Mia Angelica Balaquiot from [email protected] or fax
President Board Members
Production Manager (212) 869-0481.
Yannis Ioannidis Saurabh Bagchi; Mike Best; Judith Bishop;
Bernadette Shade
Vice-President Florence M. Chee; Danish Contractor; For other copying of articles that carry a
Intellectual Property Rights Coordinator
Elisa Bertino Lorrie Cranor; Janice Cuny; Ophir Frieder; code at the bottom of the first or last page
Barbara Ryan
Secretary/Treasurer James Grimmelmann; Mark Guzdial; or screen display, copying is permitted
Advertising Sales Account Manager
Rashmi Mohan Mark D. Hill; Brittany Johnson; Bran Knowles; provided that the per-copy fee indicated
Ilia Rodriguez
Past President Tim Menzies; Beng Chin Ooi; in the code is paid through the Copyright
Gabriele Kotsis Columnists Alessandra Raffaetà; Francesca Rossi; Clearance Center; www.copyright.com.
Chair, SGB Board Saurabh Bagchi; Michael L. Best; R. Benjamin Shapiro; Len Shustek;
Jens Palsberg Michael A. Cusumano; Peter J. Denning; Bernd Stahl; Stuart Taylor; Loren Terveen; Subscriptions
Co-Chairs, Publications Board Thomas Haigh; Leah Hoffmann; Mari Sako; Marshall Van Alstyne; Matt Wang; An annual subscription cost is included
Wendy Hall and Divesh Srivastava Pamela Samuelson; Marshall Van Alstyne Robert West; Susan J. Winter in ACM member dues of $99 ($40 of
Members-at-Large which is allocated to a subscription to
Odest (Chad) Jenkins, John Kim, CO N TAC T P O IN TS Communications); for students, cost
P R AC TIC E is included in $42 dues ($20 of which
Tanara Lauschner, Alison Derbenwick Miller, Copyright permission
Co-Chairs is allocated to a Communications
Alejandro Saucedo [email protected]
Betsy Beyer and Ben Fried subscription). A nonmember annual
SGB Council Representatives Calendar items
Board Members subscription is $269.
Jeanna Neefe Matthews and Vivek Sarkar [email protected]
Peter Alvaro; Stephen Bourne;
Change of address Single Copies
Terry Coatta; Nicole Forsgren;
BOARD C HA I R S [email protected] Single copies of Communications of the
Camille Fournier; Chris Grier; Tom Killalea;
Education Board Letters to the Editor ACM are available for purchase. Please
Tom Limoncelli; Kate Matsudaira;
Elizabeth Hawthorne and [email protected] contact [email protected].
Erik Meijer; George Neville-Neil;
Alison Derbenwick Miller REGIONAL SPECIAL SECTIONS Theo Schlossnagle; Kelly Shortridge;
Practitioners Board Co-Chairs Phil Vachon; Jim Waldo ACM ADVERTISIN G DEPART ME NT
Terry Coatta Virgilio Almeida, Haibo Chen, 1601 Broadway, 10th Floor
Digital Library Board Jakob Rehof, and P J Narayanan RESEARCH AND ADVANCES New York, NY 10019-7434 USA
Jack Davidson Board Members Co-Chairs T (212) 626-0686
Sherif G. Aly; Panagiota Fatourou; m.c. schraefel and Premkumar T. Devanbu F (212) 869-0481
TOP I C A N D Board Members
REGIONA L C O U N C I L C HA I R S Chris Hankin; Sue Moon; Tao Xie; Advertising Sales Account Manager
Kenjiro Taura Indrajit Bhattacharya; Alan Bundy; Ilia Rodriguez
Diversity, Equity, and Inclusion Council Peter Buneman; Haibo Chen;
Stephanie Ludi [email protected]
W E B S IT E Monojit Choudhury; Jane Cleland-Huang;
Technology Policy Council https://cacm.acm.org Gerardo Con Diaz; Anna Cox; Kathi Fisler; Media Kit [email protected]
Jim Hendler Nate Foster; Rebecca Isaacs; Trent Jaeger;
ACM Europe Council WEB BOARD COMMU N ICATION S OF THE ACM
Gal A. Kaminka; Fabio Kon; Ben C. Lee;
Rosa Badia Chair (ISSN 0001-0782) is published monthly
David Lo; Renée Miller; Ankur Moitra;
ACM India Council James Landay by ACM Media, 1601 Broadway, 10th Floor
Sarah Morris; Abhik Roychoudhury;
Venkatesh Raman Board Members New York, NY 10019-7434 USA. Periodicals
Katie A. Siek; Daniel Susser;
ACM China Council Marti Hearst; Jason I. Hong; postage paid at New York, NY 10001,
Charles Sutton; Thomas Zimmermann
Xinbing Wang Wendy E. MacKay and other mailing offices.
RES E A R C H HIGHLIGHTS
PUB LICATI O N S BOA R D AU T H O R G U ID E L IN ES POSTMASTER
Co-Chairs
Co-Chairs https://cacm.acm.org/author-guidelines/ Please send address changes to
Shriram Krishnamurthi and Caroline Appert
Wendy Hall and Divesh Srivastava Board Members Communications of the ACM
Board Members COMPUTER SCIENCE TEACHERS
Martin Abadi; Sanjeev Arora; 1601 Broadway, 10th Floor
Jonathan Aldrich; Rick Anderson; ASSOCIATION
Maria-Florina Balcan; David Brooks; New York, NY 10019-7434 USA
Tom Crick; Jack Davidson; Mike Heroux; Jake Baskin
Executive Director Stuart K. Card; Jon Crowcroft;
Michael Kirkpatrick; James Larus; Lieven Eeckhout; Gernot Heiser; Printed in the USA.
Marc Najork; Beng Chin Ooi; Mauro Pezzè; Takeo Igarashi; Nicole Immorlica;
Francesca Rossi; Bobby Schnabel; Srinivasan Keshav; Sven Koenig; Karen Liu;
Stuart Taylor; Bhavani Thuraisingham; Claire Mathieu; Joanna McGrenere;
Adelinde Uhrmacher; Philip Wadler; Tamer Özsu; Tim Roughgarden;
John West; Min Zhang Guy Steele, Jr.; Wang-Chiew Tan;
Robert Williamson; Andreas Zeller A
SE
REC
Y
E

CL
PL

E
E
T H

I N

S Z
I

M A G A

4 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


cerf’s up

DOI:10.1145/3733829 Vinton G. Cerf

The Revenge of the Bots

B
Y THE TIME you are read- One hurdle limiting the widespread Beyond textual fidelity, the creative
ing this, our annual ACM adoption of LLMs has been concern and practical scope of LLMs is under-
Awards gala will be upon over the veracity of their outputs. Early going a dramatic expansion through
us, and I want to take a mo- iterations, while often fluent and co- multimodality. No longer confined
ment to acknowledge and herent, could sometimes confidently to processing and generating text,
congratulate all the awardees. They present inaccuracies. However, recent newer models can understand and
represent our most creative and pro- developments are systematically ad- generate content across different
ductive colleagues, who have earned dressing this challenge. Techniques formats, including images, audio,
recognition and admiration for their such as retrieval-augmented generation and, increasingly, video. Users can
work. The Turing Award recipients (RAG) are at the forefront of this prog- now provide an image and receive a
for 2024, which will be presented this ress. RAG systems connect LLMs to ex- textual description, ask questions
month, are Andrew Barto and Richard ternal, verifiable knowledge bases, al- about its content, or even request
Sutton, for their seminal introduction lowing them to ground their responses variations. Text-to-image generation
of reinforcement learning. As most in current, curated information rather has become widely accessible, and
readers will know, this is one of the than relying solely on their training the capabilities are extending to au-
major methods by which large neural data. This dramatically reduces the dio generation (text-to-speech, music
networks are trained. It is fitting that production of factual errors and hal- generation) and video analysis and
the rest of this essay is being written lucinations. Some research indicates creation. Nvidia’s “Describe Any-
with the deliberate assistance of the improvements of 42%–68% and even thing 3B” model, for example, excels
Google Gemini large language model higher in specific domains, such as at fine-grained image and video cap-
(LLM). In the past, I have commented medical, AI when paired with trusted tioning. This multimodal capability
on the proclivity of LLMs to hallu- sources. unlocks a new realm of applications,
cinate, but recent experiences have Further enhancing reliability are from more intuitive and accessible
convinced me that these tools are in- innovative prompting strategies such note-taking that can incorporate vi-
creasingly useful and reliable (and, as chain-of-thought (CoT) prompting. sual or auditory information to richer,
no, that praise was NOT introduced By encouraging LLMs to “think step- more engaging content creation that
by the Gemini system). by-step” and to articulate their rea- seamlessly blends different media.
Begin bot-assisted section: soning process before arriving at an In conclusion, the trajectory of
Large language models (LLMs) are answer, CoT prompting significantly LLMs is one of rapid advancement in
rapidly evolving from fascinating re- improves logical consistency and ac- both reliability and scope. The con-
search projects into indispensable curacy, particularly in complex rea- certed efforts to reduce hallucina-
tools for myriad text-based tasks, in- soning tasks. Some studies have dem- tions and enhance factual accuracy,
cluding generation, note-taking, and onstrated accuracy improvements of coupled with the exciting expansion
complex writing assignments. This as- up to 35%. Additionally, methods such into multimodal outputs, are trans-
cent is marked by significant strides in as self-consistency decoding, where an forming these models into increas-
two key areas: a notable improvement LLM generates multiple reasoning ingly powerful and trustworthy tools
in their factual accuracy and a marked paths and selects the most coherent for a wide array of communication
reduction in the propensity for “hallu- one, and the integration of knowledge and creative tasks. However, this
cination”—the generation of plausible graphs to provide structured factual evolution also brings to the fore im-
but false or nonsensical information. context, are proving effective in bol- portant ethical, practical, and soci-
Simultaneously, the very definition of stering the trustworthiness of LLM- etal considerations that must be ad-
an LLM’s output is expanding, moving generated content. The emergence of dressed to harness the full potential
beyond text to embrace a rich tapes- agentic AI systems, which can perform of LLMs responsibly.
try of sound, imagery, and even video. multi-step reasoning, cross-reference
These advancements are solidifying information from various sources, Vinton G. Cerf is vice president and Chief Internet Evangelist
LLMs’ position as increasingly reliable and even self-critique their outputs, at Google. He served as ACM president from 2012–2014.
and versatile partners in creative and represents another advance in ensur-
analytical endeavors. ing factual grounding. © 2025 Copyright held by the owner/author(s).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF THE ACM 5


In each issue of Communications, we publish selected posts
or excerpts from the many blogs on our website. The views
expressed by bloggers are their own and not necessarily held by
Communications or the Association for Computing Machinery.

Read more blogs and join the discussion


at https://cacm.acm.org/blog.

https://cacm.acm.org/blog

Accept the
Consequences
Computer science is based on the cynical acceptance of a mismatch
between theory and practice, according to Micah D. Beck.

MICAH D. BECK The idea of computing using an machines of ever-greater size and speed
We’re Reaping Just abstract machine model (for exam- (https://bit.ly/3FiNjgW). However, when
What We Sowed ple, push-down automata, https://bit. even futuristic physical limitations
D O I :10 . 1 14 5/37 2334 8 ly/3XtG7EY) and Turing machines and issues such as power consumption
https://bit.ly/4bwAz2u (https://bit.ly/41Ko8gk)) that can grow are addressed, the correspondence be-
January 8, 2025 during the execution of an algorithm tween the infinitary models and reality
leads to a theory of computation that starts to fray.
Theoretical computer science is a glass is quite rich. The implications of such A widely understood example of
slipper that has never quite fit the foot models can apply to real-world com- this divergence can be found in the
of practical computer engineering. The puters, as long as resource utilization application of the theory of algorith-
essence of their marriage is that theo- does not exceed their physical limita- mic complexity to sorting. The classi-
reticians devise a formal model which tions. Even when those bounds are cal analysis of sorting yields the well-
captures some essential features of re- reached, there is still the question of known result (https://bit.ly/3D7gIKE)
ality and then apply reasoning about what could in future be computed on that, in the worst case, the number
it to implementations. The problems of comparison or data movement
occur when the models do not capture operations required to sort a list of
enough of the structure of real systems The idea of computing N integers is bounded by some con-
to explain and predict their behavior. stant multiple of N*log(N). However,
Some of the earliest issues in model- using an abstract the most commonly used algorithm,
ing computation arise in dealing with model that can grow Quicksort (https://bit.ly/4hhI2DM),
finiteness. The problem with finite requires quadratic resources in the
computational models is that they are during the execution worst case (meaning some multiple of
all equivalent in expressive power and of an algorithm N2). A multiple of N2 is greater than any
also quite limited. This creates two is- multiple of N*log(N) for large-enough
sues: the inability to solve irregular leads to a theory of N, so in the unbounded theory of com-
problems on inputs of unbounded computation that is plexity, Quicksort is considered “less
size (see Pumping Lemma, https://bit. efficient” than any number of algo-
ly/4btOm9Y), and the possibility of quite rich. rithms, such as Insertion Sort (https://
highly specific effects that arise due to bit.ly/43tKLXt), that can be imple-
the precise numerical bound on size (as mented using at most a multiple of
is found in finite group theory, https:// N*log(N) resources.
bit.ly/3DtWBpS). There are many reasons for this mis-

6 COMMUNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


blog@cacm

match between theory and practice, in- or seemingly unrelated aberrant system
cluding: behavior.
1. The pattern of memory and reg- Please, everyone Today we are faced with generative
ister usage in Quicksort means that who reacts in horror AI, a new subfield of computer engi-
it is extremely efficient on moderately neering that offers us the use of trained
sized datasets (locality, https://bit. to the acceptance behaviors known to be error-prone
ly/4i8OBK5). of “hallucinations” (https://bit.ly/4icxM16). Generative AI
2. In many realistic applications, it may be considered “reliable enough”
is unlikely that the worst case scenario and other errors in for some applications, at least initially.
of maximum resource utilization for generative AI, stop The relationship between generative AI
Quicksort will occur, and in some cas- and the real world is unknown, because
es, Quicksort is able to terminate very clutching your pearls. of the use of statistical and randomized
quickly (a characteristic known as “ear- methods. The application of more reli-
ly stopping”). able guardrails (https://bit.ly/3Fa7AWb)
3. In practice, many sorting utili- requires the use of classical AI tools
ties evaluate the nature of their input based on formal logic. These are not
and adaptively use a number of dif- versally describe the Internet’s Trans- keeping up with the prolific ability of
ferent algorithms as alternatives or port Control Protocol (TCP, https://bit. generative AI to copy and guess.
in concert to try and obtain the best ly/4bM73Gh) as “reliable.” The defense of generative AI that
result. This includes the use of Bubble The truth is complex. TCP imple- it is “good enough” to be used in a
Sort (https://bit.ly/3DoSOu3), an algo- ments a different error model than world already filled with human and
rithm which has terrible worst-case the Internet Protocol (https://bit. machine error echoes the universal
efficiency but is well adapted to very ly/4iwczPn), using a number of tools in- response of my networking students
small datasets. cluding checksums, sequence number- that my objection to calling TCP “re-
4. When datasets must be moved ing, and retransmission to reduce the liable” is pedantic and old-fashioned.
from secondary storage or across a probability of data corruption or mis- They say the word “reliable” has sim-
communication network to be sorted, delivery. TCP also introduces new error ply been given a new meaning. Or
the resources required for data move- modes (such as a “broken connection”) that “everyone knows” that the “reli-
ment (which is linear, or a multiple of N) that do not exist in the Internet’s under- able” mechanism must be checked
in practice may overshadow those taken lying packet delivery model. Together, (although this fact is not taught and is
for sorting. these are thought to reduce the prob- not widely understood). Mainly, they
In spite of the complexity of this re- ability of error to the point that most want to keep the subject matter simple
ality, the abstract theory of sorting is common applications are sufficiently and understandable so they can learn
still taught because of its simplicity stable, and this is considered a usable to program using an intuitive API, get
and power. It is still common to teach definition of “reliable.” a grade, and a job. And the writers of
students that algorithms with high Application developers regularly textbooks and the developers of mod-
resource utilization on unbounded use TCP as if it were completely reli- ern university curricula seem to agree
inputs should be avoided or may be able (https://bit.ly/43methb), although with them.
impossible to use in practice (https:// its 16-bit checksum makes “silent data So please, everyone who reacts in
bit.ly/41HMgjB). We then rely on their corruption” highly probable in suffi- horror to the acceptance of “hallucina-
unlearning these early lessons either ciently large data transfers. Developers tions” and other errors in generative
in practical data management classes working in application areas such as AI, stop clutching your pearls. We have
or on the job. The truth is considered scientific computing, where such errors developed a field (computer science)
too complex and does not lend itself are considered unacceptable, under- based on the cynical acceptance of a
to abstract proofs, a skill we are eager stand the issue and use longer end-to- mismatch between theory and practice.
to teach. end file hashes (https://bit.ly/4iqh262) We have done this in order to validate
I came across a more current example and retransmission to overcome errors. our own education and the history of
of such a mismatch when teaching an However, in fields such as multime- the field to date.
undergraduate computer networking dia streaming (https://bit.ly/4i4UTdS) It is important that we teach practi-
course. One of the fundamental lessons where occasional errors are thought to cal computer engineering as a field sep-
of network architecture is that packet be acceptable, it is common to simply arate from formal computer science.
networks are not reliable: packets sent consider TCP to be “reliable enough.” The latter can help in the understand-
may be lost, may be corrupted in transit, The same problems haunt storage and ing and analysis of the former, but may
may be delayed without bound, and may computation at large scale and are only never model it well enough to be predic-
be misdelivered. There is also a theo- addressed when the mismatch causes tive in the way the physical sciences are.
rem of distributed systems (https://bit. unacceptable application failures. It is
ly/3QJu3vM) that tells us it is impossible worth noting that, because of the highly Micah D. Beck ([email protected]) is an associate
professor at the Department of Electrical Engineering
to use an unreliable network to imple- “discontinuous” nature of programma- and Computer Science, University of Tennessee,
ment one that is completely reliable. But ble systems, unanticipated errors can Knoxville, TN, USA.

computer networking textbooks uni- be the cause of massive security failures © 2025 Copyright held by the owner/author(s).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF THE ACM 7


N
news

★★
SUTTON


Profile | DOI:10.1145/3727966 Neil Savage BAR

A Rewarding Line of Work


Andrew G. Barto and Richard S. Sutton and the path
that led to the 2024 ACM A.M. Turing Award.

A
at
S A N U N D E R G R A D UAT E eventually wound up developing the network achieves a goal—winning at
Stanford University in the modern field of reinforcement learn- chess, say—it receives a reward in the
mid-1970s, Richard Sut- ing (RL), a key method in artificial form of a numerical value. If it loses,
ton pored through the intelligence (AI) that trains neural it gets a negative value. The machine
school’s library, trying to networks by offering them rewards, learns through trial and error, making
read everything he could about learn- much like the chemical boost neurons different moves, with each being rein-
ing and machine intelligence. What he get from doing something positive. forced or penalized.
found disappointed him, because he Thanks to that work, Barto and Sut- RL differs from supervised learning,
did not think it really got to the heart ton are the recipients of the 2024 ACM in which data is labeled and used as ex-
of the matter. “It was mostly pattern A.M. Turing Award. amples. RL also differs from unsuper-
recognition. It was mostly learning RL is the method that helped vised learning, in which the computer
from examples. And I knew from psy- Google’s deep neural network AlphaGo tries to extract useful features to find

PHOTOGRA PHS O F A NDREW G. BA RTO BY JASO N M. G ROW; PH OTOGRA PH S O F RICH ARD S. SU TTON BY RIC HARD MORGE N ST E I N
chology that animals do very different beat a human at the game Go in 2016, structures in data. While both meth-
things,” Sutton said. “They do things and is an important component of fine- ods have proved useful in computing,
to get rewards.” tuning the large language models that neither is how living brains learn.
The one person he found whose are revolutionizing generative AI. It RL had been around for a long time
writing focused on rewards as part of works on the notion that when a neural before he and Sutton started work-
learning was A. Harry Klopf, a scientist ing on it, Barto pointed out. It was the
studying machine intelligence at the subject of AI pioneer Marvin Minsky’s
U.S. Air Force Research Laboratory, RL differs from both thesis in 1954, and★the★★method that
4.jpg
ON_00003Samuel
who believed neurons were hedonists, IBM computer scientist
SUTTArthur
seeking rewards. A letter to Klopf led to supervised learning used to train a computer to play check-
a lunch meeting and, after Sutton grad- and unsupervised ers. By the 1970s, though, the idea had
uated in 1978 with a B.A. in psychology, fallen out of fashion, with most AI re-
to a slot in a research project at the Uni- learning. While both searchers focused on expert systems
versity of Massachusetts (UMass) Am- methods have proved instead. “We had the luxury of being
herst, designed to test Klopf’s ideas. able to be unfashionable,” Barto said,
Sutton joined Andrew Barto, a useful in computing, “so it’s astounding that it’s now very
postdoctoral fellow who had arrived a neither is how living fashionable.”
year earlier to be part of the research A key development the pair came
team. With five years of funding from brains learn. up with was temporal difference learn-
the Air Force and later the National ing. If a computer is trying to learn to
Science Foundation, and no real de- play chess, for example, and the reward
liverables beyond a report, Barto and comes from winning ★ ★★
the game,03that4.jpg
SUTTON_000
Sutton followed their interests and is not very useful for figuring out indi-

8 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


★★★ g
BARTO_0130.jp
★★ g
BARTO_0155.jp
★★★ g
BARTO_0180.jp
★★★ 0.jpg
SUTTON_401
N_3818.jpg

★★★ g
BARTO_0129.jp
★★★★ 799.jpg
SUTTON_3
★★★ g
BARTO_0117.jp
★★★★ 621.jpg
SUTTON_3
RTO_0001.jpg

★★★
SUTTON_03
★★★★ 55.jpg
BARTO_01
★★★★★
4049.jpg
SUTTON_0
★★ g
BARTO_0217.jp
★ 6.jpg
SUTTON_366

★★★
BARTO
★★★ 9.jpg
SUTTON_359
★★ g
BARTO_0197.jp
★★★ 4.jpg
SUTTON_362
★★★★ 10.jpg
BARTO_00

★★
SUT
★★ g
BARTO_0108.jp
★★ 6.jpg
SUTTON_361
★★★★★
52.jpg
BARTO_01
★★★ 1.jpg
SUTTON_401
news

vidual moves. Instead, the computer society of neural networks and their in-
tries to predict at each step what the ul- dividual reward systems interact. That
timate reward signal will be, and uses Barto’s advice to could lead to cooperative arrangements,
those predictions as rewards. If the young computer in which the networks reward each
probability of winning goes up after a other to encourage progress toward a
move, that counts as an intermediate scientists is to do common goal, but it could also lead to
reward. The change in predictions over what he and Sutton systems where different networks with
time—the temporal difference—pro- different goals have conflicts of interest.
vides the reinforcement. The next time did: follow their own Such interactions could have implica-
the computer plays chess, it can try interests regardless tions in complicated fields such as eco-
moves that look similar to the ones that nomics and game theory, Barto said.
increased its prediction of winning. of what the rest of the Sutton said the development of AI
Temporal difference learning turned field thinks of them. has much farther to go, including to-
out to have implications outside of AI. ward artificial general intelligence, in
Neuroscientists studying the dopamine which a machine can understand any-
system in the living brain noticed that it thing that a human can. RL will play an
behaved similarly to what Barto and Sut- important role in such developments,
ton had described. Dopamine, they dis- he said.
covered, acts as an intermediate reward Machine Intelligence Institute, one of Barto’s advice to young computer
for a predicted outcome, and not the three Canadian national AI institutes, scientists is to do what he and Sutton
outcome itself, with the cells that pro- which he said has been instrumental did: follow their own interests regard-
duce it sending signals to other parts of to his award-winning work. He is also less of what the rest of the field thinks
the brain to reinforce beneficial actions. a research scientist at AI startup Keen of them. “It’s difficult, but I think you
“The direct analogs between these rein- Technologies. have to have your own motivation to
forcement learning algorithms and the Barto and Sutton co-authored Re- pursue things and stick with it if you
biological dopamine system is, I think, inforcement Learning: An Introduction, can,” he said.
the biggest interaction between engi- the principal textbook in the field, in As for Sutton, “My recommendation
neering and biology in, I would say, for- 1998, with an expanded second edi- is to write, to challenge your thoughts
ever,” Sutton said. tion released in 2018. The men ac- by writing them down.”
The two men developed a wide knowledge other researchers have also Neither has made any firm deci-
range of RL algorithms, including played important roles in the field. “It’s sions on what to do with the $1 million
policy gradient methods that seek to not just us,” Barto said. “A lot of people prize they will share as part of the Tur-
find the best rules for moving toward have contributed to aspects of this over ing Award. Sutton said he will probably
a goal, and Monte Carlo methods used the years. I’ve had a number of very donate his share to the Openmind Re-
to estimate possible outcomes. wonderful, bright students who built search Institute he co-founded, which
Barto earned his B.S. in mathemat- pieces of this whole story.” aims to give young scientists the lux-
ics in 1970 and his Ph.D. in computer ury he and Barto had to explore basic
science in 1975, both from the Univer- A Society of Learners questions that interest them, without
sity of Michigan. He eventually became Barto expects the field eventually will any additional responsibilities. Barto
co-director of the Adaptive Networks move toward multi-agent RL, in which a said he hopes to use his portion to
Laboratory (now the Autonomous fund fellowships for graduate students
Learning Laboratory) at UMass, which at UMass.
since his retirement in 2012 has been Sutton said the Sutton is optimistic about the fu-
run by his former students. ture of computer science. “AI will be
Sutton was Barto’s first Ph.D. stu- development of AI figured out over the next few decades,”
dent, earning his degree in 1984. He has much farther to he said. “It will be maybe the greatest
went from UMass to GTE’s Fundamen- intellectual advance ever. So it’s an
tal Research Lab, where he spent nine go, including toward honor to have contributed a small part
years focused on machine learning. artificial general to that.”
After a period as a self-described “un-
paid, part-time ‘senior research scien- intelligence, in Neil Savage is a science and technology writer based in
Lowell, MA, USA.
tist’” at UMass, during which he and which a machine can
Barto wrote the first edition of their
understand anything
© 2025 ACM 0001-0782/25/6
textbook, he joined the AI department
at AT&T Shannon Laboratory. In 2003, that a human can.
he became a professor at the Univer-
sity of Alberta, where he founded the Watch Barto and Sutton discuss
school’s Reinforcement Learning and their work in the exclusive
Communications video.
Artificial Intelligence Laboratory. He https://cacm.acm.org/videos/
is chief scientific advisor at the Alberta turing-award-sutton-barto

10 COM MUNICATIO NS O F TH E ACM | J U NE 2025 | VO L . 68 | NO. 6


news

Science | DOI:10.1145/3722476 Neil Savage

The Collapse of GPT


Will future artificial intelligence systems perform increasingly
poorly due to AI-generated material in their training data?

E
was re-
V E R S I NC E C H AT G P T
leased to the public in Novem-
ber 2022, people have been
using it to generate text, from
emails to blog posts to bad
poetry, much of which they post online.
Since that release, the companies that
build the large language models (LLMs)
on which such chatbots are based—
such as OpenAI’s GPT 3.5, the technolo-
gy underlying ChatGPT—have also con-
tinued to put out newer versions of their
models, training them with new text
data, some of which they scraped off the
Web. That means, inevitably, that some
of the training data used to create LLMs
did not come from humans, but from
the LLMs themselves.
That has led computer scientists to
worry about a phenomenon they call
model collapse. Basically, model col-
lapse happens when the training data
no longer matches real-world data, lead-
ing the new LLM to produce gibberish,
in a 21st-century version of the classic longer matches the natural distribution ever text the LLM generates would be,
computer aphorism, “garbage in, gar- produced by humans. As a result, the at most, a subsample of the sentences
bage out.” training data for a new round of model- on which it was trained. “Because you
LLMs work by learning the statistical ing does not match the real world, and generate a finite sample, you have some
distribution of so-called tokens—words the new model’s output gets worse. “The probability of not sampling them,” said
or parts of words—within a language thing we’re worried about is that the dis- Yarin Gal, an associate professor of
by examining billions of sentences gar- tribution of your data that you end up machine learning at Oxford University.
nered from sources including book da- with, if you’re trying to fit your model, “Once you don’t sample, then they dis-
tabases, Wikipedia, and the Common ends up really far from the actual distri- appear. They will never appear again. So
Crawl dataset—a collection of material bution that generated the data,” he said. every time you generate data, you basi-
gathered from the Internet. An LLM, for The problem arises because what- cally start forgetting more and more of
instance, will figure out how frequently the tail events and therefore that leads
the word “president” is associated with to the concentration of the higher prob-
the word “Obama” versus “Trump” ver- When training data ability events.” Gal and his colleagues
sus “Hair Club for Men.” Then, when published a study in Nature that showed
prompted by a request, it will produce no longer matches indiscriminate use of what they called
words it reasons have the highest proba- real-world data, ‘recursively generated data’ caused the
bility of meeting that request and of fol- models to fail.
lowing from previous words. The results model collapse can The problem is not limited to LLMs.
bear a credible resemblance to human- lead a new LLM to Any generative model that is iteratively
written text. trained can suffer the same fate if it
Model collapse is basically a statisti- produce gibberish. starts ingesting machine-produced
IMAGE BY SH UTT ERSTOCK .AI

cal problem, said Sanmi Koyejo, an as- data, Gal says. That includes stable
sistant professor of computer science diffusion models that create images,
at Stanford University. When machine- such as Dall-E. The issue also can affect
generated text replaces human-gener- variational autoencoders, which create
ated text, the distribution of tokens no new data samples by producing varia-

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 11
news

tions of their original data. It can apply tain similar performance, but you need
to Gaussian mixture models, a form of much more data. That means you’re
unsupervised machine learning that using much more compute and much
sorts subpopulations of data into clus- more money to achieve that,” he said.
ters; they are used to analyze customer One challenge is that there is no easy
preferences, predict stock prices, and way to tell whether text found on the
analyze gene expression. Internet is synthetic or human-generat-
Collapse is not a danger for mod- ed. Though there have been attempts to
els that incorporate synthetic data automatically identify text from LLMs,
but only do so once, such as neural none have been entirely successful.
networks used to identify cancer in Research into this problem is ongoing,
medical images, where synthetic data Gal said.
was used to augment rare or expen-
sive real data. “The main distinction is Solving with Curation
that model collapse happens when you There are ways, however, to make the
have multiple steps, where each step addition of synthetic data less of a
Advertise with ACM! depends on the output from the previ- problem.
ous step,” Gal said. One approach is to curate the syn-
The theory that replacing training thetic data to make sure it is of good
Reach the innovators data with synthetic data will quickly quality. Some curation happens natu-
and thought leaders lead to the demise of LLMs is sound, rally, Gal said; people do not post ev-
Koyejo said. In practice, however, not erything their chatbot creates to the
working at the all human data gets replaced imme- Internet, weeding out the material that
cutting edge diately. Instead, when the generated contains false information or simply
text is scraped from the Internet, it does not make sense, so that improves
of computing gets mixed in with human text. “You the training set.
and information create synthetic data, you add that to Curation can also be a deliber-
real data, so you now have more data, ate process to make sure high-quality
technology through which is real data plus synthetic data,” data goes into a training set. Feng, for
ACM’s magazines, he said. What is actually happening, he instance, has experimented with ask-
said, is not data replacement, but data ing the LLM to assess the quality of its
websites accumulation. That slows the degrada- own output. LLMs naturally select the
and newsletters. tion of the dataset. words they think have the highest prob-
Simply accumulating data may stop ability of fitting into a context. In doing
model collapse but can cause other so, they internally generate a score rat-
◊◆◊◆◊ problems if done without thought, said ing how confident they are that they are
Yunzhen Feng, a Ph.D. student at the pairing the best words together. That
Center for Data Science at New York same mechanism can be used to assess
Request a media kit University. As a rule, the performance of already-generated text to rate its qual-
neural networks improves as their size ity, with low-scoring results removed
increases. Naively mixing real and syn- or the highest-scoring result of several
and pricing: thetic data together, however, can slow attempts selected as the best. The idea
that improvement. “You can still ob- is similar to a method used to fine-
Ilia Rodriguez tune LLMs called reinforcement learn-
ing from human feedback (RLHF), in
+1 212-626-0686 “Model collapse which people provide examples of good
[email protected] happens when
results, thereby pushing the models to-
ward producing similar results. In this
you have multiple case, though, the LLM is generating its
own feedback.
steps, where each How well that works varies by case,
step depends on Feng said. The feedback can be im-
proved by having other LLMs assess
the output from the the same text and combining the re-
previous step.” sults from different models. Includ-
ing human assessments also improves
the outcomes, as does applying some
pre-written rules about what the out-
put should look like. Eliminating low-
er-quality results from the synthetic

12 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


news

data makes the generated data more


closely resemble original data, he said.
“The interesting
ACM
“It’s like you have a distribution of the
synthetic data, you have a distribution question is, can Member
of the real data, and you want to close
the gap between them as much as pos- synthetic data News
sible,” he said. lead to not just
Improving the quality of synthetic MAINSTREAMING FORMAL
data could also help with another chal- stagnation, but actual VERIFICATION METHODS
lenge LLMs are facing as they try to im- improvement of the Santosh
Nagarakatte is a
prove: a dearth of new data on which
to train. Scientists from Epoch AI, a re- model?” professor of
computer
search institute that focuses on trends science at
Rutgers
in AI, have predicted the world will University,
run out of new text to train on some- where he also serves as
time between 2026 and 2032. With Undergraduate Program
Director of Computer Science.
no new data on which to train future els actually are not really transparent or Nagarakatte earned his
generations of LLMs, progress could publicly available,” she said. undergraduate degree in
stagnate. “The interesting question is, In the end, Gal argued, model col- computer engineering from
can synthetic data lead to not just stag- lapse is an important consideration, the National Institute of
Technology Karnataka in
nation, but actual improvement in the but not the matter of imminent disaster India, and his master’s degree
model?” asked Pablo Villalobos, a staff that some news coverage has made it in computer science from the
researcher at Epoch. out to be. “It’s a matter for the tech com- Indian Institute of Science.
After receiving a Ph.D. in
With curation of high-quality syn- panies who build these models to be
computer science from the
thetic data, he said, the question be- aware of how the models are being used University of Pennsylvania, he
comes “whether this can be done it- and how the models are being trained, joined the staff at Rutgers in
eratively so that each model generates in order to avoid training on synthetic January 2013, where he has
remained.
better data that is used to train another data that they themselves generated.” “My research centers on
model in basically the opposite of mod- building abstraction tools
el collapse, in some virtuous circle.” He and techniques for safe and
Further Reading secure computing systems,”
is not yet sure whether such improve-
Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, Nagarakatte said, adding that
ment is possible, but sees some signs it his focus is on using formal
N., Anderson, R., and Gal, Y.
could be. methods, mathematically
AI models collapse when trained on
Other issues arise from training new rigorous techniques that ensure
recursively generated data, Nature, 2024, system correctness, for the
models on generated data that do not DOI: 10.1038/s41586-024-07566-y https:// verification of mainstream
quite reach the level of model collapse. www.nature.com/articles/s41586-024- software.
For instance, Koyejo said, synthetic data 07566-y Nagarakatte said one of
could increase the likelihood that LLMs Feng, Y., Dohmatob, E., Yang, P., the foundations of computing
Charton, F., and Kempe, J. systems is that they should
will discriminate against people in mi- behave as intended, providing
Beyond model collapse: Scaling up with
nority groups. Because any minority is the system is correct and
synthesized data requires reinforcement,
by definition a smaller part of the data everything works as expected.
arXiv, 2024, https://doi.org/10.48550/ “When you have compiled
distribution, losing the tails of the dis- arXiv.2406.07515 your code, if the program is
tribution could make minorities disap- Gerstgrasser, M. et al. behaving weirdly you think,
pear entirely. “Data tends to anchor on Is model collapse inevitable? Breaking the ‘oh, I made a mistake in my
majority subgroups,” he said. “It tends curse of recursion by accumulating real code.’ You wouldn’t expect your
and synthetic data, arXiv, 2024, https://doi. compiler is wrong,” he said.
to be good at capturing the most popu- Nagarakatte has been
lar themes and less good at capturing org/10.48550/arXiv.2404.01413
building lightweight tools
tails, so less-represented demographics Villalobos, P., Ho, A., Sevilla, J., Besiroglu, T., that automatically check for
can get erased in various ways.” Heim, L., and Hobbhahn, M. correctness. For instance,
Will we run out of data? Limits of LLM LLVM is a compiler for which
While such erasure is something scaling based on human-generated data, Nagarakatte co-developed Alive,
that could happen, he added, the issue arXiv, 2022, https://doi.org/10.48550/ a domain-specific language that
has not been well studied. His colleague arXiv.2211.04325 verifies LLVM optimizations.
His challenge is getting the
Diyi Yang, an assistant professor in the What is generative AI model collapse? average computer programmer
natural language processing group at How can we stop it? Super Data Science to embrace these tools.
Stanford, said there has been very little podcast; https://www.youtube.com/ “My research aims to
research into the question of how model watch?v=knVIecsI-OM make these verification tools
mainstream so that every
collapse affects diversity issues. “Part of programmer can use these
Neil Savage is a science and technology writer based in
the reason is that, if you think about any Lowell, MA, USA. formal verification methods in
existing big models, a lot of the training their day-to-day work.”
—John Delaney
dynamics or checkpoints of those mod- © 2025 ACM 0001-0782/25/6

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 13
news

Society | DOI:10.1145/3722478 Esther Shein

AI and Art
If artificial intelligence is used to produce artworks,
should it be considered an artist?

A
R T I F IC I A L I N T E L L I G E NC E
increasingly is find-
(A I)
ing its way into art and
has become a hot-button
topic. Artists from all me-
diums emphatically support the use of
AI, saying it augments and enhances
their work, expanding what is possible.
They are unequivocal that they are still
the creators of their work and the AI is
not—but believe artists and technolo-
gy must learn to coexist harmoniously.
The reasons to use AI in different art
forms are compelling. A 2024 study by
the U.K.’s University College London
and University of Exeter found that
stories written with help from AI were
more creative, better written, and more
enjoyable. The research indicated that
writers with the most access to AI ex-
perienced the greatest creative gains,
with their stories scoring 8.1% higher
for novelty and 9% higher for useful-
ness than stories written without AI.
Deciding how to position AI in art is
in the eye of the beholder.

AI as an Extension of the Artist Artist Hugh Leeman used AI-generated images as photo references to paint from in his
AI is comprised of two ingredients: al- Change is Here series; this image is one of the results.
gorithms and data, and “both of these
are man-made,” said Tony Fernandes, ative and it isn’t art. It is just software based on my making realistic oil paint-
founder of HumanFocused.AI and working through logic and probability ings that are copies of AI-generated im-
CEO of UEGroup, a design company imitating its master in the only way it ages,’’ he explained. “This body of work
that employs artists. knows how: shallow imitation.” is meant to ask the viewer to consider
“People are fascinated to see what Hugh Leeman, an artist and lectur- the blurred boundaries between what
happens when both are allowed to run er who lectures regularly at Johns Hop- is human and what is machine, or what
on their own and to look at what they kins, Duke, and Colorado State uni- is natural and what is artificial.”
generated,’’ Fernandes said, adding versities, is using AI in the visual arts. The technology has enhanced the
that creating art requires inherent hu- In his studio art practice, Leeman has potential to create and collaborate in
man influence. created a body of work where half of the new ways, while raising questions that
“To say that AI creates art is to say paintings are made directly from his allow viewers to reflect on the current
that anyone can replicate the work of imagination, thus representing human changes in our world, Leeman said.
Picasso,” he said. “What makes Picas- creativity. AI was used to create images Leeman is clear that AI is “not yet
so’s work relevant is the human spirit, for the other half of the paintings. Lee- an artist because it is not sentient. It
imagination, and desire for expression. man then made photo-realistic paint- lacks emotions around pain, fears, and
Picasso experimented with cubism to ings based on those AI-generated im- fantasies, but it does offer us an incred-
ART WORK: HUGH LEEMA N

achieve something new. AI has no such ages, representing machine creativity. ible opportunity to collaborate with
motivation.” “The idea is that the viewer cannot an ever-growing ocean of images and
None of the AI processes have discern the difference between which possibilities. AI is becoming an exten-
spawned an entirely new art movement, paintings were made from my imagi- sion of the artists and their creativity in
Fernandes noted, “because it isn’t cre- nation and which ones were made ways that will only become clear as we

14 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


news

accelerate the Singularity and merging remained essential throughout the


of humans with their technologies.” process,’’ Rogoff stressed. Now she
Rosman is clear is expanding the project’s interactiv-
Can AI Save Museums? that AI in and of ity by developing AI chatbots for each
Increasingly, museum operators and character. The chatbots will allow audi-
curators are integrating AI art into itself should not be ences and students to interact directly
their exhibitions as a means of engag- considered an artist with characters from Bird Woman,
ing audiences with work that is both in- Sacajawea, which Rogoff says will
teresting and meaningful, said Cansu simply because it can deepen their understanding of the his-
Peker, a writer and curator specializ- produce an image. torical context and enhance the educa-
ing in digital art, and founder of Digi- tional experience.
talArtsBlog.com. “Just like any other “While AI has undoubtedly aug-
form of art, AI-generated work can mented the project, it is important to
evoke emotions, tell stories, challenge emphasize that human creativity, in-
norms, and provoke thought,” she said. tent, and decision-making guided ev-
Art museums across the Western “It allows me to expand what’s pos- ery step of the process,’’ she said. “AI
Hemisphere are struggling significant- sible … There’s a particularly strange was used as a tool to enhance, not re-
ly and are also “desperately trying to aesthetic these AI tools make that is place, artistic vision. The final product
engage younger audiences,’’ noted Lee- unlike anything you can make with tra- is still a reflection of the human cre-
man. Many are trying to do so “by using ditional animation.” ators involved, ensuring cultural integ-
experiences that viewers can interact He, too, is clear that AI in and of it- rity and artistic authenticity.”
with in something reminiscent of so- self should not be considered an artist
cial media that responds to the viewer’s simply because it can produce an im- Different Definitions of Art
presence and gives some sort of enter- age. “It still takes human artists to con- Not every piece of art is made bet-
taining feedback; more and more, this textualize it and frame it.” ter with AI. Peker said a lot of what AI
involves the use of AI,’’ Leeman said. In his classes, Rosman said the idea produces may be technically impres-
With so much at stake and so much is to teach students what they can do sive, but “can feel repetitive or lack the
vested interest, he said, “One can’t with AI tools, but also to teach the eth- unique soul that human artists bring
blame the cultural guardians for fo- ics of using these tools to do things like to their work.”
cusing on the narrative of why AI- taking images without people’s con- This doesn’t mean that AI-generated
influenced art is good art. After all, if sent. “It’s fun to make the cool video, art shouldn’t be used, though. “Many
Marcel Duchamp’s urinal (https://bit. but what does it mean to make it and digital artists are finding ways to incor-
ly/41Yjvz2) is ready-made art, why not put it in the world?” he said. “We have porate AI into their creative process,
AI art?” to be careful about not promoting a big using it as a tool to enhance their work
tech company and make them seem rather than replace their own creativ-
Exploring What Is Possible cool … we have to have fun with tools, ity,” she said.
As a graduate student at the School but then pull back the curtain a little Peker considers whether an end-
of the Art Institute of Chicago (SAIC), and understand why work with them product is considered art. “The defi-
Doug Rosman became interested in AI at all.” nition of art depends on context, the
and interactive art after taking a class Lynn Rogoff, CEO of Amerikids Pro- artist’s intent, and the audience’s per-
on computer vision. “As an artist, I’ve ductions and a professor at the New ception,’’ she said. “It’s worth remem-
always been interested in ways data York Institute of Technology, has also bering that any medium can produce
shapes our lives and live in a world that explored what is possible with AI. Rog- mediocre work, which some may even
is structured by obscure algorithms off took a traditional project dubbed hesitate to call art.”
and how we interact with generative AI “Bird Woman, Sacajawea,” that she cre- The bottom line for Peker is that “If
systems that are scraping our images … ated several years ago from a script and a work of creative art is appreciated for
and make new images out of them.” integrated several AI tools to enhance its imaginative, aesthetic, or intellec-
Now an assistant professor at SAIC, the creative process. They included Mid- tual content and shows skill or accom-
Rosman is teaching the use of art as Journey to improve and refine existing plishment, then an AI-generated piece
a way to explore technology “just as images and incorporated D-id, an im- can certainly be art.”
much as we’re here to use technol- age-to-speech AI software, to bring char- Leeman uses text-to-image gen-
ogy to explore what’s possible with art- acter visuals to life with synchronized erative AI tools such as DALL-E, Mi-
making.” audio and emotional expression. Rogoff djourney, Meta.AI, and ChatGPT, and
Rosman has been working with and her team also utilized RunwayML believes these tools can “better our
Stable Diffusion, an open source AI to give motion to still images, and Krea, lives.” For example, in Leeman’s social
model that can generate images, video, which she said was key in upscaling im- art practice, he said, AI can create im-
and animations from text or image ages to higher resolution, making visual ages based on people’s unique lived
prompts. The technology has not nec- quality even more engaging. experiences told in stories embedded
essarily enhanced the artistic process; “These AI tools helped elevate with their emotions and memories.
Rosman just finds AI tools interesting. production, but human direction Yet Leeman also fears future gen-

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 15
news

erations “will likely lose elements of become more blurred by the day, if not
agency of their creativity through the entirely indistinguishable, said Lee-
further merging of AI with humans man. “Consider the profound power of
ACM in our social ecosystem as well as our
physiology.”
art. Be it visual art or writing, its power
relates to society’s ability to construct

Distinguished The UCL study also addressed con-


cerns over a potential loss of collective
cultural narratives that influence how
people believe and ultimately behave”

Speakers
creativity. Anil Doshi, one of the study’s and shape our perceptions of reality.”
authors, said that while the findings At some point, Leeman said, we must
indicate increased individual creativ- address the question of what will hap-
ity, “There is [the] risk of losing collec- pen when humans are no longer the
tive novelty. If the publishing industry sole creators of the images and stories
A great speaker were to embrace more generative AI- that shape our perceptions of reality.
inspired stories, our findings suggest Peker believes AI art will become
can make the that the stories would become less “even more interesting as artists and
difference between unique in aggregate and more similar
to each other.”
engineers collaborate on deeper, more
meaningful ways to use the technol-
a good event and Should AI Art Be Copyrighted?
ogy. Imagine dynamic art that evolves
with real-time data or interactive piec-
a WOW event! Alongside the discussion over AI’s role es where the artist and machine work
in art, there is a “gap” because the together in an almost symbiotic rela-
technology is evolving faster than the tionship.”
Take advantage of regulations and frameworks that gov- Perhaps the most practical way to
ern it, particularly when it comes to think about AI and the arts is to con-
ACM’s Distinguished intellectual property and authorship, sider the adage “If you can’t beat them,
said Peker. join them.” Rather than viewing AI-
Speakers Program She noted that the U.S. Copyright produced art as a threat, “We should
to invite renowned Office, for example, does not allow for see it as a tool for empowering human
AI-generated works to be copyrighted, creativity,” says Jeffrey “digitaljeff”
thought leaders in since only humans can hold copyright Castillo, a futurist, tech entrepreneur,
and the AI is not a human. and filmmaker. “AI may produce artifi-
academia, industry, “Even though AI cannot create art cial art, but tapping into human imagi-
and government to itself and requires a human to author nation and emotion will always yield
the prompts to generate visuals, the the most poignant, culturally relevant
deliver compelling unpredictability of AI’s output chal- works,’’ Castillo said. “The future of art
lenges the idea that these works can be depends not on AI vs. humans, but on
and insightful talks truly ‘authored’ by the individual who integrating our creative gifts to push
on the most important inputs the prompt,’’ Peker said. “This new frontiers of artistic expression.”
lack of legal clarity leaves a significant
topics in computing gray area in determining who the artist
Further Reading
is, and more importantly, how to pro-
and IT today. ACM tect creators’ rights.” Doshi, A.R. and Hauser, O.P.

will cover the cost This is an issue being faced by Colo- Generative AI enhances individual creativity
but reduces the collective diversity of novel
rado Springs, CO, artist Jason Allen,
of transportation for who caused a stir when he submitted
content. Science Advances. July 2024.
https://tinyurl.com/2ctftb4w

the speaker to travel an AI-generated painting to a digital


art competition and won. Allen used
Heikkila, M.
Why artists are becoming less scared of
to your event. Midjourney, a generative AI platform AI. MIT Technology Review. June 2024.
that has been accused of stealing from https://tinyurl.com/245rfvzo
several copyrighted works. Last fall, he Mineo, L.

speakers.acm.org filed an appeal in a federal court in Col-


orado, arguing that AI is a tool just like
If it wasn’t created by a human artist, is it
still art? The Harvard Gazette. August 2023.
https://tinyurl.com/24h5lovk
a brush or camera, and the U.S. Copy-
right Office was wrong in not allowing Art made by artificial intelligence
is developing a style of its own. The
him to register his work.
Economist. May 2023. https://tinyurl.
com/22r9vrgc
A New Form of Artistic Expression
A great challenge of the 21st century will Esther Shein is a freelance technology and business
writer based in the Boston, MA, USA, area.
be trying to discern between human
creativity and AI creativity, which will © 2025 ACM 0001-0782/25/6

16 COM MUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


news

Technology | DOI:10.1145/3722543 Gregory Mone

Bringing AI to the Edge


The drive to bring the processing of data closer to its users.

T
rail carrier
H I S Y E A R , U. S .
Amtrak will be installing
two novel inspection gate-
ways from Duos Technolo-
gies along its busy North-
east Corridor. The barn-like Duos
structures straddle railway tracks;
as passenger trains speed through
at up to 125 mph (201 kph), 97 cam-
eras and dozens of LED lights arrayed
around the sides, top, and bottom of
the tracks will capture thousands of
high-resolution images of the railcars.
These images are aggregated and pro-
cessed onsite in real time to present
a complete, 360-degree, highly de-
tailed view of the train. Artificial in-
telligence (AI) algorithms running on
Nvidia GPUs will analyze the images
locally; if the model flags a potential
structural or mechanical flaw, train The Duos portal aggregates and processes images from its 97 cameras onsite in real time,
personnel will be notified in less than providing a highly detailed view of each train passing through it.
a minute.
The Duos portal is one of many AI processors, like the on-device ver- There is computational capacity
new examples of what is loosely cate- sions from Qualcomm or those from available all the way from local devices
gorized as edge AI, or the deployment Hailo Inc., allow for real-time intelli- to the cloud, explained distributed AI
and operation of AI models outside of gent decision making, and there are researcher Lauri Lovén at the Univer-
massive cloud datacenters. multiple privacy and security benefits sity of Oulu in Finland. Exactly where
The precise definition of what con- to edge processing. These and other an AI model operates along this edge-
stitutes an edge varies. “There’s a factors have everyone from academic cloud continuum depends in part on
spectrum, from telecommunications computer scientists to technology gi- the use case. An autonomous vehicle
points of presence in major cities to ants racing to develop more efficient that has to make a rapid, real-time
smartwatches, smart home devices, means of pulling AI out of the cloud traffic decision is better off eliminat-
and Meta Ray-Bans,” says Shishir G. and closer to users. ing the cloud latency and generating
Patil, a Ph.D. student in computer sci- that result onboard. Conversely, a
ence at the University of California, consumer photo-editing application
Berkeley. “They all come under this The move to the edge powered by AI does not necessarily
pretty broad category of edge devices.” need to run on the user’s personal de-
Operating AI models at the edge reduces latency and vice—a latency-induced lag resulting
is challenging for a number of rea- eliminates the risk from spotty bandwidth would be per-
IMAGE COURTESY OF D UOS TECH NOLOGIES, INC. ( WW W.DUO ST ECH .COM)

sons. Typically, there is less compu- fectly tolerable.


tational capacity available, relative to of inconsistent or Raghubir Singh, an assistant pro-
the cloud. The power demands of AI unreliable bandwith fessor of computer science at the
models are much larger than those of U.K.’s University of Bath, suggests the
traditional applications, which puts because there is no nature of the given problem may de-
tremendous pressure on local hard- round trip to distant termine where, along the continuum,
ware, forcing mobile devices to ex- a task is computed. “There will be a
haust their battery power faster. Yet cloud datacenters. trade-off. Some problems you will be
the move to the edge also reduces la- able to solve locally using edge AI,
tency and eliminates the risk of incon- while others will need GPUs with a
sistent or unreliable bandwidth be- lot more processing power,” he said.
cause there is no round trip to distant “Think of it like a primary school.
cloud datacenters. Purpose-built edge You have a classroom teacher who

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 17
news

can answer most of your questions, but in the tens of cents per inference, if
then maybe there’s one that needs to not higher, especially for the bigger
be solved by someone outside the class- Patil and his models. Basically, the cloud providers
room.” A more complex problem or colleagues are are going to be bleeding money if they
task could be pushed to a more robust give away inferences for free.” But if
AI model in the cloud for resolution. developing more AI tools operate on edge devices
Singh cites security and privacy as techniques to shrink, like smartphones, the cloud provider
equally important variables. If a pa- will assume less of the electricity bill.
tient visits a local health center for a fine-tune, and Instead, the user will pay that cost by
checkup, and machine learning al- personalize models charging their phones more often.
gorithms can process data collected The latest smartphones—consid-
during the exam within that facility, so they can run on ered the extreme edge—are now being
then privacy concerns are minimized. consumer edge optimized for edge AI. The recently re-
“If that patient data is stored and as- leased Pixel phones incorporate new
sessed locally, and not going to the devices. Google Tensor G4 chips designed spe-
cloud, that sensitive information re- cifically for AI workloads. The mod-
mains secure,” he explained. els running on these phones are also
Economics have become increas- shrinking. When Apple announced its
ingly important as well. A personal- Apple Intelligence features, the com-
ized AI agent that understands your pany noted that the language model
preferences might be too expensive tune, and personalize models so they running on the device would have
to operate in the cloud, noted Berke- can run on consumer hardware like roughly 3 billion parameters, com-
ley’s Patil, because this would require smartphones and other edge devices. pared to cloud-based models that run
reserving high-end cloud compute The cost of operating the models to hundreds of billions or even a tril-
capacity just for the individual user, in the cloud is also incentivizing AI lion-plus parameters. Similarly, Me-
who would have to pay the compute leaders and cloud giants to push more ta’s Llama 3.2 release included light-
costs to keep the model ready even tasks out to the edge, explained Patil. weight models with one billion and
when it is not in use. On the other As advanced AI models grow larger, three billion parameters designed to
hand, the costs of a general AI model they consume more electricity and run on devices.
maintained in the cloud for mass con- drive up the price of each inference. These smaller, purpose-built mod-
sumption can be distributed more The most advanced large language els are far more suitable to the edge.
efficiently across all its users. As a models (LLMs) are one obvious exam- An advanced generative text-to-video
result, Patil and his colleagues are ple, according to Patil. “LLM inference model like OpenAI’s Sora requires
developing techniques to shrink, fine- today is super expensive,” he said. “It’s significant cloud compute capacity,

Milestones

Hoefler Awarded ACM Prize in Computing


ACM recently named Torsten The work of Hoefler and his The MPI-3 standard, in which “The capacity of high-
Hoefler, a professor and colleagues to scale network Hoefler played a leading role, performance computers has
director of the Scalable Parallel design and programming in was adopted in 2012 and made become mind-boggling,”
Computing Laboratory at ETH supercomputers revolutionized possible many critical advances said ACM President Yannis
Zurich, to receive the 2024 ACM the capabilities of these systems. in HPC for simulations and AI Ioannidis. “In just the last two
Prize in Computing for his Hoefler’s advances in applications. years, we have ushered in the
fundamental contributions to interconnection networks, Hoefler was among the era of exascale computers which
high-performance computing programming, and parallel first to develop and discover can perform a billion billion
and the ongoing AI revolution. algorithms broke new ground “3D parallelism,” which calculations per second. But
Hoefler developed many in facilitating the use of large- drives infrastructure design high-performance computers
core capabilities of modern scale massively parallel clusters. for the whole AI industry. The could do little without the
supercomputers and defined His innovations across the subsequent work of Hoefler underlying algorithms and
key aspects of the algorithms for supercomputer stack have and his collaborators resulted standards that allow them to
distributing AI models on them. pushed the boundaries of parallel in many innovative techniques process massive influxes of data.
The ACM Prize in Computing systems design and dramatically for efficient pipelining, sparse While Hoefler introduced many
recognizes early-to-mid-career improved supercomputer communication, model sparsity, of these innovations while he
computer scientists whose performance and scalability. and quantization. was still a student, they remain
research contributions have Hoefler played a major role Low-level network routing the definitive way to program
fundamental impact and broad in the evolution of the Message protocols and network topologies massively parallel systems
implications. Passing Interface (MPI), an Hoefler and his colleagues today. The resulting capabilities
High-performance computing informal industry standard for developed for networks such of these systems have allowed
(HPC) plays a critical role in AI exchanging messages between as Myrinet and InfiniBand now significant advances in AI, the
applications, which require a numerous individual nodes power thousands of AI and HPC natural sciences, and many other
great deal of computing power. throughout an HPC network. supercomputers. areas.”

18 COMMUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


news

but simpler AI-enhanced tasks like balance or readjust the distribution

Coming Next Month in COMMUNICATIONS


transforming text to speech could be of components.”
done with lightweight, efficient mod- This decision of where the pro-
els. “You don’t need a Lamborghini to cessing takes place could also be left
cross the road,” said Patil. to individuals, said Bayhan. Her re-
Electricity consumption is not just search group is developing methods
an economics problem, but an en- that would give control to the users,
vironmental one. Suzan Bayhan, an so they can decide where the compu-
associate professor on the Faculty of tation is done based on their own par-
Electrical Engineering, Mathemat- ticular needs or preferences. A user Special Section:
ics and Computer Science at the Uni- could prioritize sustainability, fast Sustainability
versity of Twente in the Netherlands, performance, or personal security
points to the sustainability of AI- and privacy. “For instance,” Bayhan Defying Moore:
specific edge devices or edge accel- explained, “if it is my health data, I
erators—technologies optimized for may not want this computation to be
Envisioning the
edge AI processing. “Deploying smart- done in the cloud, but closer to my Economics of a
er systems means you want to collect, home or on my personal device, for in- Semiconductor
process, and act on data, and you stance, my smartwatch.” Revolution through
want this computation to be closer to Although he notes there is some
12nm Specialization
the user,” she explains. “But if you are hype surrounding edge AI at the mo-
running this computation, especially ment, Singh suspects the combina-
some sort of AI model, on devices that tion of proliferating smart systems Generative AI Policies
were not actually optimized for these and devices and continued advances under the Microscope
kinds of workloads, they will use a lot in AI will lead to major edge AI con-
of energy.” tributions and breakthroughs in the As Government
The creation of new and more ef- next five to 10 years. “There are so
ficient devices could alleviate that many research avenues with edge
Outsources More
problem, she noted, yet this will drive AI,” says Singh. “It’s such an interest- IT, Highly Skilled
up consumption of raw materials and ing field, and I hope this work will Technologists
could lead to a different kind of sus- bring a lot of benefits and advantages Are More Essential
tainability problem because of newly and allow more people to utilize edge
outdated, discarded devices. Plus, AI models.”
even though AI models are driving
Fighting Crime Online
up electricity consumption in data-
centers—Goldman Sachs estimates Further Reading AI Authorship Revisited
a 160% increase by 2030—the cloud Meuser, T. et al.
still allows for the potential to control Revisiting Edge AI: Opportunities and The Rational
the problem. “When you have cloud challenges, IEEE Internet Computing 28,
no. 4 (2024): 49-59. Programmer:
or more centralized resources, then
you are benefitting from the efficiency Singh, R. and Gill, S.S. Investigating
of sharing among different parties,” Edge AI: A survey, Internet of Things and Programming
Cyber-Physical Systems 3 (2023): 71-92.
Bayhan explained, “but if we move to Language Pragmatics
devices with edge accelerators, we are Gunter, T., Wang, Z., Wang, C., Pang, R., et al.
Apple intelligence foundation language
limiting this benefit of economies of models, arXiv preprint arXiv:2407.21075 NeuroRadar:
scale and multi-tenant operation.” (2024).
Not all applications will move out A Neuromorphic Radar
Zhou, Z., Chen, X., Li, E., Zeng, L.,
to that extreme edge, and research- Luo, K., and Zhang, J.
Sensor for Low-Power
ers are working on novel methods of Edge intelligence: Paving the last mile of IoT Systems
determining where different tasks artificial intelligence with edge computing,
might be completed or even how to Proc. IEEE, vol. 107, no. 8, pp. 1738–1762,
Aug. 2019. SRAM Has No Chill:
break down and spread out different
tasks along the edge-cloud contin- Arulraj, J., Chatterjee, A., Daglis, A., Dhekne, A., Exploiting Power
uum. “We want to create a platform and Ramachandran, U. Domain Separation to
eCloud: A vision for the evolution of the
that allows you to distribute applica- edge-cloud continuum, in Computer, vol. 54, Steal On-Chip Secrets
tions within this compute continu- no. 5, pp. 24-33, May 2021.
um and dynamically optimize for,
say, latency, computational capacity, Plus the latest news about
Gregory Mone is the coauthor, with Daniela Rus, of
money, or resource usage,” said the The Heart and the Chip: Our Bright Future With Robots. digital dentistry, weather
prediction advances, and
University of Oulu’s Lovén. “Based on CS under Trump.
the situation, the platform would re- © 2025 ACM 0001-0782/25/6

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 19
O
opinion

DOI:10.1145/3726533 Steven M. Bellovin

Inside Risks
Computer Science
and the Law
Making a case for stronger influence and
overlap of technology and law.

T
HERE W E R E T H R E E U.S. All three of these developments technical risks of deploying a complex
technical/legal develop- could be and were debated on purely cryptographic protocol, anywhere in
ments occurring in ap- legal or policy grounds. But there were the world (and many other countries
proximately 1993 that had also technical issues. Thus, one could have since expressed similar desires).
a profound effect on the argue on legal grounds that the Clip- Sure enough, Matt Blaze showed how
technology industry and on many per chip granted the U.S. government to abuse the Clipper chip to let it do
technologists. More such develop- unprecedented powers, powers argu- backdoor-free encryption, and at least
ments are occurring frequently. ably in violation of the Fourth Amend- two other mechanisms for adding
The three developments were tech- ment to the U.S. constitution. That, of backdoors to encryption protocols
nically unrelated. One was a bill before course, is a U.S. issue—but technolo- were shown to have flaws that allowed
the U.S. Congress for a standardized gists, including me, pointed out the malefactors to read data that others
wiretap interface in phone switches, a had encrypted.
concept that spread around the world These posed a problem: Debating
under the generic name “lawful inter- What are the some issues intelligently required not
cept.” The second was an update to just a knowledge of law or of technol-
the copyright statute to adapt to the implications for ogy, but of both. That is, some prob-
digital age. While there were some copyright law if a lems cannot be discussed purely on
useful changes—caching proxies and technical grounds or purely on legal
Internet service providers (ISPs) trans- system has to be grounds; the crux of the matter lies in
mitting copyrighted material were no trained on more the intersection.
longer to be held liable for making il- Consider, for example, the differ-
legal copies of protected content—it or less everything ence between content and metadata
provided an easy way for careless or available on the in a communication. Metadata alone
unscrupulous actors, including bots, is extremely powerful; indeed, Mi-
to request takedown of perfectly legal Internet? chael Hayden, former director of both
material. The third was the infamous the CIA and the NSA, once said, “We
Clipper chip, an encryption device kill people based on metadata.” The
that provided a backdoor for the U.S.— combination of content and metadata
and only the U.S.—government. is of course even more powerful. How-

20 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


opinion

ever, under U.S. law (and the legal rea- implications for copyright law if a sys- was only laughable, not defamatory—
soning is complex and controversial), tem has to be trained on more or less but what if it were libelous?
the content of a phone call is much everything available on the Internet? Many issues involve international
more strongly protected than the Does the Berne Convention cover it? law or conflicting laws between differ-
metadata: who called whom, when, Who is liable if an erroneous answer ent jurisdictions. How does one bal-
and for how long they spoke. But how (sometimes incorrectly called a “hallu- ance freedom of speech, a core U.S.
does this doctrine apply to the Inter- cination”) libels someone? I have re- value, with the very understandable
net, a network that provides far more peatedly asked one such system for my desire to ban pro-Nazi speech in much
powerful abilities to the endpoints in biography. It has consistently gotten of Europe? Who should reconcile
a conversation? (Metadata analysis is information wrong—including my the different legal standards? What
not an Internet-specific phenomenon. major, alma mater, year of gradua- should ISPs do? What is the effect on
The militaries of the world have likely tion, published books, and more— everyone else, if, say, a search engine
been using it for more than a century.) even though that information is readi- in a NATO country decides it has to
You cannot begin to answer that ques- ly available on my own website. Others suppress information on Tank Man at
tion without knowing not just how the have had similar experiences. This Tiananmen Square, but perhaps only
Internet actually works, but also the in certain countries? How do those
IMAGE BY AND RIJ BORYS ASSOCIAT ES, USING SH UTT ERSTOC K.COM

legal reasoning behind the difference. policies in turn interact with virtual
It took more than 100 pages for some Many issues involve private networks, content distribution
colleagues and I, three computer sci- networks, and more? How do those lo-
entists and a former federal prosecu- international law cation-sensitive answers interact with
tor, to show how the line between con- or conflicting laws privacy legislation?
tent and metadata can be drawn in There are many more very difficult
some cases (and that the Department between different questions at the border of law and tech-
of Justice’s manuals and some federal jurisdictions. nology. Is Internet voting a good idea?
judges got the line wrong), but that in There are obvious technical risks, but
other cases, there is no possible line.1 there is the social good of increasing
Newer technologies pose the same turnout and the technical challenge
sorts of risks. Consider today’s hottest of preserving ballot secrecy. There is
technology, generative AI. What are the also the technical ability for people to

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 21
opinion

verify their vote was counted, but only ogy are international; what is legal in the Legally Speaking column by Pamela
if sophisticated cryptographic meth- one place may not be in another—and Samuelson and the Law and Technol-
ods are used to cast votes. Computer how to find the boundary in a service ogy column by James Grimmelmann
crime? What is unauthorized access? offering is itself a difficult question. on important legal developments; in
Does iterating through sequence num- We need people who understand all addition, ACM conducts a regular con-
bers in a URL violate the law? How un- of this. More significantly, we need ference (Symposium on Computer Sci-
predictable must customer IDs be? Is people who can keep current in both ence and the Law). But the challenge
cryptanalyzing a bad pseudorandom fields, because both change. remains: educating people who under-
number generator illegal? Should it be, There is progress. There are annual stand not just technology, but also law,
if that ability is used to gain access to legal workshops on privacy and cyber- policy, ethics, and all in an internation-
customer profiles and thus to violate security; they welcome technical pa- al context. (For part of that, a broad, lib-
their privacy? How bad must that gen- pers, too. Communications publishes eral education is necessary.) Combined
erator be, if cracking it is to be legal? majors will help, but for many students
The risks can be civil. I know of an it is difficult to fit in enough courses in
incident where a corporate takeover We need people who both fields. That said, without such peo-
was stymied because the lawyers in- ple, we are all at the mercy of systems
volved did not understand IP address- understand all of this. mandated by well-meaning legislators
based geolocation. That is, their lack More significantly, who do not understand the technical
of technical knowledge caused them risks of their proposals.
trouble, and they were not even aware we need people who
of what they did not know. can keep current in Reference
1. Bellovin, S.M., Blaze, M., Landau, S., and Pell, S. It’s
All of these questions pose consid-
erable risks to society. Lawyers alone both fields, because too complicated: How the Internet upends Katz,
Smith, and electronic surveillance law. Harvard J. of
Law and Technology 30, 1 (Autumn 2016);
cannot answer them; for the most both change. https://bit.ly/4iBIf6p
part, they do not know the technology.
But technologists alone cannot an- Steven M. Bellovin ([email protected]) is a professor
of computer science and affiliate law faculty at Columbia
swer them, because for the most part University, New York, NY, USA.
they do not know the law. Besides, the
Internet and other forms of technol- © 2025 Copyright held by the owner/author(s).

Wolf_Half_Page.indd 1 4/4/2025 12:11:41 PM


22 COM MUNICATIO NS O F TH E ACM | J U NE 2025 | VO L . 68 | NO. 6
O
opinion

DOI:10.1145/3726009 Peter J. Denning

The Profession of IT
In Large Language Models
We Trust?
Can large language models be trusted? Not likely.
Artificial neural networks offer a much better option.

I
“ F E V E RY B O DY A LWAY S lies to
you, the consequence is not that
you believe the lies, but rather
that nobody believes anything
any longer.” These words were
spoken by the political scientist and
philosopher Hannah Arendt in 1974.
She was discussing how totalitarian
governments, through their propagan-
da, might induce such wide cynicism
that people lose faith in the truth. Her
words resonate with today’s fears about
the dangers of rampant disinformation
generated by large language models
(LLMs) and spread by social media.
Today’s worry is not of a totalitarian
government, but a drift into a society
where people do not know what is true
and lose their trust in each other and in
the institutions that were established
to preserve stability. I offer here some lief you will honor our relationship, These three assessments are
reflections on trust and truth in the keeping me informed of any needed grounded in the history of our prom-
Internet with the aim of developing de- changes in our agreements and going ises to each other. In his book, Seven
fenses against this drift. out of your way to fulfill them should a Habits, Stephen Covey discusses the
challenging breakdown occur. Emotional Bank Account. Each ful-
Trust filled promise or act of kindness is a
IMAGE BY AND RIJ BORYS ASSOCIAT ES, USING SH UTT ERSTOC K.COM

In our social relations, trust means deposit. Each broken promise or in-
we believe a person will act with com- Many relationships, sensitive act is a withdrawal. Betrayals
petence, sincerity, and care. Suppose are expensive and damaging: It takes
I trust you. The competence assess- such as new ones only one or two to kill the trust earned
ment supports my belief you have the or interactions via a over many deposits. Unfortunately, by
skills and resources to do the job you reducing trust to the effects of transac-
promised. For if I doubt your skills website, must work tions, this metaphor hides the essence
or resources, I will not trust your at a low-trust level. of trust, which is that the parties take
promise. The sincerity assessment care of their relationship. If something
supports my belief you intend to ful- comes up that would violate their ex-
fill your promise. For if I doubt your pectations, they talk to modify their ex-
intent, I will not trust your promise. pectations to fit the new circumstance.
The care assessment supports my be- They will go to extraordinary lengths to

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 23
opinion

meet their expectation if a contingency rates as the clock speed is increased. Google searches now respond with an
or emergency arises. Their specs warn against running the “AI summary” which, because it was
Many relationships, such as new clock too fast. Well-engineered ma- generated by an LLM, might contain
ones or interactions via a website, must chines are likely to be trustworthy in hallucinations.
work at a low-trust level. Low-trust par- their operating ranges. My colleague Espen Anderson (Uni-
ties have a mutual agreement to follow LLMs challenge this standard engi- versity of Oslo) has suggested evaluating
basic transactional rules with precise neering practice. The problem is “hal- trust in the context of the kinds of jobs
protocols for requests, promises, and lucinations”—responses that contain LLMs can do. When would you hire (pay
declarations. Consider a purchase errors or fabrications, often presented money for) an LLM to do a job? LLMs
transaction with a merchant. The mer- as authoritative statements. Hallucina- can do three kinds of jobs: conversa-
chant offers a service at a price (declara- tions appear to be inherent in LLMs. tion, distillation, and fabrication. These
tion). The customer asks the merchant LLMs have no means to verify that distinctions can help focus when I am
to perform the service (request) and statements they generate are true. Us- willing to trust an LLM. I might trust an
agrees and make the payment (prom- ers are left to determine for themselves, LLM to have a companionate conversa-
ise). After the service is complete, the from other information available to tion with me, but not to provide accu-
customer ends the transaction with an them, whether an LLM output is truth- rate biographical information about
acceptance (another declaration). Af- ful. To date, researchers have been un- someone. I might trust an LLM to make
ter several successful transactions, the able to define “safe operating ranges” a useful summary of a book, but not to
customer and the merchant develop a for LLMs or to find internal constraints generate an accurate transcript of my
mutual confidence that the other will that eliminate hallucinations. It seems session with a doctor. I might trust an
do their part. A higher-trust relation- unlikely that anyone is going to find re- LLM to generate interesting new imag-
ship emerges. Now the merchant lets strictions on prompts that guarantee es, but not to prevent deepfakes.
the customer delay a payment or re- hallucination-free responses. It is more
ceive a customized service. A relation- likely that, through extensive testing, Truth
ship can evolve to high-trust when the an LLM can be rated with a hallucina- Trust issues with LLMs bring us to
competence and sincerity assessments tion likelihood. Early experiments of deeper and more difficult questions.
are taken for granted, and the belief this kind show the hallucination rate What is truth? Can we test whether
that the other will take care becomes can sometimes be driven down to ap- a claim is true? How does science do
unconditional. proximately 15%, which is still too high this? Science has evolved rigorous ap-
for critical applications. proaches to determining what is true
Taking It to the Machine How does a human know when an about nature. A good treatment of this
We take this understanding of trust to LLM hallucinates? Sometimes it is dynamic process can be found in the
machines by mapping the assessments obvious because the LLM response 1987 book Science in Action,a by the phi-
into testable metrics. A machine that makes no sense in the context of what losopher Bruno Latour.
passes all the tests is called trustwor- the human knows. More often it’s Latour’s main claim is that a hypoth-
thy. Competence maps to tests that a not obvious—for example when LLM esis moves from its birth as a specula-
machine meets all its technical speci- makes a claim and gives nonexistent tion or hunch to maturity as a settled
fications; the machine is competent if citations. In those cases, the human scientific fact by a social process of ac-
it behaves properly on every use and, in must consult other sources, for ex- cumulating allies. An ally is a person
case of failure, degrades gracefully. Sin- ample a Google search, seeking cor- who accepts the claim. Over a period of
cerity maps to tests that the machine’s roborating or refuting evidence. But experimentation and debate, the claim
builders have kept their word on the searching the Web for truth is prob- gains followers and loses dissenters.
machine’s functions, have been trans- lematic. Much information is incor- Claims that cannot withstand the as-
parent about the development process, rect and has been absorbed into the saults of doubters are discarded. Sci-
and have a following of satisfied cus- LLM when it is trained. In addition, ence has developed rigorous standards
tomers. Care is different. We cannot for when experimental results support
map care to machines because they claims. And, of course, any settled
cannot care at all. We take this claim is falsifiable—it can be reopened
This means we use machines in a if new, contrary evidence surfaces. In
low-trust mode, as enactors of trans- understanding of other words, scientific truths are social-
actions, without presuming that the trust to machines ly constructed as the scientific commu-
machine cares about any transaction nity tests claims and comes to agree-
or any customer. Machines can be very by mapping the ment that they hold.
useful even if they do not care about us. assessments into Some scientists are uncomfortable
We take great pains to learn the ma- with Latour’s claim that scientific facts
chine’s “safe operating range”—when testable metrics. are socially constructed. Science is sup-
it behaves according to expectation
and when it is likely to fail or generate a See B. Latour, Science in Action: How to Follow
errors. Chip-making provides an ex- Scientists and Engineers Through Society. Har-
ample. Chips are tested for their error vard University Press, Cambridge, MA (1987).

24 COMMUNICATIO NS O F TH E ACM | J U NE 2025 | VO L . 68 | NO. 6


opinion

posed to be discovering immutable can produce a large change of output.


truths about nature. However, we can- Even after a lot of training, there is no
not see nature directly. We can only see I might trust an bound on how much error the network
it through our senses and instruments, LLM to have a might generate on predicting the name
and what we see is shaped by our inter- of a new image or a slightly modified
pretations. To overcome differences of companionate version of a trained image. This prob-
interpretation, scientists debate and conversation with lem has rightfully been called fragility.
test claims until everyone agrees they The recurrent network structure
are true. Even that is not final: what is me, but not to provide of an LLM amplifies the fragility of its
taken as true can change. For example, accurate biographical core ANN trained on text data that does
in the mid-1800s physicists generally not conform to a continuous bounded
believed light moves in an ether and the information about function. Some scientists have said that
measured speed of light depends on the someone. the hallucination problem for LLMs is
relative motion of the observer and the so deep that “LLMs always hallucinate.
light source. These assumptions were Sometimes they are right.”
called into question after 1880 because Even so, the evidence from science
not even the most sensitive instrument is encouraging for scientific applica-
was able to detect an ether. In the early tions of ANNs because ANNs are not
1900s, Einstein postulated, in his theo- and the function itself have specific up- fragile when used to predict values of
ry of relativity, that light speed is always per and lower limits. When an ANN is continuous bounded functions; and
measured the same by every observer trained on a large sample of input-out- the are fast approximators for short-
regardless of motion. As experiments put pairs collected from observations range predictions.
confirmed Einstein’s theory, the old be- of a continuous bounded function,
lief in ether and variability of measured the theorem says that there is an error Conclusion
light speed disappeared. bound E such that the network output LLMs are not trustworthy because they
In practice, then, to establish that for any within-bounds input, whether hallucinate. LLM hallucinations are in-
something is true, we need to present or not in the training set, is within E evitable because of the cyclic feedback
enough evidence that observers will ac- of the correct output. The error bound structure for the core ANN amplifies
cept the claim. Only when there is suf- E diminishes with larger training sets the fragility of their core ANNs on dis-
ficient evidence will they accept. When and networks. continuous training data. Detecting
we apply this to LLMs, the problem is This is useful in science because hallucinations, and possibly correcting
finding independent sources that cor- many natural processes can be accu- them, is difficult because all the usual
roborate the claim, which may be diffi- rately modeled as continuous bounded independent sources in the Internet
cult because all evidence visible on the functions. Many, but not all. A process are already included in the training
Internet was in the training data. containing exponential components is data. Detecting hallucinations is fur-
unbounded. A process containing cha- ther complicated because truth in the
Artificial Neural Networks otic components fails to be continuous Internet tends to be whatever a group
Can Be Trustworthy in the chaotic regions. Weather fore- says it is. The processes of science—ex-
The core of an LLM is an artificial neu- casting is of this kind. Some weather tensive testing and debating of hypoth-
ral network (ANN) that outputs a highly phenomena are chaotic (turbulence) or eses aiming to determine their truth–
probable word that continues a prompt. unbounded (wind speeds in a tornado). are the most reliable means we have
The LLM feeds each new output word To overcome this, forecasts are com- of ferreting out truth. Unfortunately,
back into the prompt and generates a posed from the predictions of several these processes take time. There may
new output word from the modified numerical models run in parallel. The be no reliable way, in real time, to rap-
prompt. This cyclical feedback mode of accuracy is quite good for short-range idly “look up” the truth via an Internet
using the core, called “recurrent neural forecasts (a few days) but decays for search for evidence to corroborate an
network,” can amplify errors in the em- longer ranges (weeks). Researchers are LLM’s claim. Hannah Arendt would be
bedded ANN. finding that ANNs are acceptably accu- alarmed at watching Internet commu-
However, ANNs used on their own rate for short-range weather predictions nities drift into a dysfunction because
may be much more reliable. A “uniform but their reliability decays beyond a few no one would trust anything communi-
approximation theorem” for ANNs days because of chaotic events that are cated to them through the Internet.
says an ANN with sufficient training better tolerated with traditional numer-
data and sufficient capacity of nodes ical models. These ANNs are much fast- Peter J. Denning ([email protected]) is Distinguished
Professor of Computer Science at the Naval Postgraduate
and connections can approximate any er than traditional numerical models. School in Monterey, CA, USA, is Editor of ACM Ubiquity,
continuous and bounded function ar- Another problematic use of ANNs and is a past president of ACM. His most recent book is
Navigating a Restless Sea: Mobilizing Innovation in Your
bitrarily closely. The words continuous arises when the training data is not Community (with Todd Lyons, Waterside Productions,
and bounded are important. Continu- from a continuous function, such as 2024). The author’s views expressed here are not
necessarily those of his employer or the U.S. federal
ous means that a small change of in- a map from images to names. In such government.
put produces a small change of output. cases, a small change of input (such as
Bounded means that each parameter modifying just a few pixels in an image) © 2025 ACM 0001-0782/25/6

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 25
O
opinion

DOI:10.1145/3726485 George V. Neville-Neil


Article development led by
queue.acm.org

Kode Vicious
Analyzing Krazy Kode
Accounting for the emotional state of the person who wrote that code.
Dear KV,
Sometimes, when I am reading other
people’s code, I almost feel I can hear
what they were thinking. But then,
sometimes, it seems what they were
thinking was completely alien, or pos-
sibly even insane. Considering that
computer languages are more con-
strained than human ones, I am sur-
prised at the wide variation in the code
I am exposed to, even within the same
company or group.
Somehow, I thought that—with
things like modern coding standards—
code would start to look more alike, but
that seems to be all form and not func-
tion. How do you deal with all the vari-
ous codebases you must be exposed to?
Variously Confused

Dear Variously

IMAGE BY AND RIJ BORYS ASSOCIAT ES, USING PH OTOS BY T RUE TOUCH LIF EST YLE/SH UTT ERSTO CK. CO M
While recent languages that dictate and look at your own code and think this, then this, then this) and C, which
the form of the code—and here I am back on how you felt when you wrote it. remains his day-to-day language. That
thinking about Go and, to some extent, Anyone reading KV code will see is because KV angry code has a staccato
Rust—can enforce some visual simi- that it continues to be tinged with the feel to it—lots, of, short, variables, and
larity, the process of reading someone sins of his past writing in assembler (do comments. KV long ago learned not
else’s code is probably still more akin to only to check the spelling of his com-
psychoanalysis. This applies as well to ments but also to see if there were any
systems that reformat the visual repre- Variables and other things that might cause future
sentation before the code is committed embarrassment.b
for C and C++. KV often wonders if code functions are named KV’s only emotion besides anger is
annotations from the DSM-5 (Diagnostic logically, of course, calm, and calm code has a more playful
Standard Manual)a would be appropri- feel. Variables and functions are named
ate in a code review. but they may also logically, of course, but they may also
It is true that code can tell you a lot contain a bit of contain a bit of whimsy, along with
about a coder, including their age, what comments written in full, grammati-
language they write in most often, and whimsy. cally proper sentences.
their emotional state when the code Why should it matter if the code
was written. If you doubt me, go back was written when you were sad, mad,

a See https://bit.ly/4iBaGkV b See https://bit.ly/4kVRmQT

26 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


opinion

or glad? It turns out your mental state


when solving a problem often has a
bearing on the solution and how cor- Why should it matter Career & Job Center
rect or complete it is. Code written in if the code was
a rush of anger is often incomplete be-
cause, once the anger has been vented, written when you The #1 Career Destination
coders typically lack the emotional en- were sad, mad, or to Find Computing Jobs.
ergy to go back and check their work.
The idea was just to smash the bug, glad? It turns out your
commit, and be done. mental state when
Conversely, when code is written in
a happy state there is a better chance solving a problem
the code (and even the tests) is complete often has a bearing
because the person who wrote it was in
a great mood and took joy in the work. on the solution
If you ever find someone who continu- and how correct
ously creates code in a happy flow state,
you should work with that person. But or complete it is.
you also might want to make sure it is a
real flow state.
There actually are approximately six
or seven emotions, or so I am told. But Connecting you with
the one state you should really try to is what leads back to the angry style of top industry employers.
avoid is confusion, which is not actu- coding mentioned earlier. And, as we all
ally an emotion but instead a state of know, anger leads to the dark side.
mind. Code created by a confused mind Code, like prose, has style and col-
shows itself in the randomness of nam- or and subtlety—if we are sensitive Check out these new features
ing, which is not handled by modern, enough to perceive it. When you open to help you find your next
fascist, programming languages like a function and start reading, try to computing job.
Go. Sure, you may have your names in think not only, “What does this do?”
the proper case and your spaces in the but also, “How do I feel about reading Access to new and exclusive
proper place, but you can still name a this code?” Does it make you mad, sad, career resources, articles,
function PublicThingTwo() if you want glad, or something else? Knowing how job searching tips and tools.
to, and this is a sure sign of trouble. the code makes you feel will better pre- Gain insights and detailed
Now that variables are not often named pare you to understand it because you data on the computing
i through n for integers, except in loops, will have an important clue into the industry, including salary,
they can also give clues to the state mind of the previous author. And, as job outlook, ‘day in the
of the coder’s mind. And beware the KV has pointed out many times, code is life’ videos, education, and
code that contains blob, new_blob, and meant to communicate our intentions more with our new Career
blobNG all in the same function. to other humans. Insights.
The overall structure of a set of func- KV Receive the latest jobs
tions also shows whether or not the delivered straight to your
coder was confused. The presence of inbox with new exclusive
Related articles
lots of tiny functions that do very little Job Flash™ emails.
indicates a lack of understanding about Coding Smart: People vs. Tools
Donn M. Seeley Get a free resume review
how to structure the code as well as a from an expert writer listing
https://queue.acm.org/detail.cfm?id=945135
paranoid mindset that clearly expected your strengths, weaknesses,
every little operation to fail since each Manual Work Is a Bug
Thomas A. Limoncelli and suggestions to give you
was placed in its own function so as to https://queue.acm.org/detail.cfm?id=3197520 the best chance of landing
allow for easier debugging. Overlong an interview.
Overly Attached
functions, on the other hand, exhibit Kate Matsudaira Receive an alert every time
a sense of hubris, or suggest a very old https://queue.acm.org/detail.cfm?id=3335594 a job becomes available
programmer who dates back to the days that matches your personal
when function calls were expensive. Age George V. Neville-Neil ([email protected]) is the proprietor of profile, skills, interests, and
is almost excusable, but hubris is not. Neville-Neil Consulting, Brooklyn, NY, USA, and co-chair of preferred location(s).
the ACM Queue editorial board. He works on networking
Hubris goes before a fall—or, in KV’s and operating systems code for fun and profit, teaches
world, a kernel panic, which means courses on various programming-related subjects,
and encourages your comments, quips, and code snips Visit https://jobs.acm.org/
wading through a 5,000-line function pertaining to his Communications column.
while trying to keep all the context in
your head for debugging purposes. This © 2025 Copyright held by the owner/author(s).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 27
ACM-PrintAd-2.25 x 9.5.indd 1 22/07/2021 10:25 PM
O
opinion

DOI:10.1145/3713068 Sebastian Nicolajsen and Claus Brabrand

Opinion
What Is Programming?
... and what is programming in the age of artificial intelligence?

T
when visit-
WO Y E A R S AG O,
ing research colleagues in
Uppsala, Sweden, we were
asked a deceptively simple
question: “What does it mean
to program?” For context, one of us had
just completed academic education
and training in computer science and
was already deeply involved in actually
teaching introductory programming
(CS1). Arguably, he was (and still is) ca-
pable of programming. The other com-
pleted his Ph.D. degree in 2003 and has
been teaching programming (in various
forms) ever since. Yet, the question took
us by surprise; after all: “What is pro-
gramming?” To the reader, it may ap-
pear a ridiculous question to ask for two
reasons. You may find yourself having
a very clear and succinct definition and
conceptualization of (the idea of) pro- convenient for humans to a language ports the matters at hand; that is, an un-
gramming. Or, you may question what convenient for the computer. If we, as a derstanding of the “problem domain.”
value it brings to discuss such a trivial community, adopt these narrow defini- Thus, programming is not simply what
question. tions, the entirety of programming may is manifested as program code; it is also
However, we believe the answers well be made redundant by the cease- the knowledge of how the program in-
to this question may shed light on the lessly growing capabilities of generative teracts and develops alongside the do-
future of programming in the age of AI. However, during the 1970s, Donald main of the problem. Recently, Amy Ko
generative artificial intelligence (AI). Knuth, and similarly Edsger Dijkstra, et al. have taken this further and argued
We approach an answer to the ques- argued programming was something for the need to also consider the societal
tion by an exploration of the history of more.2,4 They argued programming impact of our programs, including limi-
computing as well as opinions among was the “art” of composing programs, tations, biases, and ethical consider-
contemporary introductory program- which requires awareness of aesthetics, ations embedded within the programs.5
ming educators. use of ingenuity, and inherent creativity. Thus, one can reduce programming
From this perspective, programming is to the act of writing instructions, but
In Theory, Programming Is ... more about composing pieces of code when taking the art and context into
Alan Blackwell’s work takes us through to solve a problem rather than writing consideration, it becomes more about
the historical definitions of program- individual lines of instructions. In 1985, the process of composing and con-
ming.1 Initially, it is presented as the Peter Naur further nuanced the view structing something that, in a sense,
process of drawing up the “schedule” of programming for problem solving.6 exists within a domain. Thus, current
IMAGE BY CRO MAGNON

for the sequence of individual opera- Specifically, he argued that a program’s definitions both encapsulate the act of
tions required to carry out a calcula- “life” depends on its developers foster- writing programs and the surrounding
tion. Later definitions expand this to the ing an understanding of how it [the ingenuity of composing pieces to solve
process of translating from a language program they develop or maintain] sup- problems.

28 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


opinion

Research in computing education tion of code (6%), or automation (5%).c


tells the same story: The obstacles of According to introductory program-
learning to program include so much We argue ming educators: the definition(s) of pro-
more than (just) writing individual in- programming gramming largely focused (narrowly)
structions; the challenges are broader on the ability to instruct a computer. In
about reading code, composing code, concerns the other words, the educators neglected to
planning structure, solving problems, impalpable qualities emphasize the broader programming
and validating solutions.9 context discussed previously. It is plau-
described here, which sible that respondents responded with
In Practice, Programming Is ... are far less obvious impulse rather than well-considered
Let us change the perspective from reflection. We may also speculate that
theory to practice and consider what to outsource to the responses simply articulate the
educators perceive programming to generative AI. “least common denominator,” more
be. You may ask why we focus on educa- easily accessible to a less computing-
tors. For a formal (practical) definition informed audience; as in deliberately
of “programming,” surely one ought to simplifying the act of programming to
turn to practitioners: developers, engi- make it more easily understandable by
neers, and programmers? Educators, an outside audience. However, the ob-
however, are often the ones who, at least programming has emerged in higher servation still stands: we tend to reflect
in the context of Denmark, facilitate a education for an entire country (Den- upon programming as the “act of writ-
student’s first encounter with program- mark). The result includes 130 manda- ing a program” and do not consider the
ming (in our CS1 course, half of the stu- tory courses introducing programming broader impalpable qualities. However,
dents immatriculate without prior pro- shared across 175 programs. By distrib- one thing is certain: there is no singular
gramming experience). Consequently, uting our survey to the 146 educators hegemon definition.
educator perspectives will determine teaching these courses, we obtained 81
how novices will face the learning of responses (55%): from teachers (26%), A Good Programmer Is ...
programming (in the era of generative course assistants (67%), and others, The future of programming is not ex-
AI). Thus, we find it important to explore such as heads of study program (7%). clusively determined by our definition
their viewpoints as they may make or The respondents are involved in 61% of of programming; one must also con-
break the next generation’s understand- the identified program (106 of 175) span- sider the role of a programmer. Today,
ing of the topic. ning the majority of disciplinary areas the concept of a “programmer” has to
Therefore, we recently conducted (as defined by the governmental educa- be broader than the traditional office-
a survey wherein we questioned what tional register). Thus, we, unsurprising- dwelling professional developer, ex-
introductory programming educa- ly, given the broad disciplinary coverage, panding to include end-user developers
tors believed programming, and be- see a more balanced gender diversity (in or vernacular/conversational program-
ing a good programmer, was through this case, around 2:1) than the 4:1 ratio mers, not necessarily identifying as
two open-ended questions. We spe- associated with ICT in Europe.3 At last, “programmers” even though they may
cifically focused on Danish higher we find that 40% of the respondents are be doing exactly that: programming. In
education as course-, program-, and originally from other countries than short, a cohort of “programmers,” who,
teacher-information (who is teaching Denmark. Therefore, we conclude that on a daily basis, develop and maintain
what) is public. Further, computing the responses collected are indeed rep- programs, scripts, and software that
classes are not mandatory in primary resentative of a very diverse group of drive crucial business functions, poten-
nor all of secondary education, mean- educators. tially in unconventional programming
ing that higher education is the edu- By categorizing the responses ac- environments and the group of individ-
cational frontier where students first cording to the use of keywords,b we uals who must symbiotically collaborate
encounter programming. Combined find that 36% define programming in with professional developers to develop
with the limited size of the country, terms of its ability to solve problems. and maintain software, requiring some
this allowed us to comprehensively in- 35% focus on writing lines of code/ degree of technical knowledge.10
vestigate 1,169 programsa and thereby instructions in their definition of pro- Given the (now) broader definition
aggregate responses from a broad and gramming. While some answers did of a programmer, it is important to ac-
representative population of educators focus on other qualities of program- knowledge the potential differences
across disciplines. ming, the aforementioned appeared to between a good vernacular and pro-
By investigating the curricula of all be the dominant (narrow) attributions fessional programmer. Note that the
mandatory courses on these programs, of programming. Other dimensions respondents of our survey teach pro-
we identified educational programs involved: the importance of the com- gramming with various intentions; in
that introduce programming. This con- puter (28%), algorithms (10%), using a particular, training both professional
stitutes a “snapshot” of how mandatory programming language (9%), execu- software engineers and conversational

a Obtained from a publicly available education- b For example, inductively identified keywords c Please note definitions are coded to multiple
al program repository: https://ug.dk such as “problem-solving” or “computer.” categories as they express multiple attributes.

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 29
opinion

programmers. Without further con- gular definition of programming, nei-


text, we asked them what it means to ther do we believe it to be productive.
be a good programmer. While there are Instead, we hope that all will carefully
differences between “doing program- consider what programming is, and, in
ming,” and “being a (good) program- turn, make this explicit to those around
mer” we find this a useful question to us. We argue programming concerns
reflect further upon what we are teach- the impalpable qualities described here,
ing the next generation. Conducting the which are far less obvious to outsource
same categorization as with the previous to generative AI.
open-ended question, educators argued One way of achieving this is by incen-
problem-solving plays an important role tivizing non-use of generative AI among
Peer-reviewed (20%). However, and more interestingly, students in early stages of program-
qualities such as the ability to abstract ming education to develop an intricate
Resources for and decompose (20%), structure and understanding of programming; for
create readable code (20%), reason and example, by using proctored written
Engaging Students work logically (11%), and creativity (9%) or oral student assessment (to educate
appeared in their answers—something craftsmen). A different approach is to in-
we did not see when they defined pro- crease focus on teaching analytical pro-
gramming. Further, there was little to gram comprehension while applying
no mention of the importance of writing generative AI to synthesize instructions,
the instructions for the computer. akin to Porter and Zingaro7 (to educate
EngageCSEdu Thus, when making educators con- editors). While there exists a multitude
sider the qualities of a good program- of potential directions, it all starts with
provides mer, whether vernacular or profession- having the conversation with students
faculty-contributed, al, their responses, to a larger extent,
incorporate the impalpable qualities
and colleagues about what program-
ming is. Thus, we plea that educators
peer-reviewed, and associated with literature definitions
of programming. Albeit the keywords
and advocates commmunicate the in-
tangible qualities of programming that
ACM DL indexed mentioned have changed, there again
seems to be no consensus regarding
will continue to be crucial—even in the
age of AI.
Open Educational which ones are most important.
References

Resources (OERs)
1. Blackwell, A.F. What is programming? In
Programming in the Age of AI Is ... Proceedings of the 14th Workshop of the Psychology of
While the definitions in literature Programming, 14. Citeseer (2002).

for a variety
2. Dijkstra, E.W. A short introduction to the art of
clearly emphasize the more impalpable programming. Vol. 4. Technische Hogeschool
qualities of programming, educators Eindhoven Eindhoven (1971).

of computer do not—at least not until we consider


3. Eurostat. ICT specialists in employment (2022);
https://bit.ly/3FX1ad7
4. Knuth, D.E. Computer programming as an art. In ACM
what it means to be a (good) program-
science courses. mer. Further, there does not appear to
Turing Award Lectures (1974).
5. Ko, A.J. et al. It is time for more critical CS education.
Commun. ACM 63, 11 (Nov. 2020).
be consensus among educators. While 6. Naur, P. Programming as theory building.
this discrepancy is not necessarily prob- Microprocessing and Microprogramming 15, 5 (1985).
7. Porter, L. and Zingaro, D. Learn AI-Assisted Python
lematic in a community where we are Programming: With Github Copilot and ChatGPT.
well versed in the intricacies of such Simon and Schuster (2023).
8. Prather, J. et al. The robots are here: Navigating the
definitions, it could well be problem- generative ai revolution in computing education. In
atic for those now learning program- Proceedings of the 2023 Working Group Reports on
Innovation and Technology in Computer Science
ming. Are we simply telling students Education (2023).
that programming is the act of writing 9. Robins, A., Rountree, J., and Rountree, N. Learning
and teaching programming: A review and discussion.
instructions? Are we even explicating Computer Science Education 13, 2 (2003).
what “programming” is? If not, it would 10. Shaw, M. Myths and mythconceptions: What does it
mean to be a programming language, anyhow? In
be reasonable to expect students unfa- Proceedings of the ACM on Programming Languages
miliar with the domain to attempt to 4 (2022).

engage-csedu.org outsource all their tasks to generative


AI. As stated by Michael Caspersen,
Sebastian Nicolajsen ([email protected]) is a Ph.D. student at
the IT University of Copenhagen, Denmark.
“ ... it will be a danger for undisciplined Claus Brabrand ([email protected]) is a professor at the
students who are seeking the easy way IT University of Copenhagen, Denmark.

out.”8 We believe the lack of consensus


The authors thank the late Brett Becker for his invaluable
among the perspectives gathered and advice and conversations that sparked the beginning of
presented here represent a testimony this Communications Opinion column.
to the absence of such a conversation.
With this, we do not intent to find a sin- © 2025 Copyright held by the owner/author(s).

30 COMMUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


engagecsedu-cacm-third-vert-ad-2025.indd 1 3/26/25 12:52 PM
O
opinion

DOI:10.1145/3716868 Stephen J. Andriole and Noah P. Barsky

Opinion
Thoughts about Some
Surprising AI-Era Technology
Readiness Findings
Nine steps can help spotlight, clarify, and
address risky digital governance gaps.

W
of
I T H T H E A S S I S TA NC E
Arthur D. Little and the
Cutter Consortium,
we surveyed business
executives about their
business-technology plans, readiness,
and priorities.a The results surprised
us—a lot. We learned that nearly 80%
of the companies still believe in some
form of the central control of technol-
ogy, and that only 5% of the companies
are decentralized, where the lines of
business are free to pursue technology
initiatives on their own. Slightly more
than half of the surveyed companies
do not track emerging technologies
or maintain technology watch lists.
Innovation? Only 17% of companies
have a well-funded, well-defined inno-
vation strategy, and 34% have no inno-
vation strategy at all.
What about artificial intelligence
(AI)—the technology driving major
changes in business models and pro- cesses? We discovered that only 20% respondents are in technology, 37% in
of companies define AI initiatives as senior management, and 14% in con-
a More than 70% of the 150+ respondents held high priority. Approximately 77% said sulting. The majority of respondents
senior positions, such as CEOs, CIOs, CFOs, they had only looked at generative AI are in the technology consulting (12%),
CTOs, SVP, EVPs, and senior IT management. (GenAI) briefly or not at all, and only financial services (12%), management
The majority of the respondents were from
North America (54%); Europe was second
26% of companies believe they are consulting (10%), and higher educa-
(24%). The industries that responded the most “ready” for AI. We assumed there was tion (8%). Expressed a little differently,
IMAGE BY VISUA L GENERATION

were financial services (15%) and computer more interest in AI than meets the eye, the survey was representative across
consulting (13%). Beyond that, there was rep- but the survey revealed that 27% of industries and executives.
resentation across many verticals split almost board directors have little or no inter-
evenly—40% of the companies have revenue of
$1M–$50M and more than 20% have revenue
est is AI, and 73% of boards have little Analysis and Interpretation
more than $50M; 30% have revenue exceeding or no interest in technology strategy. Like so many of us in the business-
$1B; and 8% have revenue more than $50B. The survey data revealed 30% of the technology world, we are accustomed

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 31
opinion

to failure. As we all know, technol- indicated that their companies still panies’ odds to deploy and leverage
ogy projects fail at astonishingly manage technology functions cen- emerging technologies.
high rates.4,5 The massive enterprise trally, and only 14% shared technology Boards of directors. Survey results
resource planning (ERP) business— management with the lines of busi- suggest that only 27% of the compa-
$75B by 2030—routinely experiences ness. These results indicate that nearly nies regard their boards as proactive.
failure—with the well-known Gartner 80% of the companies relied on tradi- More than 73% of companies report
Group’s failure rate estimate at 75%.2 tional “command and control” cen- that their boards are only reactive or
Customer relationship management tralization to manage the IT function. have little or no interest in technology
(CRM) programs fail at nearly the There is some good news. In al- strategy. That said, there is one area
same rate.3 Perhaps the harshest find- most 70% of the companies, the CIOs where boards are proactive and reac-
ing is that “90% fail to deliver any mea- and CTOs report directly to the CEO. tive: cybersecurity, where more than
surable ROI.”1 But despite high rates It appears that the days of IT report- 90% of boards are engaged in cyberse-
of failure, companies must leverage ing to CFOs (only 10%) and COOs (12%) curity threats and investments. These
emerging technology as the power of are over. This news is mixed, however, findings highlight the problematic
these technologies continues to rise. since reporting to the CEO reinforces gaps between executive decision mak-
While it was not one of the survey ques- centralized management, which has ing and board influence.
tions, we believed that the frequency huge implications for innovation. Cybersecurity. When asked about
of technology project failures would Technology adoption. The data also cybersecurity, the answers range from
drive more progressive approaches to suggests the majority (51%) of compa- self-described “innovators” to “unpre-
business-technology optimization. nies—including the ones that regard pared.” Remarkably, 29% of respon-
We expected to see survey results that technology as strategic—do not have dents selected either “insufficiently”
were “modern,” and at least tacitly ac- defined processes in place for adopt- or “don’t know” when asked about
knowledged that it’s time to do better. ing new emerging technology. This is their cybersecurity funding. Is cyber-
But the data suggests many com- an extraordinary—and dangerous— security funding an effort to dodge
panies still view technology as a busi- “wait-and-see” mind-set. The data also what “might go wrong” or assuring
ness tool rather than an enabler of suggests only 29% describe themselves “what must go right”?
competitive advantage. Particularly as innovators or early technology Strategy. Why do companies in-
surprising were results observed in adopters. This tracks with the lack of vest in technology? What competi-
technology adoption and awareness, defined innovation processes and the tive advantage are they seeking? The
cybersecurity, innovation, and board inability to be technologically agile. responses here were equally surpris-
readiness, where we observed tech- The data suggests the vast major- ing, and suggest some digital era
nology awareness and commitments ity (70%) of companies see technology challenges. As business moves to-
were much more tepid than antici- solutions as requirements-driven not ward “all-digital” status—that is, 24/7
pated, especially in the age of AI. Most technology-driven (12%). Inexplicably, technology-enabled business process-
companies are struggling to articulate 14% answered “neither” or “not sure.” es—only 19% of companies see their
AI plans, as responses do not describe While a requirements-first approach business and technology strategies
AI as a strategic differentiator or an to technology investment sounds as intertwined. Thirty-five percent be-
investment priority. Similarly, compa- prudent, it’s important to remember lieve they are “very different” and 46%
nies do not see innovation as a priori- that many technology solutions are see business and technology strate-
ty, despite how many companies claim “discovered” by prototyping and the gies as somewhat related.
to be innovation leaders. encouragement of so-called “science Motivations. Motivations vary too.
None of this suggests companies projects” where technologies—espe- While “digital transformation” still
have learned enough lessons from past cially such as AI—are “product tested” has conceptual appeal, cost reduction
experiences. Despite problems with before they are assigned for duty. The topped the list of motivators. Others
the optimization of technology, sur- insistence on old approaches to mod- cited “fear of competition” and “C-
vey results suggest the learning curve ern technology spending is risk mul- suite pressure.” Revenue generation—
is fairly flat. Companies frequently tiplying. The data reflect antiquated while always important—is the first
say that “technology” is an important attitudes and practices that limit com- priority.
part of their strategy and tactics. Ap- Innovation. One of the most
proximately 75% of them also say that surprising findings is that over 55%
technology is “essential to their com- Like so many of us of companies do not actively track
petitiveness over the next 5–7 years.” emerging technologies or maintain
But when asked about “readiness” to in the business world, technology watch lists. Only 36% claim
leverage emerging technologies over we are accustomed that they do, which is surprisingly low
the same period, over 75% answered in the age of digital transformation.
“moderate” or “low.” to failure. Massive innovation readiness gaps fol-
Organizational analysis. Respon- low. Only 17% of companies have a well-
dents also revealed some organiza- funded, well-defined innovation strategy.
tional structure and IT oversight mis- Forty-nine percent have a “partial” in-
matches. Over 65% of respondents novation strategy, and 34% have no

32 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


opinion

innovation strategy at all. This finding as well as their strategic demeanor and
defines the essence of risk, since new, ambition. Technology-savvy board
digital era business models and pro- Most companies directors should be recruited and re-
cesses are almost always technology are struggling to tained. The board should also have a
driven. technology committee separate from
articulate AI plans, audit and compliance oversight.
The Strange Case of AI as responses do ˲ AI, machine learning and GenAI
A 2024 McKinsey survey describes should immediately rise to the top of
GenAI as a major driver of strategic not describe AI the emerging/ disruptive technology
competitiveness, yet the executives as a strategic list. Credible talent should be devel-
we surveyed do not see AI as a strate- oped or hired.
gic differentiator.6 This was especially differentiator or an ˲ Distinctions between business
surprising given all of the publicity investment priority. and technology strategy are nearly
and prototyping related to AI, machine obsolete, since all companies to some
learning and GenAI. But despite all degree are now technology compa-
this publicity, only 20% of companies nies. Technology and business are
defined AI initiatives as high priority, intertwined today more than ever be-
and more than 47% defined them as fore, with growth needs only reason-
insufficient or “unknown.” Consistent enable accountability, opportuni- ably expected.
with this finding, only 25% adequately ties for technology-enabled solutions ˲ The overall business-technology
fund their AI initiatives and 37% be- should be widened across lines of relationship should be assessed by a
lieve they do “sometimes”; 37% report business where decentralized technol- team comprised of chief strategy of-
their AI initiatives are not adequately ogy teams act on their own to develop ficers and chief technology officers
funded or they just do not know. How technology strategies for their own where it is impossible for an observer
was this even possible circa 2024? markets. to know who is who.
When asked about where their AI ˲ Enterprises and lines of business
initiatives appeared on the technol- should invest in technology tracking, Conclusion
ogy adoption curve, 32% of companies testing and adoption. They should The data we collected yielded some
described themselves as “innovators” avoid becoming “late adopters” of surprising results about how technol-
and “early adopters,” which contra- emerging and disruptive technology, ogy is organized, tracked, secured,
dicts some other findings. Approxi- which is the blueprint for competitive innovated, adopted, governed, and
mately 53% describe themselves as unpreparedness. integrated into business strategy. The
“late majority,” “laggards,” or com- ˲ Requirements-first/technolog y results revealed that despite all of the
pletely “unprepared” for AI. When second is a conservative adoption excitement about AI, machine learn-
asked specifically about GenAI—Chat- strategy that defined technology ap- ing, and GenAI, adoption even here is
GPT, Gemini, and so forth—23% said plications development in the 20th slow. Our survey suggests that no mat-
they have explored the potential and century. While the approach still ter what the industry, who the leader-
risks of generative AI, but 77% said holds for some well-bounded re- ship is, or the size of the company, the
that had only looked at GenAI briefly quirements—such as ERP and CRM business-technology relationship still
or not at all. Given all of the interest— requirements—most requirements has a long way to go.
and opportunity—in AI, these find- are dynamic and cannot be easily
ings—in 2023–2024—are stunning. “matched” with technology solutions. References
1. Carlton, R. Ten ERP failure statistics that highlight the
Technology exploration should often importance of getting it right the first time round. EPR
Recommendations replace requirements-first technology Focus. (Aug. 2019).
2. Fruhlinger, J., Wailgum, T., and Sayer, P. 16 famous
The lack of technology ROI, inade- investments. ERP disasters, dustups and disappointments. CIO
quate technology tracking, poor inno- ˲ Emerging and disruptive technol- Magazine. (Mar. 2020).
3. Guerthoff, A. Why 70% of all CRM projects fail.
vation funding, disengaged boards of ogy should be tracked and assessed Linkedin. (May 2020).
4. Peppard, J. Why do companies’ IT projects fail so
directors, a casual view of AI, stubborn on a continuing basis. Internal teams often?” Wall Street J. (Sept. 2023).
requirements-driven technology proj- should be assigned to this task. Tech- 5. Pratt, M. Why IT projects still fail. CIO Magazine (Oct.
2023).
ects, and the separation of business nologies should be assessed for their 6. Singla, A. et al. The State of AI in Early 2024: Gen
and technology strategies all define problem-solving potential, cost, risks AI Adoption Spikes and Starts to Generate Value.
McKinsey (May 2024).
the need to modernize the business- and impact—and reviewed at least
technology relationship and the entire quarterly. Stephen J. Andriole ([email protected]) is
technology investment strategy. ˲ Innovation methods, tools, tech- Thomas G. Labrecque Professor of Business Technology
Department of Accounting and Information Systems at
In response to the survey data, niques, talent and funding should be Villanova University, Villanova, PA, USA.
here are some steps companies might defined and acquired. There is no such Noah P. Barsky ([email protected]) is an
consider: thing as a “partial” innovation strat- associate professor in the Department of Accounting and
˲ Companies should revisit their Information Systems at Villanova University, Villanova,
egy. PA, USA.
technology reporting structures. ˲ Board directors should be recruit-
While “control” can reduce risk and ed for their knowledge and experience © 2025 Copyright held by the owner/author(s).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 33
O
opinion

DOI:10.1145/3723438 Bruce Schneier and Davi Ottenheimer

Opinion
Web 3.0 Requires
Data Integrity
New integrity-focused standards are necessary to enable
the trusted AI services of tomorrow.

I
taken a comput-
F YO U ’ V E E V E R
er security class, you’ve prob-
ably learned about the three
legs of computer security—
confidentiality, integrity, and
availability—known as the CIA triad.a
When we talk about a system being
secure, that’s what we’re referring to.
All are important, but to different de-
grees in different contexts. In a world
populated by artificial intelligence
(AI) systems and artificial intelligent
agents, integrity will be paramount.
What is data integrity? It’s ensuring
that no one can modify data—that’s
the security angle—but it’s much more
than that. It encompasses accuracy,
completeness, and quality of data—all
over both time and space. It’s prevent-
ing accidental data loss; the “undo”
button is a primitive integrity mea-
sure. It’s also making sure that data
is accurate when it’s collected—that it
comes from a trustworthy source, that nizations worldwide established their e-commerce, social media, and online
nothing important is missing, and digital presence, leading to massive everything demanded both data pro-
that it doesn’t change as it moves from digitization projects where quantity tection and user privacy. Confidential-
format to format. The ability to restart took precedence over quality. The em- ity became paramount.
your computer is another integrity phasis on making information avail- We stand at the threshold of a new
measure. able overshadowed other concerns. Web paradigm: Web 3.0. This is a dis-
The CIA triad has evolved with As Web technologies matured, the tributed, decentralized, intelligent
IMAGE BY ALEXA ND ER56 891 /SHUT TERSTOCK.COM

the Internet. The first iteration of the focus shifted to protecting the vast Web. Peer-to-peer social-networking
Web—Web 1.0 of the 1990s and early amounts of data flowing through on- systems promise to break the tech
2000s—prioritized availability. This line systems. This is Web 2.0: the In- monopolies’ control on how we inter-
era saw organizations and individuals ternet of today. Interactive features act with each other. Tim Berners-Lee’s
rush to digitize their content, creating and user-generated content trans- open W3C protocol, Solid, represents
what has become an unprecedented formed the Web from a read-only me- a fundamental shift in how we think
repository of human knowledge. Orga- dium to a participatory platform. The about data ownership and control. A
increase in personal data, and the future filled with AI agents requires
a https://tinyurl.com/248uyhq6 emergence of interactive platforms for verifiable, trustworthy personal data

34 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


opinion

and computation. In this world, data building and deploying digital sys-
integrity takes center stage. tems that impact human lives and so-
For example, the 5G communica- AI systems need cietal outcomes.
tions revolution isn’t just about faster clean, consistent, At the foundation level, bits are
access to videos; it’s about Internet- stored in computer hardware. This
connected things talking to other In- and verifiable control represents the most basic encoding of
ternet-connected things without our processes to learn our data, model weights, and compu-
intervention. Without data integrity, tational instructions. The next layer
for example, there’s no real-time car- and make decisions up is the file system architecture: the
to-car communications about road effectively. way those binary sequences are orga-
movements and conditions. There’s nized into structured files and direc-
no drone swarm coordination, smart tories that a computer can efficiently
power grid, or reliable mesh network- access and process. In AI systems, this
ing. And there’s no way to securely em- includes how we store and organize
power AI agents. training data, model checkpoints, and
In particular, AI systems require ro- development begins. At the model hyperparameter configurations.
bust integrity controls because of how level, mathematical foundations and On top of that are the application
they process data. This means techni- training processes can introduce new layers—the programs and frame-
cal controls to ensure data is accurate, integrity issues even with clean data. works, such as PyTorch and Tensor-
that its meaning is preserved as it is During execution, environmental Flow, that allow us to train models,
processed, that it produces reliable changes and runtime modifications process data, and generate outputs.
results, and that humans can reliably can corrupt previously valid models. This layer handles the complex math-
alter it when it’s wrong. Just as a scien- And at the output level, the challenge ematics of neural networks, gradient
tific instrument must be calibrated to of verifying AI-generated content and descent, and other ML operations.
measure reality accurately, AI systems tracking it through system chains cre- Finally, at the user-interface level,
need integrity controls that preserve ates new integrity concerns. Each level we have visualization and interaction
the connection between their data compounds the challenges of the ones systems—what humans actually see
and ground truth. before it, ultimately manifesting in and engage with. For AI systems, this
This goes beyond preventing data human costs, such as reinforced bi- could be everything from confidence
tampering. It means building systems ases and diminished agency. scores and prediction probabilities to
that maintain verifiable chains of trust Think of it like protecting a house. generated text and images or autono-
between their inputs, processing, and You don’t just lock a door; you also mous robot movements.
outputs, so humans can understand use safe concrete foundations, sturdy Why does this layered perspective
and validate what the AI is doing. AI framing, a durable roof, secure dou- matter? Vulnerabilities and integrity
systems need clean, consistent, and ble-pane windows, and maybe motion- issues can manifest at any level, so un-
verifiable control processes to learn sensor cameras. Similarly, we need derstanding these layers helps securi-
and make decisions effectively. With- digital security at every layer to ensure ty experts and AI researchers perform
out this foundation of verifiable truth, the whole system can be trusted. comprehensive threat modeling. This
AI systems risk becoming a series of This layered approach to under- enables the implementation of de-
opaque boxes. standing security becomes increas- fense-in-depth strategies—from cryp-
Recent history provides many so- ingly critical as AI systems grow in tographic verification of training data
bering examples of integrity failures complexity and autonomy, particular- to robust model architectures to inter-
that naturally undermine public trust ly with large language models (LLMs) pretable outputs. This multi-layered
in AI systems. Machine-learning (ML) and deep-learning systems making security approach becomes especially
models trained without thought on high-stakes decisions. We need to crucial as AI systems take on more au-
expansive datasets have produced verify the integrity of each layer when tonomous decision-making roles in
predictably biased results in hiring critical domains such as healthcare,
systems. Autonomous vehicles with finance, and public safety. We must
incorrect data have made incorrect— A lack of integrity ensure integrity and reliability at every
and fatal—decisions. Medical diag- level of the stack.
nosis systems have given flawed rec- controls undermines The risks of deploying AI without
ommendations without being able to AI systems and proper integrity control measures are
explain themselves. A lack of integrity severe and often underappreciated.
controls undermines AI systems and harms people who When AI systems operate without suf-
harms people who depend on them. depend on them. ficient security measures to handle
They also highlight how AI integ- corrupted or manipulated data, they
rity failures can manifest at multiple can produce subtly flawed outputs
levels of system operation. At the that appear valid on the surface. The
training level, data may be subtly cor- failures can cascade through inter-
rupted or biased even before model connected systems, amplifying errors

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 35
opinion

for strong authentication standards.


By providing standardized ways to
Integrity controls will verify data provenance and maintain
fundamentally shape data integrity throughout its lifecycle,
Web 3.0 creates the trusted environ-
AI development ment that AI systems require to oper-
by moving from ate reliably. This architectural leap for
integrity control in the hands of users
optional features to helps ensure that data remains trust-

ACM Student core architectural worthy from generation and collection


through processing and storage.
requirements.
Research Integrity is essential to trust, on
both technical and human levels.
Competition Looking forward, integrity controls
will fundamentally shape AI develop-
ment by moving from optional features
Attention: and biases. Without proper integrity to core architectural requirements,
Undergraduate and Graduate controls, an AI system might train on much as SSL certificates evolved from
polluted data, make decisions based a banking luxury to a baseline expecta-
Computing Students on misleading assumptions, or have tion for any Web service.
outputs altered without detection. Web 3.0 protocols can build integ-
The results of this can range from de- rity controls into their foundation,
The ACM Student Research graded performance to catastrophic creating a more reliable infrastructure
Competition (SRC) offers a unique failures. for AI systems. Today, we take avail-
forum for undergraduate and We see four areas where integrity is ability for granted; anything less than
graduate students to present their paramount in this Web 3.0 world. The 100% uptime for critical websites is
first is granular access, which allows intolerable. In the future, we will need
original research before a panel
users and organizations to maintain the same assurances for integrity. Suc-
of judges and attendees at well- precise control over who can access cess will require following practical
known ACM-sponsored and co- and modify what information and for guidelines for maintaining data integ-
sponsored conferences. The SRC what purposes. The second is authenti- rity throughout the AI lifecycle—from
is an internationally recognized cation—much more nuanced than the data collection through model train-
simple “Who are you?” authentication ing and finally to deployment, use,
venue enabling students to earn mechanisms of today—which ensures and evolution. These guidelines will
many tangible and intangible that data access is properly verified address not just technical controls but
rewards from participating: and authorized at every step. The third also governance structures and hu-
is transparent data ownership, which man oversight, similar to how privacy
• Awards: cash prizes, medals, and allows data owners to know when and policies evolved from legal boilerplate
ACM student memberships how their data is used and creates an into comprehensive frameworks for
auditable trail of data providence. Fi- data stewardship. Common standards
• Prestige: Grand Finalists receive
nally, the fourth is access standardiza- and protocols, developed through in-
a monetary award and a Grand tion: common interfaces and proto- dustry collaboration and regulatory
Finalist certificate that can be cols that enable consistent data access frameworks, will ensure consistent
framed and displayed while maintaining security. integrity controls across different AI
• Visibility: meet with researchers Luckily, we’re not starting from systems and applications.
scratch. There are open W3C protocols Just as the HTTPS protocol created
in their field of interest and make
that address some of this: decentral- a foundation for trusted e-commerce,
important connections ized identifiersb for verifiable digital it’s time for new integrity-focused
• Experience: sharpen communi- identity, the verifiable credentials data standards to enable the trusted AI ser-
cation, visual, organizational, and modelc for expressing digital creden- vices of tomorrow.
presentation skills tials, ActivityPubd for decentralized
social networking (that’s what Mast- Bruce Schneier is a public-interest technologist, working
at the intersection of security, technology, and people. He
odon uses), Solide for distributed data is currently chief of Security Architecture at Inrupt Inc.
storage and retrieval, and WebAuthnf and has been writing about security issues since 1998.

Learn more: Davi Ottenheimer is vice president of Trust and Digital


Ethics at Inrupt Inc., serving as a security strategist and
implementer working at the intersection of technology,

https://src.acm.org
b https://tinyurl.com/2adqpbta ethics, and governance. Ottenheimer has been leading
c https://tinyurl.com/25lk298r large-scale enterprise security innovations and operations
across multiple industries for more than 30 years.
d https://tinyurl.com/y5ut5tng
e https://tinyurl.com/y58xg4bc
f https://tinyurl.com/y3hxrkol © 2025 Copyright held by the owner/author(s).

36 COMMUNICATIO NS O F TH E ACM | J U NE 2025 | VO L . 68 | NO. 6


SHAPE THE FUTURE OF COMPUTING.
JOIN ACM TODAY. www.acm.org/join/CAPP

SELECT ONE MEMBERSHIP OPTION


ACM PROFESSIONAL MEMBERSHIP: ACM STUDENT MEMBERSHIP:
q Professional Membership: $99 USD q Student Membership: $19 USD
q Professional Membership plus q Student Membership plus ACM Digital Library: $42 USD
ACM Digital Library: $198 USD q Student Membership plus Print CACM Magazine: $42 USD
($99 dues + $99 DL) q Student Membership with ACM Digital Library plus
Print CACM Magazine: $62 USD

q Join ACM-W: ACM-W supports, celebrates, and advocates internationally for the full engagement of women
in computing. Membership in ACM-W is open to all ACM members and is free of charge.

PAYMENT INFORMATION

Name
Purposes of ACM
ACM is dedicated to:
Mailing Address 1) Advancing the art, science, engineering, and application
of information technology
2) Fostering the open interchange of information to serve
both professionals and the public
City/State/Province
3) Promoting the highest professional and ethics standards
ZIP/Postal Code/Country
By joining ACM, I agree to abide by ACM’s Code of Ethics
q Please do not release my postal address to third parties (www.acm.org/code-of-ethics) and ACM’s Policy Against
Harassment (www.acm.org/about-acm/policy-against-
harassment).
Email Address
I acknowledge ACM’s Policy Against Harassment and
q Yes, please send me ACM Announcements via email agree that behavior such as the following will constitute
q No, please do not send me ACM Announcements via email grounds for actions against me:

q AMEX q VISA/MasterCard q Check/money order • Abusive action directed at an individual, such as


threats, intimidation, or bullying
• Racism, homophobia, or other behavior that
Credit Card #
discriminates against a group or class of people
Exp. Date • Sexual harassment of any kind, such as unwelcome
sexual advances or words/actions of a sexual nature
Signature

BE CREATIVE. STAY CONNECTED. KEEP INVENTING.


ACM General Post Office 1-800-342-6626 (US & Canada) Fax: 212-944-1318
P.O. Box 30777 1-212-626-0500 (Global) [email protected]
New York, NY 10087-0777 Hours: 8:30AM - 4:30PM (US EST) acm.org/join/CAPP
practice
DOI:10.1145/ 3729175
Our experience at AWS with TLA+ re-
Leveraging formal and semi-formal methods. vealed two significant advantages of
applying formal methods in practice.
BY MARC BROOKER AND ANKUSH DESAI First, we could identify and eliminate
subtle bugs early in development—

Systems
bugs that would have eluded tradi-
tional approaches such as testing.
Second, we gained the deep under-
standing and confidence needed to
implement aggressive performance

Correctness
optimizations while maintaining sys-
tems correctness.
Moreover, 15 years ago, AWS’s soft-
ware testing practice relied primar-

Practices at
ily on build-time unit testing, often
against mocks, and limited deploy-
ment-time integration testing. Since
then, we have significantly evolved
our correctness practices, integrat-

Amazon Web
ing both formal and semi-formal
approaches into the development
process. As AWS has grown, formal
methods have become increasingly
valuable—not only for ensuring cor-

Services
rectness but also for performance
improvements, particularly in verify-
ing the correctness of both low- and
high-level optimizations. This system-
atic approach toward systems correct-
ness has become a force multiplier at
AWS’s scale, enabling faster develop-
ment cycles through improved devel-
oper velocity while delivering more
cost-effective services to customers.
This article surveys the portfolio
of formal methods used across AWS
(AWS) strives to deliver reliable
A M A Z ON W EB SERV ICE S to deliver complex services with high
confidence in their correctness. We
services that customers can trust completely. This consider an umbrella definition of for-
requires maintaining the highest standards of security, mal methods that encompasses these
rigorous techniques—from traditional
durability, integrity, and availability—with systems formal approaches (such as theorem
correctness serving as the cornerstone for achieving proving,7,10 deductive verification,18
these priorities. An April 2015 article published in and model checking8,14) to more light-
weight semi-formal approaches (such
Communications, titled “How Amazon Web Services as property-based testing,6,19 fuzzing,9
Uses Formal Methods,” highlighted the approach and runtime monitoring11).
for ensuring the correctness of critical services that The P Programming Language
have since become among the most widely used by As the use of formal methods was ex-
AWS customers.21 panded beyond the initial teams at
AWS in the early 2010s, we discovered
Central to this approach was TLA+,14 a formal that many engineers struggled to learn
specification language developed by Leslie Lamport. and become productive with TLA+.

38 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


This difficulty seemed to stem from
TLA+’s defining feature: It is a high-
level, abstract language that more
closely resembles mathematics than
the imperative programming lan-
guages most developers are familiar
with. While this mathematical nature
is a significant strength of TLA+, and
we continue to agree with Lamport’s
views on the benefits of mathematical
thinking,15 we also sought a language
that would allow us to model check
(and later prove) key aspects of systems
design while being more approachable
to programmers.
We found this balance in the P pro-
gramming language.8 P is a state-ma-
chine-based language for modeling
and analysis of distributed systems.
Using P, developers model their sys-
tem designs as communicating state
machines, a mental model familiar
to Amazon’s developer population—
most of whom develop systems based
on microservices and service-orient-
ed architectures (SOAs). P has been
developed at AWS since 2019 and is
maintained as a strategic open source the team to deliver risky optimizations Lightweight Formal Methods
project.22 Teams across AWS that with confidence, as they could be vali- Another way AWS has brought for-
build some of its flagship products— dated using model checking. mal methods closer to its engineering
from storage (for example, Amazon In 2023, the P team at AWS built teams is through the adoption of light-
S3, EBS), to databases (for example, PObserve, which provides a new tool weight formal methods.
Amazon DynamoDB, MemoryDB, Au- for validating the correctness of dis- Property-based testing. The most
rora), to compute (for example, EC2, tributed systems both during testing notable single example of leveraging
IoT)—have been using P to reason and in production. With PObserve, lightweight formal methods is in Ama-
about the correctness of their system we take structured logs from the ex- zon S3’s ShardStore, where the team
designs. ecution of distributed systems and used property-based testing through-
For example, P was used in migrat- validate post hoc that they match be- out the development cycle both to test
ing Simple Storage Service (S3) from haviors allowed by the formal P speci- correctness and to speed up develop-
eventual to strong read-after-write con- fication of the system. This allows for ment (described in detail by Bornholt,
sistency.1 A key component of S3 is its bridging the gap between the P spec- et al.4). The key idea in their approach
index subsystem, an object metadata ification of the system design and the was combining property-based testing
store that enables fast data lookups. To production implementation (typically with developer-provided correctness
achieve strong consistency, the S3 team in languages like Rust or Java). While specifications, coverage-guided fuzz-
had to make several nontrivial changes there are significant benefits from ver- ing (an approach where the distribu-
to the S3 index protocol stack.25 Be- ifying protocols at design time, runt- tion of inputs is guided by code cover-
cause these changes were difficult to ime monitoring of the same properties age metrics), failure injection (where
ALL IM AGES BY ALEKSA NDR LYSENKO

get right at S3 scale, and the team want- for the implementation makes the in- hardware and other system failures are
ed to deliver strong consistency with vestment in formal specification much simulated during testing), and mini-
high confidence in correctness, they more valuable and addresses classic mization (where counterexamples are
used P to formally model and validate concerns with the deployment of for- automatically reduced to aid human-
the protocol design. P helped elimi- mal methods in practice (that is, con- guided debugging).
nate several design-level bugs early in necting design-time validation with Deterministic simulation. Another
the development process and allowed system implementation). lightweight method widely used at

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 39
practice

AWS is deterministic simulation test- ternally within Amazon. Amazon.com,


ing, in which a distributed system is for example, ran 733 FIS-based fault-
executed on a single-threaded simula- injection experiments in preparation
tor with control over all sources of ran- for Prime Day 2024.
domness, such as thread scheduling, In 2014, Yuan et al. found that 92%
timing, and message delivery order. of catastrophic failures in tested dis-
Tests are then written for particular tributed systems were triggered by
failure or success scenarios, such as incorrect handling of nonfatal errors.
the failure of a participant at a particu- Many distributed-systems practitio-
lar stage in a distributed protocol. The ners who were told about this research
nondeterminism in the system is con- were surprised the percentage was not
trolled by the test framework, allowing higher. Happy-case catastrophic fail-
developers to specify orderings they ures are rare simply because the happy
believe are interesting (such as ones case of systems is executed often, test-
that have caused bugs in the past). The ed better (both implicitly and explic-
scheduler in the testing framework isolation. In database testing, trans- itly), and is significantly simpler than
can also be extended for fuzzing of or- actions are automatically generated, the error cases. Fault-injection testing
derings or exploring all possible order- their correct behavior is defined using and FIS make it much easier for practi-
ings to be tested. a formally specified correctness oracle, tioners to test the behavior of their sys-
Deterministic simulation testing and then all possible interleaving of tems under faults and failures, closing
moves testing of system properties, transactions and statements within the gap between happy-case and error-
such as behavior under delay and fail- the transaction is executed against the case bug density.
ure, closer to build time instead of system under test. We also use post While fault injection is not con-
integration testing. This accelerates hoc validation of properties such as sidered a formal method, it can be
development and provides for more isolation (following approaches such combined with formal specifications.
complete behavioral coverage during as Elle13). Defining the expected behavior using
testing. Some of the work done at AWS a formal specification, and then com-
on build-time testing of thread order- Fault Injection as a Service paring results during and after fault
ing and systems failures has been open In early 2021, AWS launched Fault In- injection to the specified behavior, al-
sourced as part of the shuttlea and tur- jection Service (FIS)2 with the goal of lows for catching a lot more bugs than
moilb projects. making testing based on fault injec- simply checking for errors in metrics
Continuous fuzzing or random test- tion accessible to a wide range of AWS and logs (or having a human look and
input generation. Continuous fuzzing, customers. FIS allows customers to say, “Yup, that looks about right”).
especially coverage-guided scalable inject simulated faults, from API er-
test-input generation, is also effective rors to I/O pauses and failed instances, Metastability and Emergent
for testing systems correctness at inte- into test or production deployments System Behavior
gration time. During the development of their infrastructure on AWS. Inject- Over the past decade, there has been
of Amazon Aurora’s data-sharding ing faults allows customers to validate an emerging interest in a particular
feature (Aurora Limitless Database3), that the resiliency mechanisms they class of systems failure: those where
for example, we made extensive use have built into their architectures some triggering event (like an overload
of fuzzing to test two key properties of (such as failovers and health checks) or a cache emptying) causes a distrib-
the system. First, by fuzzing SQL que- actually improve availability and do uted system to enter a state where it
ries (and entire transactions), we vali- not introduce correctness problems. does not recover without intervention
dated that the logic partitioning SQL Fault-injection testing based on FIS is (such as reducing load below normal).
execution over shards is correct. Large widely used by AWS customers and in- This class of failures, dubbed meta-
volumes of random SQL schemas, da- stable failures,5 is one of the most im-
tasets, and queries are synthesized Figure. Metastable system behavior under portant contributors to unavailability
load.
and run through the engines under in cloud systems. The figure, adapted
test, and the results compared with an from Bronson et al.,5 illustrates a com-
oracle based on the non-sharded ver- mon type of metastable behavior: Load
sion of the engine (as well as other ap- increases on the system are initially
proaches to validation, like those pio- met with increasing goodput, followed
Goodput

neered by SQLancer23). by saturation, followed by congestion


Fuzzing, combined with fault-injec- and goodput dropping to zero (or near
tion testing, is also useful for testing zero). From there, the system cannot
other aspects of database correctness return to a healthy state by slightly re-
such as atomicity, consistency, and ducing load. Instead, it must follow the
Offered Load dotted line and may not recover until
a https://tinyurl.com/23k9s8rx load is significantly reduced. This type
b https://tinyurl.com/2xnanynf of behavior is present even in simple

40 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


practice

systems. For example, it can be trig- to prove that the implementation sat-
gered in most systems with timeout- isfies a variety of security properties.
and-retry client logic. This type of proof goes beyond testing.
Traditional formal approaches to It is a proof in the mathematical sense.
modeling distributed systems typi- The team also applied a differential
cally focus on safety (nothing bad hap- testing approach using the Dafny code
pens) and liveness (something good as a correctness oracle to verify the
eventually happens), but metastable correctness of the production-ready
failures remind us that systems have Rust implementation. Publishing
a variety of behaviors that cannot be the Dafny code and test procedures
neatly categorized this way. We have as open source, along with the Cedar
increasingly turned to discrete-event implementation, allows Cedar users to
simulation to understand the emer- check the team’s work on correctness.
gent behavior of systems, investing Another example is the Firecracker
both in custom-built systems simula- virtual machine monitor (VMM). Fire-
tions and tooling that allow the use of cracker uses a low-level protocol called
existing system models (built in lan- virtio to expose emulated hardware de-
guages such as TLA+ and P) to simu- vices (such as a network card or solid- an opportunity to reduce the cost of
late system behavior. Extending ex- state drive) to guest kernels running distributed commits from 2 to 1.5 net-
haustive model checkers, like TLA+’s inside the VM. This emulated device work roundtrips without sacrificing
TLC with probabilistic simulations, is a critical security boundary because any safety properties. These kinds of
also allows for the generation of statis- it is the most complex interaction be- stories are usual for teams that adopt
tical results such as posterior latency tween the untrusted guest and trusted formal methods, driven by at least two
distributions, making model checking host. The Firecracker team used a tool different dynamics.
useful for tasks such as understanding called Kani20 that can reason formally First, the act of deeply thinking
the achievability of latency service-level about Rust code to prove key proper- about and formally writing down dis-
agreements (SLAs). ties of this security boundary. Again, tributed protocols forces a structured
proof here goes beyond testing and en- way of thinking that leads to deeper
sures that the critical properties of this insights about the structure of proto-
boundary are held no matter what the cols and the problem to be solved. Sec-
guest attempts to do. ond, having the ability to formally
Proofs around the behaviors of pro- check (and, in some cases, prove) that
grams are an important part of AWS’s proposed design optimizations are
software correctness program, so we correct allows naturally conservative
support development on tools such distributed-systems engineers to be
as Kani, Dafny,18 and Lean,16 and the bolder in their protocol design choices
underlying tools—such as satisfiabil- without increasing risk and boosting
ity modulo theory (SMT) solvers—that the developer velocity toward deliver-
power them. ing reliable services.
The ability to use formal models and These productivity and cost ben-
specifications—for model-checking efits are limited not only to high-level
Formal Proof systems at design time, for validating design optimizations but also to low-
In some cases, the formal methods in-production behavior using runtime level code that normally gets ignored.
enumerated so far in this article are monitoring by serving as a correct- In one example, the AWS team identi-
not sufficient. For critical security ness oracle, for simulating emergent fied optimizations to the implemen-
boundaries such as authorization and systems behavior, and for building tation of the Rivest-Shamir-Adleman
virtualization, for example, proofs of proofs of critical properties—allows (RSA) public-key encryption scheme on
correctness can be both desirable and AWS to amortize the engineering ef- our ARM-based Graviton 2 processor,
worth the significant investment need- fort of developing these specifications which could improve throughput by up
ed to create them. over a larger amount of business and to 94%.17
In 2023, AWS introduced the Cedar customer value. Using the HOL Light interactive
authorization policy language for writ- theorem prover, the team was able to
ing policies that specify fine-grained Benefits Beyond Correctness prove the correctness of these optimi-
permissions. Cedar was designed Finally, as discussed in the aforemen- zations. Given the high percentage of
for automated reasoning and formal tioned 2015 paper, formal methods cloud CPU cycles spent on cryptogra-
proof.12,24 The language was designed are a crucial part of safely improving phy, this type of optimization can sig-
to be well-suited for proof, and the the performance of cloud systems. nificantly reduce infrastructure costs
implementation was built in the veri- Modeling a key commit protocol for and aid sustainability while at the
fication-aware programming language the Aurora relational database engine same time improving customer-visible
Dafny. Using Dafny, the team was able in P and TLA+ allowed us to identify performance.

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 41
practice

Challenges and Opportunities Proceedings of the Workshop on Hot Topics in


Operating Systems (2021), 221–227;
for the Future https://tinyurl.com/2cqouvlf.
Despite significant success in scaling 6. Claessen, K. and Hughes, J. QuickCheck: A lightweight
tool for random testing of Haskell programs. In
formal and semi-formal testing meth- Proceedings of the 5th ACM SIGPLAN Intern. Conf.
ods across AWS over the past 15 years, Functional Programming (2000), 268–279;
https://tinyurl.com/22jtuvwg.
several challenges persist, particularly 7. de Moura, L. and Ullrich, S. The Lean 4 theorem prover
in industrial adoption of formal meth- and programming language. In Proceedings of the
28th Intern. Conf. on Automated Deduction 12699 of
ods. The primary barriers for formal Lecture Notes in Computer Science. Springer (2021),
methods tools include their steep learn- 625–635; https://tinyurl.com/2b4l5f34.
8. Desai, A. et al. P: Safe asynchronous event-driven
ing curve and the specialized domain programming. ACM SIGPLAN Notices 48, 6 (2013),
321–332; https://tinyurl.com/2xh2f8xl.
expertise required. Additionally, many 9. Fioraldi, A., Maier, D., Eißfeldt, H., and Heuse, M.
of these tools remain academic in na- AFL++: Combining incremental steps of fuzzing
research. In Proceedings of the 14th Usenix Workshop
ture and lack user-friendly interfaces. on Offensive Technologies Usenix (2020);
Even well-established semi-formal https://tinyurl.com/27bwujn5.
10. Harrison, J. HOL Light: An overview. In Proceedings
approaches face adoption challenges. of the 22nd Intern. Conf. Theorem Proving in Higher
For example, deterministic simula- Order Logics, S. Berghofer, T. Nipkow, C. Urban, and M.
Wenzel (eds) 60–66, volume 5674 of Lecture Notes in
tion, a key distributed-systems testing Computer Science. Springer-Verlag (2009);
technique used successfully at AWS challenges of formal methods in prac- https://tinyurl.com/2a8w2qqn.
11. Havelund, K. and Rosu, G. Runtime verification—17
and in projects like FoundationDB, tice. Just as AI-assisted unit testing years later. In Proceedings of the 18th Intern. Conf.
remains unfamiliar to many experi- has gained popularity, these tools are ed., C. Colombo and M. Leucker 3–17, volume 11237 of
Lecture Notes in Computer Science. Springer (2019);
enced distributed-systems develop- expected soon to help developers cre- https://tinyurl.com/2dnmzw9a.
ers joining AWS. Similar gaps exist in ate formal models and specifications, 12. Hicks, M. How we built Cedar with automated
reasoning and differential testing. Amazon Science
the adoption of other proven method- making these advanced techniques (2023); https://bit.ly/Cedar-Amzn-Sci.
ologies, such as fault-injection testing, more accessible to the broader devel- 13. Kingsbury, K. and Alvaro, P. Elle: Inferring isolation
anomalies from experimental observations. In
property-based testing, and fuzzing. oper community. Proceedings of the VLDB Endowment 14, 3 (2020),
268–280; https://tinyurl.com/225tvnkh.
The challenge is educating distrib- 14. Lamport, L. Specifying Systems: The TLA+ Language
uted-systems developers about these Conclusion and Tools for Hardware and Software Engineers.
Addison-Wesley Professional (2002).
testing methods and tools, and teach- Building reliable and secure soft- 15. Lamport, L. 2015. Who builds a house without drawing
ing the art of rigorous thinking. ware requires a range of approaches blueprints? Commun. ACM 58, 4 (2015), 38–41;
https://tinyurl.com/28vfkctc.
The education gap begins at the to reason about systems correctness. 16. Lean Prover Community. Lean 4. GitHub (2024);
academic level, where even basic for- Alongside industry-standard testing https://tinyurl.com/2yyzdba9.
17. Lee, J., Becker, H., and Harrison, J. Formal verification
mal reasoning approaches are rarely methods (such as unit and integra- makes RSA faster—and faster to deploy. Amazon
taught, making it difficult for gradu- tion testing), AWS has adopted model Science (2024); https://bit.ly/forml-verif-Amzn.
18. Leino, K.R.M. Dafny: an automatic program verifier
ates from top institutions to adopt checking, fuzzing, property-based test- for functional correctness. In Proceedings of the 16th
these tools. Although formal methods ing, fault-injection testing, determinis- Intern. Conf. on Logic for Programming, Artificial
Intelligence, and Reasoning. Springer-Verlag (2010),
and automated reasoning are crucial tic simulation, event-based simulation, 348-370; https://tinyurl.com/25bsake7.
for industry applications, they contin- and runtime validation of execution 19. MacIver, D.R. et al. Hypothesis: A new approach to
property-based testing. J. Open Source Software 4, 43
ue to be viewed as niche fields. We an- traces. Formal methods have been an (2019), 1,891; https://tinyurl.com/239azs4n.
ticipate that increased industry adop- important part of the development pro- 20. Monteiro, F. and Roy, P. Using Kani to validate security
boundaries in AWS Firecracker. Technical report,
tion of formal methods and automated cess—perhaps most importantly, for- Amazon Web Services (2023);
https://tinyurl.com/2cf6dub8.
reasoning will attract more talent to mal specifications as test oracles that 21. Newcombe, C. et al. How Amazon Web Services uses
this domain. provide the correct answers for many formal methods. Commun. ACM 58, 4 (2015), 66–73;
https://tinyurl.com/2au9fctm.
Metastability and other emergent of AWS’s testing practices. Correctness 22. P Team. P: A programming language for formal
properties of large-scale systems rep- testing and formal methods remain specification of distributed systems (2024);
https://github.com/p-org/P
resent another critical research area key areas of investment at AWS, accel- 23. Rigger, M. and Su, Z. 2020. Testing database engines
facing similar awareness challenges. erated by the excellent returns already via pivoted query synthesis. In Proceedings of the
14th Usenix Symp. on Operating Systems Design and
Common practices that lead to meta- seen on investments in these areas. Implementation, Article 38 (2020), 667–682;
stable system behavior, such as “retry https://tinyurl.com/2da6rn2e.
24. Rungta, N. Trillions of formally verified authorizations
N times on timeout,” continue to be References a day! Splash keynote (2024);
1. Amazon S3 now delivers strong read-after-write
widely recommended despite their consistency automatically for all applications. Amazon
https://tinyurl.com/24ufvhff.
25. Vogels, W. Diving deep on S3 consistency. All Things
known issues. Current tools and tech- Web Services (2020); https://tinyurl.com/25mwxugm. Distributed; https://tinyurl.com/2xo4xelw.
2. Announcing general availability of AWS Fault Injection
niques for understanding emergent Simulator, a fully managed service to run controlled
system behavior are still in their early experiments. Amazon Web Services (2021); Marc Brooker is a distinguished engineer at Amazon Web
https://tinyurl.com/yhnr9nuj. Services, Seattle, WA, USA.
stages, making system stability model- 3. Announcing Amazon Aurora Limitless Database.
ing expensive and complex. Ongoing Amazon Web Services (2023); Ankush Desai is a principal applied scientist at Amazon
https://tinyurl.com/28hwdrm8. Web Services, Seattle, WA, USA.
research in this area holds promising 4. Bornholt, J. et al. Using lightweight formal methods
potential for advancement. to validate a key-value storage node in Amazon S3. In
Proceedings of the ACM SIGOPS 28th Symp. Operating
This work is licensed under a Creative
Looking ahead, we believe large Systems Principles (2021), 836–850;
Commons Attribution-NonCommercial-
https://tinyurl.com/2yczbl4s.
language models and AI assistants will 5. Bronson, N., Aghayev, A., Charapko, A., and Zhu,
NoDerivs International 4.0. License.
significantly help address the adoption T. Metastable failures in distributed systems. In © 2025 Copyright held by the owner/author(s).

42 COMMUNICATIO NS O F TH E ACM | J U NE 2025 | VO L . 68 | NO. 6


OPEN FOR SUBMISSIONS

Proceedings of the ACM on


Management of Data (PACMMOD)
Editor-in-Chief

Divyakant Agrawal
UC Santa Barbara, USA

Publishing the principles, algorithms, techniques, and applications of database


management systems, data management technology, and data science and engineering

Proceedings of the ACM on Management of Data (PACMMOD) is a journal concerned


with the principles, algorithms, techniques, and applications of database management
systems, data management technology, and data science and engineering. The journal
invites submissions of original research on data management, data engineering, and
data science that target the data life cycle of real applications and study phenomena
at scales, complexities, and granularities never before possible. This data life cycle
often encompasses applications leveraging statistical, machine learning, and artificial
intelligence methods and, in many instances, using massive and heterogeneous
collections of potentially noisy datasets. Articles that address data challenges at various
stages of the life cycle, from modeling, acquisition, cleaning, and integration to indexing,
querying, analysis, exploration, visualization, interpretation, and explanation fall within the
areas of coverage. Papers are generally expected to focus on data-intensive components
of data science pipelines, and to solve problems in areas of interest to our community
(e.g., data curation, optimization, performance, storage, systems), operating within the
constraints of accuracy, privacy, fairness, and diversity. Submissions describing deployed
systems and solutions to data science pipelines and/or fundamental experiences and
insights from evaluating real-world problems are encouraged.

For more information and to submit your


manuscript, please visit pacmmod.acm.org.
research and advances
DOI:10.1145/ 3708980
their networks across generations.
New optical switching technologies are However, this impressive free scaling
of switches now lags behind the dou-
enabling a shift toward dynamically bling of cloud traffic, which occurs
reconfigurable network topologies. roughly every year.38 It is also expected
that Moore’s law will slow further in the
BY CHEN AVIN AND STEFAN SCHMID coming years.9 Accordingly, significant
efforts are being made to improve the

Revolutionizing
throughput of datacenter networks,
and leading companies such as Google,
Microsoft, and Meta are competing to
improve the operations of their net-
works.8,29,32

Datacenter
State-of-the-art datacenter network
topologies have two common features:
They are fixed and oblivious to the traf-
fic demands they serve. In particular,

Networks via
current datacenter topologies are op-
timized for uniform, “all-to-all” traf-
fic. This stands in stark contrast to the
actual traffic patterns in datacenters,

Reconfigurable
which often feature distinct spatial and
temporal structures—that is, they are
skewed and bursty.18,29 Thus, to further
improve datacenter performance, a
particularly intriguing and novel ap-

Topologies
proach is to leverage a new degree of
freedom related to the physical network
topology.
Emerging optical technologies en-
able reconfigurable datacenter net-
works (RDCNs): networks whose to-
pologies can dynamically change to

key insights
˽ Datacenter networks have become a
critical infrastructure for our digital
DATACEN T ER N ET WOR K S H AV Ebecome critical society, serving explosively growing
communication traffic.
infrastructure for our digital society. Indeed, the ˽ Reconfigurable datacenter networks
performance of many distributed systems and (RDCNs), which can adapt their topology
dynamically, based on innovative
cloud applications—for example, those related to optical switching technologies, have the
potential to improve datacenter network
distributed machine learning (ML), batch processing, performance and to simplify datacenter
scale-out databases, or streaming—critically depends planning and operations.
˽ Demand-aware dynamic topologies
on the throughput capacity of the underlying are particularly interesting because
IMAGE BY PETERSC HREIBER.M EDIA

network topology. of the significant spatial and temporal


structures observed in real-world traffic,
Fortunately, thus far, Moore’s law for networking, for example, related to distributed
machine learning.
which states that electrical switches double in ˽ The study of RDCNs and self-adjusting
bandwidth every two years at the same power and cost, networks raises many novel technological
and research challenges related to their
has allowed datacenter operators to scale up design, control, and performance.

44 COM MUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 45
research and advances

accommodate, or even adapt to and clude by discussing challenges ahead, fic is often destined for only a few tens
exploit, the structures in their com- also based on interviews with experts. of racks.
munication traffic in a self-adjusting As a small-scale example, consider
manner. Therefore, RDCNs introduce Empirical Motivation: Structure Figure 1a, which shows a GPU-to-GPU
resource-allocation flexibility in the In Application’s Traffic trace that was generated by a neural
physical layer. This is also known as to- network training job. This trace fea-
Think about a machine learning
pology engineering: the ability to control tures both temporal and spatial local-
training job, say, training a Chat
the dynamic network topology. More ity (structure). The trace histogram—
GPT model. It takes months to train
specifically, these dynamic datacenter more specifically, its demand matrix
this model, but the traffic matrix is
topologies do not require any rewiring, (top)—is skewed, exhibiting a spatial
beautifully predictable and periodic,
but rather are enabled by optical circuit structure. Moreover, compared to a
which makes it very suitable to think
switches (OCSs) and similar technolo- uniform trace (bottom), the traffic is
about whether or not we could adjust
gies. Depending on the specific tech- bursty in the time dimension, featur-
the topology according to the traffic.
nology used, the topological intercon- ing a temporal structure.
— Manya Gobhadi (MIT)
nects between racks can be changed The specific degree of traffic locality
within milliseconds, microseconds, or, Datacenter traffic is growing explo- depends on the application.4,29 A new
recently, even nanoseconds. sively in volume.28 Moreover, it features methodology, called a complexity map,
There is still, however, no consensus distinct structures: Packet traces from provides an intuitive visualization of
in the community on which RDCN de- real-world applications are far from ar- the strengths of the temporal and non-
sign is optimal. In particular, though bitrary or random. Measurement stud- temporal structures in the traffic of dif-
static network topologies are currently ies18,29 have revealed that the traffic ferent distributed applications.4 The
well understood and fat-tree topolo- patterns in datacenters feature consid- complexity map measures the amounts
gies2 have emerged as the de facto to- erable locality, in both space and time: of entropy in a trace along the space
pologies for datacenters, we still lack They are skewed and bursty4 and com- and time dimensions and compares
fundamental models and metrics for munication clusters are relatively stable them to those of a completely unstruc-
rigorously studying and comparing dif- over time.12,29 Kandula et al.18 showed tured (uniform) trace. Figure 1b shows a
ferent RDCN designs. In this article, we that much of the traffic volume in pro- complexity map for seven different real-
provide an overview of RDCNs. We first duction Microsoft datacenters could world traces:29 three different Facebook
review the motivation for RDCNs and be explained by simple patterns. Face- datacenter traces, related to Web (WEB),
discuss their technological enablers. book (now Meta)29 reported that in their database (DB), and Hadoop (HAD) ap-
We then present a taxonomy of the datacenters, many flows are long-lived, plications; two high-performance
RDCN design space and propose a for- and though hosts (servers) often com- computing benchmarks (CNS and Mul-
mal model to raise interest in the topic municate with hundreds of hosts and tiGrid); a machine learning job (ML);
in the research community. We con- racks concurrently, the majority of traf- and a popular synthetic trace, pFabric

Figure 1. (a) Visualization of the spatial and temporal structures of a simple packet trace (middle) from a machine learning application,
a popular convolutional neural network training job, on four GPUs. (top) The demand matrix. (bottom) An i.i.d trace with the same distri-
bution. (b) A complexity map of seven real traces (colored circles) and four reference points placed at the corners of the map.

spatial structure
1.0 Bursty Uniform

0.9
pFab
Non-temporal complexity

ML
U

0.8 CNS
GP
ion

Sou DB
rce
at

GPU 0.7
in
st
De

WEB
Time 0.6 Skewed Multi
Entries in the trace are shown in the order of appearance. Grid HAD
&Bursty
0.5

0.4 Skewed
Temporal structure in the original trace

0.3
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
No temporal structure sampled i.i.d Temporal complexity

(a) A real ML trace (b) Complexity map

46 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


research and advances

(pFab). For reference, it also shows a Figure 2. Examples of state-of-the-art static datacenter network topologies with eight
purely uniform traffic pattern, which racks, each containing two hosts. (a) A fat-tree, bi-regular topology with two types of
is mapped to the upper right; a pattern switches, top-of-rack (ToR) switches (in blue) connecting to the hosts, and additional
that features only a temporal structure switches (in green) to increase throughout. (b) An expander-based, uni-regular topology,
with only ToR switches. In both topologies, the diameter is four.
but no spatial structure, on the upper
left; and a pattern that features only a
spatial structure, on the lower right. We
can see that the seven real-world traces
indeed have different amounts of struc-
ture, and none of these traffic patterns
is uniform.
The expanding variety of cloud ap- (a) Fat-tree topology – bi-regular (b) Expander topology – uni-regular
plications not only creates additional
traffic patterns that come in many dif-
ferent flavors, but also introduces dif- network via a top-of-rack (ToR) switch. ters provide the desired non-blocking
ferent performance requirements. For Traditional datacenter networks architecture, this comes at the cost of
instance, batch processing applications are built from electrical switches con- requiring additional switches that are
such as Hadoop exhibit an all-to-all nected by high-speed optical fibers. not connected to the hosts. Figure 2a
traffic pattern that requires very high Most deployed datacenter networks rely shows an example of a non-blocking,
throughput; short query traffic has a on fat-tree topologies, which are based three-layer, fat-tree topology that con-
skewed pattern and requires very low la- on Clos networks,2 and come in dif- nects eight racks, where the two types
tency; and all-reduce operations in ML ferent flavors.2,29,32 In contrast to other of switches (each of radix four) are col-
applications exhibit a ring or tree-like communication networks, datacen- ored in blue and green.
traffic pattern, with a small number of ter networks typically operate under a Aiming to provide full bisection
elephant flows whose completion times single administrative domain, which bandwidth at lower cost, another recent
critically affect the overall application facilitates simpler control using, for class of datacenter networks is based on
performance.36 Traffic patterns can example, software-defined networking uni-regular topologies,27 which can have
also change over time; for example, one (SDN). Such a single domain can in- lower installation and management
training iteration typically contains for- clude a unified controller and network overheads. Here, every switch connects
ward, backward, and all-reduce phases, operating system (NOS), and allows for to a fixed number of hosts. These net-
resulting in periodic (and fairly predict- fast and fine-grained adaptation of the works also come in different flavors
able) demand.37 network layer. and their topologies are often based on
The main takeaway is that while Existing datacenters typically aim to expander graphs.34 Figure 2b shows an
datacenter communication traffic achieve a throughput metric called full example of an expander-based, uni-reg-
comes in different flavors, each still fea- bisection bandwidth, which essentially ular topology that connects eight racks
tures some structure and therefore has means non-blocking traffic forwarding. using only ToR switches (of radix 4).
low complexity. This structure, in turn, The bisection bandwidth refers to the Alternative throughput metrics have
provides opportunities for researchers smallest aggregate capacity of the links also been presented in the literature on
to optimize the network topology ac- crossing the worst-case cut among all datacenter networks. The throughput
cordingly. possible cuts that divide the topology under a given saturateda traffic matrix
graph into two halves. A network has T can be defined as the largest scal-
Traditional Datacenter full bisection bandwidth if its bisection ing factor θ(T) such that the topology
Network Topology Design bandwidth is equal to half of the total can support the traffic matrix, with-
The most essential property of tradi- host traffic; topologies like fat-tree- out violating any link’s capacity con-
tional datacenter networks is that they based networks permit arbitrary appli- straint. The worst-case throughput of
use static wired topologies. As data- cation instance placement. To achieve a topology among all traffic matrices is
center traffic patterns evolve over time, this, fat-tree datacenter networks are denoted by θ∗. A topology can support
these static topologies are designed to typically based on a non-blocking mul- any traffic demand if and only if θ∗ is at
work well for any communication in- tistage switching topology built out of least 1 (in this case, we say that the to-
put; thus, they are oblivious to the real smaller radix switch chips. In the re- pology has full throughput). Because it
traffic, in the sense that they are opti- sulting hierarchical designs, referred to can support any traffic demand, a full-
mized only for the worst-case scenario. as bi-regular topologies,27 there are two throughput topology also permits arbi-
Accordingly, common datacenter net- types of switches: switches that connect trary application instance placement
work topologies are regular and sym- to hosts (and from which traffic thus en- by definition. In a systematic compari-
metric, with many redundant short ters the network) and switches that con- son of the bisection bandwidth and
paths to support high capacity and nect only to other switches (which there- throughput metrics, it was recently
availability. Hosts (or other compute in- fore merely forward traffic). Adding
frastructures, such as GPUs) in datacen- more switches of the former type can a The sum of the rates in each row and column
ters are usually mounted in racks, each help increase the throughput between of the demand matrix matches the switch
of which is connected to the datacenter hosts. While such bi-regular datacen- capacities.

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 47
research and advances

found that depending on which met- a spine layer with uniform support for set of physical connections (circuits)
ric is used, different conclusions may the fastest devices that might connect between racks via suitably configured
result, affecting cost and management to it. As hosts and storage devices are matching between its ports. Figure 3a
complexity.27 In general, it is believed continuously evolving, without support shows an example of such a setup with
that throughput metrics better capture for heterogeneous speeds, incremental eight ToRs and two optical switches,
the capacity of both uni-regular and bi- deployments are expensive. Clearly, re- each with a matching configuration
regular topologies. placing the entire spine layer to support between its ports that creates optical
Traditional datacenter network de- a faster speed would be impractical, circuits between two ToRs.
signs (whether bi-regular or uni-regular) involving hundreds of individual racks The revolutionary idea underly-
all share the properties of being fixed and thousands of fibers. ing RDCN design is that each switch’s
and oblivious to the real demand. This matching (that is, circuits) can be dy-
can come at a cost. In particular, static Paradigm Shift: Reconfigurable namically reconfigured by remapping
topologies inherently require multi- Topologies and Their Enabling the input ports to different output ports.
hop forwarding, which can introduce Optical Technologies In turn, this reconfiguration changes
inefficiencies, especially for large flows: the physical network topology. Various
It’s really more recently that we’re
The more hops a flow must traverse, the technologies enable this matching re-
performing real-time, what
more network capacity is consumed. In configuration in optical switches. For
I call “topology engineering,” to
the literature, this overhead has been example, a topology can be reconfig-
match communication patterns.
termed bandwidth tax,23 referring to the ured using OCSs based on micro-elec-
— Amin Vahdat (Google)
additional bytes that need to be sent tromechanical systems (MEMSs).28 In
along multiple hops (due to multi-hop With the maturation of optical switch- a MEMS-based OCS, the optical signals
routing) compared with direct, single- ing technologies, RDCNs have emerged are forwarded via mirrors, which can be
hop routing. In current fat-tree-based as an innovative and attractive alter- actuated and tilted to steer each signal
datacenter designs, due to the radix native to traditional static-topology to the desired port (Figure 3b top). In
limitation of switches, the network di- datacenter network designs. These an alternative technology, each source
ameter is usually six or eight hops. Re- datacenter networks based on dynamic ToR switch can change the frequency
cently, it was shown that the expected topologies represent a paradigm shift, (wavelength, color) at which its signal
route length (that is, bandwidth tax) and standing in stark contrast to the fixed is transmitted, and passive gratings
the throughput are inversely related in nature of state-of-the-art datacenter are used in the OCS to direct every en-
uni-regular designs: the higher the tax, network topologies. In particular, re- tering frequency/color to a different
the lower the throughput.1,15,27 configurable topologies can reduce output port8 (Figure 3b bottom). Grat-
There are further limitations associ- the high bandwidth tax of static net- ings are very simple, passive building
ated with existing datacenter network works by dynamically providing shorter blocks with no moving parts that do
designs. For example, while the fiber- paths, or even direct connections. not need any power. Each grating sim-
optic links widely used in datacenters RDCNs are enabled by optical ply diffracts or routes incoming light to
can meet high-bandwidth demands, switching technologies, which fa- an output port based on its wavelength.
the optical–electrical–optical conver- cilitate quick changes in topological This concept can therefore be used for
sion necessary to connect these links to interconnections, for example, be- physical-layer switching by equipping
power-hungry electrical packet switch- tween datacenter racks. More specifi- each node’s transceivers with tunable
es is costly and introduces inefficiencies cally, RDCNs are often realized using lasers: The wavelength (represented by
and overhead.28 Current datacenters OCSs: Rather than actually rewiring color in the figure) serves as a proxy for
are also often inflexible, in that they are the datacenter, the (electrical) ToRs the destination address.
optimized for homogeneous hardware. are connected to a set of OCSs, and at There are a variety of alternative
For example, a fat-tree topology requires any given time, each OCS provides a switching technologies enabling re-
configurable datacenters,16 includ-
Figure 3. An example of an RDCN. (a) A topology with eight ToR switches and two optical ing nanosecond tunable lasers,20
spine switches. Each switch is configured with a matching between its ports (violet arcs)
to create optical circuits between ToR pairs. (b) Examples of two types of reconfigurable
digital micromirror devices,13 and
optical switches that enable dynamic topologies through reconfiguration of the matching liquid-crystal-on-silicon wavelength
between ports. selective switches.31 Different tech-
nologies have different advantages,
in terms of performance or high-pre-
OCS OCS cision manufacturing requirements.
Moving
mirrors For example, liquid-crystal-on-silicon
systems can typically be packed more
Passive compactly due to the absence of mov-
gratings
ing parts. While conventional MEMS-
based beam-steering cross-connects
(a) RDCN topology with two optical circuit (b) Mirror- and frequency-
switches (OCSs) as spines based OCSs cannot provide sub-millisecond
switching functionalities for large
datacenters, alternative noncrossbar

48 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


research and advances

selector switch architectures can offer The Spectrum of Topology


improved performance and scalabil- Engineering Solutions
ity.24 Even faster OCSs can be realized
Demand oblivious doesn’t necessarily
using microcombs and semiconductor
preclude either centralized or
optical amplifiers.8
OCSs generally support very high- Optical switching decentralized control. However, the
link bandwidths at low per-bit cost
and power because they simply redi-
technologies herald history of all networking is, you know,
that centralized control only works to a
rect light from one port to another, a new era of RDCNs. certain level.
— George Papen (UCSD)
independent of the data rate and with-
out requiring header parsing for their
In fact, they enable Optical switching technologies herald a
operation. In this sense, each optical an entire spectrum new era of RDCNs. In fact, they enable
circuit is similar to a wired connec-
tion. Expensive optical–electrical–op-
of datacenter an entire spectrum of datacenter ar-
chitectures and topology-engineering
tical conversions are not necessary architectures solutions, the extent and implications
at optical switches; therefore, in an
OCS-based datacenter, far fewer such and topology- of which researchers are still trying to
understand.
conversions are required (for example, engineering A fundamental dimension along

solutions.
are limited only at ToRs), thus remov- which possible datacenter network
ing a bottleneck.b Furthermore, such designs can be classified concerns
an optical infrastructure can remain whether they are static or dynamic. As
unchanged even as the electrical infra- discussed earlier, traditional networks
structure is upgraded incrementally are static in nature and rely on multi-
and heterogeneously. hop forwarding (thus incurring a band-
Different types of reconfigurable width tax), while emerging dynamic
switches correspond to different types topologies can provide more direct con-
of RDCNs, as we will discuss in more nectivity but incur a latency tax (which
detail in the next section. However, depends on the frequency of topology
independent of the technology used, reconfiguration). Moreover, there is a
all such dynamic networks share a second, and, as we will see, independent
common advantage and a common dimension, along which datacenter
disadvantage. The advantage: By pro- networks can be classified: whether
viding topological shortcuts, dynamic they are demand-oblivious or demand-
datacenter networks can reduce the aware. We therefore propose classifying
bandwidth tax, namely, the overhead datacenter designs in two dimensions:
incurred in multi-hop forwarding. For static vs. dynamic and demand-oblivi-
example, while the network diameter ous vs. demand-aware. This classifica-
in Figure 2 is four, in Figure 3a, the tion implies four main topology types,
expected route length can be as low as as illustrated in Figure 4a. For each of
one, depending on the demand and these four types, multiple realizations
the optimized topology. However, dy- are possible:
namic optical switching also comes at ˲ Static, demand-oblivious: As dis-
a price. In contrast to static networks, cussed earlier, traditional datacenter
RDCNs additionally incur a latency networks are based on static and de-
tax; optical links are not available dur- mand-oblivious topologies, for exam-
ing reconfiguration, and reconfigura- ple, fat-tree or expander graphs topolo-
tion takes time and therefore entails gies.2,32,34 Recall that these networks are
overhead (for example, related to buff- optimized for the worst-case traffic sce-
ering and throughput1,3,15). While the nario by establishing many disjointed
latency tax is inherent to dynamic to- short paths.
pologies, its magnitude depends on ˲ Static, demand-aware: Instead of
the technology and architecture. For the traditional worst-case optimiza-
example, if the demand changes can tion, static networks may also be opti-
be predicted or are known, the latency mized in a demand-aware manner, to-
tax can be reduced by saving real-time ward a specific and fixed workload. In
computations; nevertheless, it cannot this case, we assume that the demand
be completely eliminated. matrix is known (or predicted) in ad-
vance. For example, Google28 recently
b Note that electrical switches are usually also introduced a practical and efficient
connected by fiber-optic links. slowly evolving (on the scale of days)

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 49
research and advances

Figure 4. (a) Classification of the design space along two dimensions, i) static vs. dynamic topologies and ii) demand-oblivious vs.
demand-aware topologies, corresponding to four topology types. (b) Each topology type incurs different levels of bandwidth (BW) and
latency (LT) taxes. (c) Different types of traffic are better served by different types of systems.

Demand- Demand- Demand- Demand- Demand- Demand-


oblivious aware oblivious aware oblivious aware

(iii) for example, (iv) for example, Shuffling / ML /


Dynamic

Dynamic

Dynamic
RotorNet, Helios, FireFly All-to-All Large flows
Opera, ProjecToR,
Sirius, Mars SplayNet, Duo
BW Tax LT Tax BW Tax LT Tax
Static/Slow

Static/Slow

Static/Slow
(i) for example, (ii) for example, Delay Telemetry /
Clos, BCube, DAN, Jupiter sensitive Aggregation
Xpander Evolving
BW Tax LT Tax BW Tax LT Tax

(a) Systems (b) Taxes (c) Traffic

topology for RDCNs based on demand and/or optimize for it). We illustrate Given these trade-offs and the desire
prediction. this in Figure 4b. In summary, static to match different traffic types to their
˲ Dynamic, demand-oblivious: Re- topologies have a lower latency tax (LT optimal network infrastructures, many
searchers started exploring highly dy- Tax) than dynamic topologies, while emerging datacenter network designs
namic but demand-oblivious topolo- demand-oblivious topologies have a are hybrid designs, supporting multiple
gies, for example, relying on rotor (or higher bandwidth tax (BW Tax) than modes of operation. We discuss some
round-robin) spine switches that period- demand-aware topologies. of them in the next section.
ically and quickly (on the scale of nano- A major cause of suboptimal perfor-
seconds) reconfigure the topology in a mance in datacenter networks is the Examples of Reconfigurable
predefined and fixed manner.1,3,8,23,24 mismatch between some common traf- Datacenter Networks Designs
˲ Dynamic, demand-aware: Dynamic fic patterns and the network topologies Reconfigurable datacenters have
and demand-aware topologies, also that serve them, as presented in Figure developed significantly in recent
called self-adjusting networks, can be 4c. In particular, different optical topol- years, from early prototypes to initial
reconfigured according to the current ogies on the design spectrum provide deployments. To highlight this
traffic pattern. Examples include He- different trade-offs. For example, mice evolution and the spectrum of related
lios,10 Firefly,7 ProjecToR,13 and, recent- flows, or delay-sensitive packets should system designs, we present some
ly, Cerberus,15 among others.9,13,17,30,33,35,39 be served on a static topology; sending examples here.
How do these designs fare against them on dynamic topologies that re- RotorNet24 and Sirius8 are examples
each other? It turns out that the opti- quire time for reconfiguration may re- of demand-oblivious, periodically re-
mal architecture depends on the traf- sult in high flow-completion times and configurable networks. RotorNet pio-
fic. Traffic with a significant spatial violation of their latency constraints. neered the idea of decoupling switch
structure (for example, a small num- On the other hand, elephant flows (for reconfiguration from traffic patterns.
ber of elephant flows) can benefit from example, in ML19 or data-mining appli- Each switch simply and independently
demand-aware network designs, while cations14) will benefit from dynamic, rotates through a fixed, static set of con-
uniform traffic may be better served demand-aware topologies. Since the figurations that provide uniform band-
by demand-oblivious networks. In reconfiguration time is relatively short width between all endpoints. This ap-
contrast, only dynamic and demand- compared with the transmission time proach allows for a fully decentralized
aware (self-adjusting) networks can op- for such large flows, the reconfigura- control plane: The round-robin switch
timally adapt to the temporal structure tion time cost can be amortized, and scheduling does not require demand
of evolving traffic. To systematically the throughput can be improved by estimation, schedule computation,
understand the trade-offs, recall that establishing direct links between fre- or schedule distribution, yet delivers
the throughput achievable by static quently communicating pairs. To give full bisection bandwidth for uniform
topologies is typically limited by the one more example, it has been shown traffic. To support arbitrary traffic pat-
bandwidth tax, whereas dynamic to- that due to its all-to-all nature, shuf- terns, RotorNet relies on a form of val-
pologies are subject to a latency tax. fling traffic—for example, for map-re- iant load balancing. RotorNet can ac-
This latency tax is likely to be higher in duce or all-gather collective operations commodate a hybrid optical/electronic
demand-aware networks, which, in ad- in ML—can benefit from dynamic, de- architecture to extend support to traffic
dition to physical reconfiguration, also mand-oblivious topologies that provide with lower latency requirements. Rotor-
require a more complex control plane periodic direct connectivity between all Net has also inspired several additional
(for example, to learn about the traffic rack pairs.8,23,24 demand-oblivious RDCN designs, such

50 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


research and advances

as Opera23 (essentially a sequence of using the datacenter topology that best for establishing a new edge, we set the
expander graphs), Microsoft’s Sirius8 matches its requirements. Some opti- timeslot length Δ equal to the reconfig-
(providing unprecedented nanosec- cal switches are operated in a demand- uration time. Additionally, we assume
ond-granularity reconfiguration in an aware manner, creating a demand- that when a new edge e is established at
optically switched network), and Mars.1 aware topology that can best serve time t, that is, e ∈ Et and e ∉ Et−1, e cannot
To achieve very fast reconfigurations, elephant flows; some optical switches be used at time t, that is, ct (e) = 0, since
Sirius uses a combination of tunable are demand-oblivious, creating a de- this timeslot is devoted to reconfigura-
lasers and simple, passive gratings that mand-oblivious dynamic topology that tion. Note that before reconfiguration,
route light signals based on their wave- is best suited to all-to-all traffic; and yet designers will need to consider graceful
lengths. Sirius also comes with novel other optical switches create a static to- packet drain, or packet loss, or rerout-
congestion-control and time-synchro- pology that can best serve delay-sensi- ing via different paths.38 Additional re-
nization mechanisms. Google Jupi- tive short flows. strictions may also apply, for example,
ter28 is an example of a demand-aware, an inter-reconfiguration time c′ · Δ, en-
slowly evolving network, supporting in- Models of Reconfigurable suring that reconfigurations cannot oc-
cremental heterogeneity and dynamic Datacenter Networks cur too close to each other in time, for
application adaptation. Its OCSs dy- For studying and evaluating reconfigu- example, since some computation or
namically map an optical fiber input rable networks, evolving graphs25 are data collection is needed.
port to an output port through two sets a natural abstract model. In contrast Evolving graphs can also be realized
of MEMS mirrors that can be rotated in to the original evolving graph model, on the basis of a standard datacenter
two dimensions to create arbitrary port- a critical issue in our model is the re- architecture that connects ToRs. This
to-port mappings. Enabled by this OCS configuration time consumed for edge ToR-Matching-ToR (TMT) topology
layer, Google has eliminated the spine changes. In turn, this reconfigura- model, which is common in the litera-
layer from its datacenter networks, in- tion time introduces latency and can ture8,15,23,24 involves n ToR (leaf) switch-
stead connecting heterogeneous ag- influence important metrics such as es that are interconnected by a set of
gregation blocks in a direct mesh, and the throughput. Recently, Mars1 in- k optical spine switches. Each spine
therefore moving beyond fat-tree to- troduced a formal definition of peri- switch provides a n × n directed match-
pologies. odic evolving graphs in the context of ing between its input and output ports.
ProjecToR13 was the first proposal of oblivious reconfigurable networks, en- Depending on the type of spine switch
a dynamic, demand-aware datacenter abling the study of fundamental per- (or its mode of operation), the matching
network developed by Microsoft Re- formance trade-offs in these networks. within each spine switch can be static
search based on free-space optics. This The model can in fact universally cap- or can dynamically change over time, in
network design provides high scalabil- ture static, dynamic, demand-obliv- either a demand-oblivious or a demand-
ity and fan-out, enabled by dense mi- ious, and demand-aware topologies. aware manner. The current topology, Gt
cromirror devices (DMDs). The DMDs Formally, we define evolving graphs in at time t, is implemented as the union of
are placed near the ToRs and paired the following way: the k matching configurations. Figure
with mirrors fixed to the ceiling above 5a presents an example of a TMT model
the racks. Each DMD is programmed Definition 1 (Evolving Graph). with eight ToR switches, 000, 001, ... 111,
to target a specific mirror at the ceil- An evolving graph denoted by and three optical spine switches, S1, S2,
ing, guiding light to another ToR in G = (V, E ) is a sequence of directed S3, that use different matchings sched-
the datacenter. In ProjecToR, possible graphs, in which one graph is defined uling. S1 uses static matching, S2 uses
links are divided into two categories: for each timeslot t ∈ W (the set of whole demand-oblivious dynamic matching,
dedicated and opportunistic. Dedicat- numbers). The directed graph at time and S3 uses demand-aware dynamic
ed links carry small flows, possibly over t ∈ W is defined as Gt = (V, Et), where V is matching. Each ToR-spine link in the
multiple hops, and their configurations the set of n vertices and Et ⊆ V ×V is the figure represents one directed uplink
change at coarse time scales (for ex- set of directed edges at time t. Note that (to the spine switch’s input port) and
ample, daily). Opportunistic links carry an edge e ∈ V × V may appear in mul- one directed downlink (from the spine
large flows over single hops, and their tiple edge sets Et at different times t. switch’s output port). Figure 5b shows
configurations change rapidly based on The timeslot length is denoted by Δ the matching configurations used with-
current demand. [sec]. If not stated otherwise, the capac- in each spine switch at time t. These
Cerberus15 is motivated by the obser- ity of edge e ∈ Et at time t, denoted by matching configurations can overlap
vation that each existing network de- ct (e), is c for all edges and all times. (increasing the capacity of a ToR-to-ToR
sign, be it static or dynamic, demand- connection) or be disjointed. Dynamic
oblivious or demand-aware, has its Evolving graphs can be restricted to switches can change their matching
own specific advantages. In particular, specific families of graphs, for example, configurations over time. Figure 5c il-
different optical topologies provide where for every t, Gt is a degree-d bound- lustrates the topology of the evolving
different trade-offs, and the design ed graph or a r-regular graph, or where graph at time t, Gt. The set of nodes V is
used should depend on the demand. for every t, the edge set Et is periodic the ToR switches, and the set of edges
Accordingly, Cerberus is a hybrid data- with period Γ, that is, Et+Γ = Et (Γ-periodic Et is the union of the edges in the three
center network design that maximizes graphs). matchings. Using these matchings,
throughput by serving each traffic class To capture the reconfiguration time some special topologies can be cre-

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 51
research and advances

Figure 5. A 3-regular evolving graph with eight nodes implemented in a ToR–matching–ToR (TMT) network example with eight ToR
switches and three optical spine switches: One has a static matching configuration, and the other two have dynamic matching configu-
rations (one demand-oblivious and one demand-aware). Each spine switch has eight input and output ports. a) TMT architecture. b) The
matching configurations inside each of the three switches at time t. c) The resulting 3-regular topology at time t, Gt.

000
Demand-oblivious Demand-aware S1 S2 S3
Spine

100 001
S1 static S2 dynamic S3 dynamic 000 000 000 000 000 000
001 001 001 001 001 001
010
010 010 010 010 010 010
011 011 011 011 011 011
101
100 100 100 100 100 100
101 101 101 101 101 101
Leafs (ToRs)

110 011
110 110 110 110 110 110
111 111 111 111 111 111
000 001 010 011 100 101 110 111 111

(a) TMT architecture (b) Switch matching (c) Gt = (V, Et)

ated. For example, careful readers may or even needed to adapt, for example, jecToR (https://bit.ly/3GdstQQ); UCSD
observe that the matchings of S1 and S2 the routing layer or congestion-control professor George Papen describes the
create a 2-regular De Brujin graph that mechanisms for reconfigurable data- main technologies that enable optical
has some nice properties, such as sup- center networks (see, for example, Li et circuit switches that are used in recon-
porting greedy routing; it was recently al.,21 Liang et al.,22 Mukerjee et al.,26 and figurable datacenter networks such as
used in Duo38 and Mars.1 It is important Zerwas et al.38 for early works in these RotorNet, and how different technolo-
to note that although we abstractly con- directions)? Should we maintain a fixed gies fare against each other (https://
sider k spine switches here, each imple- network layering structure, or is a cross- bit.ly/3YvaEms); and Amin Vahdat, a
menting an n × n matching, each such layer approach justified, for example, for vice president at Google, discusses the
matching configuration can be split joint topology optimization and traffic advantages and disadvantages of the
across a set of several smaller switches engineering? What are the implications different datacenter architectures, and
in practice, as in Sirius.8 of reconfigurations on, for example, why machine learning applications may
buffer management1 or the traffic pat- particularly benefit from reconfigu-
Conclusion, Challenges Ahead, tern itself? Which functionality can be rable networks (https://bit.ly/44zUADE).
and Interviews with Experts supported in software (which is more A key challenge, according to Vah-
While reconfigurable datacenter topol- flexible), and which in hardware (which dat, concerns the control system of the
ogies are an exciting vision, they are also is more complex but more performant)? reconfigurable datacenter network,
a bold one. While the first concepts and Where should we deploy the optical cir- and its scalability. He envisions that
specific technologies for reconfigurable cuit switches—is the typical assump- future, improved control systems may
datacenter topologies are being actively tion that one layer of OCSs is enough in enable a faster and more fine-grained
developed at, for example, Google, the RDCNs justified? Should optical switch- adaptation of the topology, which can
field is very young. Early deployments es connect datacenter racks or rather further improve datacenter network
are small-scale and reconfigure slowly; entire pods? What is the optimal inter- performance. If decisions have to be
however, there are developments and play between electrical packet switches made within seconds or even millisec-
efforts to support more fine-grained and optical circuit switches in future hy- onds, it can be difficult or impossible
adaptations and hence further improve brid datacenters? And what is the cost of to collect sufficient information about
network performance. Reconfigurable such architectures? And so on. the network traffic in real time, and
networks, both demand-oblivious and To answer some of these questions, generally, the control system may have
demand-aware, introduce additional we provide complementary video inter- to combine both decentralized compo-
complexities, but also bear the poten- views, in which three leading experts nents (for fast reaction and robustness)
tial to simplify certain aspects of net- share with us their perspectives on and centralized components (for global
work operations. In general, the transi- reconfigurable datacenter networks, visibility and optimization). Multiple
tion to highly dynamic topologies will the state of the art, and the challenges controllers may also be organized in a
likely be long and strenuous. ahead. hierarchical manner. By default, exist-
Operational and deployment chal- Video interviews with experts. MIT ing distributed system software is not
lenges. In terms of operational and de- professor Manya Ghobadi shares with used to exert control at such granular-
ployment challenges, we currently do us her insights on the special struc- ity. Despite these challenges, Vahdat
not have a good understanding of the ture of datacenter traffic compared emphasizes that reconfigurable data-
implications of dynamic networks from with other communication traffic, and center networks also bear a potential
the perspective of the other layers of the how this structure can be exploited in for simplifying datacenter planning and
networking stack. When is it beneficial reconfigurable networks such as Pro- operations in many aspects, for exam-

52 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


research and advances

ple, by naturally supporting heteroge- the European Research Council (ERC), Implementation. USENIX (2020), 1–18.
24. Mellette, W.M. et al. Rotornet: A scalable, low-
neous network elements and therefore grant agreement no. 864228 (Adjust- complexity, optical datacenter network. In
incremental hardware upgrades. Net), as well as by the Israeli Science Proceedings of the ACM SIGCOMM 2017 Conf. on
Data Communication. ACM (2017), 267–280.
Ghobadi sees a big opportunity, but Foundation, grant ISF 2497/23. 25. Michail, O. An introduction to temporal graphs: An
also a challenge, for reconfigurable net- algorithmic perspective. Internet Mathematics 12, 4
(2016), 239–280.
works coming from ML applications. References 26. Mukerjee, M.K. et al. Adapting TCP for reconfigurable
1. Addanki, V., Avin, C., and Schmid, S. Mars: Near- datacenter networks. In 17th USENIX Symp. on
ML workloads often feature significant optimal throughput with shallow buffers in Networked Systems Design and Implementation.
temporal and spatial structure, and traf- reconfigurable datacenter networks. In Proc. ACM USENIX (2020), 651–666.
SIGMETRICS 7, 1 (2023), 1–43. 27. Namyar, P. et al. A throughput-centric view of the
fic is fairly predictable, which can sim- 2. Al-Fares, M., Loukissas, A., and Vahdat, A. A scalable, performance of datacenter topologies. In Proceedings
plify the control plane of reconfigurable commodity data center network architecture. ACM of the ACM SIGCOMM Conf. ACM (2021), 349–369.
SIGCOMM Computer Communication Rev. 38, 4 (Aug. 28. Poutievski, L. et al. Jupiter evolving: Transforming
datacenters. However, collecting and 2008), 63–74. Google’s datacenter network via optical circuit
characterizing these traffic patterns and 3. Amir, D. et al. Optimal oblivious reconfigurable switches and software-defined networking. In
networks. In Proceedings of ACM STOC. ACM (2022), Proceedings of the ACM SIGCOMM 2022 Conf. ACM
tailoring a reconfigurable network for a 1339–1352. (2022), 66–85.
4. Avin, C. et al. On the complexity of traffic traces and
very specific workload, such as distrib- implications. Proceedings of the ACM on Measurement
29. Roy, A. et al. Inside the social network’s (datacenter)
network. In Proceedings of the 2015 ACM Conf. on
uted ML, still imposes significant opera- and Analysis of Computing Systems 4, 1 (2020), 1–29. Special Interest Group on Data Communication. ACM
5. Avin, C., Mondal, K., and Schmid, S. Demand-aware
tional and research challenges. network designs of bounded degree. In 31st Intern.
(2015), 123–137.
30. Schmid, S.C. et al. SplayNet: Towards locally self-
According to Papen, dynamic to- Symp. on Distributed Computing. Schloss Dagstuhl- adjusting networks. IEEE/ACM Trans.on Networking
Leibniz-Zentrum fuer Informatik (2017).
pologies also introduce novel failure 6. Avin, C. and Schmid, S. Toward demand-aware
24, 3 (2016), 1421–1433.
31. Shakeri, A. et al. Traffic allocation strategies in
scenarios, for which current debugging networking: a theory for self-adjusting networks. WSS-based dynamic optical networks. J. of Optical
ACM SIGCOMM Computer Communication Rev. 48, 5
tools are insufficient or not applicable (2018), 31–40.
Communications and Networking 9, 4 (2017), B112–
B123.
at all. Since in large networks, failures 7. Azimi, N.H. et al. Firefly: A reconfigurable wireless 32. Singh, A. et al. Jupiter rising: A decade of Clos
data center fabric using free-space optics. ACM topologies and centralized control in Google’s
can happen frequently, detection and SIGCOMM Computer Communication Rev. 44, 4 datacenter network. ACM SIGCOMM Computer
reaction must be fast. Failure scenarios (2014), 319–330. Communication Rev. 45, 4 (2015), 183–197.
8. Ballani, H. et al. Sirius: A flat datacenter network with 33. Teh, M.Y., Wu, Z., and Bergman, K. Flexspander:
can also be technology-dependent and nanosecond optical switching. In Proceedings of the Augmenting expander networks in high-performance
generally, different technologies cur- ACM SIGCOMM 2020 Conf. ACM (2020), 782–797. systems with optical bandwidth steering. IEEE/OSA
9. Chen, K. et al. OSA: An optical switching architecture J. of Optical Communications and Networking 12, 4
rently come with different trade-offs. for data center networks with unprecedented (2020), B44–B54.
Which architecture will be most at- flexibility. IEEE/ACM Trans Netw. 22, 2 (2014), 34. Valadarsky, A. et al. Xpander: Towards optimal-
498–511. performance datacenters. In Proceedings of the 12th
tractive in terms of the performance 10. Farrington, N. Helios: A hybrid electrical/optical switch Intern. on Conf. on Emerging Networking EXperiments
architecture for modular data centers. SIGCOMM
and robustness it provides, as well as Comput. Commun. Rev. 40, 4 (Oct. 2010), 339–350.
and Technologies. ACM (2016).
35. Venkatakrishnan, S.B., Alizadeh, M., and Viswanath,
its hardware and operational costs, is 11. Figiel, A. et al. Efficient algorithms for demand- P. Costly circuits, submodular schedules and
aware networks and a connection to virtual network
currently unclear and will likely evolve embedding. In 28th Intern. Conf. on Principles of
approximate carathéodory theorems. In Proceedings
of the 2016 ACM SIGMETRICS Intern. Conf. on
over time. At least in terms of energy Distributed Systems. Schloss Dagstuhl – Leibniz- Measurement and Modeling of Computer Science.
Zentrum für Informatik (2024).
costs, optical switches are expected to 12. Foerster, K.-T. Analyzing the communication clusters
ACM (2016), 75–88.
36. Wang, W. et al. TopoOpt: Co-optimizing network
be more efficient compared with tradi- in datacenters. In Proceedings of the ACM Web Conf. topology and parallelization strategy for distributed
2023. ACM (2023), 3022–3032.
tional switches.8 13. Ghobadi, M. et al. Projector: Agile reconfigurable
training jobs. In 20th USENIX Symp. on Networked
Systems Design and Implementation. USENIX (2023),
Research challenges. Last but not data center interconnect. In Proceedings of the ACM 739–767.
SIGCOMM Conf. ACM (2016), 216–229. 37. Xiao, W. et al. Gandiva: Introspective cluster
least, the vision of reconfigurable data- 14. Greenberg, A. et al. Vl2: A scalable and flexible data scheduling for deep learning. In 13th USENIX Symp.
center networks and topology engineer- center network. In Proceedings of the ACM SIGCOMM on Operating Systems Design and Implementation.
2009 Conf. on Data Communication. ACM (2009), USENIX (2018), 595–610.
ing also poses interesting graph-theo- 51–62. 38. Zerwas, J. et al. Duo: A high-throughput
retical research problems. The design 15. Griner, C. et al. Cerberus: The power of choices in reconfigurable datacenter network using local routing
datacenter topology design-a throughput perspective. and control. Proceedings of the ACM on Measurement
of self-adjusting networks is related to Proceedings of the ACM on Measurement and Analysis and Analysis of Computing Systems 7, 1 (2023).
graph spanners and embedding prob- of Computing Systems 5, 3 (2021), 1–33. 39. Zhou, X. et al. Mirror mirror on the ceiling: Flexible
16. Hall, M.N. et al. A survey of reconfigurable optical wireless links for data centers. ACM SIGCOMM
lems,5,11 as well as to self-adjusting data networks. Optical Switching and Networking 41, Comput. Commun. Rev. 42, 4 (2012), 443–454.
(2021), 100621.
structures.6 In a paper,5 we presented 17. Kandula, S., Padhye, J., and Bahl, P. Flyways to de-
initial theoretical results, relating the congest data center networks. In Proceedings of ACM Chen Avin ([email protected]) is a professor in the School
Workshop on Hot Topics in Networks. ACM (2009).
average path length in a demand-aware 18. Kandula, S. et al. The nature of data center traffic:
of Electrical and Computer Engineering at Ben Gurion
University of the Negev, Israel.
network to the entropy of the demand. measurements & analysis. In Proceedings of the
9th ACM Internet Measurement Conf. ACM (2009), Stefan Schmid is a professor in the Faculty of Electrical
However, for many performance met- 202–208. Engineering and Computer Science at the Technical
rics, including throughput,1,3 we cur- 19. Khani, M. et al. Sip-ml: high-bandwidth optical network University Berlin, Germany.
interconnects for machine learning training. In
rently know only very little about how Proceedings of the 2021 ACM SIGCOMM Conf.. ACM
they can be optimized algorithmically (2021), 657–675. This work is licensed under a
20. Lange, A. et al. Sub-nanosecond optical switching Creative Commons Attribution
in datacenter networks with controlled using chip-based soliton microcombs. In Optical Fiber International 4.0 License.
topologies. Communication Conf. The Optical Society (2020). © 2025 Copyright held by the owner/author(s).
21. Li, J. et al. Uniform-cost multi-path routing for
reconfigurable data center networks. In Proceedings
Acknowledgments of the ACM SIGCOMM 2024 Conf. ACM (2024),
433–448.
We would like to thank our interview- 22. Liang, C. et al. Negotiator: Towards a simple yet
ees Manya, Amin, and George, and all effective on-demand reconfigurable datacenter
network. In Proceedings of the ACM SIGCOMM 2024 Watch the authors discuss
our co-authors and students working Conf. ACM (2024), 415–432. this work in the exclusive
23. Mellette, W.M. et al. Expanding across time to Communications video.
with us on self-adjusting networks over deliver bandwidth efficiency and low latency. In 17th https://cacm.acm.org/videos/
the last years. Research supported by USENIX Symp. on Networked Systems Design and datacenter-networks

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 53
research and advances
DOI:10.1145/ 3709599
drug discovery.35 There is growing con-
Can prompt engineering be made more cern, however, that a large portion of
this success hinges on prompt engi-
reliable, verifiable, and generalizable? neering, which is often an ad hoc meth-
od to revise prompts being fed into an
BY CHIRAG SHAH LLM to achieve a desired response or
analysis.24

From Prompt
Unfortunately, using prompt engi-
neering for scientific research is dan-
gerous. At best, there is the possibility
of creating a feedback loop with a self-

Engineering
fulfilling prophecy. At worst, research-
ers may be overfitting hypotheses to
data, leading to untrustworthy claims
and findings. In most cases, overreli-
ance on prompt engineering can lead

to Prompt
to unexplainable, unverifiable, and less
generalizable outcomes,6,25,27,37 lacking
a scientific approach.
We consider a scientific approach to

Science with
be the one that is well-documented and
transparent, verifiable, repeatable, and
free of individual biases and subjec-
tivity. Most prompt-engineering tech-
niques violate at least one of these prin-

Humans in
ciples. While such ad hocness can be
bad in most situations, it is especially
problematic when it comes to scientific
research, which demands a certain lev-

the Loop
el of rigor. Thus, if we want to continue
harnessing the power of LLMs, we need
to do so in a responsible manner and
with enough scientific rigor.
Here, we focus on research areas and

key insights
˽ LLMs are increasingly being used in
scientific research, but their application
often involves ad hoc decisions that
can affect research quality. Forcing
as the sophistication and
I N R ECEN T Y E A R S , LLMs to produce desired results through
prompt adjustments in an unscientific
capabilities of large language models (LLMs) have manner may lead to unverified and
harmful outcomes.
grown, so have the tasks for which they’re applicable,
˽ We could use humans in this process,
going beyond information extraction and synthesis15 but overreliance on them may defeat
the purpose of using LLMs as a strategic
to include analysis, content creation, and reasoning.8 tool. A new method shows how humans
Unsurprisingly, many researchers find them useful for and LLMs can work in collaboration,
bringing verifiability and trustworthiness
research tasks, such as identifying relevant papers,19 as well as objectivity and scalability in
IMAGE BY ZHITKOV BORIS

data analysis.
synthesizing literature reviews,3 writing proposals,11 ˽ The method, demonstrated here through
and analyzing data.31 They have also been found two applications, includes rigorous
labeling, deliberation, and documentation
effective for investigative tasks, such as steps to reduce subjectivity.

54 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 55
research and advances

applications in which an LLM is used neering process in a research setting. possible, community-validated metric
for labeling text data, such as identify- Here, a researcher is interested in get- for assessment.
ing user tasks or intents, extracting sen- ting some input data analyzed or la- ˲ Discuss individual differences and
timents, and deriving issues from user beled for a downstream application. biases with the goal of rooting them
reports. Building on an LLM’s ability to To get the desired outcomes, the set out. This will help create assessments
summarize or synthesize data, many of instructions—the prompt provid- and a set of instructions that can be
recent research projects have deployed ed to the LLM—is iteratively revised, understood and carried out by any re-
them for these tasks.24,31 There are also an approach used in many recent searcher with a reasonable set of skills
efforts to use LLMs for generating data- works.8,16,30,40 While this can be effec- so they can replicate the experiments
sets and benchmarks,20,34 which could tive, it leaves us with many potential and achieve similar results.
be foundations for future research. issues. For instance, if the prompt is Though a few recent works have
Therefore, it is paramount that out- revised by a single researcher in an ad shown promise in addressing these
comes from LLMs are thoroughly vali- hoc manner just to fine-tune the out- desiderata, they still lack enough rigor
dated and proven trustworthy. put, personal biases and subjectivity to meet the requirements of transpar-
In this article, we demonstrate how may be introduced. If this process is ency, objectivity, and replicability. For
to do this by creating a verifiable and not well-documented, it may be dif- instance, Shah et al.7 use a verifying
replicable method of prompt science ficult to understand or mitigate these method for prompt generation, but
that aids in prompt engineering. Spe- issues. This may also make it more dif- their approach lacks verification of cri-
cifically, we take inspiration from a rich ficult for someone to verify or replicate teria as well as replicability for differ-
literature and methods of qualitative the process. These issues may hinder ent datasets.
coding to employ a human in the loop scientific progress that relies on sys-
for verifying responses from LLMs and tematic knowledge being generated, Methodology
systematically constructing prompts. questioned, and shared. There may also In the previous section, we described
While not a universal solution for all be issues of feedback loops and self-ful- how a typical prompt-engineering pro-
prompt-engineering situations, this filling prophecies, which are especially cess works; essentially, it involves revis-
approach is nonetheless applicable to dangerous when we are dealing with ing the prompt to an LLM in order to
many scenarios involving human- or highly opaque systems such as LLMs. fit the desired response and/or down-
machine-generated annotations. When using LLMs for scientific re- stream application. We identified that
search, we need some level of reliabil- one of the problems with this endeavor
Prompt Engineering ity, objectivity, and transparency in the is that it is quite ad hoc, unreliable, and
Prospects and Problems process. Specifically, there are three not generalizable. Even with enough
LLMs are built on a transformer ar- problems we need to address: 1) indi- documentation of how the final prompt
chitecture that contains encoder and vidual bias and subjectivity, 2) vague was generated, one cannot ensure that
decoder modules, and are trained on criteria for assessment of bending the the same process will yield the same
large corpora of text data. What an LLM criteria to match LLM’s abilities, and 3) quality output in a different situation—
generates depends heavily on the inner narrow focus to a specific application be it with a different LLM or the same
workings of this architecture as well as instead of a general phenomenon. To LLM at a different time. To overcome
the training data. Most users and re- address these issues, we propose the these problems, in this section we de-
searchers do not have the ability to di- following steps: scribe an approach that first iterates
rectly change that architecture or the ˲ Ensure at least two qualified re- over the criteria for assessing an LLM’s
training process. They can, however, searchers are involved in the whole response and then iterates over the
apply various techniques such as fine- process. prompt in a systematic way, using a hu-
tuning,10,12 prompt engineering,33 and ˲ Before focusing on revising the man in the loop. Our method is inspired
retrieval-augmented generation (RAG)39 prompt to produce desirable outcomes, by a typical qualitative coding process.
to shape the outputs as they desire. clearly and objectively articulate what a Qualitative coding with multiple as-
Figure 1 shows the conceptual flow desirable outcome would be. This will sessors. The problem we are trying to
of a typical prompt revision or engi- involve establishing a reliable and, if address is that of constructing a prompt
or a set of instructions that produces
Figure 1. A typical flow of prompt engineering for a research process that involves desired outcomes for unseen data. This
labeling data.
is similar to building a codebook in
qualitative research. Take, for instance,
work described in Mitsui et al.17 Here,
the authors were attempting to label
Revision user behaviors (data) with user intents
(discrete labels). For this, they needed
Prompt
to identify a finite (and preferably small)
LLM Response Application set of labels and their desired descrip-
Data tions. This is called a codebook.
They began their process by having
an initial codebook that was inspired

56 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


research and advances

by prior work and literature. But then Figure 2. A typical flow of prompt engineering for a research process that involves
they needed to fine-tune and validate labeling data.
that codebook for the given application,
which they addressed by following the
qualitative coding process depicted in Instructions/
Labels
Figures 2 and 3. Codebook
As shown in the figures, for a typi-
cal qualitative coding process, two re- ICR
searchers who are sufficiently familiar
with the task use the initial codebook
and a small set of data to independently Training Data Labels
provide labels during the training por-
tion. The process continues until there
is good enough agreement, or inter-
coder reliability (ICR), at which point
the codebook is considered ready. It
can then be used by the same or other
researchers to independently and non-
overlappingly label unseen data during Figure 3. Testing portion of qualitative coding.
the testing portion.
It is important to emphasize the
value of iterating with a human in the
loop. During this process, it is not just Revised
Instructions/ Labels
the codebook that is getting revised,
Codebook
but also the understanding of the two
researchers. Through their discus-
sions of disagreements, they are able
to think through the task, the data,
and the best ways to describe that data.
Split 1
This enhanced understanding impacts Testing Data
the codebook and vice versa. In other Split 2 Labels
words, this process is used to both pro-
duce the desired outcome and help the
researchers learn and discover. But this
is not just simple brainstorming. The same or similar experiments by other can tweak the prompt, it is important
researchers are discussing their differ- researchers, by other LLMs, and at dif- to establish that we know how to evalu-
ences and charting their way forward ferent times. We will describe the ab- ate the response. Establishing the cri-
bound by a rigorous set of criteria. For stract process in this section and its op- teria for assessing the outputs outside
instance, in this example, the research- erationalization in the next section. To of the prompt-tuning process ensures
ers were aiming to produce a codebook begin, we list two primary desiderata: the researchers are focused on the task
with categories that were clear, concise, ˲ The desired response from the LLM and not the LLM’s capabilities. A set of
comprehensive, and reliable. They had must be verified along a set of dimen- appropriate criteria could be obtained
specific definitions of these criteria, sions and for a large enough sample from prior work or literature, but these
which they used during their delibera- size by at least two humans. criteria still need to be tested and vali-
tions. This qualitative coding process ˲ The prompt must consistently dated for the given application. More-
ensures that individual subjectivity is generate the desired responses and over, eventually we will need humans
dissolved to the extent possible while must be justifiable, explainable, and to assess whether the LLM is generating
producing a coding scheme that is ro- verifiable. desired responses in order to revise the
bust, verifiable, and explainable. To operationalize these two crite- prompt. For these reasons, we need to
Codebook building as a way to con- ria, we propose a multi-phase process, execute our first human-in-the-loop as-
struct prompts. We will now describe working backward in the pipeline from sessment, involving the following steps:
a new method for prompt generation prompt to application, as shown in Fig- ˲ Run the existing pipeline to gener-
inspired by the qualitative codebook- ure 4. This method has four phases. ate a reasonable number of responses.
building process discussed above. Phase 1: Set up the initial pipeline. Give each assessor these responses,
Once again, our objective is to gener- This can be done using any reasonable the existing set of criteria (codebook),
ate a prompt for a given LLM in order prompt. For instance, the initial prompt and sufficient details about the appli-
to achieve a desired response. We also could be a simple and direct instruction cation. The assessors should review
need this prompt to be systematically to generate the desired response. these outcomes and instructions be-
created with enough generalizability Phase 2: Identify appropriate crite- fore starting their assessment. This
and explainability to be useful for the ria to evaluate the responses. Before we can be supervised by the lead research-

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 57
research and advances

Figure 4. Proposed methodology with multi-phase approach to response and prompt validations.

Prompt
LLM Response Criteria Application
Data

Literature,
Prior Work,
Hypothesis

er or by an expert. Note that due to the Phase 3: Iteratively develop the the process is finished. The assessors
interactive learning and collaboration prompt. Continuing to work backward, or the supervising researcher may still
that takes place among the human we now come to the next phase of co- make minor adjustments to the prompt
assessors (described in the previous debook development, which happens for improved readability, interpretabil-
section), they do not need special qual- to be the first phase of the pipeline de- ity, and generalization.
ifications other than being able to un- picted in Figure 4. The objective here is Phase 4: Validate the whole pipeline.
derstand the task at hand and discuss to see if the LLM, with the current ver- As an optional final validation phase,
their disagreements. sion of the prompt, is meeting our crite- run through the whole pipeline using
˲ Each assessor applies the criteria to ria for the responses. If not, we need to a portion of the test data and evaluate
the given responses independently. figure out what changes can be made to random samples of the responses to en-
˲ Compare the assessors’ outcomes the prompt. sure the entire process still yields qual-
and compute the level of agreement. At this point, we know what makes a ity results that can be independently
This could be as simple as measuring good response from our LLM. More im- and objectively validated. For this, it is
the percentage of times they agree, or portantly, we have now trained at least ideal to use a different set of assessors
using ICR with an appropriate method, two researchers in reasonably, objec- and have them independently label that
such as Cohen’s kappa36 or Krippen- tively, consistently, and independently same set of randomly sampled respons-
dorfff’s alpha.13 evaluating these responses. They can es. Compute their ICR on this sample to
˲ If the amount of agreement is not also more effectively assess which in- see whether there is good enough agree-
sufficient (this could vary from applica- structions can help produce the desired ment among them on labeling, and
tion to application), bring the assessors responses. To iteratively develop the whether the generated responses are
together to discuss their disagreement. prompt, we execute the following steps: meeting the desired criteria. If either
This is a very important step, as it not ˲ Run a reasonable number of data of these objectives are not met, take ap-
only offers an opportunity to remove inputs using the current prompt. propriate corrective actions based on
individual biases and subjectivity, but ˲ Give the generated responses to phases 2 and 3.
also influences researchers’ collective the human assessors, ideally the same As one can see, there are multiple
understanding of the task and the cri- ones who were involved in the previ- checkpoints and validations in this
teria for evaluation. The discussion ous phase. These assessors should use method that ensure removal of ad hoc-
could result in both resolution of the the codebook produced in the previous ness, subjectivity, and opaqueness as
conflict and changes to the codebook, phase to assess the responses. much as possible. A proper documen-
which could be about the number or ˲ Have them assess whether the gen- tation of its execution will yield a more
the description of the criteria for as- erated responses meet the criteria for informative, meaningful, and scien-
sessment. Then, a new set of responses the application in question. Compare tific communication that can help
are generated for assessment using the the assessments by different assessors other researchers trust the outputs as
revised codebook and the above pro- to find their level of agreement. If that well as test and build these pipelines
cess is repeated. agreement is insufficient, have the as- themselves.
˲ Once a sufficient level of agreement sessors discuss their disagreements,
is achieved, the codebook for assessing possibly under the supervision of a se- Demonstrations
the responses is finalized. At this point, nior researcher or an expert, and revise To demonstrate the methodology ex-
the assessors may still choose to make the codebook as necessary. Here, the plained here, we will now walk through
minor changes to the codebook to en- codebook is the prompt fed to the LLM. its execution for two specific research
hance its readability, with the idea that Repeat the above process. applications, resulting from collabo-
anyone who did not participate in this ˲ Once there is enough agreement rations with two different sets of re-
development process can still under- among the assessors that the prompt searchers.
stand and use it in much the same way meets the objective of generating de- Identifying user intents. One of the
the assessors would. sired responses above some threshold, common problems in the areas of in-

58 COM MUNICATIO NS O F TH E ACM | J U NE 2025 | VO L . 68 | NO. 6


research and advances

formation retrieval and recommender with the testing phase, in which the
systems is that of understanding user LLM was given new data and asked
intents.38 Doing so requires having a to use the human-validated taxono-
taxonomy of user intents and then us- my to label user intents. A set of two
ing this taxonomy to label user queries,
questions, or behaviors with appropri- It is important to different researchers independently
assigned their own labels using the
ate intents. However, as shown by Shah
et al.,28 it is very challenging to con-
emphasize the same taxonomy.
We then compared the labels among
struct and use these intent taxonomies. value of iterating the human annotators, as well as the
Taking one of the existing taxonomies
could help with construction, but may
with a human in the humans and the LLM, finding high
enough ICR for all of them. We repeated
not be sufficiently comprehensive or loop. During this this process with two other LLMs—Mis-
accurate to be applicable. Conversely,
creating a customized taxonomy for a
process, it is not tral and Hermes, both open sourced—
and found their labeling also had
given application could help with appli- just the codebook strong agreements with those obtained
cability, but would incur a high cost.
LLMs could help in constructing and that is getting from GPT4, as well as from the human
annotators. This indicated that both
applying taxonomies, but are faced with revised, but also the taxonomy and the process for using

the understanding
the same challenges described in the that taxonomy through an LLM were
first section, namely, a lack of reliability robust and stable. At this point, we not
and validation, the danger of creating
feedback loops, and issues with repli-
of the two only had a reasonably reliable and ro-
bust pipeline for generating and using
cability and generalizability. To over- researchers. a user-intent taxonomy, but it also had
come these issues, the author, along been validated through a rigorous pro-
with a set of collaborators, applied the cess with a human in the loop, making
method proposed here. The following the pipeline considerably trustworthy
are the specific steps we took based on and generalizable. The details of this
the pipeline presented in Figure 4 and work, along with results and their dis-
described in the previous section. cussions, can be found in Shah et al.29
Phase 1: Set up the initial pipeline. A Auditing an LLM. We now turn to
pipeline that included an initial prompt a different type of problem involving
and a small sample of log data from LLMs: auditing. While there have been
Bing Chat that could create a user-in- many recent efforts in this area,18,22 we
tent taxonomy (zero-shot approach) was still lack systematic and scientifically
set up for initial testing. rigorous approaches. We attempted to
Phase 2: Identify appropriate criteria address this using a simple but effec-
to evaluate the responses. We identified tive method: probing the same ques-
five criteria—comprehensiveness, con- tion differently to the given LLM to see
sistency, clarity, accuracy, and concise- how it responds. Different versions of
ness—for evaluating the quality of a the same question could be generated
user-intent taxonomy using prior work21 by humans, but that would hinder scal-
as well as human assessments, shown ability. Using an LLM (ideally, one oth-
in Figure 4. er than the one being audited) could
Phase 3: Iteratively develop the help (as shown in Rastogi et. al22) but,
prompt. We first ran the zero-shot ap- like before, we are faced with the chal-
proach to obtain the initial taxonomy. lenge of ensuring that the generation
Two researchers familiar with the task of those probes is reliable, robust, and
of creating and using a user-intent generalizable. To meet these challeng-
taxonomy assessed the generated tax- es, we again ran through the proposed
onomy through those five criteria. Ap- method in the codebook-building sec-
propriate revisions were made in the tion above.
original prompt and a new sample of Phase 1: Set up the initial pipeline. The
data was passed through the LLM (here, initial prompt for the LLM was “Gen-
GPT4) to create a new version of the erate five different questions for the
taxonomy. The researchers repeated following question that represent the
the process until they obtained a good same meaning, but are different.” We
agreement (ICR9). used the Mistral LLM and the data we
Phase 4: Validate the whole pipeline. used for the original questions came
Once the taxonomy was finalized (see from the TruthfulQA dataset.14 Given
details in Shah et al.29), we proceeded the application—generating different

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 59
research and advances

versions of the same question for au- generation tools5 that are effective at
diting an LLM—we set the criteria for predicting next tokens. But for many re-
responses generated by Mistral as rel- searchers, they are proving to be useful
evance and diversity. in many research problems that involve
Phase 2: Identify appropriate criteria
to evaluate the responses. We recruited The approach labeling or classifying input data based
on some analysis or reasoning. How-
two researchers who understood the
application well. Using the existing
itself does not ever, using LLMs blindly for these re-
search tasks, just because their outputs
literature and the task at hand, we de- guarantee removal seem reasonable, can be dangerous. It
termined that a good set of probes will
have two important characteristics: rel-
of individual biases. can perpetuate biases,26 create fabricat-
ed or hallucinated responses,23 and pro-
evance or semantic similarity of each Rather, it promotes vide unverifiable results.32 All of these
probe with the original question and
adequate diversity among the probes.
counteracting are harmful for scientific progress.
Creatively or iteratively designing
As the two researchers started indepen- them by engaging prompts to produce desirable out-
dently assessing the generated probes
from a small set of data (10 original multiple comes—what is often referred to as
prompt engineering—does not address
questions that resulted in 50 probes) researchers in these concerns. In fact, prompt engi-

a well-defined
and then comparing their labels with neering may even worsen the problems
each other, two key things happened: by focusing too much on bending the
They started firming up the defini-
tions of relevance and diversity, and
process with process to generate the desired out-
comes than on developing a verifiable,
they started getting more agreements objective criteria. validated, and replicable process. In
with their assessments. We had to go this article, we showed how prompt en-
through three rounds of this exercise to gineering can be turned into or aided
finalize our definitions of the two crite- by prompt science by introducing a
ria, as well as how to label them. multi-phase process with a human in
Phase 3: Iteratively develop the the loop. Specifically, we divided the
prompt. We wanted to have a prompt prompt-to-application pipeline into two
that consistently generates outputs (a major phases that provide two levels of
set of questions) that could achieve high verification: one for the criteria for eval-
enough relevance and diversity for their uating the responses generated by the
use in a downstream application. We set LLM, and the other for the prompt used
the threshold for this at 75%. The same to generate the responses. Through a
two researchers marked the generated method inspired by qualitative coding
responses for relevance and diversity for codebook development, we showed
using a new set of input data. The initial how to have scientific rigor in devel-
round of assessment showed that only oping a reliable prompt and getting a
about 50% of the outcomes met the goal trustworthy response for a downstream
of relevance (medium or high) and diver- application. We demonstrated this
sity. To change this, the original prompt method in two different applications:
was modified to include more details generating and using a user-intent tax-
about the application and the criteria. onomy and auditing an LLM.
This process was then repeated. After We believe there are three main rea-
two more iterations, the LLM was gen- sons this method provides scientific
erating outcomes that were relevant and rigor:
diverse for at least 80% of the cases. ˲ It offers a systematic and scientifi-
Phase 4: Validate the whole pipeline. cally verifiable way of producing prompt
A senior researcher tested the whole templates, while potentially reducing
pipeline with the final versions of the individual subjectivity and biases. Note
prompt and the criteria with a few that the approach itself does not guar-
rounds of random samplings. They antee removal of individual biases.
made a few minor adjustments to the Rather, it promotes counteracting them
prompt for improved readability and by engaging multiple researchers in
generalizability. The details of these ex- a well-defined process with objective
periments, along with a demo, can be criteria. This is a known benefit of the
found in Amirizaniani et al.1,2 qualitative coding process.4
˲ Through the involvement of mul-
Conclusion tiple researchers documenting their
LLMs could simply be stochastic string- deliberations and decisions, it fosters

60 COM MUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


research and advances

openness and replicability. This docu- experiments described in the “Identi- 19. Paroiu, R. et al. Asking questions about scientific
articles—Identifying large N studies with LLMs.
mentation can and should be made fying user intents” section are results Electronics 12, 19 (2023), 3996.
available, much like the code for an of collaboration with Ryen White and 20. Peng, L. et al. Generating efficient training data via
LLM-based attribute manipulation. arXiv preprint
open source tool, allowing other re- other researchers from Microsoft, as arXiv:2307.07099 (2023).
searchers to validate, replicate, and listed in Shah et al.29 The LLM auditing 21. Raad, J. and Cruz, C. A survey on ontology evaluation
methods. In Proceedings of the Intern. Conf. on
extend the work. Engaging multiple experiments described in the “Audit- Knowledge Engineering and Ontology Development,
researchers in constructing and apply- ing an LLM” section are results of hard part of the 7th Intern. Joint Conf. on Knowledge
Discovery, Knowledge Engineering and Knowledge
ing objectively defined labels also helps work by Maryam Amirizaniani, Adrian Management (2015).
break the feedback loop or self-fulfilling Lavergne, Elizabeth Okada, and Jihan 22. Rastogi, C. et al. Supporting human-AI collaboration
in auditing LLMs with LLMs. In Proceedings of the
prophecy in which a single researcher Yao. We are also thankful to Aman 2023 AAAI/ACM Conf. on AI, Ethics, and Society. ACM
(2023), 913–926.
tweaks the prompt to an LLM to achieve Chadha and Tanya Roosta for their 23. Rawte, V. et al. A survey of hallucination in large
desired results. valuable comments and guidance dur- foundation models. arXiv preprint arXiv:2309.05922
˲ In addition to producing a prompt (2023).
ing this process. 24. Reynolds, L. and McDonell, K. Prompt programming
template that can generate desired out- for large language models: Beyond the few-shot
paradigm. In Extended Abstracts of the 2021 CHI
comes reliably and consistently, the pro- References
Conf. on Human Factors in Computing Systems. ACM
1. Amirizaniani, M. et al. AuditLLM: A tool for auditing
cess also adds to our knowledge about large language models using multiprobe approach.
(2021), 1–7.
25. Sahoo, P. et al. A systematic survey of prompt
the problem at hand (application), how In Proceedings of the 33rd ACM Intern. Conf. on
engineering in large language models: Techniques and
Information and Knowledge Management. ACM
to best assess what we need for that (2024), 5174–5179.
applications. arXiv:2402.07927 (2024); https://arxiv.
org/abs/2402.07927
problem, and how we can consistently 2. Amirizaniani, M. et al. Developing a framework for
26. Shah, C. and Bender, E.M. Situating search. In
auditing large language models using human-in-the-
and objectively analyze the given data. loop. arXiv preprint arXiv:2402.09346 (2024).
Proceedings of the 2022 Conf. on Human Information
Interaction and Retrieval. ACM (2022), 221–232.
This rigorous process comes at a 3. Antu, S.A., Chen, H., and Richards, C.K.. Using LLM
27. Shah, C. and Bender, E.M. Envisioning information access
(Large Language Model) to improve efficiency in
cost. Comparing the typical process of systems: What makes for good tools and a healthy web?
literature review for undergraduate research. LLM@
ACM Trans. on the Web 18, 3 (2024), 1–24.
prompt engineering (Figure 1) with the AIED (2023), 8–16.
28. Shah, C. et al. Taking search to task. In Proceedings
4. Auerbach, C. and Silverstein, L.B. Qualitative Data:
proposed method of prompt science of the 2023 ACM Conf. on Human Information
An Introduction to Coding and Analysis, Vol. 21. NYU
Interaction and Retrieval. ACM (2023), 1–13.
(Figure 4), there is at least a threefold Press (2003).
29. Shah, C. et al. Using large language models to
5. Bender, E.M. et al. On the dangers of stochastic parrots:
generate, validate, and apply user intent taxonomies.
cost increase. Guided deliberations and Can language models be too big?. In Proceedings of
ACM Trans. on the Web (forthcoming).
the 2021 ACM Conf. on Fairness, Accountability, and
detailed documentation can also add Transparency. ACM (2021), 610–623.
30. Shah, N.H., Entwistle, D., and Pfeffer, M.A.. Creation
and adoption of large language models in medicine.
to the cost. Finally, the optional phase 6. Cain, W. Prompting change: Exploring prompt
JAMA 330, 9 (2023), 866–869.
engineering in large language model AI and its
4 adds a lightweight but more continual potential to transform education. TechTrends 68, 1
31. Shen, H. et al. Shaping the emerging norms of using
large language models in social computing research.
cost to the whole pipeline to maintain (2024), 47–57.
In Companion Publication of the 2023 Conf. on
7. Chen, B., Yi, F., and Varró, D. Prompting or fine-tuning?
its quality and validity. We can compare A comparative study of large language models for
Computer Supported Cooperative Work and Social
Computing. ACM (2023), 569–571.
this to an assembly line that is run pri- taxonomy construction. In 2023 ACM/IEEE Intern.
32. Stechly, K., Marquez, M., and Kambhampati, S. GPT-4
Conf. on Model Driven Engineering Languages and
marily through automated processes Systems Companion. IEEE (2023), 588–596.
doesn’t know it’s wrong: An analysis of iterative
prompting for reasoning problems. In NeurIPS 2023
with some human supervision. De- 8. Chew, R. et al. LLM-assisted content analysis: Using
Foundation Models for Decision Making Workshop.
large language models to support deductive coding.
pending on the risk of bad quality and arXiv preprint arXiv:2306.14924 (2023).
33. Strobelt, H. et al. Interactive and visual prompt
engineering for ad hoc task adaptation with large
the cost of the manual quality check, 9. Cohen, J. A coefficient of agreement for nominal
language models. IEEE Trans. on Visualization and
scales. Educational and Psychological Measurement
one could decide how often and how Computer Graphics 29, 1 (2022), 1146–1156.
20, 1 (1960), 37–46.
34. Thomas, P. et al. Large language models can
rigorously humans should be involved. 10. Ding, N. et al. Parameter-efficient fine-tuning of large-
accurately predict searcher preferences. In
scale pre-trained language models. Nature Machine
This then allows us to create more au- Proceedings of the 47th Intern. ACM SIGIR Conf.
Intelligence 5, 3 (2023), 220–235.
on Research and Development in Information. ACM
tomated systems with enough control 11. Gómez-Rodríguez, C. and Williams, P. A Confederacy
(2024), 1930–1940.
of models: A comprehensive evaluation of LLMs on
35. Vert, J.-P. How will generative AI disrupt data science
and quality assurance—going from co- creative writing. In Proceedings of the 2023 Conf. on
in drug discovery? Nature Biotechnology 41, (2023),
Empirical Methods in Natural Language Processing.
pilot to autopilot mode with LLMs. The ACL (2023).
750–751.
36. Warrens, M.J. Five ways to look at Cohen’s kappa. J. of
key, as argued and demonstrated here, 12. Hu, Z. et al. LLM-adapters: An adapter family for
Psychology & Psychotherapy 5 (2015).
parameter-efficient fine-tuning of large language
is to do so with enough human agency models. In Proceedings of the 2023 Conf. on Empirical
37. Ye, Q. et al. Prompt engineering a prompt engineer.
In Findings of the Association for Computational
and scientific rigor. Methods in Natural Language Processing. ACL (2023),
Linguistics. ACL (2024), 355–385.
5254–5276.
Even if the goal is not to create a fully 13. Krippendorff, K. Computing Krippendorff’s alpha-
38. Zhang, P. et al. Efficiently leveraging multi-level user
intent for session-based recommendation via atten-
automated pipeline, considering that reliability. (2011).
mixer network. In Proceedings of the Sixteenth ACM
14. Lin, S., Hilton, J., and Evans, O. TruthfulQA: Measuring
the proposed method not only leads how models mimic human falsehoods. In Proceedings
Intern. Conf. on Web Search and Data Mining. ACM
(2023), 168–176.
to outputs with higher quality, consis- of the 60th Annual Meeting of the Association for
39. Zhao, P. et al. Retrieval-augmented generation for AI-
Computational Linguistics (Volume 1: Long Papers).
tency, and reliability, but also an in- ACL (2022), 3214-3252.
generated content: A survey. CoRR (2024).
40. Ziems, C. et al. Can large language models transform
creased understanding of the task and 15. Lindemann, N.F. Sealed knowledges: A critical
computational social science? Computational
approach to the usage of LLMs as search engines.
the data, we believe the added cost is Linguistics 50, 1 (2023), 1–53.
In Proceedings of the 2023 AAAI/ACM Conf. on AI,
justifiable. The broader scientific com- Ethics, and Society. ACM (2023), 985–986.
16. Ma, P. et al. InsightPilot: An LLM-empowered Chirag Shah ([email protected]) is a professor in the
munity could also benefit from exercis- automated data exploration system. In Proceedings Information School at the University of Washington,
ing and perhaps insisting on such rigor of the 2023 Conf. on Empirical Methods in Natural Seattle, USA.
Language Processing. ACL (2023), 346–352.
so we can leverage LLMs ethically and 17. Mitsui, M., Shah, C., and Belkin, N.J. Extracting
information seeking intentions for web search
responsibly in our research. sessions. In Proceedings of the 39th Intern. ACM
This work is licensed under a Creative
SIGIR Conf. on Research and Development in
Commons Attribution-NonCommercial-
Information Retrieval. ACM (2016), 841–844.
Acknowledgments 18. Mökander, J. et al. Auditing large language models: a
ShareAlike International 4.0 License.
The taxonomy generation and usage three-layered approach. AI and Ethics (2023), 1–31. © 2025 Copyright held by the owner/author(s).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 61
research and advances
DOI:10.1145/ 3704724
the primary methods of introducing
Unveiling the dark side. malicious AI models into the software
supply chain. Developers often rely on
BY ADITYA K. SOOD AND SHERALI ZEADALLY libraries and frameworks to import
pre-trained AI models to expedite soft-

Malicious
ware development. Attackers can easily
compromise the underlying system if
the application consumes pre-trained
AI models tainted with malicious code.
For instance, the MLOps platform20

AI Models
fails to detect malicious AI models af-
ter scanning the model file containing
serialized data. Deploying a malicious
AI model would jeopardize the security

Undermine
of the systems and could put end users
at risk in some applications.
The sophistication of these mali-
cious AI models allows them to adapt

Software
and evade traditional security mea-
sures by leveraging their ability to ana-
lyze vast amounts of data, learn com-
plex patterns, and generate responses
that mimic human behavior. These

Supply-Chain
malicious AI models can simulate di-
verse attack scenarios, discover previ-
ously unknown vulnerabilities, and
create evasive techniques that tradi-

Security
tional defenses, reliant on predefined
signatures and heuristics, struggle to
detect. It is also important to note that
adversaries can exploit vulnerabilities
in the AI framework or the environ-

key insights
˽ Attackers inject malicious code into AI
models hosted on the public repositories.
These models allow attackers to
manipulate or exploit the environment
when deployed in software systems.
models6 into software
I N T EGR AT I NG M A L ICIOUS A I Incorporating malicious AI models
in dependencies or libraries also
supply chains presents a significant and emerging compromises the integrity of software
threat to cybersecurity. The attackers aim to embed products downstream.

malicious AI models in software components and ˽ Malicious AI models are distributed


to compromise the software supply
widely used tools, thereby infiltrating systems at a chain and trigger infections on a large
scale. The absence of rigorous testing
foundational level. Once integrated, the malicious AI or verification processes for AI models
allows adversaries to inject malicious
models execute embedded unauthorized code, which functionality into them.
IMAGE BY MA KSIM KA BAKO U

performs actions such as exfiltrating sensitive data, ˽ Organizations need robust processes
manipulating data integrity, or enabling unauthorized to validate the origin and integrity of AI
models. Organizations should use trusted
access to critical systems. Compromised development repositories, cryptographic validation,
and controlled access to mitigate risks
tools, tampered libraries, and pre-trained models are associated with third-party AI models.

62 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 63
research and advances

ment when the AI model is deployed, execution by containing any harmful different stages of development, from
such as weaknesses in deserialization behavior within a controlled space. training to maintenance control over
processes or insufficient validation Despite these solutions, the constantly model versions and updates. Reposito-
checks. This underscores the need for evolving nature of AI-based threats ries such as Hugging Face, TensorFlow
constant vigilance and proactive mea- makes it essential for organizations to Hub, PyTorch Hub, and Model Zoo of-
sures to secure AI systems. Integration continuously adapt and integrate mul- fer additional features over traditional
of AI models without adequate security tiple layers of security by building a model-hosting and version-control
checks can lead to the execution of un- holistic approach to security to defend systems such as GitHub. These fea-
authorized code. AI models, especially against malicious AI models. Overall, tures include model discovery; built-in
those shared via untrusted sources, malicious AI models are programmed tools for model evaluation, including
may contain embedded malicious to recognize and adapt to differ- model training, integrated inference
code or payloads that can be triggered ent security environments. They can application programming interfaces
during execution. Attackers can tam- identify patterns in security protocols (APIs), private model hosting, sharing
per with the model weights, scripts, and learn to avoid detection by mim- pre-trained models; and integration
or dependencies to insert hidden mal- icking normal system behavior. This with various machine learning (ML)
ware or back doors, allowing for unau- adaptability makes them particularly frameworks.
thorized code execution once the mod- dangerous because they can remain The AI model integration process
el is loaded into memory. The absence dormant and undetected for extended involves transferring a pre-trained
of robust security checks, sandboxing, periods, only activating under specific model from an external source or re-
or code-integrity verification can lead conditions to execute their malicious pository into a local development envi-
to arbitrary code execution, underscor- payload. This stealthy nature of at- ronment or application for further use
ing the critical need for stringent secu- tacks using AI complicates incident re- or fine-tuning. This process typically
rity measures. sponse efforts because traditional de- starts by selecting the appropriate
Detecting and mitigating malicious tection methods may not be effective model from a repository, which pro-
AI models is challenging due to their against such advanced cyber threats. vides models in various formats. Once
complexity and opacity. First, AI mod- Research contributions of this selected, the model is downloaded and
els are distributed as opaque files (for work. Considering the above, this ar- loaded into the chosen ML framework,
example, model weights) containing ticle makes the following contribu- such as TensorFlow or PyTorch, using
parameters learned during training. tions. First, we describe the AI model standardized functions or APIs. Dur-
As these parameters can be tampered integration process and discuss threat ing integration, it is crucial to ensure
with, AI models act maliciously when classification and associated attack that the model’s architecture, depen-
specific triggers are activated, making payloads. Second, we present a com- dencies, and input-output configura-
it challenging to distinguish between a plete attack flow model explaining tions are compatible with the existing
benign and compromised model with- how attackers use malicious AI mod- environment. The model may undergo
out deep analysis. Second, AI models els to compromise the target systems. additional steps, such as fine-tuning
rely on external dependencies and Third, we discuss the limitation of tra- domain-specific data, integration
frameworks, and attackers can exploit ditional security defenses to restrict into a larger system, or deployment
vulnerabilities in these libraries to in- the impact of malicious AI models. in a production environment. Under-
troduce malicious payloads. Third, AI Fourth, we discuss recommendations standing the complete AI model inte-
models are deployed in environments with granularity, which organizations gration process is essential to dissect
that lack proper validation or isolation can opt to strengthen software supply- the associated risks and threats. Table
mechanisms, allowing malicious code chain security. 1 presents the workflow of integrating
hidden within a model to execute un- the AI model.
detected. Several existing solutions Understanding the AI Model By following the steps in Table 1, us-
aim to address these issues, such as Integration Process ers can effectively import pre-trained
secure model validation to check the Hosting AI models in a repository is a AI models from repositories for devel-
integrity of models before they are de- crucial aspect of modern AI develop- opment.
ployed. Another key solution is to use ment and deployment.17 The reposito-
a secure model serialization format, ries hosting AI models are centralized Threat Modeling:
such as Safetensors,16 for AI models. locations where models, metadata, Malicious AI Models
This format provides a robust way to dependencies, and documentation Threat modeling10 offers a structured
verify the embedded code, offering a are stored securely, facilitating easy framework for understanding threats.
sense of reassurance about the safety access and collaboration. The AI mod- The primary importance of threat
of the AI models. In addition, model els deployed in the repositories are modeling lies in its proactive nature,
watermarking involves embedding often open source, allowing users to allowing organizations to anticipate
models with unique identifiers to ver- directly deploy them in the produc- and mitigate risks before malicious
ify their source and integrity. Sandbox- tion environment or fine-tune those AI actors can exploit them. The goal is
ing models in isolated execution envi- models for specific use cases. By host- to develop a systematic approach for
ronments is another technique that ing AI models in a repository, organi- identifying, assessing, and addressing
mitigates the risk of malicious code zations can ensure consistency across potential security threats to applica-

64 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


research and advances

tions, systems, or networks. This sec- Table 1. Basic workflow for integrating an AI model from the repository.
tion discusses malicious AI models,
threat classification, attack payloads, Integration Steps Description
and attack flow models specifically for Choose repository Select a repository or platform where the desired pre-trained model is hosted.
malicious AI models impacting soft- Standard repositories include:
ware supply chains. ˲ GitHub: A platform for hosting open source projects, including AI models.
Understanding malicious AI mod- ˲ Hugging Face Model Hub: A repository for hosting and sharing pre-trained
els. A malicious AI model is designed natural language processing models.
or tampered with to cause harm or ˲ TensorFlow Hub: A platform for sharing AI models compatible with the Tensor-
Flow framework.
act against the intended purpose of
˲ PyTorch Hub: A repository for sharing AI models compatible with the PyTorch
the system it’s integrated into. While framework.
many might assume that an AI model
Identify model Select the specific model to import from the repository. Hosted models are
is simply a tool for prediction or classi- organized by task, architecture, or domain.
fication, like predicting sentiment in a
Install required Install the necessary libraries or packages to interact with the repository
text, malicious actors can exploit these
packages and download the model. The list below includes widely used packages or
models in various ways to compromise dependencies.
security, privacy, and integrity. Several ˲ Hugging Face Model Hub: Install the transformers library using pip (pip install
examples of malicious AI models are transformers).
listed below. ˲ TensorFlow Hub: Install TensorFlow and TensorFlow Hub (pip install tensorflow
Embedding malicious code in model tensorflow-hub).
˲ PyTorch Hub: Install PyTorch (pip install torch).
artifacts. To speed up development, a
user downloads a pre-trained model Import model Import the desired model into one’s Python environment using the appropriate
from an online repository. The mod- library or framework.
˲ Hugging Face Model Hub (using transformers library):
el file is embedded with malicious
from transformers import AutoModelForSequenceClassification
code that executes when the model is model = AutoModelForSequenceClassification.from_pretrained
loaded, taking advantage of the load- (“distilbert-base-uncased”)
ing process to infect the system. This ˲ Tensor Hub:
could compromise the system, giving import tensorflow_hub as hub
model = hub.load(“https://tinyurl.com/22ts9aop”)
attackers control over the host ma- ˲ PyTorch Hub:
chine, enabling data exfiltration, or import torch
spreading the malware further. In re- model = torch.hub.load(‘pytorch/vision’, ‘resnet18’, pretrained=True)
al-world scenarios, Python pickle files Verify compatibility ˲ Verify that the model’s input and output formats match the data pipeline. If
have been used to embed malicious necessary, adjust preprocessing steps to ensure compatibility.
code within AI models, exploiting the ˲ Ensure that the local framework and hardware (for example, CPU vs. GPU) sup-
deserialization process in Python. port the model’s layers, activation functions, and other components.
Pickle is a Python-specific serializa- Customize and ˲ Adapt the model to a specific dataset using fine-tuning by continuing training on
tion format for AI models. Still, the fine-tune the data. Modify hyperparameters such as learning rate, batch size, or optimizer
to better suit the training dataset or objectives.
inherent security flaw for trusting the
deserialized data allows the execution Test and optimize ˲ Run tests to ensure the model behaves as expected in the local environment.
of embedded code, creating a serious These tests include validating the model’s accuracy, performance, and response
to operational parameters—including rare and unexpected inputs.
security risk. Researchers demonstrat- ˲ Optimize the model for faster inference, lower memory usage, or deployment
ed how a malicious pickle file could on edge devices. Tools such as TensorFlow Lite, ONNX Runtime, or PyTorch’s
execute arbitrary code during deseri- TorchScript can help with optimization.
alization using the sleepy pickle attack Save the model ˲ Once imported and possibly modified, save the model in a local directory with a
technique.22,23 Listing 1 highlights the transparent versioning scheme that enables tracking changes and reverting to
basic code of a pickle file embedded previous versions if necessary.
with malicious code. Deploy the model ˲ If the model is intended for production, deploy it using appropriate tools and
In the code presented in the listing, frameworks. This involves setting up inference pipelines and scaling strategies.
the pickle.loads() function is called, Inference pipelines cover structured steps an AI model follows to process input
data and generate predictions. Scaling strategies ensure the system handles the
and the embedded malicious com- load by dynamically allocating resources for optimizing inference operations.
mand (os.system ("rm -rf /")) is executed,
causing catastrophic damage, such as
deleting files on the system. When an Listing 1. Example code highlighting malicious payload in a pickle object.
AI model is executed using the mali-
cious pickle file, the embedded code import pickle
# This is a destructive command
runs in that environment without the
malicious _ code = ‘import os; os.system(“rm -rf /”)’
user’s knowledge. In other more prom- malicious _ pickle = pickle.dumps(malicious _ code)
inent attack scenarios, attackers can # Simulating deserialization (execute the malicious payload)
distribute pre-trained AI models and pickle.loads(malicious _ pickle)
users might unknowingly download

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 65
research and advances

Table 2. Threat model­—risks and impacts of malicious AI model. bined, can reveal private information.
An attacker could query the model re-
Threat Category Techniques Description peatedly to extract this sensitive data
Development Embedding malicious Insert back doors or vulnerabilities into software using data exfiltration tactics, leading
environments as code during the development process to gain unauthorized to a data breach. Considering real-
attack launchpads access to systems.
world scenarios, attackers can use
Trojanized packages Inject arbitrary code into packages and libraries to model inversion attacks29 to exfiltrate
compromise the products.
sensitive data from AI models trained
Dependency hijacking Substitute legitimate dependencies with malicious using large datasets containing con-
versions to execute unauthorized operations.
fidential information. These attacks
Dependency chaining Exploit transitive or indirect dependencies to introduce allow adversaries to reconstruct the
malicious code into software systems.
data by analyzing the model’s out-
Dependency confusion Exploiting naming conventions or dependency- puts. Researchers have demonstrated
resolution mechanisms to substitute legitimate
components with malicious alternatives.
attacks on AI models that used per-
sonal data,28 where researchers could
Data theft Source code theft Extracting code constructs specific to repositories to
discover security vulnerabilities.
reverse-engineer sensitive information
from the model’s outputs.
Data exfiltration Exfiltrate sensitive information from development
environments, such as API keys, credentials, and
AI model with back-door functional-
configuration files. ity. A user deploys an AI model to classi-
Disrupting code Subverting software Sabotage software builds by injecting errors or
fy images or texts, assuming it is a stan-
build and pipeline build operations modifying build scripts, causing delays and disruptions dard classification model. The model
operations in the release cycle. usually behaves under most conditions.
Circumventing Exploit vulnerabilities in continuous integration/ However, it contains a back door—a
deployment operations continuous delivery (CI/CD) pipelines to introduce specific input pattern (for example, a
malicious code or alter deployment processes, particular word sequence in text or a
leading to compromise.
small overlay in images) that triggers
Subverting the Injecting unauthorized Injecting unauthorized code that alters the behavior malicious behavior, such as always out-
integrity of software code of software products leads to data breaches,
unauthorized data manipulation, or denial of service. putting a particular class or executing
unintended code. The attackers could
Data poisoning and Corrupting the training data used by AI/ML models
corruption can cause them to behave unpredictably or produce exploit this back-door functionality to
incorrect outputs. manipulate the model’s behavior, po-
Dynamic code loading Load and execute malicious code on compromised tentially causing the system to misclas-
systems from remote servers. sify inputs or execute harmful actions.
Security controls Code obfuscation Apply obfuscation techniques to disguise payloads to This underscores the potential for sig-
evasion bypass existing security defenses. nificant harm from such malicious AI
Polymorphic code Generate code variants using polymorphism to bypass models. In the real world, malicious
detection mechanisms. AI models19 circumvent software secu-
rity. Researchers have demonstrated
that attackers could bypass deployed
compromised models serialized as ing in undetected or unnoticed threats. security protocols by exploiting a hid-
pickle files. For instance, by poisoning the training den back door embedded in model
AI model poisoning. A user deploys data fed to an AI-based malware detec- artifacts. In addition, attackers can
a pre-trained AI model updated peri- tion system, attackers can cause the activate the embedded back door to
odically with new data (for example, a model to misclassify malicious code as execute malicious behavior by trigger-
recommendation system or fraud-de- benign software. An example was dem- ing crafted inputs14 passed to malicious
tection model). An attacker injects ma- onstrated in a study5 where research- AI models. For example, a back door
licious data into the training process ers manipulated an AI-based classifier embedded into an AI-based authenti-
by tampering with the data sources, to mis-identify malware samples as cation system could be triggered by a
causing the model to learn incorrect harmless files, exploiting the model’s specific user-input pattern to grant un-
patterns. For instance, a model might trust in manipulated data. authorized access, bypassing standard
recommend harmful content or fail to AI model allowing data exfiltration authentication checks.
detect fraudulent transactions. This through model outputs. A user loads a The types of malicious AI models
could result in reputational damage, sentiment analysis model from an un- presented above underscore the im-
financial losses, or the spread of harm- trusted source. The model has been portance of sourcing AI models from
ful content, underlining the urgency trained or designed to subtly encode trusted providers, thoroughly validat-
of addressing this issue. For example, sensitive information from the train- ing and testing them before deploy-
the attackers execute model-poisoning ing data (for example, user data) into ment, and continuously monitoring
attacks to manipulate AI models in its outputs. For instance, the output their behavior in production environ-
cybersecurity systems, leading to mis- might include subtle variations in ments. This ongoing vigilance is cru-
classifying malicious code and result- probability scores that, when com- cial to mitigate these risks, as threats

66 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


research and advances

can evolve over time. However, this information, such as email addresses, tational harm, leading to loss of trust
article focuses mainly on pre-trained credit card numbers, or personally and credibility. In addition, organiza-
AI models with malicious code embed- identifiable information (PII). Upon tions affected by malicious AI models
ded in them. identifying sensitive data, the payload may inadvertently violate data pro-
Threat classification. First, we must encodes and sends the collected data tection regulations and compliance
understand the threat classification to an external command and control standards. Understanding the threat
associated with malicious AI models (C&C) server. The payload could use a classification helps develop targeted
hosted on the AI development plat- covert HTTP channel, DNS tunneling, defenses and enhance the overall resil-
form. Threat classification enables or other data-exfiltration tactics. Since ience of cybersecurity controls against
an understanding of the mode of op- the payloads are integrated into an sophisticated malicious AI models.
eration and the impact of malicious AI model and the exfiltration is done Unearthing the attack payloads. At-
AI models. The goal is to dissect the stealthily, the data theft can go unno- tack payloads refer to malicious data or
significant threats malicious AI mod- ticed for an extended period. instructions delivered to a target sys-
els pose to supply-chain operations’ in- Organizations compromised by tem for gaining unauthorized access
tegrity, security, and efficiency. Table 2 malicious AI models can suffer repu- and modifying critical files present in
highlights the details.
Malicious AI models erode trust in Table 3. Potential attack payloads served by malicious AI models.
the software supply chain, leading to Payload Type Description
uncertainty and reduced confidence
Reverse shell A reverse shell is a network connection method attackers use to gain remote ac-
in software components or vendors. cess to a target system. Reverse shell allows attackers to bypass firewalls and
Next, we present several examples that security measures that block incoming connections, as many networks allow
highlight the threats posed by mali- outbound connections by default. On successful connection, the attacker can
cious AI models if not vetted securely: execute commands on the victim’s machine.
˲ A malicious AI model can disrupt Software object Software object hooking33 involves intercepting and manipulating the normal
code build and pipeline operations. hooking execution flow within a software application by attaching malicious code to
software objects or functions. For example, malicious code can hook into system
Once loaded in a CI/CD environment, APIs or application functions to capture sensitive data, such as passwords,
the AI model containing the malicious keystrokes, or network traffic; alter the functionality of legitimate software; or
payload could silently alter environ- bypass security mechanisms. Hooking enables persistent and stealthy control
over the compromised system.
ment variables, dependencies, and
pipeline configuration, introducing Unauthorized file Unauthorized file read/write is the ability of the malicious payload to access,
errors into the build process and re- read/write modify, delete, or create files on a compromised system without the user’s
permission. The attacker can alter system files or application data to disrupt
sulting in failed builds. It could also operations, cause data corruption, or implant additional malicious code,
change the version of a critical library facilitating further system exploitation.
to an insecure one, causing the appli- Beacon and pingback Beacon15 and pingback are techniques to maintain communication with an
cation to malfunction or exposing it to attacker’s command and control (C&C) server. A beacon is a signal sent out
known vulnerabilities. by the infected system at regular intervals to inform the C&C server that it is
˲ A malicious AI model can subvert still active and awaiting further instructions. Pingback is a response from the
malware to a query or command from the C&C server, confirming receipt and
software integrity by executing ac- execution of instructions. These communication methods allow attackers to
tions that alter the software’s intended manage running malicious code.
functionality or security. For example, Arbitrary code Arbitrary code execution32 refers to exploiting vulnerabilities in an application,
when the malicious AI model is inte- execution operating system, or network to execute unauthorized code. The attacker
grated into a software application, the can execute the selected commands, which often leads to data theft, system
corruption, or the installation of additional malicious code.
hidden payload (script) subtly alters
the critical algorithms to skew results Data deserialization The attack payload exploits the deserialization13 process of data structures
within applications to serialize data back into its original object form.
or weaken encryption methods, thus The attacker embeds malicious code as the deserialized object from the
compromising the software’s reliabil- compromised system to gain unauthorized access, escalate privileges, and
ity and security. In addition, the pay- compromise system integrity.
load can tamper with the software’s Back door The attack payload deploys covert methods that bypass normal authentication,
update mechanism, allowing the at- impersonate users, and circumvent security controls within software, allowing
tacker to inject further malicious up- unauthorized access to the system. Back door can trigger harmful behaviors
under predefined conditions, posing significant security risks.
dates or prevent legitimate updates
from being applied, leaving the soft- Downloader The attack payload downloads and installs additional harmful malicious
code onto a compromised system. It typically operates by stealth, retrieving
ware vulnerable to exploitation. further malicious payloads from remote servers, which can include spyware,
˲ A malicious AI model, housing an ransomware, or other types of malware, thereby amplifying the impact of the
attack payload, can execute data theft initial infection.
by embedding code that triggers un- Malicious system Malicious system updates18 involve attackers distributing fake or altered updates
der specific conditions to perform data updates that modify system configurations and install additional payloads. These
exfiltration from the system where the updates appear legitimate, often mimicking official software updates, making
detection difficult.
model is deployed. The payload dis-
creetly scans the text data for sensitive

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 67
research and advances

the operating system, including appli- ecutes system-level commands on the


cations. These payloads can take vari- host machine when model loading oc-
ous forms, depending on the specific curs. For example, the payload could
attack vector and the attacker’s objec- be designed to execute a command
tives. Here, we focus solely on loading
and executing various attack payloads Embedding that downloads additional malware,
alters system configurations, or exfil-
when malicious AI models are im-
ported from the AI infrastructure plat-
malicious payloads trates data to an external server. Us-
ers must realize that the AI model they
forms to the developers’ environment. in AI models is have integrated into their system could
First, we focus on the potential attack
payloads that malicious AI model serv-
a very effective carry a hidden threat, and their role in
preventing potentially severe security
ers execute on the underlying machine strategy because breaches is paramount. Understand-
where the AI models are deployed, as
Table 3 shows.
AI models are often ing the nature and characteristics of
these attack payloads is essential for
Several real-world and advanced shared and reused security practitioners to detect, pre-
cyberattacks have used the payloads
listed in Table 3 in different capacities. without thorough vent, and mitigate security threats
within their organizations effectively.
The SolarWinds supply-chain attack1 inspection. Attack flow model. An attack flow
used payloads such as back doors, model decomposes and visualizes the
beacons, and others to execute com- threat’s sequence3 of actions. It is es-
mands, perform data exfiltration, and sential because it provides a compre-
communicate with the C&C server. The hensive understanding of the attack’s
Operation ShadowHammer25 supply- structure and execution, enabling se-
chain attacks are another example, in curity teams to anticipate, detect, and
which attackers compromised ASUS mitigate threats more effectively. At
Live Update Utility, embedding mal- this point, dissecting the attack flow
ware in the software update, which model is crucial, as it highlights how
was then distributed to approximately malicious AI models serve the different
half a million ASUS users, targeting attack payloads. The figure describes
specific MAC addresses. Consider the the malicious AI model execution host-
Kingslayer attack,7 in which attackers ed on the AI platform repository.
embedded malicious code into a third- The attack flow model can be bro-
party software used by a global IT ser- ken down into four steps:
vices firm; the code was distributed ˲ Step 1. The adversary creates a
through the IT firm’s updates, grant- malicious AI model by embedding
ing attackers privileged access to client unauthorized code in the model file
systems. All these real-world attacks and then hosting the AI model on the
highlight the gravity of the problem of repository hosted on the AI platform.
securing software supply chains. The adversary can select any AI model
Embedding malicious payloads in type, such as text classification, named
AI models is a very effective strategy be- entity recognition, machine transla-
cause AI models are often shared and tion, text generations, and others, to
reused without thorough inspection. mimic the behavior of an actual AI
It allows attackers to gain unauthor- model. The goal here is to design a ma-
ized and complete control over systems licious AI model that is hard to detect
that load these models. We discuss sev- and follows a level of stealthiness. Af-
eral examples. First, an attacker could ter the malicious AI model has been
inject a reverse shell payload in the successfully hosted and the AI plat-
model. When users download and load form’s inherent scanning feature fails
this model into their environment, the to detect it, the AI model is potentially
payload could be triggered, opening a ready to be consumed by users.
connection back to the attacker’s serv- ˲ Step 2. The user imports the mali-
er. With that, the attacker can execute cious AI model and trusts it because it
commands remotely on the victim’s is hosted on the AI platform without
machine, potentially accessing sensi- risks. In the figure, it is worth noting
tive data, manipulating system files, or two code identifiers, c1 and c2, which
spreading malware across the network. can be extended with real-world code.
Second, an attacker embeds a a. c1 pseudocode: Shows that the pre-
direct-command execution payload trained malicious AI model is loaded
within the model. The payload ex- into the pipeline.

68 COMMUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


research and advances

Figure. Attack flow highlighting malicious AI models execution.

AI model (pytorch,
tensorflow) type: text classification, {from model_base_library import pipeline
named entity recognition, c1 p_handle = pipeline(”model_type”,
machine translation, and text generation model = “malicious_ai_model”)}

c2
AI community {from model_base_library import AuthT,
platform AutoModelforCasualLM
t_handle = AuthT.from_pretrained(”mal_ai_model”)
m_handle = AutoModelforCasualLM.from_pretrained(”mal_ai_model”)}
Step 2:
User imports
Pre-trained AI models repository the malicious
AI model from
the repository model file:
Step 3: User calls the load function
deserialization and
and executes the model file
decode byte classes
containing payload (serialized data)

Step 1: Step 4:
Adversary Compromised system
creates and connects to the C&C infrastructure
Embedded payload
uploads to perform unauthorized operations
execution leads to
the malicious system compromise
AI model
Malicious operations: command
(mal_ai_model)
and control (C&C), data exfiltration
to the repository

b. c2 pseudocode: Highlights that the Table 4. Limitations of SCA and SBOM in detecting malicious AI models.
pre-trained malicious AI model is im-
ported directly using the authorization SCA and SBOM
Limitations Description
token for use.
˲ Step 3. Once the user deploys the Focus on traditional Designed to analyze traditional software components such as libraries, depen-
software components dencies, and packages and do not adequately address the complexities and
AI model by calling the load func- nuances of AI models involving extensive datasets, intricate algorithms, and
tion, the data (command instructions) unique training processes.
stored in the model file as serialized
Lack of behavioral Focus on the static aspects of software, such as versions, licenses, and known
data is deserialized, and the system analysis and anomaly vulnerabilities, whereas AI models require dynamic analysis to understand
executes the unauthorized payload. At detection their behavior, biases, and potential for malicious actions.
this point, the user’s system is compro- No visibility into Do not include information about the datasets used for training AI models, so
mised because the AI model drops the training data and they fail to detect the compromised or biased training data that can lead to the
attack payload, successfully subverting process malicious use of AI models.
the system’s integrity. No support for model Do not offer tools to understand the internal logic and the decision-making
˲ Step 4. The compromised systems interpretability and processes within AI models, which is essential to detect and mitigate malicious
explainability behavior of AI models.
connect to the adversary-controlled
C&C infrastructure and start nefari- Lack of integration Lack integration with AI-specific security frameworks and practices, such as
ous operations. These include exfil- with AI-specific adversarial testing, model validation, and continuous monitoring for anomalous
security measures behavior to assess the integrity and security of both the AI models and their
trating sensitive data such as cre- operational environments.
dentials, financial information, or
Evolving AI threat Support the detection of threats and vulnerabilities listed in public databases,
intellectual property, and transmit- landscape such as common vulnerability exposures (CVEs).34 As AI-specific novel attacks
ting it back to the attacker. The mal- and vulnerabilities emerge, they require continuous and adaptive security
ware may also receive commands to measures.
execute arbitrary code, leading to
further system compromise, or lat-
eral movement within a network. It Security Defenses: source libraries and packages con-
can also be used to disable security Limitations of SCA and SBOM taining unauthorized code. SCA is de-
defenses, encrypt data for ransom Several security solutions, includ- ployed within DevSecOps pipelines to
(ransomware attacks), or even lever- ing Software Component Analysis perform component analysis of open
age compromised systems for launch- (SCA)26 and Software Bill of Materi- source libraries and packages at a
ing broader attacks, thereby causing als (SBOM),27 have been introduced granular level to check for security is-
widespread disruption and damage. to handle the risks imposed on open sues in an automated manner. SBOM

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 69
research and advances

Table 5. Approaches to strengthen software supply-chain security.

Category Details

Technical measures ˲ Use secure and robust formats to store and load model weights compared to traditional formats. Secure model formats, such as
Hugging Face’s safetensors,25 TensorFlow SavedModel, MLflow, and Model Zoo formats (TFHub, PyTorch Hub) help address concerns
related to model security by preventing arbitrary code execution during deserialization, which is a risk with other formats that use
Python pickling.
˲ Use specialized tools to analyze AI models for anomalies, back doors, and hidden layers. A tool like CertifAI8 performs comprehensive
security assessments to help identify unusual behaviors or structures within the model. The tool helps evaluate and ensure AI models’
fairness, robustness, and explainability.
˲ Implement rigorous validation processes using robust AI model validation11 strategies to detect and prevent the deployment of mali-
cious AI models. Employ security assessment and testing tools such as adversarial-robustness-toolbox30 and Microsoft’s Counterfit
tool21 to simulate attacks, test robustness, and ensure that AI systems are resilient against adversarial threats.
˲ Use automated content-filtering4 tools in the codebase to scan and filter out harmful content generated by AI models. Use decompilers,
static analyzers, and bytecode rewriters to extract payloads for analysis.
˲ Employ anomaly-detection24 techniques to uncover behavior patterns that indicate malicious activity by the AI models.
˲ Implement integrity checks to verify the model was not tampered with during download or transfer.

Organizational ˲ Implement a user-verification process to check the user’s identity to help deter threat actors from uploading harmful models.
measures ˲ Develop a reputation system that rewards users who contribute securely to AI development.
˲ Design easy-to-report strategies to encourage the public community to report suspicious models or activities.
˲ Offer incentives, such as bug bounties12 or recognition within the community, to users who identify and report vulnerabilities or malicious
models.

Policy and ˲ Develop documentation for all hosted AI models, including their intended use, training data, and risks or limitations.
governance ˲ Publish regular transparency reports highlighting details on preventing malicious activity and control enforcement.
˲ Ensure platform policies follow relevant regulations and legal requirements concerning data protection and AI ethics.
˲ Include clauses in user agreements that hold users accountable for the misuse of their AI models.

Education and ˲ Educate users about the risks of malicious AI9 and best practices for secure model development and deployment.
awareness ˲ Train users on ethical considerations in AI development, emphasizing the importance of creating robust AI models.
˲ Communicate the AI platform’s policies to prevent the hosting of malicious AI models.
˲ Keep users informed about new security measures, policy changes, and any incidents identified related to malicious AI.

Information sharing ˲ Work with other AI platforms and industry groups31 to share information about emerging AI threats and best practices for hosting AI
models.
˲ Establish relationships with law enforcement2 and regulatory bodies to facilitate AI threat investigation and reporting.

provides a detailed representation of ing monitoring to detect and mitigate can be employed in a hybrid way to
components, libraries, and dependen- malicious activities. mitigate the impact of malicious AI
cies used to create a software applica- models hosted on AI infrastructure
tion, providing transparency about Solutions and Recommendations platforms.
the composition of the software. SCA Circumventing the impact of mali- By implementing these compre-
in conjunction with SBOM are used cious AI models requires a shared hensive measures, organizations can
as hybrid solutions to reduce the risk responsibility model of ensuring the significantly reduce the risk of host-
posed by malicious open source li- sanctity of AI models is validated be- ing malicious AI models, protect their
braries and packages. However, SBOM fore the actual use. Platforms hosting software supply chains, and maintain
is still in its early stages and has not AI models must provide inherent secu- the integrity of their services.
been widely adopted. In addition, SCA rity features, such as malicious code
and SBOM have inherent limitations scanning, vulnerability detection, and Future Challenges
to detecting malicious AI models be- risk identification of a hosted AI model Going forward, we must address sev-
cause these models are custom gener- in an automated manner to ensure eral challenges associated with the se-
ated, not designed on traditional soft- only secure AI models are served. On curity of malicious AI models. Design-
ware design. Table 4 discusses several the same note, developers (consum- ing AI-SBOM to maintain the security
limitations of SCA and SBOM. ers) should also perform additional of the AI software supply chain will be
We believe that the SCA and SBOM security checks on their end to verify challenging for several reasons. First,
solutions are valuable for traditional that the imported AI model from the AI systems rely on a complex web of de-
software security but fall short in ad- hosting platform is secure and can be pendencies, including numerous open
dressing the unique challenges posed consumed in the development envi- source libraries, proprietary software,
by malicious AI models. Ensuring the ronment. Considering software sup- and pre-trained models, making it dif-
security of AI systems requires spe- ply-chain security, the onus is on both ficult to track all components accurate-
cialized techniques and tools tailored the hosting providers and consumers ly. Second, AI models and algorithms
to the complexities of AI, including to reduce risks imposed by malicious are frequently updated, retrained, and
dynamic behavioral analysis, robust AI models. Table 5 discusses several fine-tuned, resulting in continuous
training data validation, and ongo- solutions and recommendations that enhancements of components that

70 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


research and advances

must be consistently documented in user’s environment. The threat model arXiv, (2024); https://tinyurl.com/22e9hmxv.
13. Gauthier, F. and Bae, S. Runtime prevention of
the SBOM. Third, AI applications often we have presented clarifies the risks deserialization attacks. In Proceedings of the ACM/
combine proprietary code with open of malicious AI models and examines IEEE 44th Intern. Conf. on Software Engineering:
New Ideas and Emerging Results (2022).
source software, complicating the the workflow of the attack’s execu- 14. Gu, T. et al. BadNets: Identifying vulnerabilities in
tracking of licensing, versioning, and tion. We need a multifaceted approach the machine learning model supply chain. arXiv;
https://tinyurl.com/2xpq38na
vulnerabilities across both domains. to combat the threat of malicious AI 15. Hu, X. et al. BAYWATCH: Robust beaconing detection
Fourth, unlike traditional software, models in the software supply chain. to identify infected hosts in large-scale enterprise
networks. In Proceedings of the 46th Ann. IEEE/
AI systems include unique elements Organizations must implement robust IFIP Intern. Conf. on Dependable Systems and
such as training datasets, model archi- verification processes for third-party Networks (2016).
16. Hugging Face Safetensors;
tectures, and hyperparameters, which components, including thorough code https://tinyurl.com/2785p7ml
17. John, M. et al. AI deployment architecture:
are not typically covered by standard reviews and static/dynamic analysis Multi-case study for key factor identification. In
SBOM practices. to detect anomalies. As AI technolo- Proceedings of the 27th Asia-Pacific Software
Engineering Conf. (2020), 395-404.
Moreover, we strongly advocate that gies become more pervasive, we must 18. Lindorfer, M. et al. Lines of malicious code: Insights
AI model hosting platforms must ur- develop and deploy AI-based defensive into the malicious software industry. In Proceedings
of the 28th Ann. Computer Security Applications
gently design robust inherent security and resilient mechanisms that can Conf. (2012).
mechanisms to verify the integrity of identify and neutralize malicious AI 19. Liu, Y. et al. Trojaning attack on neural networks. In
Proceedings of the Network and Distributed Systems
AI models and detect model tamper- activities. More collaborative efforts Security Symp. (2018).
ing, including hijacking using mali- across the industry to share threat in- 20. Matsui, B.M.A. and Goya, D.H. MLOps: A guide
to its adoption in the context of responsible AI.
cious versions. There is an immediate telligence and best practices are also In Proceedings of the 1st Workshop on Software
need to create secure data-training crucial in addressing this evolving Engineering for Responsible AI (2022).
21. Microsoft’s counterfit tool;
protocols and monitoring for data- threat landscape. https://tinyurl.com/23xul8s5
training processes to increase the ro- 22. Milanov, B. Exploiting ML models with pickle file
attacks: Part 1 (2024); https://tinyurl.com/2ajjngcl.
bustness and resiliency of the training Acknowledgments 23. Milanov, B. Exploiting ML models with pickle file
pipeline of AI models against the em- We thank the handling editor and attacks: Part 2 (2024);
https://tinyurl.com/23ms8qs5.
bedding of back doors or biases dur- anonymous reviewers for their valu- 24. Ohm, M. and Stuke, C. SoK: Practical detection of
software supply chain attacks. In Proceedings of
ing training, which ultimately impacts able comments, which helped us the 18th Intern. Conf. on Availability, Reliability and
the sanctity of AI models. The trust- improve this article's content, orga- Security (2023).
25. Rashid, F. Operation ShadowHammer exploited
worthiness of AI models necessitates nization, and presentation. Sherali weaknesses in the software pipeline. IEEE
securing the entire training process. Zeadally was partially supported by a Spectrum (2019); https://tinyurl.com/22rm2h2n
26. Sabetta, A. et al. Known vulnerabilities of open
Organizations must also expedite the Distinguished Visiting Professorship source projects: Where are the fixes? IEEE Security
development of tools and frameworks from the University of Johannesburg, & Privacy 22, (2024).
27. Stalnaker, T. et al. BOMs away! Inside the minds
that provide greater transparency and South Africa. of stakeholders: A comprehensive study of bills of
explainability, making auditing and materials for software systems. In Proceedings
of the IEEE/ACM 46th Intern. Conf. on Software
understanding AI model behavior easi- References Engineering (2024).
1. Alkhadra, R. et al. Solar winds hack: In-depth analysis
er. Controlling access to AI models and and countermeasures. In Proceedings of the 12th
28. Shokri, R. et al. Membership inference attacks
against machine learning models. In Proceedings of
their underlying data is paramount to Intern. Conf. on Computing Communication and the 2017 IEEE Symp. on Security and Privacy (2017).
Networking Technologies (2021). 29. Seshia, S.A. et al. Toward verified artificial
prevent unauthorized usage and po- 2. Berk, R. Artificial intelligence, predictive policing, intelligence. Commun. ACM (2022);
tential leakage of sensitive informa- and risk assessment for law enforcement. Ann. Rev. https://tinyurl.com/28o4gmwl
Criminology (2021); https://tinyurl.com/2aocr487 30. Trusted-AI, adversarial robustness toolbox;
tion, including data exfiltration. Or- 3. Beck, D. MITRE—Attack flow beyond atomic https://tinyurl.com/yyp9ypn8
ganizations should develop advanced behaviors. In Proceedings of the Ann. Forum of 31. Wing, J. Trustworthy AI. Commun. ACM (2021);
Incident Response and Security Teams Conf. (2022); https://tinyurl.com/29etpo9h
and sophisticated access-control https://tinyurl.com/26zq85dj 32. Xiao, F. et al. Understanding and mitigating remote
mechanisms that dynamically man- 4. Boughton, L. et al. Decomposing and measuring trust code execution vulnerabilities in cross-platform
in open-source software supply chains. In Proceedings ecosystem. In Proceedings of the 2022 ACM
age permissions based on user roles of the ACM/IEEE 44th Intern. Conf. on Software SIGSAC Conf. on Computer and Communications
Engineering: New Ideas and Emerging Results (2024).
and contexts. Finally, we strongly rec- 5. Biggio, B. and Roli, F. Wild patterns: Ten years after
Security (2022).
33. Yin, H. et al. HookFinder: Identifying and
ommend establishing standardized the rise of adversarial machine learning. Pattern understanding malware hooking behaviors. In
Recognition (2017).
security practices and strict adherence 6. Cohen, D. Data scientists targeted by malicious
Proceedings of the Network and Distributed System
Security Symp. (2008).
to defined guidelines and regulations hugging face ML models with silent backdoor (2024); 34. Zhao, L. et al. Software composition analysis for
https://tinyurl.com/23s8sfux
to create a unified and consistent se- 7. Clancy, C. et al. Deliver uncompromised: Securing
vulnerability detection: An empirical study on Java
projects. In Proceedings of the 31st ACM Joint
curity framework for AI models. This critical software supply chains. MITRE report (2021); European Software Engineering Conf. and Symp. on
https://tinyurl.com/28gpgfaq
is not just a recommendation but a ne- 8. Cortex certifAI; https://tinyurl.com/276wdqhq
the Foundations of Software Engineering (2023).

cessity in the face of evolving AI secu- 9. Cotroneo, D. et al. Vulnerabilities in AI code


generators: Exploring targeted data poisoning attacks. Aditya K. Sood is vice president of security engineering
rity threats. In Proceedings of the 32nd IEEE/ACM Intern. Conf. on and AI strategy at Aryaka Networks, Santa Clara, CA,
Program Comprehension (2024). USA.
10. Datta, P. et al. A user-centric threat model and
Conclusion repository for cyberattacks. In Proceedings of the Sherali Zeadally is a professor at the University of
37th ACM/SIGAPP Symp. on Applied Computing. ACM Kentucky, Lexington, KY, USA and is also an Eminent
In this research, we have presented a Scholar at Kyung Hee University, South Korea.
(2022), 1341–1346.
significant challenge of malicious AI 11. Fredrikson, M. et al. Model inversion attacks
models being consumed in software that exploit confidence information and basic
countermeasures. In Proceedings of the 22nd ACM
This work is licensed under a
supply chains. Integrating malicious SIGSAC Conf. Computer and Communications
Creative Commons Attribution
Security (2015).
AI models without verification leads 12. Gal-Or., E. et al. Merchants of vulnerabilities: How
International 4.0 License.
to unauthorized operations in the bug bounty programs benefit software vendors. © 2025 Copyright held by the owner/author(s).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 71
A Transformative Model
for Open Access
• Unlimited Open Access publishing for all corresponding authors in
ACM’s magazines, conference proceedings and journals

• Unlimited read access for all authorized users to the full-text


contents of the ACM Digital Library

• Default CC-BY author rights on all accepted research articles


(multiple CC options available to choose from)

Impact on the Research Community

• Articles receive 2-3x the number of full-text article downloads


• Articles receive up to 70% more citations
• Authors are immediately compliant with the vast majority of
Public and Private Research Funder Open Access Mandates
• Authors retain the copyright of their published article

ACM is committed to an Open Access future.


ACM Open is how we will get there.

Visit libraries.acm.org/acmopen
Contact [email protected]
research highlights
P. 74 P. 75
Technical ‘Upon This Quote I Will
Perspective
When Proofs Build My Church Thesis’
Meet Programs: By Pierre-Marie Pédrot
An Extension of
Dependent Type
Theory with
Church’s Thesis
By Christine Paulin-Mohring

P. 84 P. 85
Technical An Automata-Based Framework
Perspective
A Symbolic for Verification and Bug
Approach to Hunting in Quantum Circuits
Verifying Quantum By Yu-Fang Chen, Kai-Min Chung, Ondřej Lengál, Jyun-Ao Lin,
Systems Wei-Lun Tsai, and Di-De Yen
By Parosh Aziz Abdulla

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 73
research highlights
DOI:10.1145/ 3 73 1 442

Technical Perspective
To view the accompanying paper,
visit doi.acm.org/10.1145/3715707 rh

When Proofs Meet Programs:


An Extension of Dependent Type Theory
with Church’s Thesis
By Christine Paulin-Mohring

W H AT I S A mathematical proof? It can The representation of proofs by malizing decidability results. The
be described as a sequence of logical expressions corresponding to func- question of whether we can add this
steps and calculations that serve as tional programs is known as the Cur- property to a system like Agda is deli-
evidence of the correctness of a state- ry-Howard correspondence. This is an cate. It is necessary not only to ensure
ment. The steps must follow rules that essential tool for understanding the the coherence of the resulting system
are accepted as correct by the com- dynamics of proofs (very useful in au- but also to maintain its good evalua-
munity. One might think there is a tomated theorem proving to restrict tion properties.
set of universal rules. However, this the search space). It is also the foun- The paper provides a very elegant
is far from being the case. Gödel’s in- dation of several proof assistants, and general solution to this prob-
completeness theorem tells us that no in particular, Coq/Rocq, Agda, and lem. Martin-Löf’s theory is extended
“reasonable” system will allow us to Lean. The interactive proof process with three constants to represent the
prove everything that is “true.” ultimately produces an explicit term code of functions, the number of cal-
Instead of searching for ever more to represent the proof, which is then culation steps in the evaluation on
powerful theories, one can, on the verified a posteriori by a program act- an argument, and the proof that the
contrary, restrict the rules of the game ing as an uncompromising reviewer. evaluation returns the expected value.
to extract more information from the The same functional framework is Reduction rules are proposed for these
proofs. Let’s look at an example. If a used to represent calculations com- new objects, preserving the property
formula ∃ ​ x, P(x)​is true in a certain monly used in mathematical model- that an expression without free vari-
“world,” then, by definition, there ex- ing and proofs. ables representing a natural number
ists an element ​t​in the model such that The resulting languages form privi- can be evaluated to an effective value.
the formula ​P(x)​is true for ​x = t​. What leged frameworks both for formal- The meta-theoretical properties
happens at the proof level? If we have izing mathematics and for represent- of such a system are technical and
proven the formula ∃ ​ x, P(x)​, does there ing computational objects. They are complex. A major contribution of the
exist a “witness” ​t​and a proof of P ​ (t)​? grouped under the term “type theory” paper is to propose a completely for-
This property is characteristic of to distinguish them from the “set the- malized version of the definitions and
so-called “constructive” logics but is ory” traditionally claimed as the foun- results in the Coq proof assistant,
not satisfied by usual logic. Indeed, dation of mathematics. which offers an additional guarantee.
mathematicians classically do not dis- The accompanying paper, “Upon The text includes links to the formal-
tinguish between the property “there This Quote I Will Build My Church ized counterpart for each definition
exists x such that P” and its negative Thesis” by Pédrot, is set within Martin- and lemma, allowing for easy naviga-
version: “it is impossible for P to al- Löf’s type theory, which is the system tion between a general explanation
ways be false.” To illustrate the differ- underlying a proof tool like Agda. In and technical details.
ence, let’s consider a function f on nat- this theory, proofs and computations The paper remarkably illustrates
ural numbers. This function reaches a are described in the same functional the fine properties of proof systems
minimum point (otherwise, one could language. One can establish that any and their deep connections with the
build an infinite decreasing sequence functional expression from natural theory of computable functions. Type
starting from f​ (0 ) ). But this proof does numbers to natural numbers is a com- theory is both the subject of study
not give us any clue on how to actually putable function. We also have an im- and the tool that ensures the correct-
construct the minimum of any partic- portant property that any expression ness of the mathematical results and
ular function f. (without a free variable) representing a that facilitates their reuse in other
Some “natural” logics preserve the natural number will evaluate to a nu- contexts. It highlights the elegance
“constructive” nature of the existen- meral: 2, 4, 45000, and so on. and importance of this logical frame-
tial quantifier and thus guarantee the The question posed in the accom- work.
existence of witnesses associated with panying paper is whether we can inter-
existence proofs. These logics have a nalize in the logic the fact that “every Christine Paulin-Mohring (christine.paulin@universite-
paris-saclay.fr) is a computer scientist and professor at
strong connection with computer sci- represented function corresponds to Paris-Saclay University, Gif-sur-Yvette, Île-de-France,
ence, as the proofs will not only give us the code of a computable function.” France. She is best known for developing the interactive
theorem prover Coq.
a way to designate the witness but also This property, known as “Church’s
a way to compute it. Thesis,” is particularly useful for for- © 2025 Copyright held by the owner/author(s).

74 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


research highlights
DOI:10.1145 / 3 71 5 70 7

‘Upon This Quote


To view the accompanying Technical Perspective,
visit doi.acm.org/10.1145/3731442 tp

I Will Build My Church Thesis’


By Pierre-Marie Pédrot

ABSTRACT tended with two additional axioms: Markov’s principle (​𝖬𝖯​)


The internal Church thesis (​ 𝗖𝗧​) is a logical principle stating and Church’s thesis (​𝖢𝖳​).23
that one can associate to any function ​f​ : ℕ ​ → ℕ​ a concrete Of those two axioms, Markov’s principle is the simplest
code, in some Turing-complete language, that computes ​f.​ to state, as it is pretty much self-contained:
While the compatibility of 𝗖​ 𝗧​ in simpler systems has been
​​∀ f : ℕ → ℕ. ¬ ¬ ​(​​  ∃ n : ℕ. f n = 0​)​​  → ∃ n : ℕ. f n = 0.​​
long known, its compatibility with dependent type theory is
still an open question. Common alternative formulations include “a Turing ma-
In this paper, we answer this question positively. We chine that does not loop terminates” or “non-zero reals are
define “​𝗠𝗟𝗧𝗧​”, a type theory extending 𝗠 ​ 𝗟𝗧𝗧​ with quote apart from zero”. ​𝖬𝖯​is a sweet spot of semi-classical logic,
operators in which ​𝗖𝗧​ is derivable. We furthermore prove as it can be given a computational content preserving the
that “​𝗠𝗟𝗧𝗧​” is consistent, strongly normalizing, and en- witness property.7
joys canonicity using a rather standard logical relation Church’s thesis is somewhat more involved, as it requires
model. All the results in this paper have been mechanized the internal definition of a computation model in the log-
in Coq.a ic itself. Assuming a logic rich enough, this is tradition-
ally13 achieved by defining the decidable Kleene predicate​
1. INTRODUCTION 𝖳​and its associated primitive recursive decoding function​
“Calculemus!” With this word, Leibniz famously enjoined 𝚄 : ℕ → ℕ​. Assuming p ​ : ℕ​to be the syntactic code of some
the reader to compute. Contemporary logicians took this program in the chosen Turing-complete computation mod-
motto as a founding principle after the progressive discov- ​ : ℕ​some integer argument, and k​ : ℕ​some encoding of
el, n
ery of the proof-as-program correspondence. This major evaluation traces of the model, then ​​𝖳 ​(​​p, n, k​)​​​​ holds when-
breakthrough, also known as the Curry-Howard equiva- ever k​ ​is the trace of the fact that p
​ ​applied to n
​ ​evaluates to
lence, is the seemingly simple observation that proofs and some integer v​ ​. Furthermore, if indeed the predicate holds,
programs are the same object, in an essential way. then 𝚄 ​ k​should return the value ​v​.
One major offshoot of the Curry-Howard philosophical Assuming we have settled this computational parapher-
stance is Martin-Löf’s type theory (​𝖬𝖫𝖳𝖳​), the theoretical nalia, 𝖢 ​ 𝖳​is simply the internal statement
underpinning of several widely used proof assistants such
​​∀ f : ℕ → ℕ. ∃ p : ℕ. ∀ n : ℕ. ∃ k : ℕ. 𝖳 ​(​​p, n, k​)​​  ∧ 𝚄 k = f n.​​
as Agda, Coq, or Lean.16 In these systems, there is no formal
separation between proofs and programs, as they live in the Said otherwise, any function ​f​definable in the theory
same syntax and obey the same rules. This monistic credo is known to be computable by some concrete algorithm ​p​
turns 𝖬​ 𝖫𝖳𝖳​ into the quintessential intuitionistic founda- from within the theory.
tion of our modern times, blending logic and computation Contrarily to 𝖬 ​ 𝖯​, which is a consequence of excluded
into the very same cast. Moreover, in the wake of the Swed- middle, 𝖢 ​ 𝖳​is a very anti-classical axiom.23 Assuming very
ish tradition of neutrality, 𝖬
​ 𝖫𝖳𝖳​ does not pick a side in the weak forms of choice, it contradicts lesser variants of ex-
constructivist feud. It neither proves fan principles from the cluded middle. Similarly, it disproves choice-like principles,
Brouwerian band nor mechanistic axioms from the Markov- such as double negation shift. Finally, it also makes the log-
ian clique, and adheres to a strict Bishopian minimalism. If ic very intensional as it contradicts function extensionality,
set theory is the paradise of mathematics, then type theory under the same kind of weak choice assumptions.
truly is the paradise of constructive mathematics. The consistency of 𝖬 ​ 𝖯​and 𝖢
​ 𝖳​with regard to some logi-
Meanwhile, on the other side of the iron curtain, the cal system is typically proved via realizability, which is the
Russian constructivists led by Markov pushed the comput- intended model of the Russian dogma. Quite remarkably,
ability tenet to an extreme point, postulating that all math- Kleene’s seminal paper14 already proves that 𝖢 ​ 𝖳​is compat-
ematical objects were indeed algorithms. Their doctrine15 ible with Heyting’s arithmetic. For a more expressive theory,
is materialized by a foundational system that can be sum- the effective topos is a famous example of a realizability to-
marily described as a neutral Bishop intuitionistic logic ex- pos in which both principles hold.10
Considering the claim above that in 𝖬 ​ 𝖫𝖳𝖳​ proofs are pro-
a Development browsable at https://bit.ly/Pedrot-GitHub1. grams, it does seem a bit surprising that it is not biased to-
ward the Russian school. Surely it ought to be easy to convert​
𝖬𝖫𝖳𝖳​to the Markovian orthodoxy, for otherwise the Curry-
The original version of this paper was published in
Howard mantra would be but a misleading advertisement.
Proceedings of the 39 th Annual ACM/IEEE Symp. on
Let us survey the current status of each Russian axiom indi-
Logic in Computer Science.
vidually in the canon of dependent type theory.

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 75
research highlights

As we already hinted at, it is known that 𝖬 ​ 𝖯​is not deriv- returns an enumerable proposition, so that for most no-
able in 𝖬​ 𝖫𝖳𝖳​.4 Since it is a consequence of classical logic, it tions of proposition, namely ​𝙿𝚛𝚘𝚙​with singleton elimina-
holds in classical models such as the set-theoretical one,25 tion or ​𝚑𝙿𝚛𝚘𝚙​with unique choice, the use of ​∃​or ​Σ​ results
but it is also possible to add 𝖬 ​ 𝖯​to 𝖬
​ 𝖫𝖳𝖳​while retaining the in equivalent statements. We will thus always stick to a
computational properties through a form of delimited ex- ​Σ​-type for this quantifier. More problematic is the first ex-
ceptions.19 Note that, due to the minimalism of 𝖬 ​ 𝖫𝖳𝖳​, the istential quantifier, the nature of which leads to radically
alternative statements of 𝖬 ​ 𝖯​mentioned previously are not different worlds. We will call ​​𝖢𝖳​Σ​​​ (resp. ​​𝖢𝖳​∃​​​) the statement
equivalent in this setting.3 of Church’s thesis with a ​Σ​(resp. ​∃​) type as the first existen-
As for 𝖢
​ 𝖳​, we already stated that it is negated by classical tial quantifier.
models, and thus is not a consequence of ​𝖬𝖫𝖳𝖳​. By contrast As mere existence does not validate choice by default, ​​𝖢𝖳​∃​​​
with ​𝖬𝖯​, the compatibility of ​𝖢𝖳​with dependent type theo- is much closer to the traditional first-order setting. When ​∃​
ry is a much more contentious subject. To make things sim- is taken to live in the ​𝙿𝚛𝚘𝚙​universe of ​𝖢𝖨𝖢​, the relative ex-
pler, we will therefore focus on this single principle in this pressivity of ​​𝖢𝖳​∃​​​has been studied extensively in the setting
paper, and deliberately ignore Markov’s principle. The pro- of synthetic computatibility.5 An important remark is that
viso about phrasing of the statement mattering a lot is even the lack of choice prevents building an internal quoting
more paramount with ​𝖢𝖳​, which is the chief reason why the function ​​​(​​ℕ → ℕ​)​​  → ℕ​​that associates to some function its
problem and its answers are a matter of debate. In the re- concrete code. As already suggested before, this means that​​
mainder we will prove that ​𝖬𝖫𝖳𝖳​is indeed compatible with 𝖢𝖳​∃​​​does not necessarily contradict function extensional-
the strongest form of Church’s thesis usually agreed upon, ity. Actually, we can even go much further: In the case where
but for this we first need to explain what we actually mean propositions are identified with 𝚑 ​ 𝙿𝚛𝚘𝚙​s, ​​𝖢𝖳​∃​​​ turns out to
by these sibylline words. We dedicate the next section to a be compatible not only with 𝖬 ​ 𝖫𝖳𝖳​but also with full-blown
thorough exegesis of this topic. univalence.22 More generally and quite counterintuitively,
univalence is compatible with many principles that would
2. A COMPREHENSIVE ​𝗖𝗧​ SCAN make the hardcore Bishop-style intuitionist raise a suspi-
Contrarily to more archaic systems, 𝖬 ​ 𝖫𝖳𝖳​does not need a cious eyebrow, as long as they are squashed enough and
realizability interpretation to turn its proofs into programs. thus made computationally harmless.22,24
In some sense, it is already a realizability interpretation, Contrastingly, as ​Σ​-types come with intuitionistic (non)-
as ​𝖬𝖫𝖳𝖳​terms are literally bona fide programs. It should choice built-in, ​​𝖢𝖳​Σ​​​is the telltale of an extremely weird
therefore be very natural to add 𝖢​ 𝖳​to 𝖬
​ 𝖫𝖳𝖳​. mathematical realm. For starters, it immediately implies
As a matter of fact, as long as the context is empty, the the existence of a quoting function and breaks both func-
following rule is admissible tion extensionality and classical logic. The consistency of​​
𝖢𝖳​Σ​​​ with 𝖬
​ 𝖫𝖳𝖳​is an open problem that has been lingering
    
     ⊢M:ℕ→ℕ
​​_________________________________________
    ​​
⊢ 〈M〉 : Σp : ℕ. Πn : ℕ. Σk : ℕ. ​𝖳 ​(​​p, n, k​)​​​  × 𝚄 k = M n for a while and seems to be considered a difficult question
by experts.11,22 The best result thus far is the consistency of​​
where 〈​ M〉​is some term derived from ​M​in a systematic way. 𝖢𝖳​Σ​​​with a stripped-down version of ​𝖬𝖫𝖳𝖳​without the so-
Depending on the pursued goal, this process is variously called ξ​ ​ rule:
known in the type theory world as extraction18 or quota- Γ, x : A ⊢ M ≡ N : B
____________________________
​​  
  
   ​​
Γ ⊢ λx : A. M ≡ λx : A. N : Πx : A. B
tion.21 Obviously, a rule that is derivable for closed sequents
is not necessarily internalizable in the theory, so there is a The model used to prove this fact is a variant of Kleene re-
non-trivial gap to fill there. alizability tailored for dependent types. Briefly, in this mod-
An additional issue is that dependent type theories have el, terms are interpreted directly as their code in Kleene’s
various notions of existence. Typically, they contrast depen- first algebra, that is, a natural number, and the equational
dent sum types Σ ​ x : A. B​with existential types ​∃ x : A. B​. The theory in a context is defined as the closure of code equality
precise details depend on the exact theory considered, but by ground substitutions.
the general rule is that the former corresponds to actual, This representation makes the implementation of 𝖢𝖳​ ​​ Σ​​​
constructive evidence, while the latter stands for mere exis- trivial, as it boils down to the identity. Yet, for this to work, it
tence—that is, no computational content can be extracted is critical that two functions convertible in the model have
from this type. Such non-computational types are called literally the same code. This is exactly where the restriction
propositions, an umbrella term for mildly related concepts. to ground substitutions comes into play, as it will identify
The three most common instances of propositions are cap- functions with their concrete syntax. Unfortunately, the
tured by the realizability-driven 𝙿 ​ 𝚛𝚘𝚙​universe of 𝖢 ​ 𝖨𝖢​,18 the​ same restriction also destroys non-trivial higher-order
𝚑𝙿𝚛𝚘𝚙​subuniverse inspired by the univalent world,24 and equations and in particular invalidates the ξ​ ​ rule.
the ​𝚂𝙿𝚛𝚘𝚙​universe of strict propositions.6 Regardless of the Although we understand the difficulties experienced by
setting, Σ​ ​-types validate choice by construction through the authors of this paper and acknowledge their distinct
large elimination or projections, while existential types do goals, we consider that removing this congruence from​
not, in general. 𝖬𝖫𝖳𝖳​when precisely trying to implement a principle imply-
The arithmetic statement of 𝖢 ​ 𝖳​mentions two existen- ing the existence of a quoting function is unarguably throw-
tial quantifiers, hence we have a priori at least four pos- ing the baby with the bath water. Ubiquitary conversion un-
sible translations into ​𝖬𝖫𝖳𝖳​. In practice, the second one der λ​ ​-abstractions is a stepping stone of 𝖬​ 𝖫𝖳𝖳​for program

76 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


Figure 1. Syntax of ​𝗠𝗟𝗧𝗧​.

reasoning, so treating functions as black boxes is a no-go and error-prone, let alone when they contain bits of comput-
and the ​ξ​rule a sine qua non. ability. To keep the naysayer at bay, all proofs were mecha-
Given the strong relationship between quoting functions nized in the Coq proof assistant.
and metaprogramming, we should also mention some at-
tempts at making the latter a first-class citizen of depen- 4. BASIC TYPE THEORY
dent type theory. These systems are built with practical pro- Let us fix some conventions. Since we will be juggling quite a
gramming in mind rather than constructive mathematics, bit between levels, we will use a different font style to refer to
but the endeavours are similar enough that they are worth objects from the metatheory, with types in ​bold​ and type as-
highlighting. There is in particular a wealth of literature cription written in set-theoretic style ​x ∈ X​. Some metatheo-
on contextual types,17 a special kind of modal type8 captur- retical types of interest are ​X ⇒ Y​, the metafunctions from ​X​
ing well-typed terms in an object context. Although contex- to ​Y​, and ​N​the metaintegers. We will write ​term​ for the type
tual types can coexist with dependent types, the ability to of “​𝖬𝖫𝖳𝖳”​​​terms defined later on.
pattern-match on code in these settings is still a difficult Our base theory will be an instance of ​𝖬𝖫𝖳𝖳​ featuring
problem that is only satisfyingly handled in the absence of one Russell-style universe, negative Π ​ ​and Σ
​ ​types with defi-
dependent types.12 nitional ​η​-rules, together with a natural number type, an
empty type, and an identity type. We recall the syntax of this
3. HIGH-LEVEL DESCRIPTION theory in Figure 1. The typing rules are standard and we let
We now give the high-level intuitions that we develop techni- the reader refer to the mechanization for details.
cally later on. In this paper, we define “​𝖬𝖫𝖳𝖳”​, read “​𝖬𝖫𝖳𝖳​ We use the usual notations ​A → B​and ​A × B​for non-
with quotes”, a minute extension of ​𝖬𝖫𝖳𝖳​with quoting op- dependent product and sum. We write M ​ = N​for ​𝖨𝖽 A M N​
erators that implement 𝖢𝖳​ ​​ Σ​​​in a straightforward way. As al- when A ​ ​is clear from the context. Similarly, we sometimes
ready explained, 𝖢𝖳​
​​ Σ​​​holds externally in 𝖬 ​ 𝖫𝖳𝖳​. If we want drop the annotations of λ​ ​-abstrations and pairs. If n ​ ∈ N​, we
it as an internal rule, there are two problems to solve: first, write ​​​[​​n]​ ​​​ ℕ​​  ∈ term​for the unary numeral associated to ​n​. We
handling terms in an arbitrary context, and second, showing write ​Λ : = ℕ​for numbers intended to code for programs.
that our hypothetical internalization preserves conversion. Partial computations are usually defined through the
Despite the aura of complexity surrounding this question, partiality monad ​​℘​(​​A)​ ​​​  : = ℕ → 𝚘𝚙𝚝𝚒𝚘𝚗 A​. Since we will only
our solution is disappointingly simple. The first problem will ever consider partial integers, we use a simpler encoding.
be handled in the most trivial way one can think of. Namely,
Definition 1 (Partial integers).
the primitives computing the internal version of ​​𝖢𝖳​Σ​​​ will sim-
We define the type of partial integers ℕ​ ​​ ℘​​  : = ℕ → ℕ​.
ply block as long as their arguments are not closed. Since the
The intuitive meaning of a partial integer P ​ : ​ℕ​℘​​​ is the
return type of these primitives is first-order, this will not be a
following.
problem as it will not endanger canonicity.
• If for all ​n : ℕ​, P
​ n = 0​, then P ​ ​is undefined.
The second problem is solved in a similarly obvious man-
• Otherwise, let ​​n​0​​​be the smallest integer such that ​P ​
ner. Given two terms M ​ ≡ N : ℕ → ℕ​one needs to ensure that
n​0​​  = 𝖲 v​for some ​v​. Then P
​ ​evaluates to ​v​.
the quotation of these terms agree. In particular, the integer
code returned by these operations must be the same. This This can be easily internalized in 𝖬
​ 𝖫𝖳𝖳​.
sounds complicated, as in general two equivalent functions
Definition 2 (Step evaluation).
may have different codes. In Turing-complete languages,
Given ​M : ​ℕ​℘​​​, V ​ : ℕ​, and K ​ : ℕ​, we define the step-
this is actually impossible to achieve in a computable way,
evaluation predicate.
due to Rice’s theorem. But in 𝖬 ​ 𝖫𝖳𝖳​, there is a very simple
​M ⇝ V ∥ K : □​
way to find a canonical representative of an equivalence
read “​M​evaluates to ​V​in K ​ ​steps” by recursion on ​K​,
class of functions: just pick the normal form! Conversion in​
that captures the above intuition. Most notably if
𝖬𝖫𝖳𝖳​is decidable, as it virtually amounts to comparing the
​n ∈ N​, M ​ ⇝ V ∥ ​​[​​n]​ ​​​ ℕ​​​ is convertible to
normal forms of the considered terms for syntactic equality.
This requires that all well-typed terms have a normal form, (​​​ ​​M 0 = 0​)​​​  × ... × ​​(​​M ​​[​​n − 1​]​​​ ℕ​​  = 0​)​​​  × ​​(​​M ​​[​​n​]​​​ ℕ​​  = 𝖲 V​)​​​.​
but this property is usually taken for granted in dependent Given the algorithmically friendly nature of ​𝖬𝖫𝖳𝖳​, we
type theories and will for sure hold true in “​𝖬𝖫𝖳𝖳”​​​. will pick a slightly nicer, but equivalent, phrasing of 𝖢
​​ 𝖳​Σ​​​. We
As the astute reader may complain about, this is not will merge the Kleene predicate ​𝖳​and its decoding function​
enough in presence of ​η​-rules, which are included in our 𝚄​into a single function 𝚛​ 𝚞𝚗 : Λ → ℕ → ​ℕ​℘​​​.
system. But even in this case, our normalization trick can
Definition 3.
be adapted simply by maximally ​η​-reducing and stripping ​λ​
Henceforth, we will define 𝖢 ​ 𝖳​in 𝖬
​ 𝖫𝖳𝖳​ as
and pair annotations from the normal form.
​Πf : ℕ → ℕ. Σp : Λ. Πn : ℕ. Σk : ℕ. 𝚛𝚞𝚗 p n ⇝ f n ∥ k​
Despite the intuitive simplicity of the above guiding
for some ​𝖬𝖫𝖳𝖳​term 𝚛 ​ 𝚞𝚗 : Λ → ℕ → ​ℕ​℘​​​.
ideas, proofs about dependent type theory are very tedious

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 77
research highlights

5. ​"𝗠𝗟𝗧𝗧"​ EXTENSIONS expected to be defined in 𝖬


​ 𝖫𝖳𝖳​alone, or for that matter,
We now turn to the definition of the extensions that define​ in a much weaker fragment corresponding to ​𝖯𝖱𝖠​. In the
“𝖬𝖫𝖳𝖳”​ proper. remainder of this paper, we assume a fixed computation
model.
Definition 4.
The new term constructors of “​𝖬𝖫𝖳𝖳”​​​ are defined Notation 6.
in Figure 2, and will be collectively referred to as the We write
quoting primitives. ​​M Ϙ N : = 𝚛𝚞𝚗 ​(​​ϙ M​)​​  N ⇝ M N ∥ Ϛ M N​​
for the specification of the quoting operators on ​M​and ​N​.
​ 𝗠𝗟𝗧𝗧.”
Figure 2. Additional syntax of “ Furthermore, given any p ​ , n, k ∈ N​, let
​run p n k ∈ term : = 𝚛𝚞𝚗 ​​[​​p]​ ​​​ ℕ​​ ​​[​​n]​ ​​​ ℕ​​ ​​[​​k]​ ​​​ ℕ​​.​
We now have enough to define the typing rules of the
We give names exposing the intuition of the meaning of additional “​𝖬𝖫𝖳𝖳”​​​primitives in Figure 3. We delay the de-
those terms. The term ϙ​ M​is the quote of M ​ ​, which is intend- scription of the conversion rules of these primitives to a lat-
ed to return the code of the function ​M​. The term ​Ϛ M N​ is er point, because we still need more infrastructure.
the step-count of ​M​applied to ​N​—that is, it will return the Still, these rules are enough for the following theorem.
number of steps needed to evaluate the quote of M ​ ​on ar-
Theorem 7.
gument N ​ ​. Finally, ϱ​ M N​is the reflection of M​ ​applied to N
​ ​,
​“𝖬𝖫𝖳𝖳”​ proves ​𝖢𝖳​.
which produces a proof that indeed the quote of ​M​ fullfils
the expected runtime behavior.
Proof.
Definition 5. The types are just the Skolemization of ​𝖢𝖳​.
A computation model for “​𝖬𝖫𝖳𝖳”​​​is given by a metafunc-
To finish the specification of “​𝖬𝖫𝖳𝖳”​​​, we need to define
tion ​​​⌈​​  · ​⌉​​  ∈ term ⇒ N​​and an internal evaluation function ​
its conversion rules. This requires some heavy definitions.
𝚛𝚞𝚗 ∈ term​.
Definition 8 (Quasi-closedness).
To define the typing system alone we do not need any ad-
Given ​M ∈ term​and ​Γ ∈ ctx​, we say that ​M​is Γ ​ ​-quasi-
ditional requirements on those objects, but let us offer some
closed, written clos​
​​ Γ
​​  M​if, ignoring λ​ ​and pair annota-
intuition before proceeding any further.
tions, all free variables of ​M​are in Γ ​ ​. When Γ​ ​is empty, we
The ​​​⌈​​  · ​⌉​​​​function is just some arbitrary Gödel number-
will write c​ los M​.
ing of the syntax. As already explained, the ​𝚛𝚞𝚗​ function
is going to serve as a Kleene predicate expressed in a func- Definition 9 (Deep normal and neutral forms).
tional form. In particular, we expect 𝚛 ​ 𝚞𝚗 : Λ → ℕ → ​ℕ​℘​​​ and​ We recall the inductive definition of deep normal and
𝚛𝚞𝚗 p n​to compute the result of the application of the neutral forms of 𝖬
​ 𝖫𝖳𝖳​terms at Figure 4 and define the
program ​p​to the argument ​n​if it exists. In practice, ​𝚛𝚞𝚗​ is additional rules for “​𝖬𝖫𝖳𝖳”​​​ at Figure 5.

​ 𝗠𝗟𝗧𝗧.”​
Figure 3. Typing rules of “ Normal forms are, as usual, terms which cannot be sim-
plified further. Neutral forms are a subcase of the former,
intuitively capturing the notion of a term whose evaluation
is blocked due to a variable not bound to a value. In par-
ticular, they cannot trigger new conversions when they are
substituted for a variable. Our definition is standard for the​
𝖬𝖫𝖳𝖳​fragment, except maybe that we we ignore λ​ ​and pair
type annotations just like for the ​clos ​predicate. Only worth
Figure 4. Deep normal / neutral ​𝗠𝗟𝗧𝗧​ terms (excerpt). discussing are the newly introduced terms of “​𝖬𝖫𝖳𝖳”​​​.
The quote primitives do not generate any new non-neu-
tral normal forms, as their types are concrete, canonical
datatypes. They do generate new neutrals, though. The intu-
ition is that these primitives only compute on closed normal
forms, so if one of their arguments is not closed, they will
block and thus be neutral.
Notation 10.
We write ​clnf M​if both ​clos M​and ​dnf M​.

​ 𝗠𝗟𝗧𝗧”​ extensions.
Figure 5. Deep normal and neutral forms of “

78 COM MUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


The last non-trivial ingredient that is needed is erasure of​ there must be some smallest ​​k​0​​  ∈ N​ s.t. ​run ​​⌈​​​⌈​​M​⌉​​​⌉​​​  n ​k​0​​​
“𝖬𝖫𝖳𝖳”​terms. We rely on it to quotient normal forms with is convertible to ​𝖲 v​. Then ​Ϛ M ​​[​​n​]​​​ ℕ​​​ returns ​​k​0​​​, and ϱ​ M ​​[​​n​]​​​ ℕ​​​
regard to the various η
​ ​-rules of our system. must provide a closed proof of ​MϘ ​​[​​n​]​​​ ℕ​​​. But given the previ-
ous assumptions, M ​ Ϙ ​​[​​n]​ ​​​ ℕ​​​is convertible to a finite sequence
Definition 11 (Erasure).
of products of equalities ​ 0 = 0​with a trailing equality​
Given ​M ∈ term​, we define its erasure ∥
​ M ∥ ∈ term​. This
𝖲 v = 𝖲 v​. This type is trivially inhabited by [​​​ ​​ ​k​0​​, v​]​​​ Ϙ​​​.
operation can be understood as the composition of two
finer-grained primitives: first, replace ​λ​and pair type
6. ADEQUACY AND SOUNDNESS
annotations with a dummy term, and second, perform
There is still one missing piece for “​𝖬𝖫𝖳𝖳”​​​ to make sense.
maximal ​η​-reduction of ​λ​and pair nodes. We choose □ ​ ​
Indeed, in the intuitive explanation above, we argued that​
for the dummy term, but any closed normal term would
𝚛𝚞𝚗​should model the runtime behavior of ​“𝖬𝖫𝖳𝖳”​. In spite
do. We give the relevant operations in Figure 6; all other
of this, we have made no additional assumption on Defini-
cases are term homomorphisms.
tion 5 so far. We make this requirement formal as computa-
tional adequacy in this section. This first forces us to endow​
Figure 6. Term erasure.
“𝖬𝖫𝖳𝖳”​with a notion of evaluation.
Definition 14 (Evaluation).
We define in the metatheory a step-indexed evaluation
function ⇓
​​​ ​​​​that computes the deep normal form. We
omit the standard rules for the ​𝖬𝖫𝖳𝖳​fragment, and give
them for “​𝖬𝖫𝖳𝖳”​​​extensions in Figure 8.
Notation 12. Although we describe it using inference rules, evaluation
​ ∈ term​, we write ​​​⌈​​​⌈​​M​⌉​​​⌉​​  ∈ N​  : = ​​⌈​​  ∥ M ∥ ​⌉​​​​.
If M really is a step-indexed recursive function in the metatheory,
of type t​erm ⇒ N ⇒ option term​. We write M ​  ​​⇑​​​ k​​​ when it re-
Definition 13.
turns ​None​. We write ​​M⇓​ ​​N​​when there exists ​k ∈ N​s.t. ​M ​​⇓​​​ k​​  N​.
Given ​k, v ∈ N​ we define [​​​ ​​k, v​]​​​ Ϙ​​  ∈ term​by induction
We discuss briefly the evaluation rules for the quoting
on ​k​ as
​​[​​0, v​]​​​ Ϙ​​ : = ​𝗋𝖾𝖿𝗅 ℕ ​(​​𝖲 ​​[​​v]​ ​​​ ℕ)​​​ ​​​ primitives. They start by deeply evaluating the subterms
​​    ​ ​ ​ ​ ​​ and checking whether they are quasi-closed. If not, they im-
[​​ ​​k + 1, v​]​​​ Ϙ​​ : = 〈𝗋𝖾𝖿𝗅 ℕ 0, ​​[​​k, v​]​​​ Ϙ​​  〉. mediately return. Otherwise, they perform a macroscopic
We now have all the necessary definitions to define the evaluation step. For ϙ​ M​, that just means quoting the closed
new conversion rules of “​𝖬𝖫𝖳𝖳”​​​at Figure 7. We omit the normal form into a number. For Ϛ ​ M N​and ϱ​ M N​, it per-
obvious congruence rules, and we give some intuitions by forms a guarded μ ​ ​-recursion to find the smallest index such
paraphrasing the rules. The computation rule for ϙ​ M​is sim- that the erasure of M ​ N​evaluates to a numeral.
ply stating that quoting a closed normal term produces the
Definition 15 (Computational adequacy).
Gödel number of its erasure. The two other rules reflect the
We say that the computation model is adequate if it
behavior of the ​𝚛𝚞𝚗​operator in the theory itself. They start
satisfies the following:
by assuming that M ​ ​is a closed normal term, so its quote
• We have a derivation ⊢ ​ 𝚛𝚞𝚗 : Λ → ℕ → ​ℕ​℘​​​.
is a concrete code. Assuming canonicity, for any n ​ , k ∈ N​,​
• For all M ​ ∈ term​and ​n, k, v ∈ N​,
run ​​⌈⌈​​​ ​​M⌉​ ⌉​​​ ​​​  n k​must be convertible to a numeral. Similarly,​
​  ​​[​​n]​ ​​​ ℕ​​ ​​⇑​​​ k​​​ then ​​run ​​⌈⌈​​​ ​​M⌉​ ⌉​​​ ​​​  n k​⇓​​0​​,
if M
M ​​[​​n]​ ​​​ ℕ​​​must be convertible to some numeral ​v​.
​  ​​[​​n]​ ​​​ ℕ​​ ​​⇓​​​ k​​ ​​[​​v]​ ​​​ ℕ​​​ then r​​ un ​​⌈⌈​​​ ​​M⌉​ ⌉​​​ ​​​  n k​⇓​​𝖲 ​​[​​v]​ ​​​ ℕ​​​​.
if M
Since we expect ​𝚛𝚞𝚗​to model the computation of “​𝖬𝖫𝖳𝖳”​​​,

​ 𝗠𝗟𝗧𝗧.”​
Figure 7. New non-congruence conversion rules of “

​ 𝗠𝗟𝗧𝗧”​ extensions.
Figure 8. Step-indexed evaluation of “

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 79
research highlights

The last point is essentially stating that the metatheoreti- These predicates must satisfy a list of closure properties,
cal evaluation ⇓
​​​ ​​​​and the internal evaluation 𝚛
​ 𝚞𝚗​ coincide. which are typically weaker than the ones enjoyed by the
From now on, we implicitly assume that our globally standard typing rules of ​𝖬𝖫𝖳𝖳​. Another way to understand
fixed computational model is adequate. The remainder of this is that logical relations turn a liberal type system,
the paper is dedicated to the proof of the following theorem. called declarative, with many closure rules that is easy to
work with, into a lower-level type system that is hardly us-
Theorem 16.
able but which is similar in spirit to the steps performed
​“𝖬𝖫𝖳𝖳”​ is consistent and strongly normalizing. Further-
by a typing algorithm. Proofs in this generic system are in
more, any closed “​𝖬𝖫𝖳𝖳”​​​ term ​⊢ M : ℕ​ normalizes to a
some sense normalized compared to the declarative sys-
numeral.
tem. As a matter of fact, this is a way to prove decidabil-
ity of type-checking, since an algorithmic presentation of
7. A ​𝗠𝗟𝗧𝗧​LOGICAL RELATION PRIMER type-checking will satisfy the low-level closure properties.
We will prove Theorem 16 using a logical relation model. The A notable difference with the declarative system is that ge-
base model is basically the now-famous inductive-recursive neric typing requires making explicit the notion of neutral
one from Abel et al.,1 with a handful of surprisingly minor terms, through the introduction of neutral convertibility,
tweaks. In this section, we briefly recall the principles of which can be understood as the usual convertibility re-
Abel-style logical relations. stricted to neutral terms.
From a high-level point of view, Abel-style logical rela- In the remainder of this section, we assume fixed some
tions for ​𝖬𝖫𝖳𝖳​are just a glorified kind of realizability with instance of syntactic typing and reduction rules for 𝖬 ​ 𝖫𝖳𝖳​.
a lot of bells and whistles. Notably, in the semantic world, Due to their sheer size and the fact they are neither surpris-
well-formed types Γ ​ ⊩ A​are defined inductively, while well- ing nor new, we will not state the closure properties here,
typed terms p​ ​​ A​​​ ​|​​  Γ ⊩ M : A​​are defined recursively over a but rather refer to the corresponding Coq code.
proof ​​p​A​​  ∈ Γ ⊩ A​. In turn, well-typedness is extremely simi-
Definition 17.
lar to, say, Kleene realizability,23 that is, p ​​ ​A​​​ ​|​​  Γ ⊩ M : A​​ essen-
A set of typing rules is generic when it enjoys a specific
tially means that ​M​weak-head normalizes to some value ​V​,
list of (weak) closure properties.
further satisfying some conditions depending on ​A​. Due to
conversion being an integral part of typing, one also needs Given generic notions of typing and reduction, one can
to provide predicates for type and term convertibility at define reducibility in the abstract. Our base logical relation
some semantic type. is unchanged with regard to Abel et al.,1 so we will not give
The major departure from usual realizability in this kind the full definition here. To begin with, writing everything
of model is the proper handling of variables. All semantic in full on paper would be both unreadable and uninforma-
predicates are actually presheaves over contexts equipped tive, and probably ridden with typos. Rather, we sketch be-
with weakenings. Furthermore, neutral terms enjoy a very low some representative type formers to build intuition. We
specific status; see, for example, Lemma 22. This will turn also ignore bookkeeping details that clutter the simplicity
out to be important for our goals. of logical relations, such as universe levels. Our results are
The relation itself is parameterized by syntactic predi- backed up by a mechanized proof, so we can go for a fearless
cates for the various notions of typing and reduction. human-readable paper.

Figure 9. Reducibility (excerpt).

80 COM MUNICATIO NS O F TH E ACM | J U NE 2025 | VO L . 68 | NO. 6


imply the ones from Figure 7. After Theorem 33, we will also
Definition 18 (Reducibility).
be able to derive that self-convertibility implies well-typed-
We give a partial but evocative definition of reducibility
ness, which shows that the two versions are equivalent.
at Figure 9.
The model itself is defined in terms of a small-step reduc-
As already explained, we gloss over the details and in- tion relation, so we need to define it properly for ​𝖬𝖫𝖳𝖳​, in
stead concentrate on the high-level view. We abuse implicit addition to the big-step variant from Section 6.
arguments in Figure 9 to keep things readable, and we also
Definition 24.
omit additional well-formedness conditions. At the risk of
We mutually define weak-head and deep reductions,
repeating ourselves, this is really just the run-of-the-mill
respectively written ​M ↠ N​and ​M ⇒ ⇒ N​. The 𝖬
​ 𝖫𝖳𝖳​ rules
complete presheaf model for ​𝖬𝖫𝖳𝖳​, where presheaves have
are standard, and the “​𝖬𝖫𝖳𝖳”​​​rules are defined in Figure
been manually encoded by means of quantifications over all
10. Weak-head and deep reductions coincide for the
weakenings of the current context. Note the lack of natural-
quoting primitives.
ity conditions thanks to Lemma 19. In our setting, typed re-
duction is simply untyped reduction annotated with proofs
Figure 10. Weak-head reduction of ​“𝗠𝗟𝗧𝗧”​ extensions.
that both sides are well-typed and convertible. Similarly,
well-typed neutrals are untyped neutrals together with a
proof of well-typedness and self-convertibility. Without fur-
ther consideration for the low-level details, the logical rela-
tion satisfies some salient properties we list below.
Lemma 19 (Reducibility irrelevance).
For all proofs of type reducibility p ​​ ​A​​, ​q​A​​  ∈ Γ ⊩ A​,
​​ ​A​​​ ​|​​  Γ ⊩ M : A​​ then q
if p ​​ ​A​​​ ​|​​  Γ ⊩ M : A​​ and similarly for the
other statements.
This allows us to silently drop the proof of type formation Note that deep reduction is simply iterated weak-head re-
for reducibility statements as a notational device. As long as duction on the subterms of weak-head normal forms. Deep
there is one, it does not matter which one we pick. reduction is a specific sequentialization of evaluation, and
its reflexive-transitive closure computes the same normal
Lemma 20 (Escape).
forms. Importantly, both reductions are deterministic.
If Γ
​ ⊩ M : A​ then ​Γ ⊢ M : A​ and similarly for the other
One major remark for our proof to go through is that in
statements.
Abel-style logical relations, the closure properties of conver-
Lemma 21 (Anti-reduction). sion preserve deep normalization. Said otherwise, we never
Reducibility is closed by typed anti-weak-reduction. perform conversion on terms potentially introducing non-
termination. We leverage this by the following decoration.
Lemma 22 (Reflection).
Definition 25 (Deep Conversion).
Well-typed neutrals are reducible.
We define deep term conversion ​Γ ​⊢𝗇𝖿​ ​​ M ≡ N : A​as the
predicate ​Γ ⊢ M ≡ N : A​extended with the following side-
Just like for standard realizability, we need to close reduc-
conditions.
ibility by substitution to state the fundamental lemma. The
• For T ​​ ⇓
​​ ∈ ​{​​M, N​}​​​​, there is some ​​T​0​​​ s.t. T ​ ​​ ​T​0​​​​
relation resulting from this closure is known in the litera-
and ​Γ ⊢ T ≡ ​T​0​​  : A​.
ture as validity. We refer to the development for reference.
• ​​M​0​​​ and N​​ ​0​​​have the same erasure,
Theorem 23. that is, ​∥ ​M​0​​  ∥ = ∥ ​N​0​​  ∥​.
Well-typed ​𝖬𝖫𝖳𝖳​ terms are valid. We define similarly deep-type and neutral
conversions.
8. The ​"𝗠𝗟𝗧𝗧"​Logical Relation When instantiating the logical relation with deep conver-
For technical reasons, we will work with a slightly tweaked sions, one gets access to the fact that both sides of the con-
version of “​𝖬𝖫𝖳𝖳”​​​rules. Figure 7 is given for readability version deeply normalize to terms that have the same era-
purposes; these rules will be derivable in the actual system. sure. This is why we turn the typing premises of the quoting
The differences are the following: primitives into self-convertibility. Before the fundamental
First, instead of a global typing axiom for ​𝚛𝚞𝚗​, we add it as lemma, syntactic deep self-convertibility gives more infor-
a premise to the “​𝖬𝖫𝖳𝖳”​​​ rules for ​Ϛ​and ​ϱ​. This is a cosmetic mation than just syntactic typability.
change to strengthen induction over “​𝖬𝖫𝖳𝖳”​​​ derivations.
Lemma 26 (Deep Typing).
Second, for reasons that will become clear soon, we turn
The typing rules where the various conversion predicates
the well-typedness premises of those rules into self-convert-
are replaced by their deep equivalent satisfy the generic
ibility. For instance, the introduction rule of ​ϙ​ becomes:
typing interface.
​​ Γ ⊢ ΓM⊢≡ϙMM: :ℕΛ→ ℕ ​​
__________________
  
Note that erasing normal forms before comparing them
By reflexivity of term conversion, it is clear that these rules is critical for the above lemma. Indeed, generic conversion

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 81
research highlights

rules for negative types are given directly as η ​ ​-expansions. erasure, and thus the same closedness.
Therefore, two convertible normal forms may differ on their If not all these terms are closed, then ϱ​  ​M​0​​ ​N​0​​​ and ϱ​  ​M​′0​ ​​ 
annotations or up to some η ​ ​-expansion. Erasing all annota- N​′0​​​  are convertible neutral terms, we conclude as above.
tions and maximally η ​ ​-reducing ensures thus that the result Otherwise, by reducibility, ​​N​0​​  = ​N​′0​ ​  = ​​[​​n]​ ​​​ ℕ​​​ for some ​
is unique for each equivalence class. n ∈ N​. Since reducibility is closed by application, we get ​
As a result, the logical relation instantiated with deep Γ ⊩ M N ≡ ​M′ ​ ​N′ ​  : ℕ​. This implies that both applications
typing is a model of 𝖬 ​ 𝖫𝖳𝖳​. We only have to check that the deeply evaluate to the same semantic integer, that is,
additional “​𝖬𝖫𝖳𝖳”​​​rules are also interpreted. Since seman- a series of successors ending either with 0 ​ ​or a neutral.
tic self-convertibility is equivalent to semantic well-typed- By confluence, we rule out the second case as ​​M​0​​ ​N​0​​​ is
ness, it is enough to check the rules from Figure 7; the ones closed, so these terms reduce to some numeral ​​​[​​v]​ ​​​ ℕ​​​. By
from Figure 3 are a consequence of the former. confluence and standardization, ​​M​0​​ ​​[​​n]​ ​​​ ℕ​​​ also evaluates
to ​​​[​​v]​ ​​​ ℕ​​​. By irrelevance of erasure, ​∥ ​M​0​​  ∥ ​​[​​n]​ ​​​ ℕ​​​ also evalu-
Lemma 27 (Quote Reducibility).
ates to [​​​ ​​v]​ ​​​ ℕ​​​. By computational adequacy, there exists
The conversion rules for the ϙ​ ​ primitive hold in the reduc-
some k​​ ​0​​​ s.t. ​​run ​​⌈⌈​​​ ​​ ​M​0⌉​​​ ⌉​​​ ​​​  n ​k​0⇓
​​​ ​​𝖲 ​​[​​v]​ ​​​ ℕ​​​​ and r​​ un ​​⌈⌈​​​​ ​​ ​M​0⌉​​​ ⌉​​​​ ​​​  n ​
ibility model.
k′ ​​⇓​​0​​ for all k​​ ′ ​ < ​k​0​​​. Given that ​​M​0​​​ and M ​​ ​0′​​​  have the same
erasure, this also holds for ​​M​′0​​.​  Hence, ​Ϛ  M N​and ​Ϛ ​M′ ​ ​ N′ ​​
Proof.
evaluate to ​​​[​​ ​k​0​​​]​​​ ​​​ and ϱ​ M N​and ​ϱ ​M′ ​ ​ N′ ​​ evaluate to ​​
We focus on the more general congruence rule. As-
​[​​ ​k​0​​, v​]​​​ Ϙ​​​. The only remaining problem is to show that this

sume Γ ​ ⊩ M ≡ ​M′ ​  : ℕ → ℕ​, we now have to show that ​


value is indeed of semantic type M ​ Ϙ ​​[​​n]​ ​​​ ℕ​​​. But this is a
Γ ⊩ ϙ M ≡ ϙ ​M′ ​  : ℕ​. By escape, Γ ​  ​⊢​𝗇𝖿​​  M ≡ ​M′ ​  : ℕ → ℕ​. We
trivial consequence of the normalization properties of
thus have two terms M​ ​​ 0​​​ and M​​ ​′0​​​  s.t. M ​​​ ′ ​​⇓​​ ​M​′0​​​​  and ​
​​ ​⇓​​ ​M​0​​​​, M
the terms implied.
∥ ​M​0​​  ∥ = ∥ ​M​′0​ ​  ∥​. Either both M​ ​​ 0​​​ and M ​​ ​′0​​​  are quasi-closed
or both are not. In the latter case, both ​ϙ ​M​0​​​ and ϙ​  ​M​′0​​​ 
Finally, one also has to prove validity for these conver-
are neutral and convertible, so by reflection they are
sion rules, that is, that they are stable by substitution. This
reducibly convertible. Otherwise, they reduce to ​​​[⌈​​​​​​​​ ​​ ​M​0⌉​​​ ]​​​​​​​​ ​​​ ℕ​​​
turns out to be obvious. Congruence rules are stable by sub-
and [​​​ ⌈​​​​​​​​ ​​ ​M​′0⌉​​​​​​  ]​​​​ ​​​ ℕ​​​, which are equal numerals and thus reduc-
stitution by construction. For the other rules, it is enough
ibly convertible. In either case, we conclude by anti-
to observe that both quasi-closedness and erasure of quasi-
reduction.
closed terms are stable by substitution. We thus get the re-
To prove reducibility of ​Ϛ​and ​ϱ​, we need a fair amount sult below, of which Theorem 16 is an immediate corollary.
of rewriting theory. The computation model is completely
Theorem 33.
untyped, and we only care about erased terms, so we must
Well-typed “​𝖬𝖫𝖳𝖳”​​​ terms are valid.
consider reductions that ignore λ​ ​and pair annotations. The
following are textbook results, except for the interleaving of
weak and deep reduction for the quoting primitives. 9. MECHANIZATION
While the Abel et al. proof has been implemented in Agda,1
Definition 28.
our mechanization is written in Coq and is based upon a re-
We define parallel and standard reductions up to ​λ​ and
cent work by Adjedj et al.2 that relies on a small induction-
pair annotations for “​𝖬𝖫𝖳𝖳”​​​in the usual way.
recursion encoding.9 Apart from this, the two formaliza-
Lemma 29. tions are globally the same. The only advantage of the Coq
​Deep red. ⊆ parallel red. ⊆ standard red.​ version, which we believe to be decisive, is the availability of
tactics. The largish corpus of proofs relating the numerous
Lemma 30.
untyped reductions is rendered tractable by careful automa-
Parallel reduction is confluent.
tion.
Lemma 31. All theorems stated in the paper have been formalized.
Standard reduction to a ​dnf​ implies deep evaluation up to The only thing we did not formally prove was the actual ex-
annotations, and is preserved by erasure. istence of adequate models of computation. We globally axi-
omatized one in the development instead.
We can now tackle the remaining reducibility proofs. The reader may reasonably complain about the use of axi-
oms, as they may very well be inconsistent. Yet, the existence
Lemma 32.
of adequate models is at the same time an utterly trivial fact
The conversion rules for the Ϛ
​ ​and ​ϱ​ primitives hold in the
and a technically extremely challenging task. Indeed, at an
reducibility model.
intuitive level, ​𝚛𝚞𝚗​ has already been implemented in 𝖬 ​ 𝖫𝖳𝖳​,
Proof. because we defined evaluation as a step-indexed function
Once again we focus on the congruence rule. Since the in our metatheory, which turns out to be a variant of ​𝖬𝖫𝖳𝖳​.
two primitives behave computationally the same, we As explained before, we do not even need that much; the
only treat the ϱ​ ​case, which is more involved typingwise. fragment we use to implement it basically fits into 𝖯 ​ 𝖱𝖠​. In
Let us assume Γ ​ ⊩ M ≡ ​M′ ​  : ℕ → ℕ​and ​Γ ⊩ N ≡ ​N′ ​  : ℕ​. As any paper proof, such a computability result would be im-
before, we have deeply normal terms M​ ​​ ​′0​​​  (resp. ​​N​0​​​ and ​​
​​ 0​​​, M mediately brushed off as obvious, barely requiring any ex-
N​′0​​)​  convertible to the original terms and with the same planation. Now, given the non-trivial size of our untyped

82 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


language, implementing evaluation and proving anything plicity, the development still axiomatizes the computation
about it in Coq was already quite cumbersome. By contrast, model in the object theory. In all likelihood, these axioms
doing the same directly in the object theory is monstruous would be deemed self-evident in a computability theory pa-
enough to drive any sane person into sheer madness, not per proof.
even mentioning that we have to work up to a Gödel encod- Finally, we believe that the generic recipe we followed
ing. Out of common decency, we will not attempt to close for this model can be generalized to many other admissible
this unquestionable gap. principles in dependent type theory.

10. FUTURE WORK


Although we did not formally prove it out of unrepented
sloth, it is quite clear that “​𝖬𝖫𝖳𝖳”​​​ enjoys decidability of References
1. Abel, A., Öhman, J., and Vezzosi, A. 14. Kleene, S.C. On the interpretation
type-checking. Since type interpretation is unchanged in Decidability of conversion for type of intuitionistic number theory. J.
the logical relation, the global shape of the algorithm re- theory in type theory. In Proc. ACM Symbolic Logic 10, (1945), 109–124.
Program. Lang. 2, POPL (Dec. 2017). 15. Margenstern, M. L’école constructive
mains the same as the one for ​𝖬𝖫𝖳𝖳​. Handling the quoting 2. Adjedj, A. et al. Martin-Löf à la Coq. In de markov. Revue d’histoire des
Proc. 13th ACM SIGPLAN Intern. Conf. Mathématiques, (1995).
primitives should not cause any trouble since their reduc- on Certified Programs and Proofs, 16. Martin-Löf, P. Intuitionistic Type
tion rules enjoy the same kind of properties as in vanilla​ Blazy, S., Pientka, B., Timany, A., and Theory, Volume 1 of Studies in Proof
Traytel, D. ACM, (2024). Theory. Bibliopolis, (1984).
𝖬𝖫𝖳𝖳​. Given that our Coq proof is based on a development 3. Cohen, L. et al. Separating Markov’s 17. Nanevski, A., Pfenning, F., and Pientka,
that proves decidability of type-checking for 𝖬 ​ 𝖫𝖳𝖳​, it should principles. In Proc. 39th Annual ACM/ B. Contextual modal type theory.
IEEE Symp. on Logic in Computer ACM Trans. Comput. Log. 9, 3 (2008),
be quite simple to extend it to “​𝖬𝖫𝖳𝖳”​​​. Science, ACM (2024). 23:1–23:49.
Another interesting topic is the compatibility of ​𝖬𝖫𝖳𝖳​ 4. Coquand, T. and Mannaa, B. The 18. Paulin-Mohring, C. Program Extraction
independence of Markov’s principle in in the Calculus of Constructions. Ph.D.
with the full Russian axiomatic. We now know it is compati- type theory. Log. Methods Comput. Sci. thesis, Paris Diderot University, France,
ble with 𝖬​ 𝖯​and 𝖢
​ 𝖳​in isolation. Hence, a natural question is 13, 3, (2017). (1989).
5. Forster, Y. Computability in 19. Pédrot, P. Russian constructivism
whether it can accommodate both axioms at the same time. Constructive Type Theory. Ph.D. thesis, in a prefascist theory. In Proc. 35th
We are convinced that not only this is the case, but also that Saarland University, Germany, (2021). Ann. ACM/IEEE Symp. on Logic in
6. Gilbert, G. A Type Theory with Computer Science. Hermanns, H.,
the model described here can be easily extended to handle​ Definitional Proof-Irrelevance. Ph.D. Zhang, L., Kobayashi, N., and Miller, D.
𝖬𝖯​. The reason for this confidence lies in the uncanny simi- thesis, Mines ParisTech, France, (2019). (ed). ACM, (2020), 782–794.
7. Gödel, K. Über eine bisher noch nicht 20. Pédrot, P. and Tabareau, N. Failure is
larity between those two principles. Just like 𝖢𝖳, it is true benützte Erweiterung des finiten not an option—An exceptional type
that Markov’s rule, the external version of ​𝖬𝖯​, is derivable Standpunktes. Dialectica 12, 3-4 theory. In Programming Languages
and Systems–27th European Symp.
(1958), 280–287.
in 𝖬
​ 𝖫𝖳𝖳​using Friedman’s trick.20 Again, like 𝖢 ​ 𝖳​, the return 8. Gratzer, D., Kavvos, G. A., Nuyts, A., on Programming, ESOP 2018,
type of this rule is a concrete data type—that is, a Σ​ ​​ 01​​​  formu- and Birkedal, L. Multimodal dependent
type theory. Log. Methods Comput. Sci.
Held as Part of the European Joint
Conferences on Theory and Practice
la. It should therefore be easy to add a term 17, 3 (2021). of Software, Springer, Ahmed, A. (ed.),
9. Hancock, P. et al. Small induction (2018), 245–271.
Γ ⊢ M : ℕ → ℕ   Γ ⊢ N : ¬¬ ​​​(​​Σn : ℕ . M n = 0​)​​​
__________________________________________ recursion. Typed Lambda Calculi 21. Sozeau, M. et al. The metacoq project.
​​        
Γ ⊢ μ M N : Σn : ℕ . M n = 0
  ​​
and Applications, Springer, Berlin J. Autom. Reason. 64, 5 (2020),
Heidelberg, Hasegawa, M. (ed) 947–999.
that only computes when both arguments are closed and re- 156–172. 22. Swan, A.W. and Uemura, T. On
turns the witness extracted by Friedman’s trick. One must 10. Hyland, J. The effective topos. Studies Church’s thesis in cubical assemblies.
in Logic and the Foundations of Math. Struct. Comput. Sci. 31, 10
be careful to return the same integer for convertible argu- Mathematics 110, (1982), 165–216. (2021), 1185–1204.
ments, but this is easily obtained by returning the smallest 11. Ishihara, H., Maietti, M. E., Maschio, 23. Troelstra, A. and Dalen, D.
S., and Streicher, T. Consistency of Constructivism in Mathematics: An
index ​k​s.t. M ​ k​is 0. the intensional level of the minimalist Introduction. North-Holland, (1988).
We believe that many axioms can be added to ​𝖬𝖫𝖳𝖳​ this foundation with church’s thesis and 24. T. Univalent Foundations Program.
axiom of choice. Arch. Math. Log., 57, Homotopy Type Theory: Univalent
way, provided they are admissible on closed terms and re- 7-8 (2018), 873–888. Foundations of Mathematics. Institute
turn ​​Σ​01​​​  formulae that have sufficiently unique inhabitants. 12. Jang, J., Gélineau, S., Monnier, for Advanced Study, (2013); https://
S., and Pientka, B. Moebius: homotopytypetheory.org/book.
We leave the exploration of this subject to a later time. Metaprogramming using contextual 25. Werner, B. Sets in types, types in
types: The stage where system f can sets. In Abadi, M. and Ito, T., editors,
pattern match on itself. In Proc. ACM Theoretical Aspects of Computer
11. CONCLUSION Program. Lang. 6, POPL (2022), 1–27. Software, Third Intern. Symp., TACS
In this paper, we proved it was possible to add ​​𝖢𝖳​Σ​​​ to 𝖬
​ 𝖫𝖳𝖳​ 13. Kleene, S. C. Recursive predicates
and quantifiers. Transactions of the
’97, Sendai, Japan, September 23-26,
1997, Proceedings, volume 1281 of
without disrupting any of the desirable properties of the re- American Mathematical Society 53, Lecture Notes in Computer Science,
(1943), 41–73. Springer, (1997), 530–546.
sulting system, such as consistency, strong normalization,
and canonicity. Against all odds, the model used for this is
Pierre-Marie Pédrot (pierre-marie.
the most straightforward one can conceive, as it boils down [email protected].) is a researcher with the
to the standard logical relation for ​𝖬𝖫𝖳𝖳​. The computation- Gallinette team at Inria, Nantes, France.
al content of our variant of Church’s thesis is too borderline
stupid to even be considered unsurprising: It computes on
closed terms, period. The only mildly non-trivial phenom-
enon is the need for erasure of annotations and maximal​
η​-reduction, but that trick is hardly worthy of attention. As
a result, we are still bewildered about the reason for which
this problem was believed to be hard. This work is licensed under a Creative Commons
The theorems have been formalized in Coq, greatly re- Attribution International 4.0 License.
ducing the risk of an accidental error in the proof. For sim- © 2025 Copyright held by the owner/author(s).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 83
research highlights
DOI:10.1145/ 3 72 5 72 5

Technical Perspective
To view the accompanying paper,
visit doi.acm.org/10.1145/3725728 rh

A Symbolic Approach to
Verifying Quantum Systems
By Parosh Aziz Abdulla

may lie in
E XC E P T I O N A L A D D E D VA LU E tional program verification to develop we can lift core concepts of classical
connecting two complementary areas tools for handling quantum systems. verification, such as state-space explo-
of computer science. This statement Verification tools, in general, and for ration and symbolic reasoning, into the
is particularly true when applying ma- quantum systems in particular, would quantum setting. This extension allows
ture techniques developed in one area ideally have the following properties: the verification of quantum circuits,
to solve complex problems that arise in ˲ Flexibility: Allows flexible specifi- offering a path to applying well-estab-
a new area. The accompanying paper, cation of properties of interest lished classical paradigms in a domain
“An Automata-Based Framework for ˲ Diagnostics: Provides precise bug where formal guarantees are critical.
Verification and Bug Hunting in Quan- diagnostics In addition to its early practical
tum Circuits” by Lengál et al., is a case ˲ Automation: Operates automati- promise, the approach opens new re-
in point. It applies techniques devel- cally search directions in automata theory,
oped in logic, automata, and symbolic ˲ Scalability: Scales efficiently to where quantum structures introduce
verification to analyze the correctness verify useful programs novel mathematical challenges and op-
of quantum programs. Symbolic verification is one of the portunities. It establishes a connection
The current quest of quantum com- most successful techniques that sat- between quantum program verification
puting is achieving quantum suprem- isfy the above criteria for conventional and automata, promoting new possibil-
acy—that is, to reach the point where programs. To date, we have lacked an ities to exploit the richness of automata
we solve problems that are practically extension to the quantum realm due theory and automata-based verification
unsolvable using conventional com- to the latter’s unique mathematical in quantum computing. It is worth not-
puting. To that end, substantial recent structure and different operational ing that the methodology is not just
progress has been made in implement- principles. The core of this work is the about catching mistakes but also about
ing quantum hardware and developing innovative application of automata building trust in quantum computing
programming languages for quantum theory, a cornerstone of formal verifi- systems as we move toward an era in
computers. In many applications, sys- cation, to quantum circuit verification. which they might solve problems classi-
tem correctness is of critical impor- It combines tree automata-based rea- cal systems cannot.
tance. For instance, identifying a subtle soning with symbolic representations In conclusion, the accompanying
bug in a quantum circuit used for quan- tailored for quantum circuits, allow- paper represents a rare confluence of
tum chemistry simulations could pre- ing automated verification at unprec- disciplines, opening pathways for col-
vent incorrect predictions about mo- edented scales. It goes back to the very laboration between automata theorists,
lecular interactions, which is critical for basics of program verification by recall- quantum physicists, and software en-
drug discovery and material design. ing the classical Hoare triples {P}C{Q}, gineers. Looking ahead, this line of re-
Given the complexity of quantum where {P} and {Q} are sets of quantum search opens up exciting opportunities.
systems and the correctness require- states representing the pre-condition Could similar automata-based tech-
ments, tools for analyzing and verify- and the post-condition, and C is a quan- niques be adapted to other aspects of
ing quantum programs’ correctness tum circuit. It uses symbolic verifica- quantum software engineering? Could
are demanded. However, building veri- tion to check the validity of the triple, this approach handle quantum pro-
fication tools for quantum programs ensuring all executions of C from states gramming languages’ more abstract
is immensely challenging due to their in P result in states within Q. and flexible constructs? The authors’
probabilistic nature and the exponen- Although this is the first attempt to combination of deep theoretical in-
tial size of the computational space. use such techniques in the context of sights and practical tool development
Furthermore, there is a fundamental quantum computing, implementing sets the stage for significant advance-
difference in how conventional and the framework with a tool gives spec- ments in the field. Their results are a
quantum computers process informa- tacular results. For instance, the tool milestone for quantum circuit verifi-
tion: Conventional computers use digi- manages to verify large circuits with up cation and a beacon guiding future re-
tal bits (0s and 1s), while quantum com- to 40 qubits and 141,527 gates or catch search toward reliable quantum com-
puters deploy quantum bits (qubits) to bugs injected into circuits with up to putation.
store quantum information in values 320 qubits and 1,758 gates, while all
between 0 and 1. exiting tools fail to handle such bench- Parosh Aziz Abdulla ([email protected]) is a chaired
professor at the Department of Information
The accompanying paper addresses marks. Technology, Uppsala University, Uppsala, Sweden.
whether we can exploit the research The framework’s success is based on
community’s vast experience in conven- the groundbreaking observation that © 2025 Copyright held by the owner/author(s).

84 COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6


research highlights
DOI:10.1145 /3 72 5 72 8

An Automata-Based
To view the accompanying Technical Perspective,
visit doi.acm.org/10.1145/3725725 tp

Framework for Verification and


Bug Hunting in Quantum Circuits
By Yu-Fang Chen, Kai-Min Chung, Ondřej Lengál, Jyun-Ao Lin, Wei-Lun Tsai, and Di-De Yen

ABSTRACT verification methods to ensure circuit correctness in these


As quantum computing hardware advances, the demand applications.19 Verifying quantum circuits, however, pres-
for scalable, precise, and fully automated verification tech- ents unique challenges due to the exponential size of the
niques for quantum circuits grows. This paper introduces computational space and the probabilistic nature of quan-
a novel automata-based framework tailored for the verifica- tum operations. Traditional verification techniques are of-
tion of quantum circuits. In our approach, the problem is ten unsuitable for quantum circuits, especially as they scale.
framed as a triple {​ ​P}​ ​C {​ ​S}​ ​​, and the question is whether, Currently, available automated analysis tools are limited
given a set P
​ ​ of quantum states on the input of a circuit ​C​, in scalability, and many are inflexible in checking user-spec-
the set of quantum states on the output is equal to (or in- ified properties,2,7,12 or imprecise and unable to catch bugs.25
cluded in) a set ​S​. Our framework leverages tree automata to Scalable and flexible automated analysis tools for quantum
compactly represent sets of quantum states and we develop circuits are indeed missing. In this paper, we introduce a
transformers to implement the semantics of quantum gates novel approach to quantum circuit verification based on
over this representation. We implemented the proposed ap- automata, a technique well-established in classical verifica-
proach in a prototype tool and evaluated its performance tion but largely unexplored in quantum contexts.5
against various benchmarks from the literature. The evalu- In particular, we propose a new paradigm for analyzing
ation shows that our approach is quite scalable; for exam- and finding bugs in quantum circuits. In our approach, the
ple, we managed to verify a large circuit with 40 qubits and problem is given by a Hoare triple ​{​ ​P}​ ​C {​ ​S}​ ​​, where ​C​is a
141,527 gates, or catch bugs injected into a circuit with 320 quantum circuit and P ​ ​and S​ ​are sets of quantum states. The
qubits and 1,758 gates, where all tools we compared with verification question we address is whether the set of output
failed. Additionally, our work bridges quantum program ver- quantum states obtained by running ​C​on all states from ​P​
ification and automata, opening new possibilities to exploit is equal to (or included in) the set S ​ ​.
the richness of automata theory in quantum computing. We create a framework for analyzing the considered
class of properties based on (finite) tree automata (TAs).13
1. INTRODUCTION Languages of TAs are sets of trees; in our case, we consid-
Quantum computing holds the potential to solve complex er TAs whose languages contain perfect binary trees with
problems that classical computers struggle to address. the height being the number of qubits in the circuit. Each
Since its inception, quantum algorithms have showcased branch (a path from a root to a leaf) in such a tree corre-
capabilities beyond classical limits, exemplified by Shor’s sponds to one computational basis state (for example, |​ 0000⟩​​
factoring algorithm and Grover’s search algorithm—both or ​|​ 0101⟩​​for a four-qubit circuit), and the corresponding
of which can solve important computational problems leaf represents the complex amplitude of the state (we use an
provably faster than classical methods.16,20 For many years, algebraic encoding of complex numbers by tuples of inte-
the lack of hardware posed a barrier to implementing these gers to have a precise representation and avoid possible in-
algorithms, but recent advancements—including Google’s accuracies when dealing with floating-point numbers; this
2019 demonstration of a quantum computational advan- encoding is sufficient for a wide variety of quantum gates,
tage on a programmable superconducting processor—have including the Clifford+​T​universal set for quantum comput-
marked significant milestones, though their results remain ing). Sets of such trees can be, in many cases, encoded com-
the subject of ongoing debate.3 pactly using TAs (for example, storing the output of Bern-
As quantum computing moves toward practical applica- stein-Vazirani’s algorithm4 over n ​ ​qubits requires a vector of​
tions, programming languages and systems for quantum 2​​n​ complex numbers), but can be encoded by a linear-sized
computers are under active development to meet growing TA. For each quantum gate, we construct a transformation
demands. Efficient algorithms have begun to emerge in that converts the input states TA to a TA representing the
fields such as optimization, machine learning (ML), and gate’s output states, in a similar way as classical program
quantum chemistry, creating a pressing need for reliable transformations are represented in.14
Testing equivalence and inclusion between the TA rep-
resenting the set of outputs of a circuit and the postcondi-
The original version of this paper was published in
tion ​S​(from ​{​P}​ ​C {​ ​S}​ ​) can then be done by standard TA lan-
Proceedings of the ACM on Programming Languages 7,
guage inclusion/equivalence testing algorithms.13,18 If the
June 2023.
test fails, the framework generates a witness for diagnosis.

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 85
research highlights

One application of our framework is as a quick underap- Figure 1. Constructing the Bell state.
proximation of a quantum circuit non-equivalence test. Our
x x
approach can switch to a lightweight specification when q −→
1
(q1, q0) q −→
1
(q0, q1)
equivalence checkers fail due to insufficient resources and x x 0
q0 −→
2
(q2, q2) q2 −0→ () q1 −→
2
(q2, q3) q2 ()
still find bugs in the design. Quantum circuit (non-)equiva-
x2 x2 1/√2
lence testing is an essential part of the quantum computing q1 −→ (q3, q2) 1
q3 −→ () q0 −→ (q3, q2) q3 ()
toolkit. Its prominent use is in verifying results of circuit 1
optimization, which is a necessary part of quantum circuit (a) The TA of│00⟩. (b) The TA for √2 (│00⟩+│11⟩).
compilation to achieve the expected fidelity of quantum
algorithms running on real-world quantum computers, │x1⟩ H
which are heavily affected by noise and decoherence. Al-
ready in the world of classical programs, optimizer bugs are │x2⟩
being found on a regular basis in compilers used daily by
(c) The EPR circuit.
tens of thousands of programmers. In the world of quan-
tum, optimization is much harder than in the classical set-
ting, with many opportunities to introduce subtle and hard- In addition to the practical utility, our work also bridges
to-discover bugs into the optimized circuits. It is therefore the gap between quantum and classical verification, particu-
essential to be able to check that an output of an optimizer larly automata-based approaches such as regular model check-
is functionally equivalent to its input. ing.5 As far as we know, our approach to the verification of
Testing quantum circuit (non-)equivalence is, however, quantum circuits is the first one based on automata. The en-
a challenging task (QMA-complete).17 Due to its complexity, abling techniques and concepts involved in this work are, for
approaches that can quickly establish circuit non-equiva- example, the use of TAs to represent sets of quantum states
lence are highly desirable to be used, for example, as a pre- and express the pre- and post-conditions, the compactness
liminary check before a more heavyweight procedure (see of the TA structure enabling efficient gate operations, and
Burgholzer and Wille7 or Chen et al.8) is used. One currently our TA transformation algorithms enabling the execution of
employed fast non-equivalence check is to use random quantum gates over TAs. We believe the connection of autom-
stimuli generation.6 Finding subtle bugs by random testing ata theory with the quantum world we establish can start new
is, however, challenging—with no guarantees due to the im- fruitful collaborations between the two rich fields.
mense (in general uncountable) underlying state space. Overview. We use a concrete example to demonstrate how
Our approach can be used as follows: We start with a to use our approach. Assume we want to design a circuit con-
TA encoding the set of possible input states (created by the structing the Bell state, that is, a 2-qubit circuit converting a
user or automatically) and run our analysis of the circuit basis state ​|00⟩​to a maximally entangled state _​​√1_2 ​(​ |​ ​00​⟩​ + |​ 11⟩​)​​.
over it, obtaining a TA ​𝒜​representing the set of all outputs. We first prepare TAs corresponding to the precondition
Then, we take the optimized circuit, run it over the same TA (Figure 1a) and postcondition (Figure 1b). Both TAs use ​q​ as
with inputs and obtain a TA 𝒜 ​ ′​. Finally, we check whether the root state and accept only one tree. One can see the cor-
​ℒ​(​𝒜​)​ = ℒ​​(​𝒜′​)​​​. If the equality does not hold, we can con- respondence between quantum states and TAs by travers-
clude that the circuits are not functionally equivalent (if the ing their structure. The precise definition will be given in
equality holds, there can, however, still be some bug that Sections 2 and 3. Our approach will then use the transform-
does not manifest in the set of output states). ers from Sections 4 to 6 to construct a TA ​𝒜​recognizing the
We implemented our technique in a prototype called Au- quantum states after executing the EPR circuit (Figure 1c)
toQ and evaluated it over a wide range of quantum circuits, from the precondition TA (Figure 1a). We will then use TA lan-
including some prominent quantum algorithms, randomly guage inclusion/equivalence tool VATA18 to check ​𝒜​ against
generated circuits, reversible circuits from RevLib,23 and the postcondition TA. If the circuit is buggy, our approach
benchmarks from the tool Feynman.2 The results show that will return a witness quantum state that is reachable from
our approach is quite scalable. We did not find any tool with the precondition but not allowed by the postcondition. From
the same functionality as ours and hence pick the closest our experience of preparing benchmark examples, in many
state-of-the art tools—a circuit simulator SliQSim21 and cir- cases, this approach helps us find bugs in incorrect designs.
cuit equivalence checkers Feynman2 (based on path-sum),
Qcec7 (combining ZX-calculus, decision diagrams, and ran- 2. PRELIMINARIES
dom stimuli generation), and SliQEC22—as the baseline tools By default, we work with vectors and matrices over complex
to compare with. In the first experiments, we evaluated Au- numbers ​ℂ​. In particular, we use ​ℂ​m×n​to denote the set of all​
toQ’s capability in verification against pre- and post-condi- m × n​complex matrices. Given a ​k × ℓ​ matrix ​​​(​​ ​a​xy)​ ​​​​, its trans-
tions. We managed to verify the functional correctness (with pose is the ​ℓ × k​ matrix ​​​(​​ ​a​xy)​ ​​​​ T​ = ​​(​​ ​a​yx)​ ​​​​ obtained by flipping ​​​(​​ ​a​xy)​ ​​​​
regard to one input state) of a circuit implementing Grover’s over its diagonal. A 1 ​ × k​matrix is called a row vector and a​
search algorithm with 40 qubits and 141,527 gates. We then k × 1​matrix is called a column vector. To save vertical space,
evaluated AutoQ on circuits with injected bugs. The results we often write a column vector ​v​using its row transpose ​v​T​​.
confirm our claim—AutoQ was able to find injected bugs We use ​I​to denote the identity matrix of any dimension
in various huge-scale circuits, including one with 320 qubits (which should be clear from the context). The conjugate of
and 1,758 gates, which other tools failed to find. a complex number a ​ + bi​is the complex number a ​ − bi​, and

86 COMMUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


the conjugate transpose of a matrix A ​​ = ​(​​ ​a​xy​)​​​​ is the matrix​ effect of a k​ ​-qubit quantum gate ​U​on the qubits ​x​ℓ​, ... , ​x​ℓ+k−1​​
​A​†​ = ​​(​​ ​c​yx​)​​​​ where ​​c​yx​is the conjugate of ​a​yx​. For example: of an n ​ ​-qubit quantum state represented using a ​2​n​​-dimen-
sional vector ​v​, we proceed as follows. First, we compute an
1 − i 4 + 7i
( 3 0 )
​(​   )

1 + i​ ​ 2 − ​2i​ 3​ ​ ​ = ​ ​ ​   auxiliary matrix ​U′= ​I​n−​(​ℓ+k−1​)​​ ⊗ U ⊗ ​I​ℓ−1​ where ​I​j​​ denotes the​​
 ​ 2 + 2i​ ​ 0​ ​ ​.​ 
4 − 7i 0 0 2​​j​​-dimensional identity matrix. Note that if ​U​is unitary,
(1)
then U ​ ′​is also unitary. Then, the new quantum state is com-
The inverse of a matrix ​A​is denoted as ​A ​ ​−1​​. A square puted as ​v′= U′ × v​. For instance, let ​n = 2​and ​U​be the Pauli-​X​
matrix ​A​ is unitary if ​​A​†​ = ​A​−1​​. The Kronecker product of gate applied to the qubit ​x​1​​.
​A = ​​(​​ ​a​xy​)​​​  ∈ ​ℂ​k×ℓ​​and B
​∈ ​ℂ​m×n​​is the k​m × ℓn​ matrix ⎛0 0 1 0⎞
⎜ ⎟

⎜ ⎟
​​A ⊗ B = ​(​​ ​a​xy​B​)​​​​. For instance,
X′ = X ⊗ I = ​(​ ​ ​)​ ⊗ ​(​ ​ ​)​ = ​ ​0​ 0​ 0​ 1​ ​,​ 
 ​ 0 1 1 0
1 0 0 1 1 0 0 0
⎛ ​12 ​ 1
_ ​12 ​ 1 ⎞
_ ⎝0 1 0 0⎠ (3)
(− _​2 ​ 0) (− _​21 ​ 0)

⎜ ⎟⎜ ⎟
⎛0 0 1 0⎞ ⎛ 00⎞ ⎛ 10⎞
(​1 + i​)​ · ​ ​ ​ ​ ​ ​ 3 · ​ ​ ​ ​ ​ ​

⎜ ⎟
​12 ​ 1
_ 1 ​c​ ​ ​c​ ​
4 − 7i 0) (− _ ​12 ​ 0)
​(​ 1+i 3
​ ​ ​ ​⊗ ​ ​ ​ ​ ​ ​ ​ = ​ ​    ​​ ​​ 0 0 0 1 ​c​01​ ​c​ ​
​21 ​ 1
_ ​12 ​ 1
_  ​ v′ = X′ × v = ​ ​ ​ ​ ​ ​ ​ × ​ ​​c​ ​​ ​ = ​ ​​c​11​​ ​​ 
(− _​21 ​ 0) (− _​12 ​ 0)⎠
1 0 0 0
⎝0 1 0 0⎠ ⎝​c​ ⎠​ ⎝​c​ ⎠​

⎜ ⎟
(​ ​4 − 7i)​ ​ · ​ ​ ​ ​ ​ ​ 0 · ​ ​ ​ ​ ​ ​ 10 00
⎝ 11 01
(4)
​    
       ​​
⎛ _​1 ​+ _​1 ​i 1 + i _​3 ​ 3⎞
​ ​
2 2 2
​21 ​− _​12 ​i
−_ 0 ​32 ​ 0
−_ Representation of complex numbers. To achieve accuracy
​ ​ = ​ ​  
   ​​ ​ ​​ ​ ​ ​ ​. with no loss of precision, in this paper, when working with​
2−_ ​27 ​i 4 − 7i 0 0 ℂ​, we consider only a subset of complex numbers that can be
⎝− 2 + _​2 ​i 0⎠
7 expressed by the following algebraic encoding proposed in
0 0 (2)
Zulehner and Wille26 (and also used in Tsai et al.21):
​(_
​ 1_ ​)​ ​(​a + bω + c ​ω​2​ + d ​ω​3​)​,​ 
k
 ​ (5)
2.1. Quantum circuits. ​√ 2 ​
i_
where ​a, b, c, d, k ∈ ℤ​and ​ω = ​e​​ ​​, the unit vector that makes
π
4

Quantum states. In a quantum system with n ​ ​qubits, the an angle of ​4 ​ 5​​∘​​with the positive real axis in the complex
qubits can be entangled, and the system’s quantum state plane. A complex number is then represented by a five-tuple
can be a quantum superposition of computational basis (​ ​a, b, c, d, k​)​​. Although the considered set of complex num-
states ​{​ ​| j⟩​ ​|​ j ∈ ​{​0, 1​}​n​}​​. For instance, given a system with bers is only a small subset of ℂ ​ ​(it is countable, while the set​
three qubits ​​x​1​​, ​​x​2​​, and ​​x​3​​, the computational basis state​​ ℂ​is uncountable), the subset is already sufficient to describe
|011⟩​​denotes a state where qubit ​​x​ ​​is set to 0 and qubits​​ a set of quantum gates that can implement universal quan-
1
x​2​ and ​x​3​ are set to 1. The superposition is then denoted in tum computation (cf. Section 4 for more details). The alge-
the Dirac notation as a formal sum ​∑ ​ j∈​​{​0,1​}​ ​​ ​a​j​ · ​| j⟩​​, where braic representation also allows efficient encoding of some
n

​a​0​, ​a​1​, ... , ​a​​2​ ​−1​ ∈ ℂ​ are complex amplitudesa satisfying the prop- operations. For example, because ​​ω​4​ = − 1​, the multiplica-
n
2
erty that ​∑ j∈​​{​0,1​}​ ​​ ​|​​a​j​|​ ​ = 1​. Intuitively, ​|​a​j​​|​2​ is the probability tion of (​ ​a, b, c, d, k​)​ by ω
n ​ ​can be carried out by a simple right
that when we measure the state in the computational basis, circular shift of the first four entries and then taking the
we obtain the state ​| j⟩​; these probabilities need to sum up to 1 opposite number for the first entry, namely ​​​(​​ − d, a, b, c, k​)​​​​,
for all computational basis states. We note that the quantum which represents the complex number (​​_ ​1_2 ​)​ ​k (​​​ ​​ − d + aω +
​√
state can alternatively be represented by a ​​2​n​​-dimensional b ​ω​2​ + c ​ω​3)​ ​​​​. In the rest of the paper, we use 0 ​ ​ and ​1​to denote the
column vectorb ​​(​​​a​0​, ... , ​a​​2​ ​−1​)​​T​or by a pseudo-Boolean function​ tuples for zero and one—that is, ​(​0, 0, 0, 0, 0​)​and ​(​1, 0, 0, 0, 0​)​​,
n

T : ​{​0, 1​}​n​ → ℂ​, where T ​ ​( ​j​)​ = ​a​j​for all ​j ∈ ​{​0, 1​}​n​. In the follow- respectively. Using such an encoding, we represent quan-
ing, we will work mainly with the function representation, tum states by functions of the form ​T : {​ ​0, 1​}​n​ → ​ℤ​5​​.
which we will see as a mapping from the domain of assign-
ments to Boolean variables (corresponding to qubits) to ​ℂ​. Qubit measurement. After executing a quantum cir-
For instance, the quantum state ​ _
​1_2 ​​ · |​ 00⟩​ + _ ​1_2 ​​ · |​ 01⟩​​ can cuit, one can measure the final quantum state in the
​√ ​√
_
1_ _
be represented by the vector ​(​ ​​√​ 2 ​​, ​√​ 2 ​​, 0, 0​)​ ​​or the function computational basis. The probability that the qubit ​x​​j​​
1_ T

​T = ​{​00 ↦ _ ​​√1_2 ​,​ 01 ↦ _


​​√1_2 ​,​ 10 ↦ 0, 11 ↦ 0​}​​. of a quantum state ​∑ ​ i∈​​{​0,1​}​ ​​ ​a​i​ · |​ i⟩​​is measured as the
n

basis state |​​0⟩​​can be computed from the amplitude:


Quantum gates. Operations in quantum circuits are repre- ​Prob​​[​​x​j​ = |​ 0⟩​ = ​∑ i∈​​{​0,1​}​ ​×​{​0​}​×​{​0,1​}​ ​​ |​ ​​a​i|​ ​2​.​When ​x​j​collapses to ​|0⟩​​ n−j j−1

sented using quantum gates. A ​k​-qubit quantum gate (that after the measurement, amplitudes of states with ​x​ ​j​ = |​ 1⟩​​
is, a quantum gate with k​ ​inputs and k​ ​outputs) can be de- become 0 and amplitudes of states with ​x​j​ = |​ 0⟩​are normal-
scribed using a ​2​k​ × ​2​k​unitary matrix. When computing the ized using ​_ 1
_______________ ​​.

​ Prob​
   [ ]
​​x​​ = ​|0⟩​ ​
j

a We abuse notation and sometimes identify a binary string with its (un- 2.2. Tree automata.
signed) integer value in the most significant bit first encoding, for example
the string 0101 with the number 5.
b Observe that to satisfy the requirement for the amplitudes of quantum Binary trees. We use a ranked alphabet Σ ​ ​with binary symbols​
states, it must be a unit vector. f​, g​ ​, ... and constant symbols ​c​1​​, ​c​ ​2​, .... A binary tree is a ground

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 87
research highlights

term over Σ ​​. For instance, ​ ​T = f​( ​f​(​​c​1​, ​c​2​)​, g​​( ​f ​(​​c​2​, ​c​3​)​, ​c​1​)​)​​, Among others, ​𝒜​bas 3​ ​​accepts the previous tree (Ex. 1 figure,
shown below, represents a binary tree. The set of nodes of left) with the run (Ex. 1 figure, right). Observe that all tree
a binary tree ​T​, denoted as ​​N​T​​, is defined inductively as a nodes satisfy the requirement of a valid run. For example,
set of words over ​{​0, 1​}​ such that for every constant symbol the ​node the transition ​q​0​⟶ ​ 0 (​ )​ ​, ​01​ to ​
​c​, we define ​N​c​ = {​ ​ϵ}​ ​, and for every binary symbol f​​, we de- 2 x3
​​ 1 ​ 11​ corresponds​ ​to
x 1​
​​ (​​​ ​​ ​q​0​, ​q​0)​ ​​​​, and ​ϵ​ to ​q⟶
q​0⟶ ​ (​ ​​q​1​, ​q​0)​ ​, and so on.
1 1
fine ​N ​ ​a . w |​ ​ a ∈ {​ ​0, 1​}​ ∧ w ∈ ​N​​T​ ​}
​ ​f​(​T​ ​,​T​ ​)​​ = {​ ​ϵ}​ ​ ∪ ​{ , where ​ϵ​ is
​​ ​​​ a

In ​𝒜 bas
0 1

the empty word and ‘.’ is concatenation. Each binary tree ​T​ ​3​ ​, we use states named ​q ​n0​​to denote only subtrees
is associated with a labeling function ​L​T​: {​ ​0, 1​}​*​ → Σ​, which with all zeros (​0​) in leaves that can be generated from here,
maps a node in ​T​to its label in Σ ​ ​. and states named ​q​n1​​ to denote only subtrees with a single 1 ​​
in the leaves that can be generated from it. Intuitively, the
f TA accepts all trees of the height 3 with exactly one ​1​leaf
and all other leaves ​0​(in our encoding of quantum states,
this might correspond to saying that ​𝒜​bas 3​ ​​encodes an arbi-
f g
trary computational basis state of a three-qubit system).

c1 c2 f c1
3. ENCODING SETS OF QUANTUM
STATES WITH TREE AUTOMATA
c2 c3 Observe that we can use (perfect) binary trees to encode
functions ​{​0, 1​}​n​ → ​ℤ​5​, for example, the function represen-
Tree automata. We focus on tree automata on binary trees tation of quantum states. For instance, the tree given by the
and refer the interested reader to Comon13 for a general term
definition. A (nondeterministic finite) tree automaton (TA)
​x​1​​​(​​ ​x​2​​​(​​ ​x​3​(​1, 0​)​, ​x​3​(​0, 0​)​)​​​, ​x​2​​​(​​ ​x​3​(​0, 0​)​, ​x​3​(​0, 0​)​)​​​​)​​​​ 
 ​ (6)
is a tuple ​𝒜 = 〈Q, Σ, Δ, ℛ〉​where Q ​ ​is a finite set of states,​
Σ​is a ranked alphabet, ℛ ​ ⊆ Q​is the set of root states, and​ encodes the function T ​ ​where ​T​(​000​)​ = 1​​and ​T​(​i ​)​ = 0​​ for all​
Δ = ​Δ​int​ ∪ ​Δ​leaf​is a set of tree transitions consisting of the set​ i ∈ {​ ​0, 1​}​3​\​{​000​}​​. Since TAs can concisely represent sets of
f
Δ​int​​ of internal transitions of the form q ​→ ​ (​​​ ​​ ​q​0​, ​q​1)​ ​​​​ (for a binary binary trees, they can also encode sets of quantum states.
symbol ​f ​) and the set ​Δ​leaf​​ of leaf transitions of the form ​q ⟶ ​ c (​ )​ ​​
(for a constant symbol ​c​), for q ​ , ​q​0​, ​q​1​ ∈ Q​. We can conve- Example 2 (Concise representation by TAs).
niently describe TAs by providing only the set of root states​ Here we consider the set of n ​ ​-qubit quantum states ​S​n​ = ​{|​ i⟩​ ​
ℛ​and the set of transitions Δ ​ ​. The alphabet and states are |​ i ∈ {​ ​0, 1​}​ }​ ​, that is, the set of all basis states. Note that
n

implicitly ​x​1​ defined as​x​1​those that appear in ​Δ​. For example, |​ ​S​n|​​​ ​​ =​​2​n​, which is exponential. Representing all possible basis
​​Δ = ​{​​q⟶ ​ ​​​(​​ ​q​1​, ​q​0​)​​​, q⟶ ​​ 0 (​ )​ ​, ​q​1⟶
​ ​​​(​​ ​q​0​, ​q​1​)​​​, ​q​0⟶ ​​ 1 (​ )​ }​ ​​​​ implies that​​ states naively would require storing ​2​​2​ ​​complex numbers.
n

Σ = ​{​​ ​x​1​, 0, 1​}​​​​ and Q ​​ = ​{​​q, ​q​0​, ​q​1}​ ​​​​. TAs can, however, represent such a set much more efficiently.
For the case when n ​ = 3​, the set ​S​3​​ can be represented by
Run and language. A run of ​𝒜​on a tree ​T​is another tree​
the TA ​𝒜​bas 3​ ​​from Example 1 with ​3n + 1​ transitions (that
ρ​labeled with ​Q​such that (i) ​T​and ​ρ​have the same set of
is, linear-sized to the number of qubits). The TA ​𝒜​bas 3​ ​​can be
nodes, that is, ​N​T​ = ​N​ρ​; (ii) for all leaf nodes u ​ ∈ ​N​T​, we have
​ ​ ​ ​)​​ generalized to encode the set of all n ​ ​-qubit states ​Q​n​ = {​ |​ i⟩​ ​
​ L T ​(​u→
​L​ρ(​ ​u)​ ⎯ (​ )​ ​​​  ∈ Δ​; and (iii) for​ ​ all non-leaf nodes v​ ∈ ​N​T​​, we
(​ ​)​​ |​ i ∈ {​ ​0, 1​}​n}​ ​​ for each ​n ∈ ℕ​ by setting the transitions to
have the transitions ​​L​ρ(​ ​u)​ ⎯ ​ L T ​ ​u→ ​​​(​​​L​ρ(​ ​0 . u​)​, ​L​ρ(​ ​1 . u​)​)​​​  ∈ Δ​. The
run ​ρ​ is accepting if ​L​ρ(​ ​ϵ)​ ​ ∈ ℛ​. The language ​ℒ(​ ​𝒜)​ ​of ​𝒜​is the
set of trees accepted by 𝒜 ​ ​, that is ​ℒ(​ ​𝒜)​ ​ = ​{​T ​|​ there exists an
accepting run​​​o ​ f 𝒜 over T​}​​.

Example 1 (Accepted tree).


Assume a TA ​𝒜​bas
3​ ​​with q
​ ​ as its single root state and the fol- We denote the resulting TA by ​𝒜​n​. Notice that although ​S​n​​
lowing transitions: has ​2​n​​quantum states, ​𝒜​n​​ has only ​2n + 1​ states and ​3n + 1​
x x x
transitions.
q −→
1
(q10, q11) q11 −→
2
(q20, q21) q21 −→
3
(q0 , q1 ) q0 −0→ ()
x1 x2 x3
q −→ (q11, q10) q11 −→ (q21, q20) q21 −→ (q1, q0) q1 −1→ () Formally a TA ​𝒜​recognizing a set of quantum states is
x2
q10 −→ (q20, q20)
x3
q20 −→ (q0, q0)
a tuple ​〈Q, Σ, Δ, ℛ〉​, whose alphabet ​Σ​can be partitioned
into two classes of symbols: binary symbols ​x​ ​1​, ... , ​x​n​​and a
x1 q finite set of leaf symbols ​​Σ​c​ ⊆ ​ℤ​5​​representing all possible
amplitudes of quantum states in terms of computational
x2 x2 q11 q10 bases. By slightly abusing the notation, for a perfect bi-
nary tree T ​ ∈ ℒ​​(​𝒜​)​​, we also use T​ ​to denote the function
x3 x3 x3 x3 q20 q20 q20 ​ ​0, 1​}​n​ → ​ℤ​5​ that maps a computational basis to the corre-
{
1 0 0 0 0 0 0 0 q1 q0 q0 q0 q0 q0 q0 q0 sponding amplitude of T ​ ​’s quantum state. The two mean-
ings of T ​ ​are used interchangeably throughout the paper.

88 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


4. QUANTUM GATES ENCODING Figure 2. Applications of ​X​and ​CNOT​gates and their matrices.
With TAs used to concisely represent sets of quantum states,
the next task is to capture the effects of applying quantum

( )
│x1⟩ X
gates on this representation. When quantum states are rep-
0 1
resented as vectors, gates are represented as matrices and │x2⟩ X =
1 0
gate operations are matrix multiplications. When states are
represented as binary trees, we need a new representation (a) X1 gate applied to qubit x1. (b) Matrix of the X gate.
for quantum gates and their operations. Inspired by the

( )
work of Tsai et al.,21 we introduce symbolic update formulae, 1 0 0 0
which describe how a gate transforms a tree representing a │x1⟩ 0 1 0 0
CNOT=
quantum state. Later, we will lift the tree update operation 0 0 0 1
│x2⟩ 0 0 1 0
to a set of trees encoded in a TA.
We use the algebraic representation of quantum states
(c) CNOT12 gate with target (d) Matrix of the CNOT12 gate.
from Equation 5 also for their symbolic handling. For in- qubit x1 and control qubit x2.
stance, consider a system with qubits ​x​1​​, ​x​ ​2​, and its state
T = ​c​00​ · |​ 00⟩​ + ​c​01​ · |​ 01⟩​ + ​c​10​ · |​ 10⟩​ + ​c​11​ · |​ 11⟩​​ 
 ​ (7)
When we view ​T​as a tree, the operation ​P ​ ​​x​ ​(​ ​T ​)​​ essentially
1

for ​​c​00​, ​c​01​, ​c​10​, ​c​11​ ∈ ​ℤ​ ​​, four complex numbers represented
5
copies the right subtree of every ​x​1​-node to its left subtree,
in the algebraic way. The result of applying the ​X​gate (the and ​​R​_​x​ ​(​​​ ​​ ​P​​x​ (​​ ​T )​ )​ ​​​​makes all leaves in every right subtree of
1
1
quantum version of the N ​ OT​gate) on qubit ​x​ ​1​​(cf. Figure ​P​​x​ ​(​ ​T)​ ​​’s ​​x​1​-node zero. This would give us
2a)—denoted as ​X​1​​—is (​​ ​​c​10​, ​c​11​, ​c​00​, ​c​01)​ ​T​ (cf. Equation 4). In-
1

​T​0​ = ​c​10​ · ​|​00⟩ + ​c​11​ · ​|​01⟩ + 0 · ​|​10⟩ + 0 · ​|​11⟩ ​


tuitively, we observe that the effect of the gate is a permu-  ​​          ​​​  (10)
tation of the computational basis states that swaps the am- ​ = ​c​10​ · ​|​00⟩ + ​c​11​ · ​|​01⟩.​
plitudes of states where the ​x​1​’s value is 1 with states where On the other hand, the last two summands in the right-hand
the ​x​1​’s value is 0 (and the values of qubits other than ​x​1​​ stay side of Equation 8, that is, ​T​1​ = ​c​00​ · ​|10⟩​ + ​c​01​ · ​|11⟩​, could be
the same). Concretely, it swaps the amplitudes of the pairs obtained by manipulating T ​ ​as follows:
(|00⟩, |10⟩) and (|01⟩, |11⟩) to obtain the quantum state
​T​1​ = ​R​​x​ ​​​(​ ​​ ​P​_​x​ ​(​ ​T ​)​)​​​.​ 
 ​ 1 1
(11)
​X​1​(​T​)​ = ​c​10​ · ​|00⟩​ + ​c​11​ · ​|01⟩​ + ​c​00​ · ​|10⟩​ + ​c​01​ · ​|11⟩​.​ 
 ​ (8)
The tree view of ​R​​x​1​​​​(​​ ​P​_​x​1​​(​T ​)​)​​​​ is symmetric to ​​R​_​x​1​​​​(​​ ​P​​x​1​​(​T ​)​)​​​​, which
Instead of executing the quantum gate by performing would give us the following state:
a matrix-vector multiplication, we will capture its seman- ​T​1​ = 0 · ​|00⟩​ + 0 · ​|01⟩​ + ​c​00​ · ​|10⟩​ + ​c​01​ · ​|11⟩​
tics symbolically by directly manipulating the tree function​  ​​          ​​​  (12)
   = ​c​00​ · ​|10⟩​ + ​c​01​ · ​|11⟩​.
T : {​ ​0, 1​}​n​ → ​ℤ​5​. For this, we will use the following operators
on ​T​, parameterized by a qubit ​x​t​​ (​t​for “target”): Finally, by summing ​ ​T​0​​and ​ ​T​1​​
, we obtain Equation 8:
​T​ ​ + ​T​ ​ = ​c​10​ · ​|00⟩​ + ​c​11​ · ​|01⟩​ + ​c​00​ · ​|10⟩​ + ​c​01​ · ​|11⟩​.​That is,
0 1

the semantics of the X ​ ​gate could be expressed using the fol-


lowing symbolic formula:
​X​1​(​T ​)​ = ​R​_​x​1​​​​(​​ ​P​​x​1​​(​T ​)​)​​​ + ​R​​x​1​​​​(​​ ​P​_​x​1​​(​T ​)​)​​​.​ 
 ​ (13)
Observe that the sum effectively swaps the left and right
subtrees of each ​x​1​​-node.
For multi-qubit gates, the update formulae get more com-
_ plicated since they involve more than one qubit. Consider,
In the_ previous, _ ​b​t​denotes the complement_of the bit ​b​t​​ (that for example, the “controlled-​NOT​” gate ​CNOT​ct​(see Figure 2c
is, ​0​ = 1​and ​1​ = 0​). Intuitively, ​P​​x​t​​(​T ​)​ and ​P​​x​t ​​(​T ​)​ fix the value for the graphical representation and Figure 2d for its seman-
of qubit ​​x​t​​to be 1 and 0 respectively (in the tree represen- tics). The ​CNOT​ct​ gate uses ​x​t​and ​x​c​as the target and control
tation of the function, ​P ​ ​​x​t​(​ ​T )​ ​​corresponds to removing the qubit respectively. Intuitively, it “flips” the target qubit’s val-
0-subtree of every ​x​t​ node and substituting it with a copy of ue when the control qubit’s value is 1 and keeps the original
the 1-subtree of the node; vice versa for ​P​_​x​t (​​ ​T )​ ​). On the other value if it is 0. Similarly, as for the ​X​1​ gate, we can deduce a
hand, ​R​x​​ t(​​ ​T )​ ​returns a function that maps any computation- symbolic formula for the update done by a ​CNOT​ct​​ gate:
al basis state with ​x​ ​t​ = 1​to the original value in T ​ ​and the
​CNOT​ct​(​T ​)​ = ​R​_​x​c​(​ ​T ​)​ + ​R​​x​c​​​(​ ​​ ​R​_​x​ ​​​(​ ​​ ​P​x​ ​ ​(​ ​T ​)​)​​​ + ​R​​x​ ​​​(​ ​​ ​P​x​_​ ​(​ ​T ​)​)​​​​)​​​.​
other basis states to 0 (or the other way around for ​R​_​x​t ​(​ ​T )​ ​​). t t t t

Equipped with the operators, we can now proceed to ex- The sum consists of the following two summands:
press the semantics of ​X​1​symbolically. Let us first look at the ˲T
 he summand ​R​_​x​c​​(​T ​)​says that when the control qubit is
first two summands on the right-hand side of Equation 8:​​ 0, the values of all qubits stay the same.
T​​0​ = ​c​10​ · |​ 00⟩​ + ​c​11​ · |​ 01⟩​. These summands can be obtained ˲
The summand ​R ​ ​​x​c​​​(​ ​​ ​R​_​x​t ​​​(​ ​​ ​P​​x​t​(​ ​T ​)​)​​​ + ​R​​x​t​​​(​ ​​ ​P​_​x​t​(​ ​T ​)​)​​​​)​​​​ handles the
by manipulating the input function ​T​in the following way: case when ​x​c​ is 1. In such a case, we apply the ​X​ gate on​​
x​t​ (observe that the inner term is the update formula of​
 ​ ​T​0​ = ​R​​x_​ ​​​(​ ​​ ​P​​x​ (​​ ​T)​ ​)​​​.​  (9)
1 1 X​t​in Equation 13).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 89
research highlights

Table 1. Symbolic update formulae for common quantum gates: ​


In Chen et al.,10 we extend the work to allow using variables
x​c​and ​x​​΄​c​​​denote control qubits (if they exist), ​xt​​denotes the target as leave symbols and this set becomes expressible.
qubit, and ​T​ denotes a pseudo-Boolean function encoding of the In the next two sections, we discuss how to lift the tree up-
quantum state. date operation to a set of trees encoded in a TA. Our frame-
Gate Update work allows different instantiations. We will introduce two
​X​t​ ​R​x​ t​​​​​(​​ ​P_​x​​ t​ ​​​(​T ​)​)​​​ + ​R_​x​ t​​​​​(​​ ​P​x​ t​​​(​T​)​)​​​​
in this paper, namely the (i) permutation-based (Section 5)
and (ii) composition-based (Section 6) approaches. The former
​Y​t​ ​ω​2​· ​​(​​ ​R​x​ t​​​​​(​​ ​P​_​x​ t​​​(​T​)​)​​​ − ​R​_
x​ t​​​​​(​​ ​P​x​ t​​​(​T​)​)​​​​)​​​​ is simple, efficient, and works for all but the ​H​t​​, R ​ x ​(​_π​2​)​t​​, and​
​Z​t​ ​R​_​x​ t​​​(​ ​T​)​− ​R​x​ t​​​(​T​)​
Ry ​(​_π​2 ​)​t​gates from Table 1 (those whose effect is a permutation
_ of tree leaves, that is, for gates whose matrix contains only
​H​t​ ​​​(​​ ​P_​x​ t​(​​ ​T)​ ​+ ​R​x_​ t​​​​​(​​ ​P​x​ t​(​​ ​T)​ ​)​​​ − ​R​x​ t​(​​ ​T)​ ​)​​​ / ​√ 2 ​ one non-zero element in each row, potentially with a constant

​S​t​ ​R_​x​ t​​​(​T​)​+ ​ω​2​· ​R​x​ t​​​(​T​)​ scaling of amplitude), while the latter supports all gates in the
table but is less efficient. The two approaches are compatible
​T​t​ ​R​_​x​ t​​​(​T​)​+ ω · ​R​x​ t​​​(​T​)​ with each other, so one can, for example, choose to use the
​Rx (​ ​_2π​)​ ​t​ ​​​(​​T − ​ω​2​· ​​(​​ ​R​x​ t​​​​​(​​ ​P_​x​ t​(​​ ​T)​ ​)​​​ + ​R_​x​ t​​​​​(​​ ​P​x​ t​(​​ ​T)​ ​)​​​​)​​​​)​​​ / ​√ 2 ​
_ permutation-based approach by default and for unsupported
​ gates fall back on the composition-based approach.
_
​Ry ​(​_2π​​)​t​ ​​​(​​ ​P_​x​ t​​​(​T​)​+ ​R​x​ t​​​(​T​)​− ​R_​x​ t​​​​​(​​ ​P​x​ t​​​(​T​)​)​​​​)​​​  / ​√ 2 ​

​CNOT​ct​ ​R_​x​ c​ ​​(​T​)​+ ​R​x​ c​ ​​​​(​​ ​R_​x​ t​​​​​(​​ ​P​x​ t​​​(​T​)​)​​​ + ​R​x​ t​​​​​(​​ ​P_​x​ t​​​(​T​)​)​​​​)​​​​ 5. PERMUTATION-BASED
ENCODING OF QUANTUM GATES
​CZ​ct​ ​R​_​x​ c​ ​​(​T​)​+ ​R​x​ c​ ​​​​(​​ ​R​_​x​ t​​​(​T​)​− ​R​x​ t​​​(​T​)​)​​​​ Let us first look at the simplest gate ​​X​t​(​T​)​ = ​R​​x​t​​​​(​​ ​P​_​x​t​​(​T ​)​)​​​ +​
​Toffoli​c,​
c​´​
​R_​x​ c​ ​​(​T​)​+ ​R​x​ c​ ​​​​(​​ ​R_ R​_​x​t​(​​​ ​​ ​P​​x​t​(​ ​T)​ )​ ​​​​. Recall that in Section 4, we showed that the for-
t​ ​ ​x​ ​c​ ​ ​​​(​T​)​+ ​R​x​ ​c​ ​ ​​​​​(​​ ​R​x​ t​​​​​(​​ ​P​x​ t​​​(​T​)​)​​​ + ​R​x​ t​​​​​(​​ ​P​x​ t​​​(​T​)​)​​​​)​​​​)​​​​
_ _
´ ´

mula essentially swaps the left and right subtrees of each


​x​t​-labeled node. For a TA 𝒜 ​ ​, we can capture the effect of ap-
One can obtain symbolic update formulae for other plying ​X​t​ to all states in ​ℒ​(​𝒜​)​ by swapping the ​ ​ ​ left and the
quantum gates in a similar way. In Table 1, we give the for- right children of all ​x​t ​-labeled transitions q ​⟶ ​ xt ​​​(​​ ​q​0​, ​q​1​)​​​​, that
​ ​​
mulae for the gates supported by our framework. is, update them to q ​ xt ​(​​q​1​, ​q​0​)​. We use ​X​t​(​𝒜​)​ to denote the
​⟶
For a gate ​G​, we use the superscripts ​c​and ​​c´​ ​​to denote TA constructed following this procedure.
that ​x​c​and ​x´​c​ ​are the gate’s control qubits (if they exist) and
the subscript t​ ​to denote that ​x​t​is the target bit (for example,​ Theorem 2
G​tc,​​ c​´​​). We note that the supported set of gates is much larger ​ℒ(​ ​​X​t ​(​𝒜​))​ ​ = ​{​​X​t ​(​T​)​ ​|​ T ∈ ℒ​​(​𝒜​)​}​​.
than is required to achieve (approximate) universal quan-
tum computation (for which it suffices to have, for example, The update formulae of gates ​​Z​t​​, ​​S​t​​, and ​​T​t​​are all in the
(i) Clifford gates (​H​, ​S​, and ​CNOT​) and ​T​or (ii) ​Toffoli​and ​H​. ​ ​0​, ​a​1​ ∈ ℂ​. Intuitively, the for-
form ​​a​0​ · ​R​​x_​t ​(​ ​T​)​ + ​a​1​ · ​R​​x​t​(​ ​T​)​​for ​a
The sum consists of the following two summands: Algorithm 1: Algorithm for constructing Ut (A), for
Theorem 1 Algorithm
Ut ∈ {X1. Algorithm for constructing ​U​t​(​𝓐​)​, for ​U​t​∈ ​{​​X​t ​, ​Y​t​, ​Z​t​, ​S​t​, ​T​t​}​
t , Yt , Zt , St , Tt }
• The summand Rxc (T ) says that when the control qubit
The symbolic
is 0, the update formulae
values of in Table
all qubits 1 are
stay the correct with
same. Input: A TA A = ⟨Q, Σ, ∆, R⟩ and a gate Ut
regard to the standard semantics of quantum gates. Output: The TA Ut (A)
• The summand Rxc (Rxt (Pxt (T ))+Rxt (Pxt (T ))) handles
1 if Ut ∈ {Yt , Zt , St , Tt } then // need scaling
the case when xc is 1. In such a case, we apply the
2 assume Ut (T ) = a0 · Rxt (T0 ) + a1 · Rxt (T1 );
X gate on xt (observe that the inner term is the update
4.1. A note on of
formula expressivity.
Xt in Equation (13)).
3 let A′ = ⟨Q′ , Σ, ∆′ , R′ ⟩ be the primed copy of A;
The expressivity of our framework is affected by the follow- 4 A1 := ⟨Q′ , Σ, ∆1 , R′ ⟩, where
One can obtain symbolic update formulae for other quan- ∆1 = ∆′int ∪ {q ′ −−→
1 a ·c c
ing factors: 5 () | q −
→ () ∈ ∆leaf };
tum gates in a similar way. In Table 1 we give the formulae
framework. (​ ​a, b, c, d, k​)​: This
′ ◦
Algebraic
for the gates complex
supported number representation
by our
6
h
A := ⟨Q ∪ Q , Σ, ∆ ∪ ∆1 , R⟩, where
representation can arbitrarily
For a gate G, we use the superscripts closely approximate
c and c′ toany com-
denote ∆◦ = {s −−→
0 a ·c
() | s −
→ () ∈ ∆leaf } ∪
c

plex
thatnumber:
xc and xwhen

c are ​
b
the = d = 0​
gate’s , the
control quintuple
qubits (if (
​ a
​ ,
they 0, c, 0, k​
exist) ) ​​ and
rep- k x k x
_ {s −→ (p, q) | s −→ (p, q) ∈ ∆int ∧ k ̸= t} ∪
resents the number
the subscript ​ 1_2 ​ ​(​ ​a +that
t to denote
​√ kic​)​Then anytarget
xt is the complex bit number
(e.g., Gc,c can

).
7
x x
t
beWeapproximated
note that the arbitrarily
supported closely
set ofbygates
picking suitable
is much larger ​a​, ​c​,
k
{s −→ (p, q ′ ) | s −→
k
(p, q) ∈ ∆int ∧ k = t}
and ​k​. is required to achieve (approximate) universal quantum
than
Supported quantum
computation (for which gates: We covered
it suffices to have,alle.g., standard quan-
(i) Clifford 8 else Ah := A; // when Ut = Xt
tumgates (H, supported
gates S, and CNOT) and T or
in modern (ii) Toffoli
quantum and H. ex-
computers 9 assume Ah = ⟨Qh , Σ, ∆h , R⟩;
cept Theorem
parameterized 1. The rotation symbolic update gate. formulae
From Solovay-Kitaevin Table 1 are 10 if Ut ∈ {Xt , Yt } then // need swapping
theorem, 15
gatesthe
correct (w.r.t. performing
standard semantics ​_​2π​ ​​ can begates.
rotationsofbyquantum approxi- k 11 AR := ⟨Qh , Σ, ∆R , R⟩, where
mated with an error rate ​ϵ​with ​𝒪(​ ​​log​​3.97​(​_​Є1 ​))​ ​-many gates we ∆R = {s −→
k x k
(p, q) | q −→
x
(p, q) ∈ ∆hint ∧ k ̸= t} ∪
support.
A note on expressivity. 12 k
{s −→
x k
(q, p) | s −→
x
(p, q) ∈ ∆hint ∧ k = t} ∪
Tree automata structure: We use non-deterministic tran-
The expressivity of our framework is affected by the fol- c c
sitions {s − → () ∈ ∆hleaf }
→ () | s −
lowingoffactors:
tree automata to represent a set of trees compact-
ly. Nevertheless, we can currently encode only a finite set of 13 else AR := Ah ;
1. Algebraic complex number representation (a, b, c, d, k):
states. Hence, we cannot express, for example, all quantum 14 return AR ;
This representation can arbitrarily closely approximate
states where the amplitudes
any complex number: of ​|10b⟩​ and
when =d= ​|01 ⟩ are the same.
0, ​the quintuple
(a, 0, c, 0, k) represents the number √1 k (a + ic) Then
90
2
COMM UNICATIO NS O F THE ACM | J U NE 2025 | VO L . 68 | NO. 6
any complex number can be approximated arbitrarily 5. PERMUTATION-BASED ENCODING OF
closely by picking suitable a, c, and k. QUANTUM GATES
mulae scale the left and right subtrees of ​T​with scalars ​a​0​​ and​​ from Table 1—projection ​P​​x​k​(​ ​T​)​, restriction ​R​​x​k​(​ ​T​)​​, multipli-
a​1​, respectively. The construction of the TA representing the cation, and binary operation ± ​ ​—and then compose them to
result of the operation (given in Algorithm 1) can be done by get the desired gate semantics. The update formulae in Ta-
(1) making one primed copy ​𝒜​1​of ​𝒜​(that is, a copy where ev- ble1 are always in the form of ​𝗍𝖾𝗋𝗆​1​ ± ​𝗍𝖾𝗋𝗆​2​. For example, for
ery state ​q​is renamed to ​q​´​, with properly modified transitions the ​X​t​ gate, ​𝗍𝖾𝗋𝗆​1​ = ​R​​x​t​​(​​P​_​x​t​(​ ​T)​ ​)​ and ​𝗍𝖾𝗋𝗆​2​ = ​R​_​x​t​​(​​P​​x​t​(​ ​T)​ ​)​​. Our
and root states) whose leaf labels are multiplied with ​a​1​​ (Lines idea is to first construct TAs ​​𝒜​​𝗍𝖾𝗋𝗆​1​​​and ​​𝒜​​𝗍𝖾𝗋𝗆​ ​​​, recognizing 2

3–5); (2) multiplying all leaf labels of ​𝒜​with​ ​ ​ ​a​0​(Lines 6–7);​ ​ ​ and quantum states of ​𝗍𝖾𝗋𝗆​1​ and ​𝗍𝖾𝗋𝗆​2​, and then combine them
​ x t ​( ​p, q​)​to s​ ⟶
(3) updating all ​x​t ​-labeled transitions ​s⟶ ​ x t ​(​p, ​q​´​)​​; using binary operation ​±​to produce a TA recognizing the
that is, for the right child, jump to the primed version (Line quantum states of ​𝗍𝖾𝗋𝗆​1​ ± ​𝗍𝖾𝗋𝗆​2​. The TAs ​𝒜​​𝗍𝖾𝗋𝗆​ ​​,​ ​​𝒜​​𝗍𝖾𝗋𝗆​ ​​​ would 1 2

7). The case of ​Y​t​is similar, but we need both constant scaling be constructed using TA versions of basic operations. We re-
(Lines 1–7) and swapping (Lines 10–12) (the left-hand side and fer the reader to the original paper11 for further details of the
right-hand side scalars being ​ω​2​and ​−​ω​2​​, respectively). concrete TA constructions.

Theorem 3 7. EXPERIMENTAL EVALUATION


​ℒ(​ ​​U​t ​(​𝒜​))​ ​ = ​{​​U​t ​(​T​)​ ​|​ T ∈ ℒ​​(​𝒜​)​}​​, for ​U​t​ ∈ ​{​​Y​t​, ​Z​t​, ​S​t​, ​T​t​}​​. We implemented the proposed TA-based algorithm as a pro-
totype tool named AutoQ in C++. We provide two settings:
The cases of multi-qubit gates ​CNOT​ct​​, ​C
​ Z​ct​, and ​Toffoli​c,​
t​ ​​
c​´​
Hybrid, which uses the permutation-based approach (Sec-
can be handled when ​t​is the lowest of the three qubits, that tion 5) to handle supported gates and switches to the com-
is, ​c < t ∧ ​c​´<
​ t​. We can assume w.l.o.g. that c​ < ​c​´​. Output of position-based approach for the other gates, and Composi-
these gates can be constructed recursively following Algo- tion, which handles all gates using the composition-based
rithm 2. Let us look at the corresponding update formulae: approach (Section 6). Further details are available in the
original paper.11 We evaluated AutoQ in two use cases.

7.1. Circuit verification.


In the first experiment, we compared how fast AutoQ com-
putes the set of output quantum states and checks whether
the set satisfies a given post-condition. We compared against
the simulator SliQSim21 in the setting when we ran it over all
We first construct the TA of the inner term, the shad- quantum states encoded in the pre-condition of the quan-
ed area, which are TAs for ​​X​t​​, ​​Z​t​​, or ​​CNOT​​ct​​​​​. We call it the tum algorithm and accumulated the times. We also include
'

primed version here (cf. ​𝒜´1​ ​ ​at Line 4). We then update all ​x​c​ the time taken by Feynman2 to check the equivalence of the
​x​ ​ ​x​ ​
-labeled transitions ​s ⟶ ​ c ​(​p, q​)​to ​s ⟶ ​ c ​(​p, ​q​´​)​, that is, jump to circuits with themselves. Although checking equivalence of
the primed version in the right subtree. quantum circuits is a harder problem than what we are solv-
ing (so the results cannot be used for direct comparison with
Theorem 4 AutoQ), we include these results in order to give an idea about
​ℒ​(​U​(​𝒜​)​)​ = ​{​U​(​T​)​ ​|​ T ∈ ℒ​(​𝒜​)​}​​, for ​U ∈ ​{​​CNOT​ct​, ​CZ​ct​, ​ the hardness of the circuits for path-sum-based approaches.
Toffoli​c,​t​ ​}​​. We ran this experiment on a benchmark composed of
c​´​

well-known quantum circuits, where the semantics was


Algorithm
Algorithm 2. Algorithm
2: Algorithm for constructing ​U(​ ​𝓐)​ ​​, for ​​ U(A), for
for constructing known to us so that
Composition, which wehandles
could allconstruct
gates usingTAsthe with pre- and
composition-
U∈ {
​ ​C
​ NOT​ c
​, C
​ Z​c
​, T
​ offoli​ ​ }

c,​cc​​
​ c,c′ post-conditions.
based approach These
(Section were
6). the
Further following:
details BV,
are MCTof-in
available

c
U ∈ {CNOT t t
t , CZ t , Toffoli
t t }
foli,
the Grover-Sing,
original paper and [11]. Grover-All.
We evaluatedWe AutoQgive the
in results
two use
Input: A TA A = ⟨Q, Σ, ∆, R⟩ and a gate U in cases.
Table 2. Both BV and Grover-Sing work with only one
Output: The TA U(A)
input state, which should be most favorable for simulators.
7.1 Circuit Verification
c
1 if U = CNOTt then A1 := Xt (A);
c
2 if U = CZt then A1 := Zt (A);
Surprisingly, for the case of Grover-Sing, AutoQ outper-
′ ′ forms In the first
SliQSim experiment,
on large we compared
cases how fast AutoQ
(out of curiosity, we triedcom-
to
c,c
3 if U = Toffolit then A1 := CNOTct (A); putes the set of output quantum states and checks whether
′ ′ ′ ′ run SliQSim on Grover-Sing (​n​=20) without a timeout;
4 let A1 = ⟨Q1 , Σ, ∆1 , R ⟩ be a primed copy of A1 ;
R ′ R ′ thethe set satisfies
running timeawas given post-condition.
51m43s). We attributeWe compared
the good against
per-
5 A := ⟨Q ∪ Q1 , Σ, ∆ ∪ ∆1 , R⟩, where the simulator SliQSim [21] in the setting when we ran it over
formance of AutoQ to the compactness of the TA repre-
∆R ={s −→
xk
(p, q) | s −→
xk
(p, q) ∈ ∆int ∧ k ̸= c} ∪ all quantum states encoded in the pre-condition of the quan-
sentation of Grover’s state space. On the other hand, both
tum algorithm and accumulated the times. We also include
6
{s −→
xk
(p, q ′ ) | s −→
xk
(p, q) ∈ ∆int ∧ k = c} ∪ MCToffoli
the time taken and Grover-All
by Feynman consider [2] to check ​2​n​ input states, andof
the equivalence
c c we can observe the exponential factor emerging;
the circuits with themselves. Although checking equivalence hence Au-
{s − → () | s − → () ∈ ∆leaf } toQ outperforms SliQSim in large cases. All tools perform
of quantum circuits is a harder problem than what we are
return AR ; pretty well(soonthe BV,results
even cases
solving cannotwith 100 qubits
be used for direct cancomparison
be easily
handled. We can we
with AutoQ), alsoinclude
see that Hybrid
these is consistently
results in order to givefaster
an
than
ideaComposition.
about the hardness of the circuits for path-sum-based
c < t ∧ c′ < t. We can assume
6. COMPOSITION-BASED w.l.o.g. that c < c′ . Output
ENCODING approaches.
of OFthese gates can GATES
QUANTUM be constructed recursively following Algo- 7.2. We ran this
Finding bugs. experiment on a benchmark composed of well-
rithm
In 2. Let us look at theapproach,
the composition-based corresponding we develop update TA
formulae:
opera- Inknown quantum
the second circuits, where
experiment, we comparedthe semantics AutoQwas known
with the
to us so that we could construct TAs with pre- and 8 post-
tions for all
c operations in the symbolic
CNOTt (T ) = Rxc (T ) + Rxc ( Rxt (Pxt (T )) + Rxt (Pxt (T )) ) update formulae equivalence checkers Feynman, 2
Qcec, 7
and SliQEC, and
conditions. These were the following: BV, MCToffoli,
Grover-Sing, and Grover-All. We give the results in
CZct (T ) = Rxc (T ) + Rxc ( Rxt (T ) − Rxt (T ) ) 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 91
Table JU2.N EBoth BV and Grover-Sing work with only one
c,c′ input state, which should be most favorable for simulators.
Toffolit (T ) = Rxc (T ) +
research highlights

Table 2. Verification of quantum circuits. Here, n denotes the parameter value for the circuit, #q denotes the number of qubits, #G denotes
the number of gates in the circuit. For AutoQ, the columns “before” and “after” have the format “states (transitions)” denoting the number of
states and transitions in TA in the pre-condition and the output of our analysis respectively. The column “analysis” contains the time it took
AutoQ to derive the TA for the output states and = denotes the time it took Vata to test equivalence. The timeout was 12min. We use colors to
distinguish the best result in each row and timeouts.

AutoQ-Hybrid AutoQ-Composition SliQSim Feynman


n #q #G before after analysis = before after analysis = time verdict time

95 96 241 193 (193) 193 (193) 6.0s 0.0s 193 (193) 193 (193) 7.1s 0.0s 0.0s equal 0.5s
96 97 243 195 (195) 195 (195) 5.9s 0.0s 195 (195) 195 (195) 7.1s 0.0s 0.0s equal 0.5s
BV 97 98 246 197 (197) 197 (197) 6.3s 0.0s 197 (197) 197 (197) 7.4s 0.0s 0.0s equal 0.6s
98 99 248 199 (199) 199 (199) 6.5s 0.0s 199 (199) 199 (199) 7.7s 0.0s 0.0s equal 0.6s
99 100 251 201 (201) 201 (201) 6.7s 0.0s 201 (201) 201 (201) 7.8s 0.0s 0.0s equal 0.6s
12 24 5,215 49 (49) 71 (71) 11s 0.0s 49 (49) 71 (71) 49s 0.0s 2.8s timeout
14 28 12,217 57 (57) 83 (83) 31s 0.0s 57 (57) 83 (83) 2m26s 0.0s 18s timeout
Grover-
16 32 28,159 65 (65) 95 (95) 1m29s 0.0s 65 (65) 95 (95) 6m59s 0.0s 1m41s timeout
Sing
18 36 63,537 73 (73) 107 (107) 4m1s 0.0s timeout 9m27s timeout
20 40 141,527 81 (81) 119 (119) 10m56s 0.0s timeout timeout timeout
8 16 15 33 (42) 104 (149) 0.0s 0.0s 33 (42) 404 (915) 2.8s 0.0s 1.6s equal 0.0s
10 20 19 41 (52) 150 (216) 0.0s 0.0s 41 (52) 1,560 (3,607) 27s 0.0s 6.1s equal 0.1s
MCToffoli 12 24 23 49 (62) 204 (295) 0.0s 0.0s 49 (62) 6,172 (14,363) 6m48s 0.1s 25s equal 0.1s
14 28 27 57 (72) 266 (386) 0.1s 0.0s timeout 1m40s equal 0.1s
16 32 31 65 (82) 336 (489) 0.2s 0.0s timeout timeout equal 0.2s
6 18 357 37 (43) 252 (315) 3.3s 0.0s 37 (43) 510 (573) 12s 0.0s 1.7s timeout
7 21 552 43 (50) 481 (608) 10s 0.0s 43 (50) 1,123 (1,250) 42s 0.0s 5.4s timeout
Grover-
8 24 939 49 (57) 934 (1,189) 39s 0.1s 49 (57) 2,472 (2,727) 2m40s 0.0s 26s timeout
All
9 27 1,492 55 (64) 1,835 (2,346) 2m17s 0.4s 55 (64) 5,421 (5,932) 10m13s 0.1s 2m5s timeout
10 30 2,433 61 (71) 3,632 (4,655) 9m48s 2.1s timeout 11m31s timeout

evaluated the ability of the tools to determine that two For such cases, AutoQ can still find bugs using a weaker
quantum circuits are non-equivalent (this is to simulate the specification. For instance, AutoQ was able to find bugs in
use case of verifying the output of an optimizer). We took some large-scale instances from RevLib with hundreds of
circuits from the benchmarks FeynmanBench, Random, qubits, where both Feynman and Qcec fail (see11 for more
and RevLib, and for each circuit, we created a copy and in- details).
jected an artificial bug (one additional randomly selected
gate at a random location). Then we ran the tools and let 8. CONCLUSION
them check circuit equivalence; for AutoQ, we let it com- We presented a foundational framework for automata-
pute two TAs representing sets of output states for both cir- based verification of quantum circuits, demonstrating its
cuits for the given set of input states and then checked their effectiveness through empirical studies. We believe apply-
language equivalence with Vata.18 ing automata to quantum verification establishes a valu-
Our strategy for finding bugs with AutoQ (we used the able bridge between quantum and classical verification
Hybrid setting) was the following: We started with a TA approaches, with the potential to adapt many classical
representing a single basis state, (that is, a TA with no top- verification techniques to quantum settings—for instance,
down nondeterminism), and gradually added more non- by replacing constant tree leaf values with symbolic vari-
deterministic transitions (in each iteration one randomly ables10 and tracking the symbolic values in a similar way as
chosen transition) into the TA, making it represent a larger in symbolic execution. This allows the verification of prop-
set of states, running the analysis for each of the TAs, until erties such as whether the probability of finding a solution
we found the bug. This proved to be a successful strategy, exceeds 90% or increases over time. We also extended the
since running the analysis with an input TA representing, model’s applicability from quantum circuit to quantum pro-
for example, all possible basis states, might be too challeng- gram verification9 based on a classical deductive program
ing (generally speaking, the larger is the TA representing verification approach.
the set of states, the slower is the analysis). Note that TAs allow representation of infinite languages,
The results we obtained show that our approach to hunt- yet we only use them for finite ones, which might seem like
ing for bugs in quantum circuits is beneficial, particularly the model is overly expressive. We, however, stick to TAs for
for larger circuits, where equivalence checkers do not scale. the following two reasons: (i) there is an existing rich tool-

92 COM MUNICATIO NS O F TH E ACM | J U NE 2025 | VO L . 68 | NO. 6


box for TA manipulation and minimization, (for example, 13. Comon, H. et al. Tree automata Computer Society (1994), 124–134.
techniques and applications (2008). 21. Tsai, Y., Jiang, J.R., and Jhang,
Comon13 and Lengál18), and (ii) we want to have a robust for- 14. D’Antoni, L., Veanes, M., Livshits, B., C. Bit-slicing the Hilbert space:
mal model for extending our framework to parameterized and Molnar, D. Fast: A transducer- Scaling up accurate quantum circuit
based language for tree manipulation. simulation. In Proc. 58th ACM/IEEE
verification, (that is, proving that an ​n​-qubit algorithm is ACM Trans. Program. Lang. Syst., 38, 1 Design Automation Conf., IEEE (2021),
correct for any ​n​, which will require us to deal with infinite (2015), 1:1–1:32. 439–444.
15. Dawson, C.M. and Nielsen, M.A. The 22. Wei, C., Tsai, Y., Jhang, C., and Jiang,
languages). In Abdulla1, we introduced a specialized tree au- Solovay-Kitaev algorithm. Quantum J.R. Accurate BDD-based unitary
tomata model tailored for quantum verification, achieving Inf. Comput., 6, 1 (2006), 81–95. operator manipulation for scalable and
16. Grover, L.K. A fast quantum robust quantum circuit verification.
exponential speed-up over classical TAs and even allowing mechanical algorithm for database In Proc. 59th ACM/IEEE Design
search. In Proc. of the 28th Annual Automation Conf., ACM, Oshana, R.
parameterized verification of some class of circuits. ACM Symp. on the Theory of (ed),(2022), 523–528.
There remain considerable opportunities for further Computing (1996), 212–219. 23. Wille, R. et al. RevLib: An online
17. Janzing, D., Wocjan, P., and Beth, resource for reversible functions and
advancements in this area, including but not limited to: (i) T. ”Non-identity-check” is QMA- reversible circuits. In Proc. Intern.
automated loop invariant synthesis procedures for quantum complete. Intern. J. of Quantum Symp. on Multi-Valued Logic (2008),
Information 03, 03 (2005), 463–473. 220–225; http://www.revlib.org.
program verification and more fine-grained analyses such 18. Lengál, O., Šimác̆ek, J., and Vojnar, 24. Ying, M. Floyd-Hoare logic for
as probabilistic reachability; (ii) developing abstractions and T. VATA: A library for efficient quantum programs. ACM Transactions
manipulation of non-deterministic tree on Programming Languages and
refinement scheme suited to quantum verification; (iii) verifi- automata. In Intern. Conf. on Tools Systems (TOPLAS), 33, 6 (2012), 1–49.
cation of quantum circuits simulating physical phenomena, and Algorithms for the Construction 25. Yu, N. and Palsberg, J. Quantum
and Analysis of Systems, Springer abstract interpretation. In Proceedings
for example, those implementing Hamiltonian simulation; (2012), 79–94. of the 42nd ACM SIGPLAN Intern.
and (iv) integrating automata with other quantum verifica- 19. Moll, N. et al. Quantum optimization Conf. on Programming Language
using variational algorithms on near- Design and Implementation (2021),
tion frameworks like Quantum Hoare Logic24 and meth- term quantum devices. Quantum 542–558.
ods for equivalence checking to enable a more automated, Science and Technology, 3, 3 (Jun. 26. Zulehner, A. and Wille, R. Advanced
2018), 030503. simulation of quantum computations.
systematic verification pipeline across diverse verification 20. Shor, P.W. Algorithms for quantum IEEE Trans. Comput. Aided Des.
paradigms. These directions highlight the vast potential for computation: Discrete logarithms and Integr. Circuits Syst., 38, 5 (2019),
factoring. In 35th Annual Symp. on 848–859.
refining and expanding automata-based quantum verifica- Foundations of Computer Science, IEEE
tion, promising a pathway to more robust, scalable, and ver-
satile tools for analyzing complex quantum systems. Yu-Fang Chen, Institute of Information Jyun-Ao Lin, National Taipei University
Science, Academia Sinica, Taipei, Taiwan. of Technology, Taipei, Taiwan.
ACKNOWLEDGMENTS Kai-Min Chung, Institute of Information Wei-Lun Tsai, Institute of Information
Science, Academia Sinica, Taipei, Taiwan. Science, Academia Sinica, Taipei, Taiwan.
This material is based on a work supported by the Czech Min-
Ondřej Lengál, Brno University of Di-De Yen, Max Planck Institute for
istry of Education, Youth and Sports project LL1908 of the Technology, Brno, Czech Republic. Software Systems, Saarbrücken,
ERC.CZ programme; the Czech Science Foundation project Germany.
23-07565S; the FIT BUT internal project FIT-S-23-8151; Na-
tional Science and Technology Council, R.O.C., projects NSTC
111-2119-M-001-004, NSTC-112-2222-E-001-002-MY3, NSTC-
113-2119-M-001-009, and NSTC-113-2222-E-027-007, ASOFR
(Air Force Office of Scientific Research) project FA2386-23-1-
4107, Foxconn research project 05T-1120327-1C, Academia
Sinica Grand Challenge Seeding Project AS-GCS-113-M06,
Academia Sinica Investigator Project Grant AS-IV-114-M07.

References
1. Abdulla, P.A. et al. Verifying quantum 767–772.
circuits with level-synchronized tree 7. Burgholzer, L. and Wille, R. Advanced
automata. In Proc. ACM Program. equivalence checking for quantum
Lang. 9, POPL (2025). circuits. IEEE Trans. on Computer-
2. Amy, M. Towards large-scale Aided Design of Integrated Circuits
functional verification of universal and Systems 40, 9 (2020), 1810–1824.
quantum circuits. In Proceedings 15th 8. Chen, T.-F., Jiang, J.-H.R., and Hsieh,
Intern. Conf. on Quantum Physics and M.-H. Partial equivalence checking of
Logic, Selinger, P. and Chiribella, G. quantum circuits. In Proc. 2022 IEEE
(eds.), 2018), 1–21. Intern. Conf. on Quantum Computing
3. Arute, F. et al. Quantum and Engineering (2022), 594–604.
supremacy using a programmable 9. Chen, Y. et al. AutoQ 2.0: From
superconducting processor. Nature, verification of quantum circuits to
574, 7779 (Oct. 2019), 505–510. verification of quantum programs
4. Bernstein, E. and Vazirani, U.V. (technical report). (2024); https://arxiv.
Quantum complexity theory. In Proc. org/abs/2411.09121.
25th Annual ACM Symp. on Theory of 10. Chen, Y. et al. AutoQ: An automata-
Computing, ACM (1993). based quantum circuit verifier. In Proc.
5. Bouajjani, A., Jonsson, B., Nilsson, M., 35th Intern. Conf. Computer Aided
and Touili, T. Regular model checking. Verification, Springer (2023), 139–153.
In Computer Aided Verification, 12th 11. Chen, Y. et al. An automata-based
Intern. Conf., Springer Emerson, framework for verification and bug
E.A. and Sistla, A.P. (eds.), (2000), hunting in quantum circuits. Proc.
403–418. ACM Program. Lang. 7, PLDI (2023),
6. Burgholzer, L., Kueng, R., and Wille, 1218–1243.
R. Random stimuli generation for 12. Coecke, B. and Duncan, R. Interacting
This work is licensed under a Creative Commons
the verification of quantum circuits. quantum observables: categorical
Attribution International 4.0 License.
In Proc. 26th Asia and South Pacific algebra and diagrammatics. New J. of
Design Automation Conf., ACM (2021), Physics, 13, 4 (apr 2011), 043016. © 2025 Copyright held by the owner/author(s).

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 93
last byte

people who knew about temporal differ-


ence learning looked at them and said,
“That’s a lot like the TD algorithm.”
Sutton: Wolfram Schultz and Peter
Dayan and Martin Hammer researched
this, but Andy was a key part of get-
ting the connections right between TD
learning and neuroscience.
Barto: The connection is really obvi-
ous when you see the data.
Sutton: In retrospect, it’s obvious.
But it was also a huge insight. I think it’s
the greatest interaction between engi-
neering and natural science about the
brain ever.

You both spent much of your careers


working on various aspects of rein-
forcement learning. What are some
of the most exciting developments to
of the basic
[ C ONT I N U E D FRO M P. 9 6] (TD) learning, which is a subset of rein- emerge over the years, and has anything
ideas involved with reinforcement learn- forcement learning that was the focus changed the way you approach the goal
ing, which is the work that came out of of Rich’s Ph.D. thesis. If I understand of learning from the environment?
these years. correctly, TD learning builds on this Barto: Speaking for myself, I can view
Sutton: “Learning from experience” concept by incorporating sequential a lot of what’s going on in reinforcement
is the catchphrase I like to use. And I like predictions to figure out what’s good in learning from a mathematical perspec-
to reference Alan Turing, who said, in the long term. tive, as a collection of engineering tools.
1947, “What we want is a machine that Barto: If some action is taken, and There’s a connection to dynamic pro-
can learn from experience.” That turned the consequences don’t occur for a con- gramming that’s extremely interesting.
out to be the key of reinforcement learn- siderable amount of time, how do you It looks at what a reinforcement learn-
ing. The world doesn’t tell you what you assign credit? We call this a delayed ing style of computing buys you over
should do; you learn that from experi- reward problem, and the temporal dif- classical dynamic programming, and
ence, trial and error. ference algorithm was designed to deal it has yielded great results in asynchro-
with it. It’s based on animal learning nous dynamic programming and Mon-
How did you go about building a system theory, where predictors of reward act as te Carlo dynamic programming.
that learns from experience? reward themselves. Sutton: In terms of what’s changed,
Sutton: We settled on the perspec- one of the major things is how we think
tive that two things are essential for Temporal difference learning was later about model-based reinforcement
reinforcement learning. One is that it’s found to be a very plausible way of ex- learning. In the 90’s, we went through
associative—you’re associating things plaining how dopamine works in the a period of examination about whether
with other things. The second is that it’s brain. model-based or model-free methods
evaluative. The opposite of evaluative is Barto: The data that connected do- were superior. At the end of the decade,
instructive. In reinforcement learning, pamine to the temporal difference algo- we concluded that the answer was to
the feedback you get is either a reward rithm didn’t exist at the time. Wolfram have them operating simultaneously.
or a penalty, rather than instructions Schultz did some experiments, and
about what you should have done. We By “model-based method,” you mean
never tell the system, “In this situation, an algorithm that incorporates a model
do this. In that situation, do that.” “I like to reference of the environment, which gives agents
the ability to plan and make inferences
Instead, it learns: “In this situation, if Alan Turing, who about how the environment will behave.
you do this, it’s bad. If you do that, it’s said, in 1947, ‘What Sutton: Of course, a model can never
good.” be complete, and it has to be learned,
we want is a machine
PHOTOGRA PH BY RICHA RD M ORGENST EIN

Barto: This is the connection to and all models will be wrong, as is the
search. Trial and error is search. The that can learn from famous quote, but some of them are use-
system searches for something that ful. The word “model” suggests it might
yields a high evaluation. experience’.” be a simple thing, like a bunch of proba-
Sutton: You search, and then you as- bilities or differential equations. Really,
sociate what you found. it’s what we mean when we say “knowl-
edge.” Knowledge of the world—how
Let’s move on to temporal difference you represent and organize and learn

94 COMMUNICATIO NS O F TH E AC M | J U NE 2025 | VO L . 68 | NO. 6


last byte

it—is obviously complex. But I think


we’ve gotten a handle on it, although it
hasn’t fully been tested. Temporal ab-
straction and the technology of options,
which you may have heard about, deals
with the topic. So does off-policy learn-
ing, which enables you to learn about a
policy when you’re not necessarily fol-
lowing it. A lot of things fit together, but
have yet to reach their full power.
I want to say one more thing. We have
to simplify sometimes when we say that
reinforcement learning is about learn-
ing from rewards and penalties, trial
and error. That’s not wrong. But some-
times people hear that and say, “It’s not
about model-based methods. You can’t
be thoughtful. You can’t reason.” In my
view, those two are closely related, and I
think we would want to include both of
them in the same subject. been interested in mechanisms for cre- ing very little reviewing, and I’m reading
ating inputs to reinforcement learning what I want to read. I keep track a little
Rich, you’ve made the point that deep systems, and we experimented with lots bit with some of my former students, but
learning and reinforcement learning of different ways of doing this, including I’m not deeply involved in the projects
are often forced into an artificial dichot- neural networks. The world has settled that they’re working on.
omy. Can you talk about that a little bit? on backprop (backpropagation) as a
Sutton: Obviously, deep learning good way of training multi-layer neural Rich, since 2002, you’ve been working
methods like large language models networks, so at some point we became on the Alberta Plan for AI Research,
are important developments, and they consumers of these methods for gen- which posits that understanding intelli-
are useful in lots of ways. On the other erating inputs. Backprop is the basis gence is a grand scientific prize that may
hand, you can question whether they’re of, or part of the basis of, not just large soon be within reach—and outlines
really intelligence as opposed to just the language models but also AlphaGo and your vision for getting there.
use of computation to do something AlphaZero. Sutton: The Alberta Plan is a 12-step
useful. I shouldn’t say “just,” because In fact, before backprop came out, plan that represents my attempt to be
computation is important to all fields, we were using reinforcement learning more detailed than “learning from
and there’s a tendency to call things to train multi-layer neural networks. I experience.” Right now, I’m working
“AI” because AI has cachet. But there was very excited about it, but backprop on the first step, which is actually the
are two ways of thinking about learning is much faster. However, I don’t think easiest case, namely linear supervised
and making a mind. One is where hu- many people believe that it’s very plau- learning. But I feel that I have to go
mans are deeply involved in the learn- sible from a biological perspective. back to the very beginning and figure
ing process, and systems are learning out some basic things that have been
from human data. The other is learn- Let’s talk about what you’re working on neglected, things about normalization
ing from experience. According to that these days. Andy, you retired in 2012, and meta-learning. Step two would be
dichotomy, large language models are right? nonlinear, and if we accomplish that, it
clearly on the side of learning from hu- Barto: I’m happily retired, and I love will remake deep learning. Deep learn-
mans. They’re trying to mimic humans, not having to do forced reading. I’m do- ing doesn’t learn continually—it has
and they don’t have any goals of their to learn and then be frozen. And it’s
own. They don’t have any way of tell- a shame, because learning should be
ing true from false other than what hu- “As Andy said, continual. Ultimately, I hope to move
mans have told them. on from these early steps, which are
At the same time, as Andy said, you you could view RL purely supervised, and incorporate
could view reinforcement learning not not as a model of the insights into better reinforcement
as a model of the mind, but as a collec- learning systems. I’m trying to stay ac-
tion of tools. And those tools are used in the mind but as a tive, because I think exciting things are
PHOTOGRA PH BY JASON M. GROW

large language models. They’re used in collection of tools.” going to continue to happen, and I want
Reinforcement Learning from Human to be a part of that.
Feedback (RLHF). They’re used in the
runtime processes that are mistakenly Leah Hoffmann is a technology writer located in
Piermont, NY.
called “reasoning.”
Barto: On the flip side, we have always © 2025 ACM 0001-0782/25/6

JU N E 2 0 2 5 | VO L. 6 8 | N O. 6 | C OM M U N IC AT ION S OF T HE ACM 95
last byte

DOI:10.1145/3724079 Leah Hoffmann

Q&A
Developing the Foundations
of Reinforcment Learning
2024 ACM A.M. Turing Award recipients Andrew G. Barto and Richard S. Sutton on learning from
experience, temporal difference learning, and reinforcement learning as “a collection of tools.”
THE EX A MPLES ARE nothing if not relat-
able: preparing breakfast, or playing a
game of chess or tic-tac-toe. Yet the idea
of learning from the environment and
taking steps that progress toward a goal
apparently was under-studied when
ACM A.M. Turing Award recipients An-
drew G. Barto and Richard S. Sutton
took on the topic in the late 1970s. Even-
tually, their research led to the creation
of reinforcement learning algorithms
that sought not to recognize patterns
but maximize rewards. We spoke with
Barto and Sutton about how it all un-
folded, and what’s next for the tech-
niques that are so celebrated for their
success in AlphaGo and AlphaZero. Andrew G. Barto Richard S. Sutton

PHOTOGRA PH OF ANDREW G. BARTO BY JASO N M. GROW; PHOTOGRAP H O F RICH A RD S. SUTTON BY RIC HARD MORGE N ST E I N
Let’s start with the earliest days of your sense, and we continued on that line of of Massachusetts for graduate school.
collaboration. Andy, you were a postdoc research for a long time. The insight Harry had was that machine
at a group that was founded by Michael Sutton: Michael was a key player, but learning had lost the idea of agents that
Arbib at the University of Massachusetts the person who brought us together could learn; it had become pattern rec-
Amherst, known informally as the Brain is Harry Klopf. I found his work in the ognition. He was disappointed in that,
Theory Group. Rich, you were Andy’s library at Stanford (University), and I so he got the Air Force to fund the pro-
first Ph.D. student. You were tasked with wrote him a letter and met him, and gram at the University of Massachusetts.
exploring a hypothesis that neurons he suggested I apply to the University Barto: Harry felt that the idea of ho-
behaved like hedonists—that they were meostasis, which dominated thinking
self-interested agents working to maxi- about biological systems at the time,
mize pleasure and minimize pain. “Homeostatis is was not correct. Homeostasis is keeping
Barto: It was a kind of a crazy idea, things the same, and that’s just not what
because in those days, neurons were keeping things the intelligence is about. It’s about maxi-
treated as logic gates. Our job was to see same, and that’s just mizing, it’s about improving—
if the hypothesis was worth pursuing. Sutton: It’s about optimizing.
Rich joined as a graduate student about not what intelligence Barto: Interestingly, Marvin Minsky
a year after the project started, and we is about.” made the same comment in something
had five years where I didn’t have to he wrote, that homeostatic systems are
teach, Rich was taking classes, and the really good at doing nothing.
principal investigators left us alone. So
we had the freedom to explore the his- Perhaps we should take a moment to
tory of this idea and whether it made review some [C O NTINUED O N P. 94]

96 COMM UNICATIO NS O F THE AC M | J U NE 2025 | VO L . 68 | NO. 6


Today’s Research
Driving Tomorrow’s
Technology
The ACM Digital Library (DL) is the most
comprehensive research platform available
for computing and information technology
and includes the ongoing contributions of
the field’s most renowned researchers and
practitioners.
Each year, roughly 20,000 newly published
articles from ACM journals, magazines,
technical newsletters and annual conference
volumes are added to the DL’s complete full
text contents of more than 550,000 articles.
The DL also features the fully integrated
and comprehensive bibliographic index,
The Guide to Computing Literature—a
continually updated index featuring millions
of publication records from over 5,000
publishers worldwide.
For more information, please visit
https://libraries.acm.org/
or contact ACM at
[email protected]

You might also like