
Concurrent Programming:

Algorithms, Principles, and Foundations


Michel Raynal

Concurrent Programming:
Algorithms, Principles,
and Foundations

Michel Raynal
Institut Universitaire de France
IRISA-ISTIC
Université de Rennes 1
Rennes Cedex
France

ISBN 978-3-642-32026-2 ISBN 978-3-642-32027-9 (eBook)


DOI 10.1007/978-3-642-32027-9
Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: 2012944394

ACM Computing Classification (1998): F.1, D.1, B.3

© Springer-Verlag Berlin Heidelberg 2013


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed. Exempted from this legal reservation are brief
excerpts in connection with reviews or scholarly analysis or material supplied specifically for the
purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the
work. Duplication of this publication or parts thereof is permitted only under the provisions of
the Copyright Law of the Publisher’s location, in its current version, and permission for use must always
be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright
Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


Preface

As long as the grass grows and the rivers flow….


From American Indians

Homo sum: humani nihil a me alienum puto. (I am human: nothing human is alien to me.)


In Heautontimoroumenos, Publius Terentius Afer (Terence, c. 195–159 BC)

… That day I truly believed I had grasped something, and that my life would be changed by it.
But nothing of that kind is ever definitively acquired.
Like water, the world passes through you and, for a while, lends you its colors.
Then it withdraws, and leaves you facing that void one carries within oneself, facing that kind
of central insufficiency of the soul that one must learn to live with, to fight,
and which, paradoxically, is perhaps our surest driving force.
In L'usage du monde (The Way of the World, 1963), Nicolas Bouvier (1929–1998)


What synchronization is

A concurrent program is a program made up of several entities (processes, peers,
sensors, nodes, etc.) that cooperate toward a common goal. This cooperation is made
possible by objects shared by the entities. These objects are called concurrent
objects. Let us observe that a concurrent object can be seen as abstracting a
service shared by clients (namely, the cooperating entities).

A fundamental issue of computing science and computing engineering is the design
and the implementation of concurrent objects. In order for concurrent objects to
always remain consistent, the entities have to synchronize their accesses to
these objects. Ensuring correct synchronization among a set of cooperating entities
is far from being a trivial task. We are no longer in the world of sequential
programming, and the approaches and methods used in sequential computing are of
little help when one has to design concurrent programs. Concurrent programming
requires not only great care but also knowledge of its scientific foundations.
Moreover, concurrent programming becomes particularly difficult when one has to
cope with failures of cooperating entities or concurrent objects. (A minimal sketch
of the kind of inconsistency that unsynchronized accesses can produce follows.)
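As a minimal illustration of why such synchronization is needed, consider the
following Java sketch (a hypothetical example, not taken from the book): two
threads increment a shared counter without any synchronization, and increments
get lost because the read-modify-write sequence hidden behind counter++ is not atomic.

public class LostUpdate {
    static int counter = 0;   // a shared object accessed without synchronization

    public static void main(String[] args) throws InterruptedException {
        final int N = 1_000_000;
        Runnable inc = () -> { for (int i = 0; i < N; i++) counter++; };
        Thread t1 = new Thread(inc), t2 = new Thread(inc);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // With correct synchronization the result would be 2 * N;
        // here the printed value is usually smaller: updates are lost.
        System.out.println("counter = " + counter + " (expected " + (2 * N) + ")");
    }
}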

Why this book?

Since the early work of E.W. Dijkstra (1965), who introduced the mutual exclu-
sion problem, the concept of a process, the semaphore object, the notion of a
weakest precondition, and guarded commands (among many other contributions),
synchronization is no longer a catalog of tricks but a domain of computing science
with its own concepts, mechanisms, and techniques whose results can be applied in
many domains. This means that process synchronization has to be a major topic of
any computer science curriculum.

This book is on synchronization and the implementation of concurrent objects.
It presents in a uniform and comprehensive way the major results that have been
produced and investigated during the past 30 years and that have proved to be useful
from both theoretical and practical points of view. The book has been written
primarily for people who are not familiar with the topic and the concepts that are
presented. These include mainly:
• Senior-level undergraduate students and graduate students in computer science
or computer engineering, and graduate students in mathematics who are inter-
ested in the foundations of process synchronization.
• Practitioners and engineers who want to be aware of the state-of-the-art concepts,
basic principles, mechanisms, and techniques encountered in concurrent
programming and in the design of concurrent objects suited to shared memory
systems.

Prerequisites for this book include undergraduate courses on algorithms and
basic knowledge of operating systems. Selections of chapters for undergraduate
and graduate courses are suggested in the section titled "How to Use This Book"
in the Afterword.

Content

As stressed by its title, this book is on algorithms, base principles, and foundations
of concurrent objects and synchronization in shared memory systems, i.e., systems
where the entities communicate by reading and writing a common memory. (Such
a corpus of knowledge is becoming more and more important with the advent of
new technologies such as multicore architectures.)

The book is composed of six parts. Three parts are more focused on base
synchronization mechanisms and the construction of concurrent objects, while the
other three parts are more focused on the foundations of synchronization. (A
noteworthy feature of the book is that nearly all the algorithms that are presented
are proved.)
• Part I is on lock-based synchronization, i.e., on well-known synchronization
concepts, techniques, and mechanisms. It defines the most important synchro-
nization problem in reliable asynchronous systems, namely the mutual exclusion
problem (Chap. 1). It then presents several base approaches which have been
proposed to solve it with machine-level instructions (Chap. 2). It also presents
traditional approaches which have been proposed at a higher abstraction level to
solve synchronization problems and implement concurrent objects, namely the
concept of a semaphore and, at an even more abstract level, the concepts of
monitor and path expression (Chap. 3).
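As a minimal sketch of what a lock provides (an illustration only, using Java's
atomic classes rather than the book's pseudocode), here is a test&set-style spin
lock in the spirit of the machine-level solutions of Chap. 2:

import java.util.concurrent.atomic.AtomicBoolean;

// A test&set-based spin lock: getAndSet(true) atomically sets the flag and
// returns its previous value; the process that reads false enters the critical
// section. This guarantees mutual exclusion and deadlock-freedom, but not
// starvation-freedom.
public class TestAndSetLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void acquire() {
        while (locked.getAndSet(true)) {
            // busy-wait until the lock is released
        }
    }

    public void release() {
        locked.set(false);
    }
}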

• After the reader has become familiar with base concepts and mechanisms suited
to classical synchronization in reliable systems, Part II, which is made up of a
single chapter, addresses a fundamental concept of synchronization; namely, it
presents and investigates the concept of atomicity and its properties. This allows
for the formalization of the notion of a correct execution of a concurrent pro-
gram in which processes cooperate by accessing shared objects (Chap. 4).

• Part I has implicitly assumed that the cooperating processes do not fail. Hence
the question: What happens when cooperating entities fail? This is the main
issue addressed in Part III (and in all the rest of the book); namely, it considers that
cooperating entities can halt prematurely (crash failure). To face the net effect of
asynchrony and failures, it introduces the notion of mutex-freedom and the associated
progress conditions such as obstruction-freedom, non-blocking, and wait-freedom
(Chap. 5). (A sketch of a mutex-free operation is given below.)

The rest of Part III focuses on hybrid concurrent objects (Chap. 6), wait-free
implementations of paradigmatic concurrent objects such as counters and store-
collect objects (Chap. 7), snapshot objects (Chap. 8), and renaming objects
(Chap. 9).
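As a sketch of what mutex-freedom means in practice (an illustration only; the
counter below is a hypothetical example, not one of the book's algorithms), here
is a non-blocking increment built from compare&swap instead of a lock: a process
never waits for another process, it simply retries, and whenever several processes
compete at least one of them succeeds.

import java.util.concurrent.atomic.AtomicInteger;

// A mutex-free (non-blocking) counter: no lock is ever taken, so the crash
// of a process cannot block the other processes.
public class MutexFreeCounter {
    private final AtomicInteger c = new AtomicInteger(0);

    public int increment() {
        while (true) {
            int old = c.get();
            // compareAndSet succeeds only if no other process modified c
            // in between; otherwise the operation retries.
            if (c.compareAndSet(old, old + 1)) return old + 1;
        }
    }
}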

• Part IV, which is made up of a single chapter, is on software transactional
memory systems. This is a relatively new approach whose aim is to simplify the
job of programmers of concurrent applications. The idea is that programmers have
to focus their efforts on which parts of their multiprocess programs have to
be executed atomically, and not on the way this atomicity is realized (Chap. 10).
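The following Java sketch illustrates this programming style (the TxRegister
interface and the atomically() helper are hypothetical, introduced only to make
the example self-contained; atomically() is realized here with a trivial global
lock, whereas a real STM system such as TL2, presented in Chap. 10, would execute
the body speculatively and then commit or abort it):

interface TxRegister { int read(); void write(int v); }

class TransferExample {
    // Placeholder for an STM system: the programmer only states WHAT must
    // appear atomic; a real STM decides HOW (speculative execution,
    // commit/abort), not the global lock used here for simplicity.
    static synchronized void atomically(Runnable body) { body.run(); }

    static void transfer(TxRegister from, TxRegister to, int amount) {
        atomically(() -> {
            from.write(from.read() - amount);
            to.write(to.read() + amount);
        });
    }
}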

• Part V returns to the foundations side. It shows how reliable atomic read/write
registers (shared variables) can be built from non-atomic bits. This part consists
of three chapters. Chapter 11 introduces the notions of safe register, regular
register, and atomic register. Then, Chap. 12 shows how to build an atomic bit
from a safe bit. Finally, Chap. 13 shows how an atomic register of any size can
be built from safe and atomic bits.
This part shows that, while atomic read/write registers are easier to use than safe
read/write registers, they are not more powerful from a computability point of view.

• Part VI, which concerns also the foundations side, is on the computational
power of concurrent objects. It is made up of four chapters. It first introduces the
notion of a consensus object and shows that consensus objects are universal
objects (Chap. 14). This means that, as soon as a system provides us with atomic
read/write registers and consensus objects, it is possible to implement in a wait-
free manner any object defined from a sequential specification.
Part VI then introduces the notion of self-implementation and shows how atomic
registers and consensus objects can be built from base objects of the same type
which are not reliable (Chap. 15). Then, it presents the notion of a consensus
number and the associated consensus hierarchy which allows the computability
power of concurrent objects to be ranked (Chap. 16). Finally, the last chapter of
the book focuses on the wait-free implementation of consensus objects from
read/write registers and failure detectors (Chap. 17).
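As an illustration of the consensus object interface (a sketch in the spirit of
the compare&swap-based consensus of Chap. 16, not the book's exact notation):
each process proposes a value, and all processes decide the same proposed value;
here the first proposal to land atomically wins.

import java.util.concurrent.atomic.AtomicReference;

public class ConsensusObject<V> {
    private final AtomicReference<V> decision = new AtomicReference<>(null);

    // Wait-free: a single atomic compareAndSet, no loop, no lock.
    public V propose(V v) {
        decision.compareAndSet(null, v);  // only the first proposal is kept
        return decision.get();            // every process decides that value
    }
}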
To get a more complete feeling of the spirit of this book, the reader can also
consult the section "What Was the Aim of This Book" in the Afterword, which
describes what it is hoped has been learned from this book. Each chapter starts
with a short presentation of its content and a list of keywords; it terminates with a
summary of the main points that have been explained and developed. Each of the six
parts of the book is also introduced by a brief description of its aim and its
technical content.

Acknowledgments

This book originates from lecture notes for undergraduate and graduate courses on
process synchronization that I give at the University of Rennes (France) and, as an
invited professor, at several universities all over the world. I would like to thank
the students for their questions that, in one way or another, have contributed to this
book.

I want to thank my colleagues Armando Castañeda (UNAM, MX), Ajoy Datta
(UNLV, Nevada), Achour Mostéfaoui (Université de Nantes), and François Taiani
(Lancaster University, UK) for their careful reading of chapters of this book.
Thanks also to François Bonnet (JAIST, Kanazawa), Eli Gafni (UCLA), Damien
Imbs (IRISA, Rennes), Sergio Rajsbaum (UNAM, MX), Matthieu Roy (LAAS,
Toulouse), and Corentin Travers (LABRI, Bordeaux) for long discussions on wait-
freedom. Special thanks are due to Rachid Guerraoui (EPFL), with whom I dis-
cussed numerous topics presented in this book (and many other topics) during the
past seven years. I would also like to thank Ph. Louarn (IRISA, Rennes), who was
my LaTeX man when writing this book, and Ronan Nugent (Springer) for his
support and his help in putting it all together.

Last but not least (and maybe most importantly), I also want to thank all the
researchers whose results are presented in this book. Without their work, this book
would not exist. (Since I typeset the entire text myself (LaTeX for the text and xfig
for figures), any typesetting or technical errors that remain are my responsibility.)

Michel Raynal
Professeur des Universités
Institut Universitaire de France
IRISA-ISTIC, Université de Rennes 1
Campus de Beaulieu, 35042 Rennes, France

September–November 2005 and June–October 2011


Rennes, Mont-Louis (ARCHI’11), Gdańsk (SIROCCO’11), Saint-Philibert,
Hong Kong (PolyU), Macau, Roma (DISC’11), Tirana (NBIS’11),
Grenoble (SSS’11), Saint-Grégoire, Douelle, Mexico City (UNAM).
Contents

Part I Lock-Based Synchronization

1 The Mutual Exclusion Problem. . . . . . . . . . . . . . . . . . . . . . . . . . 3


1.1 Multiprocess Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 The Concept of a Sequential Process. . . . . . . . . . . . 3
1.1.2 The Concept of a Multiprocess Program . . . . . . . . . 4
1.2 Process Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Processors and Processes . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 Synchronization: Competition. . . . . . . . . . . . . . . . . 5
1.2.4 Synchronization: Cooperation. . . . . . . . . . . . . . . . . 7
1.2.5 The Aim of Synchronization
Is to Preserve Invariants . . . . . . . . . . . . . . . . . . . . 7
1.3 The Mutual Exclusion Problem . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 The Mutual Exclusion Problem (Mutex) . . . . . . . . . 9
1.3.2 Lock Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.3 Three Families of Solutions . . . . . . . . . . . . . . . . . . 12
1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Solving Mutual Exclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 15


2.1 Mutex Based on Atomic Read/Write Registers . . . . . . . . . .. 15
2.1.1 Atomic Register . . . . . . . . . . . . . . . . . . . . . . . . .. 15
2.1.2 Mutex for Two Processes:
An Incremental Construction . . . . . . . . . . . . . . . .. 17
2.1.3 A Two-Process Algorithm . . . . . . . . . . . . . . . . . .. 19
2.1.4 Mutex for n Processes:
Generalizing the Previous Two-Process Algorithm. .. 22
2.1.5 Mutex for n Processes:
A Tournament-Based Algorithm . . . . . . . . . . . . . .. 26
2.1.6 A Concurrency-Abortable Algorithm. . . . . . . . . . .. 29


2.1.7 A Fast Mutex Algorithm . . . . . . . . . . . . . . . . . . . . 33


2.1.8 Mutual Exclusion in a Synchronous System . . . . . . . 37
2.2 Mutex Based on Specialized Hardware Primitives . . . . . . . . 38
2.2.1 Test&Set, Swap and Compare&Swap . . . . . . . . . . . 39
2.2.2 From Deadlock-Freedom to Starvation-Freedom . . . . 40
2.2.3 Fetch&Add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3 Mutex Without Atomicity . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.3.1 Safe, Regular and Atomic Registers . . . . . . . . . . . . 45
2.3.2 The Bakery Mutex Algorithm . . . . . . . . . . . . . . . . 48
2.3.3 A Bounded Mutex Algorithm . . . . . . . . . . . . . . . . . 53
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.6 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3 Lock-Based Concurrent Objects . . . . . . . . . . . . . . . . . . . . . . . . . 61


3.1 Concurrent Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1.1 Concurrent Object. . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1.2 Lock-Based Implementation . . . . . . . . . . . . . . . . . . 62
3.2 A Base Synchronization Object: the Semaphore . . . . . . . . . . 63
3.2.1 The Concept of a Semaphore . . . . . . . . . . . . . . . . . 63
3.2.2 Using Semaphores to Solve
the Producer-Consumer Problem. . . . . . . . . ...... 65
3.2.3 Using Semaphores to Solve
a Priority Scheduling Problem . . . . . . . . . . ...... 71
3.2.4 Using Semaphores to Solve
the Readers-Writers Problem . . . . . . . . . . . ...... 74
3.2.5 Using a Buffer to Reduce Delays
for Readers and Writers. . . . . . . . . . . . . . . . . . . . . 78
3.3 A Construct for Imperative Languages: the Monitor . . . . . . . 81
3.3.1 The Concept of a Monitor . . . . . . . . . . . . . . . . . . . 82
3.3.2 A Rendezvous Monitor Object . . . . . . . . . . . . . . . . 83
3.3.3 Monitors and Predicates. . . . . . . . . . . . . . . . . . . . . 85
3.3.4 Implementing a Monitor from Semaphores . . . . . . . 87
3.3.5 Monitors for the Readers-Writers Problem . . . . . . . . 89
3.3.6 Scheduled Wait Operation . . . . . . . . . . . . . . . . . . . 94
3.4 Declarative Synchronization: Path Expressions . . . . . . . . . . . 95
3.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.4.2 Using Path Expressions to Solve
Synchronization Problems . . . . . . . . . . . . . ...... 97
3.4.3 A Semaphore-Based Implementation
of Path Expressions. . . . . . . . . . . . . . . . . . . . . . . . 98
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.7 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

Part II On the Foundations Side: The Atomicity Concept

4 Atomicity: Formal Definition and Properties . . . . . . . . . . . . . . . . 113


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.2 Computation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.2.1 Processes and Operations. . . . . . . . . . . . . . . . . . . . 115
4.2.2 Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2.3 Histories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.2.4 Sequential History. . . . . . . . . . . . . . . . . . . . . . . . . 119
4.3 Atomicity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.3.1 Legal History . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.3.2 The Case of Complete Histories . . . . . . . . . . . . . . . 121
4.3.3 The Case of Partial Histories . . . . . . . . . . . . . . . . . 123
4.4 Object Composability and Guaranteed
Termination Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.4.1 Atomic Objects Compose for Free . . . . . . . . . . . . . 125
4.4.2 Guaranteed Termination . . . . . . . . . . . . . . . . . . . . 127
4.5 Alternatives to Atomicity. . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.5.1 Sequential Consistency . . . . . . . . . . . . . . . . . . . . . 128
4.5.2 Serializability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Part III Mutex-Free Synchronization

5 Mutex-Free Concurrent Objects . . . . . . . . . . . . . . . . . . . . . . . . . 135


5.1 Mutex-Freedom and Progress Conditions . . . . . . . . . . . . . . . 135
5.1.1 The Mutex-Freedom Notion . . . . . . . . . . . . . . . . . . 135
5.1.2 Progress Conditions . . . . . . . . . . . . . . . . . . . . . . . 137
5.1.3 Non-blocking with Respect to Wait-Freedom . . . . . . 140
5.2 Mutex-Free Concurrent Objects . . . . . . . . . . . . . . . . . . . . . 140
5.2.1 The Splitter: A Simple Wait-Free Object from
Read/Write Registers. . . . . . . . . . . . . . . . . . . . . .. 140
5.2.2 A Simple Obstruction-Free Object from
Read/Write Registers. . . . . . . . . . . . . . . . . . . . . .. 143
5.2.3 A Remark on Compare&Swap: The ABA Problem. .. 145
5.2.4 A Non-blocking Queue Based on
Read/Write Registers and Compare&Swap . . . . . .. 146
5.2.5 A Non-blocking Stack Based on
Compare&Swap Registers . . . . . . . . . . . . . . . . . .. 150
5.2.6 A Wait-Free Stack Based on
Fetch&Add and Swap Registers . . . . . . . . . . . . . .. 152

5.3 Boosting Obstruction-Freedom to Stronger Progress


in the Read/Write Model . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.3.1 Failure Detectors . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.3.2 Contention Managers for Obstruction-Free
Object Implementations . . . . . . . . . . . . . . . . . . . . . 157
5.3.3 Boosting Obstruction-Freedom to Non-blocking . . . . 158
5.3.4 Boosting Obstruction-Freedom to Wait-Freedom . . . 159
5.3.5 Mutex-Freedom Versus Loops Inside a Contention
Manager Operation . . . . . . . . . . . . . . . . . . . . . . . . 161
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.6 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

6 Hybrid Concurrent Objects . . . . . . . . . . . . . . . . . . . . . . . . . ... 165


6.1 The Notion of a Hybrid Implementation . . . . . . . . . . . . ... 165
6.1.1 Lock-Based Versus Mutex-Free Operation:
Static Hybrid Implementation. . . . . . . . . . . . . . ... 166
6.1.2 Contention Sensitive (or Dynamic Hybrid)
Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.1.3 The Case of Process Crashes . . . . . . . . . . . . . . . . . 166
6.2 A Static Hybrid Implementation of a Concurrent Set Object . . . 167
6.2.1 Definition and Assumptions . . . . . . . . . . . . . . . . . . 167
6.2.2 Internal Representation and Operation
Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.2.3 Properties of the Implementation . . . . . . . . . . . . . . 171
6.3 Contention-Sensitive Implementations . . . . . . . . . . . . . . . . . 172
6.3.1 Contention-Sensitive Binary Consensus . . . . . . . . . . 172
6.3.2 A Contention Sensitive Non-blocking
Double-Ended Queue . . . . . . . . . . . . . . . . . . . ... 176
6.4 The Notion of an Abortable Object . . . . . . . . . . . . . . . . ... 181
6.4.1 Concurrency-Abortable Object . . . . . . . . . . . . . ... 181
6.4.2 From a Non-blocking Abortable Object
to a Starvation-Free Object . . . . . . . . . . . . . . . . . . 183
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.7 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

7 Wait-Free Objects from Read/Write Registers Only . . . . . . . . . . 189


7.1 A Wait-Free Weak Counter for Infinitely Many Processes. . . 189
7.1.1 A Simple Counter Object. . . . . . . . . . . . . . . . . . . . 190
7.1.2 Weak Counter Object for Infinitely Many Processes. . . 191
7.1.3 A One-Shot Weak Counter Wait-Free Algorithm . . . 193
7.1.4 Proof of the One-Shot Implementation . . . . . . . . . . 194
7.1.5 A Multi-Shot Weak Counter Wait-Free Algorithm . . . 199

7.2 Store-Collect Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201


7.2.1 Store-Collect Object: Definition . . . . . . . . . . . . . . . 201
7.2.2 An Adaptive Store-Collect Implementation . . . . . . . 204
7.2.3 Proof and Cost of the Adaptive Implementation . . . . 208
7.3 Fast Store-Collect Object . . . . . . . . . . . . . . . . . . . . . . . . . . 211
7.3.1 Fast Store-Collect Object: Definition. . . . . . . . . . . . 211
7.3.2 A Fast Algorithm for the store_collect() Operation . . . . 212
7.3.3 Proof of the Fast Store-Collect Algorithm . . . . . . . . 215
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
7.6 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

8 Snapshot Objects from Read/Write Registers Only . . . . . . . . . . . 219


8.1 Snapshot Objects: Definition . . . . . . . . . . . . . . . . . . . . . . . 219
8.2 Single-Writer Snapshot Object . . . . . . . . . . . . . . . . . . . . . . 220
8.2.1 An Obstruction-Free Implementation. . . . . . . . . . . . 221
8.2.2 From Obstruction-Freedom to Bounded
Wait-Freedom . . . . . . . . . . . . . . . . . . . . . . . . . .. 223
8.2.3 One-Shot Single-Writer Snapshot Object:
Containment Property . . . . . . . . . . . . . . . . . . . . .. 227
8.3 Single-Writer Snapshot Object with Infinitely
Many Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 228
8.4 Multi-Writer Snapshot Object. . . . . . . . . . . . . . . . . . . . . .. 230
8.4.1 The Strong Freshness Property . . . . . . . . . . . . . . .. 231
8.4.2 An Implementation of a Multi-Writer
Snapshot Object . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.4.3 Proof of the Implementation. . . . . . . . . . . . . . . . . . 234
8.5 Immediate Snapshot Objects . . . . . . . . . . . . . . . . . . . . . . . 238
8.5.1 One-Shot Immediate Snapshot Object: Definition . . . 238
8.5.2 One-Shot Immediate Snapshot Versus
One-Shot Snapshot . . . . . . . . . . . . . . . . . . . . . . .. 238
8.5.3 An Implementation of One-Shot
Immediate Snapshot Objects . . . . . . . . . . . . . . . .. 240
8.5.4 A Recursive Implementation of a One-Shot
Immediate Snapshot Object . . . . . . . . . . . . . . . . . . 244
8.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.8 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

9 Renaming Objects from Read/Write Registers Only . . . . . . . . . . 249


9.1 Renaming Objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
9.1.1 The Base Renaming Problem . . . . . . . . . . . . . . . . . 249
9.1.2 One-Shot Renaming Object . . . . . . . . . . . . . . . . . . 250
9.1.3 Adaptive Implementations . . . . . . . . . . . . . . . . . . . 250
9.1.4 A Fundamental Result . . . . . . . . . . . . . . . . . . . . . . 251
9.1.5 Long-Lived Renaming. . . . . . . . . . . . . . . . . . . . . . 252
9.2 Non-triviality of the Renaming Problem . . . . . . . . . . . . . . . 252
9.3 A Splitter-Based Optimal Time-Adaptive Implementation . . . 254
9.4 A Snapshot-Based Optimal Size-Adaptive Implementation. . . 256
9.4.1 A Snapshot-Based Implementation . . . . . . . . . . . . . 256
9.4.2 Proof of the Implementation. . . . . . . . . . . . . . . . . . 258
9.5 Recursive Store-Collect-Based Size-Adaptive
Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
9.5.1 A Recursive Renaming Algorithm . . . . . . . . . . . . . 259
9.5.2 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
9.5.3 Proof of the Renaming Implementation . . . . . . . . . . 263
9.6 Variant of the Previous Recursion-Based
Renaming Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 266
9.6.1 A Renaming Implementation Based
on Immediate Snapshot Objects . . . . . . . . . . . . . .. 266
9.6.2 An Example of a Renaming Execution . . . . . . . . .. 268
9.7 Long-Lived Perfect Renaming Based
on Test&Set Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
9.7.1 Perfect Adaptive Renaming . . . . . . . . . . . . . . . . . . 269
9.7.2 Perfect Long-Lived Test&Set-Based Renaming . . . . 270
9.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
9.9 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
9.10 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 272

Part IV The Transactional Memory Approach

10 Transactional Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277


10.1 What Are Software Transactional Memories . . . . . . . . . . . . 277
10.1.1 Transactions = High-Level Synchronization . . . . . . . 277
10.1.2 At the Programming Level. . . . . . . . . . . . . . . . . . . 279
10.2 STM System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
10.2.1 Speculative Executions, Commit
and Abort of a Transaction . . . . . . . . . . . . . . . . . . 281
10.2.2 An STM Consistency Condition: Opacity . . . . . . . . 282
10.2.3 An STM Interface. . . . . . . . . . . . . . . . . . . . . . . . . 282
10.2.4 Incremental Reads and Deferred Updates. . . . . . . . . 283

10.2.5 Read-Only Versus Update Transactions . . . . . . . ... 283


10.2.6 Read Invisibility . . . . . . . . . . . . . . . . . . . . . . . ... 284
10.3 A Logical Clock-Based STM System: TL2 . . . . . . . . . . ... 284
10.3.1 Underlying System and Control Variables
of the STM System . . . . . . . . . . . . . . . . . . . . . ... 284
10.3.2 Underlying Principle: Consistency
with Respect to Transaction Birth Date . . . . . . . . . . 285
10.3.3 The Implementation of an Update Transaction . . . . . 286
10.3.4 The Implementation of a Read-Only Transaction . . . 288
10.4 A Version-Based STM System: JVSTM . . . . . . . . . . . . . . . 289
10.4.1 Underlying and Control Variables
of the STM System . . . . . . . . . . . . . . . . . . . . . . . . 290
10.4.2 The Implementation of an Update Transaction . . . . . 291
10.4.3 The Implementation of a Read-Only Transaction . . . 293
10.5 A Vector Clock-Based STM System . . . . . . . . . . . . . . . . . . 293
10.5.1 The Virtual World Consistency Condition . . . . . . . . 293
10.5.2 An STM System for Virtual World Consistency . . . . 295
10.5.3 The Algorithms Implementing
the STM Operations . . . . . . . . . . . . . . . . . . . . . . . 296
10.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
10.8 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 300

Part V On the Foundations Side:


From Safe Bits to Atomic Registers

11 Safe, Regular, and Atomic Read/Write Registers . . . . . . . . . . . . . 305


11.1 Safe, Regular, and Atomic Registers . . . . . . . . . . . . . . . . . . 305
11.1.1 Reminder: The Many Faces of a Register . . . . . . . . 305
11.1.2 From Regularity to Atomicity: A Theorem . . . . . . . 308
11.1.3 A Fundamental Problem:
The Construction of Registers . . . . . . . . . . . . . . .. 310
11.2 Two Very Simple Bounded Constructions . . . . . . . . . . . . .. 311
11.2.1 Safe/Regular Registers:
From Single-Reader to Multi-Reader. . . . . . . . . . .. 311
11.2.2 Binary Multi-Reader Registers:
From Safe to Regular . . . . . . . . . . . . . . . . . . . . . . 313
11.3 From Bits to b-Valued Registers. . . . . . . . . . . . . . . . . . . . . 314
11.3.1 From Safe Bits to b-Valued Safe Registers . . . . . . . 314
11.3.2 From Regular Bits to Regular b-Valued Registers. . . 315
11.3.3 From Atomic Bits to Atomic b-Valued Registers . . . 319

11.4 Three Unbounded Constructions . . . . . . . . . . . . . . . . . . . . 321
11.4.1 SWSR Registers: From Unbounded Regular to Atomic . . . . . 322
11.4.2 Atomic Registers: From Unbounded SWSR to SWMR . . . . . . 324
11.4.3 Atomic Registers: From Unbounded SWMR to MWMR . . . . . 325
11.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
11.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327

12 From Safe Bits to Atomic Bits:


Lower Bound and Optimal Construction . . . . . . . . . . . . . . . . . . . 329
12.1 A Lower Bound Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 329
12.1.1 Two Preliminary Lemmas . . . . . . . . . . . . . . . . . . . 330
12.1.2 The Lower Bound Theorem . . . . . . . . . . . . . . . . . . 331
12.2 A Construction of an Atomic Bit from Three Safe Bits . . . . . 334
12.2.1 Base Architecture of the Construction . . . . . . . . . . . 334
12.2.2 Underlying Principle and Signaling Scheme. . . . . . . 335
12.2.3 The Algorithm Implementing the Operation R.write() . . . . . 336
12.2.4 The Algorithm Implementing the Operation R.read() . . . . . . 336
12.2.5 Cost of the Construction . . . . . . . . . . . . . . . . . . . . 338
12.3 Proof of the Construction of an Atomic Bit . . . . . . . . . . . . . 338
12.3.1 A Preliminary Theorem . . . . . . . . . . . . . . . . . . . . . 338
12.3.2 Proof of the Construction. . . . . . . . . . . . . . . . . . . . 340
12.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
12.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
12.6 Exercise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

13 Bounded Constructions of Atomic b-Valued Registers . . . . . . ... 347


13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 347
13.2 A Collision-Free (Pure Buffers) Construction . . . . . . . . . ... 349
13.2.1 Internal Representation of the Atomic b-Valued
Register R . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 349
13.2.2 Underlying Principle: Two-Level Switch
to Ensure Collision-Free Accesses to Buffers . . . ... 349
13.2.3 The Algorithms Implementing the Operations R.write() and R.read() . . . 350
13.2.4 Proof of the Construction: Collision-Freedom. . . . . . 352
13.2.5 Correctness Proof . . . . . . . . . . . . . . . . . . . . . . . . . 355
13.3 A Construction Based on Impure Buffers . . . . . . . . . . . . . . 357
13.3.1 Internal Representation of the Atomic b-Valued
Register R . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 357

13.3.2 An Incremental Construction . . . . . . . .......... 358


13.3.3 The Algorithms Implementing the Operations R.write() and R.read() . . . 360
13.3.4 Proof of the Construction. . . . . . . . . . .......... 360
13.3.5 From SWSR to SWMR b-Valued
Atomic Register . . . . . . . . . . . . . . . . .......... 367
13.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . .......... 368
13.5 Bibliographic Notes . . . . . . . . . . . . . . . . . . . .......... 368

Part VI On the Foundations Side:


The Computability Power of Concurrent Objects (Consensus)

14 Universality of Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 371


14.1 Universal Object, Universal Construction,
and Consensus Object . . . . . . . . . . . . . . . . . . . . . . . . . ... 371
14.1.1 Universal (Synchronization) Object
and Universal Construction . . . . . . . . . . . . . . . . . . 371
14.1.2 The Notion of a Consensus Object . . . . . . . . . . . . . 372
14.2 Inputs and Base Principles of Universal Constructions . . . . . 373
14.2.1 The Specification of the Constructed Object . . . . . . 373
14.2.2 Base Principles of Universal Constructions . . . . . . . 374
14.3 An Unbounded Wait-Free Universal Construction. . . . . . . . . 374
14.3.1 Principles and Description of the Construction . . . . . 375
14.3.2 Proof of the Construction. . . . . . . . . . . . . . . . . . . . 378
14.3.3 Non-deterministic Objects . . . . . . . . . . . . . . . . . . . 382
14.3.4 Wait-Freedom Versus Bounded Wait-Freedom . . . . . 383
14.4 A Bounded Wait-Free Universal Construction . . . . . . . . . . . 384
14.4.1 Principles of the Construction . . . . . . . . . . . . . . . . 384
14.4.2 Proof of the Construction. . . . . . . . . . . . . . . . . . . . 388
14.4.3 Non-deterministic Objects . . . . . . . . . . . . . . . . . . . 391
14.5 From Binary Consensus to Multi-Valued Consensus . . . . . . . 391
14.5.1 A Construction Based on the Bit Representation
of Proposed Values . . . . . . . . . . . . . . . . . . . . . . . . 392
14.5.2 A Construction for Unbounded Proposed Values . . . 394
14.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
14.7 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
14.8 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

15 The Case of Unreliable Base Objects. . . . . . . . . . . . . . . . . ..... 399


15.1 Responsive Versus Non-responsive Crash Failures . . . ..... 400
15.2 SWSR Registers Prone to Crash Failures . . . . . . . . . . ..... 400
15.2.1 Reliable Register When Crash Failures
Are Responsive: An Unbounded Construction ..... 401

15.2.2 Reliable Register When Crash Failures Are


Responsive: A Bounded Construction . . . . . . . . . . . 403
15.2.3 Reliable Register When Crash Failures Are Not
Responsive: An Unbounded Construction . . . . . . . . 406
15.3 Consensus When Crash Failures Are Responsive:
A Bounded Construction . . . . . . . . . . . . . . . . . . . . . . . . . . 408
15.3.1 The ‘‘Parallel Invocation’’ Approach
Does Not Work . . . . . . . . . . . . . . . . . . . . . . . . . . 408
15.3.2 A t-Tolerant Wait-Free Construction . . . . . . . . . . . . 409
15.3.3 Consensus When Crash Failures Are Not Responsive:
An Impossibility . . . . . . . . . . . . . . . . . . . . . . . . . . 410
15.4 Omission and Arbitrary Failures . . . . . . . . . . . . . . . . . . . . . 410
15.4.1 Object Failure Modes . . . . . . . . . . . . . . . . . . . . . . 410
15.4.2 Simple Examples . . . . . . . . . . . . . . . . . . . . . . . . . 412
15.4.3 Graceful Degradation . . . . . . . . . . . . . . . . . . . . . . 413
15.4.4 Fault-Tolerance Versus Graceful Degradation . . . . . 417
15.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
15.6 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
15.7 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 419

16 Consensus Numbers and the Consensus Hierarchy . . . . . . . . . . . 421


16.1 The Consensus Number Notion . . . . . . . . . . . . . . . . . . . . . 421
16.2 Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
16.2.1 Schedule, Configuration, and Valence . . . . . . . . . . . 422
16.2.2 Bivalent Initial Configuration . . . . . . . . . . . . . . . . . 423
16.3 The Weak Wait-Free Power of Atomic Registers . . . . . . . . . 425
16.3.1 The Consensus Number of Atomic
Read/Write Registers Is 1 . . . . . . . . . . . . . . . . . . . 425
16.3.2 The Wait-Free Limit of Atomic Registers . . . . . . . . 428
16.4 Objects Whose Consensus Number Is 2. . . . . . . . . . . . . . . . 429
16.4.1 Consensus from Test&Set Objects . . . . . . . . . . . . . 429
16.4.2 Consensus from Queue Objects . . . . . . . . . . . . . . . 431
16.4.3 Consensus from Swap Objects . . . . . . . . . . . . . . . . 432
16.4.4 Other Objects for Wait-Free Consensus
in a System of Two Processes . . . . . . . . . . . . . . . . 432
16.4.5 Power and Limit of the Previous Objects. . . . . . . . . 433
16.5 Objects Whose Consensus Number Is +∞ . . . . . . . . . . . . . . 438
16.5.1 Consensus from Compare&Swap Objects . . . . . . . . 439
16.5.2 Consensus from Mem-to-Mem-Swap Objects . . . . . . 440
16.5.3 Consensus from an Augmented Queue . . . . . . . . . . 442
16.5.4 From a Sticky Bit to Binary Consensus . . . . . . . . . . 442
16.5.5 Impossibility Result . . . . . . . . . . . . . . . . . . . . . . . 443

16.6 Hierarchy of Atomic Objects . . . . . . . . . . . . . . . . . . . . . . 443
16.6.1 From Consensus Numbers to a Hierarchy . . . . . . . . . . . . . 443
16.6.2 On Fault Masking . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
16.6.3 Robustness of the Hierarchy . . . . . . . . . . . . . . . . . . . . . 445
16.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
16.8 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
16.9 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 446

17 The Alpha(s) and Omega of Consensus:


Failure Detector-Based Consensus. . . . . . . . . . . . . . . . . . . . . . . . 449
17.1 De-constructing Compare&Swap . . . . . . . . . . . . . . . . . . . . 450
17.2 A Liveness-Oriented Abstraction: The Failure Detector Ω . . . . 452
17.2.1 Definition of Ω . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
17.2.2 Ω-Based Consensus: Ω as a Resource Allocator or a Scheduler . . . 453
17.3 Three Safety-Oriented Abstractions: Alpha1, Alpha2, and Alpha3 . . . 454
17.3.1 A Round-Free Abstraction: Alpha1 . . . . . . . . . . . . . 454
17.3.2 A Round-Based Abstraction: Alpha2 . . . . . . . . . . . . 455
17.3.3 Another Round-Free Abstraction: Alpha3 . . . . . . . . . 456
17.3.4 The Rounds Seen as a Resource . . . . . . . . . . . . . . . 457
17.4 Ω-Based Consensus . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
17.4.1 Consensus from Alpha1 Objects and Ω . . . . . . . . . . . . . . 457
17.4.2 Consensus from an Alpha2 Object and Ω . . . . . . . . . . . . . 459
17.4.3 Consensus from an Alpha3 Object and Ω . . . . . . . . . . . . . 460
17.4.4 When the Eventual Leader Elected by Ω Does Not Participate . . . 463
17.4.5 The Notion of an Indulgent Algorithm . . . . . . . . . . . . . . 464
17.4.6 Consensus Object Versus Ω . . . . . . . . . . . . . . . . . . . . . 464
17.5 Wait-Free Implementations of the Alpha1 and Alpha2
Abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 465
17.5.1 Alpha1 from Atomic Registers . . . . . . . . . . . . . . .. 465
17.5.2 Alpha2 from Regular Registers . . . . . . . . . . . . . . .. 467
17.6 Wait-Free Implementations of the Alpha2 Abstraction
from Shared Disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
17.6.1 Alpha2 from Unreliable Read/Write Disks . . . . . . . . 472
17.6.2 Alpha2 from Active Disks . . . . . . . . . . . . . . . . . . . 476
17.7 Implementing Ω . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
17.7.1 The Additional Timing Assumption EWB . . . . . . . . . . . . . 478
17.7.2 An EWB-Based Implementation of Ω . . . . . . . . . . . . . . . 479
17.7.3 Proof of the Construction. . . . . . . . . . . . . . . . . . . . 481
17.7.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484

17.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485


17.9 Bibliographic Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
17.10 Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 486

Afterword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
Notation

No-op                    No operation
Process                  Program in action
n                        Number of processes
Correct process          Process that does not crash during an execution
Faulty process           Process that crashes during an execution
Concurrent object        Object shared by several processes
A[1..m]                  Array with m entries
⟨a, b⟩                   Pair with two elements a and b
Mutex                    Mutual exclusion
Read/write register      Synonym of read/write variable
SWSR                     Single-writer/single-reader (register)
SWMR                     Single-writer/multi-reader (register)
MWSR                     Multi-writer/single-reader (register)
MWMR                     Multi-writer/multi-reader (register)
ABCD                     Identifiers in italics upper case letters: shared objects
abcd                     Identifiers in italics lower case letters: local variables
↑X                       Pointer to object X
P↓                       Object pointed to by the pointer P
A[1..s] (a[1..s])        Shared (local) array of size s
for each i ∈ {1, ..., m} do statements end for   Order irrelevant
for each i from 1 to m do statements end for     Order relevant
wait(P)                  while ¬P do no-op end while
return(v)                Returns v and terminates the operation invocation
% blablabla %            Comments
;                        Sequentiality operator between two statements

Figures and Algorithms

1.1 Operations to access a disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5


1.2 An interleaving of invocations to disk primitives. . . . . . . . . . . . . 6
1.3 Synchronization is to preserve invariants . . . . . . . . . . . . . . . . . . 8
1.4 Invariant expressed with control flows . . . . . . . . . . . . . . . . . . . . 8
1.5 Sequential specification of a lock object LOCK. . . . . . . . . . . . . . 12

2.1 An atomic register execution. . . . . . . . . . . . . . . . . . . . . . . . ... 16


2.2 Peterson’s algorithm for two processes:
first component (code for pi). . . . . . . . . . . . . . . . . . . . . . . . ... 18
2.3 Peterson’s algorithm for two processes:
second component (code for pi). . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Peterson’s algorithm for two processes (code for pi) . . . . . . . . . . 20
2.5 Mutex property of Peterson’s two-process algorithm (part 1) . . . . 21
2.6 Mutex property of Peterson’s two-process algorithm (part 2) . . . . 21
2.7 Bounded bypass property of Peterson’s two-process algorithm . . . 22
2.8 Peterson’s algorithm for n processes (code for pi) . . . . . . . . . . . . 22
2.9 Total order on read/write operations . . . . . . . . . . . . . . . . . . . . . 24
2.10 A tournament tree for n processes . . . . . . . . . . . . . . . . . . . . . . . 27
2.11 Tournament-based mutex algorithm (code for pi) . . . . . . . . . . . . 28
2.12 An n-process concurrency-abortable operation (code for pi) . . . . . 30
2.13 Access pattern to X and Y for a successful conc_abort_op()
invocation by process pi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.14 Lamport’s fast mutex algorithm (code for pi) . . . . . . . . . . . . . . . 33
2.15 Fischer’s synchronous mutex algorithm (code for pi). . . . . . . . . . 37
2.16 Accesses to X by a process pj . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.17 Test&set-based mutual exclusion . . . . . . . . . . . . . . . . . . . . . . . 39
2.18 Swap-based mutual exclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.19 Compare&swap-based mutual exclusion . . . . . . . . . . . . . . . . . . 41
2.20 From deadlock-freedom to starvation-freedom (code for pi) . . . . . 42
2.21 A possible case when going from deadlock-freedom
to starvation-freedom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 43


2.22 Fetch&add-based mutual exclusion . . . . . . . . . . . . . . . . . . . . . . 45


2.23 An execution of a regular register . . . . . . . . . . . . . . . . . . . . . . . 46
2.24 An execution of a register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.25 Lamport’s bakery mutual exclusion algorithm . . . . . . . . . . . . . . 49
2.26 The two cases where pj updates the safe register FLAG[j] . . . . . . 51
2.27 Aravind’s mutual exclusion algorithm . . . . . . . . . . . . . . . . . . . . 54
2.28 Relevant time instants in Aravind’s algorithm . . . . . . . . . . . . . . 55

3.1 From a sequential stack to a concurrent stack: structural view . . . 62


3.2 From a sequential to a concurrent stack (code for pi). . . . . . . . . . 63
3.3 Implementing a semaphore (code for pi). . . . . . . . . . . . . . . . . . . 65
3.4 A semaphore-based implementation of a buffer. . . . . . . . . . . . . . 66
3.5 A production/consumption cycle . . . . . . . . . . . . . . . . . . . . . . . . 68
3.6 Behavior of the flags FULL[x] and EMPTY[x] . . . . . . . . . . . . . . 69
3.7 An efficient semaphore-based implementation of a buffer . . . . . . 70
3.8 Blackboard and sleeping rooms . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.9 Resource allocation with priority (code for process pi) . . . . . . . . . 73
3.10 From a sequential file to a concurrent file . . . . . . . . . . . . . . . . . 75
3.11 Readers-writers with weak priority to the readers . . . . . . . . . . . . 76
3.12 Readers-writers with strong priority to the readers . . . . . . . . . . . 77
3.13 Readers-writers with priority to the writers . . . . . . . . . . . . . . . . 78
3.14 One writer and several readers from producer-consumer . . . . . . . 79
3.15 Efficiency gain and mutual exclusion requirement . . . . . . . . . . . 80
3.16 Several writers and several readers from producer-consumer . . . . 81
3.17 A register-based rendezvous object . . . . . . . . . . . . . . . . . . . . . . 83
3.18 A monitor-based rendezvous object . . . . . . . . . . . . . . . . . . . . . . 84
3.19 A simple single producer/single consumer monitor . . . . . . . . . . . 85
3.20 Predicate transfer inside a monitor . . . . . . . . . . . . . . . . . . . . . . 86
3.21 Base objects to implement a monitor. . . . . . . . . . . . . . . . . . . . . 87
3.22 Semaphore-based implementation of a monitor . . . . . . . . . . . . . . 88
3.23 A readers-writers monitor with strong priority to the readers . . . . 91
3.24 A readers-writers monitor with strong priority to the writers . . . . 92
3.25 The fairness properties P1 and P2 . . . . . . . . . . . . . . . . . . . . . . . 93
3.26 A readers-writers monitor with fairness properties. . . . . . . . . . . . 94
3.27 A monitor based on a scheduled wait operation . . . . . . . . . . . . . 95
3.28 A buffer for a single producer and a single consumer . . . . . . . . . 98
3.29 Operations prio_down() and prio_up() . . . . . . . . . . . . . . . . . . . . . 99
3.30 Derivation tree for a path expression . . . . . . . . . . . . . . . . . . . . . 100
3.31 Control prefixes and suffixes automatically generated . . . . . . . . . 101
3.32 A variant of a semaphore-based implementation of a buffer . . . . . 103
3.33 Two buffer implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.34 A readers-writers implementation . . . . . . . . . . . . . . . . . . . . . . . 105
3.35 Railways example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.36 Another readers-writers implementation . . . . . . . . . . . . . . . . . . . 108

4.1 A sequential execution of a queue object . . . . . . . . . . . . . . . . . . 114


4.2 A concurrent execution of a queue object . . . . . . . . . . . . . . . . . . 114
4.3 Structural view of a system. . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.4 Example of a history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.5 Linearization points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.6 Two ways of completing a history . . . . . . . . . . . . . . . . . . . . . . . 124
4.7 Atomicity allows objects to compose for free . . . . . . . . . . . . . . . 127
4.8 A sequentially consistent history . . . . . . . . . . . . . . . . . . . . . . . . 129
4.9 Sequential consistency is not a local property . . . . . . . . . . . . . . . 130

5.1 Interleaving at the implementation level . . . . . . . . . . . . . . . .... 137


5.2 Splitter object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 141
5.3 Wait-free implementation of a splitter object
(code for process pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 142
5.4 On the modification of LAST . . . . . . . . . . . . . . . . . . . . . . .... 143
5.5 Obstruction-free implementation of a timestamp object
(code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.6 A typical use of compare&swap() by a process . . . . . . . . . . . . . . 145
5.7 The list implementing the queue . . . . . . . . . . . . . . . . . . . . . . . . 146
5.8 Initial state of the list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
5.9 Michael & Scott’s non-blocking implementation of a queue . . . . . 148
5.10 Shafiei’s non-blocking atomic stack . . . . . . . . . . . . . . . . . . . . . . 152
5.11 A simple wait-free implementation of an atomic stack . . . . . . . . . 153
5.12 On the linearization points of the wait-free stack. . . . . . . . . . . . . 154
5.13 Boosting obstruction-freedom . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.14 A contention-based enrichment of an obstruction-free
implementation (code for pi) . . . . . . . . . . . . . . . . . . . . . . . .... 158
5.15 A contention manager to boost obstruction-freedom
to non-blocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 159
5.16 A contention manager to boost obstruction-freedom
to wait-freedom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 160

6.1 The initial state of the list. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168


6.2 Hybrid implementation of a concurrent set object . . . . . . . . . . . . 169
6.3 The remove() operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.4 The add() operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.5 The contain() operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.6 A contention sensitive implementation of a binary
consensus object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.7 Proof of the contention sensitive consensus algorithm (a) . . . . . . . 174
6.8 Proof of the contention sensitive consensus algorithm (b). . . . . . . 175
6.9 A double-ended queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.10 Definition of the atomic operations LL(), SC(), and VL()
(code for process pi) . . . . . . . . . . . . . . . . . . . . . . . . . . ...... 178

6.11 Implementation of the operations right_enq()


and right_deq() of a double-ended queue . . . . . . . . . . . . . . . . . . 179
6.12 How DQ.right_enq() enqueues a value . . . . . . . . . . . . . . . . . . . . 180
6.13 Examples of concurrency-free patterns . . . . . . . . . . . . . . . . . . . . 182
6.14 A concurrency-abortable non-blocking stack . . . . . . . . . . . . . . . . 182
6.15 From a concurrency-abortable object to a starvation-free object . . . 183

7.1 A simple wait-free counter for n processes (code for pi). . . . . . . . 191
7.2 Wait-free weak counter (one-shot version, code for pi) . . . . . . . . 194
7.3 Proof of the weak increment property . . . . . . . . . . . . . . . . . . . . 197
7.4 Fast read of a weak counter (code for process pi) . . . . . . . . . . . . 199
7.5 Reading a weak counter (non-restricted version,
code for process pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.6 A trivial implementation of a store-collect object (code for pi) . . . 203
7.7 A store-collect object has no sequential specification . . . . . . . . . . 203
7.8 A complete binary tree to implement a store-collect object. . . . . . 205
7.9 Structure of a vertex of the binary tree. . . . . . . . . . . . . . . . . . . . 205
7.10 An adaptive implementation of a store-collect
object (code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 207
7.11 Computing an upper bound on the number of marked vertices ... 211
7.12 Merging store() and collect() (code for process pi) . . . . . . . . . ... 212
7.13 Incorrect versus correct implementation
of the store collect() operation . . . . . . . . . . . . . . . . . . . . . . ... 213
7.14 An efficient store_collect() algorithm (code for pi). . . . . . . . . ... 214
7.15 Sequential and concurrent invocations
of store_collect() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 215
7.16 Concurrent invocations of store_collect() . . . . . . . . . . . . . . . ... 216

8.1 Single-writer snapshot object for n processes . . . . . . . . . . . .... 220


8.2 Multi-writer snapshot object with m components . . . . . . . . . .... 220
8.3 An obstruction-free implementation of a snapshot
object (code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 221
8.4 Linearization point of an invocation of the snapshot()
operation (case 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 223
8.5 The update() operation includes an invocation
of the snapshot() operation . . . . . . . . . . . . . . . . . . . . . . . . .... 224
8.6 Bounded wait-free implementation of a snapshot
object (code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 225
8.7 Linearization point of an invocation of the snapshot()
operation (case 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 227
8.8 Single-writer atomic snapshot for infinitely many processes
(code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 229
8.9 An array transmitted from an update() to a snapshot()
operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 230

8.10 Wait-free implementation of a multi-writer snapshot object
(code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
8.11 A snapshot() with two concurrent update()
by the same process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
8.12 An execution of a one-shot snapshot object . . . . . . . . . . . . . . . . 239
8.13 An execution of an immediate one-shot snapshot object. . . . . . . . 239
8.14 An algorithm for the operation update_snapshot()
(code for process pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.15 The levels of an immediate snapshot object . . . . . . . . . . . . . . . . 242
8.16 Recursive construction of a one-shot immediate snapshot object
(code for process pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

9.1 Uncertainties for 2 processes after one communication
exchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
9.2 A grid of splitters for renaming . . . . . . . . . . . . . . . . . . . . . . . . . 255
9.3 Moir-Anderson grid-based time-adaptive renaming
(code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
9.4 A simple snapshot-based wait-free size-adaptive (2p-1)-renaming
implementation (code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
9.5 Recursive optimal size-adaptive renaming (code for pi) . . . . . . . . 261
9.6 Recursive renaming: first, p3 executes alone . . . . . . . . . . . . . . . . 262
9.7 Recursive renaming: p1 and p4 invoke new_name(4, 1, up)() . . . . 263
9.8 Borowsky and Gafni’s recursive size-adaptive renaming
algorithm (code for pi) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
9.9 Tree associated with a concurrent renaming execution . . . . . . . . . 269
9.10 Simple perfect long-lived test&set-based renaming . . . . . . . . . . . 270

10.1 An execution of a transaction-based two-process program . . .... 280


10.2 Execution of a transaction-based program: view
of an external observer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
10.3 Structure of the execution of a transaction . . . . . . . . . . . . . . . . . 283
10.4 Read from a consistent global state . . . . . . . . . . . . . . . . . . . . . . 285
10.5 Validation test for a transaction T . . . . . . . . . . . . . . . . . . . . . . . 286
10.6 TL2 algorithms for an update transaction . . . . . . . . . . . . . . . . . . 287
10.7 TL2 algorithms for a read-only transaction . . . . . . . . . . . . . . . . . 289
10.8 The list of versions associated with an application register X . . . . 290
10.9 JVSTM algorithm for an update transaction . . . . . . . . . . . . . . . . 291
10.10 JVSTM algorithm for a read-only transaction . . . . . . . . . . . . . . . 293
10.11 Causal pasts of two aborted transactions. . . . . . . . . . . . . . . . . . . 294
10.12 An STM system guaranteeing the virtual world
consistency condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 297

11.1 An execution of a regular register . . . . . . . . . . . . . . . . . . . . . . . 307


11.2 An execution of a register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

11.3 From SWSR safe/regular to SWMR safe/regular:
a bounded construction (code for pi) . . . . . . . . . . . . . . . . . . . . . 311
11.4 A first counter-example to atomicity . . . . . . . . . . . . . . . . . . . . . 312
11.5 SWMR binary register: from safe to regular . . . . . . . . . . . . . . . . 313
11.6 SWMR safe register: from binary domain to b-valued domain . . . 314
11.7 SWMR regular register: from binary domain
to b-valued domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
11.8 A read invocation with concurrent write invocations . . . . . . . . . . 317
11.9 A second counter-example for atomicity. . . . . . . . . . . . . . . . . . . 319
11.10 SWMR atomic register: from bits to a b-valued register. . . . . . . . 320
11.11 There is no new/old inversion . . . . . . . . . . . . . . . . . . . . . . . . . . 321
11.12 SWSR register: from regular to atomic
(unbounded construction) . . . . . . . . . . . . . . . . . . . . . . . . . . ... 322
11.13 Atomic register: from one reader to multi-reader
(unbounded construction) . . . . . . . . . . . . . . . . . . . . . . . . . . ... 325
11.14 Atomic register: from one writer to multi-writer
(unbounded construction) . . . . . . . . . . . . . . . . . . . . . . . . . . ... 326

12.1 Two read invocations r and r′ concurrent with an invocation
w2i+1 of R.write(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
12.2 A possible scenario of read/write invocations at the base level . . . 333
12.3 Tromp’s construction of an atomic bit . . . . . . . . . . . . . . . . . . . . 336
12.4 What is forbidden by the properties A1 and A2 . . . . . . . . . . . . . 339
12.5 π(r1) →H w2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
12.6 qr belongs neither to r nor to r′ . . . . . . . . . . . . . . . . . . . . . . . . . 342
12.7 A new/old inversion on the regular register REG. . . . . . . . . . . . . 343

13.1 Buffers and switch in Tromp’s construction . . . . . . . . . . . . . . . . 349


13.2 Tromp’s construction of a SWSR b-valued atomic register . . . . . . 351
13.3 A write invocation concurrent with two read invocations . . . . . . . 352
13.4 The write automaton of Tromp’s construction . . . . . . . . . . . . . . . 353
13.5 The read automaton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
13.6 Global state automaton. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
13.7 Simultaneous accesses to the same buffer . . . . . . . . . . . . . . . . . . 358
13.8 Successive read/write collisions. . . . . . . . . . . . . . . . . . . . . . . . . 360
13.9 Vidyasankar’s construction of an SWSR b-valued
atomic register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
13.10 Successive updates of the atomic bit WR . . . . . . . . . . . . . . . . . . 362
13.11 Ordering on base operations (Lemma 30) . . . . . . . . . . . . . . . . . . 363
13.12 Overlapping invocations (atomicity in Theorem 56). . . . . . . . . . . 366
13.13 Vidyasankar’s construction of an SWMR b-valued
atomic register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..... 367

14.1 From a sequential specification to a wait-free implementation . . . 372


14.2 Architecture of the universal construction. . . . . . . . . . . . . . . . . . 376

14.3 A wait-free universal construction (code for process pi) . . . . . . . . 379


14.4 The object Z implemented as a linked list . . . . . . . . . . . . . . . . . 385
14.5 Herlihy’s bounded wait-free universal construction . . . . . . . . . . . 386
14.6 Sequence numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
14.7 Multi-valued consensus from binary consensus: construction 1 . . . 393
14.8 Multi-valued consensus from binary consensus: construction 2 . . . 394
14.9 Linearization order for the proof of the termination property . . . . 395

15.1 t-Tolerant self-implementation of an object RO . . . . . . . . . . . . . . 400


15.2 t-Tolerant SWSR atomic register: unbounded
self-implementation (responsive crash) . . . . . . . . . . . . . . . . . . . . 401
15.3 t-Tolerant SWSR atomic register: bounded self-implementation
(responsive crash) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
15.4 Order in which the operations access the base registers . . . . . . . . 404
15.5 Proof of the ‘‘no new/old inversion’’ property . . . . . . . . . . . . . . . 405
15.6 A simple improvement of the bounded construction. . . . . . . . . . . 406
15.7 t-Tolerant SWSR atomic register: unbounded
self-implementation (non-responsive crash) . . . . . . . . . . . . . . . . . 407
15.8 Wait-free t-tolerant self-implementation of a consensus
object (responsive crash/omission) . . . . . . . . . . . . . . . . . . . . . . . 409
15.9 Wait-free t-tolerant (and gracefully degrading) self-implementation
of an SWSR safe register (responsive arbitrary failures) . . . . . . . . 412
15.10 Gracefully degrading self-implementation of a consensus
object (responsive omission) . . . . . . . . . . . . . . . . . . . . . . . . . . . 415

16.1 Existence of a bivalent initial configuration . . . . . . . . . . . . . . . . 424


16.2 Read/write invocations on distinct registers. . . . . . . . . . . . . . . . . 426
16.3 Read and write invocations on the same register . . . . . . . . . . . . . 427
16.4 From test&set to consensus (code for pi, i ∈ {0, 1}) . . . . . . . . . . 430
16.5 From an atomic concurrent queue to consensus
(code for pi, i ∈ {0, 1}) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
16.6 From a swap register to consensus (code for pi, i ∈ {0, 1}) . . . . . 432
16.7 Q.enqueue() invocations by the processes p and q . . . . . . . . . . . . 434
16.8 State of the atomic queue Q in configuration q(p(D)) . . . . . . . . . 435
16.9 Assuming that Sp contains at most k invocations
of Q.dequeue() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 436
16.10 Assuming that Sq does not contain invocations
of Q.dequeue() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
16.11 From the configuration D to the configuration D0 or D1. . . . . . . . 438
16.12 From compare&swap to consensus . . . . . . . . . . . . . . . . . . . . . . 439
16.13 From mem-to-mem-swap to consensus (code for process pi). . . . . 440
16.14 From an augmented queue to consensus . . . . . . . . . . . . . . . . . . . 442
16.15 From a sticky bit to binary consensus . . . . . . . . . . . . . . . . . . . . 443

17.1 From compare&swap to alpha and omega . . . . . . . . . . . . . . . . . 451


17.2 Obligation property of an alpha2 object . . . . . . . . . . . . . . . . . . . 455
17.3 From alpha1 (adopt-commit) objects and Ω to consensus . . . . . . . 458
17.4 From an alpha2 object and Ω to consensus . . . . . . . . . . . . . . . . . 460
17.5 From an alpha3 (store-collect) object and Ω to consensus . . . . . . . 461
17.6 Wait-free implementation of adopt_commit() . . . . . . . . . . . . . . . . 466
17.7 Timing on the accesses to AA for the proof
of the quasi-agreement property . . . . . . . . . . . . . . . . . . . . . ... 466
17.8 Timing on the accesses to BB for the proof
of the quasi-agreement property . . . . . . . . . . . . . . . . . . . . . ... 467
17.9 Array of regular registers implementing an alpha2 object . . . . ... 468
17.10 Wait-free construction of an alpha2 object
from regular registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 469
17.11 Regular register: read and write ordering . . . . . . . . . . . . . . . ... 471
17.12 Replicating and distributing REG[1..n] on the m disks . . . . . . ... 472
17.13 Implementing an SWMR regular register from unreliable
read/write disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 473
17.14 Wait-free construction of an alpha2 object from unreliable
read/write disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
17.15 The operations of an active disk . . . . . . . . . . . . . . . . . . . . . . . . 476
17.16 Wait-free construction of an alpha2 object from an active disk . . . 477
17.17 A t-resilient construction of Ω (code for pi) . . . . . . . . . . . . . . . . 481
17.18 The operations collect() and deposit() on a closing
set object (code for process pi) . . . . . . . . . . . . . . . . . . . . . . ... 487
Part I
Lock-Based Synchronization

This first part of the book is devoted to lock-based synchronization, at the heart
of which lies the mutual exclusion problem. It consists of three chapters:

• The first chapter is a general introduction to the mutual exclusion problem


including the definition of the safety and liveness properties, which are the
properties that any algorithm solving the problem has to satisfy.
• The second chapter presents three families of algorithms that solve the mutual
exclusion problem. The first family is based on atomic read/write registers only,
the second family is based on specific atomic hardware operations, while the
third family is based on read/write registers which are weaker than atomic read/
write registers.
• The last chapter of this part is on the construction of concurrent objects. Three
approaches are presented. The first considers semaphores, which are traditional
lock mechanisms provided at the system level. The two other approaches
consider a higher abstraction level, namely the language constructs of the
concept of a monitor (imperative construct) and the concept of a path expression
(declarative construct).
Chapter 1
The Mutual Exclusion Problem

This chapter introduces definitions related to process synchronization and focuses


then on the mutual exclusion problem, which is one of the most important syn-
chronization problems. It also defines progress conditions associated with mutual
exclusion, namely deadlock-freedom and starvation-freedom.

Keywords Competition · Concurrent object · Cooperation · Deadlock-freedom ·


Invariant · Liveness · Lock object · Multiprocess program · Mutual exclusion ·
Safety · Sequential process · Starvation-freedom · Synchronization

1.1 Multiprocess Program

1.1.1 The Concept of a Sequential Process

A sequential algorithm is a formal description of the behavior of a sequential state


machine: the text of the algorithm states the transitions that have to be sequentially
executed. When written in a specific programming language, an algorithm is called
a program.
The concept of a process was introduced to highlight the difference between an
algorithm as a text and its execution on a processor. While an algorithm is a text
that describes statements that have to be executed (such a text can also be analyzed,
translated, etc.), a process is a “text in action”, namely the dynamic entity generated
by the execution of an algorithm (program) on one or several processors. At any
time, a process is characterized by its state (which comprises, among other things,
the current value of its program counter). A sequential process (sometimes called a
thread) is a process defined by a single control flow (i.e., its behavior is managed by
a single program counter).


1.1.2 The Concept of a Multiprocess Program

The concept of a process, which captures the idea of an ongoing activity, has become
an indispensable tool for mastering the activity of multiprocessors. More precisely, a concurrent algorithm
(or concurrent program) is the description of a set of sequential state machines that
cooperate through a communication medium, e.g., a shared memory. A concurrent
algorithm is sometimes called a multiprocess program (each process corresponding
to the sequential execution of a given state machine).
This chapter considers processes that are reliable and asynchronous. “Reliable”
means that each process results from the correct execution of the code of the corre-
sponding algorithm. “Asynchronous” means that there is no timing assumption on
the time it takes for a process to proceed from a state transition to the next one (which
means that an asynchronous sequential process proceeds at an arbitrary speed).

1.2 Process Synchronization

1.2.1 Processors and Processes

Processes of a multiprocess program interact in one way or another (otherwise,


each process would be independent of the other processes, and a set of indepen-
dent processes does not define a multiprocess program). Hence, the processes of a
multiprocess program do interact and may execute simultaneously (we also say that
the processes execute “in parallel” or are “concurrent”).
In the following we consider that there is one processor per process and con-
sequently the processes do execute in parallel. This assumption on the number of
processors means that, when there are fewer processors than processes, there is an
underlying scheduler (hidden to the processes) that assigns processors to processes.
This scheduling is assumed to be fair in the sense that each process is repeatedly
allowed a processor for finite periods of time. As we can see, this is in agreement with
the asynchrony assumption associated with the processes because, when a process is
waiting for a processor, it does not progress, and consequently, there is an arbitrary
period of time that elapses between the last state transition it executed before stopping
and the next state transition that it will execute when again assigned a processor.

1.2.2 Synchronization

Process synchronization occurs when the progress of one or several processes


depends on the behavior of other processes. Two types of process interaction require
synchronization: competition and cooperation.

More generally, synchronization is the set of rules and mechanisms that allows
the specification and implementation of sequencing properties on statements issued
by the processes so that all the executions of a multiprocess program are correct.

1.2.3 Synchronization: Competition

This type of process interaction occurs when processes have to compete to execute
some statements and only one process at a time (or a bounded number of them) is
allowed to execute them. This occurs, for example, when processes compete for a
shared resource. More generally, resource allocation is a typical example of process
competition.

A simple example As an example let us consider a random access input/output


device such as a shared disk. Such a disk provides the processes with three primitives:
seek(x), which moves the disk read/write head to the address x; read(), which returns
the value located at the current position of the read/write head; and write(v), which
writes value v at the current position of the read/write head.
Hence, if a process wants to read the value at address x of a disk D, it has to execute
the operation disk_read(x) described in Fig. 1.1. Similarly, if a process wants to write
a new value v at address x, it has to execute the operation disk_write(x, v) described
in the same figure.
The disk primitives seek(), read(), and write() are implemented in hardware, and
each invocation of any of these primitives appears to an external observer as if it
was executed instantaneously at a single point of the time line between the beginning
and the end of its real-time execution. Moreover, no two primitive invocations are
associated with the same point of the time line. Hence, the invocations appear as if
they had been executed sequentially. (This is the atomicity consistency condition that
will be more deeply addressed in Chap. 4.)
If a process p invokes disk_read(x) and later (after p’s invocation has terminated)
another process q invokes disk_write(y, v), everything works fine (both operations
execute correctly). More precisely, the primitives invoked by p and q have been
invoked sequentially, with first the invocations by p followed by the invocations by q;

Fig. 1.1 Operations to access a disk
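The figure amounts to the following two operations. In a Java-like rendering (the Disk interface is an illustrative assumption; only the primitive names seek(), read(), and write() come from the text):

    interface Disk {
        void seek(int x);   // move the read/write head to address x
        int read();         // return the value at the current head position
        void write(int v);  // write v at the current head position
    }

    class DiskOps {
        static int disk_read(Disk D, int x) {
            D.seek(x);        // position the head on address x
            return D.read();  // then read the value found there
        }
        static void disk_write(Disk D, int x, int v) {
            D.seek(x);        // position the head on address x
            D.write(v);       // then write v there
        }
    }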



Fig. 1.2 An interleaving of invocations to disk primitives

i.e., from the disk D point of view, the execution corresponds to the sequence
D.seek(x); r ← D.read(); D.seek(y); D.write(v), from which we conclude that p
has read the value at address x and afterwards q has written the value v at address y.
Let us now consider the case where p and q simultaneously invoke disk_read(x)
and disk_write(y, v), respectively. The effect of the corresponding parallel execution
is produced by any interleaving of the primitives invoked by p and the primitives
invoked by q that respects the order of invocations issued by p and q. As an example,
a possible execution is depicted in Fig. 1.2. This figure is a classical space-time
diagram. Time flows from left to right, and each operation issued by a process is
represented by a segment on the time axis associated with this process. Two dashed
arrows are associated with each invocation of an operation. They meet at a point of
the “real time” line, which indicates the instant at which the corresponding operation
appears to have been executed instantaneously. This sequence of points defines the
order in which the execution is seen by an external sequential observer (i.e., an
observer who can see one operation invocation at a time).
In this example, the processes p and q have invoked in parallel D.seek(x) and
D.seek(y), respectively, and D.seek(x) appears to be executed before D.seek(y).
Then q executes D.write(v) while p executes in parallel D.read(), and the write by
q appears to an external observer to be executed before the read of p.
It is easy to see that, while the write by process q is correct (namely v has been
written at address y), the read by process p of the value at address x is incorrect
( p obtains the value written at address y and not the value stored at address x).
Other incorrect parallel executions (involving invocations of both disk_read() and
disk_write() or involving only invocations of disk_write() operations) in which a
value is not written at the correct address can easily be designed.
A solution to prevent this problem from occurring consists in allowing only
one operation at a time (either disk_read() or disk_write()) to be executed. Mutual
exclusion (addressed later in this chapter) provides such a solution.

Non-determinism It is important to see that parallelism (or concurrency) generates


non-determinism: the interleaving of the invocations of the primitives cannot be
predetermined, it depends on the execution. Preventing interleavings that would
produce incorrect executions is one of the main issues of synchronization.

1.2.4 Synchronization: Cooperation

This section presents two examples of process cooperation. The first is a pure coor-
dination problem while the second is the well-known producer–consumer problem.
In both cases the progress of a process may depend on the progress of other processes.

Barrier (or rendezvous) A synchronization barrier (or rendezvous) is a set of


control points, one per process involved in the barrier, such that each process is
allowed to pass its control point only when all other processes have attained their
control points.
From an operational point of view, each process has to stop until all other processes
have arrived at their control point. Differently from mutual exclusion (see below), a
barrier is an instance of the mutual coincidence problem.
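For illustration, this behavior is directly provided in Java by java.util.concurrent.CyclicBarrier; in the following sketch (class and variable names are ours) each thread blocks at await() until all three threads have reached their control points:

    import java.util.concurrent.CyclicBarrier;

    public class BarrierExample {
        public static void main(String[] args) {
            CyclicBarrier barrier = new CyclicBarrier(3); // 3 processes
            for (int i = 0; i < 3; i++) {
                final int id = i;
                new Thread(() -> {
                    try {
                        System.out.println("process " + id + " at its control point");
                        barrier.await(); // blocks until all 3 have arrived
                        System.out.println("process " + id + " passes");
                    } catch (Exception e) { /* ignored in this sketch */ }
                }).start();
            }
        }
    }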

A producer–consumer problem Let us consider two processes, one called “the


producer” and the other called “the consumer”, such that the producer produces
data items that the consumer consumes (this cooperation pattern, called producer–
consumer, occurs in a lot of applications). Assuming that the producer loops forever
on producing data items and the consumer loops forever on consuming data items,
the problem consists in ensuring that (a) only data items that were produced are
consumed, and (b) each data item that was produced is consumed exactly once.
One way to solve this problem could be to use a synchronization barrier: Both
the producer (when it has produced a new data item) and the consumer (when it
wants to consume a new data item) invoke the barrier operation. When they have
both attained their control points, the producer gives the data item it has just produced
to the consumer. This coordination pattern works but is not very efficient (overly
synchronized): for each data item, the first process that arrives at its control point
has to wait for the other process.
An easy way to cope with this drawback and increase concurrency consists in
using a shared buffer of size k ≥ 1. Such an object can be seen as a queue or a circular
array. When it has produced a new data item, the producer adds it to the end of the
queue. When it wants to consume a new item, the consumer process withdraws the
data item at the head of the queue. With such a buffer of size k, a producer has to
wait only when the buffer is full (it then contains k data items produced and not yet
consumed). Similarly, the consumer has to wait only when the buffer is empty (which
occurs each time all data items that have been produced have been consumed).

1.2.5 The Aim of Synchronization Is to Preserve Invariants

To better understand the nature of what synchronization is, let us consider the
previous producer–consumer problem. Let # p and #c denote the number of data
items produced and consumed so far, respectively. The instance of the problem

Fig. 1.3 Synchronization is to preserve invariants

associated with a buffer of size k is characterized by the following invariant:


(#c ≥ 0) ∧ (# p ≥ #c) ∧ (# p ≤ #c + k). The predicate #c ≥ 0 is trivial. The
predicate # p ≥ #c states that the number of data items that have been consumed
cannot be greater than the number of data items that have been produced, while the
predicate # p ≤ #c + k states that the size of the buffer is k.
This invariant is depicted in Fig. 1.3, where any point (# p, #c) inside the area
(including its borders) defined by the lines #c = 0, # p = #c, and # p = #c + k
is a correct pair of values for # p and #c. This means that, in order to be correct,
the synchronization imposed to the processes must ensure that, in any execution and
at any time, the current pair (# p, #c) has to remain inside the authorized area. This
shows that the aim of synchronization is to preserve invariants. More precisely, when
an invariant is about to be violated by a process, that process has to be stopped until
the values of the relevant state variables allow it to proceed: to keep the predicate
# p ≤ #c + k always satisfied, the producer can produce only when # p < #c + k;
similarly, in order for the predicate #c ≤ # p to be always satisfied, the consumer
can consume only when #c < # p. In that way, the pair (# p, #c) will remain forever
in the authorized area.
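A minimal Java sketch of such a buffer (class and field names are ours), in which the waiting conditions are exactly the two guards just stated (#p < #c + k to produce, #c < #p to consume):

    public class BoundedBuffer<T> {
        private final Object[] items;   // circular array of size k
        private long p = 0, c = 0;      // #p produced, #c consumed so far

        public BoundedBuffer(int k) { items = new Object[k]; }

        public synchronized void produce(T v) throws InterruptedException {
            while (!(p < c + items.length)) wait(); // else #p <= #c + k breaks
            items[(int) (p % items.length)] = v;
            p++;
            notifyAll();
        }

        @SuppressWarnings("unchecked")
        public synchronized T consume() throws InterruptedException {
            while (!(c < p)) wait();                // else #p >= #c breaks
            T v = (T) items[(int) (c % items.length)];
            c++;
            notifyAll();
            return v;
        }
    }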
It is possible to represent the previous invariant in a way that relates the control
flows of both the producer and the consumer. Let pi and c j represent the ith data
item production and the jth data item consumption, respectively. Let a → b mean
that a has to be terminated before b starts (where each of a and b is some pi or c j ).
A control flow-based statement of the invariant (#c ≥ 0)∧(# p ≥ #c)∧(# p ≤ #c + k)
is expressed in Fig. 1.4.

Fig. 1.4 Invariant expressed with control flows



1.3 The Mutual Exclusion Problem

1.3.1 The Mutual Exclusion Problem (Mutex)

Critical section Let us consider a part of code A (i.e., an algorithm) or several parts
of code A, B, C . . . (i.e., different algorithms) that, for some consistency reasons,
must be executed by a single process at a time. This means that, if a process is execut-
ing one of these parts of code, e.g., the code B, no other process can simultaneously
execute the same or another part of code, i.e., any of the codes A or B or C or etc.
This is, for example, the case of the disk operations disk_read() and disk_write()
introduced in Sect. 1.2.2, where guaranteeing that, at any time, at most one process
can execute either of these operations ensures that each read or write of the disk is
correct. Such parts of code define what is called a critical section. It is assumed that a
code defining a critical section always terminates when executed by a single process
at a time.
In the following, the critical section code is abstracted into a procedure called
cs_code(in) where in denotes its input parameter (if any) and that returns a result
value (without loss of generality, the default value ⊥ is returned if there is no explicit
result).

Mutual exclusion: providing application processes with an appropriate abstrac-


tion level The mutual exclusion problem (sometimes abbreviated mutex) consists
in designing an entry algorithm (also called entry protocol) and an exit algorithm
(also called exit protocol) that, when used to bracket a critical section cs_code(in),
ensure that the critical section code is executed by at most one process at a time.
Let acquire_mutex() and release_mutex() denote these “bracket” operations.
When several processes are simultaneously executing acquire_mutex(), we say that
they are competing for access to the critical section code. If one of these invocations
terminates while the other invocations do not, the corresponding process p is called
the winner, while each other competing process q is a loser (its invocation remains
pending). When considering the pair ( p, q) of competing processes, we say that p
has won its competition with q.
It is assumed that the code of each process is well formed, which means that,
each time a process wants to execute cs_code(), it first executes acquire_mutex(),
then executes cs_code(), and finally executes release_mutex(). It is easy to ensure
that the processes are well formed by providing them with a high-level procedure
which encapsulates the critical section code cs_code(). This procedure, denoted
protected_code(in), is defined as follows (r is a local variable of the invoking
process):
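(The following is a Java-flavored sketch of this procedure; the book states it in pseudocode, and the Object types here are illustrative.)

    abstract class MutexUser {
        abstract void acquire_mutex();
        abstract void release_mutex();
        abstract Object cs_code(Object in);

        // bracket the critical section code with the entry and exit
        // protocols, so that every caller is well formed by construction
        Object protected_code(Object in) {
            acquire_mutex();          // entry algorithm
            Object r = cs_code(in);   // critical section code
            release_mutex();          // exit algorithm
            return r;
        }
    }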

Mutual exclusion: definition The mutual exclusion problem consists in imple-


menting the operations acquire_mutex() and release_mutex() in such a way that the
following properties are always satisfied:
• Mutual exclusion, i.e., at most one process at a time executes the critical section
code.
• Starvation-freedom. Whatever the process p, each invocation of acquire_mutex()
issued by p eventually terminates.
A problem is defined by safety properties and liveness properties. Safety properties
state that nothing bad happens. They can usually be stated as invariants. Here, the
invariant is the mutual exclusion property, which states that at most one process at a time
can execute the critical section code.
A solution in which no process is ever allowed to execute the critical section code
would trivially satisfy the safety property. This trivial “solution” is prevented by the
starvation-freedom liveness property, which states that, if a process wants to execute
the critical section code, then that process eventually executes it.

On liveness properties Starvation-freedom means that a process that wants to


enter the critical section can be bypassed an arbitrary but finite number of times by
each other process. It is possible to define liveness properties which are weaker or
stronger than starvation-freedom, namely deadlock-freedom and bounded bypass.
• Deadlock-freedom. Whatever the time τ , if before τ one or several processes
have invoked the operation acquire_mutex() and none of them has terminated its
invocation at time τ, then there is a time τ′ > τ at which a process that has invoked
acquire_mutex() terminates its invocation.
Let us notice that deadlock-freedom does not require the process that termi-
nates its invocation of acquire_mutex() to be necessarily one of the processes which
have invoked acquire_mutex() before time τ . It can be a process that has invoked
acquire_mutex() after time τ . The important point is that, as soon as processes want
to enter the critical section, then processes will enter it.
It is easy to see that starvation-freedom implies deadlock-freedom, while deadlock-
freedom does not imply starvation-freedom. This is because, if permanently several
processes are concurrently executing acquire_mutex(), it is possible that some of
them never win the competition (i.e., never terminate their execution of
acquire_mutex()). As an example, let us consider three processes p1 , p2 , and p3
that are concurrently executing acquire_mutex() and p1 wins (terminates). Due to
the safety property, there is a single winner at a time. Hence, p1 executes the pro-
cedure cs_code() and then release_mutex(). Then, p2 wins the competition with
p3 and starts executing cs_code(). During that time, p1 invokes acquire_mutex() to
execute cs_code() again. Hence, while p3 is executing acquire_mutex(), it has lost
two competitions: the first one with respect to p1 and the second one with respect
to p2 . Moreover, p3 is currently competing again with p1 . When later p2 terminates
its execution of release_mutex(), p1 wins the competition with p3 and starts its sec-
ond execution of cs_code(). During that time p2 invokes acquire_mutex() again, etc.

It is easy to extend this execution in such a way that, while p3 wants to enter the
critical section, it can never enter it. This execution is deadlock-free but (due to p3 )
is not starvation-free.

Service point of view versus client point of view Deadlock-freedom is a meaning-


ful liveness condition from the critical section (service) point of view: if processes are
competing for the critical section, one of them always wins, hence the critical section
is used when processes want to access it. On the other hand, starvation-freedom is
a meaningful liveness condition from a process (client) point of view: whatever the
process p, if p wants to execute the critical section code it eventually executes it.

Finite bypass versus bounded bypass A liveness property that is stronger than
starvation-freedom is the following one. Let p and q be a pair of competing processes
such that q wins the competition. Let f (n) denote a function of n (where n is the
total number of processes).
• Bounded bypass. There is a function f (n) such that, each time a process invokes
acquire_mutex(), it loses at most f (n) competitions with respect to the other
processes.
Let us observe that starvation-freedom is nothing else than the case where the
number of times that a process can be bypassed is finite. More generally, we have the
following hierarchy of liveness properties: bounded bypass ⇒ starvation-freedom
≡ finite bypass ⇒ deadlock-freedom.

1.3.2 Lock Object

Definition A lock (say LOCK) is a shared object that provides the processes
with two operations denoted LOCK.acquire_lock() and LOCK.release_lock(). It
can take two values, free and locked, and is initialized to the value free. Its
behavior is defined by a sequential specification: from an external observer point
of view, all the acquire_lock() and release_lock() invocations appear as if they
have been invoked one after the other. Moreover, using the regular language

operators “;” and “∗”, this sequence corresponds to the regular expression
(LOCK.acquire_lock(); LOCK.release_lock())∗ (see Fig. 1.5).
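For illustration only, such an object can be realized in Java from an atomic test-and-set-like primitive (anticipating the specialized hardware operations of Sect. 2.2); this sketch is ours, not the book's:

    import java.util.concurrent.atomic.AtomicBoolean;

    public class Lock {
        // false stands for free, true for locked
        private final AtomicBoolean locked = new AtomicBoolean(false);

        public void acquire_lock() {
            // atomically flip free -> locked; otherwise busy wait
            while (!locked.compareAndSet(false, true)) { /* spin */ }
        }
        public void release_lock() {
            locked.set(false); // back to free
        }
    }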

Lock versus Mutex It is easy to see that, considering acquire_lock() as a synonym


of acquire_mutex() and release_lock() as a synonym of release_mutex(), a lock
object solves the mutual exclusion problem. Hence, the lock object is the object
associated with mutual exclusion: solving the mutual exclusion problem is the same
as implementing a lock object.

Fig. 1.5 Sequential specification of a lock object LOCK

1.3.3 Three Families of Solutions

According to the operations and their properties provided to the processes by the
underlying shared memory communication system, several families of mutex algo-
rithms can be designed. We distinguish three distinct families of mutex algorithms
which are investigated in the next chapter.

Atomic read/write registers In this case the processes communicate by reading


and writing shared atomic registers. There is no other way for them to cooperate.
Atomic registers and a few mutex algorithms based on such registers are presented
in Sect. 2.1.

Specialized hardware primitives Multiprocessor architectures usually offer hard-


ware primitives suited to synchronization. These operations are more sophisticated
than simple read/write registers. Some of them will be introduced and used to solve
mutual exclusion in Sect. 2.2.

Mutex without underlying atomicity Solving the mutual exclusion problem


allows for the construction of high-level atomic operations (i.e., whatever the base
statements that define a block of code, this block of code can be made atomic). The
mutex algorithms based on atomic read/write registers or specialized hardware prim-
itives assume that the underlying shared memory offers low-level atomic operations
and those are used to implement mutual exclusion at a higher abstraction level. This
means that these algorithms are atomicity-based: they allow high-level programmer-
defined atomic operations to be built from base hardware-provided atomic opera-
tions. Hence, the fundamental question: Can programmer-defined atomic operations
be built without assuming atomicity at a lower abstraction level? This question can
also be stated as follows: Is atomicity at a lower level required to solve atomicity at
a higher level?
Somewhat surprisingly, it is shown in Sect. 2.3 that the answer to the last formu-
lation of the previous question is “no”. To that end, new types of shared read/write
registers are introduced and mutual exclusion algorithms based on such particularly
weak registers are presented.

1.4 Summary

This chapter has presented the mutual exclusion problem. Solving this problem con-
sists in providing a lock object, i.e., a synchronization object that allows a zone of
code to be bracketed to guarantee that a single process at a time can execute it.

1.5 Bibliographic Notes

• The mutual exclusion problem was first stated by E.W. Dijkstra [88].
• A theory of interprocess communication and mutual exclusion is described in
[185].
• The notions of safety and liveness were introduced by L. Lamport in [185]. The
notion of liveness is investigated in [20].
• An invariant-based view of synchronization is presented in [194].
Chapter 2
Solving Mutual Exclusion

This chapter is on the implementation of mutual exclusion locks. As announced at


the end of the previous chapter, it presents three distinct families of algorithms that
solve the mutual exclusion problem. The first is the family of algorithms which are
based on atomic read/write registers only. The second is the family of algorithms
which are based on specialized hardware operations (which are atomic and stronger
than atomic read/write operations). The third is the family of algorithms which are
based on read/write registers which are weaker than atomic registers. Each algorithm
is first explained and then proved correct. Other properties such as time complexity
and space complexity of mutual exclusion algorithms are also discussed.

Keywords Atomic read/write register · Lock object · Mutual exclusion · Safe


read/write register · Specialized hardware primitive (test&set, fetch&add,
compare&swap)

2.1 Mutex Based on Atomic Read/Write Registers

2.1.1 Atomic Register

The read/write register object is one of the most basic objects encountered in com-
puter science. When such an object is accessed only by a single process it is said to
be local to that process; otherwise, it is a shared register. A local register allows a
process to store and retrieve data. A shared register allows concurrent processes to
also exchange data.
Definition A register R can be accessed by two base operations: R.read(), which
returns the value of R (also denoted x ← R where x is a local variable of the invoking
process), and R.write(v), which writes a new value into R (also denoted R ← v,
where v is the value to be written into R). An atomic shared register satisfies the
following properties:


• Each invocation op of a read or write operation:


– Appears as if it was executed at a single point τ (op) of the time line,
– τ (op) is such that τb (op) ≤ τ (op) ≤ τe (op), where τb (op) and τe (op) denote
the time at which the operation op started and finished, respectively,

– For any two operation invocations op1 and op2: (op1 ≠ op2) ⇒ τ(op1) ≠ τ(op2).
• Each read invocation returns the value written by the closest preceding write invo-
cation in the sequence defined by the τ () instants associated with the operation
invocations (or the initial value of the register if there is no preceding write oper-
ation).
This means that an atomic register is such that all its operation invocations appear
as if they have been executed sequentially: any invocation op1 that has terminated
before an invocation op2 starts appears before op2 in that sequence, and this sequence
belongs to the specification of a sequential register.
An atomic register can be single-writer/single-reader (SWSR)—the reader and
the writer being distinct processes—or single-writer/multi-reader (SWMR), or multi-
writer/multi-reader (MWMR). We assume that a register is able to contain any value.
(As each process is sequential, a local register can be seen as a trivial instance of
an atomic SWSR register where, additionally, both the writer and the reader are the
same process.)
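As an illustration, Java's AtomicInteger behaves as a MWMR atomic register in the above sense (every get() and set() appears to take effect at a single point between its invocation and its response):

    import java.util.concurrent.atomic.AtomicInteger;

    public class RegisterDemo {
        static final AtomicInteger R = new AtomicInteger(0); // initial value 0

        public static void main(String[] args) {
            R.set(1);             // R.write(1)
            int x = R.get();      // R.read() -> 1
            System.out.println("read " + x);
        }
    }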
An example An execution of a MWMR atomic register accessed by three
processes p1 , p2 , and p3 is depicted in Fig. 2.1 using a classical space-time diagram.
R.read() → v means that the corresponding read operation returns the value v.
Consequently, an external observer sees the following sequential execution of the
register R which satisfies the definition of an atomic register:

R.write(1), R.read() → 1, R.write(3), R.write(2), R.read() → 2, R.read() → 2.

Let us observe that R.write(3) and R.write(2) are concurrent, which means that
they could appear to an external observer as if R.write(2) was executed before

Fig. 2.1 An atomic register execution



R.write(3). If this was the case, the execution would be correct if the last two read
invocations (issued by p1 and p3 ) return the value 3; i.e., the external observer should
then see the following sequential execution:

R.write(1), R.read() → 1, R.write(2), R.write(3), R.read() → 3, R.read() → 3.

Let us also observe that the second read invocation by p1 is concurrent with both
R.write(2) and R.write(3). This means that it could appear as having been executed
before these two write operations or even between them. If it appears as having been
executed before these two write operations, it should return the value 1 in order for
the register behavior to be atomic.
As shown by these possible scenarios (and as noticed before) concurrency is
intimately related to non-determinism. It is not possible to predict which execution
will be produced; it is only possible to enumerate the set of possible executions that
could be produced (we can only predict that the one that is actually produced is one
of them).
Examples of non-atomic read and write operations will be presented in Sect. 2.3.
Why atomicity is important Atomicity is a fundamental concept because it allows
the composition of shared objects for free (i.e., their composition is at no additional
cost). This means that, when considering two (or more) atomic registers R1 and
R2, the composite object [R1, R2] which is made up of R1 and R2 and provides the
processes with the four operations R1.read(), R1.write(), R2.read(), and R2.write()
is also atomic. Everything appears as if at most one operation at a time was executed,
and the sub-sequence including only the operations on R1 is a correct behavior of
R1, and similarly for R2.
This is very important when one has to reason about a multiprocess program
whose processes access atomic registers. More precisely, we can keep reasoning
sequentially whatever the number of atomic registers involved in a concurrent com-
putation. Atomicity allows us to reason on a set of atomic registers as if they were a
single “bigger” atomic object. Hence, we can reason in terms of sequences, not only
for each atomic register taken separately, but also on the whole set of registers as if
they were a single atomic object.
The composition of atomic objects is formally addressed in Sect. 4.4, where
it is shown that, as atomicity is a “local property”, atomic objects compose for
free.

2.1.2 Mutex for Two Processes: An Incremental Construction

The mutex algorithm for two processes that is presented below is due to G.L. Peterson
(1981). This construction, which is fairly simple, is built from an “addition” of two
base components. Despite the fact that these components are nearly trivial, they allow
us to introduce simple basic principles.

Fig. 2.2 Peterson’s algorithm for two processes: first component (code for pi )

The processes are denoted pi and p j . As the algorithm for p j is the same as the
one for pi after having replaced i by j, we give only the code for pi .
First component This component is described in Fig. 2.2 for process pi . It is
based on a single atomic register denoted AFTER_YOU, the initial value of which
is irrelevant (a process writes into this register before reading it). The principle that
underlies this algorithm is a “politeness” rule used in current life. When pi wants
to acquire the critical section, it sets AFTER_YOU to its identity i and waits until
AFTER_YOU = i in order to enter the critical section. Releasing the critical section
entails no particular action.
It is easy to see that this algorithm satisfies the mutual exclusion property. When
both processes want to acquire the critical section, each assigns its identity to the
register AFTER_YOU and waits until this register contains the identity of the other
process. As the register is atomic, there is a “last” process, say p j , that updated it,
and consequently only the other process pi can proceed to the critical section.
Unfortunately, this simple algorithm is not deadlock-free. If one process alone
wants to enter the critical section, it remains blocked forever in the wait statement.
Actually, this algorithm ensures that, when both processes want to enter the critical
section, the first process that updates the register AFTER_YOU is the one that is
allowed to enter it.
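A Java sketch of this first component (class and field names are ours; a volatile field gives the required atomic register behavior):

    class FirstComponent {
        private volatile int afterYou; // the atomic register AFTER_YOU

        void acquire_mutex(int i) {
            afterYou = i;                          // AFTER_YOU <- i
            while (afterYou == i) { /* busy wait until AFTER_YOU != i */ }
        }
        void release_mutex(int i) { /* no particular action */ }
    }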
Second component This component is described in Fig. 2.3. It is based on a simple
idea. Each process pi manages a flag (denoted FLAG[i]) the value of which is down
or up. Initially, both flags are down. When a process wants to acquire the critical
section, it first raises its flag to indicate that it is interested in the critical section. It
is then allowed to proceed only when the flag of the other process is equal to down.
To release the critical section, a process pi has only to reset FLAG[i] to its initial
value (namely, down), thereby indicating that it is no longer interested in the mutual
exclusion.

Fig. 2.3 Peterson’s algorithm for two processes: second component (code for pi )

It is easy to see that, if a single process pi wants to repeatedly acquire the critical
section while the other process is not interested in the critical section, it can do so
(hence this algorithm does not suffer the drawback of the previous one). Moreover,
it is also easy to see that this algorithm satisfies the mutual exclusion property. This
follows from the fact that each process follows the following pattern: first write its flag
and only then read the value of the other flag. Hence, assuming that pi has acquired
(and not released) the critical section, we had (FLAG[i] = up)∧(FLAG[ j] = down)
when it was allowed to enter the critical section. It follows that, after p j has set
FLAG[ j] to the value up, it reads up from FLAG[i] and is delayed until pi resets
FLAG[i] to down when it releases the critical section.
Unfortunately, this algorithm is not deadlock-free. If both processes concurrently
raise first their flags and then read the other flag, each process remains blocked until
the other flag is set down which will never be done.
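A matching Java sketch of this second component (names are ours):

    import java.util.concurrent.atomic.AtomicBoolean;

    class SecondComponent {
        // FLAG[i]: true stands for up, false for down
        private final AtomicBoolean[] flag =
            { new AtomicBoolean(false), new AtomicBoolean(false) };

        void acquire_mutex(int i) {
            int j = 1 - i;
            flag[i].set(true);                      // FLAG[i] <- up
            while (flag[j].get()) { /* wait until FLAG[j] = down */ }
        }
        void release_mutex(int i) { flag[i].set(false); } // FLAG[i] <- down
    }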
Remark: the notion of a livelock In order to prevent the previous deadlock situa-
tion, one could think replacing wait (FLAG[ j] = down) by the following statement:
while (FLAG[ j] = up) do
FLAG[i] ← down;
pi delays itself for an arbitrary period of time;
FLAG[i] ← up
end while.

This modification can reduce deadlock situations but cannot eliminate all of them.
This occurs, for example when both processes execute “synchronously” (both delay
themselves for the same duration and execute the same step—writing their flag and
reading the other flag—at the very same time). When it occurs, this situation is
sometimes called a livelock.
This tentative solution was obtained by playing with asynchrony (modifying the
process speed by adding delays). As a correct algorithm has to work despite any
asynchrony pattern, playing with asynchrony can eliminate bad scenarios but cannot
suppress all of them.

2.1.3 A Two-Process Algorithm

Principles and description In a very interesting way, a simple “addition” of the


two previous “components” provides us with a correct mutex algorithm for two
processes (Peterson’s two-process algorithm). This component addition consists in
a process pi first raising its flag (to indicate that it is competing, as in Fig. 2.3), then
assigning its identity to the atomic register AFTER_YOU (as in Fig. 2.2), and finally
waiting until one of the progress predicates AFTER_YOU ≠ i or FLAG[ j] = down
is satisfied.
It is easy to see that, when a single process wants to enter the critical section, the
flag of the other process allows it to enter. Moreover, when each process sees that

Fig. 2.4 Peterson’s algorithm for two processes (code for pi )

the flag of the other one was raised, the current value of the register AFTER_YOU
allows exactly one of them to progress.
It is important to observe that, in the wait statement of Fig. 2.4, the readings of
the atomic registers FLAG[ j] and AFTER_YOU are asynchronous (they are done at
different times and can be done in any order).
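A possible Java rendering of the whole algorithm follows (process identities 0 and 1; class and field names are ours). Note that the two registers in the waiting predicate are indeed read one after the other:

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicInteger;

    class Peterson2 {
        private final AtomicBoolean[] flag =
            { new AtomicBoolean(false), new AtomicBoolean(false) }; // down
        private final AtomicInteger afterYou = new AtomicInteger();

        void acquire_mutex(int i) {
            int j = 1 - i;
            flag[i].set(true);   // raise my flag
            afterYou.set(i);     // politeness: let the other one go first
            // exit when FLAG[j] = down or AFTER_YOU != i
            while (flag[j].get() && afterYou.get() == i) { /* busy wait */ }
        }
        void release_mutex(int i) { flag[i].set(false); }
    }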
Theorem 1 The algorithm described in Fig. 2.4 satisfies mutual exclusion and
bounded bypass (where the bound is f (n) = 1).

Preliminary remark for the proof The reasoning is based on the fact that the
three registers FLAG[i], FLAG[ j], and AFTER_YOU are atomic. As we have seen
when presenting the atomicity concept (Sect. 2.1.1), this allows us to reason as if at
most one read or write operation on any of these registers occurs at a time.
Proof Proof of the mutual exclusion property.
Let us assume by contradiction that both pi and p j are inside the critical section.
Hence, both have executed acquire_mutex() and we have then FLAG[i] = up,
FLAG[ j] = up and AFTER_YOU = j (if AFTER_YOU = i, the reasoning is the
same after having exchanged i and j). According to the predicate that allowed pi to
enter the critical section, there are two cases.

• Process pi has terminated acquire_mutex(i) because FLAG[ j] = down.


As pi has set FLAG[i] to up before reading down from FLAG[ j] (and entering the
critical section), it follows that p j cannot have read down from FLAG[i] before
entering the critical section (see Fig. 2.5). Hence, p j entered it due to the predicate
AFTER_YOU = i. But this contradicts the assumption that AFTER_YOU = j
when both processes are inside the critical section.
• Process pi has terminated acquire_mutex(i) because AFTER_YOU = j.
As (by assumption) p j is inside the critical section, AFTER_YOU = j, and only p j
can write j into AFTER_YOU, it follows that p j has terminated acquire_mutex( j)
because it has read down from FLAG[i]. On the other hand, FLAG[i] remains con-
tinuously equal to up from the time at which pi executed the first statement
of acquire_mutex(i) until the execution of release_mutex(i) (Fig. 2.6).

Fig. 2.5 Mutex property of Peterson’s two-process algorithm (part 1)

Fig. 2.6 Mutex property of Peterson’s two-process algorithm (part 2)

As p j executes the wait statement after writing j into AFTER_YOU and pi read
j from AFTER_YOU, it follows that p j cannot read down from FLAG[i] when it
executes the wait statement. This contradicts the assumption that p j is inside the
critical section.

Proof of the bounded bypass property.


Let pi be the process that invokes acquire_mutex(i). If FLAG[ j] = down or
AFTER_YOU = j when pi executes the wait statement, it enters the critical section.
Let us consequently assume that (FLAG[ j] = up)∧(AFTER_YOU = i) when pi
executes the wait statement (i.e., the competition is lost by pi ). If, after p j has exe-
cuted release_mutex( j), it does not invoke acquire_mutex( j) again, we permanently
have FLAG[ j] = down and pi eventually enters the critical section.
Hence let us assume that p j invokes again acquire_mutex( j) and sets FLAG[ j]
to up before pi reads it. Thus, the next read of FLAG[ j] by pi returns up. We have
then (FLAG[ j] = up) ∧ (AFTER_YOU = i), and pi cannot progress (see Fig. 2.7).
It follows from the code of acquire_mutex( j) that p j eventually assigns j to
AFTER_YOU (and the predicate AFTER_YOU = j remains true until the next invo-
cation of acquire_mutex() by pi ). Hence, pi eventually reads j from AFTER_YOU
and is allowed to enter the critical section.
It follows that a process loses at most one competition with respect to the other
process, from which we conclude that the bounded bypass property is satisfied and
we have f (n) = 1. 

Fig. 2.7 Bounded bypass property of Peterson’s two-process algorithm

Space complexity The space complexity of a mutex algorithm is measured by the


number and the size of the atomic registers it uses.
It is easy to see that Peterson’s two-process algorithm has a bounded space com-
plexity: there are three atomic registers FLAG[i], FLAG[ j], and AFTER_YOU,
and the domain of each of them has two values. Hence three atomic bits are
sufficient.

2.1.4 Mutex for n Processes:


Generalizing the Previous Two-Process Algorithm

Description Peterson’s mutex algorithm for n processes is described in Fig. 2.8.


This algorithm is a simple generalization of the two-process algorithm described in
Fig. 2.4. This generalization, which is based on the notion of level, is as follows.
In the two-process algorithm, a process pi uses a simple SWMR flag FLAG[i]
whose value is either down (to indicate it is not interested in the critical section) or
up (to indicate it is interested). Instead of this binary flag, a process pi uses now a
multi-valued flag that progresses from a flag level to the next one. This flag, denoted
FLAG_LEVEL[i], is initialized to 0 (indicating that pi is not interested in the critical
section). It then increases first to level 1, then to level 2, etc., until the level n − 1,

Fig. 2.8 Peterson’s algorithm for n processes (code for pi )



which allows it to enter the critical section. For 1 ≤ x < n − 1, FLAG_LEVEL[i] = x
means that pi is trying to enter level x + 1.
Moreover, to eliminate possible deadlocks at any level ℓ, 0 < ℓ < n − 1 (such as
the deadlock that can occur in the algorithm of Fig. 2.3), the processes use a second
array of atomic registers AFTER_YOU[1..(n − 1)] such that AFTER_YOU[ℓ] keeps
track of the last process that has entered level ℓ.
More precisely, a process pi executes a for loop to progress from one level to
the next one, starting from level 1 and finishing at level n − 1. At each level the
two-process solution is used to block a process (if needed). The predicate that allows
a process to progress from level ℓ, 0 < ℓ < n − 1, to level ℓ + 1 is similar to the
one of the two-process algorithm. More precisely, pi is allowed to progress to level
ℓ + 1 if, from its point of view,
• Either all the other processes are at a lower level (i.e., ∀ k ≠ i: FLAG_LEVEL[k] < ℓ).
• Or it is not the last one that entered level ℓ (i.e., AFTER_YOU[ℓ] ≠ i).
Let us notice that the predicate used in the wait statement of line 4 involves all but one
of the atomic registers FLAG_LEVEL[·] plus the atomic register AFTER_YOU[].
As these registers cannot be read in a single atomic step, the predicate is repeatedly
evaluated asynchronously on each register.
When all processes compete for the critical section, at most (n − 1) processes can
concurrently be winners at level 1, (n − 2) processes can concurrently be winners
at level 2, and more generally (n − ℓ) processes can concurrently be winners at
level ℓ. Hence, there is a single winner at level (n − 1).
The code of the operation release_mutex(i) is similar to the one of the two-process
algorithm: a process pi resets FLAG_LEVEL[i] to its initial value 0 to indicate that
it is no longer interested in the critical section.
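The following Java sketch (not from the original text) illustrates this level-based generalization. It assumes process identities 0..n−1 (the book uses 1..n): at each level a process publishes its level, records itself as the last arrival, and then repeatedly evaluates the wait predicate, one register read at a time.

    import java.util.concurrent.atomic.AtomicIntegerArray;

    // Sketch of Peterson's n-process algorithm (Fig. 2.8).
    class PetersonN {
        private final int n;
        private final AtomicIntegerArray flagLevel; // FLAG_LEVEL[0..n-1], initialized to 0
        private final AtomicIntegerArray afterYou;  // AFTER_YOU[1..n-1] (index 0 unused)

        PetersonN(int n) {
            this.n = n;
            flagLevel = new AtomicIntegerArray(n);
            afterYou = new AtomicIntegerArray(n);
        }

        void acquireMutex(int i) {
            for (int level = 1; level <= n - 1; level++) {
                flagLevel.set(i, level);  // pi is trying to enter level 'level'
                afterYou.set(level, i);   // pi is the last process to have entered this level
                // wait until (for all k != i: FLAG_LEVEL[k] < level) or (AFTER_YOU[level] != i);
                // the registers are read one at a time, asynchronously
                boolean blocked = true;
                while (blocked) {
                    if (afterYou.get(level) != i) { blocked = false; break; }
                    boolean allBelow = true;
                    for (int k = 0; k < n; k++)
                        if (k != i && flagLevel.get(k) >= level) { allBelow = false; break; }
                    if (allBelow) blocked = false;
                }
            }
        }

        void releaseMutex(int i) {
            flagLevel.set(i, 0);          // pi is no longer interested
        }
    }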
Theorem 2 The algorithm described in Fig. 2.8 satisfies mutual exclusion and
starvation-freedom.
Proof Initially, a process pi is such that FLAG_LEVEL[i] = 0 and we say that it is
at level 0. Let ℓ ∈ [1..(n − 1)]. We say that a process pi has “attained” level ℓ (or,
from a global state point of view, “is” at level ℓ) if it has exited the wait statement
of the ℓth loop iteration. Let us notice that, after it has set its loop index ℓ to α > 0
and until it exits the wait statement of the corresponding iteration, that process is at
level α − 1. Moreover, a process that attains level ℓ has also attained the levels ℓ′
with 0 ≤ ℓ′ ≤ ℓ ≤ n − 1 and consequently it is also at these levels ℓ′.
The proof of the mutual exclusion property amounts to showing that at most one
process is at level (n − 1). This is a consequence of the following claim when we
consider ℓ = n − 1.
Claim. For ℓ, 0 ≤ ℓ ≤ n − 1, at most n − ℓ processes are at level ℓ.
The proof of this claim is by induction on the level ℓ. The base case ℓ = 0 is
trivial. Assuming that the claim is true up to level ℓ − 1, i.e., at most n − (ℓ − 1)


processes are simultaneously at level ℓ − 1, we have to show that at least one process
does not progress to level ℓ. The proof is by contradiction: let us assume that n − ℓ + 1
processes are at level ℓ.
Let px be the last process that wrote its identity into AFTER_YOU[ℓ] (hence,
AFTER_YOU[ℓ] = x). When considering the sequence of read and write operations
executed by every process, and the fact that these operations are on atomic registers,
this means that, for any of the n − ℓ other processes py that are at level ℓ, these
operations appear as if they have been executed in the following order, where the
first two operations are issued by py while the last two operations are issued by px
(Fig. 2.9):
1. FLAG_LEVEL[y] ← ℓ is executed before AFTER_YOU[ℓ] ← y (sequentiality
of py),
2. AFTER_YOU[ℓ] ← y is executed before AFTER_YOU[ℓ] ← x (assumption:
definition of px),
3. AFTER_YOU[ℓ] ← x is executed before r ← FLAG_LEVEL[y] (sequentiality
of px; r is px's local variable storing the last value read from FLAG_LEVEL[y]
before px exits the wait statement at level ℓ).

Fig. 2.9 Total order on read/write operations

It follows from this sequence that r = ℓ. Consequently, as AFTER_YOU[ℓ] = x,
px exited the wait statement of the ℓth iteration because ∀ k ≠ x : FLAG_LEVEL
[k] < ℓ. But this is contradicted by the fact that we had then FLAG_LEVEL[y] = ℓ,
which concludes the proof of the claim.
The proof of the starvation-freedom property is by induction on the levels starting
from level n − 1 and proceeding until level 1. The base case ℓ = n − 1 follows from
the previous claim: if there is a process at level (n − 1), it is the only process at that
level and it can exit the for loop. This process eventually enters the critical section
(which, by assumption, it will leave later). The induction assumption is the following:
each process that attains a level ℓ′ such that n − 1 ≥ ℓ′ ≥ ℓ eventually enters the
critical section.
The rest of the proof is by contradiction. Let us assume that ℓ is such that there is
a process (say px) that remains blocked forever in the wait statement during its ℓth

iteration (hence, px cannot attain level ℓ). It follows that, each time px evaluates the
predicate controlling the wait statement, we have

(∃ k ≠ x : FLAG_LEVEL[k] ≥ ℓ) ∧ (AFTER_YOU[ℓ] = x)

(let us remember that the atomic registers are read one at a time, asynchronously,
and in any order). There are two cases.
• Case 1: There is a process py that eventually executes AFTER_YOU[ℓ] ← y.
As only px can execute AFTER_YOU[ℓ] ← x, there is eventually a read of
AFTER_YOU[ℓ] that returns a value different from x, and this read allows px
to progress to level ℓ. This contradicts the assumption that px remains blocked
forever in the wait statement during its ℓth iteration.
• Case 2: No process py eventually executes AFTER_YOU[ℓ] ← y.
The other processes can be partitioned into two sets: the set G that contains the
processes at a level greater than or equal to ℓ, and the set L that contains the
processes at a level smaller than ℓ.
As the predicate AFTER_YOU[ℓ] = x remains forever true, it follows that no
process py in L enters the ℓth loop iteration (otherwise py would necessarily
execute AFTER_YOU[ℓ] ← y, contradicting the case assumption).
On the other side, due to the induction assumption, all processes in G eventually
enter (and later leave) the critical section. When this has occurred, these
processes have moved from the set G to the set L and then the predicate
∀ k ≠ x : FLAG_LEVEL[k] < ℓ becomes true.
When this has happened, the values returned by the asynchronous reading of
FLAG_LEVEL[1..n] by px allow it to attain level ℓ, which contradicts the assumption
that px remains blocked forever in the wait statement during its ℓth iteration.
In both cases the assumption that a process remains blocked forever at level ℓ is
contradicted, which completes the proof of the induction step and concludes the
proof of the starvation-freedom property. □

Starvation-freedom versus bounded bypass Peterson's two-process algorithm
satisfies the bounded bypass liveness property, while the n-process algorithm
satisfies only starvation-freedom. Actually, starvation-freedom (i.e., finite bypass) is
the best liveness property that Peterson's n-process algorithm (Fig. 2.8) guarantees.
This can be shown with a simple example. Let us consider the case n = 3. The three
processes p1 , p2 , and p3 invoke simultaneously acquire_mutex(), and the run is such
that p1 wins the competition and enters the critical section. Moreover, let us assume
that AFTER_YOU[1] = 3 (i.e., p3 is the last process that wrote AFTER_YOU[1])
and p3 blocked at level 1.
Then, after it has invoked release_mutex(), process p1 invokes acquire_mutex()
again and we have consequently AFTER_YOU[1] = 1. But, from that time, p3 starts

an arbitrarily long “sleeping” period (this is possible as the processes are asynchronous)
and consequently does not read AFTER_YOU[1] = 1 (which would allow it to
progress to the second level). Differently, p2 progresses to the second level and
enters the critical section. Later, p2 first invokes release_mutex() and immediately
after invokes acquire_mutex() and updates AFTER_YOU[1] = 2. While p3 keeps
on “sleeping”, p1 progresses to level 2 and finally enters the critical section. This
scenario can be reproduced an arbitrary number of times until p3 wakes up. When this
occurs, p3 reads from AFTER_YOU[1] a value different from 3, and consequently
progresses to level 2. Hence:
• Due to asynchrony, a “sleeping period” can be arbitrarily long, and a process can
consequently lose an arbitrary number of competitions with respect to the other
processes,
• But, as a process does not sleep forever, it eventually progresses to the next level.
It is important to notice that, as shown in the proof of the bounded bypass property of
Theorem 1, this scenario cannot happen when n = 2.
Atomic register: size and number It is easy to see that the algorithm uses
2n − 1 atomic registers. The domain of each of the n registers FLAG_LEVEL[i] is
[0..(n − 1)], while the domain of each of the n − 1 registers AFTER_YOU[ℓ] is [1..n].
Hence, in both cases, ⌈log2 n⌉ bits are necessary and sufficient for each atomic
register.
Number of accesses to atomic registers Let us define the time complexity of a
mutex algorithm as the number of accesses to atomic registers for one use of the
critical section by a process.
It is easy to see that this cost is finite but not bounded when there is contention
(i.e., when several processes simultaneously compete to execute the critical section
code).
Differently in a contention-free scenario (i.e., when only one process pi wants to
use the critical section), the number of accesses to atomic registers is (n − 1)(n + 2)
in acquire_mutex(i) and one in release_mutex(i).
The case of k-exclusion This is the k-mutual exclusion problem where the critical
section code can be concurrently accessed by up to k processes (mutual exclusion
corresponds to the case where k = 1).
Peterson’s n-process algorithm can easily be modified to solve k-mutual exclusion.
The upper bound of the for loop (namely (n−1)) has simply to be replaced by (n−k).
No other statement modification is required. Moreover, let us observe that the size
of the array AFTER_YOU can then be reduced to [1..(n − k)].

2.1.5 Mutex for n Processes: A Tournament-Based Algorithm

Reducing the number of shared memory accesses In the previous n-process


mutex algorithm, a process has to compete with the (n − 1) other processes before

Fig. 2.10 A tournament tree for n processes

being able to access the critical section. Said differently, it has to execute n − 1 loop
iterations (eliminating another process at each iteration), and consequently the cost
(measured in number of accesses to atomic registers) in a contention-free scenario
is O(n) × the cost of one loop iteration, i.e., O(n²). Hence a natural question is the
following: Is it possible to reduce this cost and (if so) how?
Tournament tree A simple principle to reduce the number of shared memory
accesses is to use a tournament tree. Such a tree is a complete binary tree. To simplify
the presentation, we consider that the number of processes is a power of 2, i.e., n = 2^k
(hence k = log2 n). If n is not a power of 2, it has to be replaced by n′ = 2^k where
k = ⌈log2 n⌉ (i.e., n′ is the smallest power of 2 such that n′ > n).
Such a tree for n = 2³ = 8 processes p1, . . . , p8 is represented in Fig. 2.10. Each
node of the tree is any two-process starvation-free mutex algorithm, e.g., Peterson’s
two-process algorithm. It is even possible to associate different two-process mutex
algorithms with different nodes. The important common feature of these algorithms
is that any of them assumes that it is used by two processes whose identities are 0
and 1.
As we have seen previously, any two-process mutex algorithm implements a lock
object. Hence, we consider in the following that the tournament tree is a tree of (n−1)
locks and we accordingly adopt the lock terminology. The locks are kept in an array
denoted LOCK[1..(n − 1)], and for x = y, LOCK[x] and LOCK[y] are independent
objects (the atomic registers used to implement LOCK[x] and the atomic registers
used to implement LOCK[y] are different).
The lock LOCK[1] is associated with the root of the tree, and if it is not a leaf, the
node associated with the lock LOCK[x] has two children associated with the locks
LOCK[2x] and LOCK[2x + 1].
According to its identity i, each process pi starts competing with a single other
process p j to obtain a lock that is a leaf of the tree. Then, when it wins, the process

Fig. 2.11 Tournament-based mutex algorithm (code for pi )

pi proceeds to the next level of the tree to acquire the lock associated with the node
that is the father of the node currently associated with pi (initially the leaf node
associated with pi ). Hence, a process competes to acquire all the locks on the path
from the leaf it is associated with until the root node.
As (a) the length of such a path is log2 n and (b) the cost to obtain a lock
associated with a node is O(1) in contention-free scenarios, it is easy to see that
the number of accesses to atomic registers in these scenarios is O(log2 n) (it
is exactly 4 log2 n when each lock is implemented with Peterson’s two-process
algorithm).

The tournament-based mutex algorithm This algorithm is described in Fig. 2.11.
Each process pi manages a local variable node_id such that LOCK[node_id] is the
lock currently addressed by pi, and a local array p_id[1..k] such that p_id[ℓ] is the
identity (0 or 1) used by pi to access LOCK[node_id], as indicated by the labels on
the arrows in Fig. 2.10. (For a process pi, p_id[ℓ] could be directly computed from
the values i and ℓ; a local array is used to simplify the presentation.)
When a process pi invokes acquire_mutex(i) it first considers that it has suc-
cessfully locked a fictitious lock object LOCK[i + (n − 1)] that can be accessed
only by this process (line 1). Process pi then enters a loop to traverse the tree, level
by level, from its starting leaf until the root (lines 2–6). The starting leaf of pi is
associated with the lock LOCK[(i + (n − 1))/2] (lines 1 and 4). The identity used
by pi to access the lock LOCK[node_id] (line 5) is computed at line 3 and saved in
p_id[level].
When it invokes release_mutex(i), process pi releases the k locks it has locked
starting from the lock associated with the root (LOCK[1]) until the lock associated

with its starting leaf LOCK[(i + (n − 1))/2]. When it invokes LOCK[node_id].
release_lock(p_id[level]) (line 10), the value of the parameter p_id[level] is
the identity (0 or 1) used by pi when it locked that object. This identity is
also used by pi to compute the index of the next lock object it has to unlock
(line 11).
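A Java sketch of this tournament traversal (again, not from the original text) is given below. It assumes n is a power of 2 and process identities 1..n, and it reuses the PetersonTwo class sketched earlier as the two-process lock; the array pId plays the role of pi's local array p_id[1..k] and is supplied by the calling process.

    // Sketch of the tournament-based algorithm (Fig. 2.11), built on the
    // PetersonTwo class sketched earlier; n is assumed to be a power of 2
    // and process identities to be 1..n.
    class TournamentLock {
        private final int n, k;            // k = log2(n)
        private final PetersonTwo[] lock;  // LOCK[1..n-1] (index 0 unused)

        TournamentLock(int n) {
            this.n = n;
            this.k = Integer.numberOfTrailingZeros(n);
            lock = new PetersonTwo[n];
            for (int x = 1; x < n; x++) lock[x] = new PetersonTwo();
        }

        // pId plays the role of pi's local array p_id[1..k] (size k+1)
        void acquireMutex(int i, int[] pId) {
            int nodeId = i + (n - 1);                    // fictitious leaf lock of pi
            for (int level = 1; level <= k; level++) {
                pId[level] = nodeId % 2;                 // identity (0 or 1) used at the father node
                nodeId = nodeId / 2;                     // move to the father node
                lock[nodeId].acquireMutex(pId[level]);
            }
        }

        void releaseMutex(int i, int[] pId) {
            int nodeId = 1;                              // start from the root lock
            for (int level = k; level >= 1; level--) {
                lock[nodeId].releaseMutex(pId[level]);   // line 10
                nodeId = 2 * nodeId + pId[level];        // line 11: next lock on the way down
            }
        }
    }

A process pi would allocate its own array, int[] pId = new int[k + 1], once, and pass it to both operations.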
Theorem 3 Assuming that each two-process lock object satisfies mutual exclusion
and deadlock-freedom (or starvation-freedom), the algorithm described in Fig. 2.11
satisfies mutual exclusion and deadlock-freedom (or starvation-freedom).
Proof The proof of the mutex property is by contradiction. If pi and pj (i ≠ j) are
simultaneously in the critical section, there is a lock object LOCK[node_id] such
that pi and p j have invoked acquire_lock() on that object and both have been simul-
taneously granted the lock. (If there are several such locks, let LOCK[node_id] be
one at the lowest level in the tree.) Due to the specification of the lock object (that
grants the lock to a single process identity, namely 0 or 1), it follows that both pi
and p j have invoked LOCK[node_id].acquire_lock() with the same identity value
(0 or 1) kept in their local variable p_id[level]. But, due to the binary tree struc-
ture of the set of lock objects and the way the processes compute p_id[level],
this can only happen if i = j (on the lowest level on which pi and p j share
a lock), which contradicts our assumption and completes the proof of the mutex
property.
The proof of the starvation-freedom (or deadlock-freedom) property follows from
the same property of the base lock objects. We consider here only the starvation-
freedom property. Let us assume that a process pi is blocked forever at the object
LOCK[node_id]. This means that there is another process p j that competes infi-
nitely often with pi for the lock granted by LOCK[node_id] and wins each time.
The proof follows from the fact that, due to the starvation-freedom property of
LOCK[node_id], this cannot happen. □

Remark Let us consider the case where each algorithm implementing an under-
lying two-process lock object uses a bounded number of bounded atomic regis-
ters (which is the case for Peterson’s two-process algorithm). In that case, as the
tournament-based algorithm uses (n−1) lock objects, it follows that it uses a bounded
number of bounded atomic registers.
Let us observe that this tournament-based algorithm has better time complexity
than Peterson’s n-process algorithm.

2.1.6 A Concurrency-Abortable Algorithm

When looking at the number of accesses to atomic registers issued by
acquire_mutex() and release_mutex() for a single use of the critical section in a
contention-free scenario, the cost of Peterson's n-process mutual exclusion
algorithm is O(n²), while the cost of the tournament-tree-based algorithm is O(log2 n).
Hence, a natural question is the following: Is it possible to design a fast n-process
mutex algorithm, where fast means that the cost of the algorithm is constant in a
contention-free scenario?
The next section of this chapter answers this question positively. To that end, an
incremental presentation is adopted. A simple one-shot operation is first presented.
Each of its invocations returns a value r to the invoking process, where r is the value
abort or the value commit. Then, the next section enriches the algorithm implementing
this operation to obtain a deadlock-free fast mutual exclusion algorithm due
to L. Lamport (1987).
Concurrency-abortable operation A concurrency-abortable (also named
contention-abortable and usually abbreviated abortable) operation is an operation that is
allowed to return the value abort in the presence of concurrency. Otherwise, it has to
return the value commit. More precisely, let conc_abort_op() be such an operation.
Assuming that each process invokes it at most once (one-shot operation), the set of
invocations satisfies the following properties:
• Obligation. If the first process which invokes conc_abort_op() is such that its
invocation occurs in a concurrency-free pattern (i.e., no other process invokes
conc_abort_op() during its invocation), this process obtains the value commit.
• At most one. At most one process obtains the value commit.

An n-process concurrency-abortable algorithm Such an algorithm is described
in Fig. 2.12. As in the previous algorithms, it assumes that all the processes have
distinct identities but, differently from them, the number n of processes can be
arbitrary and remains unknown to the processes.
This algorithm uses two MWMR atomic registers denoted X and Y . The register
X contains a process identity (its initial value being arbitrary). The register Y contains
a process identity or the default value ⊥ (which is its initial value). It is consequently
assumed that these atomic registers are made up of ⌈log2(n + 1)⌉ bits.

Fig. 2.12 An n-process concurrency-abortable operation (code for pi )



When it invokes conc_abort_op(), a process pi first deposits its identity in X
(line 1) and then checks if the current value of Y is its initial value ⊥ (line 2). If
Y ≠ ⊥, there is (at least) one process pj that has written into Y. In that case,
pi returns abort1 (both abort1 and abort2 are synonyms of abort; they are used
only to distinguish the place where the invocation of conc_abort_op() is “aborted”).
Returning abort1 means that (from a concurrency point of view) pi was late: there
is another process that wrote into Y before pi read it.
If Y = ⊥, process pi writes its identity into Y (line 4) and then checks if X is
still equal to its identity i (line 5). If this is the case, pi returns the value commit
at line 6 (its invocation of conc_abort_op(i) is then successful). If X ≠ i, another
process pj has written its identity j into X, overwriting the identity i before pi read
X at line 5. Hence, there is contention and the value abort2 is returned to pi (line 7).
Returning abort2 means that, among the competing processes that found Y = ⊥, pi
was not the last to have written its name into X.
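The following Java sketch (not part of the original text) renders the operation just described. Atomic integers stand for the MWMR registers X and Y, with −1 playing the role of ⊥, and the constants COMMIT, ABORT1, and ABORT2 standing for commit, abort1, and abort2.

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of the one-shot concurrency-abortable operation (Fig. 2.12).
    class ConcAbortOp {
        static final int BOTTOM = -1;                              // default value of Y
        static final int COMMIT = 0, ABORT1 = 1, ABORT2 = 2;
        private final AtomicInteger x = new AtomicInteger(0);      // X (arbitrary initial value)
        private final AtomicInteger y = new AtomicInteger(BOTTOM); // Y (initialized to BOTTOM)

        int concAbortOp(int i) {
            x.set(i);                         // line 1: deposit identity in X
            if (y.get() != BOTTOM)            // line 2: some process already wrote Y
                return ABORT1;                // line 3: pi is late
            y.set(i);                         // line 4: Y <- i
            if (x.get() == i)                 // line 5: X not overwritten in the meantime
                return COMMIT;                // line 6
            return ABORT2;                    // line 7: contention on X
        }
    }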
Remark Let us observe that the only test on Y is “Y ≠ ⊥” (line 2). It follows that Y
could be replaced by a flag with the associated domain {⊥, ⊤}. Line 4 should then
be replaced by Y ← ⊤.
Using such a flag is not considered here because we want to keep the notation
consistent with that of the fast mutex algorithm presented below. In the fast mutex
algorithm, the value of Y can be either ⊥ or any process identifier.
Theorem 4 The algorithm described in Fig. 2.12 guarantees that (a) at most
one process obtains the value commit and (b) if the first process that invokes
conc_abort_op() executes it in a concurrency-free pattern, it obtains the value
commit.
Proof The proof of property (b) stated in the theorem is trivial. If the first process
(say pi ) that invokes conc_abort_op() executes this operation in a concurrency-free
context, we have Y = ⊥ when it reads Y at line 2 and X = i when it reads X at line 5.
It follows that it returns commit at line 6.
Let us now prove property (a), i.e., that no two processes can obtain the value
commit. Let us assume for the sake of contradiction that a process pi has invoked
conc_abort_op(i) and obtained the value commit. It follows from the text of the
algorithm that the pattern of accesses to the atomic registers X and Y issued by pi
is the one described in Fig. 2.13 (when not considering the accesses by p j in that
figure). There are two cases.

• Let us first consider the (possibly empty) set Q of processes p j that read Y at line
2 after this register was written by pi or another process (let us notice that, due to
the atomicity of the registers X and Y , the notion of after/before is well defined).
As Y is never reset to ⊥, it follows that each process p j ∈ Q obtains a non-⊥
value from Y and consequently executes return(abor t1 ) at line 3.

Fig. 2.13 Access pattern to X and Y for a successful conc_abort_op() invocation by process pi

• Let us now consider the (possibly empty) set Q′ of processes pj distinct from
pi that read ⊥ from Y at line 2 concurrently with pi. Each pj ∈ Q′ consequently
writes its identity j into Y at line 4.
As pi has read i from X (line 5), it follows that no process pj ∈ Q′ has modified
X between the execution of line 1 and line 5 by pi (otherwise pi would not have
read i from X at line 5, see Fig. 2.13). Hence any process pj ∈ Q′ has written X
(a) either before pi wrote i into X or (b) after pi read i from X. But observe
that case (b) cannot happen. This is due to the following observation. A process pk
that writes X (at line 1) after pi has read i from this register (at line 5) necessarily
finds Y ≠ ⊥ at line 2 (this is because pi has previously written i into Y at line 4
before reading i from X at line 5). Consequently, such a process pk belongs to the
set Q and not to the set Q′. Hence, the only possible case is that each pj ∈ Q′ has
written j into X before pi wrote i into X. It follows that pi is the last process of
Q′ ∪ {pi} which has written its identity into X.
We conclude from the previous observation that, when a process pj ∈ Q′ reads X
at line 5, it obtains from this register a value different from j and, consequently,
its invocation conc_abort_op(j) returns the value abort2, which concludes the
proof of the theorem. □

The next corollary follows from the proof of the previous theorem.
Corollary 1 (Y ≠ ⊥) ⇒ a process has obtained the value commit or several
processes have invoked conc_abort_op().

Theorem 5 Whatever the number of processes that invoke conc_abort_op(), any of
these invocations costs at most four accesses to atomic registers.
Proof The proof follows from a simple examination of the algorithm. □

Remark: splitter object When we (a) replace the values commit, abort1, and
abort2 by stop, right, and left, respectively, and (b) rename the operation

conc_abort_op(i) as direction(i), we obtain a one-shot object called a splitter. A
one-shot object is an object that provides processes with a single operation such that
each process invokes that operation at most once.
In a run in which a single process invokes direction(), it obtains the value stop.
In any run, if m > 1 processes invoke direction(), at most one process obtains the
value stop, at most (m − 1) processes obtain right, and at most (m − 1) processes
obtain left. Such an object is presented in detail in Sect. 5.2.1.

2.1.7 A Fast Mutex Algorithm

Principle and description This section presents L. Lamport's fast mutex
algorithm, which is built from the previous one-shot concurrency-abortable operation.
More specifically, this algorithm behaves similarly to the algorithm of Fig. 2.12 in
contention-free scenarios and (instead of returning abort) guarantees the deadlock-
freedom liveness property when there is contention.
The algorithm is described in Fig. 2.14. The line numbering is the same as in
Fig. 2.12: the lines with the same number are the same in both algorithms, line N0 is
new, line N3 replaces line 3, lines N7.1–N7.5 replace line 7, and line N10 is new.
To attain its goal (both fast mutex and deadlock-freedom) the algorithm works as
follows. First, each process pi manages a SWMR flag FLAG[i] (initialized to down)

Fig. 2.14 Lamport’s fast mutex algorithm (code for pi )



that pi sets to up to indicate that it is interested in the critical section (line N0). This
flag is reset to down when pi exits the critical section (line N10). As we are about
to see, it can be reset to down also in other parts of the algorithm.
According to the contention scenarios in which a process pi returns abort in the
algorithm of Fig. 2.12, there are two cases to consider, which have been differentiated
by the values abort1 and abort2.
• Eliminating abort1 (line N3).
In this case, as we have seen in Fig. 2.12, process pi is “late”. As captured by
Corollary 1, this is because there are other processes that currently compete for
the critical section or there is a process inside the critical section. Line 3 of Fig. 2.12
is consequently replaced by the following statements (new line N3):
– Process pi first resets its flag to down in order not to prevent other processes
from entering the critical section (if no other process is currently inside it).
– According to Corollary 1, it is useless for pi to retry entering the critical section
while Y = ⊥. Hence, process pi delays its request for the critical section until
Y = ⊥.
• Eliminating abort2 (lines N7.1–N7.5).
In this case, as we have seen in the base contention-abortable algorithm (Fig. 2.12),
several processes are competing for the critical section (or a process is already
inside the critical section). Differently from the base algorithm, one of the com-
peting processes has now to be granted the critical section (if no other process is
inside it). To that end, in order not to prevent another process from entering the
critical section, process pi first resets its flag to down (line N7.1). Then, pi tries
to enter the critical section. To that end, it first waits until all flags are down (line
N7.2). Then, pi checks the value of Y (line N7.3). There are two cases:
– If Y = i, process pi enters the critical section. This is due to the following
reason.
Let us observe that, if Y = i when pi reads it at line N7.3, then no process has
modified Y since pi set it to the value i at line 4 (the write of Y at line 4 and its
reading at line N7.3 follow the same access pattern as the write of X at line 1 and
its reading at line 5). Hence, process pi is the last process to have executed line
4. It then follows that, as it has (asynchronously) seen each flag equal to down
(line N7.2), process pi is allowed to enter the critical section (return() statement
at line N7.3).
– If Y ≠ i, process pi does the same as what is done at line N3. As it has already
set its flag to down, it has only to wait until the critical section is released before
retrying to enter it (line N7.4). (Let us remember that the only place where Y is
reset to ⊥ is when a process releases the critical section.)
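Putting the pieces together, here is a Java sketch (not from the original text) of the full algorithm of Fig. 2.14. It assumes process identities 0..n−1 and uses −1 for ⊥; the line labels in the comments refer to the numbering used above.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicIntegerArray;

    // Sketch of Lamport's fast mutex algorithm (Fig. 2.14); flag: 1 = up, 0 = down.
    class FastMutex {
        static final int BOTTOM = -1;
        private final int n;
        private final AtomicInteger x = new AtomicInteger(BOTTOM);  // X
        private final AtomicInteger y = new AtomicInteger(BOTTOM);  // Y
        private final AtomicIntegerArray flag;                      // FLAG[0..n-1]

        FastMutex(int n) { this.n = n; flag = new AtomicIntegerArray(n); }

        void acquireMutex(int i) {
            while (true) {
                flag.set(i, 1);                               // line N0: FLAG[i] <- up
                x.set(i);                                     // line 1:  X <- i
                if (y.get() != BOTTOM) {                      // line 2:  pi is late
                    flag.set(i, 0);                           // line N3: step aside and
                    while (y.get() != BOTTOM) { /* spin */ }  //          wait until Y = BOTTOM,
                    continue;                                 //          then retry
                }
                y.set(i);                                     // line 4:  Y <- i
                if (x.get() == i) return;                     // lines 5-6: fast path
                flag.set(i, 0);                               // line N7.1
                for (int j = 0; j < n; j++)                   // line N7.2: wait until all
                    while (flag.get(j) == 1) { /* spin */ }   //            flags are down
                if (y.get() == i) return;                     // line N7.3: slow path wins
                while (y.get() != BOTTOM) { /* spin */ }      // line N7.4: wait for release
            }                                                 // line N7.5: retry
        }

        void releaseMutex(int i) {
            y.set(BOTTOM);                                    // reopen the critical section
            flag.set(i, 0);                                   // line N10: FLAG[i] <- down
        }
    }

In a contention-free scenario, acquireMutex() executes exactly the five register accesses of the fast path (lines N0, 1, 2, 4, 5) and releaseMutex() two more, matching the counts given below.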
Fast path and slow path The fast path to enter the critical section is when pi
executes only the lines N0, 1, 2, 4, 5, and 6. The fast path is open for a process pi

if it reads i from X at line 5. This is the path that is always taken by a process in
contention-free scenarios.
The cost of the fast path is five accesses to atomic registers. As release_mutex()
requires two accesses to atomic registers, it follows that the cost of a single use of the
critical section in a contention-free scenario is seven accesses to atomic registers.
The slow path is the path taken by a process which does not take the fast path.
Its cost in terms of accesses to atomic registers depends on the current concurrency
pattern.
A few remarks A register FLAG[i] is set to down when pi exits the critical section
(line N10) but also at line N3 or N7.1. It is consequently possible for a process pk to
be inside the critical section while all flags are down. But let us notice that, when this
occurs, the value of Y is different from ⊥, and as already indicated, the only place
where Y is reset to ⊥ is when a process releases the critical section.
When executed by a process pi , the aim of the wait statement at line N3 is to
allow any other process p j to see that pi has set its flag to down. Without such a
wait statement, a process pi could loop forever executing the lines N0, 1, 2 and N3
and could thereby favor a livelock by preventing the other processes from seeing
FLAG[i] = down.
Theorem 6 Lamport’s fast mutex algorithm satisfies mutual exclusion and
deadlock-freedom.
Proof Let us first consider the mutual exclusion property. Let pi be a process that
is inside the critical section. Trivially, we have then Y = ⊥ and pi returned from
acquire_mutex() at line 6 or at line N7.3. Hence, there are two cases. Before consid-
ering these two cases, let us first observe that each process (if any) that reads Y after
it was written by pi (or another process) executes line N3: it resets its flag to down
and waits until Y = ⊥ (i.e., at least until pi exits the critical section, line N10). As
the processes that have read a non-⊥ value from Y at line 2 cannot enter the critical
section, it follows that we have to consider only the processes p j that have read ⊥
from Y at line 2.

• Process pi has executed return() at line 6.
In this case, it follows from a simple examination of the text of the algorithm that
FLAG[i] remains equal to up until pi exits the critical section and executes line
N10.
Let us consider a process pj that has read ⊥ from Y at line 2. As process pi has
executed line 6, it was the last process (among the competing processes which read
⊥ from Y) to have written its identity into X (see Fig. 2.13), and consequently pj
cannot read j from X. As X ≠ j when pj reads X at line 5, it follows that process
pj executes the lines N7.1–N7.5. When it executes line N7.2, pj remains blocked
until pi resets its flag to down but, as we have seen, pi does so only when it exits
the critical section. Hence, pj cannot be inside the critical section simultaneously
with pi. This concludes the proof of the first case.

• Process pi has executed return() at line N7.3.
In this case, the predicate Y = i allowed pi to enter the critical section. Moreover,
the atomic register Y has not been modified during the period starting when it was
assigned the identity i at line 4 by pi and ending at the time at which pi read it at
line N7.3. It follows that, among the processes that read ⊥ from Y (at line 2), pi
is the last one to have updated Y .
Let us observe that X ≠ j, otherwise pj would have entered the critical section at
line 6, and in that case (as shown in the previous item) pi could not have entered
the critical section.
As Y = i, it follows from the test of line N7.3 that p j executes line N7.4
and consequently waits until Y = ⊥. As Y is set to ⊥ only when a process
exits the critical section (line N10), it follows that p j cannot be inside the crit-
ical section simultaneously with pi , which concludes the proof of the second
case.

To prove the deadlock-freedom property, let us assume that there is a non-empty
set of processes that compete to enter the critical section and, from then on, no process
ever executes return() at line 6 or line N7.3. We show that this is impossible.
As processes have invoked acquire_mutex() and none of them executes line 6,
it follows that there is among them at least one process px that has executed first
line N0 and line 1 (where it assigned its identity x to X) and then line N3. This
assignment of x to X makes the predicate of line 5 false for the processes that have
obtained ⊥ from Y. It follows that the flags of these processes are eventually reset
to down and, consequently, these processes cannot entail a permanent blocking of
any other process pi which executes line N7.2.
When the last process that used the critical section released it, it reset Y to ⊥
(if there is no such process, we initially have Y = ⊥). Hence, among the processes
that have invoked acquire_mutex(), at least one of them has read ⊥ from Y . Let Q
be this (non-empty) set of processes. Each process of Q executes lines N7.1–N7.5
and, consequently, eventually resets its flag to down (line N7.1). Hence, the predicate
evaluated in the wait statement at line N7.2 eventually becomes satisfied and the
processes of Q which execute the lines N7.1–N7.5 eventually check at line N7.3 if
the predicate Y = i is satisfied. (Due to asynchrony, it is possible that the predicate
used at N7.2 is never true when evaluated by some processes. This occurs for the
processes of Q which are slow while another process of Q has entered the critical
section and invoked acquire_mutex() again, thereby resetting its flag to up. The
important point is that this can occur only if some process entered the critical section,
hence when there is no deadlock.)
As no process is inside the critical section and the number of processes is finite,
there is a process p j that was the last process to have modified Y at line 4. As (by
assumption) p j has not executed return() at line 6, it follows that it executes line
N7.3 and, finding Y = j, it executes return(), which contradicts our assumption and
consequently proves the deadlock-freedom property. □

2.1.8 Mutual Exclusion in a Synchronous System

Synchronous system Differently from an asynchronous system (in which there is
no time bound), a synchronous system is characterized by assumptions on the speed
of processes. More specifically, there is a bound Δ on the speed of processes and this
bound is known to them (meaning that Δ can be used in the code of the algorithms).
The meaning of Δ is the following: two consecutive accesses to atomic registers by
a process are separated by at most Δ time units.
Moreover, the system provides the processes with a primitive delay(d), where d
is a positive duration, which stops the invoking process for a finite duration greater
than d. The synchrony assumption applies only to consecutive accesses to atomic
registers that are not separated by a delay() statement.
Fischer’s algorithm A very simple mutual exclusion algorithm (due to M. Fischer)
is described in Fig. 2.15. This algorithm uses a single atomic register X (initialized
to ⊥) that, in addition to ⊥, can contain any process identity.
When a process pi invokes acquire_mutex(i), it waits until X = ⊥. Then it
writes its identity into X (as before, it is assumed that no two processes have the
same identity) and invokes delay(Δ). When it resumes its execution, it checks if X
contains its identity. If this is the case, its invocation acquire_mutex(i) terminates
and pi enters the critical section. If X ≠ i, it re-executes the loop body.
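A Java sketch (not from the original text) is shown below; −1 plays the role of ⊥, and a busy-waiting delay() stops the caller for more than an assumed bound DELTA_NANOS. Note that no real JVM guarantees the synchrony bound Δ between two consecutive register accesses, so this sketch is illustrative only.

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of Fischer's synchronous algorithm (Fig. 2.15).
    class FischerMutex {
        static final int BOTTOM = -1;
        static final long DELTA_NANOS = 1_000;  // assumed bound on two consecutive accesses
        private final AtomicInteger x = new AtomicInteger(BOTTOM);  // X

        // delay(d): stop the invoking process for a finite duration greater than d
        private static void delay(long dNanos) {
            long deadline = System.nanoTime() + dNanos + 1;
            while (System.nanoTime() < deadline) { /* busy wait */ }
        }

        void acquireMutex(int i) {
            do {
                while (x.get() != BOTTOM) { /* spin */ }  // line 1: wait until X = BOTTOM
                x.set(i);                                 // line 2: X <- i
                delay(DELTA_NANOS);                       // line 3: let concurrent writers finish
            } while (x.get() != i);                       // line 4: retry if X was overwritten
        }

        void releaseMutex(int i) {
            x.set(BOTTOM);                                // X <- BOTTOM
        }
    }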
Theorem 7 Let us assume that the number of processes is finite and all have
distinct identities. Fischer’s mutex algorithm satisfies mutual exclusion and deadlock-
freedom.
Proof To simplify the statement of the proof we consider that each access to an
atomic register is instantaneous. (Considering that such accesses take bounded dura-
tion is straightforward.)
Proof of the mutual exclusion property. Assuming that, at some time, processes
invoke acquire_mutex(), let C be the subset of them whose last read of X returned ⊥.
Let us observe that the ones that read a non-⊥ value from X remain looping in the

Fig. 2.15 Fischer’s synchronous mutex algorithm (code for pi )



Fig. 2.16 Accesses to X by a process pj

wait statement at line 1. By assumption, C is finite. Due to the atomicity of the
register X and the fact that all processes in C write into X, there is a last process (say
pi) that writes its identity into X.
Given any process pj of C let us define the following time instants (Fig. 2.16):
• τ_j^0 = the time at which pj reads the value ⊥ from X (line 1),
• τ_j^1 = the time at which pj writes its identity j into X (line 2), and
• τ_j^2 = the time at which pj reads X (line 4) after having executed the delay()
statement (line 3).
Due to the synchrony assumption and the delay() statement we have τ_j^1 ≤ τ_j^0 + Δ
(P1) and τ_j^2 > τ_j^1 + Δ (P2). We show that, after pi has written i into X, this register
remains equal to i until pi resets it to ⊥ (line 6) and that any process pj of C reads
i from X at line 4, from which the mutual exclusion property follows. This is the
consequence of the following observations:
1. τ_j^1 + Δ < τ_j^2 (property P2),
2. τ_i^0 < τ_j^1 (otherwise pi would not have read ⊥ from X at line 1),
3. τ_i^0 + Δ < τ_j^1 + Δ (adding Δ to both sides of the previous line),
4. τ_i^1 ≤ τ_i^0 + Δ < τ_j^1 + Δ < τ_j^2 (from P1 and the previous items 1 and 3).

It then follows from the fact that pi is the last process which wrote into X and τ_j^2 > τ_i^1
that pj reads i from X at line 4 and consequently enters the repeat loop again
and waits until X = ⊥. The mutual exclusion property follows.
Proof of the deadlock-freedom property. This is an immediate consequence of
the fact that, among the processes that have concurrently invoked the operation
acquire_mutex(), the last process that writes X (pi in the previous reasoning) reads
its own identity from X at line 4. □
Short discussion The main property of this algorithm is its simplicity. Moreover,
its code is independent of the number of processes.

2.2 Mutex Based on Specialized Hardware Primitives

The previous section presented mutual exclusion algorithms based on atomic read/
write registers. These algorithms are important because understanding their design
and their properties provides us with precise knowledge of the difficulty and subtleties

that have to be addressed when one has to solve synchronization problems. These
algorithms capture the essence of synchronization in a read/write shared memory
model.
Nearly all shared memory multiprocessors propose built-in primitives (i.e., atomic
operations implemented in hardware) specially designed to address synchroniza-
tion issues. This section presents a few of them (the ones that are the most
popular).

2.2.1 Test&Set, Swap, and Compare&Swap

The test&set()/reset() primitives This pair of primitives, denoted test&set() and
reset(), is defined as follows. Let X be a shared register initialized to 1.
• X.test&set() sets X to 0 and returns its previous value.
• X.reset() writes 1 into X (i.e., resets X to its initial value).
Given a register X , the operations X.test&set() and X.reset() are atomic. As we have
seen, this means that they appear as if they have been executed sequentially, each
one being associated with a point of the time line (that lies between its beginning
and its end).
As shown in Fig. 2.17 (where r is a local variable of the invoking process), solving
the mutual exclusion problem (or, equivalently, implementing a lock object)
can easily be done with a test&set register. If several processes simultaneously
invoke X.test&set(), the atomicity property ensures that one and only one of them wins
(i.e., obtains the value 1, which is required to enter the critical section). Releasing the
critical section is done by resetting X to 1 (its initial value). It is easy to see that this
implementation satisfies mutual exclusion and deadlock-freedom.
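In Java, getAndSet() provides exactly this kind of atomic read-modify-write, so the algorithm of Fig. 2.17 can be sketched as follows (this sketch is not part of the original text):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of the test&set-based algorithm (Fig. 2.17);
    // x.getAndSet(0) plays the role of X.test&set().
    class TestAndSetLock {
        private final AtomicInteger x = new AtomicInteger(1);

        void acquireMutex() {
            // repeat X.test&set() until the winning value 1 is obtained
            while (x.getAndSet(0) != 1) { /* spin */ }
        }

        void releaseMutex() {
            x.set(1);   // reset(): X <- 1
        }
    }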
The swap() primitive Let X be a shared register. The primitive denoted X.swap(v)
atomically assigns v to X and returns the previous value of X .
Mutual exclusion can be easily solved with a swap register X . Such an algorithm is
depicted in Fig. 2.18 where X is initialized to 1. It is assumed that the invoking process

Fig. 2.17 Test&set-based mutual exclusion



Fig. 2.18 Swap-based mutual exclusion

does not modify its local variable r between acquire_mutex() and release_mutex()
(or, equivalently, that it sets r to 1 before invoking release_mutex()). The test&set-
based algorithm and the swap-based algorithm are actually the very same algorithm.
Let ri be the local variable used by each process pi. Due to the atomicity property
and the “exchange of values” semantics of the swap() primitive, it is easy to see that
the swap-based algorithm is characterized by the invariant X + Σ_{1≤i≤n} ri = 1.
The compare&swap() primitive Let X be a shared register and old and new
be two values. The semantics of the primitive X.compare&swap(old, new), which
returns a Boolean value, is defined by the following code that is assumed to be
executed atomically.
X.compare&swap(old, new) is
if (X = old) then X ← new; return(true)
else return(false)
end if.

The primitive compare&swap() is an atomic conditional write; namely, the write
of new into X is executed if and only if X = old. Moreover, a Boolean value is
returned that indicates whether the write was successful. This primitive (or variants of it)
appears in Motorola 680x0, IBM 370, and SPARC architectures. In some variants,
the primitive returns the previous value of X instead of a Boolean.
A compare&swap-based mutual exclusion algorithm is described in Fig. 2.19,
in which X is an atomic compare&swap register initialized to 1. (no-op means
“no operation”.) The repeat statement is equivalent to wait(X.compare&swap(1, 0));
it is used to stress the fact that this is an active waiting. This algorithm is
nearly the same as the two previous ones.
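Java's compareAndSet() directly provides this conditional write, so the algorithm of Fig. 2.19 can be sketched as follows (not part of the original text):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of the compare&swap-based algorithm (Fig. 2.19);
    // x.compareAndSet(1, 0) plays the role of X.compare&swap(1, 0).
    class CompareAndSwapLock {
        private final AtomicInteger x = new AtomicInteger(1);

        void acquireMutex() {
            while (!x.compareAndSet(1, 0)) { /* active waiting */ }
        }

        void releaseMutex() {
            x.set(1);
        }
    }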

2.2.2 From Deadlock-Freedom to Starvation-Freedom

A problem due to asynchrony The previous primitives allow for the (simple)
design of algorithms that ensure mutual exclusion and deadlock-freedom. However,
these algorithms do not ensure starvation-freedom.

Fig. 2.19 Compare&swap-based mutual exclusion

As an example, let us consider the test&set-based algorithm (Fig. 2.17). It is
possible that a process pi executes X.test&set() infinitely often and never obtains
the winning value 1. This is a simple consequence of asynchrony: if, infinitely often,
other processes invoke X.test&set() concurrently with pi (some of these processes
enter the critical section, release it, re-enter it, etc.), it is easy to construct a
scenario in which the winning value is always obtained by only a subset of processes
not containing pi. If X infinitely often switches between 1 and 0, an infinite number
of accesses to X does not ensure that one of these accesses obtains the value 1.
From deadlock-freedom to starvation-freedom Considering that we have an
underlying lock object that satisfies mutual exclusion and deadlock-freedom, this
section presents an algorithm that builds on top of it a lock object that satisfies the
starvation-freedom property. Its principle is simple: it consists in implementing a
round-robin mechanism that guarantees that no request for the critical section is
delayed forever. To that end, the following underlying objects are used:
• The underlying deadlock-free lock is denoted LOCK. Its two operations are
LOCK.acquire_lock(i) and LOCK.release_lock(i), where i is the identity of the
invoking process.
• An array of SWMR atomic registers denoted FLAG[1..n] (n is the number of
processes, hence this number has to be known). For each i, FLAG[i] is initialized
to down and can be written only by pi . In a very natural way, process pi sets
FLAG[i] to up when it wants to enter the critical section and resets it to down
when it releases it.
• TURN is an MWMR atomic register that contains the process which is given
priority to enter the critical section. Its initial value is any process identity.
Let us notice that accessing FLAG[TURN] is not an atomic operation. A process
pi has first to obtain the value v of TURN and then address FLAG[v]. Moreover,
due to asynchrony, between the read by pi first of TURN and then of FLAG[v],
the value of TURN has possibly been changed by another process p j .
The behavior of a process pi is described in Fig. 2.20. It is as follows. The processes
are considered as defining a logical ring pi , pi+1 , . . . , pn , p1 , . . . , pi . At any time,

Fig. 2.20 From deadlock-freedom to starvation-freedom (code for pi )

the process pTURN is the process that has priority and p(TURN mod n)+1 is the next
process that will have priority.
• When a process pi invokes acquire_mutex(i) it first raises its flag to inform the
other processes that it is interested in the critical section (line 1). Then, it waits
(repeated checks at line 2) until it has priority (predicate TURN = i) or the process
that is currently given the priority is not interested (predicate FLAG[TURN] =
down). Finally, as soon as it can proceed, it invokes LOCK.acquire_lock(i)
in order to obtain the underlying lock (line 3). (Let us remember that reading
FLAG[TURN] requires two shared memory accesses.)
• When a process pi invokes release_mutex(i), it first resets its flag to down
(line 5). Then, if (from pi ’s point view) the process that is currently given priority
is not interested in the critical section (i.e., the predicate FLAG[TURN] = down
is satisfied), then pi makes TURN progress to the next process (line 6) on the ring
before releasing the underlying lock (line 7).
Remark 1 Let us observe that the modification of TURN by a process pi is always
done in the critical section (line 6). This is due to the fact that pi modifies TURN
after it has acquired the underlying mutex lock and before it has released it.
Remark 2 Let us observe that a process pi can stop waiting at line 2 because it finds
TURN = i while another process p j increases TURN to ((i + 1) mod n) because it
does not see that FLAG[i] has been set to up. This situation is described in Fig. 2.21.
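The construction can be sketched in Java as follows (not part of the original text). Identities are assumed to be 0..n−1, so line 6 becomes TURN ← (TURN + 1) mod n, and a JDK ReentrantLock merely stands in for the underlying deadlock-free lock LOCK.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicIntegerArray;
    import java.util.concurrent.locks.ReentrantLock;

    // Sketch of the construction of Fig. 2.20; flag: 1 = up, 0 = down.
    class StarvationFreeLock {
        private final int n;
        private final AtomicIntegerArray flag;                    // FLAG[0..n-1]
        private final AtomicInteger turn = new AtomicInteger(0);  // TURN
        private final ReentrantLock lock = new ReentrantLock();   // underlying LOCK

        StarvationFreeLock(int n) { this.n = n; flag = new AtomicIntegerArray(n); }

        void acquireMutex(int i) {
            flag.set(i, 1);                             // line 1: FLAG[i] <- up
            while (true) {                              // line 2: FLAG[TURN] needs two reads
                int t = turn.get();
                if (t == i || flag.get(t) == 0) break;  // pi has priority, or the priority
            }                                           // holder is not interested
            lock.lock();                                // line 3: acquire the underlying lock
        }

        void releaseMutex(int i) {
            flag.set(i, 0);                             // line 5: FLAG[i] <- down
            int t = turn.get();                         // line 6: executed while still
            if (flag.get(t) == 0)                       //         holding the lock
                turn.set((t + 1) % n);
            lock.unlock();                              // line 7: release the underlying lock
        }
    }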
Theorem 8 Assuming that the underlying mutex lock LOCK is deadlock-free, the
algorithm described in Fig. 2.20 builds a starvation-free mutex lock.

Proof We first claim that, if at least one process invokes acquire_mutex(), then
at least one process invokes LOCK.acquire_lock() (line 3) and enters the critical
section.

Fig. 2.21 A possible case when going from deadlock-freedom to starvation-freedom

Proof of the claim. Let us first observe that, if processes invoke LOCK.acquire_
lock(), one of them enters the critical section (this follows from the fact that the
lock is deadlock-free). Hence, X being the non-empty set of processes that invoke
acquire_mutex(), let us assume by contradiction that no process of X terminates
the wait statement at line 2. It follows from the waiting predicate that TURN ∉ X
and FLAG[TURN] = up. But FLAG[TURN] = up implies TURN ∈ X, which
contradicts the previous waiting predicate and concludes the proof of the claim.
Let pi be a process that has invoked acquire_mutex(). We have to show that
it enters the critical section. Due to the claim, there is a process pk that holds the
underlying lock. If pk is pi, the theorem follows, hence let pk ≠ pi. When pk exits
the critical section it executes line 6. Let TURN = j when pk reads it. We consider
two cases:
1. FLAG[ j] = up. Let us observe that p j is the only process that can write into
FLAG[ j] and that it will do so at line 5 when it exits the critical section. More-
over, as TURN = j, p j is not blocked at line 2 and consequently invokes
LOCK.acquire_lock() (line 3).
We first show that eventually p j enters the critical section. Let us observe that
all the processes which invoke acquire_mutex() after FLAG[ j] was set to up
and TURN was set to j remain blocked at line 2 (Observation OB). Let Y be
the set of processes that compete with p j for the lock with y = |Y |. We have
0 ≤ y ≤ n − 1. It follows from observation OB and the fact that the lock is
deadlock-free that the number of processes that compete with p j decreases from
y to y − 1, y − 2, etc., until p j obtains the lock and executes line 5 (in the worst
case, p j is the last of the y processes to obtain the lock).
If pi is pj or a process that has obtained the lock before pj, the theorem follows
from the previous reasoning. Hence, let us assume that pi has not obtained the
lock. After pj has obtained the lock, it eventually executes lines 5 and 6. As
TURN = j and pj sets FLAG[j] to down, it follows that pj updates the register
TURN to ℓ = (j mod n) + 1. The previous reasoning, where k and j are replaced
by j and ℓ, is then applied again.

2. FLAG[j] = down. In this case, pk updates TURN to ℓ = (j mod n) + 1. If
ℓ = i, the previous reasoning (where pj is replaced by pi) applies and it follows
that pi obtains the lock and enters the critical section.
If ℓ ≠ i, let pk′ be the next process that enters the critical section (due to the
claim, such a process does exist). Then, the same reasoning as in case 1 applies,
where k is replaced by k′.
As no process is skipped when TURN is updated when processes invoke
release_mutex(), it follows from the combination of case 1 and case 2 that eventually case 1
where pj = pi applies, and consequently pi obtains the deadlock-free lock. □

Fast starvation-free mutual exclusion Let us consider the case where a process pi
wants to enter the critical section, while no other process is interested in entering it.
We have the following:
• The invocation of acquire_mutex(i) requires at most three accesses to the shared
memory: one to set the register FLAG[i] to up, one to read TURN and save it in a
local variable turn, and one to read FLAG[turn].
• Similarly, the invocation by pi of release_mutex(i) requires at most four accesses
to the shared memory: one to reset FLAG[i] to down, one to read TURN and save
it in a local variable turn, one to read FLAG[turn], and a last one to update TURN.
It follows from this observation that the stacking of the algorithm of Fig. 2.20
on top of the algorithm described in Fig. 2.14 (Sect. 2.1.7), which implements a
deadlock-free fast mutex lock, provides a fast starvation-free mutex algorithm.

2.2.3 Fetch&Add

Let X be a shared register. The primitive X.fetch&add() atomically adds 1 to X and
returns the new value. (In some variants the value that is returned is the previous
value of X. In other variants, a value c is passed as a parameter and, instead of being
increased by 1, X becomes X + c.)
Such a primitive allows for the design of a simple starvation-free mutex algorithm.
Its principle is to use a fetch&add atomic register to generate tickets with consecutive
numbers and to allow a process to enter the critical section when its ticket number
is the next one to be served.
An algorithm based on this principle is described in Fig. 2.22. The variable
TICKET is used to generate consecutive ticket values, and the variable NEXT indi-
cates the next winner ticket number. TICKET is initialized to 0, while NEXT is
initialized to 1.
When it invokes acquire_mutex(), a process pi takes the next ticket, saves it in
its local variable my_turn, and waits until its turn occurs, i.e., until my_turn =
NEXT. An invocation of release_mutex() is a simple increase of the atomic register
NEXT.

Fig. 2.22 Fetch&add-based mutual exclusion
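In Java, incrementAndGet() plays the role of fetch&add() returning the new value, which yields the following sketch (not part of the original text):

    import java.util.concurrent.atomic.AtomicInteger;

    // Sketch of the fetch&add-based algorithm (Fig. 2.22).
    class TicketLock {
        private final AtomicInteger ticket = new AtomicInteger(0);  // TICKET
        private final AtomicInteger next = new AtomicInteger(1);    // NEXT

        void acquireMutex() {
            int myTurn = ticket.incrementAndGet();        // take the next ticket
            while (next.get() != myTurn) { /* spin */ }   // wait until my_turn = NEXT
        }

        void releaseMutex() {
            next.set(next.get() + 1);  // NEXT <- NEXT + 1: not atomic, but executed
        }                              // by a single process at a time (in the CS)
    }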

Let us observe that, while NEXT is an atomic MWMR register, the operation
NEXT ← NEXT + 1 is not atomic. It is easy to see that no increase of NEXT can be
missed. This follows from the fact that the increase statement NEXT ← NEXT + 1
appears in the operation release_mutex(), which is executed by a single process at a
time.
The mutual exclusion property follows from the uniqueness of each ticket number,
and the starvation-freedom property follows from the fact that the ticket numbers are
defined from a sequence of consecutive known values (here the increasing sequence
of positive integers).

2.3 Mutex Without Atomicity

This section presents two mutex algorithms which rely on shared read/write registers
weaker than read/write atomic registers. In that sense, they implement atomicity
without relying on underlying atomic objects.

2.3.1 Safe, Regular, and Atomic Registers

The algorithms described in this section rely on safe registers. As shown here, safe
registers are the weakest type of shared registers that we can imagine while still
being useful in the presence of concurrency.
Like an atomic register, a safe register (or a regular register) R provides the processes
with a write operation denoted R.write(v) (or R ← v), where v is the value that is
written, and a read operation denoted R.read() (or local ← R, where local is a local
variable of the invoking process). Safe, regular, and atomic registers differ in the value
returned by a read operation invoked in the presence of concurrent write operations.
Let us remember that the domain of a register is the set of values that it can contain.
As an example, the domain of a binary register is the set {0, 1}.

SWMR safe register An SWMR safe register is a register whose read operation
satisfies the following properties (the notion of an MWMR safe register will be
introduced in Sect. 2.3.3):
• A read that is not concurrent with a write operation (i.e., their executions do not
overlap) returns the current value of the register.
• A read that is concurrent with one (or several consecutive) write operation(s) (i.e.,
their executions do overlap) returns any value that the register can contain.
It is important to see that, in the presence of concurrent write operations, a read can
return a value that has never been written. The returned value has only to belong to
the register domain. As an example, let the domain of a safe register R be {0, 1, 2, 3}.
Assuming that R = 0, let R.write(2) be concurrent with a read operation. This read
can return 0, 1, 2, or 3. It cannot return 4, as this value is not in the domain of R, but
can return the value 3, which has never been written.
A binary safe register can be seen as modeling a flickering bit. Whatever its
previous value, the value of the register can flicker during a write operation and
stabilizes to its final value only when the write finishes. Hence, a read that overlaps
with a write can arbitrarily return either 0 or 1.
SWMR regular register An SWMR regular register is an SWMR safe register
that satisfies the following property, which addresses read operations in the presence
of concurrency. It replaces the second item of the definition of a safe register.
• A read that is concurrent with one or several write operations returns the value of
the register before these writes or the value written by any of them.
An example of a regular register R (whose domain is the set {0, 1, 2, 3, 4}) written
by a process p1 and read by a process p2 is described in Fig. 2.23. As there is no
concurrent write during the first read by p2 , this read operation returns the current
value of the register R, namely 1. The second read operation is concurrent with three
write operations. It can consequently return any value in {1, 2, 3, 4}. If the register
was only safe, this second read could return any value in {0, 1, 2, 3, 4}.
Atomic register The notion of an atomic register was defined in Sect. 2.1.1. Due
to the total order on all its operations, an atomic register is more constrained (i.e.,
stronger) than a regular register.

Fig. 2.23 An execution of a regular register (p1 executes R.write(1), R.write(2), R.write(3),
R.write(4); p2 executes a first read that returns 1 and then a second read, concurrent with the
last three writes, that returns a value v)



Fig. 2.24 An execution of a register (p1 issues R.write(1), R.write(0), and R.write(0); p2 issues five reads returning 1, a, b, 0, and c, respectively)

Table 2.1 Values returned by safe, regular, and atomic registers

Type of register | a   | b   | c   | Number of correct executions
Safe             | 1/0 | 1/0 | 1/0 | 8
Regular          | 1/0 | 1/0 | 0   | 4
Atomic           | 1   | 1/0 | 0   | 3
                 | 0   | 0   | 0   |

(For an atomic register, either a = 1, in which case b can be 1 or 0 and c = 0, or a = 0, in which case b = c = 0; hence the three correct executions.)

To illustrate the differences between safe, regular, and atomic, Fig. 2.24 presents
an execution of a binary register R and Table 2.1 describes the values returned by
the read operations when the register is safe, regular, and atomic. The first and third
read by p2 are issued in a concurrency-free context. Hence, whatever the type of the
register, the value returned is the current value of the register R.

• If R is safe, as the other read operations are concurrent with a write operation,
they can return any value (i.e., 0 or 1 as the register is binary). This is denoted 0/1
in Table 2.1.
It follows that there are eight possible correct executions when the register R is
safe for the concurrency pattern depicted in Fig. 2.24.
• If R is regular, each of the values a and b returned by the read operations which
are concurrent with R.write(0) can be 1 (the value of R before this write) or 0 (the
value of R that is written concurrently with the read operation).

Differently, the value c returned by the last read operation can only be 0 (because
the value that is written concurrently does not change the value of R).
It follows that there are only four possible correct executions when the register R
is regular.
• If R is atomic, there are only three possible executions, each corresponding to a
correct sequence of read and write invocations (“correct” means that the sequence
respects the real-time order of the invocations and is such that each read invocation
returns the value written by the immediately preceding write invocation).

2.3.2 The Bakery Mutex Algorithm

Principle of the algorithm The mutex algorithm presented in this section is due to
L. Lamport (1974) who called it the mutex bakery algorithm. It was the first algorithm
ever designed to solve mutual exclusion on top of non-atomic registers, namely on
top of SWMR safe registers. The principle that underlies its design (inspired by
bakeries where a customer receives a number upon entering the store, hence the
algorithm's name) is simple. When a process pi wants to acquire the critical section,
it acquires a number x that defines its priority, and the processes enter the critical
section according to their current priorities.
As there are no atomic registers, it is possible that two processes obtain the same
number. A simple way to establish an order for requests that have the same number
consists in using the identities of the corresponding processes. Hence, let a pair ⟨x, i⟩
define the identity of the current request issued by pi. A total order is defined on
the requests competing for the critical section as follows, where ⟨x, i⟩ and ⟨y, j⟩
are the identities of two competing requests; ⟨x, i⟩ < ⟨y, j⟩ means that the request
identified by ⟨x, i⟩ has priority over the request identified by ⟨y, j⟩, where "<" is
the lexicographical ordering on pairs of integers, namely

⟨x, i⟩ < ⟨y, j⟩ ≡ (x < y) ∨ ((x = y) ∧ (i < j)).
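
As an aside, this is exactly the lexicographical order on tuples built into most programming languages; a small Python illustration (ours):

# The order on request identities <x, i> is the order on pairs (tuples):
assert (2, 3) < (3, 1)    # x < y: the smaller turn number wins, whatever the identities
assert (2, 1) < (2, 3)    # x = y: the smaller process identity breaks the tie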

Description of the algorithm Two SWMR safe registers, denoted FLAG[i] and
MY_TURN[i], are associated with each process pi (hence these registers can be read
by any process but written only by pi).
• MY_TURN[i] (which is initialized to 0 and reset to that value when pi exits the
critical section) is used to contain the priority number of pi when it wants to use the
critical section. The domain of MY_TURN[i] is the set of non-negative integers.
• FLAG[i] is a binary control variable whose domain is {down, up}. Initialized to
down, it is set to up by pi while it computes the value of its priority number
MY_TURN[i].
The sequence of values taken by FLAG[i] is consequently the regular expression
down(up, down)∗. The reader can verify that a binary safe register whose write
operations of down and up alternate behaves as a regular register.
The algorithm of a process pi is described in Fig. 2.25. When it invokes
acquire_mutex(), process pi enters a "doorway" (lines 1–3) in which it computes its turn
number MY_TURN[i] (line 2). To that end it selects a number greater than all
MY_TURN[j], 1 ≤ j ≤ n. It is possible that pi reads some MY_TURN[j] while it is
written by pj. In that case the value obtained from MY_TURN[j] can be any value.
Moreover, a process informs the other processes that it is computing its turn value by
raising its flag before this computation starts (line 1) and resetting it to down when
it has finished (line 3). Let us observe that a process is never delayed while in the
doorway, which means no process can direct another process to wait in the doorway.

Fig. 2.25 Lamport’s bakery mutual exclusion algorithm

After it has computed its turn value, a process pi enters a "waiting room" (lines
4–7) which consists of a for loop with one loop iteration per process pj. There are
two cases:
• If pj does not want to enter the critical section, we have FLAG[j] = down ∧
MY_TURN[j] = 0. In this case, pi proceeds to the next iteration without being
delayed by pj.
• Otherwise, pi waits until FLAG[j] = down (i.e., until pj has finished computing
its turn, line 5) and then waits until either pj has exited the critical section (predicate
MY_TURN[j] = 0) or pi's current request has priority over pj's one (predicate
⟨MY_TURN[i], i⟩ < ⟨MY_TURN[j], j⟩).
When pi has priority with respect to each other process (these priorities being
checked in an arbitrary order, one after the other) it enters the critical section
(line 8).
Finally, when it exits the critical section, the only thing a process pi has to do is
to reset MY_TURN[i] to 0 (line 9).
Remark: process crashes Let us consider the case where a process may crash (i.e.,
stop prematurely). It is easy to see that the algorithm works despite this type of failure
if, after a process pi has crashed, its two registers FLAG[i] and MY _TURN[i] are
eventually reset to their initial values. When this occurs, the process pi is considered
as being no longer interested in the critical section.
A first in first out (FIFO) order As already indicated, the priority of a process
pi over a process pj is defined from the identities of their requests, namely the pairs
⟨MY_TURN[i], i⟩ and ⟨MY_TURN[j], j⟩. Moreover, let us observe that it is not
possible to predict the values of these pairs when pi and pj concurrently compute
the values of MY_TURN[i] and MY_TURN[j].

Let us consider two processes pi and pj that have invoked acquire_mutex() and
where pi has executed its doorway part (line 2) before pj has started executing its
doorway part. We will see that the algorithm guarantees a FIFO order property defined
as follows: pi terminates its invocation of acquire_mutex() (and consequently enters
the critical section) before pj. This FIFO order property is an instance of the bounded
bypass liveness property with f(n) = n − 1.
Definitions The following time instant definitions are used in the proof of
Theorem 9. Let px be a process. Let us remember that, as the read and write operations
on the registers are not atomic, they cannot be abstracted as having been executed
instantaneously. Hence, when considering the execution of such an operation, its
starting time and its end time are considered instead.
The number that appears in the following definitions corresponds to a line number
(i.e., to a register operation). Moreover, "b" stands for "beginning" while "e" stands
for "end".
1. τ_e^x(1) is the time instant at which px terminates the assignment FLAG[x] ← up
(line 1).
2. τ_e^x(2) is the time instant at which px terminates the execution of line 2. Hence,
at time τ_e^x(2) the non-atomic register MY_TURN[x] contains the value used by
px to enter the critical section.
3. τ_b^x(3) is the time instant at which px starts the execution of line 3. This means that
a process that reads FLAG[x] during the time interval [τ_e^x(1)..τ_b^x(3)] necessarily
obtains the value up.
4. τ_b^x(5, y) is the time instant at which px starts its last evaluation of the waiting
predicate (with respect to FLAG[y]) at line 5. This means that px has obtained
the value down from FLAG[y].
5. Let us notice that, as it is the only process which writes into MY_TURN[x],
px can save its value in a local variable. This means that reading
MY_TURN[x] entails no access to the shared memory. Moreover, as far as a
register MY_TURN[y] (y ≠ x) is concerned, we consider that px reads it once
each time it evaluates the predicate of line 6.
τ_b^x(6, y) is the time instant at which px starts its last reading of MY_TURN[y].
Hence, the value turn it reads from MY_TURN[y] is such that
(turn = 0) ∨ (⟨MY_TURN[x], x⟩ < ⟨turn, y⟩).

Terminology Let us remember that a process px is "in the doorway" when it
executes line 2. We also say that it "is in the bakery" when it executes lines 4–9.
Hence, when it is in the bakery, px is in the waiting room, inside the critical section,
or executing release_mutex(x).
Lemma 1 Let pi and pj be two processes that are in the bakery and such
that pi entered the bakery before pj enters the doorway. Then MY_TURN[i] <
MY_TURN[j].

Proof Let turn_i be the value used by pi at line 6. As pi is in the bakery (i.e., executing
lines 4–9) before pj enters the doorway (line 2), it follows that MY_TURN[i]
was assigned the value turn_i before pj reads it at line 2. Hence, when pj reads the
safe register MY_TURN[i], there is no concurrent write and pj consequently obtains
the value turn_i. It follows that the value turn_j assigned by pj to MY_TURN[j] is
such that turn_j ≥ turn_i + 1, from which the lemma follows. □
Lemma 2 Let pi and pj be two processes such that pi is inside the critical section
while pj is in the bakery. Then ⟨MY_TURN[i], i⟩ < ⟨MY_TURN[j], j⟩.
Proof Let us notice that, as pj is inside the bakery, it can be inside the critical
section.
As process pi is inside the critical section, it has read down from FLAG[j] at
line 5 (and exited the corresponding wait statement). It follows that, according to the
timing of this read of FLAG[j] that returned the value down to pi and the updates
of FLAG[j] by pj to up at line 1 or down at line 3 (the only lines where FLAG[j]
is modified), there are two cases to consider (Fig. 2.26).

Fig. 2.26 The two cases where pj updates the safe register FLAG[j] (left part: pi's read of FLAG[j] starts before pj terminates FLAG[j] ← up; right part: pi's read of FLAG[j] ends after pj starts FLAG[j] ← down)

As pi reads down from FLAG[j], we have either τ_b^i(5, j) < τ_e^j(1) or τ_e^i(5, j) >
τ_b^j(3) (see Fig. 2.26). This is because, if we had τ_b^i(5, j) > τ_e^j(1), pi would
necessarily have read up from FLAG[j] (left part of the figure), and, if we had
τ_e^i(5, j) < τ_b^j(3), pi would necessarily have also read up from FLAG[j] (right part
of the figure). Let us consider each case:
• Case 1: τ_b^i(5, j) < τ_e^j(1) (left part of Fig. 2.26). In this case, process pi has entered
the bakery before process pj enters the doorway. It then follows from Lemma 1
that MY_TURN[i] < MY_TURN[j], which proves the lemma for this case.
• Case 2: τ_e^i(5, j) > τ_b^j(3) (right part of Fig. 2.26). As pj is sequential, we have
τ_e^j(2) < τ_b^j(3) (P1). Similarly, as pi is sequential, we also have τ_e^i(5, j) < τ_b^i(6, j)
(P2). Combining (P1), (P2), and the case assumption, namely τ_b^j(3) < τ_e^i(5, j), we
obtain

τ_e^j(2) < τ_b^j(3) < τ_e^i(5, j) < τ_b^i(6, j);

i.e., τ_e^j(2) < τ_b^i(6, j), from which we conclude that the last read of
MY_TURN[j] by pi occurred after the safe register MY_TURN[j] obtained its
value (say turn_j).
As pi is inside the critical section (lemma assumption), it exited the second wait
statement because (MY_TURN[j] = 0) ∨ (⟨MY_TURN[i], i⟩ < ⟨MY_TURN[j], j⟩).
Moreover, as pj was in the bakery before pi executed line 6 (τ_e^j(2) < τ_b^i(6, j)), we
have MY_TURN[j] = turn_j ≠ 0. It follows that we have ⟨MY_TURN[i], i⟩ <
⟨MY_TURN[j], j⟩, which terminates the proof of the lemma. □

Theorem 9 Lamport’s bakery algorithm satisfies mutual exclusion and the bounded
bypass liveness property where f (n) = n − 1.
Proof Proof of the mutual exclusion property. The proof is by contradiction. Let
us assume that pi and pj (i ≠ j) are simultaneously inside the critical section. We
have the following:
• As pi is inside the critical section and pj is inside the bakery, we can apply Lemma
2. We then obtain ⟨MY_TURN[i], i⟩ < ⟨MY_TURN[j], j⟩.
• Similarly, as pj is inside the critical section and pi is inside the bakery, applying
Lemma 2, we obtain ⟨MY_TURN[j], j⟩ < ⟨MY_TURN[i], i⟩.
As i ≠ j, the pairs ⟨MY_TURN[j], j⟩ and ⟨MY_TURN[i], i⟩ are totally ordered.
It follows that each item contradicts the other, from which the mutex property follows.
Proof of the FIFO order liveness property. The proof first shows that the algorithm
is deadlock-free. It then shows that the algorithm satisfies the bounded
bypass property where f(n) = n − 1 (i.e., the FIFO order as defined on the pairs
⟨MY_TURN[x], x⟩).
The proof that the algorithm is deadlock-free is by contradiction. Let us assume
that processes have invoked acquire_mutex() and no process exits the waiting room
(lines 4–7). Let Q be this set of processes. (Let us notice that, for any other process
pj, we have FLAG[j] = down and MY_TURN[j] = 0.) As the number of processes
is bounded and no process has to wait in the doorway, there is a time after which
we have ∀ j ∈ {1, . . . , n} : FLAG[j] = down, from which we conclude that no
process of Q can be blocked forever in the wait statement of line 5.
By construction, the pairs ⟨MY_TURN[x], x⟩ of the processes px ∈ Q are totally
ordered. Let ⟨MY_TURN[i], i⟩ be the smallest one. It follows that, eventually, when
evaluated by pi, the predicate associated with the wait statement of line 6 is satisfied
for any j. Process pi then enters the critical section, which contradicts the deadlock
assumption and proves that the algorithm is deadlock-free.
To show the FIFO order liveness property, let us consider a pair of processes pi
and pj that are competing for the critical section and such that pj wins and, after
exiting the critical section, invokes acquire_mutex(j) again, executes its doorway,
and enters the bakery. Moreover, let us assume that pi is still waiting to enter the
critical section. Let us observe that we are then in the context defined in Lemma 1: pi
and pj are in the bakery and pi entered the bakery before pj entered the doorway.
We then have MY_TURN[i] < MY_TURN[j], from which we conclude that pj
cannot bypass pi again. As there are n processes, in the worst case pi is competing
with all the other processes. Due to the previous observation and the fact that there is
no deadlock, pi can lose at most n − 1 competitions (one with respect to each other
process pj, which enters the critical section before pi), which proves the bounded
bypass liveness property with f(n) = n − 1. □

2.3.3 A Bounded Mutex Algorithm

This section presents a second mutex algorithm which does not require underlying
atomic registers. This algorithm is due to A. Aravind (2011). Its design principles
are different from the ones of the bakery algorithm.
Principle of the algorithm The idea that underlies the design of this algorithm is to
associate a date with each request issued by a process and favor the competing process
which has the oldest (smallest) request date. To that end, the algorithm ensures that
(a) the dates associated with requests are increasing and (b) no two process requests
have the same date.
More precisely, let us consider a process pi that exits the critical section. The
date of its next request (if any) is computed in advance when, just after pi has used
the critical section, it executes the corresponding release_mutex() operation. In that
way, the date of the next request of a process is computed while this process is still
“inside the critical section”. As a consequence, the sequence of dates associated with
the requests is an increasing sequence of consecutive integers and no two requests
(from the same process or different processes) are associated with the same date.
From a liveness point of view, the algorithm can be seen as ensuring a least
recently used (LRU) priority: the competing process whose previous access to the
critical section is the oldest (with respect to request dates) is given priority when it
wants to enter the critical section.
Safe registers associated with each process The following three SWMR safe
registers are associated with each process pi :
• FLAG[i], whose domain is {down, up}. It is initialized to up when pi wants to
enter the critical section and reset to down when pi exits the critical section.
• If pi is not competing for the critical section, the safe register DATE[i] contains the
(logical) date of its next request to enter the critical section. Otherwise, it contains
the logical date of its current request.
DATE[i] is initialized to i. Hence, no two processes start with the same date for
their first request. As already indicated, pi will compute its next date (the value
that will be associated with its next request for the critical section) when it exits
the critical section.
• STAGE[i] is a binary control variable whose domain is {0, 1}. Initialized to 0,
it is set to 1 by pi when pi sees DATE[i] as being the smallest date among the
dates currently associated with the processes that it perceives as competing for the
critical section. The sequence of successive values taken by STAGE[i] (including
its initial value) is defined by the regular expression 0((0, 1)+, 0)∗.

Fig. 2.27 Aravind's mutual exclusion algorithm
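
As with Fig. 2.25, the code of Fig. 2.27 is not reproduced in this copy; the following Python sketch (again a best-effort transcription, with the line numbers used in the text as comments, and plain lists standing in for the safe registers) conveys its structure:

N = 4                        # number of processes (an assumption of this sketch)
FLAG = ["down"] * (N + 1)    # index 0 unused
STAGE = [0] * (N + 1)
DATE = [0] + list(range(1, N + 1))   # DATE[i] is initialized to i

def acquire_mutex(i):
    FLAG[i] = "up"                                            # line 1
    while True:                                               # repeat loop, lines 2-5
        STAGE[i] = 0                                          # line 2
        while not all(FLAG[j] == "down" or DATE[i] < DATE[j]  # line 3: wait until pi's
                      for j in range(1, N + 1) if j != i):    # date looks the smallest
            pass
        STAGE[i] = 1                                          # line 4
        if all(STAGE[j] == 0 for j in range(1, N + 1) if j != i):
            break                                             # line 5: exit the loop
    # the critical section can now be entered

def release_mutex(i):
    DATE[i] = max(DATE[1:N + 1]) + 1    # line 7: date of pi's next request
    STAGE[i] = 0                        # line 8
    FLAG[i] = "down"                    # line 9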

Description of the algorithm Aravind’s algorithm is described in Fig. 2.27. When


a process pi invokes acquire_mutex(i) it first sets its flag FLAG[i] to up (line 1),
thereby indicating that it is interested in the critical section. Then, it enters a loop
(lines 2–5), at the end of which it will enter the critical section. The loop body is made
up of two stages, denoted 0 and 1. Process pi first sets STAGE[i] to 0 (line 2) and
waits until the dates of the requests of all the processes that (from its point of view) are
competing for the criticalsection are greater than the date of its own request. This is 
captured by the predicate ∀ j = i : (FLAG[ j] = down)∨(DATE[i] < DATE[ j]) ,
which is asynchronously evaluated by pi at line 3. When, this predicate becomes
true, pi proceeds to the second stage by setting STAGE[i] to 1 (line 1).
Unfortunately, having the smallest request date (as asynchronously checked at
line 3 by a process pi ) is not sufficient to ensure the mutual exclusion property. More
precisely, several processes can simultaneously be at the second stage. As an example
let us consider an execution in which pi and p j are the only processes that invoke
acquire_mutex() and are such that DATE[i] = a < DATE[ j] = b. Moreover,
p j executes acquire_mutex() before pi does. As all flags (except the one of p j )
are equal to down, p j proceeds to stage 1 and, being alone in stage 1, exits the
loop and enters the critical section. Then, pi executes acquire_mutex(). As a < b,
pi does not wait at line 3 and is allowed to proceed to the second stage (line 4). This
observation motivates the predicate that controls the end of the repeat loop (line 5).
More precisely, a process pi is granted the critical section only if it is the only process
at the second stage (as captured by the predicate ∀ j ≠ i : (STAGE[j] = 0)
evaluated by pi at line 5).

Finally, when a process pi invokes release_mutex(i), it resets its control registers


STAGE[i] and FLAG[i] to their initial values (0 and down, respectively). Before
these updates, pi benefits from the fact that it is still “inside the critical section” to
compute the date of its next request and save it in DATE[i] (line 7). It is important
to see that no process p j modifies DATE[ j] while pi reads the array DATE[1..n].
Consequently, despite the fact that the registers are only SWMR safe registers (and not
atomic registers), the read of any DATE[ j] at line 7 returns its exact value. Moreover,
it also follows from this observation that no two requests have the same date and the
sequence of dates used by the algorithm is the sequence of natural integers.
Theorem 10 Aravind’s algorithm (described in Fig. 2.27) satisfies mutual exclusion
and the bounded bypass liveness property where f (n) = n − 1.
Proof The proof of the mutual exclusion property is by contradiction. Let us assume
that both pi and pj (i ≠ j) are in the critical section.
Let τ_b^i(4) (or τ_e^i(4)) be the time instant at which pi starts (or terminates) writing
STAGE[i] at line 4, and let τ_b^i(5, j) (or τ_e^i(5, j)) be the time instant at which pi starts (or
terminates) reading STAGE[j] for the last time at line 5 (before entering the critical
section). These time instants are depicted in Fig. 2.28. By exchanging i and j we
obtain similar notations for the time instants associated with pj.
As pi is inside the critical section, it has read 0 from STAGE[j] at line 5, and
consequently we have τ_b^i(5, j) < τ_e^j(4) (otherwise, pi would necessarily have read
1 from STAGE[j]). Moreover, as pi is sequential we have τ_e^i(4) < τ_b^i(5, j), and as
pj is sequential, we have τ_e^j(4) < τ_b^j(5, i). Piecing together the inequalities, we
obtain

τ_e^i(4) < τ_b^i(5, j) < τ_e^j(4) < τ_b^j(5, i),

from which we conclude τ_e^i(4) < τ_b^j(5, i), i.e., the last read of STAGE[i] by pj at line
5 started after pi had written 1 into it. Hence, the last read of STAGE[i] by pj returned
1, which contradicts the fact that it is inside the critical section simultaneously with
pi. (A similar reasoning shows that, if pj is inside the critical section, pi cannot be.)
Before proving the liveness property, let us notice that at most one process at a
time can modify the array DATE[1..n]. This follows from the fact that the algorithm
satisfies the mutual exclusion property (proved above) and line 7 is executed by
a process pi before it resets STAGE[i] to 0 (at line 8), which is necessary to allow

Fig. 2.28 Relevant time instants in Aravind's algorithm (pi writes 1 into STAGE[i] during [τ_b^i(4)..τ_e^i(4)] and then reads 0 from STAGE[j] during [τ_b^i(5, j)..τ_e^i(5, j)])

another process p j to enter the critical section (as the predicate of line 5 has to be true
when evaluated by p j ). It follows from the initialization of the array DATE[1..n] and
the previous reasoning that no two requests can have the same date and the sequence
of dates computed in mutual exclusion at line 7 by the processes is the sequence of
natural integers (Observation OB).
As in the proof of Lamport’s algorithm, let us first prove that there is no deadlock.
Let us assume (by contradiction) that there is a non-empty set of processes Q that have
invoked acquire_mutex() and no process succeeds in entering the critical section.
Let pi be the process of Q with the smallest date. Due to observation OB, this
process is uniquely defined. It then follows that, after some finite time, pi is the only process
whose predicate at line 3 is satisfied. Hence, after some time, pi is the only process
such that STAGE[i] = 1, which allows it to enter the critical section. This contradicts
the initial assumption and proves the deadlock-freedom property.
As a single process at a time can modify its entry of the array DATE, it follows
that a process p j that exits the critical section updates its register DATE[ j] to a
value greater than all the values currently kept in DATE[1..n]. Consequently, after
p j has executed line 7, all the other processes pi which are currently competing
for the critical section are such that DATE[i] < DATE[ j]. Hence, as we now have
(FLAG[i] = up) ∧ (DATE[i] < DATE[ j]), the next request (if any) issued by p j
cannot bypass the current request of pi , from which the starvation-freedom property
follows.
Moreover, it also follows from the previous reasoning that, if pi and p j are
competing and p j wins, then as soon as p j has exited the critical section pi has
priority over p j and can no longer be bypassed by it. This is nothing else than the
bounded bypass property with f (n) = n − 1 (which defines a FIFO order property).

Bounded mutex algorithm Each safe register MY_TURN[i] of Lamport's algorithm
and each safe register DATE[i] of Aravind's algorithm can take arbitrarily large
values. It is shown in the following how a simple modification of Aravind's algorithm
allows for bounded dates. This modification relies on the notion of an MWMR safe
register.
MWMR safe register An MWMR safe register is a safe register that can be written
and read by several processes. When the write operations are sequential, an MWMR
safe register behaves as an SWMR safe register. When write operations are concur-
rent, the value written into the register is any value of its domain (not necessarily a
value of a concurrent write).
Said differently, an algorithm based on MWMR safe registers has to prevent write
operations on such a register from being concurrent, in order for these writes to
always be meaningful. The behavior of an MWMR safe register is then similar to
the behavior of an SWMR safe register in which the "single writer" is implemented
by several processes that never write at the same time.
From unbounded dates to bounded dates Let us now consider that each safe
register DATE[i], 1 ≤ i ≤ n, is an MWMR safe register: any process pi can write
any register DATE[ j]. MWMR safe registers allow for the design of a (particularly

simple) bounded mutex algorithm. The domain of each register DATE[j] is now
[1..N] where N ≥ 2n. Hence, all registers are safe and have a bounded domain.
In the following we consider N = 2n. A single bit is needed for each safe register
FLAG[j] and each safe register STAGE[j], and only ⌈log₂ N⌉ bits are needed for
each safe register DATE[j].
In a very interesting way, no statement has to be modified to obtain a bounded
version of the algorithm. A single new statement has to be added, namely the insertion
of the following line 7′ between line 7 and line 8:

(7′) if (DATE[i] ≥ N) then for all j ∈ [1..n] do DATE[j] ← j end for end if.

This means that, when a process pi exiting the critical section updates its register
DATE[i] and this update is such that DATE[i] ≥ N , pi resets all date registers
to their initial values. As for line 7, this new line is executed before STAGE[i] is
reset to 0 (line 8), from which it follows that it is executed in mutual exclusion and
consequently no two processes can concurrently write the same MWMR safe register
DATE[ j]. Hence, the MWMR safe registers are meaningful.
Moreover, it is easy to see that the date resetting mechanism is such that each
date d, 1 ≤ d ≤ n, is used only by process pd, while each date d, n + 1 ≤ d ≤ 2n,
can be used by any process. Hence, ∀ d ∈ {1, . . . , n}, we have DATE[d] ∈ {d, n +
1, n + 2, . . . , 2n}.
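
In terms of the earlier Python sketch of Aravind's algorithm, line 7′ would amount to the following modified release operation (illustrative only; N_CAP plays the role of the bound N, here 2n):

N_CAP = 2 * N   # the bound called N in the text (assumption: N = 2n)

def release_mutex_bounded(i):
    DATE[i] = max(DATE[1:N + 1]) + 1          # line 7
    if DATE[i] >= N_CAP:                      # line 7': reset all the dates
        for j in range(1, N + 1):
            DATE[j] = j
    STAGE[i] = 0                              # line 8
    FLAG[i] = "down"                          # line 9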
Theorem 11 When considering Aravind's mutual exclusion algorithm enriched with
line 7′, with N ≥ 2n, a process encounters at most one reset of the array DATE[1..n]
while it is executing acquire_mutex().
while it is executing acquire_mutex().
Proof Let pi be a process that executes acquire_mutex() while a reset of the array
DATE[1..n] occurs. If pi is the next process to enter the critical section, the theorem
follows. Otherwise, let p j be the next process which enters the critical section. When
p j exits the critical section, DATE[ j] is updated to max(DATE[1], . . . , DATE[n]) +
1 = n + 1. We then have FLAG[i] = up and DATE[i] < DATE[ j]. It follows that,
if there is no new reset, p j cannot enter again the critical section before pi .
In the worst case, after the reset, all the other processes are competing with pi
and pi is pn (hence, DATE[i] = n, the greatest date value after a reset). Due to
line 3 and the previous observation, each other process p j enters the critical section
before pi and max(DATE[1], . . . , DATE[n]) becomes equal to n + (n − 1). As
2n − 1 < 2n ≤ N , none of these processes issues a reset. It follows that pi enters
the critical section before the next reset. (Let us notice that, after the reset, the
invocation issued by pi can be bypassed only by invocations (pending invocations
issued before the reset or new invocations issued after the reset) which have been
issued by processes pj such that j < i.) □
The following corollary is an immediate consequence of the previous theorem.
Corollary 2 Let N ≥ 2n. Aravind's mutual exclusion algorithm enriched with line
7′ satisfies the starvation-freedom property.

(Different progress conditions that this algorithm can ensure are investigated in
Exercise 6.)
Bounding the domain of the safe registers has a price. More precisely, the addition
of line 7′ has an impact on the maximal number of bypasses, which can now increase
up to f(n) = 2n − 2. This is because, in the worst case where all the processes always
compete for the critical section, before it is allowed to access the critical section, a
process can be bypassed (n − 1) times just before a reset of the array DATE and, due
to the new values of DATE[1..n], it can again be bypassed (n − 1) times just after
the reset.

2.4 Summary

This chapter has presented three families of algorithms that solve the mutual exclu-
sion problem. These algorithms differ in the properties of the base operations they
rely on to solve mutual exclusion.
Mutual exclusion is one way to implement atomic objects. Interestingly, it was
shown that implementing atomicity does not require the underlying read and write
operations to be atomic.

2.5 Bibliographic Notes

• The reader will find surveys on mutex algorithms in [24, 231, 262]. Mutex algo-
rithms are also described in [41, 146].
• Peterson’s algorithm for two processes and its generalization to n processes are
presented in [224].
The first tournament-based mutex algorithm is due to G.L. Peterson and M.J.
Fischer [227].
A variant of Peterson’s algorithm in which all atomic registers are SWMR registers
due to J.L.W. Kessels is presented in [175].
• The contention-abortable mutex algorithm is inspired by Lamport's fast mutex
algorithm [191]. Fischer's synchronous algorithm is described in [191].
Lamport’s fast mutex algorithm gave rise to the splitter object as defined in [209].
The notion of fast algorithms has given rise to the notion of adaptive algorithms
(algorithms whose cost is related to the number of participating processes) [34].
• The general construction from deadlock-freedom to starvation-freedom that was
presented in Sect. 2.2.2 is from [262]. It is due to Y. Bar-David.

• The notions of safe, regular, and atomic read/write registers are due to L. Lamport.
They are presented and investigated in [188, 189]. The first intuition on these types
of registers appears in [184].
It is important to insist on the fact that “non-atomic” does not mean “arbiter-free”.
As defined in [193], “An arbiter is a device that makes a discrete decision based on
a continuous range of values”. Binary arbiters are the most popular. Actually, the
implementation of a safe register requires an arbiter. The notion of arbitration-free
synchronization is discussed in [193].
• Lamport’s bakery algorithm is from [183], while Aravind’s algorithm and its
bounded version are from [28].
• A methodology based on model-checking for automatic discovery of mutual exclu-
sion algorithms has been proposed by Y. Bar-David and G. Taubenfeld [46]. Inter-
estingly enough, this methodology is both simple and computationally feasible.
New algorithms obtained in this way are presented in [46, 262].
• Techniques (and corresponding algorithms) suited to the design of locks for
NUMA and CC-NUMA architectures are described in [86, 200]. These techniques
take into account non-uniform memories and caching hierarchies.
• A combiner is a thread which, using a coarse-grain lock, serves (in addition to its
own synchronization request) active requests announced by other threads while
they are waiting by performing some form of spinning. Two implementations of
such a technique are described in [173]. The first addresses systems that support
coherent caches, whereas the second works better in cacheless NUMA architec-
tures.

2.6 Exercises and Problems

1. Peterson’s algorithm for two processes uses an atomic register denoted TURN
that is written and read by both processes. Design a two-process mutual exclusion
algorithm (similar to Peterson’s algorithm) in which the register TURN is replaced
by two SWMR atomic registers TURN[i] (which can be written only by pi) and
TURN[j] (which can be written only by pj). The algorithm will be described for
pi where i ∈ {0, 1} and j = (i + 1) mod 2.
Solution in [175].
2. Considering the tournament-based mutex algorithm, show that if the base two-
process mutex algorithm is deadlock-free then the n-process algorithm is
deadlock-free.

3. Design a mutex starvation-free algorithm whose cost (measured by the number of


shared memory accesses) depends on the number of processes which are currently
competing for the critical section. (Such an algorithm is called adaptive.)
Solutions in [23, 204, 261].
4. Design a fast deadlock-free mutex synchronous algorithm. “Fast” means here
that, when no other process is interested in the critical section when a process p
requires it, then process p does not have to execute the delay() statement.
Solution in [262].
5. Assuming that all registers are atomic (instead of safe), modify Lamport’s bakery
algorithm in order to obtain a version in which all registers have a bounded
domain.
Solutions in [171, 261].
6. Considering Aravind’s algorithm described in Fig. 2.27 enriched with the reset
line (line 7
):
• Show that the safety property is independent of N ; i.e., whatever the value of
N (e.g., N = 1), the enriched algorithm allows at most one process at a time
to enter the critical section.
• Let x ∈ {1, . . . , n − 1}. Which type of liveness property is satisfied when
N = x + n (where n is the number of processes)?
• Let I = {i_1, . . . , i_z} ⊆ {1, . . . , n} be a predefined subset of process indexes.
Modify Aravind's algorithm in such a way that starvation-freedom is guaranteed
only for the processes px such that x ∈ I. (Let us notice that this
modification realizes a type of priority for the processes whose index belongs
to I, in the sense that the algorithm now provides processes with two types of
progress condition: the invocations of acquire_mutex() issued by any process
px with x ∈ I are guaranteed to terminate, while they are not if x ∉ I.)
Modify Aravind's algorithm so that the set I can be dynamically updated (the
main issue is the definition of the place where such a modification has to be
introduced).
Chapter 3
Lock-Based Concurrent Objects

After having introduced the notion of a concurrent object, this chapter presents
lock-based methodologies to implement such objects. The first one is based on a
low-level synchronization object called a semaphore. The other ones are based on
linguistic constructs. One of these constructs is based on an imperative approach
(monitor construct), while the other one is based on a declarative approach (path
expression construct). This chapter closes the first part of the book devoted to lock-
based synchronization.

Keywords Declarative synchronization · Imperative synchronization · Lock-based


implementation · Monitor · Path expression · Predicate transfer · Semaphore

3.1 Concurrent Objects

3.1.1 Concurrent Object

Definition An object type is defined by a finite set of operations and a specification


describing the correct behaviors of the objects of that type. The internal representation
of an object is hidden from the processes (and several objects of the same type can have
different implementations). The only way for a process to access an object of a given
type is by invoking one of its operations.
A concurrent object is an object that can be accessed concurrently by several
processes. The specification of such an object can be sequential or not. “Sequential”
means that all correct behaviors of the object can be described with sequences (traces).
Not all concurrent objects have a sequential specification.
Example As an example, let us consider the classical unbounded stack object type.
Any object of such a type provides processes with a push() operation and a pop()
operation. As the stack is unbounded, the push() operation can always be executed.
As far as the pop() operation is concerned, let us assume that it returns the default


value ⊥ when the stack is empty. Hence, both operations can always be executed
whatever the current state of the stack (such operations are said to be total). The
sequential specification of such a stack is the set of all the sequences of push() and
pop() operations that satisfy the “last in, first out” (LIFO) property (“last in” being ⊥
when the stack is empty). Differently, as indicated in the first chapter, a rendezvous
object has no sequential specification.

3.1.2 Lock-Based Implementation

A simple way to implement a concurrent object defined by a sequential specification


consists in using a lock to force each invocation of an operation to execute in mutual
exclusion. In that way, a single process at a time can access the internal representation
of the object. It follows that sequential algorithms can be used to implement the object
operations.
As an example, let us consider a sequential stack S_STACK (i.e., a stack which
was designed to be used by a sequential program). As previously noticed, its internal
representation and the code of the sequential algorithms implementing its push()
and pop() operations are hidden from its user program (such a description is usually
kept in a library).
Let conc_push() and conc_pop() be the operations associated with a concurrent
stack denoted C_STACK. A simple way to obtain an implementation of C_STACK
is as follows:
• Its internal representation consists of an instance of a sequential stack S_STACK
plus a lock instance LOCK as depicted in Fig. 3.1;
• The algorithms implementing its conc_push() and conc_pop() operations are
based on this internal representation as described in Fig. 3.2.
This methodology has the great advantage of clearly separating the control part (the
underlying LOCK object) from the data part (the underlying S_STACK object).
Let us notice that each instance of a concurrent stack has its own internal repre-
sentation, i.e., is made up of its own instances of both a lock and a sequential stack.
This means that, if we have two concurrent stacks C_STACK1 and C_STACK2, the

lock’s internal stack’s internal


representation representation

Fig. 3.1 From a sequential stack to a concurrent stack: structural view



operation C_STACK.conc_push(v) is
LOCK.acquire_lock(); S_STACK.push(v); LOCK.release_lock(); return()
end operation.

operation C_STACK.conc_pop() is
LOCK.acquire_lock(); r ← S_STACK.pop(); LOCK.release_lock(); return(r)
end operation.

Fig. 3.2 From a sequential to a concurrent stack (code for pi)

first uses a sequential stack S_STACK1 and a lock instance LOCK1, while the second
uses another sequential stack S_STACK2 and another lock instance LOCK2.
Hence, as LOCK1 and LOCK2 are distinct locks, the operations on C_STACK1 and
C_STACK2 are not prevented from being concurrent.
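
As an illustration of this structure, here is a minimal runnable Python analogue (our sketch, not the book's code), in which the with statement plays the role of the acquire_lock()/release_lock() bracket:

import threading

class ConcStack:
    """A concurrent stack: a sequential stack protected by a lock."""
    def __init__(self):
        self._s_stack = []                 # the sequential stack (here: a plain list)
        self._lock = threading.Lock()      # the LOCK instance of Fig. 3.1

    def conc_push(self, v):
        with self._lock:                   # LOCK.acquire_lock()
            self._s_stack.append(v)        # sequential push()
                                           # LOCK.release_lock() on block exit

    def conc_pop(self):
        with self._lock:
            if not self._s_stack:
                return None                # default value for an empty stack
            return self._s_stack.pop()     # sequential pop()

Two ConcStack instances have distinct locks, so, as stated above, operations on distinct stacks can proceed concurrently.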

3.2 A Base Synchronization Object: the Semaphore

3.2.1 The Concept of a Semaphore

Definition of a semaphore A semaphore S is a shared counter which can be


accessed by two atomic operations denoted S.down() and S.up(). The specification
of a semaphore is defined as follows:
• A semaphore S is initialized to a non-negative value s0 .
• The predicate S ≥ 0 is always satisfied (i.e., the counter never becomes negative).
• S.down() atomically decreases S by 1.
• S.up() atomically increases S by 1.
It is easy to see that the operation S.up() can always be executed. Differently,
the operation S.down() can be executed only if its execution does not entail the
violation of the invariant S ≥ 0. When S = 1 and two or more processes invoke
S.down(), one of them succeeds (and the semaphore becomes equal to 0) while the
other becomes blocked. One of them will be unblocked when a process executes
S.up(), etc.
Invariant Let #(S.down) (or #(S.up)) be the number of invocations of S.down()
(or S.up()) that have terminated. It follows from the definition of a semaphore S that
the following relation is always satisfied:

S = s0 + #(S.up) − #(S.down).

Locks, tokens, and semaphores A semaphore can be seen as a generalization of


a lock; namely, a lock is a semaphore S which is initialized to 1 and where S.down()
and S.up() are renamed acquire_lock() and release_lock(), respectively.
More generally, a semaphore can be seen as a token manager. Initially, there
are s0 tokens. Then, each invocation of S.down() consumes one token while each
invocation of S.up() generates one token. If a process invokes S.down() while there
are no tokens, it has to wait until a token is created in order to consume it. The value
of S defines the actual number of tokens which can be used by the processes.
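
Python's threading.Semaphore behaves exactly like such a token manager (acquire() playing the role of down() and release() that of up()); a small illustration (ours):

import threading

S = threading.Semaphore(3)   # s0 = 3 tokens

def worker(name):
    S.acquire()              # down(): consume a token (blocks if none is left)
    try:
        print(name, "holds a token")
    finally:
        S.release()          # up(): generate a token

threads = [threading.Thread(target=worker, args=(f"p{i}",)) for i in range(5)]
for t in threads: t.start()
for t in threads: t.join()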
Semaphore variants A semaphore S is strong if the processes that are blocked are
unblocked in the order in which they became blocked. Otherwise, the semaphore is
weak.
A binary semaphore S is a semaphore that can take only the values 0 and 1. Hence,
an invocation of S.down() blocks the invoking process if S = 0, and an invocation
of S.up() blocks it if S = 1.
A semaphore S is private to a process pi if only pi can invoke S.down(). The other
processes can only invoke S.up() to send "signals" to pi. (This is analogous to the
notion of an SWMR atomic register, which can be written by a single process.)
Implementation of a semaphore Semaphores were initially introduced to help
solve both synchronization and scheduling problems. More precisely, they were intro-
duced to prevent busy waiting (waiting loops as used in the previous chapter devoted
to mutual exclusion) in systems where there are more processes than processors.
To that end a semaphore S is implemented by two underlying objects:
• A counter, denoted S.count and initialized to s0 . As we will see, this counter can
become negative and is not to be confused with the value of S, which is never
negative.
• A queue, denoted S.queue, which is initially empty. (A FIFO queue gives rise to
a strong semaphore.)
A schematic implementation of the operations S.down() and S.up() is described
in Fig. 3.3. (This code is mainly designed to assign processors to processes.) It is
assumed that, for each semaphore S, each operation is executed in mutual exclusion.
Let nb_blocked(S) denote the number of processes currently blocked during their
invocations of S.down(). The reader can verify that the following relation is an invariant
of the previous implementation:

if (S.count ≥ 0) then nb_blocked(S) = 0 else nb_blocked(S) = |S.count| end if.

Hence, when it is negative, the implementation counter S.count provides us with the
number of processes currently blocked on the semaphore S. Differently, when it is
non-negative, the value of S.count is the value of the semaphore S.

operation S.down() is
S.count ← S.count − 1;
if (S.count < 0) then
the invoking process is blocked and added to S.queue; the control is given to the scheduler
end if
end operation.

operation S.up() is
S.count ← S.count + 1;
if (S.count ≤ 0) then
remove the first process in S.queue, which can now be assigned a processor
end if;
end operation.

Fig. 3.3 Implementing a semaphore (code for pi)
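
For illustration, here is a runnable Python analogue of this schema (ours): a counter plus a FIFO queue of per-invocation events, yielding a strong semaphore.

import collections
import threading

class StrongSemaphore:
    """A counter plus a FIFO queue, following the schema of Fig. 3.3."""
    def __init__(self, s0):
        self.count = s0                     # may become negative (|count| = nb of blocked)
        self.queue = collections.deque()    # one event per blocked invocation, FIFO order
        self.lock = threading.Lock()        # each operation executes in mutual exclusion

    def down(self):
        ev = None
        with self.lock:
            self.count -= 1
            if self.count < 0:              # no token available: this invocation blocks
                ev = threading.Event()
                self.queue.append(ev)
        if ev is not None:
            ev.wait()                       # "the control is given to the scheduler"

    def up(self):
        with self.lock:
            self.count += 1
            if self.count <= 0:             # some invocation is blocked: wake the first one
                self.queue.popleft().set()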

3.2.2 Using Semaphores to Solve the Producer-Consumer Problem

The producer-consumer problem This problem was introduced in Chap. 1. We


have to construct a concurrent object (usually called a buffer) defined by two oper-
ations denoted produce() and consume() such that produce() allows a process to
deposit a new item in the buffer while consume() allows a process to withdraw an
item from the buffer. The capacity of the buffer is k > 0 items. Moreover, the xth
item that was produced has to be consumed exactly once and before the (x + 1)th
item (see Fig. 1.4). It is (of course) assumed that the operation consume() is invoked
enough times so that each item is consumed.

The Case of a Single Producer and a Single Consumer

Implementation data structures Considering the case where there is a single


producer process and a single consumer process, let us construct a buffer, denoted
B, whose size is k ≥ 1. A simple semaphore-based implementation of a buffer is as
follows. The internal representation is made up of two parts.
• Data part. This part is the internal representation of the buffer. It comprises three
objects:

– BUF[0..(k − 1)] is an array of read/write registers which are not required to be


atomic or even safe. This is because the access to any register BUF[x] is protected
by the control part of the implementation that (as we are about to see) guarantees
that the producer and the consumer cannot simultaneously access any BUF[x].

The base read and write operations on BUF[x] are denoted BUF[x].read() and
BUF[x].write().
– in and out are two local variables containing array indexes whose domain is
[0..(k − 1)]; in is used by the producer to point to the next entry of BUF where
an item can be deposited; out is used by the consumer to point to the next entry
of BUF from which an item can be consumed. The law that governs the progress
of these index variables is the addition mod k, and we say that the buffer is
circular.
• Control part. This part comprises the synchronization objects that allow the
processes to never violate the buffer invariant. It consists of two semaphores:
– The semaphore FREE counts the number of entries of the array BUF that
can currently be used to deposit new items. This semaphore is initialized
to k.
– The semaphore BUSY counts the number of entries of the array BUF that cur-
rently contain items produced and not yet consumed. This semaphore is initial-
ized to 0.

Production and consumption algorithms The algorithms implementing the


buffer operations produce() and consume() are described in Fig. 3.4. When the pro-
ducer invokes B.produce(value) (where value is the value of the item it wants to
deposit into the buffer), it first checks if there is a free entry in the array BUF. The
semaphore FREE is used to that end (line 1). When FREE = 0, the producer is
blocked until an entry of the buffer is freed. Then, the producer deposits the next
item value into BUF[in] (line 2) and increases the index in so that it points to the next
entry of the array. Finally, the producer signals that one more entry was produced
(line 3).
The algorithm for the operation B.consume() is a control structure symmetric
to that of the B.produce() operation, exchanging the semaphores BUSY and FREE.

operation B.produce(v) is
(1) FREE.down();
(2) BUF[in].write(v); in ← (in + 1) mod k;
(3) BUSY.up();
(4) return()
end operation.

operation B.consume() is
(5) BUSY.down();
(6) r ← BUF[out].read(); out ← (out + 1) mod k;
(7) FREE.up();
(8) return(r)
end operation.

Fig. 3.4 A semaphore-based implementation of a buffer



When the consumer invokes B.consume(), it first checks if there is one entry in the
array BUF that contains an item not yet consumed. The semaphore BUSY is used to
that end (line 5). When it is allowed to consume, the consumer consumes the next
item value, i.e., the one kept in BUF[out], saves it in a local variable r, and increases
the index out (line 6). Finally, before returning the value saved in r (line 8), the
consumer signals that one entry has been freed; this is done by increasing the value of the
semaphore FREE (line 7).
Remark 1 It is important to repeat that, for any x, a register BUF[x] is not required
to satisfy special semantic requirements and that the value that is written (read) can be
of any type and as large as we want (e.g., a big file). This is an immediate consequence
of the fact that each register BUF[x] is accessed in mutual exclusion. This means that
what is abstracted as a register BUF[x] does not have to be constrained in any way. As
an example, the operation BUF[in].write(v) (line 2) can abstract several low-level
write operations involving accesses to underlying disks which implement the register
(and similarly for the operation BUF[out].read() at line 6). Hence, the size of the items
that are produced and consumed can be arbitrarily large, and reading and writing them
can take arbitrary (but finite) durations. This means that one can reasonably assume
that the duration of the operations BUF[in].write(v) and BUF[out].read() (i.e., the
operations which are in the data part of the algorithms) is usually several orders of
magnitude greater than the execution of the rest of the algorithms (which is devoted
to the control part).
Remark 2 It is easy to see that the values taken by the semaphores FREE and
BUSY are such that 0 ≤ FREE, BUSY ≤ k, but it is important to remark that a
semaphore object does not offer an operation such as FREE.read() that would return
the exact value of FREE. Actually, such an operation would be useless because there
is no guarantee that the value returned by FREE.read() is still meaningful when the
invoking process would use it (FREE may have been modified by FREE.up() or
FREE.down() just after its value was returned).
A semaphore S can be seen as an atomic register that could be modified by the
operations fetch&add() and fetch&sub() (which atomically add 1 and subtract 1,
respectively) with the additional constraint that S can never become negative.
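
Putting the pieces together, here is a runnable single-producer/single-consumer version in Python (an illustrative sketch using the standard threading.Semaphore; the line-number comments refer to Fig. 3.4):

import threading

k = 5
BUF = [None] * k
FREE = threading.Semaphore(k)   # number of free entries
BUSY = threading.Semaphore(0)   # number of entries produced and not yet consumed

def producer():
    in_ = 0
    for item in range(20):
        FREE.acquire()              # line 1: FREE.down()
        BUF[in_] = item             # line 2: BUF[in].write(v)
        in_ = (in_ + 1) % k
        BUSY.release()              # line 3: BUSY.up()

def consumer():
    out = 0
    for _ in range(20):
        BUSY.acquire()              # line 5: BUSY.down()
        r = BUF[out]                # line 6: BUF[out].read()
        out = (out + 1) % k
        FREE.release()              # line 7: FREE.up()
        print("consumed", r)

tp = threading.Thread(target=producer); tc = threading.Thread(target=consumer)
tp.start(); tc.start(); tp.join(); tc.join()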

The case of a buffer with a single entry This is the case k = 1. Each of the
semaphores FREE and BUSY then takes only the values 0 or 1. It is interesting
to look at the way these values are modified. A corresponding cycle of
production/consumption is depicted in Fig. 3.5.
Initially the buffer is empty, FREE = 1, and BUSY = 0. When the producer
starts to deposit a value, the semaphore FREE decreases from 1 to 0 and the buffer
starts to being filled. When it has been filled, the producer raises BUSY from 0 to
1. Hence, FREE = 0 ∧ BUSY = 1 means that a value has been deposited and can
be consumed. When the consumer wants to read, it first decreases the semaphore
BUSY from 1 to 0 and then reads the value kept in the buffer. When, the reading is
terminated, the consumer signals that the buffer is empty by increasing FREE from

Fig. 3.5 A production/consumption cycle (the buffer is empty → being filled → full → being emptied → empty)

0 to 1, and we are now in the initial configuration where FREE = 1 ∧ BUSY = 0


(which means that the buffer is empty).
When looking at the values of the semaphores, it is easy to see that we can
never have both semaphores simultaneously equal to 1, from which we conclude that
¬(FREE = 1 ∧ BUSY = 1) is an invariant relation that, for k = 1, characterizes
the buffer implementation described in Fig. 3.4. More generally, when k ≥ 1, the
invariant is 0 ≤ FREE + BUSY ≤ k.

The Case of Several Producers and Several Consumers

If there are several producers (consumers) the previous solution no longer works,
because the control register in (out) now has to be an atomic register shared by all
producers (consumers). Hence, the local variables in and out are replaced by the
atomic registers IN and OUT. Moreover, (assuming k > 1) the read and update
operations on each of these atomic registers have to be executed in mutual exclusion
so that no two producers simultaneously obtain the same value of IN, which
could entail the write of an arbitrary value into BUF[IN]. (And similarly for OUT.)
A simple way to solve this issue consists in adding two semaphores initialized
to 1, denoted MP and MC. The semaphore MP is used by the producers to ensure
that at most one process at a time is allowed to execute B.produce(); (similarly
MC is used to ensure that no two consumers concurrently execute B.consume()).
Albeit correct, such a solution can be very inefficient. Let us consider the case of
a producer p1 that is very slow while another producer p2 is very rapid. If both p1
and p2 simultaneously invoke produce() and p1 wins the competition, p2 is forced to
wait for a long time before being able to produce. Moreover, if there are several free
entries in BUF[0..(k −1)], it should be possible for p1 and p2 to write simultaneously
in two different free entries of the array.
Additional control data To that end, in addition to the buffer BUF[0..(k − 1)] and
the atomic registers IN and OUT , two arrays of atomic Boolean registers denoted

Fig. 3.6 Behavior of the flags FULL[x] and EMPTY[x] (FULL[x] is raised after writing into BUF[x] and lowered before reading from it; EMPTY[x] is lowered before writing into BUF[x] and raised after reading from it)

FULL[0..(k − 1)] and EMPTY [0..(k − 1)] are used. They are such that, for every x,
the pair FULL[x], EMPTY [x] describes the current state of BUF[x] (full, empty,
being filled, being emptied). These registers have similar behaviors, one from the
producer point of view and the other one from the consumer point of view. More
precisely, we have the following:
• FULL[x] (which is initialized to false) is set to true by a producer p just after it has
written a new item value in BUF[x]. In that way, p informs the consumers that the
value stored in BUF[x] can be consumed. FULL[x] is reset to false by a consumer
c just after it has obtained the right to consume the item value kept in BUF[x]. In
that way, c informs the other consumers that the value in BUF[x] is not for them.
To summarize: FULL[x] ⇔ (BUF[x] can be read by a consumer) (Fig. 3.6).
• EMPTY[x] (which is initialized to true) is set to true by a consumer c just after
it has read the item value kept in BUF[x]. In that way, the consumer c informs the
producers that BUF[x] can be used again to deposit a new item value. EMPTY[x]
is set to false by a producer p just before it writes a new item value in BUF[x]. In
that way, the producer p informs the other producers that BUF[x] is reserved and
they cannot write into it. To summarize: EMPTY[x] ⇔ (BUF[x] can be written
by a producer).

The algorithm The code of the algorithms implementing produce() and


consume() is described in Fig. 3.7. We describe only the code of the operation
produce(). As in the base algorithms of Fig. 3.4, the code of the operation consume()
is very similar.
When a producer p invokes produce(), it first executes FREE.down(), which
blocks it until there is at least one free entry in BUF[0..(k − 1)] (line 1). Then, p
executes MP.down(). As MP is initialized to 1 and used only at lines 2 and 7, this
means that a single producer at a time can execute the control statements defined at
lines 2–7. The aim of these lines is to give to p an index (my_index) such that no other
producer is given the same index in order to write into BUF[my_index] (at line 8).
To that end, the producer p scans (modulo k) the array EMPTY[0..(k − 1)] in order
to find a free entry. Let us notice that there is necessarily such an entry because p
has passed the statement FREE.down(). Moreover, so that each item that is written

operation B.produce(v) is
(1)  FREE.down();
(2)  MP.down();
(3)  while (¬EMPTY[IN]) do IN ← (IN + 1) mod k end while;
(4)  my_index ← IN;
(5)  EMPTY[IN] ← false;
(6)  IN ← (IN + 1) mod k;
(7)  MP.up();
(8)  BUF[my_index].write(v);
(9)  FULL[my_index] ← true;
(10) BUSY.up();
(11) return()
end operation.

operation B.consume() is
(12) BUSY.down();
(13) MC.down();
(14) while (¬FULL[OUT]) do OUT ← (OUT + 1) mod k end while;
(15) my_index ← OUT;
(16) FULL[OUT] ← false;
(17) OUT ← (OUT + 1) mod k;
(18) MC.up();
(19) r ← BUF[my_index].read();
(20) EMPTY[my_index] ← true;
(21) FREE.up();
(22) return(r)
end operation.

Fig. 3.7 An efficient semaphore-based implementation of a buffer

is consumed, a producer starts scanning at EMPTY[IN] (and similarly a consumer
starts scanning the array FULL[0..(k − 1)] at the entry FULL[OUT]). The first value
of IN such that EMPTY[IN] is true defines the value of my_index (lines 3–4). Then,
p updates EMPTY[IN] to false (line 5) and increases IN so that it points to the next
entry that a producer will use to start its scan (line 6). The accesses to IN are then
finished for p, and it releases the associated mutual exclusion (line 7).
Then, the producer p writes its item value into BUF[my_index] at line 8 (let us
remember that this statement is the most expensive from a time duration point of
view). When, the write is terminated, p informs the consumers that BUF[my_index]
contains a new value; this is done by setting FULL[my_index] to true (line 9). Finally,
(as in the base algorithm) the producer p indicates that one more value can be con-
sumed by increasing the semaphore BUSY (line 10).
The reader can check that, for any x, when considering BUF[x] the produc-
tion/consumption cycle is exactly the same as the one described in Fig. 3.5 where the
semaphores FREE and BUSY are replaced by the Boolean atomic registers EMPTY[x]
and FULL[x], respectively. It follows that the invariant

∀x ∈ [0..(k − 1)] : (¬EMPTY [x]) ∨ (¬FULL[x])



characterizes the buffer implementation given in Fig. 3.7. This invariant states that
no BUF[x] can be simultaneously full and empty. Other facts can be deduced from
Fig. 3.7:
– (¬EMPTY[x]) ∧ (¬FULL[x]) ⇒ BUF[x] is currently being filled or emptied,
– FULL[x] ⇒ BUF[x] contains a value not yet consumed, and
– EMPTY[x] ⇒ BUF[x] does not contain a value to be consumed.
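To make the control flow concrete, here is a minimal Java sketch of this buffer (a free interpretation, not the book's code; the class name, the field names, and the consumer-side semaphore MC, the counterpart of MP, are our assumptions). It uses java.util.concurrent.Semaphore, whose acquire() and release() play the role of down() and up(); the memory-consistency effects documented for these operations make the plain Boolean arrays visible across threads here.

import java.util.concurrent.Semaphore;

final class SlotBuffer<T> {
    private final int k;
    private final Object[] buf;
    private final boolean[] full, empty;          // FULL[x] / EMPTY[x]
    private int in = 0, out = 0;                  // scan start indices IN / OUT
    private final Semaphore free, busy;           // FREE / BUSY
    private final Semaphore mp = new Semaphore(1), mc = new Semaphore(1); // MP / MC

    SlotBuffer(int k) {
        this.k = k;
        buf = new Object[k];
        full = new boolean[k];                    // all false: nothing to consume
        empty = new boolean[k];
        java.util.Arrays.fill(empty, true);       // all true: every slot writable
        free = new Semaphore(k);                  // number of empty slots
        busy = new Semaphore(0);                  // number of full slots
    }

    void produce(T v) throws InterruptedException {
        free.acquire();                           // line 1: wait for a free slot
        mp.acquire();                             // line 2: one scanning producer at a time
        while (!empty[in]) in = (in + 1) % k;     // line 3: find a free slot
        int myIndex = in;                         // line 4
        empty[myIndex] = false;                   // line 5: reserve it
        in = (in + 1) % k;                        // line 6: next scan starts here
        mp.release();                             // line 7
        buf[myIndex] = v;                         // line 8: the expensive write
        full[myIndex] = true;                     // line 9: now consumable
        busy.release();                           // line 10: one more available value
    }

    @SuppressWarnings("unchecked")
    T consume() throws InterruptedException {
        busy.acquire();                           // symmetric code, lines 12-22
        mc.acquire();
        while (!full[out]) out = (out + 1) % k;
        int myIndex = out;
        full[myIndex] = false;
        out = (out + 1) % k;
        mc.release();
        T r = (T) buf[myIndex];                   // the expensive read
        empty[myIndex] = true;                    // slot reusable
        free.release();
        return r;
    }
}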

3.2.3 Using Semaphores to Solve a Priority Scheduling Problem

Priority without preemption Let us consider a resource that has to be accessed


in mutual exclusion. To that end, as we have seen before, we may build an object that
provides the processes with two operations, denoted acquire() and release(), that are
used to bracket any use of the resource.
Moreover, in addition to being mutually exclusive, the accesses to the resource
have to respect a priority scheme defined as follows:
• Each request for the resource is given a priority value prioi by the invoking process
pi .
• When a process (which was granted the resource) releases it, the resource has to be
given to the process whose pending request has the highest priority. (It is assumed
that no two requests have the same priority. Process identities can be used for
tie-breaking.)
Hence, when a process pi wants to use the resource, it invokes the following operation:
operation use_resource(i, prioi ) is
acquire(i, prioi );
access the resource;
release(i);
return()
end operation.

Let us observe that this priority scheme is without preemption: when a process p
has obtained the resource, it keeps it until it invokes release(), whatever the priority
of the requests that have been issued by other processes after p was granted the
resource.
Principle of the solution and base objects The object we are building can be
seen as a room made up of two parts: a blackboard room where processes can post
information to inform the other processes plus a sleeping room where processes go
to wait when the resource is used by another process (see Fig. 3.8).
In order for the information on the blackboard to be consistent, at most one
process at a time can access the room. To that end a semaphore denoted MUTEX and
initialized to 1 is used by the processes.

Fig. 3.8 Blackboard and sleeping rooms (the blackboard room contains the semaphore MUTEX, the Boolean register BUSY, the Boolean registers WAITING[1..n], and the registers PRIORITY[1..n]; the sleeping room contains the semaphores SLEEP_CHAIR[1..n])

The blackboard is made up of the following read/write registers. As the accesses


to these objects are protected by the mutual exclusion semaphore MUTEX, they do
not need to be atomic or even safe.
• A Boolean register denoted BUSY which is initialized to false. This Boolean is
equal to true if and only if one process is allowed to use the resource.
• An array of Boolean registers WAITING[1..n] initialized to [false, . . . , false].
WAITING[i] is set to true by pi to indicate that it wants to access the resource. In
that way, when a process releases the resource, it knows which processes want to
use it.
• An array of Boolean SWMR registers PRIORITY [1..n]. The register
PRIORITY [i] is used by pi to store the priority of its current request (if any).
Finally, the waiting room is implemented as follows:
• An array of semaphores SLEEP_CHAIR[1..n] initialized to 0.
When a process pi has to wait before using the resource, it “goes to sleep” on its
personal sleeping chair until it is woken up. “Going to sleep” is implemented by
invoking SLEEP_CHAIR[i].down() (let us remember that this semaphore is initial-
ized to 0), and another process wakes pi up by invoking SLEEP_CHAIR[i].up().
It is important to see the role of the pair (WAITING[i], SLEEP_CHAIR[i]) for each
process pi . The value of the Boolean WAITING[i] is written on the blackboard and
consequently allows a process pj that reads the blackboard to know if pi is waiting
for the resource and, according to the priority, to wake it up.
The operations acquire() and release() The algorithms implementing acquire()
and release() for a process pi are described in Fig. 3.9.

operation acquire(i, prioi) is
(1) MUTEX.down();
(2) if (BUSY) then WAITING[i] ← true; PRIORITY[i] ← prioi;
(3)      MUTEX.up();
(4)      SLEEP_CHAIR[i].down()
(5) else BUSY ← true;
(6)      MUTEX.up()
(7) end if;
(8) return()
end operation

operation release(i) is
(9) MUTEX.down();
(10) if (∃j ≠ i : WAITING[j]) then let k be such that PRIORITY[k] = max{PRIORITY[j] | WAITING[j]};
(11)      WAITING[k] ← false;
(12)      SLEEP_CHAIR[k].up()
(13) else BUSY ← false
(14) end if;
(15) MUTEX.up();
(16) return()
end operation

Fig. 3.9 Resource allocation with priority (code for process pi)

When a process pi invokes acquire(i), it has first to obtain exclusive access to


the blackboard (invocation MUTEX.down(), line 1). After it has obtained the mutual
exclusion, there are two cases:
• If the resource is free (test at line 2), before proceeding to use the resource, pi
sets BUSY to true to indicate to the other processes that the resource is no longer
available (line 5) and releases the mutual exclusion on the blackboard (line 6).
• If the resource is not available, pi goes to sleep on its sleeping chair (line 4). But
before going to sleep, pi must write on the blackboard the priority associated with
its request and the fact that it is waiting for the resource (line 2). Before going to
sleep, pi has also to release the mutual exclusion on the blackboard so that other
processes can access it (line 3).
To release the resource, a process pi invokes release(i), which does the following.
First pi requires mutual exclusion on the blackboard in order to read consistent
values (line 9). This mutual exclusion will be released at the end of the operation
(line 15). After it has obtained exclusive access to the blackboard, pi checks if there
are processes waiting for the resource (predicate ∃j ≠ i : WAITING[j], line 10). If
no process is waiting, pi resets BUSY to false (line 13) to indicate to the next process
that invokes acquire() that the resource is available. Otherwise, there are waiting
processes: {j | WAITING[j]} ≠ ∅. In that case, after it has determined the process
pk whose request has the strongest priority among all waiting processes (line 10),
pi resets WAITING[k] to false (line 11) and wakes up pk (line 12) before exiting the
blackboard room (line 15).

Remark Let us observe that, due to asynchrony, it is possible that a process pi wakes
up a waiting process pk (by executing SLEEP_CHAIR[k].up(), line 12), before pk
has executed SLEEP_CHAIR[k].down() (this can occur if pk spends a long time to
go from line 3 to line 4). The reader may check that this does not cause problems:
actually, the slowness of pk between line 3 and line 4 has the same effect as if pk was
waiting for SLEEP_CHAIR[k] to become positive.
On the way priorities are defined The previous resource allocation algorithm is
very general. The way priorities are defined does not depend on it. Priorities can be
statically associated with processes, or can be associated with each invocation of the
operation acquire() independently one from another.
A token metaphor The algorithms in Fig. 3.9 can be seen as a token management
algorithm. There is initially a single token deposited on a table (this is expressed by
the predicate BUSY = false).
When there is no competition a process takes the token which is on the table
(statement BUSY ← true, line 5), and when it has finished using the resource, it
deposits it on the table (statement BUSY ← false, line 13).
When there is competition, the management of the token is different. The
process pi that owns the token gives it directly to a waiting process pk (statement SLEEP_CHAIR[k].up(), line 12) and pk obtains it when it terminates executing SLEEP_CHAIR[k].down() (line 4). In that case, due to priorities, the transmission of the token is “direct” from pi to pk. The Boolean register BUSY is not reset to false by pi, and it remains equal to true until pk (or a later token holder) gives the token to another process or deposits it on the table at line 13 when no process wants to access the resource.
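The following Java sketch (ours, not the book's code) shows how the blackboard and the sleeping chairs translate into semaphores. It assumes n processes with identities 0..n−1 and, by our convention, that a smaller integer means a stronger priority.

import java.util.concurrent.Semaphore;

final class PriorityResource {
    private final Semaphore mutex = new Semaphore(1);  // blackboard mutual exclusion
    private final Semaphore[] sleepChair;              // SLEEP_CHAIR[i], all init 0
    private final boolean[] waiting;                   // WAITING[i]
    private final int[] priority;                      // PRIORITY[i]; smaller = stronger here
    private boolean busy = false;                      // BUSY

    PriorityResource(int n) {
        sleepChair = new Semaphore[n];
        for (int i = 0; i < n; i++) sleepChair[i] = new Semaphore(0);
        waiting = new boolean[n];
        priority = new int[n];
    }

    void acquire(int i, int prio) throws InterruptedException {
        mutex.acquire();                               // line 1
        if (busy) {                                    // line 2: post request on blackboard
            waiting[i] = true; priority[i] = prio;
            mutex.release();                           // line 3
            sleepChair[i].acquire();                   // line 4: go to sleep
        } else {
            busy = true;                               // line 5
            mutex.release();                           // line 6
        }
    }

    void release(int i) throws InterruptedException {
        mutex.acquire();                               // line 9
        int k = -1;
        for (int j = 0; j < waiting.length; j++)       // line 10: strongest pending request
            if (waiting[j] && (k == -1 || priority[j] < priority[k])) k = j;
        if (k != -1) {
            waiting[k] = false;                        // line 11
            sleepChair[k].release();                   // line 12: direct token transfer
        } else {
            busy = false;                              // line 13: token back on the table
        }
        mutex.release();                               // line 15
    }
}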

3.2.4 Using Semaphores to Solve the Readers-Writers Problem

The readers-writers problem Let us consider a file that can be accessed by a single
process by the operations read_file() and write_file(). The readers-writers problem
consists in designing a concurrent object that allows several processes to access this
file in a consistent way. Consistent means here that any number of processes can
simultaneously read the file, but at most one process at a time can write the file, and
writing the file and reading the file are mutually exclusive.
To that end, an approach similar to the one used in Fig. 3.1 to go from a sequen-
tial stack to a concurrent stack can be used. As reading a file does not modify it,
using a lock would be too constraining (as it would allow at most one read at a
time). In order to allow several read operations to execute simultaneously, let us
instead design a concurrent object defined by four operations denoted begin_read(),
end_read(), begin_write(), and end_write(), and let us use them to bracket the operations read_file() and write_file(), respectively. More precisely, the high-level oper-
ations used by the processes, denoted conc_read_file() and conc_write_file(), are
defined as described in Fig. 3.10.

operation conc_read_file() is
  begin_read(); r ← read_file(); end_read(); return(r)
end operation

operation conc_write_file(v) is
  begin_write(); write_file(v); end_write(); return()
end operation

Fig. 3.10 From a sequential file to a concurrent file

Remark It is easy to see that the readers-writers problem is a generalization of the


mutual exclusion problem. If there was only the write operation, the problem would
be the same as mutual exclusion. The generalization consists in having two classes of
operations (read and write), with specific execution constraints in each class (mutual
exclusion among the write operations and no constraint among the read operations)
and constraints among the two classes (which are mutually exclusive).
Readers-writers with weak priority to the readers A semaphore GLOBAL_
MUTEX initialized to 1 can be used to ensure mutual exclusion on the write opera-
tions. Moreover the same semaphore can be used to ensure mutual exclusion between
each write operation and the concurrent read operations. Such a solution is presented
in Fig. 3.11.
The code of begin_write() and end_write() (lines 11–14) follows from the pre-
vious observation. As far as the read operations are concerned, a shared register,
denoted NBR and initialized to 0, is introduced to count the current number of con-
current readers. This register is increased each time begin_read() is invoked (line 2)
and decreased each time end_read() is invoked (line 7). Moreover, to keep the value
of NBR consistent, its read and write accesses are done in mutual exclusion. This is
implemented by the semaphore NBR_MUTEX initialized to 1 (lines 1, 4, 6 and 9).
The mutual exclusion semaphore GLOBAL_MUTEX is used as follows by the
readers. A “first” reader (i.e., a reader that finds NBR = 1) invokes GLOBAL_MUTEX.
down() (line 3) to obtain exclusive access to the file, not for itself, but for the class of all readers. This mutually exclusive access will benefit, for free, all the readers that arrive while NBR ≥ 1. Similarly, a “last” reader (i.e., a reader that finds
NBR = 0) invokes GLOBAL_MUTEX.up() (line 8) to release the mutual exclusion
on the file that was granted to the class of readers.
If at least one process is reading the file, we have NBR ≥ 1, which gives priority
to the class of readers. As we have seen, this is because a reader obtains mutual
exclusion not for itself but for the class of readers (an invocation of begin_read() does not need to invoke GLOBAL_MUTEX when NBR ≥ 1). It follows that writers can be blocked forever if readers permanently invoke begin_read() while NBR ≥ 1.
The read operation consequently has priority over the write operation, but this is a
weak priority as shown below.
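In Java, this weak-reader-priority object can be sketched as follows (a direct transliteration of the algorithm of Fig. 3.11; class and field names are ours, and java.util.concurrent.Semaphore supplies down()/up() under the names acquire()/release()):

import java.util.concurrent.Semaphore;

final class WeakReaderPriorityRW {
    private final Semaphore globalMutex = new Semaphore(1); // GLOBAL_MUTEX
    private final Semaphore nbrMutex = new Semaphore(1);    // NBR_MUTEX
    private int nbr = 0;                                    // NBR

    void beginRead() throws InterruptedException {
        nbrMutex.acquire();                      // line 1
        nbr++;                                   // line 2
        if (nbr == 1) globalMutex.acquire();     // line 3: first reader locks for the class
        nbrMutex.release();                      // line 4
    }

    void endRead() throws InterruptedException {
        nbrMutex.acquire();                      // line 6
        nbr--;                                   // line 7
        if (nbr == 0) globalMutex.release();     // line 8: last reader unlocks
        nbrMutex.release();                      // line 9
    }

    void beginWrite() throws InterruptedException { globalMutex.acquire(); } // line 11
    void endWrite()   { globalMutex.release(); }                             // line 13
}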
operation begin_read() is
(1) NBR_MUTEX.down();
(2) NBR ← NBR + 1;
(3) if (NBR = 1) then GLOBAL_MUTEX.down() end if;
(4) NBR_MUTEX.up();
(5) return()
end operation

operation end_read() is
(6) NBR_MUTEX.down();
(7) NBR ← NBR − 1;
(8) if (NBR = 0) then GLOBAL_MUTEX.up() end if;
(9) NBR_MUTEX.up();
(10) return()
end operation

operation begin_write() is
(11) GLOBAL_MUTEX.down();
(12) return()
end operation

operation end_write() is
(13) GLOBAL_MUTEX.up();
(14) return()
end operation

Fig. 3.11 Readers-writers with weak priority to the readers

Readers-writers with strong priority to the readers Considering the previous algorithm, let us examine the case where at least two processes pw1

and pw2 have invoked begin_write() and one of them (say pw1 ) has obtained the
mutual exclusion to write the file. GLOBAL_MUTEX is then equal to 0 and the
other writers are blocked at line 11. Let us assume that a process pr invokes
begin_read(). The counter NBR is increased from 0 to 1, and consequently pr invokes
GLOBAL_MUTEX.down() and becomes blocked. Hence, pw2 and pr are blocked on
the semaphore GLOBAL_MUTEX with pw2 blocked before pr . Hence, when later
pw1 executes end_write(), it invokes GLOBAL_MUTEX.up(), which unblocks the
first process blocked on that semaphore, i.e., pw2 .
Strong priority to the read operation is weak priority (a reader obtains the priority
for the class of readers) plus the following property: when a writer terminates and
readers are waiting, the readers have to immediately obtain the mutual exclusion.
As already noticed, the readers-writers object described in Fig. 3.11 satisfies weak
priority for the readers but not strong priority.
There is a simple way to enrich the previous implementation to obtain an object
implementation that satisfies strong priority for the readers. It consists in ensur-
ing that, when several processes invoke begin_write(), at most one of them is
allowed to access the semaphore GLOBAL_MUTEX (in the previous example, pw2
is allowed to invoke GLOBAL_MUTEX.down() while pw1 had not yet invoked
GLOBAL_MUTEX.up()). To that end a new semaphore used only by the writers
is introduced. As its aim is to ensure mutual exclusion among concurrent writers,
this semaphore, which is denoted WRITER_MUTEX, is initialized to 1.

operation begin_write() is
(11.0) WRITER_MUTEX.down();
(11) GLOBAL_MUTEX.down();
(12) return()
end operation

operation end_write() is
(13) GLOBAL_MUTEX.up();
(13.0) WRITER_MUTEX.up();
(14) return()
end operation

Fig. 3.12 Readers-writers with strong priority to the readers

The implementation of the corresponding object is described in Fig. 3.12. As


the algorithms for the operations begin_read() and end_read() are the same as in
Fig. 3.11, they are not repeated here. The line number of the new lines is postfixed
by “0”. It is easy to see that at most one writer at a time can access the semaphore
GLOBAL_MUTEX.
Readers-writers with priority to the writers Similarly to the weak priority to
the readers, weak priority to the writers means that, when a writer obtains the right
to write the file, it obtains it not for itself but for all the writers that arrive while there
is a write that is allowed to access the file.
The corresponding readers-writers concurrent object is described in Fig. 3.13. Its
structure is similar to the one of Fig. 3.11. Moreover, the same lines in both figures
have the same number.
The writers now have to count the number of concurrent write invocations. A
counter denoted NBW is used to that end. The accesses to that register (which
is shared only by the writers) are protected by the semaphore NBW _MUTEX
initialized to 1.
The corresponding lines are numbered NW.1–NW.4 in begin_write() and NW.5–
NW.8 in end_write(). Moreover a new semaphore, denoted PRIO_W _MUTEX and
initialized to 1, is now used to give priority to the writers. This semaphore is used
in a way similar to the way the semaphore GLOBAL_MUTEX is used in Fig. 3.11 to
give weak priority to the readers.
While GLOBAL_MUTEX ensures mutual exclusion between any pair of write
operations and between any write operation and read operations, PRIO_W _MUTEX
allows the “first” writer that is allowed to access the file to obtain priority not for
itself but for the class of writers (lines NW.3 and NW.7). To attain this priority goal,
the operation begin_read() now has to be bracketed by PRIO_W _MUTEX.down() at
its beginning (line NR.1) and PRIO_W _MUTEX.up() at its end (line NR.2). Hence,
each read now competes with all concurrent writes considered as a single operation, which gives weak priority to the writers.

operation begin_read() is
(NR.1) PRIO_W_MUTEX.down();
(1) NBR_MUTEX.down();
(2) NBR ← NBR + 1;
(3) if (NBR = 1) then GLOBAL_MUTEX.down() end if;
(4) NBR_MUTEX.up();
(NR.2) PRIO_W_MUTEX.up();
(5) return()
end operation

operation end_read() is
(6) NBR_MUTEX.down();
(7) NBR ← NBR − 1;
(8) if (NBR = 0) then GLOBAL_MUTEX.up() end if;
(9) NBR_MUTEX.up();
(10) return()
end operation

operation begin_write() is
(NW.1) NBW_MUTEX.down();
(NW.2) NBW ← NBW + 1;
(NW.3) if (NBW = 1) then PRIO_W_MUTEX.down() end if;
(NW.4) NBW_MUTEX.up();
(11) GLOBAL_MUTEX.down();
(12) return()
end operation

operation end_write() is
(13) GLOBAL_MUTEX.up();
(NW.5) NBW_MUTEX.down();
(NW.6) NBW ← NBW − 1;
(NW.7) if (NBW = 0) then PRIO_W_MUTEX.up() end if;
(NW.8) NBW_MUTEX.up();
(14) return()
end operation

Fig. 3.13 Readers-writers with priority to the writers
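The following Java sketch (ours) transliterates this writers-priority construction; the register NBW and the four semaphores map one-to-one onto fields:

import java.util.concurrent.Semaphore;

final class WriterPriorityRW {
    private final Semaphore globalMutex = new Semaphore(1);
    private final Semaphore nbrMutex = new Semaphore(1), nbwMutex = new Semaphore(1);
    private final Semaphore prioWMutex = new Semaphore(1);  // PRIO_W_MUTEX
    private int nbr = 0, nbw = 0;                           // NBR, NBW

    void beginRead() throws InterruptedException {
        prioWMutex.acquire();                    // NR.1: compete with the class of writers
        nbrMutex.acquire(); nbr++;
        if (nbr == 1) globalMutex.acquire();
        nbrMutex.release();
        prioWMutex.release();                    // NR.2
    }

    void endRead() throws InterruptedException {
        nbrMutex.acquire(); nbr--;
        if (nbr == 0) globalMutex.release();
        nbrMutex.release();
    }

    void beginWrite() throws InterruptedException {
        nbwMutex.acquire(); nbw++;               // NW.1-NW.2
        if (nbw == 1) prioWMutex.acquire();      // NW.3: first writer blocks new readers
        nbwMutex.release();                      // NW.4
        globalMutex.acquire();                   // line 11
    }

    void endWrite() throws InterruptedException {
        globalMutex.release();                   // line 13
        nbwMutex.acquire(); nbw--;               // NW.5-NW.6
        if (nbw == 0) prioWMutex.release();      // NW.7: last writer readmits readers
        nbwMutex.release();                      // NW.8
    }
}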

3.2.5 Using a Buffer to Reduce Delays for Readers and Writers

When considering the readers-writers problem, the mutual exclusion used in the
previous solutions between base read and write operations can entail that readers
and writers experience long delays when there is heavy contention for the shared
file. This section presents a solution to the readers-writers problem that reduces
waiting delays. Interestingly, this solution relies on the producer-consumer problem.
Read/write lock A read/write lock is a synchronization object that (a) provides
the processes with the four operations begin_read(), end_read(), begin_write(), and
end_write() and (b) ensures one of the specifications associated with the readers-
writers problem (no priority, weak/strong priority to the readers or the writers, etc.).

Semaphore-based implementations of such objects have been presented previously


in Sect. 3.2.4.
The case of one writer and several readers The principle is fairly simple. There is a buffer with two entries, BUF[0..1], which are used alternately by the writer, and a reader reads the last entry that was written. Hence, it is possible for a reader to read
a new value of the file while the writer is writing a value that will become the new
“last” value.
While, for each x ∈ {0, 1}, several reads of BUF[x] can proceed concurrently,
writing into BUF[x] and reading from BUF[x] have to be mutually exclusive. To that
end, a read/write lock RW _LOCK[x] is associated with each BUF[x].
The write of a new value v into BUF[x] is denoted BUF[x].write(v) and a read
from BUF[x] is denoted r ← BUF[x].read() (where r is a local variable of the
invoking process). Initially, both BUF[0] and BUF[1] contain the initial value of
the file.
In addition to BUF[0..1], the processes share an atomic register denoted LAST
that contains the index of the last buffer that was written by the writer. It is initialized
to 0.
The algorithms implementing the conc_read_file() and conc_write_file() opera-
tions of the corresponding readers-writer object are described in Fig. 3.14.
As previously indicated, a read returns the last value that was written, and a write
deposits the new value in the other entry of the buffer which becomes the “last” one
when the write terminates. It is important to see that, differently from the producer-
consumer problem, not all the values written are necessarily read.
operation conc_read_file() is
(1) last ← LAST;
(2) RW_LOCK[last].begin_read();
(3) r ← BUF[last].read();
(4) RW_LOCK[last].end_read();
(5) return(r)
end operation

operation conc_write_file(v) is
(6) new_last ← 1 − LAST;
(7) RW_LOCK[new_last].begin_write();
(8) BUF[new_last].write(v);
(9) RW_LOCK[new_last].end_write();
(10) LAST ← new_last;
(11) return()
end operation

Fig. 3.14 One writer and several readers from producer-consumer

Fig. 3.15 Efficiency gain and mutual exclusion requirement (an execution with the writer and a single reader)

An execution is described in Fig. 3.15, which considers the writer and a single reader. Its aim is to show both the efficiency gain and the mutual exclusion requirement on each buffer entry. The efficiency gain appears with the first read operation: this operation reads from BUF[1] while there is a concurrent write into

BUF[0]. Similarly, the last read (which is of BUF[1] because LAST = 1 when the
corresponding conc_read_file() operation starts) is concurrent with the write into
BUF[0]. Hence, the next write operation (namely conc_write_file(v5 )) will be on
BUF[1]. If conc_write_file(v5 ) is invoked while BUF[1].read() has not terminated,
the write must be constrained to wait until this read terminates. Hence, the mutual
exclusion requirement on the reads and writes on each entry of the buffer.
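Before turning to liveness, here is a Java sketch of this two-slot construction (ours, not the book's code); java.util.concurrent.locks.ReentrantReadWriteLock plays the role of the read/write locks RW_LOCK[0..1], and its fair mode is requested so that the starvation-freedom assumption of the theorem below is plausible:

import java.util.concurrent.locks.ReentrantReadWriteLock;

final class DoubleBuffer<T> {
    private final Object[] buf = new Object[2];             // BUF[0..1]
    private final ReentrantReadWriteLock[] rwLock =         // RW_LOCK[0..1], fair mode
        { new ReentrantReadWriteLock(true), new ReentrantReadWriteLock(true) };
    private volatile int last = 0;                          // LAST

    DoubleBuffer(T initial) { buf[0] = initial; buf[1] = initial; }

    @SuppressWarnings("unchecked")
    T concReadFile() {
        int x = last;                              // line 1: slot written last
        rwLock[x].readLock().lock();               // line 2
        try { return (T) buf[x]; }                 // line 3
        finally { rwLock[x].readLock().unlock(); } // line 4
    }

    void concWriteFile(T v) {
        int x = 1 - last;                           // line 6: the other slot
        rwLock[x].writeLock().lock();               // line 7
        try { buf[x] = v; }                         // line 8
        finally { rwLock[x].writeLock().unlock(); } // line 9
        last = x;                                   // line 10: publish the new "last"
    }
}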
Starvation-freedom There are two algorithm instances that ensure the mutual
exclusion property: one between BUF[0].read() and BUF[0].write() and the other
one between BUF[1].read() and BUF[1].write().
Let us assume that these algorithms guarantee the starvation-freedom property for the read operations (i.e., each invocation of BUF[x].read() terminates). Hence, there is no specific liveness property attached to the base BUF[x].write() operations.
The following theorem captures the liveness properties of the conc_read_file() and
conc_write_file() operations which are guaranteed by the algorithms described in
Fig. 3.14.
Theorem 12 If the underlying read/write lock objects ensure starvation-freedom for the read operations, then the implementation of the operations conc_read_file() and conc_write_file() given in Fig. 3.14 ensures starvation-freedom for both operations.
Proof Starvation-freedom of the invocations of the operation conc_read_file() fol-
lows trivially from the starvation-freedom of the base BUF[x].read() operation.
To show that each invocation of the operation conc_write_file() terminates, we
show that any invocation of the operation BUF[x].write() does terminate. Let us
consider an invocation of BUF[x].write(). Hence, LAST = 1 − x (lines 6 and 10).
This means that the read invocations that start after the invocation BUF[x].write() are
on BUF[1 − x]. Consequently, these read invocations cannot prevent BUF[x].write()
from terminating.
Let us now consider the invocations BUF[x].read() which are concurrent with BUF[x].write(). There is a finite number of such invocations. As the underlying mutual exclusion algorithm guarantees starvation-freedom for these read invocations, there is a finite time after which they have all terminated. If the invocation BUF[x].write() has not yet been executed, it is then the only operation on BUF[x] and is consequently executed, which concludes the proof of the theorem.

operation conc_read_file() is
(1) last ← LAST;
(2) RW_LOCK[last].begin_read();
(3) r ← BUF[last].read();
(4) RW_LOCK[last].end_read();
(5) return(r)
end operation

operation conc_write_file(v) is % code for writer qi
(7) RW_LOCK[my_last].begin_write();
(8) BUF[my_last].write(v);
(9) RW_LOCK[my_last].end_write();
(10) LAST ← my_last;
(NW.1) my_last ← my_last + 1;
(NW.2) if (my_last = 2i) then my_last ← 2i − 2 end if;
(11) return()
end operation

Fig. 3.16 Several writers and several readers from producer-consumer
The case of several writers and several readers The previous single writer/
multi-reader algorithm can be easily generalized to obtain a multi-writer/multi-reader
algorithm.
Let us assume that there are m writers denoted q1 , . . . , qm . The array of reg-
isters now has 2m entries: BUF[0..(2m − 1)], and the registers BUF[2i − 2] and
BUF[2i − 1] are associated with the writer number qi . As previously, a writer qi
writes alternately BUF[2i − 2] and BUF[2i − 1] and updates LAST after it has
written a base register. The corresponding write algorithm is described in Fig. 3.16.
The local index my_last of qi is initialized to 2i − 2. Basically line 6 of Fig. 3.14 is
replaced by the new lines NW.1–NW.2. Moreover, the local variable new_last is now
renamed my_last and the algorithm for the operation conc_read_file() is the same as
before.

3.3 A Construct for Imperative Languages: the Monitor

Semaphores are synchronization objects that allow for the construction of application-
oriented concurrent objects. Unfortunately, they are low-level counting objects.
The concept of a monitor allows for the definition of concurrent objects at a
“programming language” abstraction level. Several variants of the monitor concept
have been proposed. This concept was developed by P. Brinch Hansen and C.A.R.
Hoare from an initial idea of E.W. Dijkstra. To introduce it, this section adopts Hoare’s
presentation (1974).

3.3.1 The Concept of a Monitor

A monitor is an object A monitor is a concurrent object. Hence, it offers operations


to the processes, and only these operations can access its internal representation.
The mutual exclusion guarantee While its environment is made up of parallel
processes, a monitor object guarantees mutual exclusion on its internal representa-
tion: at most one operation invocation at a time can be active inside the monitor. This
means that, when a process is executing a monitor operation, it has a consistent view
of the monitor internal representation, as no other process can concurrently be active
inside the monitor.
As an example let us consider a resource that has to be accessed in mutual exclu-
sion. A simple monitor-based solution consists in defining a monitor which contains
the resource and defining a single operation use_resource() that the monitor offers
to processes.
The queues (condition variables) In order to solve issues related to internal
synchronization (e.g., when a process has to wait for a given signal from another
process), the monitor concept provides the programmer with specific objects called
conditions. A condition C is an object that can be used only inside a monitor. It offers
the following operations to the processes:

• Operation C.wait().
When a process p invokes C.wait() it stops executing and from an operational
point of view it waits in the queue C. As the invoking process is no longer active,
the mutual exclusion on the monitor is released.
• Operation C.signal().
When a process p invokes C.signal() there are two cases according to the state of the queue C:
– If no process is blocked in the queue C, the operation C.signal() has no effect.
– If at least one process is blocked in the queue C, the operation C.signal() reacti-
vates the first process blocked in C. Hence, there is one fewer process blocked in
C but two processes are now active inside the monitor. In order to guarantee that
a single process at a time can access the internal representation of the monitor
the following rule is applied:
* The process which was reactivated becomes active inside the monitor and executes the statements which follow its invocation C.wait().
* The process that has executed C.signal() becomes passive but has priority to
re-enter the monitor. When allowed to re-enter the monitor, it will execute the
statements which follow its invocation C.signal().
• Operation C.empty().
This operation returns a Boolean value indicating if the queue C is empty or not.

3.3.2 A Rendezvous Monitor Object

In order to illustrate the previous definition, let us consider the implementation of a


rendezvous object.
Definition As indicated in Chap. 1, a rendezvous object is associated with n control
points, one in each process participating in the rendezvous. It offers them a single
operation denoted rendezvous() (or barrier()). (The control point of a process is the
location in its control flow where it invokes rendezvous().)
The semantics of the rendezvous object is the following: any process involved in
the rendezvous can pass its control point (terminate its rendezvous() operation) only
when all other processes have attained their control point (invoked their rendezvous()
operation). Operationally speaking, an invocation of the rendezvous() operation
blocks the invoking process until all the processes involved in the rendezvous have
invoked the rendezvous() operation.
As already noticed, a rendezvous object cannot be specified with a sequential
specification.
An atomic register-based implementation Let us consider a register denoted
COUNTER that can be atomically accessed by three primitives, read, write, and
fetch&add() (which atomically adds 1 to the corresponding register). The domain
of COUNTER is the set {0, 1, . . . , m}.
An implementation of a rendezvous object based on such a counter register, ini-
tialized to 0, and a flag, denoted FLAG, is described in Fig. 3.17. FLAG is an atomic
binary register. Its initial value is arbitrary (e.g., 0 or 1). Moreover, let ¬FLAG denote the “other” value (i.e., the binary value which is not currently stored in FLAG).
A process that invokes rendezvous() first reads the value of FLAG and stores it in a
local variable flag (line 1) and increases COUNTER (line 2). The register COUNTER
counts the number of processes that have attained their control point (i.e., invoked
rendezvous()). Hence, if COUNTER = m, each of the m processes involved in the
rendezvous has attained its control point. In that case, pi resets COUNTER to 0 and switches FLAG to ¬FLAG to signal that each of the m processes has attained its control point (line 3). Differently, if COUNTER ≠ m, pi enters a waiting loop controlled by the predicate FLAG ≠ flag (line 4). As just seen, this predicate becomes true when COUNTER becomes equal to m.

operation rendezvous() is
(1) flag ← FLAG;
(2) COUNTER.fetch&add(1);
(3) if (COUNTER = m) then COUNTER ← 0; FLAG ← ¬FLAG
(4) else wait (FLAG ≠ flag)
(5) end if;
(6) return()
end operation

Fig. 3.17 A register-based rendezvous object



Let us observe that a rendezvous object implemented as in Fig. 3.17 is not restricted
to be used only once. It can be used repeatedly to synchronize a given set of processes
as many times as needed. We say that the rendezvous object is not restricted to be a
one-shot object.
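For instance, the algorithm can be rephrased in Java as the following sense-reversing barrier (a sketch with our names; AtomicInteger.incrementAndGet() plays the role of the fetch&add() primitive, and the volatile field flag stands for the register FLAG):

import java.util.concurrent.atomic.AtomicInteger;

final class RegisterRendezvous {
    private final int m;                                        // number of participants
    private final AtomicInteger counter = new AtomicInteger(0); // COUNTER
    private volatile int flag = 0;                              // FLAG (binary)

    RegisterRendezvous(int m) { this.m = m; }

    void rendezvous() {
        int myFlag = flag;                            // line 1: flag <- FLAG
        if (counter.incrementAndGet() == m) {         // line 2: fetch&add
            counter.set(0);                           // line 3: reset for the next use
            flag = 1 - myFlag;                        //         and switch FLAG
        } else {
            while (flag == myFlag) Thread.onSpinWait(); // line 4: wait (FLAG != flag)
        }
    }
}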
A monitor-based rendezvous The internal representation of the rendezvous object
(for m participating processes) consists of a register denoted COUNTER (initialized
to 0) and a condition denoted QUEUE.
The algorithm implementing the operation rendezvous() is described in Fig. 3.18.
Let us remember that, when a process is active inside the monitor, it is the only
active process inside the monitor. Hence, the algorithm implementing the opera-
tion rendezvous() can be designed as a sequential algorithm which momentarily
stops when it executes QUEUE.wait() and restarts when it is reactivated by another
process that has executed QUEUE.signal(). As we are about to see, as an invocation
of QUEUE.signal() reactivates at most one process, the only tricky part is the man-
agement of the invocations of QUEUE.signal() so that all blocked processes are
eventually reactivated.
When one of the m participating processes invokes rendezvous(), it first increases
COUNTER (line 1) and then checks the value of COUNTER (line 2). If COUNTER <
m, it is blocked and waits in the condition QUEUE (line 2). It follows that the (m −1)
first processes which invoke rendezvous() are blocked and wait in that condition. The
mth process which invokes rendezvous() increases COUNTER to m and consequently
resets COUNTER to 0 (line 3) and reactivates the first process that is blocked in the
condition QUEUE. When reactivated, this process executes QUEUE.signal() (line
5), which reactivates another process, etc., until all m processes are reactivated and
terminate their invocations of rendezvous().
Let us notice that, after their first rendezvous has terminated, the m processes can
use again the very same object for a second rendezvous (if needed). As an example,
this object can be used to re-synchronize the processes at the beginning of each
iteration of a parallel loop.

monitor RENDEZVOUS is
  COUNTER: integer init 0;
  QUEUE: condition;

operation rendezvous() is
(1) COUNTER ← COUNTER + 1;
(2) if (COUNTER < m) then QUEUE.wait()
(3) else COUNTER ← 0
(4) end if;
(5) QUEUE.signal();
(6) return()
end operation
end monitor

Fig. 3.18 A monitor-based rendezvous object
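In Java, whose built-in monitors obey the signal-and-continue discipline discussed below (Sect. 3.3.3), a single notifyAll() can replace the chain of signal() invocations, and a generation counter keeps the object reusable. This is a sketch of ours, not the book's algorithm:

final class MonitorRendezvous {
    private final int m;                        // number of participants
    private int counter = 0, generation = 0;

    MonitorRendezvous(int m) { this.m = m; }

    synchronized void rendezvous() throws InterruptedException {
        int gen = generation;
        counter++;
        if (counter < m) {
            while (gen == generation) wait();   // signal-and-continue: re-check
        } else {
            counter = 0; generation++;          // last arrival opens the barrier
            notifyAll();
        }
    }
}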



3.3.3 Monitors and Predicates

A simple producer-consumer monitor A monitor encapsulating a buffer of size


k accessed by a single producer and a single consumer is described in Fig. 3.19. The
internal representation consists of an array BUF[0..(k − 1)] and the two variables in
and out (as used previously) plus a shared register NB_FULL counting the number
of values produced and not yet consumed and two conditions (queues): C_PROD,
which is the waiting room used by the producer when the buffer is full, and C_CONS,
which is the waiting room used by the consumer when the buffer is empty.
When the producer invokes produce(v) it goes to wait in the condition (queue)
C_PROD if the buffer is full (line 1). When it is reactivated, or immediately if the
buffer is not full, it writes its value in the next buffer entry (as defined by in, line
2) and increases NB_FULL (line 3). Finally, in case the consumer is waiting for a
production, it invokes C_CONS.signal() (line 4) to reactivate it (let us remember
that this invocation has no effect if the consumer is not blocked in C_CONS). The
algorithm implementing consume() is similar.
monitor BUFFER is
  BUF[0..(k − 1)]: array of items; in, out: integers init 0;
  NB_FULL: integer init 0; C_PROD, C_CONS: conditions;

operation produce(v) is
(1) if (NB_FULL = k) then C_PROD.wait() end if;
(2) BUF[in] ← v; in ← (in + 1) mod k;
(3) NB_FULL ← NB_FULL + 1;
(4) C_CONS.signal();
(5) return()
end operation

operation consume() is
(6) if (NB_FULL = 0) then C_CONS.wait() end if;
(7) r ← BUF[out]; out ← (out + 1) mod k;
(8) NB_FULL ← NB_FULL − 1;
(9) C_PROD.signal();
(10) return(r)
end operation
end monitor

Fig. 3.19 A simple single producer/single consumer monitor

A more general and efficient monitor Inspired by the implementation given in Fig. 3.7, a more efficient monitor could be designed that would allow several producers and several consumers to concurrently access distinct entries of the array BUF[0..(k − 1)]. Such a monitor would provide operations begin_produce() and end_produce() that would allow a producer to bracket the write of its value, and operations begin_consume() and end_consume() that would allow a consumer to bracket the read of a value. Due to the monitor semantics, these four operations would be executed in mutual exclusion.
Why signal() entails the blocking of the invoking process In the following we
consider only the case where the producer is blocked and the consumer reactivates
it. The same applies to the case where the consumer is blocked and the producer
reactivates it.
The producer has to be blocked when the buffer is full (line 1 in Fig. 3.19). Said
differently, to keep the buffer correct, the relation I ≡ (0 ≤ NB_FULL ≤ k) has
to be kept invariant (let us remember that synchronization is to preserve invariants).
This has two consequences:
• First, as it increases NB_FULL, the progress of the producer is constrained by the condition P ≡ (NB_FULL < k): the producer has to block (by executing C_PROD.wait()) when P is false, i.e., when its progress would violate I.
• Second, when P becomes true (and in the producer-consumer problem this occurs each time the consumer has consumed an item), the fact that control of the monitor is immediately given to the reactivated process guarantees that P is satisfied when this process continues its execution.
This is sometimes called a transfer of a predicate (Fig. 3.20). More generally, if a
process p that has invoked C.signal() were to continue executing inside the monitor,
it would be possible for the predicate P to have been falsified when the reactivated
process accesses the monitor. Forcing C.signal() to pass the control of the monitor
to the reactivated process (if any) is a simple way to preserve the invariant associated
with a monitor.
Fig. 3.20 Predicate transfer inside a monitor (when the consumer executes the signal operation, the predicate is transferred from the consumer to the producer: the producer, passive until then, becomes active again inside the monitor, while the consumer becomes passive)

Signal-and-wait versus signal-and-continue Let C[P] be the condition (queue) associated with a predicate P. This means that if, when evaluated by a process, the predicate P on the internal representation of the monitor is false, this process is directed to wait in the queue C[P].



The previous semantics is called signal-and-wait. Another monitor variant uses


the semantics called signal-and-continue; namely, the process that invokes
C[P].signal() continues its execution inside the monitor and the process that has
been reactivated (if any) then has priority to enter the monitor and execute the state-
ments that follow its invocation of C[P].wait().
The main difference between a signal-and-wait monitor and a signal-and-continue
monitor is the following. The statement

if (¬P) then C[P].wait() end if

used in the signal-and-wait monitor has to be replaced by the statement

while (¬P) do C[P].wait() end while

in a signal-and-continue monitor. It is easy to see that, in both cases, the reactivated


process is allowed to progress only if the predicate P is satisfied, thereby ensuring
correct predicate transfer.
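Java's Condition objects (and the built-in wait()/notify() methods) follow the signal-and-continue discipline, which is why the while pattern above is mandatory in Java. A small sketch of ours, for the predicate P ≡ (nbFull < k):

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class SignalAndContinueExample {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition(); // C[P] for P = (nbFull < k)
    private final int k = 10;
    private int nbFull = 0;

    void waitUntilNotFull() throws InterruptedException {
        lock.lock();
        try {
            while (nbFull == k)       // while (not P): re-check after each reactivation
                notFull.await();
            nbFull++;                 // P holds here
        } finally { lock.unlock(); }
    }

    void makeRoom() {
        lock.lock();
        try { nbFull--; notFull.signal(); } // the signaler keeps the lock and continues
        finally { lock.unlock(); }
    }
}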

3.3.4 Implementing a Monitor from Semaphores

The base objects (semaphores and integers) used to implement the control part of
a signal-and-wait monitor are described below. This monitor internal structure is
depicted in Fig. 3.21 (let us remark that the control part of the internal structure is
similar to the structure depicted in Fig. 3.8).
Fig. 3.21 Base objects to implement a monitor (blackboard room: the semaphore MUTEX and the integers PRIO_NB and NB[P], one for each predicate P; waiting rooms: the semaphores PRIO_SEM and COND_SEM[P], one for each predicate P)

• A semaphore MUTEX, initialized to 1, is used to ensure mutual exclusion on the monitor (at most one process at a time can access the monitor internal representation).

• A semaphore denoted COND_SEM[P] and an integer NB[P] are associated with


each pair made up of a waiting predicate P and its condition C[P].
The semaphore COND_SEM[P], which is initialized to 0, is used as a waiting room
for the processes that have invoked C[P].wait(). The register NB[P], initialized to
0, counts the number of processes currently waiting in this room.
• A semaphore PRIO_SEM, initialized to 0, is used as a waiting room for the
processes that have invoked C[P].signal(), whatever the condition C[P]. As already
indicated, the processes blocked in PRIO_SEM have priority to re-enter the moni-
tor. The integer PRIO_NB, initialized to 0, counts the number of processes blocked
on the semaphore PRIO_SEM.
A monitor implementation, based on the previous integers and semaphores, is described in Fig. 3.22. It is made up of four parts.

when a process invokes a monitor operation do
(1) MUTEX.down()
end do

when a process terminates a monitor operation do
(2) if (PRIO_NB > 0) then PRIO_SEM.up()
(3) else MUTEX.up()
(4) end if
end do

when a process invokes C[P].wait() do
(5) NB[P] ← NB[P] + 1;
(6) if (PRIO_NB > 0) then PRIO_SEM.up()
(7) else MUTEX.up()
(8) end if;
(9) COND_SEM[P].down();
(10) NB[P] ← NB[P] − 1
end do

when a process invokes C[P].signal() do
(11) if (NB[P] > 0) then PRIO_NB ← PRIO_NB + 1;
(12)      COND_SEM[P].up();
(13)      PRIO_SEM.down();
(14)      PRIO_NB ← PRIO_NB − 1
(15) end if
end do

Fig. 3.22 Semaphore-based implementation of a monitor

When a process p invokes an operation of the monitor, it has to request mutual exclusion on its internal representation and consequently invokes MUTEX.down() (line 1). When it terminates a monitor operation, there are two cases according to whether there are or are not processes that are blocked because they have executed a statement C[P].signal() (whatever P). If there are such processes, we have PRIO_NB > 0 (see line 11). Process p then reactivates the first of them by invoking PRIO_SEM.up() (line 2). Hence, the control inside the monitor passes directly from p to the reactivated process. If there are no such processes, we have PRIO_NB = 0. In that case p releases the mutual exclusion on the monitor by releasing the semaphore MUTEX (line 3).
The code executed by a process p that invokes C[P].wait() is described at lines
5–10. Process p then has to be blocked on the semaphore COND_SEM[P], which
is the waiting room associated with the condition C[P] (line 9). Hence, p increases
NB[P] before being blocked (line 5) and will decrease it when reactivated (line 10).
Moreover, as p is going to wait, it has to release the mutual exclusion on the monitor
internal representation. If there is a process q which is blocked due to an invocation of signal(), q has priority to re-enter the monitor. Consequently, p directly passes control of the monitor to q (line 6). Differently, if no process is blocked on PRIO_SEM, p
releases the mutual exclusion on the monitor entrance (line 7) to allow a process that
invokes a monitor operation to enter it.
The code executed by a process p that invokes C[P].signal() is described at lines
11–15. If no process is blocked in the condition C[P] we have NB[P] = 0 and there
is nothing to do. If NB[P] > 0, the first process blocked in C[P] has to be reactivated
and p has to become a priority process to obtain the control of the monitor again. To
that end, p increases PRIO_NB (line 11) and reactivates the first process blocked in
C[P] (line 12). Then, it exits the monitor and goes to wait in the priority waiting room
PRIO_SEM (line 13). When later reactivated, it will decrease PRIO_NB in order to
indicate that one fewer process is blocked in the priority semaphore PRIO_SEM
(line 14).

3.3.5 Monitors for the Readers-Writers Problem

This section considers several monitors suited to the readers-writers problem. They
differ in the type of priority they give to readers or writers. This family of monitors
gives a good illustration of the programming comfort provided by the monitor con-
struct (the fact that a monitor allows a programmer to use directly the power of a
programming language makes her/his life easier).
These monitors are methodologically designed. They systematically use the fol-
lowing registers (which are all initialized to 0 and remain always non-negative):
• NB_WR and NB_AR denote the number of readers which are currently waiting and
the number of readers which are currently allowed to read the file, respectively.
• NB_WW and NB_AW denote the number of writers which are currently waiting
and the number of writers which are currently allowed to write the file, respectively.
In some monitors that follow, NB_WR and NB_AR could be replaced by a single register NB_R (number of readers) whose value would be NB_WR + NB_AR and,
as its value is 0 or 1, NB_AW could be replaced by a Boolean register. This is not
done to insist on the systematic design dimension of these monitors.

A readers-writers monitor encapsulates the operations begin_read() and


end_read() which bracket any invocation of read_file() and the operations
begin_write() and end_write() which bracket any invocation of write_file() as
described in Fig. 3.10 (Sect. 3.2.4).
Readers-writers invariant The mutual exclusion among the writers means that
(to be correct) a monitor has to be always such that 0 ≤ NB_AW ≤ 1. The mutual
exclusion between the readers and the writers means that it has also to be always
such that (NB_AR = 0) ∨ (NB_AW = 0). Combining these two relations we obtain
the following invariant which characterizes the readers-writers problem:

(NB_AR × NB_AW = 0) ∧ (NB_AW ≤ 1).

Hence, any invocation of read_file() or write_file() can be allowed only if its


execution does not entail the violation of the previous relation.
Strong priority to the readers The monitor described in Fig. 3.23 provides the
readers with strong priority. This means that, when a reader is reading, the readers
which arrive can immediately proceed to read and, when a writer terminates writing,
if readers are waiting they can immediately proceed to read.
When a reader p invokes begin_read(), it first increases NB_WR (line 1). Then, if a writer is writing, p has to wait on the condition C_READERS (line 2). As C_READERS.signal() reactivates a single process, when p is reactivated, it will have to reactivate another waiting reader (if any), which in turn will reactivate another one, etc., until all the readers which are waiting have been reactivated. After it has been reactivated, or immediately if NB_AW = 0, p updates the control registers NB_WR and NB_AR (line 3) before exiting the monitor and going to read the file.
When a reader p invokes end_read(), it first decreases NB_AR (line 5). If there are no more readers interested in reading the file, p reactivates the first writer which is waiting (if any) (line 6) and finally exits the monitor.
When a writer p invokes begin_write(), it waits (line 8) if another writer is writing (NB_AW ≠ 0) or readers are interested in the file (NB_WR + NB_AR ≠ 0). When it is reactivated, or immediately if it does not have to wait, p increases NB_AW to indicate there is a writer accessing the file, and exits the monitor.
Finally, when a writer p invokes end_write(), it first decreases NB_AW (line 11). Then, if readers are waiting (NB_WR > 0), it reactivates the first of them (which in turn will reactivate another one, etc., as seen at line 2). If no reader is waiting and at least one writer is waiting, the terminating writer reactivates another writer (line 12).

monitor RW_READERS is
  NB_WR, NB_AR, NB_AW: integers init 0;
  C_READERS, C_WRITERS: conditions;

operation begin_read() is
(1) NB_WR ← NB_WR + 1;
(2) if (NB_AW ≠ 0) then C_READERS.wait(); C_READERS.signal() end if;
(3) NB_WR ← NB_WR − 1; NB_AR ← NB_AR + 1;
(4) return()
end operation

operation end_read() is
(5) NB_AR ← NB_AR − 1;
(6) if (NB_AR = 0) then C_WRITERS.signal() end if;
(7) return()
end operation

operation begin_write() is
(8) if ((NB_AW ≠ 0) ∨ (NB_WR + NB_AR ≠ 0)) then C_WRITERS.wait() end if;
(9) NB_AW ← NB_AW + 1;
(10) return()
end operation

operation end_write() is
(11) NB_AW ← NB_AW − 1;
(12) if (NB_WR > 0) then C_READERS.signal() else C_WRITERS.signal() end if;
(13) return()
end operation
end monitor

Fig. 3.23 A readers-writers monitor with strong priority to the readers

Theorem 13 The monitor described in Fig. 3.23 solves the readers-writers problem with strong priority to the readers.

Proof Let us first show that the predicate (NB_AR × NB_AW = 0) ∧ (NB_AW ≤ 1) remains always true. Let us first observe that it is initially satisfied.
Due to line 9 and the waiting predicate NB_AW = 0 used at line 8 and line 11, it follows that the only values taken by NB_AW are 0 and 1.

Let us now assume that NB_AW = 1. Then, due to the transfer of predicate (between lines 6 and 8 or between lines 12 and 8), we have NB_WR + NB_AR = 0, from which we conclude NB_AR = 0, and consequently NB_AR × NB_AW = 0.
Let us now assume that NB_AR > 0. This register is increased at line 3. Due to the
waiting predicate NB_AW > 0 used at line 2 and the transfer of predicate between
line 12 (where we also have NB_AW = 0) and line 2, it follows that NB_AW = 0
when line 3 is executed. Consequently, we have (NB_AR > 0) ⇒ (NB_AW = 0),
which completes the proof of the safety property of the readers-writers problem.
Let us now prove the liveness property, namely strong priority to the readers.
Let us first observe that, if a reader is allowed to read, we have NB_AR > 0 and, consequently, NB_AW = 0. It then follows from the waiting predicate used at line 2
that all the readers which invoke begin_read() are allowed to read.
Let us now consider that readers invoke begin_read() while a writer has previously been allowed to write. We then have NB_AW > 0 (line 9 executed by the writer) and NB_WR > 0 (line 1 executed later by the readers). It follows that, when the writer invokes end_write(), it will execute C_READERS.signal() (line 12) and reactivate a reader (which in turn will reactivate another reader, etc.). Consequently, when a

writer is writing and there are waiting readers, those readers proceed to read when
the writer terminates, which concludes the proof of the liveness property. 

Strong priority to the writers The monitor described in Fig. 3.24 provides strong priority to the writers. This means that, as soon as writers want to write, no more readers are allowed to read until all these writers have terminated writing. The text of the monitor is self-explanatory.
When looking at Fig. 3.24, as far as the management of priority is concerned,
it is important to insist on the role played by the register NB_WW . This register
stores the actual number of processes which want to write and are blocked. Hence,
giving strong priority to the writers is based on the testing of that register at line 1
and line 12. Moreover, when the priority is given to the writers, the register NB_WR
(which counts the number of waiting readers) is useless.
Similarly, the same occurs in the monitor described in Fig. 3.23. Strong priority
is given to the readers with the help of the register NB_WR while, in that case, the
register NB_WW becomes useless.
monitor RW_WRITERS is
  NB_AR, NB_WW, NB_AW: integers init 0;
  C_READERS, C_WRITERS: conditions;

operation begin_read() is
(1) if (NB_WW + NB_AW ≠ 0) then C_READERS.wait(); C_READERS.signal() end if;
(2) NB_AR ← NB_AR + 1;
(3) return()
end operation

operation end_read() is
(4) NB_AR ← NB_AR − 1;
(5) if (NB_AR = 0) then C_WRITERS.signal() end if;
(6) return()
end operation

operation begin_write() is
(7) NB_WW ← NB_WW + 1;
(8) if (NB_AR + NB_AW ≠ 0) then C_WRITERS.wait() end if;
(9) NB_WW ← NB_WW − 1; NB_AW ← NB_AW + 1;
(10) return()
end operation

operation end_write() is
(11) NB_AW ← NB_AW − 1;
(12) if (NB_WW > 0) then C_WRITERS.signal() else C_READERS.signal() end if;
(13) return()
end operation
end monitor

Fig. 3.24 A readers-writers monitor with strong priority to the writers

A type of fairness Let us construct a monitor in which, while all invocations of conc_read_file() and conc_write_file() (as defined in Fig. 3.10) terminate, the following two additional liveness properties are satisfied:



• Property P1: When a write terminates, all waiting readers are allowed to read
before the next write.
• Property P2: When there are readers which are reading the file, the newly arriving
readers have to wait if writers are waiting.
These properties are illustrated in Fig. 3.25, where indexes are used to distinguish
different executions of a same operation. During an execution of conc_write_file1 (),
two readers invoke conc_read_file1 () and conc_read_file2 () and a writer invokes
conc_write_file2 (). As there is currently a write on the file, these operations are
blocked inside the monitor (to preserve the monitor invariant). When
conc_write_file1 () terminates, due to property P1, the invocations conc_read_file1 ()
and conc_read_file2 () are executed. Then, while they are reading the file,
conc_read_file3 () is invoked. Due to property P2, this invocation must be blocked
because, despite the fact that the file is currently being read, there is a write waiting.
When conc_read_file1 () and conc_read_file2 () have terminated, conc_write_file2 ()
can be executed. When this write terminates, conc_read_file3 () and conc_read_file4 ()
are executed. Etc.
The corresponding monitor is described in Fig. 3.26. The difference from the
previous readers-writers monitors lies in the way the properties P1 and P2 are ensured.
The property P2 is ensured by the waiting predicate at line 2. If a writer is writing or waiting (predicate NB_WW + NB_AW ≠ 0) when a reader arrives, the reader has to wait, and when this reader is reactivated it will propagate the reactivation to another waiting reader (if any) before starting to read. The property P1 is ensured by the
reactivating predicate (NB_WR > 0) used at line 13: if there are readers that are
waiting when a writer terminates, the first of them is reactivated, which reactivates
the following one (statement C_READERS.signal() at line 2), etc., until all waiting
readers have been reactivated.
The reader can check that the implementation of such fairness properties would
have been much more difficult if one was asked to implement them directly from
semaphores. (“Directly” meaning here: without using the translation described in
Fig. 3.22.)

Fig. 3.25 The fairness properties P1 and P2



monitor RW_FAIR is
  NB_WR, NB_AR, NB_WW, NB_AW: integers init 0;
  C_READERS, C_WRITERS: conditions;

operation begin_read() is
(1) NB_WR ← NB_WR + 1;
(2) if (NB_WW + NB_AW ≠ 0) then C_READERS.wait(); C_READERS.signal() end if;
(3) NB_WR ← NB_WR − 1; NB_AR ← NB_AR + 1;
(4) return()
end operation

operation end_read() is
(5) NB_AR ← NB_AR − 1;
(6) if (NB_AR = 0) then C_WRITERS.signal() end if;
(7) return()
end operation

operation begin_write() is
(8) NB_WW ← NB_WW + 1;
(9) if (NB_AR + NB_AW ≠ 0) then C_WRITERS.wait() end if;
(10) NB_WW ← NB_WW − 1; NB_AW ← NB_AW + 1;
(11) return()
end operation

operation end_write() is
(12) NB_AW ← NB_AW − 1;
(13) if (NB_WR > 0) then C_READERS.signal() else C_WRITERS.signal() end if;
(14) return()
end operation
end monitor

Fig. 3.26 A readers-writers monitor with fairness properties

3.3.6 Scheduled Wait Operation

Parameterized wait operation Variants of the monitor concept provide conditions


C (internal queues) with a parameterized operation C.wait(x). The parameter x is a
positive integer defining a priority. The smaller the value of x, the higher the priority
for the corresponding process to be reactivated. (When there is no parameter x, the
processes are reactivated in the order in which they invoked C.wait().)
An activation-on-deadline monitor As a simple example of use of the parame-
terized wait operation, let us consider a monitor which provides the processes with
an operation denoted wake_up_at(x) which allows the invoking process to be woken
up x units of time after its invocation time.
The corresponding monitor is described in Fig. 3.27. When a process p invokes wake_up_at(x), it computes its wake up date (line 1) and adds it into a bag denoted BAG (line 2). A bag is a set that can contain the same element several times (a bag is also called a multiset). The statement “add d to BAG” adds one copy of d to the bag, while the statement “suppress d from BAG” removes one copy of d from the bag (if any). Then, p invokes QUEUE.wait(wake_up_date) (line 3). When it is reactivated, p removes its wake up date from the bag (line 4).

monitor ALARM_CLOCK is
  CLOCK: integer init 0; BAG: bag of integers init ∅;
  QUEUE: condition;

operation wake_up_at(x) is
(1) wake_up_date ← CLOCK + x;
(2) add wake_up_date to BAG;
(3) QUEUE.wait(wake_up_date);
(4) suppress wake_up_date from BAG;
(5) return()
end operation

operation tic() is
(6) CLOCK ← CLOCK + 1;
(7) now ← CLOCK;
(8) while (now ∈ BAG) do QUEUE.signal() end while;
(9) return()
end operation
end monitor

Fig. 3.27 A monitor based on a scheduled wait operation
The second operation, denoted tic(), is executed by the system at the end of each
time unit (it can also be executed by a specific process whose job is to measure the
physical or logical passage of time). This operation first increases the monitor clock
CLOCK (line 6). Then, it reactivates, one after the other, all the processes whose
wake up time is equal to now (the current time value) (lines 7–8).
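A Java sketch of this monitor (ours) must work around the fact that Condition objects have no parameterized wait: every blocked process simply re-checks its own deadline after each signalAll(), which is the signal-and-continue counterpart of the bag mechanism.

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class AlarmClockMonitor {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition queue = lock.newCondition();
    private long clock = 0;                                  // CLOCK

    void wakeUpAt(long x) throws InterruptedException {
        lock.lock();
        try {
            long wakeUpDate = clock + x;                     // line 1
            while (clock < wakeUpDate) queue.await();        // wait until the date arrives
        } finally { lock.unlock(); }
    }

    void tic() {                                             // executed once per time unit
        lock.lock();
        try {
            clock++;                                         // line 6
            queue.signalAll();                               // wake everyone; each re-checks
        } finally { lock.unlock(); }
    }
}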

3.4 Declarative Synchronization: Path Expressions

The monitor concept allows concurrent objects to be built by providing (a) sequen-
tiality inside a monitor and (b) condition objects to solve internal synchronization
issues. Hence, as we have seen, monitor-based synchronization is fundamentally an
imperative approach to synchronization. This section shows that, similarly to sequen-
tial programming languages, the statement of synchronization can be imperative or
declarative.
As with monitors, several path expression formulations have been proposed. We consider here the one that was introduced in a variant of the Pascal programming language.

3.4.1 Definition

The idea of path expressions is to state constraints on the order in which the operations
on a concurrent object have to be executed. To that end, four base operators are used,
namely concurrency, sequentiality, restriction, and de-restriction. It is then up to the
compiler to generate the appropriate control code so that these constraints are always
satisfied.
Let us consider an object defined by a set of operations. A path expression
associated with this object has the form

path path_expression end path

where path_expression is defined as follows. The identifier of any of the object


operations is a base path expression. Path expressions are then defined recursively
as described in the items that follow. Let pe1 and pe2 be two path expressions.
• Concurrency operator (denoted “,”). The statement “pe1 , pe2 ” defines a path
expression which imposes no restriction on the order in which pe1 and pe2 are
executed.
• Sequentiality operator (denoted “;”). The statement “pe1 ; pe2 ” defines a path
expression which states that pe1 has to be executed before pe2 . There can be
any number of concurrent executions of pe1 and pe2 as long as the number of
executions of pe2 that have started remains less than or equal to the number of executions of pe1 that have terminated.
• Restriction operator (denoted “k: ” where k is a positive integer). The statement
“k : pe1 ” defines a path expression which states that at most k executions of pe1
are allowed to proceed concurrently.
• De-restriction operator (denoted “[ ]”). The statement “[pe1 ]” defines a path expres-
sion which states that any number of executions of pe1 are allowed to proceed
concurrently.
As with arithmetic or algebraic expressions, parentheses can be used to express
precedence when combining these operators to define powerful path expressions.
Sequentiality and concurrency have priority over restriction and de-restriction.
Simple examples The aim of the examples that follow is to show how path expres-
sions can be used to associate specific concurrency constraints with an object. The
object that is considered has two operations denoted op1 and op2 .
• path (1 : op1 ), op2 end path states that, at any time, at most one execution at a
time of op1 is allowed (hence all executions of op1 will be sequential), while there
is no constraint on the executions of op2 .
• path (1 : op1 ), (1 : op2 ) end path states that, at any time, (a) the executions of
op1 have to be sequential, (b) the executions of op2 have to be sequential, and (c)
there is no constraint relating the executions of op1 and op2 . It follows from this

path expression that, at any time, there is at most one execution of op1 and one
execution of op2 which can proceed concurrently.
• path 2 : (op1 ; op2 ) end path states that, at any time, (a) the number of executions
of op2 that have started never surpasses the number of executions of op1 that
have completed (this is due to the “;” internal operator), and (b) the number of
executions of op1 that have started never surpasses by more than two the number
of executions of op2 that have completed (this is due to the “2 : ()” operator).
• path 1 : ([op1 ], op2 ) end path states that, at any time, there is at most either one
execution of op2 or any number of concurrent executions of op1 .
• path 4 : ((3 : op1 ), (2 : op2 )) end path states that, at any time, there are at most
three concurrent executions of op1 and at most two concurrent executions of op2 ,
and at most four concurrent executions when adding the executions of op1 and the
executions of op2 .
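
The operational content of these operators can be made concrete with semaphores,
anticipating the systematic translation given in Sect. 3.4.3. The following minimal
sketch (ours, in Python; the names K and SEQ are ours) shows one possible control
code for the third example, path 2 : (op1 ; op2 ) end path.

import threading

K = threading.Semaphore(2)    # the "2 :" restriction bracketing the whole path
SEQ = threading.Semaphore(0)  # the ";": #started(op2) <= #terminated(op1)

def op1():
    K.acquire()               # prefix(op1): enter the restricted path
    pass                      # body of op1 goes here
    SEQ.release()             # suffix(op1): one more op1 has terminated

def op2():
    SEQ.acquire()             # prefix(op2): wait for a terminated op1
    pass                      # body of op2 goes here
    K.release()               # suffix(op2): leave the restricted path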

3.4.2 Using Path Expressions to Solve Synchronization Problems

Readers-writers Let us consider a reader-writer sequential object defined by the
operations read_file() and write_file() introduced in Sect. 3.2.4.
The following path expression path1 adds control on the operation executions
such that the file can now be accessed by any number of readers and any number of
writers. Hence, with such a control enrichment, the file becomes a concurrent file.
The path expression path1 is defined as follows:
path1 = path 1 : ([read_file], write_file) end path.
It is easy to see that path1 states that, at any time, access to the file is restricted to
a single writer or (due to the de-restriction on read_file) to any number of readers.
Thus, path1 gives weak priority to the readers.
Let us now replace the operation write_file() by a new write operation denoted
WRITE_file() defined as follows: operation WRITE_file(v) is write_file(v);
return() end operation. Thus, the file object offers the operations read_file() and
WRITE_file() to the processes; write_file() is now an internal procedure used to
implement the object operation WRITE_file(). (If the file was used by a single
process, WRITE_file() and write_file() would be synonyms, which is no longer the
case in a concurrency context as we are about to see.)
Considering such a file object, let us define the two following path expressions
path2 and path3 :
path2 = path 1 : ([read_file], [WRITE_file]) end path,
path3 = path 1 : write_file end path.
The path expression path3 states that no two processes can write the file simulta-
neously. Due to its restriction operator, path2 states that, at any time, access to the
file is given either to any number of readers (which have invoked read_file()) or any
number of writers (which have invoked WRITE_file()). Moreover, path2 defines a
kind of alternating priority. If a reader is reading, it gives access to the file to all the
readers that arrive while it is reading and, similarly, if a writer is writing, it reserves
the file for all the writers that are waiting.

Fig. 3.28 A buffer for a single producer and a single consumer
Producer-consumer Let us consider a buffer of size k shared by a single producer
and a single consumer. Using the same base objects (BUF[0..(k − 1)], in and out)
as in Fig. 3.4 (Sect. 3.2.2), the operations of such a buffer object B are defined in
Fig. 3.28.
The following path expression path4 defines the synchronization control associ-
ated with such a buffer:
path4 = path k : (prod; cons) end path.
If there are both several producers and several consumers, it is possible to use the same
object B. (The only difference for B is that now in is shared by the producers and out
is shared by the consumers, but this does not entail a modification of the code of B.)
The only modification is the addition of synchronization constraints specifying that
at most one producer at a time is allowed to produce and at most one consumer at a
time is allowed to consume.
Said differently, the only modification is the replacement of path4 by path5 ,
defined as follows:
path5 = path k : ((1 : prod); (1 : cons)) end path.

3.4.3 A Semaphore-Based Implementation of Path Expressions

This section presents a semaphore-based implementation of path expressions. An
implementation pattern is defined for each operator (concurrency, sequentiality,
restriction, and de-restriction). For each object operation op(), we obtain a control
prefix and a control suffix that are used to bracket any invocation of the operation.
These prefixes and suffixes are the equivalent of the control operations begin_op()
and end_op() which have to be explicitly defined when using an imperative approach.
Generating prefixes and suffixes Let pe denote a path expression. prefix(pe) and
suffix(pe) denote the code prefix and the code suffix currently associated with pe.
These prefixes and suffixes are defined recursively starting with the path expression
pe and proceeding until the prefix and suffix of each operation is determined. Initially,
prefix(pe) and suffix(pe) are empty control sequences.
1. Concurrency rule. Let pe = pe1 , pe2 . The expression prefix(pe) pe1 ,
pe2 suffix(pe) gives rise to the two expressions prefix(pe) pe1 suffix(pe)
and prefix(pe) pe2 suffix(pe), which are then considered separately.
2. Sequentiality rule. Let pe = pe1 ; pe2 . The expression prefix(pe) pe1 ;
pe2 suffix(pe) gives rise to two expressions which are then considered sep-
arately, namely the expression prefix(pe) pe1 S.up() and the expression
S.down() pe2 suffix(pe) where S is a new semaphore initialized to 0. As we
can see, the aim of the semaphore S is to force pe2 to wait until pe1 is executed.
Hence, for the next step, as far as pe1 is concerned, we have prefix(pe1 ) =
prefix(pe) and suffix(pe1 ) = S.up(). Similarly, we have prefix(pe2 ) = S.down() and
suffix(pe2 ) = suffix(pe).
3. Restriction rule. Let pe = k : pe1 . The expression prefix(pe) k : pe1 suffix(pe)
gives rise to the expression S′.down(); prefix(pe) pe1 suffix(pe); S′.up(),
where S′ is a new semaphore initialized to k.
Hence, we have prefix(pe1 ) = S′.down(); prefix(pe) and suffix(pe1 ) = suffix(pe);
S′.up() to proceed recursively (if pe1 is not an operation name).
4. De-restriction rule. Let pe = [pe1 ]. The expression prefix(pe) [pe1 ] suffix(pe)
gives rise to the expression prio_down(CT, S′′, prefix(pe)) pe1
prio_up(CT, S′′, suffix(pe)), where
• CT is a counter initialized to 0,
• S′′ is a new semaphore initialized to 1, and
• prio_down() and prio_up() are defined as indicated in Fig. 3.29.

Fig. 3.29 Operations prio_down() and prio_up()



The aim of these operations is to give priority to the processes that invoke an
operation involved in the path expression pe1 . (The reader can check that their
internal statements are the same as the ones used in Fig. 3.11 to give weak priority
to the readers.)
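
Since the code of Fig. 3.29 is not reproduced in this extraction, the following
minimal sketch (ours, in Python; the counter is passed as a one-element list so that
it can be shared) shows the behavior these two operations must have, assuming the
weak-priority counter pattern of Fig. 3.11: the first execution entering the group
runs the outer prefix, and the last one leaving the group runs the outer suffix.

import threading

def prio_down(ct, mutex, prefix):
    # ct: shared counter CT (a one-element list); mutex: the semaphore S''
    mutex.acquire()
    ct[0] += 1
    if ct[0] == 1:
        prefix()      # the first execution of the group runs the outer prefix
    mutex.release()

def prio_up(ct, mutex, suffix):
    mutex.acquire()
    ct[0] -= 1
    if ct[0] == 0:
        suffix()      # the last execution of the group runs the outer suffix
    mutex.release()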

An example To illustrate the previous rules, let us consider the following path
expression involving three operations denoted op1 , op2 , and op3 :

path 1 : ([op1 ; op2 ], op3 ) end path.

Hence, we have initially pe = 1 : ([op1 ; op2 ], op3 ) and prefix(pe) = suffix(pe) = ε
(where ε represents the empty sequence). Let us now apply the rules as defined
by their precedence order. This is also described in Fig. 3.30 with the help of the
syntactical tree associated with the considered path expression.

Fig. 3.30 Derivation tree for a path expression

• Let us first apply the restriction rule (item 3). We obtain k = 1 with pe1 =
([op1 ; op2 ], op3 ). It follows from that rule that prefix(pe1 ) = S1.down(); ε and
suffix(pe1 ) = ε; S1.up(), where S1 is a semaphore initialized to 1.
• Let us now apply the concurrency rule (item 1). We have pe1 = pe2 , pe3 , where
pe2 = [op1 ; op2 ] and pe3 = op3 . It follows from that rule that:
– prefix(op3 ) = prefix(pe1 ) = S1.down() and suffix(op3 ) = suffix(pe1 ) =
S1.up(). Hence, any invocation of op3 () has to be bracketed by S1.down()
and S1.up().
– Similarly, prefix(pe2 ) = prefix(pe1 ) = S1.down() and suffix(pe2 ) = suffix(pe1 )
= S1.up().
• Let us now consider pe2 = [op1 ; op2 ] = [pe4 ]. Applying the de-restriction rule
(item 4) we obtain prefix(pe4 ) = prio_down(CT, S2, prefix(pe2 )) and suffix(pe4 ) =
prio_up(CT, S2, suffix(pe2 )), where CT is a counter initialized to 0 and S2 is a
semaphore initialized to 1, i.e.,
– prefix(op1 ; op2 ) = prio_down(CT, S2, S1.down()), and
– suffix(op1 ; op2 ) = prio_up(CT, S2, S1.up()).
• Finally, let us apply the sequentiality rule (item 2) to pe4 = op1 ; op2 . A new
semaphore S3 initialized to 0 is added, and we obtain
– prefix(op1 ) = prio_down(CT , S2, S1.down()),
– suffix(op1 ) = S3.up(),
– prefix(op2 ) = S3.down(), and
– suffix(op2 ) = prio_up(CT , S2, S1.up()).
These prefixes and suffixes, which are automatically derived from the path expres-
sion, are summarized in Fig. 3.31, where the code of prio_down() and prio_up() is
explicit.

Fig. 3.31 Control prefixes and suffixes automatically generated
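
As the code of Fig. 3.31 is not reproduced here, the following sketch (ours, in
Python, assuming the prio_down()/prio_up() pattern sketched above) spells out the
brackets generated around invocations of op1 (), op2 (), and op3 ().

import threading

S1 = threading.Semaphore(1)   # semaphore of the restriction "1 :"
S2 = threading.Semaphore(1)   # mutex protecting CT (de-restriction)
S3 = threading.Semaphore(0)   # semaphore of the sequentiality "op1 ; op2"
CT = [0]                      # number of active executions inside [op1 ; op2]

def invoke_op1(op1_body):
    # prefix(op1) = prio_down(CT, S2, S1.down())
    S2.acquire()
    CT[0] += 1
    if CT[0] == 1:
        S1.acquire()          # first member of the group takes the outer lock
    S2.release()
    op1_body()
    S3.release()              # suffix(op1) = S3.up()

def invoke_op2(op2_body):
    S3.acquire()              # prefix(op2) = S3.down()
    op2_body()
    # suffix(op2) = prio_up(CT, S2, S1.up())
    S2.acquire()
    CT[0] -= 1
    if CT[0] == 0:
        S1.release()          # last member of the group releases the outer lock
    S2.release()

def invoke_op3(op3_body):
    S1.acquire()              # prefix(op3) = S1.down()
    op3_body()
    S1.release()              # suffix(op3) = S1.up()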

3.5 Summary

This chapter has presented the semaphore object and two programming language
constructs (monitors and path expressions) that allow the design of lock-based atomic
objects. Such language constructs provide a higher abstraction level than semaphores
or base mutual exclusion when one has to reason about the implementation of concurrent
objects. Hence, they can make the job of programmers who have to implement
concurrent objects easier.

3.6 Bibliographic Notes

• The concept of a semaphore was introduced by E.W. Dijkstra in [89, 90].


Several variants of semaphores have since been proposed. Lots of semaphore-
based solutions to synchronization problems are described in [91].
• The readers-writers problem was defined by P.J. Courtois, F. Heymans, and
D.L. Parnas [80], who also presented semaphore-based solutions.
The use of buffers to reduce delays for readers and writers is from [44].
• The concept of a monitor is due to P. Brinch Hansen [56, 57] and C.A.R. Hoare
[150]. This concept originated from the “secretary” idea of E.W. Dijkstra [89].
• The notion of transfer of predicate is from [48].
• An alternative to monitor conditions (queues) can be found in [174].
• A methodology for proving monitors is presented in [63]. Proof methodologies
for concurrent objects can be found in [27, 49, 69, 220].
• Path expressions were introduced by R.H. Campbell and A.N. Habermann [63]. An
implementation is described in [64]. Extensions are proposed in [62].
• Other formalisms have been proposed to express synchronization on concurrent
objects. Counters [116, 246] and serializers [149] are among the most well known.

3.7 Exercises and Problems

1. Considering the implementation of a semaphore S, prove the invariant that relates
the number of processes which are blocked on S and the value of the implemen-
tation counter associated with S.
2. Implement a rendezvous object from semaphores. This object must be a multi-
shot object which means that the same object can be repeatedly used by the same
set of processes (e.g., to re-synchronize at the beginning of a parallel loop).
3. Design two algorithms which implement the FIFO access rule for the readers-
writers problem: one based on semaphores, the second on a monitor. “FIFO
access rule” means that, in addition to the exclusion rules of the readers-writers
problem, the readers and the writers have to access the file in their arrival order, namely:

• A reader that arrives while another reader is reading can immediately access
the file if no writer is waiting. Otherwise, the reader has to wait until the writers
that arrived before it have accessed the file.
• A writer cannot bypass the writers and the readers which arrived before it.

4. Consider the producers-consumers algorithm described in Fig. 3.32, which is a
variant of the solution given in Fig. 3.4.
The semaphores FREE and BUSY have the same meaning as in Fig. 3.4. FREE
(which counts the number of available entries) is initialized to k, while BUSY
(which counts the number of full entries) is initialized to 0. MP (or MC), which is
initialized to 1, is used to prevent producers (or consumers) from simultaneously
accessing IN (or OUT).
Is this algorithm correct? (To show that it is correct, a proof has to be given. To
show that it is incorrect, a counter-example has to be exhibited.)

Fig. 3.32 A variant of a semaphore-based implementation of a buffer
5. For a single producer and a single consumer, let us consider the two solutions
described in Fig. 3.33. These solutions are based neither on semaphores nor on
monitors.
The integers IN and OUT are SWMR atomic registers initialized to 0, and the
array BUF[0..(k − 1)] (with k ≥ 2) is also made up of SWMR atomic registers.
IN and BUF[0..(k − 1)] are written only by the producer, while OUT is written
only by the consumer.
These two algorithms differ only in the management of the counter indexes IN
and OUT . In solution 1, their domain is bounded, namely [0..k − 1], while it is
not in solution 2.
• Let k ≥ 2. Prove that both solutions are correct.
• Let k = 1. Is solution 1 correct? Is solution 2 correct?
Fig. 3.33 Two buffer implementations

• Considering each solution, is it possible that all entries of BUF[1..(k − 1)]
contain item values produced and not yet consumed?
• What tradeoff exists between (a) the fact that IN and OUT are bounded and
(b) the fact that all entries of BUF[1..(k − 1)] can be simultaneously full?
6. The algorithm in Fig. 3.34 describes a solution to the readers-writers problem.
This solution is based on a semaphore, denoted MUTEX and initialized to 1, and
an array FLAG[1..n] of SWMR atomic registers with one entry per process pi .
The atomic register FLAG[i] is set to true by pi when it wants to read the file
and reset to false after it has read it. MUTEX is a mutex semaphore that allows
a writer to exclude other writers and all readers.

• Prove that the algorithm is correct (a writer executes in mutual exclusion and
readers are allowed to proceed concurrently).
• What type of priority is offered by this algorithm?
• In the worst case, how many processes can be blocked on the wait statement?
• Let us replace the array FLAG[1..n] of SWMR atomic registers by a single
MWMR atomic register READERS initialized to 0, and

– The statement FLAG[i] ← true is replaced by READERS ← READERS + 1,
– The statement FLAG[i] ← false is replaced by READERS ← READERS − 1, and
– The predicate ∀ i : (¬FLAG[i]) is replaced by READERS = 0.
Is this modified solution correct? Explain why.

Fig. 3.34 A readers-writers implementation

7. Design an efficient monitor-based solution to the producers-consumers problem
based on the Boolean arrays FULL[1..n] and EMPTY[1..n] used in Fig. 3.7. As
for the semaphore-based algorithm described in this figure, “efficient” means
here that producers must be allowed to produce concurrently and consumers
must be allowed to consume concurrently.

8. In a lot of cases, the invocation of the C.signal() operation of a monitor appears as
the last invocation inside a monitor operation. Design an efficient implementation
of a monitor that takes this feature into account.

9. Let us associate the semantics “signal all” with the C.signal() operation on each
condition C of a monitor. This semantics means that (if any) all the processes
which are blocked on the condition C are reactivated and have priority to obtain
mutual exclusion and re-access the monitor. The process which has invoked
C.signal() continues its execution inside the monitor. Considering this “signal
all” semantics:
• Design a readers-writers monitor with strong priority to the writers.
• Design an implementation for “signal all” monitors from underlying
semaphores.
10. The last writer. Let us consider the monitor-based solution with strong priority
to the readers (Fig. 3.23). Modify this solution so that only the last writer can be
blocked (it can be blocked only because a reader is reading or a writer is writing).
This means that, when a writer p invokes begin_write(), it unblocks the waiting
writer q if there is one. The write (not yet done) of q is then “overwritten” by
the write of p and the invocation of begin_write() issued by q returns false.
To that end, the operation conc_write_file(v) defined in Fig. 3.10 is redefined
as follows:
operation conc_write_file(v) is
r ← begin_write();
if (r) then write_file(v); end_write() end if;
return(r)
end operation.

Design an algorithm implementing the corresponding begin_write() operation.


This operation returns a Boolean whose value is true if the write of the value v
has to be executed.

11. Implement semaphores (a) from monitors, and (b) from path expressions.

12. Let us consider the railways system described in Fig. 3.35.


• Any number of trains can go concurrently from A to B, or from B to A, but
not at the same time (a single railway has to be shared and can be used in one
direction only at any time).

• The same between D and C.


• There is a unidirectional railway from B to C (upper arrow) but it can be used
by only one train at a time.
• There is a unidirectional railway from C to B (lower arrow) but it can be used
by only one train at a time.
This problem is about resource allocation. It includes issues related to process
(train) interactions (trains in opposite directions cannot use AB, or CD, at the
same time) and issues related to the fact that some resources have a bounded
capacity (mutual exclusion: at most one train at a time can go from B to C and
at most one train at a time can go from C to B).
A train is going either from A to D or from D to A. A train (process) from A to
D has to execute the following operations:
(a) start_from_A() when it starts from A.
(b) leave_B_to_C() when it arrives at B and tries to enter the line BC.


(c) leave_C_to_D() when it arrives at C and tries to enter the line CD.
(d) arrive_in_D() when it arrives at D.
The same four operations are defined for the trains that go from D to A (A, B, C,
and D are replaced by D, C, B, and A).

Fig. 3.35 Railways example
Design first a deadlock-free monitor that provides the processes (trains) with the
previous eight operations. Design then a starvation-free monitor.
Hints.
• The internal representation of the monitor will be made of:
– The integer variables NB_AB, NB_BA, NB_CD, and NB_DC, where NB_xy
represents the number of trains currently going from x to y.
– The binary variables NB_BC and NB_CB, whose values are 0 or 1.
All these control variables are initialized to 0.
– The six following conditions (queues): START_FROM_A, ENTER_BC,
ENTER_CD, START_FROM_D, ENTER_CB, ENTER_BA.
• The code of a process going from A to D is:
start_from_A(); . . . ; leave_B_to_C(); . . . ; leave_C_to_D(); . . . ;
arrive_in_D().
• Before deriving the predicates that allow a train to progress when it executes
a monitor operation, one may first prove that the following relation must be
an invariant of the monitor internal representation:

(0 ≤ NB_AB, NB_BA, NB_CD, NB_DC) ∧ (0 ≤ NB_BC, NB_CB ≤ 1)
∧ (NB_AB × NB_BA = 0) (mutual exclusion on the part AB)
∧ (NB_CD × NB_DC = 0) (mutual exclusion on the part CD)
∧ ((NB_AB + NB_BC ≤ 1) ∨ (NB_DC + NB_CB ≤ 1)) (no deadlock).

13. Eventcounts and sequencers.


An eventcount EC is a counting object that provides processes with two opera-
tions denoted EC.advance() and EC.wait(). EC.advance() increases the counter
by 1, while EC.wait(x) blocks the invoking process until the counter is equal to
or greater than x.
A sequencer SQ is an object that outputs sequence numbers. It provides the
processes with the operation SQ.ticket(), which returns the next sequence
number.
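For illustration only (this is not a solution to the exercise; solutions can be found
in [243]), here is a minimal Python sketch of the two object types just defined; the
class and method names mirror the definitions above.

import threading

class Eventcount:
    # advance() increases the counter by 1; wait(x) blocks the invoking
    # process until the counter is equal to or greater than x
    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()
    def advance(self):
        with self._cond:
            self._count += 1
            self._cond.notify_all()
    def wait(self, x):
        with self._cond:
            while self._count < x:
                self._cond.wait()

class Sequencer:
    # ticket() returns the next sequence number (starting from 1 here)
    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()
    def ticket(self):
        with self._lock:
            self._next += 1
            return self._next
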
• Design a solution to the (one)producer-(one)consumer problem based on two
eventcounts denoted IN and OUT .

• Show that it is impossible to solve the producers-consumers problem using
eventcounts only. (Some mutual exclusion on producers on one side and con-
sumers on the other side is needed.)
• Design a solution to the producers-consumers problem based on two event-
count and two sequencer objects.
• Implement a semaphore from a sequencer and an eventcount.
• Design a solution to the readers-writers problem from sequencers and event-
counts.
• Let an eventcount EC that takes values between 0 and 2^ℓ − 1 be represented
by a bit array B[1..ℓ]. Design algorithms which implement the operations
EC.advance() and EC.wait() assuming that the eventcount EC is represented
by a Gray code (which has the important property that EC.advance() requires
only a single bit to be written).
Solutions in [243].
14. Priority with path expressions.
Let us consider the readers-writers problem where the operations read_file() and
write_file() are the base operations which access the file (see Fig. 3.10).
In order to define an appropriate priority rule, new algorithms implementing
the operations conc_read_file() and conc_write_file() are defined as described
in Fig. 3.36. These implementations use the underlying operations begin_read()
and begin_write(), which are pure control operations.

Fig. 3.36 Another readers-writers implementation
Let us consider that the invocations to these operations are controlled by the
following pair of path expressions:
path1 = path 1 : (begin_read, [begin_write; write_file]) end path,
path2 = path 1 : ([begin_read; read_file], write_file) end path.

Let us observe that path2 defines mutual exclusion between a write invocation
and any other operation invocation while allowing concurrent read operations.
The combination of the path expressions path1 and path2 defines the associated
priority. What type of priority is defined?
Solutions in [63].
Part II
On the Foundations Side:
The Atomicity Concept

This part of the book is made up of a single chapter that introduces the atomicity
concept (also called linearizability). This concept (which was sketched in the first
part of the book) is certainly (with non-determinism) one of the most important
concepts related to the concurrency and synchronization of parallel and distributed
programs. It is central to the understanding and the implementation of concurrent
objects. This chapter presents a formal definition of atomicity and its main
properties. Atomicity (which is different from sequential consistency or serializ-
ability) is the most popular consistency condition. This is due to the fact that
atomic objects compose “for free”.
Chapter 4
Atomicity:
Formal Definition and Properties

Atomicity is a consistency condition, i.e., it allows us to answer the following
question: Is this implementation of a concurrent object correct? The atomicity notion
for read/write registers was introduced in Chap. 1, where algorithms that solve the
mutual exclusion problem (i.e., algorithms which implement lock objects) were pre-
sented. Chap. 3 presented semaphore objects and programming language constructs
which allow designers of concurrent objects to benefit from lock objects.
While atomicity was already informally introduced in Chap. 1, this chapter
presents it from a very general and formal perspective. In the literature, the term
“atomicity” is sometimes restricted to registers, while its extension to any concur-
rent object is usually called linearizability. This chapter considers both these words
as synonyms.

Keywords Atomicity · Legal history · History · Linearizability · Locality property ·
Partial operation · Sequential consistency · Sequential history · Serializability · Total
operation

4.1 Introduction

Fundamental issues Fundamental questions for a concurrent object designer are
the following:
• How can the behavior of a concurrent object be specified?
• What is a correct execution of a set of processes accessing one or several concurrent
objects?
• When considering object implementations that are not based on locks, how can
correctness issues be addressed if one or more processes stop their execution (fail)
in the middle of an operation? (This possibility was not considered in the previous
chapters.)
Example To give a flavor of these questions, let us consider an unbounded first-in
first-out (FIFO) queue denoted Q which provides the processes with the following
two operations:
• Q.enq(v), which adds the value v at the tail of the queue, and
• Q.deq(), which returns the value at the head of the queue and suppresses it from
the queue. If the queue is empty, the default value ⊥ is returned.
Figure 4.1 describes a sequential execution of a system made up of a single process
using the queue. The time line, going from left to right, describes the progress of
the process when it enqueues first the value a, then the value b, and finally the value
c. According to the expected semantics of a queue, and as depicted in the figure,
the first invocation of Q.deq() returns the value a, the second returns the value
b, etc.
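
As a minimal illustration (ours, in Python), the sequential specification of this
queue can be written as follows, with the default value ⊥ modeled by None.

class Queue:
    # Sequential specification of the unbounded FIFO queue
    def __init__(self):
        self._items = []
    def enq(self, v):
        self._items.append(v)       # add v at the tail of the queue
    def deq(self):
        if not self._items:         # empty queue: the default value ⊥ (None)
            return None
        return self._items.pop(0)   # return and suppress the head value

For instance, after Q = Queue(); Q.enq('a'); Q.enq('b'), the invocation Q.deq()
returns 'a', matching Fig. 4.1.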
Figure 4.2 depicts an execution of a system made up of two processes sharing
the same queue. Now, process p1 enqueues first a and then b whereas process p2
concurrently enqueues c. As shown in the figure, the execution of Q.enq(c) by p2
overlaps the executions of both Q.enq(a) and Q.enq(b) by p1 . Such an execution
raises many questions, including the following: What values are dequeued by p1 and
p2 ? What values can be returned by a process, say p1 , if the other process, p2 , stops
forever in the middle of an operation? What happens if p1 and p2 share several queues
instead of a single one?

Fig. 4.2 A concurrent execution of a queue object


4.2 Computation Model

Addressing the previous questions and related issues starts from the definition of a
precise computation model. This chapter first presents the base elements of such a
model and the important notion of a concurrent computation history.

4.2.1 Processes and Operations

The computation model consists of a finite set of n processes, denoted p1 , . . . , pn . In
order to collectively solve a given problem, the processes cooperate and synchronize
their activities by accessing concurrent objects. The set of processes and objects is
defined from a multiprocess program.
Operation execution and events Processes synchronize by executing operations
exported by concurrent objects. An execution by a process of an operation on an
object X is denoted X.op(arg)(res), where arg and res denote, respectively, the input
and output parameters of the invocation. The output corresponds to the reply to
the invocation. The notation X.op is sometimes used when the input and output
parameters are not important.
When there is no ambiguity, we sometimes say operations where we should
say operation invocations. The execution of an operation op() on an object X by a
process pi is modeled by two events, namely the event denoted inv[X.op(arg) by pi ]
that occurs when pi invokes the operation (invocation or start event), and the event
denoted resp[X.op(res) by pi ] that occurs when the operation terminates (response,
reply, or end event). We say that these events are (a) generated by the process
pi and (b) associated with the object X. Given an operation X.op(arg)(res), the
event resp[X.op(res) by pi ] is called the reply event matching the invocation event
inv[X.op(arg) by pi ].
Execution (or run) An execution of a multiprocess program induces a sequence
of interactions between processes and concurrent objects. Every such interaction is
represented by an event, i.e., the invocation of or the reply to an operation. A sequence
of events is called a history, and this is precisely how executions are abstracted in
the computation model. (The notion of history is detailed later in this chapter.)
Sequentiality of processes Each process is assumed to be sequential, which means
that it executes one operation of an object at a time; that is, the algorithm of a sequen-
tial process stipulates that, after an operation is invoked on an object and until a
matching reply is received, the process does not invoke any other operation. The fact
that processes are each sequential does not preclude them from concurrently invoking
operations on the same concurrent object. Sometimes, we focus on sequential exe-
cutions (sequential histories), which precisely preclude such concurrency (only one
process at a time invokes an operation on an object in a sequential execution). In this
particular case, there is no overlapping of operation executions by different processes.
Fig. 4.3 Structural view of a system

4.2.2 Objects

An object has a name and a type. A type is defined by (1) the set of possible values
for (the states of) objects of that type, (2) a finite set of operations through which the
objects of that type can be manipulated, and (3) a specification describing, for each
operation, the condition under which that operation can be invoked, and the effect
produced after the operation was executed. Figure 4.3 presents a structural view of a
set of n processes sharing m objects.
Sequential specification The object types we consider are defined by a sequential
specification. (We talk interchangeably about the specification of the object or the
specification of the type.) A sequential specification depicts the behavior of the object
when accessed sequentially, i.e., in a sequential execution. This means that, despite
concurrency, the implementation of any such object has to provide the illusion of
sequential accesses. As already noticed, the aim is to facilitate the task of application
programmers who have to reason only about sequential specifications.
It is common to define a sequential specification by associating two predicates
with each operation. These predicates are called pre-assertion and post-assertion.
Assuming the pre-assertion is satisfied before executing the operation, the post-
assertion describes the new value of the object and the result of the operation returned
to the calling process. We refine the notion of sequential specification in terms of
histories later in this chapter.
Total versus partial operations An object operation is total if it is defined for
every state of the object; otherwise it is partial. This means that, differently from a
pre-assertion associated with a partial operation, the pre-assertion associated with a
total operation is always satisfied.
Deterministic versus non-deterministic operations An object operation is deter-
ministic if, given any state of the object that satisfies the pre-assertion of the oper-
ation, and given any valid input parameters of the operation, the output parameters
and the final state of the object are uniquely defined. An object type that has only
deterministic operations is said to be deterministic. (Objects of such a type are also
said to be deterministic.) Otherwise, the object and its type are non-deterministic.
A few examples As we have seen in Chap. 1, an atomic read/write register is defined
from a sequential specification, and both its read and write operations are total.
Similarly, the unbounded FIFO queue defined in Sect. 4.1 can easily be defined
from a sequential specification, and it has total operations (as the queue is unbounded,
Q.enq(v) can always be executed, and as Q.deq() returns ⊥ when the queue is empty,
this operation also is total).
Let us consider a bounded queue Q such that Q.enq() is blocked when the queue
is full, and Q.deq() is blocked when the queue is empty. Such a queue can easily be
implemented by a monitor (see Chap. 3). It is easy to see that both Q.enq() and Q.deq()
are partial. If the queue Q is unbounded, Q.enq() is total (because an invocation of
Q.enq() cannot be blocked due to the fact that there is room to enqueue a new value)
while Q.deq() is partial (because no value can be returned when the queue is empty).
To illustrate the idea of a non-deterministic object type, consider a bag. This type
exports two operations: insert(e) (where e is an element from some value domain)
and remove(). The first operation simply returns an indication, say ok, stipulating that
the element was inserted into the bag. The second operation removes and returns any
element from the bag. Hence, the current state of the bag does not uniquely determine
which element will be removed, and this is the precise source of the non-determinism.
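A minimal sketch (ours, in Python) of such a bag; the random choice makes the
source of non-determinism explicit.

import random

class Bag:
    # insert(e) returns ok; remove() may return any element of the bag
    # (remove() is left undefined here on an empty bag)
    def __init__(self):
        self._items = []
    def insert(self, e):
        self._items.append(e)
        return 'ok'
    def remove(self):
        i = random.randrange(len(self._items))  # any element may be chosen
        return self._items.pop(i)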
Finally, as we have seen in Chap. 1, a rendezvous object has no sequential speci-
fication.

4.2.3 Histories

The notion of an execution of a set of processes accessing concurrent objects is
formally captured by the concept of a history.
Representing an execution as a history of events When considering events gener-
ated by sequential processes accessing concurrent objects with a sequential specifica-
tion, it is always possible, without loss of generality, to arbitrarily order simultaneous
events. This is because simultaneous (invocation and reply) events are independent
(none of them can be the cause of the other). This observation makes it possible to
consider a total order relation (denoted <H in the following) on the events of an
execution abstracting the real-time order in which the events actually occur.
Hence, the interactions between a set of sequential processes and a set of shared
objects are modeled by a sequence of invocation and reply events, called a history
(sometimes also called a trace), and denoted Ĥ = (H, <H ), where H is the set of
events generated by the processes and <H a total order on these events. The objects
and processes associated with events of Ĥ = (H, <H ) are said to be involved in Ĥ.
Ĥ|pi (Ĥ at pi ) is called a local history; it denotes the sub-sequence of Ĥ made up of
all the events generated by the process pi .
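
One concrete way (ours, in Python) to represent this model: an event records its
kind (invocation or reply), its process, its object, its operation, and the associated
input or output value, and a history is simply the <H -ordered list of such events.

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str      # 'inv' (invocation event) or 'resp' (reply event)
    process: str   # e.g., 'p1'
    obj: str       # e.g., 'Q'
    op: str        # e.g., 'enq'
    value: object  # input parameter (inv) or output parameter (resp)

def local_history(H, pi):
    # H|pi: the sub-sequence of history H made up of pi's events
    return [e for e in H if e.process == pi]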
Complete versus partial histories An operation is said to be complete in a history
if the history includes both the event corresponding to the invocation of the operation
and its reply. Otherwise we say that the operation is pending. The fact that an operation
is pending typically helps model the fact that a process is stopped by the operating
system (paged out or swapped out) or simply crashed, say because the processor
hosting that process incurs a physical fault.
A history without pending operations is said to be complete. A history with pend-
ing operations is said to be partial. Note that, being sequential, a process can have
at most one pending operation in a given history.
Equivalent histories Two histories Ĥ and Ĥ′ are said to be equivalent if they have
the same local histories, i.e., for each pi , Ĥ|pi = Ĥ′|pi . So, equivalent histories are
built from the same set of events (remembering that an event includes the name
of an object, the name of a process, the name of an operation, and input or output
parameters).
Well-formed histories As histories are generated by sequential processes, we
restrict our attention to histories Ĥ such that, for each process pi , Ĥ|pi is sequential.
Such a local history starts with an invocation, followed by a matching reply, followed
by another invocation, etc. The corresponding history Ĥ is said to be well-formed.

Partial order on operations A history Ĥ induces an irreflexive partial order on
its operations as follows. Let op = X.op1() by pi and op′ = Y.op2() by pj be two
operations. Informally, operation op precedes operation op′ if op terminates before
op′ starts, where “terminates” and “starts” refer to the time line abstracted by the
<H total order relation. More formally,

op →H op′ =def resp[op] <H inv[op′].

Two operations op and op′ are said to overlap (or be concurrent) in a history Ĥ if
neither resp[op] <H inv[op′] nor resp[op′] <H inv[op]. Notice that two overlapping
operations are such that ¬(op →H op′) and ¬(op′ →H op).
A sequential history has no overlapping operations; i.e., for any pair of operations
op and op′, we have (op ≠ op′) ⇒ (op →H op′) ∨ (op′ →H op). →H is
consequently a total order if Ĥ is a sequential history.
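
Continuing the sketch above (and assuming a history is represented as a
<H -ordered list of events), the two relations just defined can be computed directly;
an operation is given here as its (invocation event, reply event) pair.

def precedes(H, op_a, op_b):
    # op_a ->H op_b: the reply event of op_a occurs before the
    # invocation event of op_b (<H is the list order of H)
    return H.index(op_a[1]) < H.index(op_b[0])

def overlap(H, op_a, op_b):
    # op_a and op_b are concurrent: neither precedes the other
    return not precedes(H, op_a, op_b) and not precedes(H, op_b, op_a)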
Illustrating histories Figure 4.4 depicts the (well-formed) history Ĥ associated
with the queue object execution described in Fig. 4.2. This history comprises ten
events e1 . . . e10 (e4, e6, e7, and e9 are explicitly detailed). As there is a single object,
its name is omitted. Let us notice that the operation enq(c) by p2 is concurrent with
both enq(a) and enq(b) issued by p1 . Moreover, as the history Ĥ has no pending
operations, it is a complete history.
The sequence e1 . . . e9 is a partial history where the dequeue operation issued by
p1 is pending. The sequence e1 . . . e6 e7 e8 e10 is another partial history in which
the dequeue operation issued by p2 is pending. Finally, the history e1 . . . e8 has two
pending operations.
4.2.4 Sequential History

Definition A history is sequential if its first event is an invocation, and then (1) each
invocation event, except possibly the last, is immediately followed by the matching
reply event, and (2) each reply event, except possibly the last, is immediately followed
by an invocation event. The phrase “except possibly the last” associated with an
invocation event is due to the fact that a history can be partial. A complete sequential
history always ends with a reply event. A history that is not sequential is concurrent.
A sequential history models a sequential multiprocess computation (there are no
overlapping operations in such a computation), while a concurrent history models
a concurrent multiprocess computation (there are at least two overlapping opera-
tions in such a computation). Given that a sequential history Ŝ has no overlapping
operations, the associated partial order →S defined on its operations is actually a
total order. With a sequential history, one can thus reason about executions at the
granularity of the operations invoked by the processes, instead of at the granularity
of the underlying events.
Strictly speaking, the sequential specification of an object is a set of sequential
histories involving solely that object. Basically, the sequential specification repre-
sents all possible sequential ways according to which the object can be accessed such
that the pre-assertion and post-assertion of each of its operations are respected.
Example The history Ĥ = e1 e2 · · · e10 depicted in Fig. 4.4 is a complete concur-
rent history. On the other hand, the complete history

Ĥ1 = e1 e3 e4 e6 e2 e5 e7 e9 e8 e10

is sequential: it has no overlapping operations. We can thus highlight its sequential
nature by separating its operations using square brackets as follows:

Ĥ1 = [e1 e3] [e4 e6] [e2 e5] [e7 e9] [e8 e10].

The histories

Ĥ2 = [e1 e3] [e4 e6] [e2 e5] [e8 e10] [e7 e9],
Ĥ3 = [e1 e3] [e4 e6] [e8 e10] [e2 e5] [e7 e9]

are also sequential. Let us also notice that Ĥ, Ĥ1 , Ĥ2 , and Ĥ3 are equivalent histories
(they have the same local histories). Let Ĥ4 be the history defined as

Ĥ4 = [e1 e3] [e4 e6] [e2 e5] [e8 e10] e7.

Ĥ4 is a partial sequential history (the invocation event e7 has no matching reply). All
these histories have the same local history for process p1 : Ĥ|p1 = Ĥ1 |p1 = Ĥ2 |p1 =
Ĥ3 |p1 = Ĥ4 |p1 = [e1 e3] [e4 e6] [e8 e10], and, as far as p2 is concerned, Ĥ4 |p2 is a
prefix of Ĥ|p2 = Ĥ1 |p2 = Ĥ2 |p2 = Ĥ3 |p2 = [e2 e5] [e7 e9].

Fig. 4.4 Example of a history

Hence, the notion of a history is an abstract way to depict the interactions between
a set of processes and a set of concurrent objects. In short, a history is a total order
on the set of (invocation and reply) events generated by the processes on the objects.
As we are about to see, the notion of a history is central to defining the notion of
atomicity through the very notion of atomic history.

4.3 Atomicity

The role of a correctness condition is to select, among all possible histories of a set
of processes accessing shared objects, those considered to be correct. This section
introduces the correctness condition called atomicity (also called linearizability). The
aim of atomicity is to transform the difficult problem of reasoning about a concurrent
execution into the simpler problem of reasoning about a sequential one.
Intuitively, atomicity states that a history is correct if its invocation and reply
events could have been obtained, in the same order, by a single sequential process. In
an atomic (or linearizable) history, each operation has to appear as if it was executed
alone and instantaneously at some point between its invocation event and its reply
event.

4.3.1 Legal History

As the concurrent objects that are considered are defined by sequential specifications,
a definition of what is a “correct” history has to refer in one way or another to these
specifications. The notion of legal history captures this idea.
Given a sequential history Ŝ, let Ŝ|X (Ŝ at X) denote the sub-sequence of Ŝ made
up of all the events involving object X. We say that a sequential history Ŝ is legal
if, for each object X, the sequence Ŝ|X belongs to the sequential specification of X.
In a sense, a history is legal if it could have been generated by processes accessing
the concurrent objects sequentially.

4.3.2 The Case of Complete Histories

This section first defines atomicity for complete histories Ĥ, i.e., histories without
pending operations: each invocation event of Ĥ has a matching reply event in Ĥ. The
section that follows will extend this definition to partial histories.

Definition A complete history Ĥ is atomic (or linearizable) if there is a “witness”
history Ŝ such that:
1. Ĥ and Ŝ are equivalent,
2. Ŝ is sequential and legal, and
3. →H ⊆ →S .
The definition above states that, for a history Ĥ to be linearizable, there must
exist a permutation of Ĥ (namely the witness history Ŝ) which satisfies the following
requirements:
• First, Ŝ has to be composed of the same set of events as Ĥ and has to respect the
local history of each process [item 1].
• Second, Ŝ has to be sequential (interleave the process histories at the granularity of
complete operations) and legal (respect the sequential specification of each object)
[item 2]. Notice that, as Ŝ is sequential, →S is a total order.
• Finally, Ŝ has also to respect the real-time occurrence order of the operations as
defined by →H [item 3].
Ŝ represents a history that could have been obtained by executing all the opera-
tions, one after the other, while respecting the occurrence order of non-overlapping
operations. Such a sequential history Ŝ is called a linearization of Ĥ.
Proving that an algorithm implements an atomic object To this end, we need
to prove that all histories generated by the algorithm are linearizable, i.e., iden-
tify a linearization of its operations that respects the “real-time” occurrence order
of the operations and that is consistent with the sequential specification of the
object.
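
For small complete histories, this proof obligation can be checked by brute force.
The sketch below (ours, in Python) enumerates the permutations of the operations
of a history over a single FIFO queue and keeps those that respect →H and the
sequential specification of the queue; the encoding of operations by invocation and
response timestamps and the operation names are ours, and ⊥ is modeled by None.

from itertools import permutations

def is_linearizable(ops, effects):
    # ops: name -> (inv_time, resp_time); effects: name -> ('enq'|'deq', value)
    def respects_real_time(seq):
        # placing a before b is forbidden whenever b ->H a,
        # i.e., whenever resp[b] <H inv[a]
        return all(not (ops[b][1] < ops[a][0])
                   for i, a in enumerate(seq) for b in seq[i + 1:])
    def legal_queue(seq):
        q = []
        for name in seq:
            kind, v = effects[name]
            if kind == 'enq':
                q.append(v)
            else:                                   # 'deq'
                if (q.pop(0) if q else None) != v:  # head value (or ⊥ = None)
                    return False
        return True
    return any(respects_real_time(s) and legal_queue(s)
               for s in permutations(ops))

# Encoding of a history in the style of Fig. 4.2 (timestamps are ours):
# ops = {'enq_a': (0, 2), 'enq_b': (3, 5), 'enq_c': (1, 6),
#        'deq_p2': (7, 9), 'deq_p1': (8, 10)}
# effects = {'enq_a': ('enq', 'a'), 'enq_b': ('enq', 'b'),
#            'enq_c': ('enq', 'c'),
#            'deq_p2': ('deq', 'a'), 'deq_p1': ('deq', 'b')}
# is_linearizable(ops, effects)  # -> True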
It is important to notice that the notion of atomicity inherently includes a form
of non-determinism. More precisely, given a history Ĥ, several linearizations of Ĥ
might exist.
Linearization: an example Let us consider the history Ĥ described in Fig. 4.4
where the dequeue operation invoked by p1 returns the value b while the dequeue
operation invoked by p2 returns the value a. This means that we have e9 =
resp[deq(a) by p2 ] and e10 = resp[deq(b) by p1 ].
To show that this history is linearizable, we have to exhibit a witness history
(linearization) satisfying the three requirements of atomicity. The reader can check
that the history

Ĥ1 = [e1 e3] [e4 e6] [e2 e5] [e7 e9] [e8 e10]

defined in Sect. 4.2.4 is such a witness. At the granularity level defined by the oper-
ations, the witness history Ĥ1 can be represented as

[enq(a) by p1 ] [enq(b) by p1 ] [enq(c) by p2 ] [deq(a) by p2 ] [deq(b) by p1 ].

This formulation highlights the intuition that underlies the definition of the atomicity
concept.
Linearization point The very existence of a linearization of an atomic history Ĥ
means that each operation of Ĥ could have been executed at an indivisible instant
between its invocation and reply time events (while providing the same result as Ĥ).
It is thus possible to associate a linearization point with each operation of an atomic
history. This is a point of the time line at which the corresponding operation could
have been “instantaneously” executed according to the witness sequential and legal
history.
To respect the real-time occurrence order, the linearization point associated with
an operation has always to appear within the interval defined by the invocation event
and the reply event associated with that operation.
Example Figure 4.5 depicts the linearization point of each operation. A triangle is
associated with each operation, such that the vertex at the bottom of a triangle (bold
dot) represents the associated linearization point. A triangle shows how atomicity
allows the shrinking of an operation (whose execution takes some duration) into a
single point on the time line.
In that sense, atomicity reduces the difficult problem of reasoning about a con-
current system to the simpler problem of reasoning about a sequential system where
the operations issued by the processes are instantaneously executed.
As a second example, let us consider a variant of the history depicted in Fig. 4.5
where the reply events e9 and e10 are “exchanged”, i.e., we have now that e9 =
resp[deq(b) by p2 ] and e10 = resp[deq(a) by p1 ]. It is easy to see that this history
is linearizable: the sequential history Ĥ2 described in Sect. 4.2.4 is a linearization
of it.
Similarly, the history where e9 = resp[deq(c) by p2 ] and e10 = resp[deq(a) by
p1 ] is also linearizable. It has the following sequential witness history:

[enq(c) by p2 ] [enq(a) by p1 ] [enq(b) by p1 ] [deq(c) by p2 ] [deq(a) by p1 ].


Fig. 4.5 Linearization points

Differently, the history in which the two dequeue operations would return the
same value is not linearizable: it does not have a witness history which respects the
sequential specification of the queue.

4.3.3 The Case of Partial Histories

This section extends the definition of atomicity to partial histories. As already indi-
cated, these are histories with at least one process whose last operation is pending:
the invocation event of this operation appears in the history while the corresponding
reply event does not. The history Ĥ4 described in Sect. 4.2.4 is a partial history.
Extending atomicity to partial histories is important as it allows arbitrary delays
experienced by processes, or even process crashes (when these delays become
infinite), to be dealt with.
Definition A partial history Ĥ is linearizable if Ĥ can be modified in such a way
that every invocation of a pending operation is either removed or completed with a
reply event, and the resulting (complete) history Ĥ′ is linearizable.
Basically, the problem of determining whether a partial history Ĥ is linearizable is
reduced to the problem of determining whether a complete history Ĥ′, extracted from
Ĥ, is linearizable. We obtain Ĥ′ by adding reply events to certain pending operations
of Ĥ, as if these operations had indeed been completed, but also by removing the
invocation events of some of the pending operations of Ĥ. We require, however, that
all complete operations of Ĥ be preserved in Ĥ′. It is important to notice that, given
a history Ĥ, we can extract several histories Ĥ′ that satisfy the required conditions.
Example Let us consider Fig. 4.6, which depicts two processes accessing a register.
Process p1 first writes the value 0. The same process later issues a write for the value 1,
but p1 crashes during this second write (this is indicated by a cross on its time line).
Process p2 executes two consecutive read operations. The first read operation lies
between the two write operations of p1 and returns the value 0. A different value
would clearly violate atomicity. The situation is less obvious with the second value,
and it is not entirely clear what value v has to be returned by the second read operation
in order for the history to be linearizable.

Fig. 4.6 Two ways of completing a history
As explained below, both values 0 and 1 can be returned by that read operation
while preserving atomicity. The second write operation is pending in the partial
history Ĥ modeling this execution. This history Ĥ is made up of seven events (the
names of the object and the processes are omitted as there is no ambiguity), namely

inv[write(0)] resp[write(0)] inv[read(0)] resp[read(0)] inv[read(v)] inv[write(1)] resp[read(v)].

We explain now why both 0 and 1 can be returned by the second read:
• Let us first assume that the returned value v is 0.
We can associate with history Ĥ a legal sequential witness history Ĥ0 which
includes only complete operations and respects the partial order defined by Ĥ
on these operations (see Fig. 4.6). To obtain Ĥ0 , we construct a history Ĥ′ by remov-
ing the event inv[write(1)] from Ĥ: we obtain a complete history, i.e., a history
without pending operations.
History Ĥ with v = 0 is consequently linearizable. The associated witness history
Ĥ0 models the situation where p1 is considered as having crashed before invoking
the second write operation: everything appears as if this write had never been
issued.
• Assume now that the returned value v is 1.
Similarly to the previous case, we can associate with history Ĥ a legal sequential
witness history Ĥ1 that respects the partial order on the operations. We actually
derive Ĥ1 by first constructing Ĥ′, which we obtain by adding to Ĥ the reply event
resp[write(1)]. (In Fig. 4.6, the part added to Ĥ in order to obtain Ĥ′, from which
Ĥ1 is constructed, is indicated by dotted lines.)
The history where v = 1 is consequently linearizable. The associated witness
history Ĥ1 represents the situation where the second write is taken into account
despite the crash of the process that issued that write operation.
4.4 Object Composability and Guaranteed Termination Property

This section presents two fundamental properties of atomicity that make it partic-
ularly attractive. The first property states that atomic objects can be composed for
free, while the second property states that, as object operations are total, no operation
invocation can be prevented from terminating.

4.4.1 Atomic Objects Compose for Free


The notion of a local property Let P be any property defined on a set of objects.
The property P is said to be local if the set of objects as a whole satisfies P whenever
each object taken alone satisfies P.
Locality is an important concept that promotes modularity. Consider some local
property P. To prove that an entire set of objects satisfy P, we only have to ensure that
each object, independently from the others, satisfies P. As a consequence, the prop-
erty P can be implemented for each object independently of the way it is implemented
for the other objects. At one extreme, it is even possible to design an implementation
where each object has its own algorithm implementing P. At another extreme, all
the objects (whatever their type) might use the same algorithm to implement P (each
object using its own instance of the algorithm).
Atomicity is a local property We prove in the following that atomicity is a local
property. Intuitively, the fact that atomicity is local comes from the fact that it involves
the real-time occurrence order on non-concurrent operations whatever the objects and
the processes concerned by these operations. We will rely on this aspect in the proof
of the following theorem.
Theorem 14 A history Ĥ is atomic (linearizable) if and only if, for each object X
involved in Ĥ, Ĥ|X is atomic (linearizable).
Proof The “⇒” direction (only if) is an immediate consequence of the definition
of atomicity: if Ĥ is linearizable then, for each object X involved in Ĥ, Ĥ|X is
linearizable. So, the rest of the proof is restricted to the “⇐” direction. We also
restrict the rest of the proof to the case where Ĥ is complete, i.e., Ĥ has no pending
operation. This is without loss of generality, given that the definition of atomic-
ity for a partial history is derived from the definition of atomicity for a complete
history.
Given an object X, let ŜX be a linearization of Ĥ|X. It follows from the definition
of atomicity that ŜX defines a total order on the operations involving X. Let →X
denote this total order. We construct an order relation → defined on the whole set of
operations of Ĥ as follows:
1. For each object X: →X ⊆ →,
2. →H ⊆ →.
Basically, “→” totally orders all operations on the same object X, according to →X
(item 1), while preserving →H , i.e., the real-time occurrence order on the operations
(item 2). 
Claim. “→ is acyclic.” This claim means that → defines a partial order on the set of
all the operations of Ĥ.
Assuming this claim (see its proof below), it is thus possible to construct a sequen-
tial history Ŝ including all the events of Ĥ and respecting →. We trivially have
→ ⊆ →S , where →S is the total order on the operations defined from Ŝ. We have the
three following conditions: (1) Ĥ and Ŝ are equivalent (they contain the same events
and the same local histories), (2) Ŝ is sequential (by construction) and legal (due
to item 1 above), and (3) →H ⊆ →S (due to item 2 above and → ⊆ →S ). It follows
that Ĥ is linearizable.

Proof of the claim. We show (by contradiction) that → is acyclic. Assume first that
→ induces a cycle involving the operations on a single object X. Indeed, as →X is
a total order, in particular transitive, there must be two operations opi and opj on X
such that opi →X opj and opj →H opi . But opi →X opj ⇒ inv[opi ] <H resp[opj ]
because X is linearizable. As <H is a total order on the whole set of events, the fact
that opj →H opi ⇒ resp[opj ] <H inv[opi ] establishes the contradiction.
It follows that any cycle must involve at least two objects. To obtain a contradiction
we show that, in that case, a cycle in → implies a cycle in →H (which is acyclic).
Let us examine the way the cycle could be obtained. If two consecutive edges of the
cycle are due to just some →X or just →H , then the cycle can be shortened, as any
of these relations is transitive. Moreover, opi →X opj →Y opk is not possible for
X ≠ Y, as each operation is on only one object (opi →X opj →Y opk would imply
that opj is on both X and Y). So let us consider any sequence of edges of the cycle
such that op1 →H op2 →X op3 →H op4 . We have:
– op1 →H op2 ⇒ resp[op1 ] <H inv[op2 ] (definition of →H ),
– op2 →X op3 ⇒ inv[op2 ] <H resp[op3 ] (as X is linearizable),
– op3 →H op4 ⇒ resp[op3 ] <H inv[op4 ] (definition of →H ).
Combining these statements, we obtain resp[op1 ] <H inv[op4 ], from which we can
conclude that op1 →H op4 . It follows that any cycle in → can be reduced to a cycle
in →H , which is a contradiction as →H is an irreflexive partial order. End of the
proof of the claim. 
The benefit of locality Considering an execution of a set of processes that access
concurrently a set of objects, atomicity allows the programmer to reason as if
all the operations issued by the processes on the objects were executed one after
the other. The previous theorem is fundamental. It states that, to reason about
sequential processes that access concurrent atomic objects, one can reason on
each object independently, without losing the atomicity property of the whole
computation.
Fig. 4.7 Atomicity allows objects to compose for free
An example Locality means that atomic objects compose for free. As an example,
let us consider two atomic queue objects Q1 and Q2, each with its own implementation,
I1 and I2, respectively (hence, the implementations can use different algorithms).
Let us define the object Q as the composition of Q1 and Q2 (Fig. 4.7): Q provides
processes with the four operations Q.enq1(), Q.deq1(), Q.enq2(), and Q.deq2(),
whose effect is the same as Q1.enq(), Q1.deq(), Q2.enq(), and Q2.deq(), respectively.
Thanks to locality, an implementation of Q consists simply in piecing together I1
and I2 without any modification to their code, as illustrated by the sketch below.
As we will see in Sect. 4.5, this object composition property is no longer true for
other consistency conditions.
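To make this concrete, here is a minimal sketch in Java (class and method names are ours; both underlying queues happen to be java.util.concurrent.ConcurrentLinkedQueue instances for the sake of a runnable example, but any two linearizable queue implementations could be substituted):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: q1 and q2 stand for the two independent implementations I1 and I2.
// Composing them requires no change whatsoever to their code.
public class ComposedQueue<V> {
    private final Queue<V> q1 = new ConcurrentLinkedQueue<>(); // implementation I1
    private final Queue<V> q2 = new ConcurrentLinkedQueue<>(); // implementation I2

    public void enq1(V v) { q1.offer(v); }      // Q.enq1() has the effect of Q1.enq()
    public V    deq1()    { return q1.poll(); } // Q.deq1() has the effect of Q1.deq()
    public void enq2(V v) { q2.offer(v); }      // Q.enq2() has the effect of Q2.enq()
    public V    deq2()    { return q2.poll(); } // Q.deq2() has the effect of Q2.deq()
}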

4.4.2 Guaranteed Termination

Due to the fact that operations are total, atomicity (linearizability) per se does not require
a pending invocation of an operation to wait for another operation to complete. This
means that, if a given implementation I of an atomic object entails blocking of a total
operation, this is not due to the atomicity concept but only to I. Blocking is an artifact
of particular implementations of atomicity but not an inherent feature of atomicity.
This property of the atomicity consistency condition is captured by the following
theorem, which states that any (atomic) history with a pending operation invocation
can be extended with a reply to that operation.
Theorem 15 Let inv[op(arg)] be the invocation event of a total operation that is
pending in a linearizable history Ĥ. There exists a matching reply event resp[op(res)]
such that the history Ĥ′ = Ĥ · resp[op(res)] is linearizable.

Proof Let Ŝ be a linearization of the partial history Ĥ. By definition of a linearization,
Ŝ has a matching reply to every invocation. Assume first that Ŝ includes a reply event
resp[op(res)] matching the invocation event inv[op(arg)]. In this case, the theorem
trivially follows, as then Ŝ is also a linearization of Ĥ′.
If Ŝ does not include a matching reply event, then Ŝ does not include inv[op(arg)].
Because the operation op() is total, there is a reply event resp[op(res)] matching
the invocation event inv[op(arg)] in every state of the shared object. Let Ŝ′ be the
sequential history Ŝ with the invocation event inv[op(arg)] and a matching reply
event resp[op(res)] added in that order at the end of Ŝ. Ŝ′ is trivially legal. It follows
that Ŝ′ is a linearization of Ĥ′. □

4.5 Alternatives to Atomicity

This section discusses two alternatives to atomicity, namely sequential consistency
and serializability.

4.5.1 Sequential Consistency

Overview Both atomicity and sequential consistency guarantee that operations
appear to execute instantaneously at some point on the time line. The difference
is that atomicity requires that, for each operation, this instant lies between the
occurrence times of the invocation and reply events associated with the operation,
which is not the case for sequential consistency.
More precisely, the definition of atomicity requires that the witness sequential
history that is equivalent to a history Ĥ respects the partial order relation on the
operations of Ĥ (also called the real-time order). This is irrespective of the process
and the object involved in the operations. Sequential consistency is a weaker property
in the sense that it requires only that the witness history preserve the order on the
operations invoked by the same process.
Let <i denote the total order on the events generated by process pi (this order
corresponds to the projection of Ĥ on pi, denoted Ĥ|pi in Sect. 4.2.3). Moreover,
let <proc = ∪1≤i≤n <i. This is the union of the n total orders associated with the
processes, which is called the process-order relation.
To illustrate this relation, consider Fig. 4.4, where <proc is the union of the local
history of p1 , <1 , and the local history of p2 , <2 , namely

<1 = [e1 e3] [e4 e6] [e8 e10] and <2 = [e2 e5] [e7 e9].

It is easy to see that Ĥ can be partitioned into two partial order relations, namely
<proc and <object (<object is Ĥ from which the edges due to <proc have been
suppressed).
As pointed out above, in contrast to atomicity that establishes the correctness
of a history using the constraints imposed by both <proc and <object , sequential
consistency establishes correctness based only on the process-order relation (i.e.,
<proc ).
Fig. 4.8 A sequentially consistent history

Definition The definition of the sequential consistency correctness condition reuses
the notions of history, sequential history, and complete history, as in Sect. 4.2. To
simplify the presentation and without loss of generality, we only consider complete
histories (with no pending operations). Considering <proc to be the process-order
relation on the events, we also define →proc as the partial order on the operations
induced from <proc.
A history Ĥ is sequentially consistent if there is a “witness” history Ŝ such that:
1. Ĥ and Ŝ are equivalent,
2. Ŝ is sequential and legal, and
3. →proc ⊆ →Ŝ (Ŝ has to respect process-order).
To illustrate sequential consistency, consider Fig. 4.8. There are two processes p1
and p2 that share a queue Q. At the operation level, the local history of p1 comprises
a single operation, Q.enq(a), while the local history of p2 comprises two operations,
first Q.enq(b) and then Q.deq(), which returns b. The reader can easily verify that
this history is not atomic. This is because, for atomicity, all the operations must
totally ordered according to real time. Consequently the Q.deq() operation issued by
p2 should return the value a whose enqueuing was terminated before the enqueuing
of a had started.
However, the history is sequentially consistent: the sequential history (described
at the operation level)

Ŝ = [Q.enq(b) by p2] [Q.enq(a) by p1] [Q.deq(b) by p2]

is legal and respects the process-order relation.


Atomicity versus sequential consistency It is easy to see from the previous
definition that any linearizable history is also sequentially consistent: this is because
→proc ⊆ →Ĥ. As shown by the example of Fig. 4.8, however, the contrary is not true.
It is then natural to ask whether sequential consistency would not be sufficient to
judge correctness.
A drawback of sequential consistency is that it is not a local property. (This is
the price that sequential consistency has to pay to allow for more correct executions
than atomicity.) To illustrate this, consider the counter-example described in Fig. 4.9.
History Ĥ involves two processes accessing two concurrent queues Q and Q′. It is
Fig. 4.9 Sequential consistency is not a local property

easy to see that, when we consider each object in isolation, we obtain the histories
Ĥ|Q and Ĥ|Q′ that are sequentially consistent. Unfortunately, there is no way to
witness a legal total order Ŝ that involves the six operations: if p1 dequeues b′ from
Q′, Q′.enq(a′) has to be ordered after Q′.enq(b′) in a witness sequential history.
But this means that (to respect process-order) Q.enq(a) by p1 is necessarily ordered
before Q.enq(b) by p2. Consequently, Q.deq() by p2 should return a for Ŝ to be
legal. A similar reasoning can be done starting from the operation Q.deq(b) by p2.
It follows that there can be no legal witness total order. Hence, despite the fact that
Ĥ|Q and Ĥ|Q′ are sequentially consistent, the whole history Ĥ is not.

4.5.2 Serializability

Overview It is sometimes important to ensure that groups of operations appear
to execute as if they have been executed without interference with any other group
of operations. The concept of transaction is then the appropriate abstraction that
allows the grouping of operations. This abstraction is mainly encountered in database
systems.
A transaction is a sequence of operations that might complete successfully (com-
mit) or abort. In short, the execution of a set of concurrent transactions is correct
if committed transactions appear to execute at some indivisible point in time and
aborted transactions do not appear to have been executed at all. This correctness
criterion is called serializability (sometimes it is also called atomicity). The motivation
(again) is to reduce the difficult problem of reasoning about concurrent transactions
into the easier problem of reasoning about transactions that are executed one after
the other. For instance, if some invariant predicate on the set of shared objects is
preserved by every individual committed transaction, then it will be preserved by a
serializable execution of transactions.
Definition To define serializability, the notion of history needs to be revisited.
Events are now associated with objects and transactions. In short, processes are
replaced by transactions. For each transaction, in addition to the invocation and
reply events, two new events come into the picture: commit and abort events. These
are associated with transactions. At most one such event is associated with every
transaction in a history. A transaction without such an event is called pending; other-
wise the transaction is said to be complete (committed or aborted). Adding a commit
(or abort) event after all other events of a pending transaction is called commit-
ting (or aborting) the transaction. A sequential history is a sequence of committed
transactions.
Let →trans denote the order on operations induced by the (totally ordered) sequence
of events of each committed transaction. This is analogous to the process-order
relation defined above. We say that a history is complete if all its transactions are
complete.
Let Ĥ be a complete history. Ĥ is serializable if there is a “witness” history Ŝ
such that:
1. Ŝ is made up of all events of committed transactions of Ĥ,
2. Ŝ is sequential and legal, and
3. →trans ⊆ →Ŝ (Ŝ has to respect transaction order).
Let Ĥ be a history that is not complete. Ĥ is serializable if we can derive from
Ĥ a complete history Ĥ′ (by completing or removing pending transactions from Ĥ)
such that: (1) Ĥ′ is complete, (2) Ĥ′ includes the complete transactions of Ĥ, and
(3) Ĥ′ is serializable.
Atomicity versus serializability As for atomicity, serializability is defined accord-
ing to the equivalence to a witness sequential history, but differently from atomicity,
no real-time ordering is required. In this sense, serializability can be viewed as an
extension of sequential consistency to transactions where a transaction is made up of
several invocations of object operations. Unlike atomicity, serializability is not a local
property (replacing processes with transactions in Fig. 4.9 gives a counter-example).

4.6 Summary

This chapter has introduced the basic elements that are needed to reason about exe-
cutions of a multiprocess program whose processes cooperate through concurrent
objects (defined by a sequential specification on total operations). More specifically,
this chapter has presented the basic notions from which the atomicity concept has
then been defined.
The fundamental modeling element is that of a history: a sequence of events depicting
the interaction between processes and objects. An event represents the invocation of
an operation on an object or the return of a reply. A history is atomic if, despite concurrency, it
appears as if processes access the objects by invoking operations one after the other.
In this sense, the correctness of a concurrent computation is judged with respect to a
sequential behavior, itself determined by the sequential specification of the objects.
Hence, atomicity is what allows us to reason sequentially despite concurrency.

4.7 Bibliographic Notes

• The notion of atomic read/write objects (registers), as studied here, was investi-
gated and formalized by L. Lamport [189] and J. Misra [206].
• The generalization of the atomicity consistency condition to objects of any sequen-
tial type was developed by M. Herlihy and J. Wing under the name linearizability
[148].
• The notion of sequential consistency was introduced by L. Lamport [187].
The relation between atomicity and sequential consistency was investigated in
[40] and [232], where it was shown that, from a protocol design point of view,
sequential consistency can be seen as lazy linearizability. Examples of protocols
implementing sequential consistency can be found in [3, 40, 233].
• The concept of transactions is part of almost every textbook on database systems.
Books entirely devoted to transactions include [50, 97, 119]. The theory of serial-
izability is the main topic of [97, 222].
Part III
Mutex-Free Synchronization

While Part I was devoted to lock-based synchronization, this part of the book is on
the design of concurrent objects whose implementation does not rely on mutual
exclusion. It is made up of five chapters:
• The first chapter introduces the notion of a mutex-free implementation (i.e.,
implementations which are not allowed to rely on locks) and the associated
liveness properties, namely obstruction-freedom, non-blocking, and wait-
freedom.
• The second chapter introduces the notion of a hybrid implementation, namely
an implementation which is partly lock-based and partly mutex-free.
• The next three chapters are on the power of atomic read/write registers when one
has to design wait-free object implementations. These chapters show that non-
trivial objects can be built in such a particularly poor context. To that end, they
present wait-free implementations of the following concurrent objects: weak
counters, store-collect objects, snapshot objects, and renaming objects.
Remark on terminology As we are about to see, the term mutex-freedom is used
to indicate that the use of critical sections (locks) is prohibited. The term lock-
freedom could have been used instead of mutex-freedom. This has not been done
for the following reason: the term lock-freedom is already used in a lot of papers
on synchronization with different meanings. In order not to overload it and to
prevent confusion, the term mutex-freedom is used in this book.
Chapter 5
Mutex-Free Concurrent Objects

This chapter is devoted to mutex-free implementations of concurrent objects.
Mutex-freedom means that in no way (be it explicit or implicit) is the implementa-
tion of a concurrent object allowed to rely on critical sections (locks). The chapter
consequently introduces new progress conditions suited to mutex-free object imple-
mentations, namely obstruction-freedom, non-blocking, and wait-freedom. It then
presents mutex-free implementations of concurrent objects (splitter, queue, stack,
etc.) that satisfy these progress conditions. Some of these implementations are based
on read/write atomic registers only, while others also use more sophisticated regis-
ters that can be accessed by hardware-provided primitive operations such as com-
pare&swap, swap, or fetch&add (which are stronger than base read/write operations).
To conclude, this chapter presents an approach based on failure detectors that allows
the construction of contention managers that permit a non-blocking or a wait-free
implementation of a concurrent object to be obtained from an obstruction-free imple-
mentation of that object.

Keywords Contention manager · Implementation boosting · Mutex-freedom ·
Obstruction-freedom · Non-blocking · Process crash · Progress condition ·
Wait-freedom

5.1 Mutex-Freedom and Progress Conditions

5.1.1 The Mutex-Freedom Notion

Locks are not always the panacea As we have seen in Chaps. 1 and 3, the systematic
use of locks constitutes a relatively simple method to implement atomic concurrent
objects defined by total operations. A lock is associated with every object O and all the
operation invocations on O are bracketed by acquire_lock() and release_lock() so that
at most one operation invocation on O at a time is executed. However, as we are about to
see in this chapter, locks are not the only approach to implement atomic objects. Locks


have drawbacks related to process blocking and the granularity of the underlying base
objects used in the internal representation of the object under construction.
As far as the granularity of the object protected by a lock is concerned, let us
consider a lock-based implementation of a bounded queue object Q with total oper-
ations (Q.deq() returns ⊥ when the queue is empty and Q.enq() returns ⊤ when
the queue is full). The use of a single lock on the whole internal representation of
the queue prevents Q.enq() and Q.deq() from being executed concurrently. This can
decrease the queue efficiency, as nothing prevents these two operations from exe-
cuting concurrently when the queue is neither empty nor full. A solution consists
in using locks at a finer granularity level in order to benefit from concurrency and
increase efficiency. Unfortunately this makes deadlock prevention more difficult and,
due to their very nature, locks cannot eliminate the blocking problem.
The drawback related to process blocking is more severe. Let us consider a process
p that for some reason (e.g., page fault) stops executing during a long period in the
middle of an operation on an object O. If we use locks, as we have explained above,
the processes which have concurrently invoked an operation on O become blocked
until p terminates its own operation. When such a scenario occurs, processes suffer
delays due to other processes. Such an implementation is said to be blocking-prone.
The situation is even worse if the process p crashes while it is in the middle of an
operation execution. (In an asynchronous system a crash corresponds to the case
where the speed of the corresponding process becomes and remains forever equal to
0, this being never known by the other processes. This point is developed below at
the end of Sect. 5.1.2.) When this occurs, p never releases the lock, and consequently,
all the processes that will invoke an operation on O will become blocked forever.
Hence, the crash of a process creates an infinite delay that can entail a deadlock on
all operations accessing the object O.
These observations have motivated the design of concurrent object implementa-
tions that do not use locks in one way or another (i.e., explicitly or implicitly). These
implementations are called mutex-free.
Operation level versus implementation level Let us consider an object O with
two operations O.op1() and O.op2(). At the user level, the (correct) behaviors of O
are defined by the traces of its sequential specification.
When considering the implementation level, the situation is different. Each exe-
cution of O.op1() or O.op2() corresponds to a sequence of invocations of base
operations on the base objects that constitute the internal representation of O.
If the implementation of O is lock-based and we do not consider the execution of
the base operations that implement acquire_lock() and release_lock(), the sequence
of base operations produced by an invocation of O.op1() or O.op2() cannot be
interleaved with the sequence of base operations produced by another operation
invocation. When the implementation is mutex-free, this is no longer the case, as
depicted in Fig. 5.1.
Figure 5.1 shows that the invocations of O.op1() by p1 , O.op2() by p2 , and O.op1()
by p3 are linearized in that order (i.e., they appear to have been executed in that order
from an external observer point of view).

Fig. 5.1 Interleaving at the implementation level

Let us assume that the internal representation of O is made up of three base
objects: R1, R2, and R3, which are atomic. It follows from the locality property
of the atomicity consistency condition that their invocations are totally ordered (see
Chap. 4). In the figure, the ones issued by p1 are marked by a triangle, the ones issued
by p2 are marked by a square, and the ones issued by p3 are marked by a circle. The
name of the base object accessed by a process appears below the corresponding
square, triangle, or circle.
Mutex-free implementation An implementation of an object O is mutex-free if no
code inside an operation on O is protected by a critical section. The only atomicity
notion that is used by such an implementation is the one on the base operations on
the objects which constitute the internal representation of O.
It follows that, inherently, a mutex-free implementation of an object O allows
base operations generated by the invocations of operations on O to be interleaved
(as depicted in Fig. 5.1). (On the contrary, it is easy to see that the aim of locks is to
prevent such interleaving scenarios from occurring.)
In order for a mutex-free implementation to be meaningful, unexpected and arbi-
trarily long pauses of one or more processes which execute operations on O must not
prevent the progress of other processes that invoke operations on the same object O.
This observation motivates the definition of progress conditions suited to mutex-free
implementations.

5.1.2 Progress Conditions

As shown in Chap. 1, deadlock-freedom and starvation-freedom are the relevant
progress conditions when one has to implement a lock object (i.e., solve the mutual
exclusion problem) or (by “transitivity”) implement a mutex-based atomic object.
exclusion problem) or (by “transitivity”) implement a mutex-based atomic object.
Due to the possible interleaving of base operations generated by a mutex-free imple-
mentation of a concurrent object, the situation is different from the one encountered in
lock-based implementations, and consequently, progress conditions suited to mutex-
free implementations must be defined. Going from the weakest to the strongest, this
section defines three progress conditions for mutex-free implementations.
Obstruction-freedom Obstruction-freedom is a progress condition related to con-
currency. An algorithm implementing an operation op() is obstruction-free if it
satisfies the following property: each time an invocation of op() is executed in

isolation, it does terminate. More generally, an object implementation is obstruction-
free if the implementation of each of its operations is obstruction-free.
“Execute in isolation” means that there is a point in time after which no other
invocation of any operation on the same object is executing. It is nevertheless possible
that other invocations of operations on the same object are pending (started and
not yet terminated). If this is the case, “in isolation” means that these operation
invocations have momentarily stopped their execution. From a practical point of view,
“in isolation” means that a process executes alone during a “long enough period”.
This is because, as the processes are asynchronous, no upper bound on the time needed
by a process to execute an operation can be determined. The processes have no notion
of time duration, and consequently, the best that can be said is “long enough period”.
Let us observe that, in the presence of concurrency, it is possible that no
invocation of any operation ever terminates. Let us also observe that nothing
prevents a particular obstruction-free implementation from doing more than
what is required. (This occurs, for example, when the implementation guaran-
tees termination of an operation in specific scenarios where there are concurrent
accesses to the internal representation of the object. In that case, the implementation
designer has to specify the additional specific progress property L that is ensured,
and the implementation then satisfies the progress condition defined as “obstruction-
freedom + L”.)
The difficulty in designing an object implementation that is both mutex-free and
obstruction-free comes from the fact that the safety properties attached to the internal
representation of the objects (usually expressed with invariants) have to be main-
tained whatever the number of concurrent operation invocations that are modifying
this state. In other words, when considering mutex-free object implementations,
obstruction-freedom is not given for free.
Non-blocking Non-blocking is a stronger progress condition than obstruction-
freedom. Its definition involves potentially all the operations of an object (it is not
defined for each operation separately). The implementation of an object O is non-
blocking if, as soon as processes have invoked operations on O, at least one invocation
of an operation on O terminates.
As an example let us consider the case where two invocations of Q.enq() and
one invocation of Q.deq() are concurrently executing. Non-blocking states that one
of them terminates. If no new invocation of Q.enq() or Q.deq() is ever issued, it
follows from the non-blocking property that the three previous invocations eventually
terminate (because, after one invocation has terminated, the non-blocking property
states that one of the two remaining ones eventually terminates, etc.). Differently, if
new operation invocations are permanently issued, it is possible that some invocations
never terminate.
The non-blocking progress condition is nothing else than deadlock-freedom in
the context of mutex-free implementations. While the term “deadlock-freedom” is
associated with lock-based implementations, the term “non-blocking” is used as its
counterpart for mutex-free implementations.
Wait-freedom Wait-freedom is the strongest progress condition. The algorithm
implementing an operation is wait-free if it always terminates. More generally, an
object implementation is wait-free if any invocation of any of its operations termi-
nates. This means that operation termination is guaranteed whatever the asynchrony
and concurrency pattern.
Wait-freedom is nothing else than starvation-freedom in the context of mutex-
free implementations. It means that, when a process invokes an object operation, it
terminates after having executed a finite number of steps. Wait-freedom can be refined
as follows (where the term “step” is used to denote the execution of an operation on
an underlying object of the internal representation of the object O):

• Bounded wait-freedom. In this case there is an upper bound on the number of
steps that the invoking process has to execute before terminating its operation.
This bound may depend on the number of processes, on the size of the internal
representation of the object, or both.
• Finite wait-freedom. In this case, there is no bound on the number of steps executed
by the invocation of an operation before it terminates. This number is finite but
cannot be bounded.

When processes may crash A process crashes when it stops its execution prema-
turely. Due to the asynchrony assumption on the speed of processes, a crash can be
seen as if the corresponding process pauses during an infinitely long period before
executing its next step. Asynchrony, combined with the fact that no base shared-
memory operation (read, write, compare&swap, etc.) provides processes with infor-
mation on failures, makes it impossible for a process to know if another process
has crashed or is only very slow. It follows that, when we consider mutex-free
object implementations, the definition of obstruction-freedom, non-blocking, and
wait-freedom copes naturally with any number of process crashes.
Of course, if a process crashes while executing an object operation, it is assumed
that this invocation trivially terminates. As we have seen in Chap. 4 devoted to the
atomicity concept, this operation invocation is then considered either as entirely
executed (and everything appears as if the process crashed just after the invocation)
or not at all executed (and everything appears as if the process crashed just before
the invocation). This is the all-or-nothing semantics associated with crash failures
from the atomicity consistency condition point of view.
Hierarchy of progress conditions It is easy to see that obstruction-freedom, non-
blocking, and wait-freedom define a hierarchy of progress conditions for mutex-free
implementations of concurrent objects.
More generally, the various progress conditions encountered in the implementa-
tion of concurrent objects are summarized in Table 5.1.

Table 5.1 Progress conditions for the implementation of concurrent objects

Lock-based implementations      Mutex-free implementations
–                               Obstruction-freedom
Deadlock-freedom                Non-blocking
Starvation-freedom              Wait-freedom

5.1.3 Non-blocking with Respect to Wait-Freedom

The practical interest of non-blocking object implementations When there are
very few conflicts (i.e., it is rare that processes concurrently access the same object),
a non-blocking implementation is practically wait-free. This is because, in the very
rare occasions where there are conflicting operations, enough time elapses before a
new operation is invoked. So, thanks to the non-blocking property, the conflicting
invocations have enough time to terminate one after the other.
This observation motivates the design of non-blocking implementations because
they are usually more efficient and less difficult to design than wait-free implemen-
tations.
The case of one-shot objects A one-shot object is an object accessed at most once
by each process. As an example, a one-shot stack is a stack such that any process
invokes the operation push() or the operation pop() at most once during an execution.
Theorem 16 Let us consider a one-shot object accessed by a bounded number of
processes. Any non-blocking implementation of such an object is wait-free.
Proof Let n be the number of processes that access the object. Hence, there are
at most n concurrent operation invocations. As the object is non-blocking there is
a finite time after which one invocation terminates. There are then at most (n − 1)
concurrent invocations, and as the object is non-blocking, one of them terminates,
etc. It follows that each operation invocation issued by a correct process eventually
terminates. □

5.2 Mutex-Free Concurrent Objects

5.2.1 The Splitter: A Simple Wait-Free Object from Read/Write Registers

Definition The splitter object was implicitly used in Chap. 1 when presenting Lam-
port’s fast mutual exclusion algorithm. A splitter is a concurrent object that provides
processes with a single operation, denoted direction(). This operation returns a value to
the invoking process. The semantics of a splitter is defined by the following properties:

Fig. 5.2 Splitter object

• Validity. The value returned by direction() is right, left, or stop.
• Concurrent execution. If x processes invoke direction(), then:
– At most x − 1 processes obtain the value right,
– At most x − 1 processes obtain the value left,
– At most one process obtains the value stop.
• Termination. Any invocation of direction() terminates.

A splitter (Fig. 5.2) ensures that (a) not all the invoking processes go in the same
direction, and (b) the direction stop is taken by at most one process and exactly one
process in a solo execution. As we will see in this chapter, splitters are base objects
used to build more sophisticated concurrent objects.
Let us observe that, for x = 1, the concurrent execution property becomes: if a
single process invokes direction(), only the value stop can be returned. This property
is sometimes called the “solo execution” property.
A wait-free implementation A very simple wait-free implementation of a splitter
object SP is described in Fig. 5.3. The internal representation is made up of two
MWMR atomic registers: LAST , which contains a process index (its initial value is
arbitrary), and a binary register DOOR, whose domain is {open, closed} and which
is initialized to open.
When a process pi invokes SP.direction() it first writes its index i in the atomic
register LAST (line 1). Then it checks if the door is open (line 2). If the door has been
closed by another process, pi returns right (line 3). Otherwise, pi closes the door
(which can be closed by several processes, line 4) and then checks if it was the last
process to have invoked direction() (line 5). If this is the case, we have LAST = i
and pi returns stop, otherwise it returns left.
A process that obtains the value right is actually a “late” process: it arrived late at
the splitter and found the door closed. Differently, a process pi that obtains the value
left is actually a “slow” process: it set LAST ← i but was not quick enough during
the period that started when it wrote its index i into LAST (line 1) and ended when
it read LAST (line 5). According to the previous meanings for “late” and “slow”,
not all the processes can be late, not all the processes can be slow, and at most one
process can be neither late nor slow, being “timely” and obtaining the value stop.
Theorem 17 The algorithm described in Fig. 5.3 is a correct wait-free implemen-
tation of a splitter.

operation SP.direction() is
(1) LAST ← i;
(2) if (DOOR = closed)
(3)    then return(right)
(4)    else DOOR ← closed;
(5)         if (LAST = i)
(6)            then return(stop)
(7)            else return(left)
(8)         end if
(9) end if
end operation

Fig. 5.3 Wait-free implementation of a splitter object (code for process pi )

Proof The algorithm of Fig. 5.3 is basically the same as the one implementing
the operation conc_abort_op() presented in Fig. 2.12 (Chap. 2); abort1, abort2, and
commit are replaced by right, left, and stop. The following proof is consequently very
close to the proof of Theorem 4. We adapt and repeat it here for self-containment of
the chapter.
The validity property follows trivially from the fact that the only values that can
be returned are right (line 3), stop (line 6), and left (line 7).
As far as the termination property is concerned, let us observe that the code of the
algorithm contains neither loops nor wait statements. It follows that any invocation
of SP.direction() by a process (which does not crash) does terminate and returns a
value. The implementation is consequently wait-free.
As far as the solo execution property is concerned, it follows from a simple
examination of the code and the fact that the door is initially open that, if a single
process invokes SP.direction() (and does not crash before executing line 6), it returns
the value stop.
Let us now consider the concurrent execution property. For a process to obtain
right, the door must be closed (lines 2–3). As the door is initially open, it follows that
the door was closed by at least one process p and this was done at line 4 (which is the
only place where a process can close the door). According to the value of LAST (line
5), process p will return stop or left. It follows that, among the x processes which
invoke SP.direction(), at least one does not return the value right.
As far as the value left is concerned, we have the following. Let pi be the last
process that writes its index i into the register LAST (as this register is atomic, the
notion of “last” writer is well defined). If the door is closed, it obtains the value
right. If the door is open, it finds LAST = i and obtains the value stop. Hence, not
all processes can return left.
Let us finally consider the value stop. Let pi be the first process that finds LAST
equal to its own index i (line 5). This means that no process pj, j ≠ i, has modified
LAST during the period starting when it was written by pi at line 1 and ending when
it was read by pi at line 5 (Fig. 5.4). It follows that any process pj that modifies LAST


Fig. 5.4 On the modification of LAST

after this register was read by pi will find the door closed (line 2). Consequently, any
such pj cannot obtain the value stop. □
The reader may check that the proof of the splitter object remains valid if
processes crash.
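To make the algorithm of Fig. 5.3 concrete, the following minimal Java sketch may help (class, method, and field names are ours; an AtomicInteger and an AtomicBoolean simply play the roles of the MWMR atomic registers LAST and DOOR):

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the wait-free splitter of Fig. 5.3.
public class Splitter {
    public enum Direction { LEFT, RIGHT, STOP }

    private final AtomicInteger last = new AtomicInteger(-1);          // LAST (initial value arbitrary)
    private final AtomicBoolean doorClosed = new AtomicBoolean(false); // DOOR, initially open

    public Direction direction(int i) {    // i = index of the invoking process
        last.set(i);                       // line 1: LAST <- i
        if (doorClosed.get()) {            // line 2: is the door already closed?
            return Direction.RIGHT;        // line 3: a "late" process
        }
        doorClosed.set(true);              // line 4: close the door
        return (last.get() == i)           // line 5: still the last writer of LAST?
                ? Direction.STOP           // line 6: the (at most one) "timely" process
                : Direction.LEFT;          // line 7: a "slow" process
    }
}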

5.2.2 A Simple Obstruction-Free Object from Read/Write Registers

This section presents a simple obstruction-free timestamp object built from atomic
registers. Actually, the object is built from splitters, which as we have just seen, are
in turn built from atomic read/write registers.
Definition The object is a weak timestamp generator object which provides the
processes with a single operation, denoted get_timestamp(), which returns a natural
integer. Its specification is the following:
• Validity. No two invocations of get_timestamp() return the same value.
• Consistency. Let gt1 () and gt2 () be two distinct invocations of get_timestamp().
If gt1 () returns before gt2 () starts, the timestamp returned by gt2 () is greater than
the one returned by gt1 ().
• Termination. Obstruction-freedom.
It is easy to see that a lock-based implementation of a timestamp object is triv-
ial: an atomic register protected by a lock is used to supply timestamps. But, as
already noticed, locking and obstruction-freedom are incompatible in asynchronous
crash-prone systems. It is also trivial to implement this object directly from the
fetch&add() primitive. The presentation of such a timestamp generator object is
mainly pedagogic, namely showing an obstruction-free implementation built on top
of read/write registers only.
An algorithm The obstruction-free implementation relies on the following under-
lying data structures:

• NEXT defines the value of the next integer that can be used as a timestamp. It is
initialized to 1.
• LAST is an unbounded array of atomic registers. A process pi deposits its index i
in LAST [k] to indicate it is trying to obtain the timestamp k.

• COMP is another unbounded array of atomic Boolean registers with each entry
initialized to false. A process pi sets COMP[k] to true to indicate that it is competing
for the timestamp k (hence several processes can write true into COMP[k]). For
any k, COMP[k] is initialized to false.

The algorithm implementing the obstruction-free operation get_timestamp() is
described in Fig. 5.5. It is inspired by the wait-free algorithm described in Fig. 5.3
that implements a splitter. (The pair of registers LAST[k] and COMP[k] in Fig. 5.5
plays the same role as the registers LAST and DOOR in Fig. 5.3.) A process pi first
reads the next possible timestamp value (register NEXT ). Then it enters a loop that
it will exit after it has obtained a timestamp (line 6).
In the loop, pi first writes its index in LAST [k] to indicate that it is the last process
competing for the timestamp k (line 3). Then, if it finds COMP[k] = false, pi sets
it to true to indicate that at least one process is competing for the timestamp k.
Let us observe that it is possible that several processes find COMP[k] equal to
false and set it to true (lines 4–5). Then, pi checks the predicate LAST [k] = i.
If this predicate is satisfied, pi can conclude that it is the last process that wrote into
LAST [k]. Consequently, all other processes (if any) competing for timestamp k will
find COMP[k] = true, and will directly proceed to line 8 to try to obtain timestamp
k + 1. Hence, they do not execute lines 5–6.
It is easy to see that if, after some time, a single process keeps on executing
the algorithm implementing get_timestamp(), it eventually obtains a timestamp. In
contrast, when several processes find COMP[k] equal to false, there is no guarantee
that one of them obtains the timestamp k.
The proof of this mutex-free implementation based on atomic read/write registers
only is similar to the proof of a splitter. It is left to the reader. (The fact that, for
any timestamp value k, there is at most one process that obtains that value follows
from the fact that exactly one splitter is associated with each possible timestamp
value.)

operation get_timestamp() is
(1) k ← NEXT;
(2) repeat forever
(3)    LAST[k] ← i;
(4)    if (¬COMP[k])
(5)       then COMP[k] ← true;
(6)            if (LAST[k] = i) then return(k) end if
(7)    end if;
(8)    k ← k + 1
(9) end repeat
end operation

Fig. 5.5 Obstruction-free implementation of a timestamp object (code for pi )
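A minimal Java counterpart of Fig. 5.5 follows (names are ours; for the purpose of the illustration only, the unbounded arrays LAST and COMP are replaced by arrays of fixed capacity MAX):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch of the obstruction-free timestamp generator of Fig. 5.5.
public class ObstructionFreeTimestamp {
    private static final int MAX = 1_000_000;                // bounded stand-in for the unbounded arrays
    private final AtomicInteger next = new AtomicInteger(1); // NEXT, initialized to 1
    private final AtomicIntegerArray last = new AtomicIntegerArray(MAX); // LAST[k]
    private final AtomicIntegerArray comp = new AtomicIntegerArray(MAX); // COMP[k]: 0 stands for false

    public int getTimestamp(int i) {     // i = index of the invoking process (i >= 1)
        int k = next.get();              // line 1: next candidate timestamp
        while (true) {                   // line 2: repeat forever
            last.set(k, i);              // line 3: LAST[k] <- i
            if (comp.get(k) == 0) {      // line 4: nobody seen competing for k yet?
                comp.set(k, 1);          // line 5: COMP[k] <- true
                if (last.get(k) == i) {  // line 6: still the last writer of LAST[k]?
                    return k;            //         timestamp k obtained
                }
            }
            k = k + 1;                   // line 8: compete for the next timestamp
        }
    }
}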



5.2.3 A Remark on Compare&Swap: The ABA Problem

Definition (reminder) The compare&swap operation was introduced in Chap. 2.
It is an atomic conditional write provided at hardware level by some machines. As
we have seen, its effect can be described as follows. X is the register on which this
machine instruction is applied, and old and new are two values. The new value new is
written into X if and only if the actual value of X is old. A Boolean result indicates
if the write was successful or not.
written into X if and only if the actual value of X is old. A Boolean result indicates
if the write was successful or not.
X.compare&swap(old, new) is
if (X = old) then X ← new; return(true) else return(false) end if.
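In Java, for instance, the same conditional write is offered by the java.util.concurrent.atomic package; the small sketch below (class name ours) illustrates the semantics on an AtomicInteger:

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger x = new AtomicInteger(5);  // the register X, here holding 5

        boolean ok1 = x.compareAndSet(5, 7);     // X = old (5), so X <- 7; returns true
        boolean ok2 = x.compareAndSet(5, 9);     // X is now 7, not 5; returns false

        System.out.println(ok1 + " " + ok2 + " " + x.get()); // prints: true false 7
    }
}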

The ABA problem When using compare&swap(), a process pi usually does the
following. It first reads the atomic register X (obtaining the value a), then executes
statements (possibly involving accesses to the shared memory) and finally updates
X to a new value c only if X has not been modified by another process since it was
read by pi . To that end, pi invokes X.compare&swap(a, c) (Fig. 5.6).
Unfortunately, the fact that this invocation returns true to pi does not allow pi
to conclude that X has not been modified since the last time it read it. This is
because, between the read of X and the invocation X.compare&swap(a, c) both
issued by pi , X could have been updated twice, first by a process pj that success-
fully invoked X.compare&swap(a, b) and then by a process pk that has successfully
invoked X.compare&swap(b, a), thereby restoring the value a to X. This is called
the ABA problem.
Solving the ABA problem This problem can be solved by associating tags
(sequence numbers) with each value that is written. The atomic register X is
then composed of two fields ⟨content, tag⟩. When it reads X, a process pi obtains
a pair ⟨x, y⟩ (where x is the current “data value” of X) and it later invokes
X.compare&swap(⟨x, y⟩, ⟨c, y + 1⟩) to write a new value c into X. It is easy to
see that the write succeeds only if X has continuously been equal to ⟨x, y⟩.

statements;
a ← X;
statements possibly involving accesses to the shared memory;
if X.compare&swap(a, c) then statements else statements end if;
statements.

Fig. 5.6 A typical use of compare&swap() by a process
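Java packages this tag-based solution as java.util.concurrent.atomic.AtomicStampedReference, which pairs a reference with an integer stamp; the sketch below (class name ours) replays an A-B-A scenario and shows that the stale conditional write fails:

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // The register X = <content, tag>, initialized to <"a", 0>
        AtomicStampedReference<String> x = new AtomicStampedReference<>("a", 0);

        int[] stamp = new int[1];
        String value = x.get(stamp);     // atomically reads the pair: value = "a", stamp[0] = 0

        // Two other processes restore the value "a" (the ABA pattern):
        x.compareAndSet("a", "b", 0, 1); // X = <"b", 1>
        x.compareAndSet("b", "a", 1, 2); // X = <"a", 2>

        // The conditional write based on the stale pair <"a", 0> now fails,
        // even though the content of X is again "a":
        boolean ok = x.compareAndSet(value, "c", stamp[0], stamp[0] + 1);
        System.out.println(ok);          // prints: false
    }
}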



5.2.4 A Non-blocking Queue Based on Read/Write Registers and Compare&Swap

This section presents a non-blocking mutex-free implementation of a queue Q due to
M. Michael and M. Scott (1996). Interestingly, this implementation was included in
the standard Java Concurrency Package. Let us remember that, to be non-blocking,
this implementation has to ensure that, in any concurrency pattern, at least one invo-
cation always terminates.
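For reference, the Java incarnation of this algorithm is java.util.concurrent.ConcurrentLinkedQueue; the short snippet below simply exercises it (poll() returning null plays the role of the empty control value):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class MsQueueDemo {
    public static void main(String[] args) {
        Queue<String> q = new ConcurrentLinkedQueue<>(); // non-blocking Michael&Scott queue

        q.offer("a");                 // enq(a): offer() never blocks
        q.offer("b");                 // enq(b)

        System.out.println(q.poll()); // deq() -> "a" (FIFO order)
        System.out.println(q.poll()); // deq() -> "b"
        System.out.println(q.poll()); // empty queue -> null
    }
}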
Internal representation of the queue Q The queue is implemented by a linked list
as described in Fig. 5.7. The core of the implementation consists then in handling
pointers with the help of the compare&swap() primitive.
As far as registers containing pointer values are concerned, the following notations
are employed. If P is a pointer register, P ↓ denotes the object pointed to by P. If X
is an object, ↑ X denotes a pointer that points to X. Hence, (↑ X) ↓ and X denote
the same object.
The list is accessed from an atomic register Q that contains a pointer to a record
made up of two fields denoted head and tail. Each of these fields is an atomic register.
Each atomic register (Q ↓).head and (Q ↓).tail has two fields denoted ptr and
tag. The field ptr contains a pointer, while the field tag contains an integer (see
below). To simplify the exposition, it is assumed that each field ptr and tag can be
read independently.
The list is made up of cells such that the first cell is pointed to by (Q ↓).head.ptr
and the last cell of the list is pointed to by (Q ↓).tail.ptr.
Let CELL be a cell. It is a record composed of two atomic registers. The atomic
register CELL.value contains a value enqueued by a process, while (similarly to
(Q ↓).head and (Q ↓).tail) the atomic register CELL.next is made up of two fields:
CELL.next.ptr is a pointer to the next cell of the list (or ⊥ if CELL is the last cell of
the list), and CELL.next.tag is an integer.
Initially the queue contains no element. At the implementation level, the list Q
contains then a dummy cell CELL (see Fig. 5.8). This cell is such that CELL.value
is (always) irrelevant and, initially, CELL.next.ptr = ⊥. This dummy cell allows for
a simpler algorithm. It always belongs to the list and (Q ↓).head.ptr always points to it.

Fig. 5.7 The list implementing the queue



Fig. 5.8 Initial state of the list

Differently, (Q ↓).tail.ptr points to the dummy cell only when the list is empty.
Moreover, we have initially (Q ↓).head.tag = (Q ↓).tail.tag = 0.
It is assumed that the operation new_cell() creates a new cell in the shared memory,
while the operation free_cell(pt) frees the cell pointed to by pt.
The algorithm implementing the operation Q.enq() As already indicated, these
algorithms consist in handling pointers in an appropriate way. An interesting point is
the fact that they require processes to help other processes terminate their operations.
Actually, this helping mechanism is the mechanism that implements the non-blocking
property.
The algorithm implementing the enq() operation is described at lines 1–13 of
Fig. 5.9. The invoking process pi first creates a new cell in the shared memory,
assigns its address to the local pointer pt_cell, and updates its fields value and next.ptr
(line 1). Then pi enters a loop that it will exit when the value v will be enqueued.
In the loop, pi executes the following statements. It is important to notice that,
in order to obtain consistent pointer values, these statements include sequences of
read and re-read (with compare&swap) to check that pointer values have not been
modified.
• Process pi first makes local copies (kept in tail and next) of (Q ↓).tail and
(tail.ptr ↓).next, respectively. These values inform pi on the current state of the
tail of the queue (lines 3–4).
• Then pi checks if the content of (Q ↓).tail has changed since it read it (line 5).
If it has changed, tail.ptr no longer points to the last element of the queue.
Consequently, pi starts the loop again.
• If tail = (Q ↓).tail (line 6), pi optimistically considers that no other process is
currently trying to enqueue a value. It then checks if next.ptr is equal to ⊥.

– If next.ptr = ⊥, pi optimistically considers that tail points to the last element
of the queue. It consequently tries to add the new element v to the list (lines
7–8). This is done in two steps, each based on a compare&swap: the first to
append the cell to the list, and the second to update the pointer (Q ↓).tail.
∗ Process pi tries first to append its new cell to the list. This is done by executing
the statement ((tail.ptr ↓).next).compare&swap(next, ⟨pt_cell, next.tag + 1⟩)
(line 7). If pi does not succeed, this is because another process succeeded
in appending a new cell to the list. If this is the case, pi continues looping.
operation Q.enq(v) is
(1) pt_cell ← new_cell(); (pt_cell ↓).value ← v; (pt_cell ↓).next.ptr ← ⊥;
(2) repeat forever
(3)    tail ← (Q ↓).tail;
(4)    next ← (tail.ptr ↓).next;
(5)    if (tail = (Q ↓).tail) then
(6)       if (next.ptr = ⊥)
(7)          then if ((tail.ptr ↓).next).compare&swap(next, ⟨pt_cell, next.tag + 1⟩)
(8)                  then ((Q ↓).tail).compare&swap(tail, ⟨pt_cell, tail.tag + 1⟩); return()
(9)               end if
(10)         else ((Q ↓).tail).compare&swap(tail, ⟨next.ptr, tail.tag + 1⟩)
(11)      end if
(12)   end if
(13) end repeat
end operation

operation Q.deq() is
(14) repeat forever
(15)    head ← (Q ↓).head;
(16)    tail ← (Q ↓).tail;
(17)    next ← (head.ptr ↓).next;
(18)    if (head = (Q ↓).head) then
(19)       if (head.ptr = tail.ptr)
(20)          then if (next.ptr = ⊥) then return(empty) end if;
(21)               ((Q ↓).tail).compare&swap(tail, ⟨next.ptr, tail.tag + 1⟩)
(22)          else v ← (next.ptr ↓).value;
(23)               if ((Q ↓).head).compare&swap(head, ⟨next.ptr, head.tag + 1⟩)
(24)                  then free_cell(head.ptr); return(v)
(25)               end if
(26)       end if
(27)    end if
(28) end repeat
end operation

Fig. 5.9 A non-blocking implementation of a queue

∗ If process pi succeeds in appending its new cell to the list, it tries to update the
content of (Q ↓).tail. This is done by executing ((Q ↓).tail).compare&swap(tail,
⟨pt_cell, tail.tag + 1⟩) (line 8). Finally, pi returns from its invocation.
Let us observe that it is possible that the second compare&swap does not
succeed. This is the case when, due to asynchrony, another process pj did the
work for pi by executing line 10 of enq() or line 21 of deq().
– If next.ptr ≠ ⊥, pi discovers that next does not point to the last element of
the queue. Hence, pi discovers that the value of (Q ↓).tail was not up to date
when it read it. Another process has added an element to the queue but had not
yet updated (Q ↓).tail when pi read it. In that case, pi tries to help the other
process terminate the update of (Q ↓).tail if not yet done. To that end, it executes
the statement ((Q ↓).tail).compare&swap(tail, ⟨next.ptr, tail.tag + 1⟩) (line
10) before restarting the loop.

Linearization point of Q.enq() The linearization point associated with an enq()
operation corresponds to the execution of the compare&swap statement of line 7.
This means that an enq() operation appears as if it was executed atomically when
the new cell is linked to the last cell of the list.
The algorithm implementing the operation Q.deq() The algorithm implementing
the deq() operation is described in lines 14–28 of Fig. 5.9. The invoking process
loops until it returns a value at line 24. Due to its strong similarity with the algorithm
implementing the enq() operation, the deq() algorithm is not described in detail.
Let us notice that, if head ≠ (Q ↓).head (i.e., the predicate at line 18 is false),
the head of the list has been modified while pi was trying to dequeue an element. In
that case, pi restarts the loop.
If head = (Q ↓).head (line 18) then the values kept in head and next
defining the head of the list are consistent. Process pi then checks if head.ptr =
tail.ptr, i.e., if (according to the values it has read at lines 15–16) the list cur-
rently consists of a single cell (line 19). If this is the case and this cell is the
dummy cell (as witnessed by the predicate next.ptr = ⊥), the value empty is
returned (line 20). In contrast, if next.ptr ≠ ⊥, a process is concurrently adding
a new cell to the list. To help it terminate its operation, pi executes
((Q ↓).tail).compare&swap(tail, ⟨next.ptr, tail.tag + 1⟩) (line 21).
Otherwise (i.e., head.ptr ≠ tail.ptr), there is at least one cell in addition to
the dummy cell. This cell is pointed to by next.ptr. The value kept in that
cell can be returned (lines 22–24) if pi succeeds in updating the atomic register
(Q ↓).head that defines the head of the list. This is done by the statement
((Q ↓).head).compare&swap(head, ⟨next.ptr, head.tag + 1⟩) (line 23). If this
compare&swap succeeds, pi returns the appropriate value and frees the cell that
was suppressed from the list (the cell pointed to by head.ptr, line 24). Let us observe
that the cell that is freed is the previous dummy cell while the cell containing the
returned value v is the new dummy cell.
Linearization point of Q.deq() The linearization point associated with a deq()
operation is the execution of the compare&swap statement of line 23 that termi-
nates successfully. This means that a deq() operation appears as if it was executed
atomically when the pointer to the head of the list (Q ↓).head is modified.
Remarks Both linearization points correspond to the execution of successful com-
pare&swap statements. The two other invocations of compare&swap statements
(lines 10 and 21) constitute the helping mechanism that realizes the non-blocking
property.
It is important to notice that, due to the helping mechanism, the crash of a process
does not annihilate the non-blocking property. If processes crash at any point while
executing enq() or deq() operations, at least one process that does not crash while
executing its operation terminates it.

5.2.5 A Non-blocking Stack Based on Compare&Swap Registers

The stack and its operations The stack has two operations, denoted push(v)
(where v is the value to be added at the top of the stack) and pop(). It is a bounded
stack: it can contain at most k values. If the stack is full, push(v) returns the control
value full, otherwise v is added to the top of the stack and the control value done is
returned. The operation pop() returns the value that is at the top of the stack (and
suppresses it from the stack), or the control value empty if the stack is empty.
Internal representation of the stack This non-blocking implementation of an
atomic stack is due to N. Shafiei (2009). The stack is implemented with an atomic
register denoted TOP and an array of k + 1 atomic registers denoted STACK[0..k].
These registers can be read and can be modified only by using the compare&swap()
primitive.

• TOP has three fields that contain an index (to address an entry of STACK), a value,
and a counter. It is initialized to ⟨0, ⊥, 0⟩.
• Each atomic register STACK[x] has two fields: the field STACK[x].val, which
contains a value, and the field STACK[x].sn, which contains a sequence number
(used to prevent the ABA problem as far as STACK[x] is concerned).
STACK[0] is a dummy entry initialized to ⟨⊥, −1⟩. Its first field always contains the
default value ⊥. As far as the other entries are concerned, STACK[x] (1 ≤ x ≤ k)
is initialized to ⟨⊥, 0⟩.

The array STACK is used to store the contents of the stack, and the register TOP
is used to store the index and the value of the element at the top of the stack. The
contents of TOP and STACK[x] are modified with the help of the conditional write
instruction compare&swap() (which is used to prevent erroneous modifications of
the stack internal presentation).
The implementation is lazy in the sense that a stack operation assigns its new
value to TOP and leaves the corresponding effective modification of STACK to the
next stack operation. Hence, while on the one hand a stack operation is lazy, on the
other hand it has to help terminate the previous stack operation (as far as the internal
representation of the stack is concerned).
The algorithm implementing the operation push(v) When a process pi invokes
push(v), it enters a repeat loop that it will exit at line 4 or line 7. The process first
reads the content of TOP (which contains the last operation on the stack) and stores
its three fields in its local variables index, value, and seqnb (line 2).
Then, pi calls the internal procedure help(index, value, seqnb) to help terminate
the previous stack operation (line 3). That stack operation (be it a push() or a pop()) is
required to write the pair ⟨value, seqnb⟩ into STACK[index]. To that end, pi invokes
STACK[index].compare&swap(old, new) with the appropriate values old and new
so that the write is executed only if not yet done (lines 17–18).

After its help (which was successful if not yet done by another stack operation)
to move the content of TOP into STACK[index], pi returns full if the stack is full
(line 4). If the stack is not full, it tries to modify TOP so that it registers its push
operation. This invocation of TOP.compare&swap() (line 7) succeeds if no other
process has modified TOP since it was read by pi at line 2. If it succeeds, TOP
takes its new value and push(v) returns the control value done (line 7). Other-
wise pi executes the body of the repeat loop again until its invocation of push()
succeeds.
The triple of values to be written in TOP at line 7 is computed at lines 5–6. Process
pi first computes the last sequence number sn_of_next used in STACK[index + 1]
and then defines the new triple, namely newtop = ⟨index + 1, v, sn_of_next + 1⟩, to
be written first in TOP and, later, in STACK[index + 1] thanks to the help provided
by the next stack operation (let us remember that sn_of_next + 1 is used to prevent
the ABA problem).
The algorithm implementing the operation pop() The algorithm implementing
this operation has exactly the same structure as the previous one and is nearly the
same. Its explanation is consequently left to the reader.
Linearization points of the push() and pop() operations The operations that
terminate are linearizable; i.e., they can be totally ordered on the time line, each
operation being associated with a single point of that line after its start event and
before its end event. Its start event corresponds to the execution of the first statement of
an operation, and its end event corresponds to the execution of the return() statement.
More precisely, an invocation of an operation appears as if it was atomically executed
• when it reads TOP (at line 2 or 10) if it returns full or empty (at line 4 or 12),
• or at the time at which its invocation TOP.compare&swap(−, −) (at line 7 or 15)
is successful (i.e., returns true).

Theorem 18 The stack implementation described in Fig. 5.10 is non-blocking.


Proof Let us first observe that, if a process p executes an operation while no
other process executes an operation, it does terminate. This is because the triple
⟨index, value, seqnb⟩ it has read from TOP at line 2 or 10 is still in TOP when it
executes TOP.compare&swap(⟨index, value, seqnb⟩, newtop) at line 7 or 15. Hence,
the compare&swap() is successful and returns the value true, and the operation
terminates.
Let us now consider the case where the invocation of an operation by a process p
does not terminate (while p does not crash). This means that, between the read of TOP
at line 2 (or line 10) and the conditional write TOP.compare&swap(⟨index, value,
seqnb⟩, newtop) at line 7 (or line 15) issued by p, the atomic register TOP was
modified. According to the code of the push() and pop() operations, the only state-
ment that modifies TOP is the compare&swap() issued at line 7 (or line 15). It fol-
lows that another invocation of compare&swap() was successful, which means that
another push() or pop() terminated, completing the proof of the non-blocking
property. □

Fig. 5.10 Shafiei's non-blocking atomic stack. (The code listing of the figure was lost in extraction; push() occupies lines 1–8, pop() lines 9–16, and the internal help() procedure lines 17–18.)
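To make this structure concrete, here is a minimal Java sketch of the push() side of Fig. 5.10, assuming that AtomicReference.compareAndSet() plays the role of the compare&swap registers; the class, the capacity K, and the string return values are illustrative choices, not part of the original algorithm.

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustrative sketch only: TOP holds the triple describing the last stack
// operation, and STACK[x] holds the pair committed to entry x.
final class NonBlockingStack {
    record Triple(int index, int value, int seqnb) {}   // content of TOP
    record Cell(int value, int seqnb) {}                // content of STACK[x]

    static final int K = 1024;                          // capacity (assumption)
    final AtomicReference<Triple> top = new AtomicReference<>(new Triple(0, 0, 0));
    final AtomicReferenceArray<Cell> stack = new AtomicReferenceArray<>(K + 2);

    NonBlockingStack() {
        for (int x = 0; x <= K + 1; x++) stack.set(x, new Cell(0, 0));
    }

    // Finish the operation recorded in TOP: write its <value, seqnb> into
    // STACK[index] if no helper has already done so (lines 17-18).
    private void help(Triple t) {
        Cell cur = stack.get(t.index());
        if (cur.seqnb() < t.seqnb())
            stack.compareAndSet(t.index(), cur, new Cell(t.value(), t.seqnb()));
    }

    String push(int v) {
        while (true) {
            Triple t = top.get();                             // line 2: read TOP
            help(t);                                          // line 3: help previous op
            if (t.index() == K) return "full";                // line 4
            int snOfNext = stack.get(t.index() + 1).seqnb();  // line 5
            Triple newTop = new Triple(t.index() + 1, v, snOfNext + 1); // line 6
            if (top.compareAndSet(t, newTop)) return "done";  // line 7
        }
    }
    // pop() is symmetric: read TOP, help, return empty if index == 0, and
    // otherwise try to install the triple describing entry index - 1.
}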

5.2.6 A Wait-Free Stack Based on Fetch&Add and Swap Registers

The non-blocking implementation of a stack presented in the previous section was
based on a bounded array of compare&swap atomic registers. This section presents
a simple wait-free implementation of an unbounded stack. This construction, which
is due to Y. Afek, E. Gafni, and A. Morrison (2007), uses a fetch&add register and
an unbounded array of swap registers.
Internal representation of the stack STACK This representation is made up of
the following atomic registers which are not base read/write registers:

• REG[0..∞) is an array of atomic registers which contains the elements of the
stack. Each REG[x] can be written by any process. It can also be accessed by
any process by invoking the primitive REG[x].swap(v), which writes atomically
v into REG[x] and returns its previous value. Each REG[x] is initialized to a
default value ⊥ (which always remains unknown to the processes).
REG[0] always contains the value ⊥ (it is used only to simplify the description of
the algorithm).

• NEXT is an atomic register that contains the index of the next entry where a value
can be deposited. It is initialized to 1. This register can be read by any process. It
can be modified by any process by invoking NEXT.fetch&add(), which adds 1 to
NEXT and returns its new value.

The algorithm implementing the operation push(v) This algorithm, described
in Fig. 5.11, is simple. When a process invokes STACK.push(v), it first computes the
next free entry (in) of the array (line 1) and then deposits its value in REG[in] (line 2).
The algorithm implementing the operation pop() The code of this algorithm
appears in Fig. 5.11. When it invokes STACK.pop(), a process pi first determines the
last entry (last) in which a value has been deposited (line 4). Then, starting from
REG[last], pi scans the array REG[0..last] (line 5). It stops scanning (downwards)
at the first register REG[x] whose value is different from ⊥ and returns it (lines 6–7).
Let us notice that, if pi returns a value, it has previously suppressed it from the corre-
sponding register when it invoked REG[x].swap(⊥). If the scan does not allow pi to
return a value, the stack is empty and, accordingly, pi executes return(empty) (line 9).
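The following Java sketch renders this construction, assuming that getAndIncrement() stands in for the fetch&add register and getAndSet() for the swap registers; the unbounded array REG[0..∞) is approximated by a large bounded one, and the constant CAP is an assumption, not part of the algorithm.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

final class WaitFreeStack<V> {
    static final int CAP = 1 << 20;                  // stand-in for infinity
    final AtomicReferenceArray<V> reg = new AtomicReferenceArray<>(CAP); // null plays bottom
    final AtomicInteger next = new AtomicInteger(1); // index of next free entry

    void push(V v) {                       // bounded wait-free: two shared accesses
        int in = next.getAndIncrement();   // line 1: reserve the next entry
        reg.set(in, v);                    // line 2: deposit the value
    }

    V pop() {                              // wait-free but not bounded wait-free
        int last = next.get() - 1;         // line 4: last possibly used entry
        for (int x = last; x >= 1; x--) {  // line 5: downward scan
            V aux = reg.getAndSet(x, null);          // line 6: swap with bottom
            if (aux != null) return aux;             // line 7: a value was found
        }
        return null;                       // line 9: the stack is empty
    }
}

In this rendering the reserved entry is the value of NEXT before the increment; the variant described above, where fetch&add() returns the new value, shifts indices by one but is otherwise identical.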
A remark on process crashes As indicated previously in this chapter, any mutex-
free algorithm copes naturally with process crashes. As the base operations that access
the shared memory (at the implementation level) are atomic, a process crashes before
or after such a base operation.
To illustrate this point, let us first consider a process pi that crashes while it is
executing STACK.push(v). There are two cases:

• Case 1: pi crashes after it has executed the atomic statement REG[in] ← v (line 2).
In this case, from an external observer point of view, everything appears as if pi
crashed after it invoked STACK.push(v).
• Case 2: pi crashes after it has obtained an index value (line 1) and before it invokes
the atomic statement REG[in] ← v. In this case, pi has obtained an entry in from

Fig. 5.11 A simple wait-free implementation of an atomic stack. (The code listing of the figure was lost in extraction; push() occupies lines 1–3 and pop() lines 4–9.)



NEXT but did not deposit a value into REG[in], which consequently will remain
forever equal to ⊥. In this case, from an external observer point of view, everything
appears as if the process crashed before invoking STACK.push(v).
From an internal point of view, the crash of pi just before executing REG[in] ← v
entails an increase of NEXT . But as the corresponding entry of the array REG will
remain forever equal to ⊥, this increase of NEXT can only increase the duration
of the loop but cannot affect its output.

Let us now consider a process pi that crashes while it is executing STACK.pop().


If pi crashes after it has executed the statement aux ← REG[x].swap(⊥) (line
6), which has returned it a value, everything appears to an external observer
as if pi crashed after the invocation of STACK.pop(). In the other case, every-
thing appears to an external observer as if pi crashed before the invocation of
STACK.pop().
Wait-freedom versus bounded wait-freedom A simple examination of the code
of push() shows that this operation is bounded wait-free: it has no loop and accesses
the shared memory twice.
In contrast, while all executions of STACK.pop() terminate, none of them can be
bounded. This is because the number of times the loop body is executed depends on
the current value of NEXT , which may increase forever. Hence, while no execution
of pop() loops forever, there is no bound on the number of iterations an execution
of STACK.pop() has to execute. Hence, the algorithm implementing the operation pop()
is wait-free but not bounded wait-free.
On the linearization points of push() and pop() It is important to notice that
the linearization points of the invocations of push() and pop() cannot be statically
defined in a deterministic way. They depend on race conditions which occur dur-
ing the execution. This is due to the non-determinism inherent to each concurrent
computation.
As an example, let us consider the stack described in Fig. 5.12. The values a, b,
d, e, f, and g have been written into REG at the indicated entries. A process pi which
has invoked push(c) obtained the index value x (at line 1) before the invocations of
push(d), push(e), push(f), and push(g), which obtained the indexes (x + 1), (x + 2), (x + 4), and (x + 5),
respectively. (The index (x + 3) was obtained by a process that crashed just after it obtained
that index value.) Moreover, pi executes REG[x] ← c only after d, e, f, and g have
been written into the array REG and the corresponding invocations of push() have
terminated. In that case the linearization point associated with push(c) is not the
time at which it executes REG[x] ← c but a time instant just before the linearization
point associated with push(d).

Fig. 5.12 On the linearization points of the wait-free stack



If pi executes REG[x] ← c after all the values deposited at entries with an index
greater than x have been removed from the stack, and before new values are pushed
onto the stack, then the linearization point associated with push(c) is the time at
which pi executes REG[x] ← c.
While the definition of the linearization points associated with the operation invo-
cations on a concurrent object is sometimes fairly easy, the previous wait-free imple-
mentation of a stack (whose algorithms are simple) shows that this is not always the
case. This is due to the net effect of the mutex-freedom requirement and asynchrony.

5.3 Boosting Obstruction-Freedom to Stronger Progress in the Read/Write Model

Let us consider the case where (a) the processes can cooperate by accessing base
read/write atomic registers only and (b) any number of processes may crash. Let
us suppose that, in such a context, we have an obstruction-free implementation of
a concurrent object (hence this implementation relies only on read/write atomic
registers). An important question is then the following: Is it possible to boost this
implementation in order to obtain a non-blocking or even a wait-free implementa-
tion? This section presents an approach based on failure detectors that answers this
question.

5.3.1 Failure Detectors

As already indicated, given an execution E, a process is said to be correct in E if it
does not crash in E. Otherwise, it is faulty in E.
A failure detector is a device (object) that provides each process with a read-only
variable that contains information related to failures. According to the type and the
quality of this information, several types of failure detector can be defined. Two types
of failure detector are defined below.
When considering two failure detectors, one can be stronger than the other or they
can be incomparable. Failure detector FD1 is stronger than failure detector FD2 if
there is an algorithm that builds FD2 from FD1 and atomic read/write registers. If
additionally FD1 cannot be built from FD2, then FD1 is strictly stronger than FD2.
If FD1 is stronger than FD2 and FD2 is stronger than FD1, then FD1 and FD2 have
the same computability power in the sense that the information on failures provided
by either of them can be obtained from the other. If two failure detectors are such
that neither of them is stronger than the other one, they are incomparable.
The failure detector ΩX (eventually restricted leadership) Let X be any non-
empty subset of process indexes. The failure detector denoted ΩX provides each
process pi with a local variable denoted ev_leader(X) (eventual leader in the set X)
such that the following properties are always satisfied:
• Validity. At any time, the variable ev_leader(X) of any process contains a process
index.
• Eventual leadership. There is a finite time after which the local variables
ev_leader(X) of the correct processes of X contain the same index, which is
the index of one of them.

This means that there is an arbitrarily long anarchy period during which the
content of any local variable ev_leader(X) can change and, at the same time, distinct
processes can have different values in their local variables. However, this anarchy
period terminates for the correct processes of X, and when it has terminated, the
local variables ev_leader(X) of the correct processes of X contain forever the same
index, and it is the index of one of them. The time at which this occurs is finite but
remains unknown to the processes. This means that, when a process of X reads x from
ev_leader(X), it can never be sure that px is correct. In that sense, the information
on failures (or the absence of failures) provided by ΩX is particularly weak.
Remark on the use of ΩX This failure detector is usually used in a context where
X denotes a dynamically defined subset of processes. It then allows these processes
to rely on the fact that one of them (which is correct) is eventually elected as their
common leader.
It is possible that, at some time, a process pi perceives X locally as being xi while
another process pj perceives it as being xj ≠ xi. Consequently, the local read-only
variables provided by ΩX are denoted ev_leader(xi) at pi and ev_leader(xj) at pj.
As xi and xj may change with time, this means that ΩX may potentially be required
to produce outputs for any non-empty subset x of Π (the whole set of processes
composing the system).
The failure detector ♦P (eventually perfect) This failure detector provides each
process pi with a local set variable denoted suspected such that the following prop-
erties are always satisfied:

• Eventual completeness. Eventually the set suspectedi of each correct process pi
contains the indexes of all crashed processes.
• Eventual accuracy. Eventually the set suspectedi of each correct process pi con-
tains only indexes of crashed processes.

As with ΩX, (a) there is an arbitrarily long anarchy period during which each set
suspectedi can contain arbitrary values, and (b) the time at which this anarchy period
terminates remains unknown to the processes.
It is easy to see that ♦P is stronger than ΩX (actually, it is strictly stronger). Let
us assume that we are given ♦P. The output of ΩX can be constructed as follows. For
a process pi such that i ∉ X, the current value of ev_leader(X) is any process index
and it can change at any time. For a process pi such that i ∈ X, the output of ΩX is
defined as follows: ev_leader(X) = min((Π \ suspected) ∩ X). The reader can check
that the local variables ev_leader(X) satisfy the validity and eventual leadership
properties of ΩX.
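As a minimal illustration, the following Java sketch computes this output from the current output of ♦P; the names PI, suspected, and evLeader are illustrative.

import java.util.Set;
import java.util.TreeSet;

final class EventualLeader {
    // PI: all process indexes; suspected: current local diamond-P output;
    // X: the queried subset. Returns min((PI \ suspected) inter X) when this
    // set is non-empty, and an arbitrary index of X otherwise (validity).
    static int evLeader(Set<Integer> PI, Set<Integer> suspected, Set<Integer> X) {
        TreeSet<Integer> candidates = new TreeSet<>(PI);
        candidates.removeAll(suspected);
        candidates.retainAll(X);
        return candidates.isEmpty() ? X.iterator().next() : candidates.first();
    }
}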

5.3.2 Contention Managers for Obstruction-Free Object Implementations

A contention manager is an object whose aim is to improve the progress of processes
by providing them with contention-restricted periods during which they can complete
object operations.
As we consider obstruction-free object implementations, the idea is to associate a
contention manager with each obstruction-free implementation. Hence, the role of a
contention manager is to create “favorable circumstances” so that object operations
execute without contention in order to guarantee their termination. For these “favor-
able circumstances”, the contention manager uses the computational power supplied
by a failure detector.
The structure of the boosting is described in Fig. 5.13. A failure-detector-based
contention manager implements two (control) operations denoted need_help() and stop_help()
which are used by the obstruction-free implementation of an object as follows:
• need_help() is invoked by a process which is executing an object operation to
inform the contention manager that it has detected contention and, consequently,
needs help to terminate its operation invocation.
• stop_help() is invoked by a process to inform the contention manager that it
terminates its current operation invocation and, consequently, no longer needs help.
As an example let us consider the timestamping object defined in Sect. 5.2.2
whose obstruction-free implementation is given in Fig. 5.5. The enrichment of this
implementation to benefit from contention manager boosting appears in Fig. 5.14.
The invocations of need_help() and stop_help() are underlined. As we can see,
need_help() is invoked by a process pi when it discovers that there is contention
on the timestamp k and it accordingly decides to proceed to k + 1.
Fig. 5.13 Boosting obstruction-freedom



Fig. 5.14 A contention-based enrichment of an obstruction-free implementation (code for pi). (The code listing of the figure, lines 1–9, was lost in extraction.)

The operation stop_help() is invoked by a process when it has obtained a timestamp, as it no longer
needs help.
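The enrichment pattern can be sketched as follows in Java, with the timestamping logic of Fig. 5.5 abstracted behind two hypothetical oracles (ContentionManager and contended); only the placement of the need_help()/stop_help() calls is meant to reflect Fig. 5.14.

import java.util.function.IntPredicate;

interface ContentionManager {
    void needHelp(int i);        // may delay pi momentarily (never forever)
    void stopHelp(int i);        // pi no longer needs help
}

final class EnrichedTimestamping {
    final ContentionManager cm;
    final IntPredicate contended;            // oracle: is timestamp k contended?

    EnrichedTimestamping(ContentionManager cm, IntPredicate contended) {
        this.cm = cm;
        this.contended = contended;
    }

    int getTimestamp(int i) {                // code for process pi
        int k = 1;                           // first candidate timestamp
        while (true) {
            if (contended.test(k)) {         // contention detected on k:
                cm.needHelp(i);              //   ask the contention manager
                k = k + 1;                   //   and try the next candidate
            } else {
                cm.stopHelp(i);              // timestamp obtained: release CM
                return k;
            }
        }
    }
}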
Let us observe that the index of the invoking process is passed as a parameter when
a contention manager operation is invoked. This is because progress is on processes
and, consequently, a contention manager needs to know which processes have to be
helped.
The next two sections present two different implementations of the contention
manager object CM. The first makes the contention-based enriched obstruction-free
implementation non-blocking (such as the one described in Fig. 5.14), while the
second one makes it wait-free. It is important to notice the generic dimension of the
contention manager object CM.

5.3.3 Boosting Obstruction-Freedom to Non-blocking

An ΩX-based implementation of a contention manager that boosts any object imple-
mentation from obstruction-freedom to non-blocking is described in Fig. 5.15. This
implementation relies on an array of SWMR atomic Boolean read/write regis-
ters NEED_HELP[1..n] with one entry per process. This array is initialized to
[false, . . . , false].
This contention manager compels the processes that require help to obey a simple
rule: only one of them at a time is allowed to make progress. Observance of this rule
is implemented thanks to the underlying failure detector ΩX. As already indicated,
each process pi manages a local variable x containing its local view of X, which here
is the current set of processes that have requested help from the contention manager.
When a process pi invokes CM.need_help(i), it first sets NEED_HELP[i] to true.
Then, it repeatedly computes the set x of processes that have requested help from the
contention manager until it becomes the leader of this set. When this occurs, it returns
from CM.need_help(i). Let us observe that the set x computed by pi possibly changes
with time. Moreover, a process may crash after it has requested help.
Fig. 5.15 A contention manager to boost obstruction-freedom to non-blocking. (The code listing of the figure was lost in extraction.)

A process pi indicates that it no longer needs help by invoking CM.stop_help(i). Let us observe
that this implementation is bounded (the array needs only n bits).
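A minimal Java sketch of this contention manager follows, assuming an omegaX oracle that returns the current local output ev_leader(x) of the failure detector (the failure detector itself is not implemented here, and the names are illustrative).

import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.function.Function;

final class NonBlockingCM {
    final int n;                              // number of processes
    final AtomicIntegerArray flags;           // NEED_HELP[1..n], 1 = true

    NonBlockingCM(int n) { this.n = n; this.flags = new AtomicIntegerArray(n + 1); }

    void needHelp(int i, Function<Set<Integer>, Integer> omegaX) {
        flags.set(i, 1);                      // NEED_HELP[i] <- true
        while (true) {                        // repeat ... until ev_leader(x) = i
            Set<Integer> x = new TreeSet<>(); // x = { j | NEED_HELP[j] }
            for (int j = 1; j <= n; j++) if (flags.get(j) == 1) x.add(j);
            if (omegaX.apply(x) == i) return; // pi is elected: it may proceed alone
        }
    }

    void stopHelp(int i) { flags.set(i, 0); } // NEED_HELP[i] <- false
}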
Let an enriched obstruction-free implementation of an object be an obstruction-
free implementation of that object that invokes the operations need_help() and
stop_help() of a contention manager CM (as described in Fig. 5.14, Sect. 5.3.2).
Theorem 19 The contention manager described in Fig. 5.15 transforms an enriched
obstruction-free implementation of an object into a non-blocking implementation.
Proof Given an enriched obstruction-free implementation of an object that uses the
contention manager of Fig. 5.15, let us assume (by contradiction) that this imple-
mentation is not non-blocking.
There is consequently an execution in which there is a time τ after which several
operations are invoked concurrently and none of them terminates. Let Q be the set of all
the correct processes involved in these invocations.
Due to the enrichment of the object operations, it follows that eventually the
register NEED_HELP[i] associated with each process pi of Q remains permanently
equal to true. Moreover, as a crashed process does not recover, there is a finite
time after which the array NEED_HELP[1..n] is no longer modified. It follows that
there is a time τ′ ≥ τ after which all the processes of Q compute the same set
x = {j | NEED_HELP[j]}. Let us notice that we do not necessarily have Q = x (this
is due to the processes pj that have crashed while NEED_HELP[j] is true), but we
have Q ⊆ x.
It now follows from the validity and eventual leadership properties of the failure
detector instance Ωx that there is a time τ′′ ≥ τ′ after which all the processes of Q
permanently have the same index in their local variables ev_leader(x), and this index
belongs to Q. It follows from the text of CM.need_help() that this process is the only
process of Q that is allowed to progress. Moreover, due to the obstruction-freedom
property of the base implementation, this process then terminates its operation, which
contradicts our initial assumption and concludes the proof. □

5.3.4 Boosting Obstruction-Freedom to Wait-Freedom

A ♦P-based implementation of a contention manager that boosts any object imple-
mentation from obstruction-freedom to wait-freedom is described in Fig. 5.16. Let
us remember that ♦P provides each process pi with a set suspected that eventually
contains all crashed processes and only them.
This contention manager uses an underlying operation, denoted weak_ts(), that
generates locally increasing timestamps such that, if a process obtains a timestamp
value ts, then any process can obtain only a finite number of timestamp values
lower than ts. This operation weak_ts() can be implemented from atomic read/write
registers only. (Let us remark that weak_ts() is a weaker operation than the operation
get_timestamp() described in Fig. 5.5.)
The internal representation of the contention manager consists of an array of
SWMR atomic read/write registers TS[1..n] such that only pi can write TS[i]. This
array is initialized to [0, . . . , 0].
When pi invokes need_help(i), it assigns a weak timestamp to TS[i] (line 1). It will
reset TS[i] to 0 only when it executes stop_help(i). Hence, TS[i] ≠ 0 means that pi
is competing inside the contention manager. After it has assigned a value to TS[i], pi
waits (loops) until the pair (TS[i], i) is the smallest pair (according to lexicographical
ordering) among the processes that (a) are competing inside the contention manager
and (b) are not locally suspected to have crashed (lines 2–4).
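A corresponding Java sketch follows, assuming weakTs() and suspected() oracles that return, respectively, a weak timestamp and the current local output of ♦P; both oracles are left abstract, and all names are illustrative.

import java.util.Set;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class WaitFreeCM {
    final int n;
    final AtomicLongArray ts;                 // TS[1..n], 0 = not competing

    WaitFreeCM(int n) { this.n = n; this.ts = new AtomicLongArray(n + 1); }

    void needHelp(int i, LongSupplier weakTs, Supplier<Set<Integer>> suspected) {
        if (ts.get(i) == 0) ts.set(i, weakTs.getAsLong());  // line 1
        long mine = ts.get(i);
        while (true) {                                      // lines 2-4
            boolean smallest = true;
            Set<Integer> susp = suspected.get();
            for (int j = 1; j <= n; j++) {                  // competing and
                long tj = ts.get(j);                        // not suspected
                if (j != i && tj != 0 && !susp.contains(j)
                        && (tj < mine || (tj == mine && j < i)))
                    smallest = false;          // (TS[j], j) precedes (TS[i], i)
            }
            if (smallest) return;              // pi has priority: proceed
        }
    }

    void stopHelp(int i) { ts.set(i, 0); }
}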
Theorem 20 The contention manager described in Fig. 5.16 transforms an enriched
obstruction-free implementation of an object into a wait-free implementation.
Proof The proof is similar to the proof of Theorem 19. Let us suppose (by con-
tradiction) that there is an operation invocation by a correct process pi that never
terminates. Let tsi be its timestamp (obtained at line 1). Moreover, let this invocation
be the one with the smallest pair ⟨tsi, i⟩ among all the invocations issued by correct
processes that never terminate.
It follows from the property of weak_ts() that any other process obtains a finite
number of timestamp values smaller than tsi, from which we conclude that there is
a finite number of operation invocations that are lexicographically ordered before
⟨tsi, i⟩. Let I be this set of invocations. There are two cases.
• If an invocation of I issued by a process pj that is not correct (i.e., a process
that will crash in the execution) does not terminate, it follows from the eventual
completeness property of ♦P that eventually j is forever suspected by pi (i.e., j remains forever in its
set suspected).

Fig. 5.16 A contention manager to boost obstruction-freedom to wait-freedom. (The code listing of the figure, lines 1–4, was lost in extraction.)



It then follows from the predicate tested by pi at lines 2–4 that there is a finite time
after which, whatever the value of the pair ⟨tsj, j⟩ attached to the invocation issued
by pj, j will never belong to the set competing repeatedly computed by pi. Hence,
these invocations cannot prevent pi from progressing.
• Let us now consider the invocations in I issued by correct processes. Due to
the definition of the pair ⟨tsi, i⟩ and pi, all these invocations terminate. Moreover,
due to the definition of I, any of these processes pj that invokes again an operation
obtains a pair ⟨tsj, j⟩ that is greater than the pair ⟨tsi, i⟩. Consequently,
the fact that j belongs or not to the set suspected of pi cannot prevent pi from
progressing.
To conclude the proof, as pi is correct, it follows from the eventual accuracy
property of ♦P that there is a finite time after which i never belongs to the set
suspectedk of any correct process pk.
Hence, there is a finite time after which, at any correct process pj, i ∉ suspectedj
and ⟨tsi, i⟩ is the smallest pair. As the number of processes is bounded, it follows
that, when this occurs, only pi can progress. □
On the design principles of contention managers As one can see, this contention
manager and the previous one are based on the same design principle. When a process
asks for help, a priority is given to some process so that it can proceed alone and
benefit from the obstruction-freedom property.
In the case of non-blocking, it is required that any one among the concurrent
processes progresses. This was obtained from ΩX, and the only additional under-
lying objects which are required are bounded atomic read/write registers. As any
invocation by a correct process has to terminate, the case of wait-freedom is more
demanding. This progress property is obtained from ♦P and unbounded atomic
read/write registers.

5.3.5 Mutex-Freedom Versus Loops Inside a Contention Manager Operation

In both previous contention managers, the operation need_help() contains a loop
that may prevent the invoking process from making progress. But this delay period
is always momentary and can never last forever.
is always momentary and can never last forever.
Said differently, this loop does not simulate an implicit lock. This is due to the
“eventual leadership” or “eventual accuracy” property of the underlying failure detec-
tor. Each of these “eventual” properties ensures that the processes that crash are
eventually eliminated from the predicate that controls the termination of the repeat
loop of the need_help() operation.

5.4 Summary

This chapter has introduced the notion of a mutex-free implementation and the asso-
ciated progress conditions, namely obstruction-freedom, non-blocking, and wait-
freedom.
To illustrate these notions, several mutex-free implementations of concurrent
objects have been described: wait-free splitter, obstruction-free counter, non-blocking
queue and stack based on compare&swap registers, and a wait-free stack based
on a fetch&add register and swap registers. Techniques based on failure detectors
have also been described that allow boosting of an obstruction-free implementa-
tion of a concurrent object to a non-blocking or wait-free implementation of that
object.

5.5 Bibliographic Notes

• The notion of wait-free implementation of an object is due to M. Herlihy [138].


• The notion of obstruction-free implementation is due to M. Herlihy, V. Luchangco,
and M. Moir [143].
A lot of obstruction-free, non-blocking, or wait-free implementations of queues,
stacks, and other objects were developed, e.g., in [135, 142, 182, 207, 210, 238,
266, 267].
The notions of obstruction-freedom, non-blocking, and wait-freedom are also ana-
lyzed and investigated in the following books [146, 262].
• The splitter-based obstruction-free implementation of a timestamping object
described in Fig. 5.5 is from [125].
The splitter object was informally introduced by L. Lamport in his fast mutual
exclusion algorithm [191], and given an “object” status by M. Moir and J. Anderson
in [209].
• The non-blocking queue based on compare&swap atomic registers is due to
M. Michael and M. Scott [205].
• The non-blocking stack based on compare&swap atomic registers is due to
N. Shafiei [253]. This paper presents also a proof of the stack algorithm and
an implementation of a non-blocking queue based on the same principles.
• The wait-free implementation of a stack presented in Sect. 5.2.6 based on a
fetch&add register and an unbounded array of swap registers is due to Y. Afek,
E. Gafni, and A. Morrison [5]. A formal definition of the linearization points of
the invocations of the push() and pop() operations can be found in that paper.
• A methodology for creating fast wait-free data structures is described in [179]. An
efficient implementation of a binary search tree is presented in [81].

• The use of failure detectors to boost obstruction-free object implementations


to obtain non-blocking or wait-free implementations is due to R. Guerraoui,
M. Kapałka, and P. Kuznetsov [125]. The authors show in that paper that ΩX and
♦P are the weakest failure detectors to boost obstruction-freedom to non-blocking
and wait-freedom, respectively. “Weakest” means here that the information on
failures given by each of these failure detectors is both necessary and sufficient
for boosting obstruction-freedom to non-blocking and to wait-freedom, respectively, when one is
interested in implementations based on read/write registers only.
• ΩX was simultaneously introduced in [125, 242]. ♦P was introduced in [67].
The concept of failure detectors is due to T.D. Chandra and S. Toueg [67]. An
introduction to failure detectors can be found in [235].
• The reader interested in progress conditions can consult [161, 164, 264], which
investigate the space of progress conditions from a computability point of view, and
[147] which analyzes progress conditions from a dependent/independent sched-
uler’s point of view.

5.6 Exercises and Problems

1. Prove that the concurrent queue implemented by Michael & Scott’s non-blocking
algorithm presented in Sect. 5.2.4 is an atomic object (i.e., its operations are
atomic).
Solution in [205].
2. The hardware-provided primitives LL(), SC() and VL() are defined in Sect. 6.3.2.
Modify Michael & Scott’s non-blocking algorithm to obtain an algorithm that
uses the operations LL(), SC(), and VL() instead of compare&swap().
3. A one-shot atomic test&set register R allows each process to invoke the operation
R.test&set() once. This operation is such that one of the invoking processes
obtains the value winner while the other invoking processes obtain the value
loser.
Let us consider an atomic swap() operation that can be used by two (statically
determined) processes only. Assuming that there are n processes, this means
that there is a half-matrix of registers MSWAP such that (a) MSWAP[i, j] and
MSWAP[j, i] denote the same atomic register, (b) this register can be accessed
only by pi and pj , and (c) their accesses are invocations of MSWAP[j, i].swap().
Design, in such a context, a wait-free algorithm that implements R.test&set().
Solutions in [13].

4. A double-compare/single-swap operation is denoted DC&SS().


It is a generalization of the compare&swap() operation which accesses atomically
two registers at the same time. It takes three values as parameters, and its effect can
be described as follows, where X and Y are the two atomic registers operated on.
operation (X, Y).DC&SS(old1, old2, new1):
prev ← X;
if (X = old1 ∧ Y = old2) then X ← new1 end if;
return(prev).

Design an algorithm implementing DC&SS() in a shared memory system that
provides the processes with atomic registers that can be accessed with read and
compare&swap() operations.
Solutions in [135]. (The interested reader will find more general constructions of
synchronization operations that atomically read and modify up to k registers in
[30, 37].)
5. Define the linearization points associated with the invocations of the push()
and pop() operations of the wait-free implementation of a stack presented in
Sect. 5.2.6. Prove then that this implementation is linearizable (i.e., the sequence
of operation invocations defined by these linearization points belongs to the
sequential specification of a stack).
Solution in [5].
6. Design a linearizable implementation of a queue (which can be accessed by any
number of processes) based on fetch&add and swap registers (or on fetch&add
and test&set registers). The invocations of enq() are required to be wait-free.
Each invocation deq() has to return a value (i.e., it has to loop when the queue is
empty). Hence, an invocation may not terminate if the queue remains empty. It
is also allowed not to terminate when always overtaken by other invocations of
deq().
Solution in [148]. (Such an implementation is close to the wait-free implementa-
tion of a stack described in Sect. 5.2.6.)
Chapter 6
Hybrid Concurrent Objects

This chapter focuses on hybrid implementations of concurrent objects. Roughly
speaking, “hybrid” means that lock-based code and mutex-free code can be merged in
the same implementation. After defining the notion of a hybrid implementation, this
chapter presents hybrid implementations of concurrent objects, where each imple-
mentation has its own features. The chapter presents also the notion of an abortable
object and shows how a starvation-free implementation of a concurrent object can be
systematically obtained from an abortable non-blocking version of the same object.

Keywords Abortable object · Binary consensus · Concurrent set object · Contention-
sensitive implementation · Double-ended queue · Hybrid (static versus dynamic)
implementation · LL/SC primitive operations · Linearization point · Non-blocking
to starvation-freedom.

6.1 The Notion of a Hybrid Implementation

A hybrid implementation of an object can be seen as an “impure” mutex-free imple-
mentation in the sense that it is allowed to use locks for a subset of the operations
or in specific concurrency patterns. Of course, a hybrid implementation is no longer
fully mutex-free. Two kinds of hybrid implementations can be distinguished: static
hybrid implementations and dynamic hybrid implementations (also called contention-sensitive
implementations).
As each object has its own internal representation R, both mutex-free operations
and lock-based operations on that object can concurrently access R. It follows that the
main difficulty in the design of hybrid implementations lies in the correct cooperation
between mutex-free code and lock-based code.


6.1.1 Lock-Based Versus Mutex-Free Operation: Static Hybrid Implementation

This family of hybrid implementations considers two types of implementations for
the operations associated with an object, namely operations whose implementation
uses locks and operations whose implementation is wait-free. In the extreme case
where all operations are lock-based, so is the implementation. Similarly, if all oper-
ations are mutex-free, so is the implementation. This type of hybrid implementation
is called static, as the operations which are allowed to use locks and those which are
not are statically determined.
Designing such hybrid implementations for concurrent objects is interesting
when (a) there are no failures (this constraint is due to the use of locks) and (b)
some operations are invoked much more than the others (the operations which
are invoked very often being designed with efficient wait-free algorithms). A sta-
tic hybrid implementation of a concurrent set object is presented in Sect. 6.2.
In this set object, the operation that checks if an element belongs to the set is
wait-free while the operation that adds an element to the set and the operation
that removes an element from the set (which are assumed to be infrequent) are
lock-based.

6.1.2 Contention Sensitive (or Dynamic Hybrid) Implementation

Contention sensitiveness is another notion of a hybrid implementation which captures
the notion of a dynamic hybrid implementation. It states that, while any algorithm
that implements an operation can potentially use locks, the overhead introduced by
locks has to be eliminated in “favorable circumstances”. Examples of “favorable
circumstances” are the parts of an execution without concurrency, or the parts of
an execution in which only processes with non-interfering operations access the
object.
Hence, contention sensitiveness of an implementation depends on what is defined
as “favorable circumstances” which in turn depends on the object that we want to
implement. This point will be developed in Sects. 6.3 and 6.4, where contention-
sensitive implementations of concurrent objects are presented.

6.1.3 The Case of Process Crashes

As already mentioned, if a process crashes while holding a lock, the processes that
invoke a lock-based operation on the same object can be blocked forever. Hence, locks
cannot cope with process crashes. This means that the implementations described in
this chapter tolerate process crashes in all executions in which no process crashes
while holding a lock.

6.2 A Static Hybrid Implementation of a Concurrent Set Object

This section presents a hybrid implementation of a concurrent set object due to
S. Heller, M. Herlihy, V. Luchangco, M. Moir, W. Scherer, and N. Shavit (2007).

6.2.1 Definition and Assumptions

Definition A concurrent set object S is defined by the three following operations:


• S.add(v) adds v to the set S and returns true if v was not in the set. Otherwise it
returns false.
• S.remove(v) suppresses v from S and returns true if v was in the set. Otherwise
it returns false.
• S.contain(v) returns true if v ∈ S and false otherwise.
Assumptions It is assumed that the elements that the set can contain belong to a
well-founded set. This means that they can be compared, have a smallest element,
have a greatest element, and that there is a finite number of elements between any
two elements.
Despite the fact that the implementation has to be correct for any pattern of oper-
ation invocations, it is assumed that the number of invocations of contain() is much
bigger than the number of invocations of add() and remove(). This application-related
assumption is usually satisfied when the set represents dictionary-like shared data
structures. This assumption motivates the fact that the implementation of contain()
is required to be wait-free while the implementations of add() and remove() are only
required to be deadlock-free.

6.2.2 Internal Representation and Operation Implementation

The internal representation of a set The set S is represented by a linked list pointed
to by a pointer kept in an atomic register HEAD. A cell of the list (say NEW_CELL)
is made up of four atomic registers:
• NEW_CELL.val, which contains a value (element of the set).
• NEW_CELL.out, a Boolean (initialized to false) that is set to true when the
corresponding element is suppressed from the list.
• NEW_CELL.lock, which is a lock used to ensure mutual exclusion (when needed)
on the registers composing the cell. This lock is accessed with the operations
acquire_lock() and release_lock().

Fig. 6.1 The initial state of the list

• NEW_CELL.next, which is a pointer to the next cell.
The set is organized as a sorted linked list. Initially the list is empty and contains
two sentinel cells, as indicated in Fig. 6.1. The values associated with these cells are
the default values denoted ⊥ and ⊤. These values cannot belong to the set and are
such that for any value v of the set we have ⊥ < v < ⊤. All operations are based on
list traversal.
The algorithm implementing the operation S.remove(v) This algorithm is des-
cribed in lines 1–9 of Fig. 6.2. Using the fact that the list is sorted in increasing order,
the invoking process pi traverses the list from the beginning until the first cell whose
element v′ is greater than or equal to v (lines 1–2). Then it locks two cells: the cell containing
the element v′ (which is pointed to by its local variable curr) and the immediately
preceding cell (which is pointed to by its local variable pred).
The list traversal and the locking of the two consecutive cells are asynchronous,
and other processes can concurrently access the list to add or remove elements. It is
consequently possible that there are synchronization conflicts that make the content
of pred and curr no longer valid. More specifically, the cell pointed to by pred or
curr could have been removed, or new cells could have been inserted between the
cells pointed to by pred and curr. Hence, before suppressing the cell containing
v (if any), pi checks that pred and curr are still valid. The Boolean procedure
validate(pred, curr) is used to this end (lines 10–11).
If the validation predicate is false, pi restarts the removal operation (line 9). This
is the price that has to be paid to have an optimistic removal operation (there is no
global locking of the whole list, which would prevent concurrent processes from
traversing the list). Let us remember that, as by assumption there are few invocations
of the remove() and add() operations, pi will eventually terminate its invocation.
If the validation predicate is satisfied, pi checks whether v belongs to the set or
not (Boolean pres, line 5). If v is present, it is suppressed from the set (line 6). This
is done in two steps, as sketched after the two items below:
• First the Boolean field out of the cell containing v is set to true. This is a logical
removal (logical because the pointers have not yet been modified to suppress the
cell from the list). This logical removal is denoted S1 in Fig. 6.3.
• Then, the physical removal occurs. The pointer (pred ↓).next is updated to its
new value, namely (curr ↓).next. This physical removal is denoted S2 in Fig. 6.3.
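A minimal Java sketch of these two steps follows, assuming that pred and curr have already been locked and validated; the Cell class is a hypothetical rendering of the cell structure described above.

final class Cell {
    int val;
    volatile boolean out;          // logical-removal mark
    volatile Cell next;
}

final class Removal {
    static void remove(Cell pred, Cell curr) {
        curr.out = true;           // S1: logical removal (mark the cell)
        pred.next = curr.next;     // S2: physical removal (unlink the cell)
    }
}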
The algorithm implementing the operation S.add(v) This algorithm is described
in lines 12–23 of Fig. 6.2. It is very close to the algorithm implementing the
remove(v) operation. Process pi first traverses the list until it reaches the cell whose
value field is greater than or equal to v (lines 12–13) and then locks the cell that precedes it
(line 14). Then, as previously, it checks if the values of its pointers pred and curr
are valid (line 14). If they are valid and v is not in the list, pi creates a new cell that
contains v and inserts it into the list (lines 17–20).

Fig. 6.2 Hybrid implementation of a concurrent set object. (The code listing of the figure was lost in extraction.)

It is interesting to observe that, as in the removal operation, the addition of a
new element v is done in two steps. The field NEW_CELL.next is first updated
(line 18). This is the logical addition (denoted S1 in Fig. 6.4). Only then is the field
(pred ↓).next updated to a value pointing to NEW_CELL (line 19). This is the
physical addition (denoted S2 in Fig. 6.4). (Let us notice that the lock associated
with the new cell is initialized to the value open.)

Fig. 6.3 The remove() operation

Fig. 6.4 The add() operation

Finally, pi releases the lock on the cell pointed to by its local pointer variable
ptr. It returns a Boolean value if the validation predicate was satisfied and restarts
if it was not.
The algorithm implementing the operation S.contain(v) This algorithm is des-
cribed in lines 24–26 of Fig. 6.2. As it does not use locks and cannot be delayed
by locks used by the add() and remove() operations, it is wait-free. It consists of
a simple traversal of the list. Let us remark that, during this traversal, the list does
not necessarily remain constant: cells can be added or removed, and so the values of
the pointers are not necessarily up to date when they are read by the process pi that
invoked S.contain(). Let us consider Fig. 6.5. It is possible that the pointer values
predi and curri of the current invocation of contain(v) by pi are as indicated in the
figure while all the cells between those containing a1 and b are removed (let us remark
that it is also possible that a new cell containing the value v is concurrently added).
The list traversal is the same as for the add() and remove() operations. The value
true is returned if and only if v is currently the value of the cell pointed to by curr
and this cell has not been logically removed. The algorithm relies on the fact that a
cell cannot be recycled as long as it is reachable from a global or local pointer. (In
contrast, cells that are no longer accessible can be recycled.)

Fig. 6.5 The contain() operation
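To make the traversal concrete, here is a minimal Java sketch of contain(), assuming the cell layout described above (val, out, next), integer elements, and the two sentinels rendered as Integer.MIN_VALUE and Integer.MAX_VALUE; all names are illustrative.

final class LazySet {
    static final class Cell {
        final int val;
        volatile boolean out;            // true once logically removed
        volatile Cell next;
        Cell(int val) { this.val = val; }
    }

    final Cell head = new Cell(Integer.MIN_VALUE);          // left sentinel

    LazySet() { head.next = new Cell(Integer.MAX_VALUE); }  // right sentinel

    boolean contain(int v) {             // no lock: a pure traversal
        Cell curr = head.next;
        while (curr.val < v) curr = curr.next;   // sorted: stop at val >= v
        return curr.val == v && !curr.out;       // present, not marked
    }
}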



6.2.3 Properties of the Implementation

Base properties The previous implementation of a concurrent set has the following
noteworthy features:
• The traversal of the list by an add()/remove() operation is wait-free (a cell locked
by an add()/remove() does not prevent another add()/remove() from progressing
until it locks a cell).
• Locks are used on at most two (consecutive) cells by an add()/remove() operation.
• Invocations of the add()/remove() operations on non-adjacent list entries do not
interfere, thereby favoring concurrency.
Linearization points Let us remember that the linearization point of an operation
invocation is a point of the time line such that the operation appears as if it had been
executed instantaneously at that time instant. This point must lie between the start
time and the end time of the operation.
The algorithm described in Fig. 6.2 provides the operations add(), remove(), and
contain() with the following linearization points. Let an operation be successful
(unsuccessful) if it returns true (false).
• remove() operation:
– The linearization point of a successful remove(v) operation is when it marks
the value v as being removed from the set, i.e., when it executes the statement
(curr ↓).out ← true (line 6).
– The linearization point of an unsuccessful remove(v) operation is when, during
its list traversal, it reads the first unmarked cell with a value v′ > v (line 2).
• add(v) operation:
– The linearization point of a successful add(v) operation is when it updates the
pointer (pred ↓).next which, from then on, points to the new cell (line 19).
– The linearization point of an unsuccessful add(v) operation is when it reads the
value kept in (curr ↓).val and that value is v (line 16).
• contain(v) operation:
– The linearization point of a successful contain(v) operation is when it checks
whether the value v kept in (curr ↓).val belongs to the set, i.e., (curr ↓).out
is then false (line 26).
– The linearization point of an unsuccessful contain(v) operation is more tricky
to define. This is due to the fact that (as discussed previously with the help of
Fig. 6.5), while contain(v) executes, an execution of add(v) or remove(v) can
proceed concurrently.
Let τ1 be the time at which a cell containing v is found but its field out is marked
true (line 26), or a cell containing v′ > v is found (line 25). Let τ2 be the time
just before the linearization point of a new operation add(v) that adds v to the
set (if there is no such add(v), let τ2 = +∞). The linearization point of an
unsuccessful contain(v) operation is min(τ1, τ2).
The proof that this object construction is correct consists in (a) showing that
the operation contain() is wait-free and the operations add() and remove() are
deadlock-free, and (b) showing that, given any execution, the previous linearization
points associated with the operation invocations define a trace that belongs to the
sequential specification of the set object.

6.3 Contention-Sensitive Implementations

This section presents contention-sensitive implementations of two concurrent objects.
The first object is a consensus object, while the second is a double-ended queue.

6.3.1 Contention-Sensitive Binary Consensus

Binary consensus object A consensus object provides a single operation denoted
propose(v), where v is the value proposed by the invoking process. In a binary
consensus object, only the values 0 and 1 can be proposed. An invocation of propose()
returns a value which is said to be the value “decided” by the invoking process. A
process can invoke the operation propose() at most once (hence, a consensus object
is a one-shot object). Moreover any number of processes can invoke this operation.
A process that invokes propose() is a participating process. The object is defined by
the following three properties:
• Validity. A decided value is a proposed value.
• Agreement. No two processes decide different values.
• Termination. Any invocation of propose() terminates.
Favorable circumstances Here “favorable circumstances” (with respect to the
contention-sensitiveness property of an implementation) concern two different cases.
The first is when all the participating processes propose the same value. The second
is when an invocation of propose() executes in a concurrency-free context. (Let us
remember that two invocations prop1 and prop2 are not concurrent if prop1 termi-
nated before prop2 started or prop2 terminated before prop1 started.)
When a favorable circumstance occurs, no lock has to be used. This means that an
invocation of propose(v) is allowed to use the underlying locks only if (a) the other
value (1 − v) was previously or is currently proposed, and (b) there are concurrent
invocations. Hence, from a lock point of view, the notion of conflict is related to both
concurrency and proposed values.

Internal representation of a consensus object Let C be a consensus object. Its
internal representation is made up of the following atomic read/write registers:
• PROPOSED[0..1], which is an array of two Boolean registers, both initialized to
false. The atomic register PROPOSED[v] is set to true to indicate that a process
has proposed value v.
• DECIDED, which is an atomic register whose domain is {⊥, 0, 1}. Initialized to
⊥, it is eventually set to the value that is decided and never the value which is not
decided.
• AUX, which is an atomic register whose domain and initial value are the same as
for DECIDED. It can contain any value of its domain.
• LOCK, which is the starvation-free lock used to solve conflicts (if any).
The algorithm implementing the operation propose(v) This algorithm is des-
cribed in Fig. 6.6. A process decides when it executes the statement return(val),
where val is the value it decides. This algorithm is due to G. Taubenfeld (2009).
When a process p invokes propose(v), it first indicates that the value v was
proposed and it writes v into AUX if this register is still equal to ⊥ (line 1). Let us
notice that, if several processes proposing different values concurrently read ⊥ from
AUX, each writes its proposed value in AUX.
Then, process p checks if the other binary value (1 − v) was proposed by another
process (line 2). If it is not the case, p writes v into DECIDED (line 3), and assuming
that no other process has written a different value into DECIDED in the meantime,
it decides the value stored in DECIDED (line 10). If the other value was proposed
there is a conflict. Process p then decides the value kept in DECIDED if there is one
(lines 4 and 10). If there is no decided value, the conflict is solved with the help of
the lock (lines 5–7). Process p assigns the current value of AUX to DECIDED if that
register was still equal to ⊥ when it read it (line 6), and p finally decides the value
kept in DECIDED.

Fig. 6.6 A contention-sensitive implementation of a binary consensus object. (The code listing of the figure was lost in extraction.)
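A minimal Java sketch of the operation follows, assuming a fair ReentrantLock approximates the starvation-free lock LOCK and the value -1 encodes ⊥; the line-number comments refer to the description above.

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class BinaryConsensus {
    final AtomicBoolean[] proposed = { new AtomicBoolean(), new AtomicBoolean() };
    final AtomicInteger aux = new AtomicInteger(-1);     // -1 plays bottom
    final AtomicInteger decided = new AtomicInteger(-1);
    final ReentrantLock lock = new ReentrantLock(true);  // fair, hence starvation-free

    int propose(int v) {                          // v is 0 or 1
        proposed[v].set(true);                    // line 1: announce v and
        if (aux.get() == -1) aux.set(v);          //   seed AUX if still bottom
        if (!proposed[1 - v].get()) {             // line 2: no conflict seen
            decided.set(v);                       // line 3: decide without the lock
        } else if (decided.get() == -1) {         // line 4: conflict, undecided
            lock.lock();                          // line 5
            try {
                if (decided.get() == -1)          // line 6: still undecided?
                    decided.set(aux.get());
            } finally { lock.unlock(); }          // line 7
        }
        return decided.get();                     // line 10: return the decision
    }
}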



Remark It is important to observe that this implementation of a binary consensus
object uses only bounded registers and is independent of the number of processes
that invoke the operation propose(v). The number of participating processes can con-
sequently be arbitrary (and even infinite).
Number of shared memory accesses An invocation which does not use the lock
requires at most six accesses to atomic registers. Otherwise, it needs at most eight
accesses to atomic registers plus two invocations of the lock. (One register access
can be saved by saving v in a local variable if line 3 is executed, or by saving the
value of DECIDED (when it is not ⊥) read at line 6 if that line is executed.)
Theorem 21 The algorithm described in Fig. 6.6 is a contention sensitive imple-
mentation of an atomic consensus object where “favorable circumstances” means
the cases where all the invocations propose the same value or each invocation is
executed in a concurrency free context.
Proof A simple examination of the code of the algorithm shows that
DECIDED ≠ ⊥ when a process decides at line 10. Moreover, the register DECIDED
can be assigned by a process only a proposed value (line 3) or the current value of
AUX (line 6), but in the latter case, AUX has necessarily been previously assigned
a proposed value at line 1 by the same or another process. The consensus validity
property follows.
The termination property follows immediately from the code if the invocation of
C.propose() does not use the lock, and from the fact that the lock is starvation-free
for the invocations which do use it.
As far as the agreement property is concerned, we show that a single value is written
to the register DECIDED (the same value can be written by several processes). We
consider the two cases.
• Case 1. A single value is written in AUX (line 1). Let v be that value.
As 1 − v is not written in AUX, we conclude that any process pj that proposes
1 − v reads v from AUX at line 1 and consequently reads PROPOSED[v] = true
at line 2. It follows that pj executes lines 4–8.
Let pi be any process that proposes v. If it finds PROPOSED[1 − v] = false
(line 2), it assigns v to DECIDED (Fig. 6.7). Otherwise (similarly to process pj)
pi executes lines 4–8. If DECIDED = ⊥ when either of pi or pj executes
line 6, it assigns the current value of AUX (i.e., v) to DECIDED. Moreover, the
processes that have proposed v and execute line 3 also assign v to DECIDED.
Hence, DECIDED is assigned the single value v, which proves the case.

Fig. 6.7 Proof of the contention sensitive consensus algorithm (a)



• Case 2. Both values v and 1 − v are written into AUX (line 1).
Let pi be a process that proposes v and reads ⊥ from AUX, and pj a process that
proposes 1 − v and reads ⊥ from AUX. As both pi and pj have read ⊥ from AUX,
we conclude that, at line 2, both pi and pj have read true from PROPOSED[1 − v]
and PROPOSED[v], respectively (Fig. 6.8). It follows that both of them execute
lines 4–8.
Let us now consider a process pk that proposes a value w and reads a non-⊥ value
from AUX. As it reads a non-⊥ value and both PROPOSED[0] and PROPOSED[1]
were equal to true when it read them, it follows that pk necessarily reads true from
PROPOSED[1 − w]. Hence, it executes lines 4–8.
It follows that all processes execute lines 4–8. The first process that acquires the
lock writes the current value of AUX into DECIDED, and that value becomes the
only decided value.
(Let us notice that, due to the arbitrary speed of processes, it is not possible to
predict if it is the first value written in AUX or the second one that will be the
decided value.)
Let us now show that the implementation satisfies the contention sensitiveness
property. We consider each case of “favorable circumstances” separately.
• Case 1: all participating processes propose the same value v.
In this case, PROPOSED[1 − v] remains forever equal to false. It follows that all
the processes that invoke C.propose() write v into the atomic register DECIDED
(line 3). Consequently none of the participating processes ever execute the lines
4–8, which proves the property.
• Case 2: the invocations of C.propose(v) are not concurrent.
Let us consider such an invocation. If it is the first one, it writes v into DECIDED
(line 3) and does not execute the lines 4–8, which proves the property. If other
invocations have been executed before this one, they have all terminated and at
least one of them has written a value into DECIDED (at line 3 or 6). Hence, the con-
sidered invocation C.propose(v) executes line 4, and as DECIDED ≠ ⊥, it does
not execute lines 5–8, which concludes the proof of the contention sensitiveness
property.

Fig. 6.8 Proof of the contention sensitive consensus algorithm (b)



Finally, as far as the atomicity property is concerned, the linearization point of an
invocation of C.propose(v) is defined as follows:
• If the invoking process executes line 3, the linearization point of C.propose(v) is
the linearization point of the underlying atomic write DECIDED ← v.
• If the invoking process executes line 4, the linearization point of C.propose(v)
depends on the predicate (DECIDED = ⊥) checked at line 6.
– If DECIDED ≠ ⊥, it is the linearization point of this read of DECIDED (which
returned a non-⊥ value).
– If DECIDED = ⊥, it is the linearization point of the write of a value w into
DECIDED (where w is the value previously obtained from the atomic register
AUX). □

6.3.2 A Contention Sensitive Non-blocking Double-Ended Queue

The double-ended queue A double-ended queue has two heads: one on its left side
and one on its right side. The head on one side is the last element of the queue seen
from the other side. Such an object has four operations:
• The operation right_enq(v) (or left_enq(v)) adds v to the queue such that v
becomes the last value on the right (or left) side of the queue.
• The operation right_deq() (or left_deq()) suppresses the last element at the right
(or left) of the queue. If the queue is empty, the operation returns the value empty.
A double-ended queue is defined by a sequential specification. This specification
contains all the correct sequences of invocations of all or a subset of its operations. It follows
that, in a concurrency context, a double-ended queue has to be an atomic object. A double-ended
queue is a powerful object that generalizes queues and stacks. More precisely, we
have the following (see Fig. 6.9, where the double-ended queue contains the list of
values a, b, c, d, e, f ):
• If either only the operations left_enq() and right_deq() or only the operations
right_enq() and left_deq() are used, the object is a queue.
• If either only the operations left_enq() and left_deq() or only the operations
right_enq() and right_deq() are used, the object is a stack.
Favorable circumstances The implementation that follows considers the following
notion of “favorable circumstances” from the contention sensitiveness point of view:
The operation invocations appear in a concurrency-free context. When this occurs,
such an operation invocation is not allowed to use locks.
Internal representation of a double-ended queue Let DQ be a double-ended
queue. Its internal representation is made up of the following objects:

• An infinite array Q[−∞..+∞] of atomic registers whose aim is to contain the
elements of the queue. Initially, all the registers Q[x] such that x < 0 are initialized
to a default control value denoted ⊥ℓ, and all the registers Q[x] such that x ≥ 0
are initialized to a default control value denoted ⊥r.
• LI and RI are two atomic read/write registers that point to the first free entry
when looking from the left side and the right side of the queue, respectively. LI is
initialized to −1, while RI is initialized to 0.
• L_LOCK and R_LOCK are two locks. L_LOCK is used to solve conflicts (if any)
between invocations of left_enq() and left_deq(). Similarly, R_LOCK is used to
solve conflicts (if any) between invocations of right_enq() and right_deq().
The following invariant is associated with the internal representation of the double-
ended queue:

∀ x, y : (x < y) ⇒ ( ((Q[y] = ⊥ℓ) ⇒ (Q[x] = ⊥ℓ)) ∧ ((Q[x] = ⊥r) ⇒ (Q[y] = ⊥r)) ).

Hence, at any time, the list of values which have been enqueued and not yet dequeued
is the list kept in the array Q[(LI + 1)..(RI − 1)]. In Fig. 6.9, the current value of the
double-ended queue is represented by the array Q[−2..3].

Atomic operations for accessing a register Q[x] An atomic register Q[x] can
be accessed by three atomic operations, denoted LL() (linked load), SC() (store
conditional) and VL() (validate). These operations are provided by the hardware,
and their effects are described by the algorithms of Fig. 6.10.
Let X be any register Q[x]. The description given in Fig. 6.10 assumes there are
n processes whose indexes are in {1, . . . , n}. It considers that a distinct Boolean
array VALID_X[1..n] is associated with each register X. This array is initialized to
[false, . . . , false].
An invocation of X.LL() (linked load) returns the current value of X and links
this read (issued by a process pi) by setting VALID_X[i] to true (line 1).
An invocation of X.SC(−, v) (store conditional) by a process pi is successful if
no process has written X since pi ’s last invocation of X.LL(). In that case, the write
is executed (line 2) and the value true is returned (line 4). If it is not successful, the
value false is returned (line 5). Moreover, if X.SC(−, v) is successful, all the entries

Fig. 6.9 A double-ended queue



Fig. 6.10 Definition of the atomic operations LL(), SC(), and VL() (code for process pi )

of the array VALID_X[1..n] are set to false (line 3) to prevent the processes that have
previously invoked X.LL() from having a successful X.SC().
An invocation of X.VL() (validate) by a process pi returns true if and only if
no other process has issued a successful X.SC() operation since the last X.LL()
invocation issued by pi .
It is important to notice that between an invocation of X.LL() and an invocation
of X.SC() or X.VL(), a process pi can execute any code at any speed (including
invocations of Y.LL(), Y.SC(), and Y.VL() where Y ≠ X).
LL/SC primitives appear in MIPS architectures. Variants of these atomic opera-
tions are proposed in some architectures such as Alpha AXP (under the names ldl_l
and stl_c), IBM PowerPC (under the names lwarx and stwcx), or ARM (under the
names ldrex and strex).
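To illustrate these definitions, the following Java sketch emulates the behavior described by Fig. 6.10. It is only a software rendering for experimentation: each operation is made atomic with a monitor, whereas the chapter assumes hardware-provided primitives; the class and method names are ours.

    // Software emulation of an LL/SC/VL register X with its Boolean
    // array VALID_X[1..n] (Fig. 6.10); "synchronized" stands in for the
    // atomicity that the hardware primitives provide.
    final class LLSCRegister {
        private int value;
        private final boolean[] valid; // VALID_X[1..n]; entry 0 is unused

        LLSCRegister(int initialValue, int n) {
            value = initialValue;
            valid = new boolean[n + 1];
        }

        synchronized int ll(int i) {            // X.LL() by process pi
            valid[i] = true;                    // link this read (line 1)
            return value;
        }

        synchronized boolean sc(int i, int v) { // X.SC(-, v) by process pi
            if (!valid[i]) return false;        // X written since pi's last LL (line 5)
            value = v;                          // the conditional write succeeds (line 2)
            java.util.Arrays.fill(valid, false); // invalidate all linked reads (line 3)
            return true;                        // (line 4)
        }

        synchronized boolean vl(int i) {        // X.VL() by process pi
            return valid[i];                    // still linked iff no successful SC since pi's LL
        }
    }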

The algorithm implementing the operation right_enq(v) This algorithm is described in Fig. 6.11. A process pi first saves the current value of RI in a local variable
my_index (line 1). Then, pi reads Q[my_index − 1] and Q[my_index] with the
LL() atomic operation in order to both obtain their values and link these reads to their
next (conditional) writes (line 2).
If these reads are such that Q[my_index − 1] ≠ ⊥r and Q[my_index] = ⊥r
(line 3), the right side of the queue (as defined by RI) has not been modified since pi
read RI. In that case, there is a chance that the right_enq(v) might succeed. Process pi
consequently checks if no process has modified Q[my_index − 1] since it read it. To
that end, pi executes Q[my_index − 1].SC( pr ev) (line 4). If this conditional write
(which does not change the value of Q[my_index −1]) is successful, pi executes the
conditional write Q[my_index].SC(v) to add the value v as the last element on the
right of the double-ended queue (line 5). If no process has modified Q[my_index]

Fig. 6.11 Implementation of the operations right_enq() and right_deq() of a double-ended queue

Fig. 6.12 How DQ.right_enq() enqueues a value

since it was read by pi at line 2, the write is successful and pi consequently increases
the right index RI and returns the control value done. This behavior, which entails
the enqueue of v on the right side, is described in Fig. 6.12.
If the previous invocations of LL() and SC() (issued at lines 2, 4, and 5) reveal
that the right side of the double-ended queue was modified, pi requests the lock in
order to solve conflicts among the invocations of DQ.right_enq() (line 9). It then
executes a loop in which it does the same as before. Lines 10–16 are exactly the
same as lines 1–7 except for the statement return() at line 5, which is replaced at
line 14 by term ← true to indicate that the value v was added to the right side of the
double-ended queue. When this occurs, the process pi releases the lock and returns
the control value done.
Let us consider an invocation of right_enq() (by a process p) which is about to
terminate. More precisely, p starts executing the statements in the then part at line
5 (or line 14). If other processes are concurrently executing right_enq(), they will
loop until p has updated the right index RI to RI + 1. This is due to the fact that p
modifies Q[RI] at line 5 (or line 14) before updating RI.
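The lock-free attempt just described (lines 1–8) can be sketched in Java as follows, reusing the LLSCRegister sketch given above. The bounded array standing in for Q[−∞..+∞], the sentinel values, and all helper names are ours; on failure, the caller would fall back to the lock-based part (lines 9–17), which is not shown.

    // Sketch of the fast path of right_enq(v) (lines 1-8 of Fig. 6.11).
    final class RightSideSketch {
        static final int BOT_L = Integer.MIN_VALUE;      // stands for ⊥ℓ
        static final int BOT_R = Integer.MIN_VALUE + 1;  // stands for ⊥r
        static final int SIZE = 1024, ORIGIN = SIZE / 2; // bounded stand-in for Q[-∞..+∞]

        final LLSCRegister[] q = new LLSCRegister[SIZE];
        final java.util.concurrent.atomic.AtomicInteger ri =
            new java.util.concurrent.atomic.AtomicInteger(ORIGIN); // RI

        RightSideSketch(int n) {
            for (int x = 0; x < SIZE; x++)
                q[x] = new LLSCRegister(x < ORIGIN ? BOT_L : BOT_R, n);
        }

        // Returns true if v was appended without the lock; false means
        // "retry under R_LOCK" (the lock-based part, not shown here).
        boolean tryRightEnq(int i, int v) {
            int myIndex = ri.get();               // line 1: read RI
            int prev = q[myIndex - 1].ll(i);      // line 2: linked reads
            int curr = q[myIndex].ll(i);
            if (prev != BOT_R && curr == BOT_R) { // line 3: right side unchanged?
                if (q[myIndex - 1].sc(i, prev)    // line 4: rewrite the same value
                    && q[myIndex].sc(i, v)) {     // line 5: append v
                    ri.set(myIndex + 1);          // line 6: move the right index
                    return true;
                }
            }
            return false;                         // compete for the lock instead
        }
    }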

The algorithm implementing the operation right_deq() This algorithm is described in Fig. 6.11. It follows the same principles as and is similar to the algorithm
implementing the operation right_enq(). The main differences are the following:
• The first difference is due to the fact that, while the double-ended queue is
unbounded, it can become empty. This occurs when (Q[RI − 1] = ⊥ℓ) ∧
(Q[RI] = ⊥r). This is checked by the use of the operations LL() and VL()
and the predicates of lines 22–23 (or lines 32–33). When this occurs, the control
value empty is returned (line 23 or 33).
• The second difference is that a data value has to be returned when the double-ended
queue is not empty. This is the value kept in Q[RI − 1], which was saved in the
local variable prev (line 21 or 31). Moreover, Q[RI − 1] has to be updated to ⊥r.
The update is done at line 25 or 35 by a successful invocation of Q[my_index − 1].SC(⊥r).

While the right_enq() operation issues SC() invocations first on Q[my_index − 1]
and then on Q[my_index] (lines 4–5 or lines 13–14), the right_deq() opera-
tion has to issue them in the opposite order, first on Q[my_index] and then
on Q[my_index − 1] (lines 24–25 or lines 34–35). This is due to the fact that
right_enq() writes (a value v) into Q[my_index] while right_deq() writes (⊥r)
into Q[my_index − 1].
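For completeness, here is a corresponding sketch of the lock-free attempt of right_deq(), reconstructed from the prose above and reusing the fields of the RightSideSketch class; as Fig. 6.11 is not reproduced here, the details and line references are our reconstruction and may differ slightly from the figure.

    // Sketch of the fast path of right_deq(); null means "retry under
    // R_LOCK", EMPTY is the control value returned on an empty queue.
    static final Integer EMPTY = Integer.MIN_VALUE + 2;

    Integer tryRightDeq(int i) {
        int myIndex = ri.get();                       // read RI
        int prev = q[myIndex - 1].ll(i);              // linked reads
        int curr = q[myIndex].ll(i);
        if (curr == BOT_R) {                          // right side looks unchanged
            if (prev == BOT_L) {                      // queue looks empty
                if (q[myIndex].vl(i)) return EMPTY;   // validate before answering
            } else if (prev != BOT_R
                       && q[myIndex].sc(i, curr)         // SC on Q[myIndex] first...
                       && q[myIndex - 1].sc(i, BOT_R)) { // ...then on Q[myIndex-1]
                ri.set(myIndex - 1);                  // move the right index back
                return prev;                          // the dequeued value
            }
        }
        return null;                                  // fall back to the lock-based part
    }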
The algorithms implementing the operations left_enq() and left_deq() These
algorithms are similar to the algorithms implementing the right_enq() and
right_deq() operations. The only modifications to be made to the previous algo-
rithms are the following: replace RI by LI, replace R_LOCK by L_LOCK, replace
each occurrence of ⊥r by ⊥ℓ, and replace the occurrence of ⊥ℓ at line 33 by ⊥r.
A left-side operation and a right-side operation can be concurrent and try to invoke
an atomic SC() operation on the same register Q[x]. In such a case, if one is unsuc-
cessful, it is because the other one was successful. More generally, the construction
is non-blocking.

6.4 The Notion of an Abortable Object

6.4.1 Concurrency-Abortable Object

Definition A concurrency-abortable (in short, abortable) object is an object such
that any invocation of any of its operations (a) returns after a bounded number of
steps (shared memory accesses) and (b) is allowed to return the default value ⊥ in
the presence of concurrency. (A similar notion was introduced in Sect. 2.1.6.) When an
invocation returns ⊥, we say that it aborted. When considering the object internal
representation level, an operation invocation that aborts is similar to an invocation
that has not been issued. Differently, an operation invoked in a concurrency-free
pattern never aborts and behaves as defined by the specification of the object.
Examples of invocations that occur in a concurrency-free pattern are described
in Fig. 6.13. There are three processes that have issued six operation invocations,
denoted inv1, . . . , inv6. The invocations inv1 and inv2 are concurrent, as are the
invocations inv4 and inv5. On the contrary, each of the invocations inv3 and inv6 is executed
in a concurrency-free pattern. For each of them, any other invocation either termi-
nated before it started or started after it terminated. (The notion of concurrency-free
pattern can be easily formally defined using the notion of a history introduced in
Chap. 4.)

Example: a non-blocking abortable stack Let us consider the non-blocking
implementation of a bounded stack presented in Fig. 5.10. A simple implementa-
tion of a stack that is both abortable and non-blocking can be easily obtained as
follows: instead of looping when a compare&swap() statement fails (i.e., returns
false), an operation returns ⊥. The corresponding implementation is described in
Fig. 6.13 Examples of concurrency-free patterns

Fig. 6.14, where the operations are denoted ab_push() and ab_pop(). The internal
representation of the stack is the same as the one defined in Sect. 5.2.5 with the fol-
lowing simple modification: in each operation, the loops are removed and replaced
by a return(⊥) statement. It is easy to see that this modification does not alter the
non-blocking property of the algorithm described in Fig. 5.10.

Fig. 6.14 A concurrency-abortable non-blocking stack
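As Fig. 5.10 is not reproduced here, the following Java sketch applies the same transformation to a simple linked (Treiber-style) stack instead of the bounded stack of Sect. 5.2.5: each retry loop around a failed compare&swap() is replaced by a single attempt that returns ⊥ (rendered below by the ABORT sentinel).

    import java.util.concurrent.atomic.AtomicReference;

    // A concurrency-abortable non-blocking stack in the spirit of
    // Fig. 6.14, but built on a linked stack (our substitution).
    final class AbortableStack {
        static final Object ABORT = new Object(); // stands for ⊥
        static final Object EMPTY = new Object(); // stack is empty

        private static final class Node {
            final Object val; final Node next;
            Node(Object val, Node next) { this.val = val; this.next = next; }
        }
        private final AtomicReference<Node> top = new AtomicReference<>();

        // ab_push(v): a single compare&swap attempt, no retry loop
        Object abPush(Object v) {
            Node t = top.get();
            return top.compareAndSet(t, new Node(v, t)) ? v : ABORT;
        }

        // ab_pop(): a single compare&swap attempt, no retry loop
        Object abPop() {
            Node t = top.get();
            if (t == null) return EMPTY;
            return top.compareAndSet(t, t.next) ? t.val : ABORT;
        }
    }

A compare&swap() on top can fail only if top was modified by a concurrent invocation; hence an invocation executed in a concurrency-free pattern never returns ABORT, as required by the definition.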



Abortion-related properties It is easy to see that (a) an invocation of the operation ab_push() or ab_pop() requires three or four shared memory accesses (three when
the stack is full or empty and four otherwise) and (b) an invocation of ab_push() or
ab_pop() that occurs in a concurrency-free pattern does not return ⊥.

6.4.2 From a Non-blocking Abortable Object to a Starvation-Free Object

This section describes a simple contention sensitive algorithm which transforms the
implementation of any non-blocking concurrency-abortable object into a starvation-
free implementation of the same object. This algorithm, which is based on a starvation-
free lock, is described in Fig. 6.15. (Let us remember that a simple algorithm which
builds a starvation-free lock from a deadlock-free lock was presented in Sect. 2.2.2.)

Favorable circumstances The “favorable circumstances” for this implementation of a starvation-free object occur each time an operation executes in a concurrency-free context. In such a case, the operation has to execute without using the underlying
free context. In such a case, the operation has to execute without using the underlying
lock.

Notation The algorithm is presented in Fig. 6.15. Let oper(par) denote any
operation of the considered object O and ab_oper(par) denote the corresponding
operation on its non-blocking concurrency-abortable version ABO. This means that,
when considering the stack object presented in the previous section, push() or pop()
denote the non-abortable counterparts of ab_push() or ab_pop(), respectively. It is
assumed that any invocation of an object operation oper(par) returns a value which
is different from the default value ⊥. As in the previous section, ⊥ can be returned
by invocations of ab_oper(par) only to indicate that they failed.

Fig. 6.15 From a concurrency-abortable object to a starvation-free object



Internal representation In addition to an underlying non-blocking abortable object ABO, the internal representation of the starvation-free object O is made up of the following objects:
• An atomic Boolean read/write register denoted CONTENTION which is initialized
to false. This Boolean is used to indicate that there is a process that has acquired
the lock and is invoking an underlying operation ABO.ab_oper(par).
• A starvation-free lock denoted LOCK. To insist on its starvation-free property, its
operations are denoted acquire_SF_lock() and release_SF_lock().
The transformation The algorithm is described in Fig. 6.15. When a process p
invokes oper(par), it first checks if there is contention (line 1).
If CONTENTION = true, there are concurrent invocations and p proceeds to line
4 to benefit from the lock. If CONTENTION = false, p invokes ABO.ab_oper(par).
As ABO is a concurrency-abortable object, this invocation (a) always terminates and
(b) always returns a non-⊥ value res if there is no concurrent invocation. If this is the
case, p returns that value res. On the contrary, if there are concurrent invocations,
ABO.ab_oper(par) may return ⊥. In that case, p proceeds to line 4 to compete for
the lock.
Lines 4–9 constitute the lock-based part of the algorithm. When it has obtained
the lock, a process first sets CONTENTION to true to indicate there is contention, and
loops invoking the underlying operation ABO.ab_oper(par) until it obtains a non-⊥
value (i.e., its invocation is not aborted). When this occurs, it resets CONTENTION
to false and releases the starvation-free lock.
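This transformation can be sketched in Java as follows; the abortable operation is modeled as a Supplier returning null for ⊥, and a fair ReentrantLock stands in for the starvation-free lock (the JDK fairness mode being the closest readily available analogue; names and line tags are ours).

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.locks.ReentrantLock;
    import java.util.function.Supplier;

    // A sketch of the transformation of Fig. 6.15.
    final class StarvationFreeWrapper {
        private final AtomicBoolean contention = new AtomicBoolean(false); // CONTENTION
        private final ReentrantLock lock = new ReentrantLock(true);        // starvation-free LOCK

        <R> R oper(Supplier<R> abOper) {        // abOper models ABO.ab_oper(par)
            if (!contention.get()) {            // line 1: is there contention?
                R res = abOper.get();           // line 2: lock-free attempt
                if (res != null) return res;    // line 3: not aborted, done
            }
            lock.lock();                        // line 4: compete for the lock
            try {
                contention.set(true);           // line 5: signal contention
                R res;
                do { res = abOper.get(); }      // lines 6-7: retry until the
                while (res == null);            //   invocation is not aborted
                contention.set(false);          // line 8
                return res;
            } finally {
                lock.unlock();                  // line 9: release the lock
            }
        }
    }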

Number of shared memory accesses It is easy to see that, in a concurrency-free context, an operation invocation requires a single shared memory access (to
the register CONTENTION) in addition to the shared memory accesses involved
in ABO.ab_oper(par). When the underlying object ABO is the stack described in
the previous section, each invocation of ABO.ab_push() (or ABO.ab_pop())
requires three shared memory accesses if the stack is full (or empty), and four shared
memory accesses otherwise.
In a concurrency context, an operation requires at most three accesses to the
register CONTENTION, two accesses to the lock, and a finite but arbitrary number
of accesses to the base objects which implement the underlying abortable object
ABO.
Theorem 22 Let us assume that the underlying lock is starvation-free and that the
number of processes is bounded. Given a non-blocking abortable object ABO, the
transformation described in Fig. 6.15 is a contention sensitive implementation of a
starvation-free object O where “favorable circumstances” means each operation
invocation is executed in a concurrency-free context. Moreover, if the underlying
object ABO is atomic, so is the constructed object O.
Proof Let us first consider the contention sensitiveness property. When an invoca-
tion of oper() executes in a concurrency-free context, the Boolean CONTENTION is
equal to false when the operation starts and consequently oper() invokes
ABO.ab_oper() at line 2. As ABO is a concurrency-abortable object and there are
no concurrent operation invocations, the invocation of ab_oper() does not abort. It
follows that this invocation of oper() returns at line 2, which proves the property.
Let us now show that the implementation is starvation-free, i.e., that any invo-
cation of any operation oper() terminates. To this end, given an invocation inv_op_p
of an operation oper() issued by a process p, we have to show that there is eventu-
ally an underlying invocation of ABO.ab_oper() invoked by inv_op_p that does not
return ⊥.
Let us first observe that, as ABO is a concurrency-abortable object, any invoca-
tion of ABO.ab_oper() terminates (returning ⊥ or another value). If the underlying
invocation ABO.ab_oper() issued at line 2 returns a non-⊥ value, inv_op_p does
terminate. If the underlying invocation ABO.ab_oper() returns ⊥, or if the Boolean
CONTENTION was equal to true when p executed line 1, p tries to acquire the lock
(line 4).
Among the processes that compete for the lock, let q be the process which
has obtained and not yet released the lock. It repeatedly invokes some operation
ABO.ab_oper_q() until it obtains a non-⊥ value. It is possible that other processes exe-
cute, concurrently with q, some underlying operations ABO.ab_oper_1(),
ABO.ab_oper_2(), etc. This happens if these processes have found CONTENTION =
false at line 1 (which means that they have read CONTENTION before it was written
by q). Hence, in the worst case, there are (n − 2) other processes executing operations
on ABO concurrently with q (all the processes but p and q). As ABO is non-blocking,
one of them returns a non-⊥ value and the corresponding process terminates its invo-
cation of an operation on the underlying object ABO. If this process is q, we are done.
If it is not q and it invokes again an operation oper(), it is directed to the lock
because now CONTENTION = true.
Hence, there are now at most (n − 3) processes executing operations on ABO
concurrently with q. It follows that, if q has not obtained a non-⊥ value before, it
eventually executes ABO.ab_oper_q() in a concurrency-free context. It then obtains
a non-⊥ value and releases the lock.
As the lock is starvation-free, it follows that p eventually obtains it. Then, replac-
ing q by p in the previous reasoning, it follows that p eventually obtains a non-⊥
value from an invocation of ABO.ab_oper() and accordingly terminates its upper-
layer invocation of the operation oper().
The proof of atomicity follows from the following definition of the linearization
points associated with the invocations of the underlying object ABO. Given an invo-
cation of an operation O.oper(), let us consider its last invocation of ABO.ab_oper()
(that invocation returned a non-⊥ value). The linearization point of oper() is the
linearization point of this underlying invocation. □

6.5 Summary

This chapter has introduced the notion of a hybrid implementation of concurrent objects and the notion of an abortable concurrent object.
Hybrid implementations are partly mutex-free and partly lock-based. Two kinds
of hybridism have been presented: a static hybrid implementation considers mutex-
free operations and lock-based operations, while a dynamic hybrid implementation
is related to the current concurrency pattern (locks do not have to be used in “favorable
circumstances”). These notions have been illustrated by presenting a static hybrid
implementation of a concurrent set object, and dynamic hybrid implementations of a
consensus object and a double-ended queue. Interestingly, the notion of LL/SC base
operations was introduced to implement the queue.
An abortable object is an object whose operation invocations can be aborted in
the presence of concurrency. This notion has been illustrated with a non-blocking
implementation of an abortable stack. Finally, it has been shown how a non-blocking
abortable object can be transformed into a starvation-free object as soon as one can
use a starvation-free lock.

6.6 Bibliographic Notes

• Without giving it the name “static hybrid”, the notion of static hybrid implemen-
tation of a concurrent object was implicitly introduced by S. Heller, M. Herlihy,
V. Luchangco, M. Moir, W. Scherer, and N. Shavit in [137].
The implementation of a concurrent set object described in Sect. 6.2 is due to
the same authors [137]. This implementation was formally proved correct in [78].
• The notion of contention sensitive implementation is due to G. Taubenfeld [263].
The contention sensitive implementations of a binary consensus object and of a
double-ended queue are due to G. Taubenfeld [263]. The second of these imple-
mentations is an adaptation of an implementation of a double-ended queue based
on compare&swap() proposed by M. Herlihy, V. Luchangco, and M. Moir in
[143] (the notion of obstruction-freedom is also introduced in this paper).
• The notion of concurrency-abortable implementation used in this chapter is from
[214] where the methodology to go from a non-blocking abortable implementation
to a starvation-free implementation of an object is presented. This methodology
relies on a general approach introduced by G. Taubenfeld in [263].
• It is important to insist on the fact that the notion of “abortable object” used in this
chapter is different from the one used in [16] (where an operation that returns ⊥
may or may not have been executed).

6.7 Exercises and Problems

1. Compare the notions of obstruction-freedom and non-blocking with the notion of an abortable implementation. Are they equivalent? Are they incomparable? etc.
2. Design a contention sensitive implementation of a multi-valued consensus object
where the number of different values that can be proposed is bounded and this
bound b is known to the processes.
3. Considering only the successful invocations (i.e., the ones which do not return
⊥), prove that the implementation of a non-blocking abortable stack described in
Fig. 6.14 implements an atomic stack.
4. An implementation is k-contention sensitive if it uses locks only when the con-
currency level surpasses k (i.e., when more than k operation invocations execute
concurrently).
Let us consider a system where the processes cooperate by accessing atomic
read/write registers and atomic registers that can also be accessed by a swap()
operation. (Let us remember that X.swap(local), where X is an atomic register
and local a variable of the invoking process, exchanges atomically the content of
X and local.)
Considering such a system, design a 2-contention sensitive implementation of
a binary consensus object. Such an implementation can be obtained by appro-
priate modifications of the algorithm presented in Fig. 6.6. Prove then that your
implementation is correct.
Solution in [263].
Chapter 7
Wait-Free Objects from Read/Write Registers Only

The two previous chapters were on the implementation of concurrent atomic objects
(such as queues and stacks). More precisely, the aim of Chap. 5 was to introduce
and illustrate the notion of a mutex-free implementation and associated progress
conditions, namely obstruction-freedom, non-blocking and wait-freedom. The aim
of Chap. 6 was to introduce and investigate the notion of a hybrid implementation. In
both cases, the internal representation of the high-level object that was constructed
was based on atomic read/write registers and more sophisticated registers accessed
by stronger hardware-provided operations such as compare&swap(), fetch&add(),
or swap().
This chapter and the two following ones address another dimension when one is
interested in building wait-free implementations of concurrent objects, namely the
case where the only base objects that can be used are atomic read/write registers.
Hence, these chapters investigate the power of base read/write registers to construct
wait-free implementations. This chapter is on the wait-free implementation of weak
counters and store-collect objects, while Chap. 8 addresses snapshot objects, and
Chap. 9 focuses on renaming objects.
As we are concerned with wait-free implementations, let us remember that it is
assumed that any number of processes may crash. Let us also remember that, as far
as terminology is concerned, a process is correct in a run if it does not crash in that
run; otherwise, it is faulty.

Keywords Adaptive implementation · Fast store-collect · Favorable circumstances · Infinitely many processes · Store-collect object · Weak counter

7.1 A Wait-Free Weak Counter for Infinitely Many Processes

This section has two aims: to present a wait-free implementation of a weak counter
object and to show how to cope with an unknown and arbitrarily large number of
processes. To that end, it first presents a very simple implementation of a (non-weak)

counter and then focuses on the wait-free implementation of a weak counter that can
be accessed by infinitely many processes.

7.1.1 A Simple Counter Object

A shared counter C is a concurrent object that has an integer value (initially 0) and
provides the processes with two operations denoted increment() and get_count().
Informally, the operation increment() increases the value of the counter by 1, while
the operation get_count() returns its current value. In a more precise way, the behav-
ior of a counter is defined by the three following properties:
• Liveness. Any invocation of increment() or get_count() by a correct process ter-
minates.
• Monotonicity. Let gt1 and gt2 be two invocations of get_count() such that gt1
returns c1, gt2 returns c2, and gt1 terminates before gt2 starts. Then, c1 ≤ c2.
• Freshness. Let gt be an invocation of get_count() and c the value it returns. Let ca
be the number of invocations of increment() that have terminated before gt starts
and cb be the number of invocations of increment() that have started before gt
terminates. Then, ca ≤ c ≤ cb .
The liveness property expresses that the implementation has to be wait-free.
Monotonicity and freshness are the safety properties which give meaning to the
object; namely, they define the domain of the value returned by a get_count() invo-
cation. As we will see in the proof of Theorem 23, the previous behavior can be
defined by a sequential specification.
A simple implementation A concurrent counter can be easily built as soon as
the number of processes n is known and the system provides one atomic SWMR
read/write register per process. More precisely, let REG[1..n] be an array of atomic
registers initialized to 0, such that, for any i, REG[i] can be read by any process but
is written only by pi .
The algorithms implementing the operations increment() and get_count() are trivial (Fig. 7.1). The invocation of increment() by pi consists in adding 1 to REG[i]
(local_ct is a local variable of pi , initialized to 0). The invocation of get_count()
consists in reading (in any order) and summing up the values of all the entries of the
array REG[1..n].
Theorem 23 The algorithms described in Fig. 7.1 are a wait-free implementation
of an atomic counter object.
Proof The fact that the operations are wait-free follows directly from their code.
The proof that the construction provides an atomic counter is based on the atomicity
of the underlying base registers. Let us associate a linearization point with each
invocation as follows:

operation increment() is
local_ct ← local_ct + 1; REG[i] ← local_ct; return()
end operation.

operation get_count() is
res ← 0;
for j ∈ {1, . . . , n} do res ← res + REG[j] end for;
return(res)
end operation.

Fig. 7.1 A simple wait-free counter for n processes (code for pi)
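A runnable Java counterpart of these two algorithms can be written with one single-writer entry per process (the class below and its use of AtomicLongArray are our rendering):

    import java.util.concurrent.atomic.AtomicLongArray;

    // The wait-free counter of Fig. 7.1: REG[1..n], one SWMR entry per process.
    final class SimpleCounter {
        private final AtomicLongArray reg; // entry 0 is unused

        SimpleCounter(int n) { reg = new AtomicLongArray(n + 1); }

        void increment(int i) { // code for process pi
            // pi is the only writer of reg[i], so read-then-write is safe
            reg.set(i, reg.get(i) + 1);
        }

        long getCount() { // read (in any order) and sum all the entries
            long sum = 0;
            for (int j = 1; j < reg.length(); j++) sum += reg.get(j);
            return sum;
        }
    }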

• The linearization point associated with an invocation issued by a process pi of the operation increment() is the linearization point of the underlying write operation of
the underlying SWMR atomic register REG[i]. (If the invoking process pi crashes
before this write, it is as if the invocation had never been issued.)
• Let gt() be an invocation of the operation get_count() and c the value it returns.
This invocation gt() is linearized at the time of the read of the underlying register
REG[x] such that the value c is attained (i.e., the sum of the values obtained from the
registers read before REG[x] was strictly smaller than c, and the registers read after
it did not change the value of the running sum).
If several invocations return the same value c, they are ordered according to their
start events (let us remember, see Part II, that no two operations are assumed to
start simultaneously).
According to this definition of the linearization points, and the fact that no under-
lying atomic read/write register REG[x] ever decreases while REG[x] increases each
time px writes into it when executing the operation increment(), it is easy to
conclude that (1) if two invocations of get_count() are sequential, the second one
cannot return a value strictly smaller than the first one (monotonicity property), and
(2) no invocation gt of the operation get_count() can return a value strictly smaller
than the number of invocations of increment() that have terminated before gt started
or strictly greater than the number of invocations of increment() that have started
before gt terminates (freshness property). □

7.1.2 Weak Counter Object for Infinitely Many Processes

Infinitely many processes This section focuses on dynamic systems where each
run can have an unknown, arbitrarily large, and possibly infinite number of processes.
The only constraint is that in each finite time interval only finitely many processes
execute operations. Each process pi has an identity i, and it is common knowledge
that no two processes have the same identity.

Differently from the static model where there are n processes p1 , . . . , pn , each
process knowing n and the whole set of identities, now the identities of the processes
that are in the system are not necessarily consecutive, and no process has a priori
knowledge of which other processes can execute operations concurrently with it.
(Intuitively, this means that a process can “enter” or “leave” the system at any time.)
Moreover, no process is provided with an upper bound n on their number, which
could be used by the algorithms (as, for example, in the previous algorithm, where
the operation get_count() scans the whole array REG[1..n]). This model, called the
finite concurrency model, captures existing physical systems where the only source
of “infiniteness” is the passage of time.
It is important to see that the algorithms designed for this computation model have
to be inherently wait-free, as they have to guarantee progress even if new processes
keep on arriving: the progress of pending operations cannot be delayed indefinitely
by these continuous arrivals.
Helping mechanism A basic principle when designing algorithms suited to the
previous dynamic model with finite concurrency consists in using a helping mecha-
nism. More generally, such mechanisms are central when one has to design wait-free
implementations of concurrent objects.
More precisely, ensuring the wait-freedom property despite the fact that infinitely
many processes can be involved in an algorithm requires a process to help other
processes terminate their operations. This strategy prevents slow processes from
never terminating despite the continuous arrival of new processes. This will clearly
appear in the weak counter algorithms described below.
Weak counter: definition A weak counter is a counter whose increment() and
get_count() operations satisfy the liveness and monotonicity properties of a classical
counter (as defined previously), plus the following property (which replaces the
previous freshness property):
• Weak increment. Let gt1 and gt2 be two invocations of the get_count() opera-
tion that return c1 and c2, respectively. Let incr be an invocation of increment()
that (a) has started after gt1 has terminated (i.e., res[gt1] <_H inv[incr], using
the notation defined in Chap. 4), and (b) has terminated before gt2 has started
(i.e., res[incr] <_H inv[gt2]). We then have c1 < c2.
With a classical counter, each invocation of the increment() operation, be it con-
current with other invocations or not, results in adding 1 to the value of the counter (if
the invoking process does not crash before updating the SWMR register it is associ-
ated with). The way the counter increases is different for a weak counter. Let k be the