Automated Atomicity-Violation Fixing
Automated Atomicity-Violation Fixing
Abstract
Thread 1 Thread 2
Fixing software bugs has always been an important and time-
consuming process in software development. Fixing concurrency 1 void buf write(. . .) { void buf write(. . .) { 1
bugs has become especially critical in the multicore era. How- 2 int tmp = buf len + str len; int tmp = buf len + str len; 2
ever, fixing concurrency bugs is challenging, in part due to non- 3 if (tmp > MAX) if (tmp > MAX) 3
deterministic failures and tricky parallel reasoning. Beyond correctly 4 return; return; 4
fixing the original problem in the software, a good patch should also 5 5
avoid introducing new bugs, degrading performance unnecessarily, 6 memcpy(buf[buf len], memcpy(buf[buf len], 6
or damaging software readability. Existing tools cannot automate 7 str, str len); str, str len); 7
the whole fixing process and provide good-quality patches. 8 buf len = tmp; buf len = tmp; 8
We present AFix, a tool that automates the whole process 9 } } 9
of fixing one common type of concurrency bug: single-variable
atomicity violations. AFix starts from the bug reports of existing bug- Figure 1. Real-world concurrency bug from Apache. Interleaving
detection tools. It augments these with static analysis to construct “ ” could cause crash. Interleaving “ ” could corrupt the log.
a suitable patch for each bug report. It further tries to combine the
patches of multiple bugs for better performance and code readability.
Finally, AFix’s run-time component provides testing customized
for each patch. Our evaluation shows that patches automatically 1. Introduction
generated by AFix correctly eliminate six out of eight real-world
1.1 Motivation
bugs and significantly decrease the failure probability in the other
two cases. AFix patches never introduce new bugs and usually have Bug fixing is an indispensable part of software development. It
similar performance to manually-designed patches. requires developers to understand a bug’s root cause, design a patch,
implement the patch, and finally validate the patch. This process
Categories and Subject Descriptors D.1.3 [Programming Tech- consumes a huge amount of resources, especially manual effort,
niques]: Concurrent Programming; D.2.5 [Software Engineering]: during software development. Krebs [15] finds that it frequently
Testing and Debugging; D.4.1 [Operating Systems]: Process Man- takes more than one month to finish the fixing process for one bug.
agement Meanwhile, there are endless bugs waiting to be fixed. Software
companies such as Microsoft face pressure to release patches
General Terms Algorithms, Experimentation, Languages, Mea- monthly or even more frequently [24]. Furthermore, patches are
surement, Performance, Reliability, Verification error-prone. Even after consuming so many development resources,
nearly 70% of patches are buggy in their first release [7, 31, 33],
Keywords atomicity violations, automated debugging, concur- often at great financial cost [27]. Self-healing software that fixes its
rency, critical regions, deadlock, mutex locks, mutual exclusion, own bugs has long been desired but is yet unrealized [11]. The need
patching, static analysis for automatic repair techniques remains for concurrency bugs.
Concurrency bugs are synchronization mistakes in multithreaded
∗ Supported in part by AFOSR grants FA9550-07-1-0210 and FA9550-
programs. They are widespread due to software developers’ se-
09-1-0279; DoE contract DE-SC0002153; LLNL contract B580360; NSF quential thinking habits. Concurrency bugs have already caused
grants CCF-0621487, CCF-0701957, CCF-0953478, CCF-1018180, and real-world disasters [29]. In the current multi-core era, with mul-
CNS-0720565; and a Claire Boothe Luce faculty fellowship. Any opinions, tithreaded software becoming pervasive, concurrency bugs are a
findings, and conclusions or recommendations expressed in this material are growing threat to software reliability.
those of the authors and do not necessarily reflect the views of NSF or other Many advanced techniques have been proposed for bug detection,
institutions. software testing, and verification to help identify concurrency bugs
[9, 25, 37]. However, software reliability does not improve until
detected bugs are actually fixed. Unfortunately, fixing concurrency
bugs is not trivial and developers are left to themselves to face the
enormous pressure of fixing ever-so-many concurrency bugs.
Permission to make digital or hard copies of all or part of this work for personal or Figure 1 shows an example of a real-world concurrency bug. Ex-
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
isting bug detectors [5, 25, 37] can accurately report two problematic
on the first page. To copy otherwise, to republish, to post on servers or to redistribute atomicity violations: (1) when lines 2 and 6 are interleaved ( )
to lists, requires prior specific permission and/or a fee. by line 8, Apache could crash; and (2) when lines 6 and 8 are inter-
PLDI’11, June 4–8, 2011, San Jose, California, USA. leaved ( ) by line 8, Apache could corrupt its log. Unfortunately,
Copyright c 2011 ACM 978-1-4503-0663-8/11/06. . . $10.00 even with accurate bug detection, bug fixing is nontrivial:
389
• To fix the first report, if we simply lock before line 2 and unlock The fourth step is patch testing. AFix conducts testing and run-
after line 6, the program could deadlock after buf write exits at time analysis customized for each patch. Testing checks whether the
line 4. original problem has been fixed and looks for new problems. The
• To fix the second report, we should not simply put lines 6–8 and low-overhead run-time analysis targets correctness and performance
line 8 each into one critical region. As line 8 is part of the critical properties that cannot be statically guaranteed or optimized in
region for lines 6–8, this would lead to deadlock again. preceding steps. It provides the developers an option to compare
and refine patches. We discuss this step in detail in Section 5.
• Two patches that separately fix the above two atomicity viola- Overall, this paper makes the following contributions:
tions could deadlock with each other: one thread acquires the
first patch’s lock at line 2 and waits for the second patch’s lock • AFix makes a first step in automating the whole process of
before line 6; a different thread acquires the second patch’s lock fixing one common type of concurrency bug. AFix can save
at line 6 and waits for the first patch’s lock before line 8. developers’ manual bug fixing effort by automatically generating
patches or patch candidates for concurrency bugs detected
Generally speaking, developers face several unique challenges to during in-house testing or for concurrency failures discovered
generate good patches for concurrency bugs. First, concurrency bugs during production runs. AFix can also help address some tough
pose unique challenges in understanding their root causes. Their challenges that usually bother developers. For example, AFix
non-determinism makes manual inspection difficult. In addition, can avoid generating buggy patches. AFix’s run-time analysis
understanding their root causes demands non-local and parallel also helps developers conduct customized patch testing and
thinking. Even with the help of tools for bug detection [4, 12], evaluation.
failure diagnosis [26], and failure replay [39, 42], it is still non-
trivial to understand how to fix a bug’s root cause. As a result, it is • Our experience of applying AFix discovers several limitations
common that patches released by developers fail to (completely) fix of state-of-art bug detectors regarding helping bug diagnosis
the original concurrency bug. This happened for two of the eight and bug fixing, such as not grouping bugs with the same root
real-world concurrency bugs used in our experimental evaluation. cause together and reporting incorrect root causes. We expect
Second, patches to concurrency bugs often involve synchroniza- this experience to help guide future research in bug detection.
tion operations that have non-local impact. As a result, these patches • We evaluate AFix on eight real-world concurrency bugs, with
can easily introduce additional bugs, such as deadlocks and data promising results. AFix correctly fixes six out of eight reported
races. As a notorious example, the patch to Mozilla concurrency bugs. For the remaining two bugs, where the root causes were
bug #54743 led to new concurrency bugs in the field. Patches for incorrectly reported by CTrigger, AFix patches significantly
the latter caused further problems, which took more than one year decrease failure rates. AFix patches also have good readability
to finally fix [14]. based on our manual inspection and do not introduce any new
Third, patches to concurrency bugs often constrain program bugs. In comparison, naı̈ve patches cannot fix any of these bugs,
interleavings or introduce serialization bottlenecks, which could and cause deadlocks in seven out of eight cases. Even the original
cause unexpected and unnecessary performance degradation. developers made mistakes in fixing two of these eight bugs. AFix
Because of these challenges, long repair times and wrong patches patches also show good performance. Software patched by AFix
are common for concurrency bugs [20]. Future programming lan- is less than 1% slower than the original buggy software for all
guages may eventually help developers avoid some of these concur- but one case.
rency bugs. For now, though, software developers are in great need
of immediate support for automatic repair of concurrency bugs.
2. Background: Bug Detection Using CTrigger
1.2 Contributions AFix is a bug fixer, not a bug finder. It depends on problem reports
In this paper, we build a system, AFix, that automates the whole from existing bug-detection tools to guide its repairs. The high-level
process of fixing one common type of concurrency bug: single- ideas in AFix are general to all atomicity-violation detectors, but
variable atomicity violations [9, 19, 20, 25, 37]. AFix leverages the lower-level details are tuned to the specific bug finder used. This
existing techniques for bug detection and interleaving testing to section briefly describes CTrigger: the specific concurrency-bug
bootstrap its fixing process. It uses static analysis and static code detector and tester used by our current AFix implementation.
transformation to automatically design and implement code patches.
It further incorporates run-time monitoring to help developers 2.1 CTrigger General Operation
validate and evaluate each patch it generates. CTrigger is an in-house concurrency-bug detection and testing
The design of AFix focuses on the challenges encountered by framework [25] that targets single-variable atomicity-violation bugs.
developers in their manual patching process. Specifically, AFix As shown in Figure 2, when two consecutive accesses that read or
tries to fix the original bug without introducing new functionality write the same shared variable from the same thread are interleaved
problems, degrading performance excessively, or harming code by another access from a different thread, their execution effect
readability. may be different from any serial execution, in which case a single-
Guided by these goals, AFix automates a developer’s typical bug variable atomicity-violation bug has occurred. An empirical study
fixing process. The first step is bug understanding. AFix discovers by Lu et al. [20] finds that single-variable atomicity-violation bugs
single-variable atomicity violations using CTrigger [25], an existing are one of the most common types of concurrency bug in real code.
bug-detection and testing tool which we review further in Section 2. CTrigger includes a detection phase and a testing phase. Dur-
The second step is patching one bug. AFix maps the dynamically- ing its detection phase, CTrigger monitors a few executions of the
based bug report to static code structures. It conducts static analysis concurrent program, predicts what single-variable atomicity viola-
and static code transformation to fix one problem. Correctness issues tions could happen in the future under the same input, and outputs
are thoroughly analyzed in this step, which we discuss in Section 3. the instruction counters for the three instructions involved in each
The third step is patch merging and optimization for a set of atomicity violation. These three instructions are referred to as p
bugs. AFix collects the patches for each bug report together and (preceding), c (current), and r (remote), as shown in Figure 2. In its
statically identifies patches that can be merged or optimized for testing phase, CTrigger tries to force each atomicity violation iden-
better performance or readability. This step is addressed in Section 4. tified above by injecting delays at selected points. If an atomicity
390
Thread 1 Thread 2 tion counters for (p, c, r), which again is problematic in bug fixing,
especially when p and c are inside different functions (discussed in
p : read x Section 3). Therefore, we modified CTrigger to report the complete
call stack for each p, c, or r.
r : write x AFix can be no wiser than the bug detector which drives its
fixes. Our CTrigger-based implementation of AFix, then, is affected
c : read x by a few other features of CTrigger. For the most part these are
not peculiar to CTrigger, though, but rather are shared with other
concurrency-bug detection and testing tools.
First, separate CTrigger bugs sometimes should not be fixed by
separate patches. We discuss this further in Section 4.
Thread 1 Thread 2 Second, CTrigger might detect an atomicity violation that is a
side effect of another non-atomicity-violation bug. As a result, no
p : write x matter how well AFix fixes the reported CTrigger bug, the software
failure may still occur. We discuss this further in Section 6.
r : write x
2.3 A Naı̈ve Fixing Scheme
c : read x
Given one CTrigger bug report, a naı̈ve patch is to acquire some lock
before p and r, then release the same lock after c and r. As shown
in Figure 1, for many real-world bugs, this naı̈ve patch would cause
problems, such as introducing new bugs, introducing significant and
Thread 1 Thread 2 unnecessary performance degradation, and hurting code readability.
We have implemented this naı̈ve fixing scheme and compare it with
p : read x AFix using real-world bugs in Section 6.
3.1 Overview
Thread 1 Thread 2
As discussed in Section 2.1, a single CTrigger bug report identifies
an instruction r that may non-serializably interleave two other
p : write x
instructions p and c, thereby causing failure. Fixing this bug requires
changing the code to ensure that the code region from p to c is
r : read x
mutually exclusive with r. AFix performs this change in four steps:
c : write x
1. Put p and c into a critical region. The challenge here is to
guarantee that p and c are inside one critical region under all
possible control flows, without introducing new bugs. Potential
new bugs to avoid include double lock, double unlock, unlock
Figure 2. Types of atomicity violations that CTrigger detects. “ ”
without lock, deadlocks, and others shown in Figure 3. We
shows execution order. “x” is a shared memory location.
describe how AFix handles this step in Sections 3.2 to 3.4.
2. Put r into another critical region. This step can be easily done by
violation occurs and the program execution fails, CTrigger reports adding a lock-acquisition operation before r and a lock-release
the involved (p, c, r) triple. At the end, a list of atomicity violations operation after r.
that can truly harm the software are reported. 3. Make the above two regions mutually exclusive with respect to
In this project, we use CTrigger bug reports to drive AFix. We each other. This step involves more than just assigning the same
will discuss how to extend AFix for general atomicity-violation lock to both critical sections. We discuss hidden traps and how
bugs in Section 3.7. In remainder of this paper, when we talk to avoid them in Section 3.5.
about atomicity violations, by default, we mean the type of single-
variable bugs reported by CTrigger. We continue to use p, c, and r to 4. Select or introduce a lock to protect the two critical regions.
represent the three instructions involved in each atomicity violation, Section 3.6 discusses how a lock is chosen.
as shown in Figure 2.
3.2 Single-Function Operation
2.2 CTrigger Modifications and Limitations We start with an algorithm that decides where to acquire and release
During the design and evaluation of AFix, we modified two aspects locks when p and c reside in the same function. We further assume
of CTrigger to better drive automated bug fixing. First, the original that this function is not recursive. Under these restrictions, the
CTrigger only reports one (p, c, r) triple for each distinct c instruc- algorithm follows a natural strategy: find all nodes that are on any
tion, to simplify its design. This is probably a good decision for path from p to c, and make sure a lock is held at exactly these nodes.
bug detection, but not a good one for bug fixing, because it leads To implement this strategy, AFix first analyzes the control-flow
to incomplete patches. We modified CTrigger to output all possible graph to get the set of CFG nodes that are on any intraprocedural
combinations. Second, the original CTrigger only reports instruc- path that starts from p and ends at c without touching p or c in
391
p
p
p
lock; void foo() {
if (gPtr) { while (. . .) { p lock;
c c
puts(gPtr); lock; ptr = aPtr;
unlock; ptr = aPtr; if (. . .) foo();
} else { } c puts(ptr); c
... puts(ptr); unlock;
} unlock; }
(a) Unreleased lock; potential deadlock later (b) Potential double lock or unlock without (c) Potential double lock
lock
Figure 3. Traps in bug fixing. Lock and unlock operations have been placed before and after each p and c node, respectively.
p
lock; p
if (gPtr) { void foo() {
puts(gPtr); lock; p reentrant lock;
c c
unlock; while (. . .) { ptr = aPtr;
} else { ptr = aPtr; if (. . .) foo();
unlock; } c puts(ptr); c
... puts(ptr); reentrant unlock;
} unlock; }
(a) Added unlock avoids unreleased lock (b) Moved lock avoids double lock or unlock (c) Reentrant lock avoids double lock
without lock
Figure 4. AFix’s handling of traps from Figure 3. Shaded nodes comprise the p–c critical region.
between.1 We refer to this set of nodes as the protected nodes: these likewise at most linear in the number of nodes. Therefore the entire
are the nodes which will be included in the critical region. The algorithm to identify protected nodes is at most linear in the number
protected nodes may be computed as follows: of nodes and edges in the CFG for the function containing p and c.
Next, AFix inserts lock-acquisition operations on each edge that
1. Search forward from p (either depth- or breadth-first) without crosses from an unprotected node to a protected node, and inserts
ever crossing beyond c. That is, temporarily treat c as having lock-release operations on each edge that crosses from a protected
no successors for purposes of this search. Call this set of all node to an unprotected one.2
forward-reachable nodes P. Figure 4 shows the result of applying this approach to the traps
2. Search backward from c (either depth- or breadth-first) without from Figure 3. AFix avoids the first trap by inserting a lock release
ever crossing beyond p. That is, temporarily treat p as having operation for the else clause, and avoids the second one by moving
no predecessors for purposes of this search. Call this set of all the lock acquisition operation out of the loop. Handling the third
backward-reachable nodes C. trap requires additional analysis (Section 3.3).
AFix patches thus guarantee two important properties. First, the
3. The protected nodes are P ∩C. This is the set of nodes that are
lock is not held at any unprotected nodes, because it is released
reachable from p, and from which c can be reached, without
whenever execution crosses from protected nodes to unprotected
additional crossings through p or c along the way.
nodes. Second, the lock is held at every protected node, including
It is not difficult to prove that this algorithm correctly identifies p and c, because it is acquired whenever execution crosses from
the protected nodes which must form a critical region. Any node in unprotected nodes to protected nodes. These two properties help
P must be reachable from p without crossing any successor edge us answer the following questions regarding the suitability of the
of c. Conversely, any node in C must be able to reach c without above critical region as part of a bug fix:
crossing any predecessor edge of p. Therefore, any node in P ∩C Are p and c inside the critical region on all paths? Yes, they are.
must be along a p–c path in which p and c respectively appear only Given any path that goes from p to c, every node on that path must
first and last, never as intermediate nodes. be in the protected node set. The lock is always held throughout any
Each depth- or breadth-first search requires time at most linear execution along any such path.
in the number of nodes and edges. Computing the intersection is
2 Inpractice, to place code “on” an edge from x to y, create a new node n
1 No p or c in between is because CTrigger is designed to report atomicity and replace the x → y edge with two edges: x → n and n → y. Code intended
violations between two consecutive memory accesses to the same variable. to appear on the original x → y edge is placed in the newly-created node n.
392
Can the added code introduce double-lock bugs? No, it cannot. void newlog() void close() { void insert() {
AFix only acquires locks on an edge that crosses from an unprotected { ... ...
node to a protected node. Since the lock is not held at unprotected ... p: log = CLOSE; r: if (log == OPEN)
nodes, double-lock bugs never occur in AFix patches. By contrast, close(); } logwrite(. . .);
the naı̈ve fix suggested in Section 2.3 would easily lead to double- open(); ...
lock problems, as shown in Figure 3b. ... void open() { }
Can the added code introduce double-unlock or unlock-before- } ...
lock bugs? No, it cannot. AFix only releases locks on an edge c: log = OPEN;
that crosses from a protected node to an unprotected node. Since }
the lock is held at protected nodes, double-unlock or unlock-before-
lock bugs never occur in AFix patches. By contrast, the naı̈ve fix Figure 5. Atomicity violation in MySQL. When r executes between
suggested in Section 2.3 would easily lead to problems in scenarios p and c, the database log drops an “insert” log entry.
like Figure 3b when the while-loop condition is initially false.
Can the added code introduce new data races? No, it cannot. The
is possible, we use reentrant (a.k.a. counted or nested) locks. In
patch does not introduce any new interleavings into the program.
all other cases, we use non-reentrant locks as these can be faster.
Possible thread schedules in the patched program must be a subset
AFix implements reentrant locks by associating a [Link]
of those that were already possible in the original program.
counter and a [Link] thread-ID with each reentrant lock
Can the added code introduce new deadlocks? This is a much mutex. [Link] records the thread-ID of the lock’s current
tougher question to answer. In general, it is impractical to prove owner, and [Link] records the current nesting level in the
a C/C++ programs deadlock-free. We address this risk using a owner thread. AFix uses reentrant locks in all scenarios when an
mixture of static analysis (Section 3.3) and dynamic monitoring AFix-added critical region could be called by another AFix-added
(Section 5.1). critical region using the same lock. Figure 4c shows how AFix
handles the third trap using a reentrant lock.
Is this the best policy? There could be other policies to decide
where to add locks and unlocks in order to protect p and c. We 3.4 Multiple-Function Operation
have designed and implemented several other schemes. Some of
them generate smaller critical regions than the current AFix does for AFix’s task is more complicated when p and c come from different
some special control flow structures. We prefer to use the current functions. This is not unusual; Figure 5 shows one example taken
AFix algorithm presented above, because it has a huge advantage from MySQL.
in simplicity. It is far easier to reason about and has better compos- We could use interprocedural analysis in computing the pro-
ability in the interprocedural case than other schemes. The patches tected node set. For example, context-free language reachability
generated by it also always have good readability. Furthermore, our can describe the interprocedurally-valid paths from p to c with only
experiments never observe any excessive performance degradation non–p-or-c nodes in between. However, it is generally considered
caused by the current AFix algorithm. taboo to put matched lock and unlock operations in distinct functions
due to the poor composability of locks. Examination of manually-
3.3 Deadlock Analysis and Avoidance designed patches shows that programmers usually add each atomic
region’s lock acquire and release statements inside the same func-
AFix statically analyzes each critical region to determine whether
tion. We comply with this practice by adopting an intraprocedural
it includes any potentially-blocking operations. These operations
approach that starts and ends each newly-added atomic region inside
include lock-acquisitions, condition-wait operations, barriers, thread
one function.
join operations, and some ad-hoc synchronizations like spin loops.
The extention to the basic algorithm is inspired by the manual
The first few of these are easy to identify. To identify ad-hoc spin
approach a developer might take: study the calling contexts of p and
loops, AFix checks whether there is loop inside the critical region,
c, find a single common function through which both p and c are
and whether heap or global variables are accessed inside the loop.
reached, and add the critical region to this function. For example,
If any such loop exists, we conservatively assume that it might
for the bug shown in Figure 5, function newlog is a suitable home
constitute an ad-hoc spin loop. This could be made more precise in
for the p–c critical region.
the future [35, 41], but is sufficient for AFix’s current needs.
To carry out the above design, we modify the original CTrigger
If this analysis finds no potentially-blocking operations within
bug detection tool to output the complete call stack for each
the critical region, then there is no risk of deadlock and AFix
atomicity violation. AFix compares the call stacks of p and c, and
uses standard pthread mutex lock calls to acquire locks. This is
identifies the last (innermost) function f on the common prefix of
the case for AFix’s handling of the first two traps as shown in
the two call-stack chains. If p is not already directly in f , then we
Figures 4a and 4b. If potentially-blocking operations are found, then
find the call node in f that eventually leads to p in the CTrigger
deadlock is a real risk. In this case, AFix instead acquires locks
bug report. We treat this call node as the p node to be protected.
using pthread mutex timedlock: this will time out if it is unable to
Similarly, we replace c with the call node in f that eventually reaches
acquire the lock after some maximum delay. AFix’s run-time system
c. After performing these substitutions, the new p and c must both be
monitors each timed lock in the patch to identify when our attempted
contained within f , and AFix proceeds as described in Section 3.2.
fix has actually introduced a circular wait. This information can help
developers refine AFix patches. We discuss this further in Section 5.
Currently, we set the time-out delay to a relatively large value: 3.5 Harmonizing Two Critical Regions
ten seconds by default. The only disadvantage of long time-out The process described above yields a critical region protecting all
limits is longer latency in discovering any deadlocks that do arise. paths from p to c, guarded by some lock. A second critical region is
One subtle issue that is not covered above is recursion. As created by acquiring and releasing the same lock immediately before
shown in Figure 3c, double lock/unlock and deadlocks could arise and after r. However, this alone is not enough: some additional
in the case of recursive calls. AFix identifies all such cases using a analysis is needed to harmonize these two critical regions so that
simple reachability analysis of the static call graph. When recursion they cooperate without introducing new bugs.
393
AFix first checks whether these two critical regions overlap. Let lock(L);
f be the function containing both p and c, possibly after performing p: array[1] = 2;
the substitutions in Section 3.4. If f does not appear on CTrigger’s for (i = 0; i < 2; i++) {
call stack for r, then the two regions do not overlap and no additional ...
work is needed. If r appears directly within f , then the two regions p: array[1] = 2; c: array[i] = i;
overlap if r is actually in the set of protected nodes computed earlier. for (i = 0; i < 2; i++) { unlock(L);
If r is not directly in f but f does appear in the call stack leading ... ...
to r, then identify the specific call node in f that leads to r, and c: array[i] = i; lock(L);
check whether this call node is in the protected set. Recursion can ... }
lead to multiple f on the call chain. Handling this requires a minor } unlock(L);
extension that if any of the calls in f which lead to r are themselves
inside the p–c critical region, then the lock operations protecting (a) Original code (b) AFix patched
r are redundant and may be removed. Recall that AFix is working Figure 6. Made-up example for the rare case when AFix may fail
with each specific dynamic stack trace reported by AFix’s front-
end bug detector. Therefore, AFix knows the exact call instructions
along the call chain, which makes the identification easy. middle of a source code line, even if this requires slight expansion
In all cases, the effect is to determine whether reaching r in of critical regions.
the stack configuration reported by CTrigger necessarily implies
already being inside the p–c critical region. If it does, then the lock Assessing patch quality In terms of correctness, AFix guarantees
operations around r are redundant, and may simply be removed. not to introduce new bugs. In particular, AFix restricts possible
Otherwise, the lock operations around r are retained. interleavings, but never allows any interleaving that was not already
For example, consider the (line 6, line 8, line 8) atomicity possible before patching. Note that AFix patches could cause
violation depicted in Figure 1. A naı̈ve patch will put lines 6–8 temporary circular wait, but thanks to timed locks these do not
inside one critical region and line 8 inside another. AFix removes become deadlocks.
the redundant critical region around line 8. AFix can successfully fix bugs reported by CTrigger in all but
Note that we do not guarantee that there is no other way to reach two scenarios. One is that a lock inside an AFix patch may time out if
r; we only promise to protect r when reached in the specific stack AFix cannot statically prove deadlock-freedom, and atomicity is no
configuration reported by CTrigger. If other routes to r avoid the longer guaranteed when the lock does time out. This case is captured
p–c critical region and thereby cause additional failures, we assume by the AFix run time and feedback will be provided for further patch
that additional bug reports from CTrigger will eventually mark these refinement. The other case is very rare. It occurs when p or c has
as needing fixes as well. more than one dynamic instance and may access different memory
locations. In this situation, the p–c pair that requires protection
may not be the consecutive ones protected by AFix. This case is
3.6 Lock Selection and Reuse
theoretically possible, but very rare in reality. Figure 6 shows one
Lastly, AFix decides which lock to use. In our current implementa- made-up example. In this example, p is followed by two dynamic
tion, when we only face one bug report, AFix simply creates a new instances of c. The first instance accesses a different variable from
global lock to use at the boundaries of the p–c and r critical regions. p, while the second instance accesses the same variable as p. AFix
One might consider reusing an existing lock. This is especially puts p and the first dynamic instance of c into the critical region, as
appealing if p and c are already inside a critical region in the original shown in Figure 6b, and the control flow could leave the protected
program. Doing this requires identifying some lock that is live and region between the two instances of c. However, what should be in
reachable at r and at every edge entering or leaving the p–c critical the critical region is p and the second dynamic instance of c. In this
region. This is quite challenging in the general case of heap-allocated extremely rare example, AFix patch will mistakenly end the critical
locks reached by traversing complex, linked data structures. If r is region before the second instance of c.
not even in the same function as p and c, and therefore has access In terms of performance, AFix patch strives to avoid introducing
to different local variables, the challenge is even greater. unnecessary performance degradation. AFix always ends a critical
Global locks, however, have fixed names. This makes them region immediately when there is no hope to reach c. Of course, for
inviting targets for reuse. We have implemented a simple scheme specific bugs, there could be faster patches by using lock-free data
that identifies global locks that protect the intended p–c critical structures, reader-writer locks, etc. Crafting a general algorithm for
region in the same function. If some such global lock is identified, using these special tricks is left for future work.
AFix uses it instead of creating a new lock, and elides inserting
additional lock acquire/release statements. For example, when one Extending AFix AFix focuses on locks, because they are the most
p–c critical region is already protected by a global lock, AFix does commonly supported and used synchronization primitives in mul-
not insert additional lock acquire/release statements to protect it. In tithreaded programs. AFix could be easily extended to other prim-
practice, though, we never find any reusable global locks for the itives that support atomicity. For example, if we use transactional
bugs used in our experimental evaluation. memory, then the analysis to determine where to lock and unlock is
suitable for deciding where to begin and end transactions. Of course,
some detailed concerns will be different. For example, concerns
3.7 Implementation Details and Discussion over deadlocks and lock selection go away, while new concerns
Implementation Details AFix implements these static analyses arise over live-lock and I/O inside transactions.
in LLVM [17]. After analysis, AFix changes the target software’s Although our algorithm description and current AFix implemen-
LLVM byte-code to apply the patch. In our current implementation, tation is for CTrigger, the AFix algorithm can be easily extended to
AFix uses locks provided by the POSIX threads (pthread) library work with other atomicity-violation bug detectors including those
to enforce mutual exclusion. A subtle implementation issue is that for multi-variable atomicity-violation bugs. Bug reports from many
CTrigger describes (p, c, r) triples in terms of executable instruction atomicity-violation detectors can be generalized as “code region
addresses. To aid patch readability, AFix maps each instruction back X needs to be mutually exclusive with code region Y .” With this
to source code lines. We never insert lock or unlock operations in the knowledge, a patch can be generated using the AFix algorithms
394
lock(L1) Since we have modified CTrigger to provide the complete call-
p1 stack information, the above analysis is not difficult. Note that the
lock(L2) simplicity of our interprocedural subsumption analysis benefits from
lock(L1) lock(L1) p2 AFix’s policy requiring that each critical region’s locks and unlocks
p1 p1 c1 appear within the same function.
lock(L2) lock(L2) unlock(L1) Overall, AFix compares each pair of patches and deletes the
p2 p2 lock(L1) redundant ones.
c2 c1 r1 4.2 Merging Related Patches
unlock(L2) unlock(L1) unlock(L1)
c1 c2 c2 The general problem of patch merging can be described as follows.
unlock(L1) unlock(L2) unlock(L2) Given a set of critical regions, some guarded by the same lock
and some by different locks, how can we adjust the lock variables
(a) Eliminating (b) Improving readability (c) Avoiding deadlock and the critical-region boundaries to achieve the best balance of
redundant locks and performance performance, readability, and correctness?
Figure 7. Scenarios where patches should or must be merged This question rarely has a provably-optimal answer, because
performance is affected by many factors. These include but are not
limited to the length of each critical region, the cost of acquiring
discussed above: select one function to add lock/unlock operations, and releasing a lock, how many threads will execute each critical
determine where inside that function to add lock/unlock operations, region, and how much contention each lock-acquisition will face.
analyze the necessity of reentrant and timed locks, harmonize two Much of this cannot be decided statically.
critical regions, etc. Facing this challenge, AFix uses a simple heuristic to decide
when and how to merge critical regions. We should note that
4. Fixing Multiple Bug Reports although we believe this scheme can achieve a good balance between
performance, readability, correctness, and analysis simplicity, there
Bug detectors often report multiple bugs that should be fixed by is no firm guarantee of performance improvement going from non-
one patch, such as the two atomicity violations in Figure 1 and the merged to merged patches. Developers can make informed decisions
scenarios depicted in Figure 7. This section describes how AFix based on AFix’s run-time performance profiling results. Different
coordinates multiple fixes in order to improve patch quality. merging policies can also be plugged into AFix in the future.
Provided with a set of bug reports, AFix first designs patches
for each bug independently. Before applying these patches to the 4.2.1 When to Merge
software, AFix considers all patches together. If one patch subsumes AFix merges patches when one patch’s lock-protected critical
another, as shown in Figure 7a, the redundant patch is discarded. If regions include the p, c, or r nodes of another patch. More formally,
two patches have overlapping critical regions, as shown in Figures 7b consider two sets of nodes for any given patch i. Let Anchorsi
and 7c, AFix will further analyze how to merge the patches. represent the set of all nodes “anchoring” this patch. Initially this
4.1 Removing Redundant Patches consists of just {pi , ci , ri }: the set of nodes representing the original
bug for which this patch was created. Let Criticali represent the
AFix creates two critical regions for each bug triple (p, c, r): one set of all nodes contained in critical regions guarded by patch i’s
containing p and c, and one containing r alone. Strictly speaking, lock. Initially this will consist of ri (as a single-node critical region)
a patch is completely redundant if both of its critical regions can along with pi , ci , and any other protected nodes along pi –ci paths
be subsumed by another patch. Sometimes, a patch’s p–c region is as computed in Section 3.2.
subsumed by another patch, but its r region is not. Since the r region For any two patches i, j, if Anchorsi ∩ Critical j 6= 0, / then patches
is extremely short, AFix still chooses to discard the subsumed p–c i and j should be merged. In other words, if the critical regions of
critical region in this situation. The lock used to protect r will be one patch include any of the anchor nodes of another, then the two
changed to the lock used in the subsuming patch. patches should be combined into one. Let the Anchors and Critical
Under AFix’s locking policy, a critical region p–c is subsumed sets of this merged patch be the unions of the corresponding sets
by another critical region p0 –c0 if and only if the set of CFG nodes for the patches being merged. A single lock will guard all nodes in
in p–c critical region is a subset of those in p0 –c0 . The reasoning the combined Critical set. This merged patch is available for further
is clear. The lock for p0 –c0 is held at every node inside its critical merging; the process continues until the original patch collection
region. If all p–c nodes are contained within p0 –c0 , then the p–c has been collapsed into a (possibly) smaller one in which no patch’s
lock is redundant. p, c, or r nodes are contained in the lock-guarded critical regions of
The above conditions are straightforward to check for two any other.
critical regions in the same function intraprocedurally. For two
regions in different functions, AFix extends the above subsuming 4.2.2 Merging Two Critical Regions
conditions interprocedurally as follows. First, let f denote the Once AFix decides to merge, it enacts the merge in two steps:
function containing critical region p–c, and let f 0 denote the function
containing critical region p0 –c0 . Let n be the call node inside f 0 that 1. Update the positions of lock and unlock operations. AFix puts
eventually leads to f (and therefore to p–c), if such a node exists. an unlock operation on every edge that exits the merged Critical
Then p0 –c0 subsumes p–c if and only if set, and a lock operation at every edge that enters the Critical set.
This has the effect of removing all redundant lock and unlock
1. f 0 is in the common prefix of p and c’s call stacks, and operations among merged patches.
2. n is inside the p0 –c0 critical region. 2. Unite lock variables. AFix arbitrarily chooses one lock variable
to use and puts this variable into every lock and unlock operation
f0
Notice that both f and are in the common prefix of p and performed by the merged patch.
c’s call stacks, and f is chosen by our critical region identification
algorithm, so we know that f is closer to p and c than f 0 on the For example, AFix merges the two critical regions in Figure 7b,
common prefix of p and c’s call stacks. and deletes unlock(L1) and lock(L2). Similarly, AFix also merges
395
the three critical regions in Figure 7c, and discards all but the first graph at the moment of T , as follows. When some thread t releases
lock(L1) and the last unlock(L2), thereby eliminating the potential a lock l, AFix checks whether it also saw t return from a lock(l )
deadlock. For the bug shown in Figure 1, the final merged patch call. If not, then l must have been acquired by t sometime before T ,
has just one lock operation inserted before line 2, and two unlock and therefore was held by t at the moment of T . Right after a lock l
operations inserted before line 4 and after line 8. is acquired by thread t, AFix checks whether it has observed t begin
The above merging process is only used for critical regions in- a lock(l ) call. If not, then t must already have been waiting for l at
side the same function. Actually, it covers the intraprocedural case the moment of T . Eventually, AFix will recover the whole resource
of redundant patch removal from Section 4.1: if a patch is subsumed graph and report whether there was a deadlock. This post–time-out
by another patch in the same function, they will be merged based monitoring ends when either a deadlock is identified or the program
on AFix’s merging policy. Currently, AFix first conducts interproce- exits. If the program encounters another AFix lock time-out, AFix
dural redundant-patch removal and then conducts intraprocedural will work on recovering multiple resource graphs at the same time.
patch merging. If some other merging policy is adopted in the future, Since AFix lock time-outs are rare after in-house patch testing, this
intraprocedural redundant-patch removal may still be needed. scheme is well-suited to monitoring production runs.
Of course, AFix’s run-time system could miss a deadlock if the
4.2.3 Benefits of Merging deadlock involves ad-hoc synchronization, such as spin loops. This
Merging patches as described above has several beneficial effects is an open problem for all deadlock detection tools.
on patch quality: Currently, AFix relies on developers to refine the patch using
its deadlock-detection results. When a time-out is diagnosed as a
• Code readability is improved. In practice, it is common that non-deadlock, developers may want to extend the time-out threshold
many atomicity violations are reported within few lines of code, in related AFix locks. When the time-out is caused by deadlock,
such as the Apache case shown in Figure 1. Using many different developers could discard this patch and choose another, such as a
locks severely hurts code maintenance and readability. patch that has some critical regions merged. AFix’s run-time can
• Performance can usually improve due to fewer lock and unlock also be combined with previous deadlock-prevention systems [13]
operations without enlarging the critical region too much, such to automatically fix patch-induced deadlocks in the future.
as in Figure 7. Of course, there is no guarantee of performance Other than deadlock monitoring, AFix also supports performance
improvement, because merging can also reduce potential con- profiling during in-house testing. In profiling mode, AFix measures
currency in certain scenarios. the waiting time and the number of time-outs for each lock acqui-
• Correctness is either the same or improved, because we have
sition inside an AFix patch. If excessive waiting time or time-outs
are observed at a critical region that merges multiple bug reports’
larger critical regions now. In fact, Section 6 reports that this
patches, developers may want to split this big critical region to
helps AFix lower the failure rates of some real-world software
reduce lock contention and improve performance.
bugs that were inaccurately reported by CTrigger.
• Deadlock risk is reduced. It is easy to have one patch deadlock 5.2 Patch Testing
with another patch, as shown in Figure 7c. Merging solves this Each patch generated by AFix undergoes two testing phases. The
problem. Under AFix’s merging policy, holding one AFix lock first phase uses the existing CTrigger testing. CTrigger provides a
and trying to acquire another AFix lock is impossible, because noise injection scheme for each bug it reports. Through a binary
these two locks would have been merged. instrumentation framework [22], CTrigger deterministically calls
sleep before or after specific instructions: before c, after p, etc. We
5. Run-Time Monitoring and Feedback apply CTrigger testing to AFix-patched software and see whether
software failures could still occur.
Not all properties can be guaranteed statically. AFix collects addi-
The second patch testing phase is a more general interleaving
tional information at run time to help developers refine patches.
test implemented by us. Before executing every instruction inside
5.1 Deadlock and Performance Monitoring an AFix critical region, a random number generator decides whether
to sleep or not. Similar random delays are also inserted right before
AFix uses time-outs for lock acquisitions that cannot be guaranteed
the locks and right after the unlocks added by AFix. The sleep
to be deadlock-free as discussed in Section 3.3. Therefore, deadlocks probability and the length of the sleep are tunable knobs. This phase
caused by AFix patches manifest as lock time-outs. Of course, a can help identify patches that fail to completely fix the bug due to
time-out could also occur without deadlock: a lock may simply bug-detection limitations of CTrigger.
encounter too much contention and require a longer waiting period.
AFix implements two run-time deadlock-detection algorithms,
through LLVM byte-code rewriting, suitable for different usage 6. Experimental Results
scenarios. The first, suitable for in-house patch testing, reports AFix is implemented using LLVM version 2.7. Experiments are
whether a deadlock has occurred immediately after a time-out. It has performed on an eight-core Intel Xeon machine running Red Hat
small overhead at each lock/unlock operation. The second deadlock Linux 5 with kernel version 2.6.18.
detector, suitable for production-run deployment, has nearly zero We evaluated AFix on eight real-world bugs from six open-
overhead if there is no AFix lock time-out. It takes a little bit longer source applications. These bugs were all initially reported by soft-
to complete deadlock diagnosis. ware users to each application’s bug database or mailing list. We
In the first scheme, AFix follows the traditional deadlock- apply CTrigger to these applications using the bug-triggering inputs
detection algorithm: it maintains a resource graph and looks for described in the user bug reports. In each case, CTrigger detects one
cycles when an AFix-added lock times out. To maintain a resource or more (p, c, r) triples. We have confirmed that atomicity violations
graph, AFix monitors every lock acquisition, lock release, and described by each triple lead to failure symptoms matching users’
condition-variable signal and wait, all from the beginning of execu- descriptions. Table 1 gives additional information about these bugs,
tion. Its overhead, then, depends on the density of these operations. applications, and CTrigger detection results.
In the second scheme, AFix starts its monitoring and analysis For each bug listed in Table 1, AFix generates two versions of
only after an AFix lock times out, a moment that we will refer to as patched software: one with the patch-merging technique presented
T . AFix uses information collected after T to recover the resource in Section 4 applied and one without. We refer to the former as
396
Developer # CTrigger Bug ID original naı̈ve unmerged merged
Bug ID Application LoC Fix Time Reports
FFT 74% 73% 87% 30%
FFT FFT 1.2K N/A 5 PBZIP2 94% 100%* 66% 20%
PBZIP2 PBZIP2 2.0K N/A 4 Apache 85% 100%* 83% 0%
Apache Apache 333K 30 days 2
MySQL1 MySQL v4.0.12 681K 10 days 1
MySQL1 41% 100%* 0% 0%
MySQL2 MySQL v4.0.19 693K 13 days* 2 MySQL2 53% 100%* 0% 0%
Mozilla1 Mozilla-JS v1.4.2 87K 12 days* 2 Mozilla1 41% 100%* 0% 0%
Mozilla2 Mozilla-JS v1.5 108K > 3 days] 1 Mozilla2 48% 100%* 0% 0%
Cherokee Cherokee v0.9.2 83K > 1 day] 4 Cherokee 81% 100%* 0% 0%
Table 1. Bugs used in experimental evaluation. Developer fix time Table 3. Failure rates under interleaving testing. “100%*” marks
is the time between developers’ first response to a bug report and cases where the test input deterministically causes deadlock.
a correct patch checked in, if known. Mozilla-JS is the JavaScript
Engine of Mozilla. *: incorrect patches were submitted during this
period. ]: the bug was reported by developers who suggested a fixing bugs or intolerable performance deficiencies. We obtained the above
strategy in the initial bug report. information from corresponding Bugzilla records.
Patches generated with merging are highly competitive with
manually-generated patches. AFix successfully fixes six out of eight
bugs. In two cases, MySQL2 and Mozilla1, merged patches are even
Bug ID naı̈ve unmerged merged manual better than the first few patches generated by developers. For FFT
FFT - - ? X and PBZIP2, AFix is limited by CTrigger’s inaccurate root-cause
PBZIP2 - - - X identification. Even here, the merged patches reduce (but cannot
Apache - - X X entirely eliminate) failures.
MySQL1 - X X X Unmerged patches fix five out of eight bugs. Deadlock prevents
MySQL2 - X X ? this from fixing the Apache bug that merging does correctly fix. The
Mozilla1 - X X ? naı̈ve approach fixes no bug. It causes deadlock in all but FFT, and
Mozilla2 - X X X fails to fix FFT due to CTrigger inaccuracy.
Cherokee - X X X 6.2 Correctness Results
Table 2. Overall patch quality Table 3 shows the failure rates of different versions of software under
random noise-injection testing (Section 5.2). We set up exactly the
same noise injection environment for all versions of software for
the merged version and the latter as the unmerged version. We also each bug: the same program region to inject random noise, which is
compare AFix with two other versions of patched software: (1) the around the buggy code region; the same sleep probability; and the
patch manually generated by developers, referred to as manual, and same sleep length. We use slightly different sleep probabilities and
(2) the naı̈ve patch described in Section 2.3, referred to as naı̈ve. sleep lengths for different bugs, because we want to make sure the
Our experiments also compare the above patched versions with the testing is intensive enough to make the original unpatched software
original buggy software, referred to as original. fail frequently. This lets us effectively evaluate whether the patches
Our evaluation considers three aspects of patch quality: are useful. We execute each version of software 100 times.
Merged patches eliminate software failures for six out of eight
Correctness. We use CTrigger testing and intensive random noise bugs in our testing. Merging provides incomplete patches for FFT
injection to check whether the bug has been fixed and whether and PBZIP2, but drops failure rates from 74% to 30% in FFT and
new bugs are introduced. We report and compare the failure rates from 94% to 20% in PBZIP2. The failure rates do not drop to 0%
of different versions of software. We also use the AFix run time because CTrigger’s bug detection is inaccurate: atomicity violation
to check whether timed-out locks actually represent deadlocks. is a side effect but not the root-cause of these two bugs. Specifically,
Performance. We measure the performance of different versions following a CTrigger report (p, c, r), merging correctly ensures that
of patched or unpatched software. We also measure how long it r does not execute between p and c. However, the real problem in
takes AFix to perform its static analysis and patch insertion. FFT is that r should not execute after either p or c, while the problem
in PBZIP2 is that r should execute after both p and c. Merging can
Code readability. We manually compare AFix patches with the only partially fix these two problems.
manual patch. We present results through case studies. Unmerged patching behaves slightly worse than merged, elimi-
nating failures for five out of eight bugs. It leads to non-deterministic
6.1 Overall Results deadlocks in Apache and PBZIP2 due to the reason depicted in Fig-
Table 2 presents a compact summary of patch quality. “X” indi- ure 7c. Once the lock times out after a deadlock, atomicity violation
cates that the original bug is fixed, no new bug is observed, and and subsequent failure can still occur. For FFT, unmerged patching
performance degradation is negligible. “?” indicates that the patch has a much larger failure rate than merged due to its smaller critical-
is incomplete, but decreases the failure rate without hurting perfor- region size. In FFT, the bug occurs when an instruction r executes
mance. “-” marks cases where the fix introduces new bugs, does after any one of a set of six instructions. Merging puts all six into
not significantly reduce the failure rate, or imposes an intolerable one critical region, making the bug less likely to occur.
performance deficiency. For manual patches, “?” means develop- Naı̈ve patching is clearly a very bad choice, leading to deadlock
ers submitted intermediate patches that are later determined to be in seven out of eight bugs. There are several different reasons for
incomplete by developers or testing groups (i.e., the original soft- these deadlocks. For MySQL2 and PBZIP2, deadlocks are caused
ware failure can still occur with the patch applied). “X” means that due to intraprocedural control flows, as depicted in Figure 3. For
the first developer-submitted patch is complete. For the bugs in MySQL1 and Mozilla1, p and c are inside different functions.
our study, developers submitted no patch that could introduce new Locking in one function and unlocking in another easily causes
397
Bug ID naı̈ve unmerged merged manual therefore adds four global lock variables, six lock operations, and
seven unlock operations into the software. AFix finds some of these
FFT -0.02% -0.07% -0.02% 0.19% critical regions to be redundant and some to be mergeable. The final
PBZIP2 N/A 89,132% 181.82% 0.20% patch requires only one lock, one lock operation, and one unlock
Apache N/A 0.45% -0.97% -0.26% operation, for excellent code readability and maintainability. In fact,
MySQL1 N/A 0.48% 0.48% 0.45% the merged patch closely resembles the manual patch.
MySQL2 N/A -0.09% -0.09% 1.02%
Mozilla1 N/A 0.49% 0.55% 0.12% 6.5 Other Results
Mozilla2 N/A -0.40% -0.40% -0.20%
AFix’s run-time deadlock detection gives accurate results for both
Cherokee N/A -1.02% -1.04% 0.39%
merged and unmerged patches. For example, in PBZIP2, both
Table 4. Performance overheads relative to original merged and unmerged strategies encounter lock time-outs. The
AFix run-time system correctly determines that the time-outs in the
unmerged patch are caused by deadlocks among patches, but that
deadlocks when the first function is called twice without the second the time-outs in the merged patch are not. Rather, merging simply
function in between. In Apache, Cherokee, and Mozilla2, naı̈ve has produced large critical regions with much lock contention, and
double-lock bugs arise because r is inside a p–c region. Apache and therefore requires longer time-outs.
PBZIP2 also deadlock among different locks added by the patch. AFix’s post–time-out deadlock-detection algorithm exhibits
We also reapplied CTrigger to the patched code. According to excellent performance on all bugs. Its overhead is already included
CTrigger’s definitions, both merged and unmerged patches success- in the performance numbers measured in Table 4. AFix’s always-on
fully fix all eight bugs. We also manually checked all these patches. deadlock detection has less than 5% overhead for all bugs except
Our findings are consistent with the random testing results. In those for Mozilla1 and Mozilla2, where the always-on monitoring causes
cases with 0% failure rates, the bugs are all truly fixed. 300% and 97% overhead respectively. This is due to the frequent
Overall, merging generates correct patches that not only fix the lock/unlock operations in Mozilla’s JavaScript Engine.
original bugs but also introduce no new bugs, as long as its front-end
bug detector provides a reasonably-accurate bug report. Unmerged 7. Related Work
patching is also good, but is vulnerable to deadlock.
7.1 Concurrency Bug Detection
6.3 Performance Results Many detection tools have been built to identify interleaving prob-
Patched application performance Table 4 shows that merged and lems in multithreaded programs, such as races [4, 6, 10, 12, 32]
unmerged AFix fixes provide good run-time performance, with and atomicity violations [9, 12, 19, 37]. These tools provide good
negligible difference between them. Note that overheads cannot be starting points for automated bug fixing.
measured for most naı̈ve patches as these lead to deterministic Of course, since these tools are designed to identify problems,
deadlocks. In most cases, AFix patches impose no perceivable not to fix problems, they still leave many challenges for bug fix-
performance degradation compared with correct manual patches ing. For example, many bug detection tools have false positives.
or even the original buggy software. This is because the relevant “Fixing” false positives leads to over-synchronization and unnec-
critical regions are usually small and off performance-critical paths. essary performance degradation. Many bug reports do not include
Only PBZIP2 suffers from significant performance degradation complete run-time information, which could cause wrong or incom-
under AFix. The PBZIP2 bug is caused by a parent thread occa- plete patches. Bug detectors seldom put bug reports that can be
sionally destroying shared objects before worker threads finish. The fixed by one patch together, which can cause too many locks to be
manual patch makes the main thread wait until all worker threads are added, harming code maintainability. Furthermore, naı̈ve patches
done, which is correct and lightweight. Due to CTrigger inaccuracy, for accurately-described bug reports can easily introduce new bugs.
AFix mistakenly treats this bug as an atomicity-violation bug, and AFix has considered and addressed the challenges above. In the
fixes it by putting almost the whole worker thread into a critical future, AFix can be extended to work with more concurrency-bug
region, which causes huge overhead. The unmerged AFix patch is detectors to fix more bugs, which will also help bug-detection tools
much slower than the merged patch for PBZIP2, because the former to get more usage during software development.
suffers from deadlock time-outs. Deadlock time-outs can also occur
7.2 Concurrent Program Synthesis
in the unmerged patch of Apache, but this never happens during
performance evaluation without noise injection. Existing tools that automatically add synchronizations to software
mostly have different goals from AFix, and hence different foci. Pro-
AFix patch generation performance All of AFix’s static analyses gram synthesis and sketching [8, 34, 38] use smart state-space search
have been designed with scalability in mind. On every benchmark, and verification techniques to infer synchronization and make con-
AFix takes no more than one second to analyze the program, develop current programs satisfy certain specifications. They are powerful
its patches, and inject them into the code. Clearly, AFix has the in the sense that the specification can be flexible. Unfortunately, the
potential to significantly speed up the bug-fixing process. nature of the problem makes them hard to scale to large real-world
C/C++ applications such as the ones fixed by AFix.
6.4 Readability Case Studies Some tools [23, 37] encourage programmers to represent their
AFix’s patch merging technique improves patch readability and synchronization intentions in non-lock language constructs, such
maintainability. For six out of eight bugs, CTrigger reports two as atomic sets and atomic blocks, and transparently translate these
to five atomicity violations related to each bug. The unmerged fix non-lock constructs to lock/unlock operations. In these cases, the
therefore adds two to five new locks and up to ten new critical critical region boundaries are either directly specified by developers
regions into the software. Merging simplifies this to use just one lock or fixed at the entrances and exits of certain functions; the major
in five out of six cases. Manual inspection shows that all merging challenge is lock assignment. AFix, by contrast, derives critical
decisions made by AFix improve readability. region boundaries with limited or no human intervention.
Taking the Cherokee bug as an example, CTrigger reports four TraceFinder [36] performs whole-program synchronization anal-
atomicity violations in three different functions. The unmerged fix ysis and pointer-alias analysis to identify atomic-block boundaries
398
that can guarantee conflict-serializability for the whole software. References
TraceFinder has a different goal from AFix. It cannot scale to large [1] A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient system-enforced
applications. It uses atomic blocks for synchronization, and does deterministic parallelism. In OSDI, 2010.
not worry about lock assignment or deadlocks. [2] T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble. Deterministic process
AFix is unique in fixing bugs reported by automatic bug- groups in dOS. In OSDI, 2010.
detectors. AFix does not face the scalability problems encountered
[3] E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded
by TraceFinder, because it does not try to figure out all the syn- programming for C/C++. In OOPSLA, 2009.
chronizations a program needs to use. Rather, AFix faces different
[4] M. D. Bond, K. E. Coons, and K. S. McKinley. Pacer: Proportional
challenges.
detection of data races. In PLDI, 2010.
7.3 Hot-Patching Concurrency Bugs at Run Time [5] L. Chew and D. Lie. Kivati: fast detection and prevention of atomicity
violations. In EuroSys, 2010.
Some proposed strategies for hot-patching software at run time are
[6] J.-D. Choi, K. Lee, A. Loginov, R. O’Callahan, V. Sarkar, and M. Srid-
not suitable for concurrency bugs [28]. Recent work by Wu et al. haran. Efficient and precise datarace detection for multithreaded object-
[40] provides a framework to deploy hot patches manually designed oriented programs. In PLDI, 2002.
by developers. AFix can complement this framework by generating
[7] C. Cowan, H. Hinton, C. Pu, and J. Walpole. The cracker patch choice:
patches automatically, instead of completely relying on developers. An analysis of post hoc security techniques. In In Proceedings of the
Some run-time tools do not try to permanently fix concurrency National Information Systems Security Conference (NISSC), 2000.
bugs in the software. Rather, they steer the execution to make fail-
[8] J. Deshmukh, G. Ramalingam, V. P. Ranganath, and K. Vaswani.
ure less likely. Some pay the cost of performance degradation or Logical concurrency control from sequential proofs. In European
require non-existing hardware support [21]. Others assume that Symposium on Programming, 2010.
critical-region boundaries are known [16, 30]. Deterministic exe-
[9] C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker
cution systems [1–3] can make some concurrency bugs determin- for multithreaded programs. In POPL, 2004.
istically happen and some other bugs never occur. This promising
[10] C. Flanagan and S. N. Freund. FastTrack: efficient and precise dynamic
approach still faces many challenges, such as run-time overhead, in-
race detection. In PLDI, 2009.
tegration with system non-determinism, language design, etc. Even
for software executed inside a deterministic run-time, fixing bugs [11] M. Harman. Automated patching techniques: the fix is in: technical
perspective. Commun. ACM, 53(5):108–108, 2010. ISSN 0001-0782.
still requires manual intervention. In general, these tools look at
doi: [Link]
different problems from AFix. AFix generates patches that can com-
pletely fix bugs without unnecessary performance degradation. AFix [12] G. Jin, A. Thakur, B. Liblit, and S. Lu. Instrumentation and sampling
strategies for Cooperative Concurrency Bug Isolation. In OOPSLA,
and these tools can complement each other.
2010.
Kivati [5] combines run-time bug detection with temporary
patch generation based on hardware watch-points. Its focus is to [13] H. Jula, D. Tralamazza, C. Zamfir, and G. Candea. Deadlock immunity:
Enabling systems to defend against deadlocks. In OSDI, 2008.
lower bug detection overheads. However, the limited watch-point
resource (four per machine) prevents Kivati from managing many [14] E. Kandrot and B. Eich. Our JavaScript is 3x slower than IE’s, Sept.
different critical regions at the same time. Kivati does not handle 2000. URL [Link] [Link]?id=54743.
cases where p and c are inside different functions, such as the [15] B. Krebs. A time to patch II: Mozilla. The Washington Post Security Fix
Cherokee and MySQL1 bugs in Section 6. It causes unnecessarily blog, Feb. 2006. URL [Link]
long critical regions when there are branches between p and c, as 2006/02/a time to patch ii [Link].
shown in Figure 3a. It also does not handle critical-region merging [16] B. Krena, Z. Letko, R. Tzoref, S. Ur, and T. Vojnar. Healing data races
or deadlocks. Overall, Kivati lacks the off-line static analyses needed on-the-fly. In PADTAD, 2007.
to generate high-quality, permanent fixes. [17] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong
AtomRace [16, 18] combines run-time bug detection with run- program analysis & transformation. In CGO, 2004.
time healing, with emphasis on bug detection. Its healer component [18] Z. Letko, T. Vojnar, and B. Křena. AtomRace: data race and atomicity
assumes that critical-region entrances and exits are all provided as violation detector and healer. In PADTAD, 2008.
inputs. They do not address those challenges faced by the naı̈ve [19] S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity
patches discussed in our paper, and do not consider patch merging. violations via access-interleaving invariants. In ASPLOS, 2006.
[20] S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes – a
8. Conclusion comprehensive study of real world concurrency bug characteristics. In
ASPLOS, Mar. 2008.
We have described AFix, a framework for automatically fixing
[21] B. Lucia, J. Devietti, L. Ceze, and K. Strauss. Atom-Aid: Detecting
a common type of concurrency bugs. We have implemented the and surviving atomicity violations. IEEE Micro, 29(1), 2009.
system and shown AFix to be effective at generating high-quality
[22] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney,
patches for atomicity-violation bugs detected by an automated
S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized
bug finder in several large, real-world applications. AFix conducts program analysis tools with dynamic instrumentation. In PLDI, 2005.
thorough static analysis to reach a good balance among correctness,
[23] B. McCloskey, F. Zhou, D. Gay, and E. Brewer. Autolocker: synchro-
performance, and code readability in its automatically generated nization inference for atomic sections. In POPL, 2006.
patches. AFix’s testing and monitoring run-time system also provide
useful feedback for further patch refinement. In the future we plan to [24] Microsoft. Revamping the microsoft security bulletin release process,
Feb. 2005. URL [Link]
extend AFix to work with more general synchronization primitives [Link].
and more types of concurrency bug detectors.
[25] S. Park, S. Lu, and Y. Zhou. CTrigger: exposing atomicity violation
bugs from their hiding places. In ASPLOS, 2009.
[26] S. Park, R. W. Vuduc, and M. J. Harrold. Falcon: fault localization in
concurrent programs. In ICSE, 2010.
[27] R. Pegoraro. Apple updates Leopard–again. Washington Post, Feb.
2008.
399
[28] J. H. Perkins, S. Kim, S. Larsen, S. P. Amarasinghe, J. Bachrach, [35] C. Tian, V. Nagarajan, R. Gupta, and S. Tallam. Dynamic recognition
M. Carbin, C. Pacheco, F. Sherwood, S. Sidiroglou, G. Sullivan, W.-F. of synchronization operations for improved data race detection. In
Wong, Y. Zibin, M. D. Ernst, and M. C. Rinard. Automatically patching ISSTA, 2008.
errors in deployed software. In SOSP, 2009. [36] G. Upadhyaya, S. P. Midkiff, and V. S. Pai. Automatic atomic region
[29] K. Poulsen. Software bug contributed to blackout. SecurityFocus, Feb. identification in shared memory spmd programs. In OOPSLA, 2010.
2004. URL [Link] [37] M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints
[30] P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. Zorn, R. Nagpal, with data in an object-oriented language. In POPL, 2006.
and K. Pattabiraman. Detecting and tolerating asymmetric races. In [38] M. T. Vechev, E. Yahav, and G. Yorsh. Abstraction-guided synthesis of
PPoPP, 2009. synchronization. In POPL, 2010.
[31] E. Rescorla. Security holes . . . who cares? In USENIX Security [39] D. Weeratunge, X. Zhang, and S. Jagannathan. Analyzing multicore
Conference, 2003. dumps to facilitate concurrency bug reproduction. In ASPLOS, 2010.
[32] S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. [40] J. Wu, H. Cui, and J. Yang. Bypassing races in live applications with
Eraser: A dynamic data race detector for multithreaded programs. ACM execution filters. In OSDI, 2010.
Transactions on Computer Systems, 15, 1997. [41] W. Xiong, S. Park, J. Zhang, Y. Zhou, and Z. Ma. Ad hoc synchroniza-
[33] S. Sidiroglou, S. Ioannidis, and A. D. Keromytis. Band-aid patching. tion considered harmful. In OSDI, 2010.
In HotDep, 2007. [42] C. Zamfir and G. Candea. Execution synthesis: A technique for
[34] A. Solar-Lezama, C. G. Jones, and R. Bodik. Sketching concurrent automated software debugging. In EuroSys, 2010.
data structures. In PLDI, 2008.
400