Effective Data-Race Detection for the Kernel
John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk
Abstract
Data races are an important class of concurrency errors where two threads erroneously access a shared memory location without appropriate synchronization. This paper presents DataCollider, a lightweight and effective technique for dynamically detecting data races in kernel modules. Unlike existing data-race detection techniques, DataCollider is oblivious to the synchronization protocols (such as locking disciplines) the program uses to protect shared memory accesses. This is particularly important for low-level kernel code that uses a myriad of complex architecture/device-specific synchronization mechanisms. To reduce the runtime overhead, DataCollider randomly samples a small percentage of memory accesses as candidates for data-race detection. The key novelty of DataCollider is that it uses breakpoint facilities already supported by many hardware architectures to achieve negligible runtime overheads. We have implemented DataCollider for the Windows 7 kernel and have found 25 confirmed erroneous data races, of which 12 have already been fixed.
Figure 1: An example of a data race. Even though the threads appear to be modifying different variables in the source code, the variables are bit fields mapping to the same integer.

Existing approaches detect data races by checking whether two conflicting accesses can happen concurrently, using the happens-before [8] ordering induced by the synchronization operations [4,5,6], a lock-set based reasoning [1], or a combination of the two [2,3,7].

There are several challenges in engineering a data-race detection tool for the kernel based on previous approaches. First, kernel-mode code operates at a lower concurrency abstraction than user-mode code, which can rely on clean abstractions of threads and synchronization provided by the kernel. In the kernel, the same thread context can execute code from a user-mode process, a device interrupt service routine, or a deferred procedure call (DPC). In addition, it is an onerous task to understand the semantics of complex synchronization primitives in order to infer the happens-before relation or lock-sets. For instance, Windows supports more than a dozen locks with different semantics on how the lock holder synchronizes with hardware interrupts, the scheduler, and the DPCs. It is also common for kernel modules to roll out custom implementations of synchronization primitives.

Second, hardware-facing kernel modules need to synchronize with hardware devices that concurrently modify device state and memory. It is important to design a data-race detection tool that can find these otherwise hard-to-find data races between the hardware and the kernel.

Finally, existing dynamic data-race detectors add prohibitive run-time overheads; it is not uncommon for such tools to incur up to 200x slowdowns [9]. The overhead is primarily due to the need to monitor and process all memory and synchronization operations at run time. Significant engineering effort in building data-race detectors goes into reducing the runtime overhead and the associated memory and log management [9,3]. Replicating these efforts within the constraints of kernel programming is an arduous, if not impossible, task. Moreover, these tools rely on invasive instrumentation techniques that are difficult to get right on low-level kernel code.

DataCollider uses a different approach to overcome these challenges. The crux of the algorithm is shown in Figure 2. DataCollider samples a small number of memory accesses at runtime by inserting code breakpoints at randomly chosen memory access instructions. When a code breakpoint fires, DataCollider detects data races involving the sampled memory access for a small time window. It simultaneously employs two strategies to do so. First, DataCollider sets a data breakpoint to trap conflicting accesses by other threads. Second, to detect conflicting writes performed by hardware devices and by processors accessing the memory location through a different virtual address, DataCollider uses a repeated-read strategy: it reads the value once before and once after the delay, and a change in value is an indication of a conflicting write, and hence a data race.

    OnCodeBreakpoint( pc ) {
        // disassemble the instruction at pc
        (loc, size, isWrite) = disasm( pc );

        DetectConflicts( loc, size, isWrite );

        // set another code breakpoint
        pc = RandomlyChosenMemoryAccess();
        SetCodeBreakpoint( pc );
    }

    DetectConflicts( loc, size, isWrite ) {
        temp = read( loc, size );

        if ( isWrite )
            SetDataBreakpointRW( loc, size );  // trap reads and writes
        else
            SetDataBreakpointW( loc, size );   // trap writes only

        delay();

        ClearDataBreakpoint( loc, size );
        temp' = read( loc, size );

        if ( temp != temp' || data breakpoint fired )
            ReportDataRace( );
    }

Figure 2: The basics of the DataCollider algorithm. Right before a read or write access to a shared memory location, chosen at random, DataCollider monitors for any concurrent accesses that conflict with the current access.
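To make Figure 2 concrete, the following C sketch (ours, not the paper's code) shows the same detection window. The helpers set_data_breakpoint, clear_data_breakpoint, delay_window, and report_data_race are hypothetical stand-ins for the mechanisms described in Sections 3.2.1-3.2.3; only the repeated-read half is directly executable.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical hooks: on real hardware these would program and clear a
       debug register (Section 3.2.1); here they are stubs for illustration. */
    extern void set_data_breakpoint(const void *loc, size_t size, bool trap_reads);
    extern bool clear_data_breakpoint(const void *loc); /* true if it fired */
    extern void delay_window(void);                     /* Section 3.2.3 */
    extern void report_data_race(const void *loc);

    /* Pause at a sampled access and watch for a conflicting access. */
    void detect_conflicts(const void *loc, size_t size, bool is_write) {
        uint8_t before[8], after[8];
        if (size > sizeof before)
            return;                         /* sketch: handle small accesses only */
        memcpy(before, loc, size);          /* first read of the repeated-read */

        /* A sampled write conflicts with reads and writes; a sampled read
           conflicts only with writes. */
        set_data_breakpoint(loc, size, /*trap_reads=*/is_write);
        delay_window();
        bool fired = clear_data_breakpoint(loc);

        memcpy(after, loc, size);           /* second read */
        if (fired || memcmp(before, after, size) != 0)
            report_data_race(loc);          /* caught a concurrent conflict */
    }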
The DataCollider algorithm has two features that make it suitable for kernel data-race detection. First and foremost, it is easy to implement. Barring some implementation details (Section 3), the entire algorithm is shown in Figure 2. In addition, it is entirely oblivious to the synchronization protocols used by the kernel and the hardware, a welcome design point as DataCollider does not have to understand the complex semantics of kernel synchronization primitives.

When DataCollider finds a data race through the data-breakpoint strategy, it catches both threads "red-handed," as they are about to execute conflicting accesses. This greatly simplifies the debugging of data-race reports, as the tool can collect useful debugging information, such as the stack traces of the racing threads along with their context information, without incurring this overhead on non-sampled or non-racy accesses.

Not all data races are erroneous. Such benign races include races that do not affect the program outcome, such as updates to logging/debugging variables, and races that affect the program outcome in a manner acceptable to the programmer, such as conflicting updates to a low-fidelity counter. DataCollider uses a post-processing phase that prunes and prioritizes the data-race reports before showing them to the user. In our experience with DataCollider, we have observed that only around 10% of data-race reports correspond to real errors, making the post-processing step absolutely crucial for the usability of the tool.

2. Background and Motivation

Shared-memory multiprocessors are specifically built to allow concurrent access to shared data. So why do data races represent a problem at all?

The key motivation for data-race detection is the empirical fact that programmers most often use synchronization to restrict accesses to shared memory. Data races can thus be an indication of incorrect or insufficient synchronization in the program. In addition, data races can also reveal programming mistakes not directly related to concurrency, such as buffer overruns or use-after-free errors, which indirectly result in inadvertent sharing of memory.

Another important reason for avoiding data races is to protect the program from the weak memory models of the compiler and the hardware. Both the compiler and the hardware can reorder instructions and change the behavior of racy programs in complex and confusing ways [10,11]. Even if a racy program works correctly for the current compiler and hardware configuration, it might fail on future configurations that implement more aggressive memory-model relaxations.
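As a concrete illustration of the reordering hazard described above (our example, not one from the paper), consider the common publish-via-flag idiom; use() is a hypothetical consumer:

    /* Thread 1 publishes data; Thread 2 consumes it. Without synchronization,
       the accesses to 'ready' (and to 'data') race across the two threads. */
    extern void use(int);

    int data;
    int ready;          /* plain int: a data access, not a synchronization access */

    void producer(void) {
        data = 42;
        ready = 1;      /* compiler/hardware may reorder this before data = 42 */
    }

    void consumer(void) {
        if (ready)      /* may observe ready == 1 ...                          */
            use(data);  /* ... and still read a stale 'data' after reordering  */
    }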
While bugs caused by data races may of course be found using more conventional testing approaches such as stress testing, the latter often fails to provide actionable information to the programmer. Clearly, a data-race report including stack traces or data values (or, even better, a core dump demonstrating the actual data race) is easier to understand and fix than a silent data corruption that leads to an obscure failure at some later point during program execution.

2.1. Definition of Data Race

There is no "gold standard" for defining data races; several researchers have used the term to mean different things. For our definition, we consulted two respected standards (Posix threads [12] and the drafts of the C++ and C memory-model standards [11,10]) and generalized their definitions to account for the particularities of kernel code. Our definition of data race is:

    Two operations that access main memory are called conflicting if
        o the physical memory they access is not disjoint,
        o at least one of them is a write, and
        o they are not both synchronization accesses.
    A program has a data race if it can be executed on a multiprocessor in such a way that two conflicting memory accesses are performed simultaneously (by processors or any other device).
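A minimal illustration of this definition (our example): the plain increment below is a data race under the definition, while the interlocked variant is not, because both of its accesses are synchronization accesses.

    #include <windows.h>

    LONG hits;    /* shared between threads */

    /* Data race: two threads executing this concurrently perform conflicting,
       non-synchronization accesses (a racy read-modify-write) to 'hits'. */
    void count_racy(void) { hits = hits + 1; }

    /* No data race: the access is a synchronization access (an interlocked
       instruction), so it is excluded by the definition's third clause. */
    void count_safe(void) { InterlockedIncrement(&hits); }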
This definition is a simplification of [11,10] insofar as we replaced the tricky notion of "not ordered before" with the unambiguous "performed simultaneously" (which refers to real time).

An important part of our definition is the distinction between synchronization and data accesses. Clearly, some memory accesses participate in perfectly desirable races: for example, a mutex implementation may perform a "release" by storing the value 0 in a shared location while another thread is performing an acquire and reads the same memory location. However, this is not a data race because we categorize both of these accesses as synchronization accesses. Synchronization accesses either involve hardware synchronization primitives such as interlocked instructions or use volatile or atomic annotations supported by the compiler.

Note that our definition is general enough to apply to code running in the kernel, which poses some unique problems not found in user-mode code. For example, in some cases data races can be avoided by turning off interrupts; also, processes can exhibit a data race when accessing different virtual addresses that map to the same physical address. We talk more about these topics in Section 2.3.4.

2.2. Precision of Detection

Clearly, we would like data-race detection tools to report as many data races as possible without inundating the user with false error reports. We use the following terminology to discuss the precision and completeness of data-race detectors. A missed race is a data race that the tool does not warn about. A benign data race is a data race that does not adversely affect the behavior of the program. Common examples of benign data races include threads racing on updates to logging or statistics variables, and threads concurrently updating a shared counter where the occasional incorrect update of the counter does not affect the outcome of the program. On the other hand, a false data race is an error report that does not correspond to a data race in the program. Static data-race detection techniques commonly produce false data races due to their inherent inability to precisely reason about program paths, aliased heap objects, and function pointers. Dynamic data-race detectors can report false data races if they do not identify or do not understand the semantics of all the synchronizations used by the program.

2.3. Related Work

Researchers have proposed and built a plethora of race-detection tools. We now discuss the major approaches and implementation techniques appearing in related work. We describe both happens-before-based and lock-set-based tracking in some detail (Sections 2.3.2 and 2.3.3), before explaining why neither one is very practical for data-race detection in the kernel (Section 2.3.4).

2.3.1. Static vs. Dynamic

Data-race detection can be broadly categorized into static race detection [13,14,15,16,17], which typically analyzes source or byte code without directly executing the program, and dynamic race detection [1,2,3,4,5,6,7], which instruments the program and monitors its execution online or offline.

Static race detectors have been successfully applied to large code bases [13,14]. However, as they rely on approximate information, such as pointer aliasing, they are prone to excessive false warnings. Some tools, especially those targeting large code bases, approach this issue by filtering the reported warnings using heuristics [13]. Such heuristics can successfully reduce the false warnings to a tolerable level, but may unfortunately also eliminate correct warnings and lead to missed races. Other tools, targeted towards highly motivated users who wish to interactively prove the absence of data races, report all potential races to the user and rely on user-supplied annotations that indicate synchronization disciplines [16,17].

Dynamic data-race detectors are less prone to false warnings than static techniques because they monitor an actual execution of the program. However, they may miss races because successful detection might require an error-inducing input and/or an appropriate thread schedule. Also, many dynamic detectors employ several heuristics and approximations that can lead to false alarms.

Dynamic data-race detectors can be classified into categories based on whether they model a happens-before relation [6,5,7] (see Section 2.3.2), lock sets [1] (see Section 2.3.3), or both [2,18].

2.3.2. Happens-Before Tracking

Dynamic data-race detectors do not just detect data races that actually took place (in the sense that the conflicting accesses were truly simultaneous during the execution), but look for evidence that such a schedule would have been possible under a slightly different timing. Tracking a happens-before relation on program events [8] is one way to infer the existence of a racy schedule. This transitive relation is constructed by recording both the ordering of events within a thread and the ordering effects of synchronization operations across threads.

Once we can properly track the happens-before relation, race detection is straightforward: for any two conflicting accesses A and B, we simply check whether A happens-before B, or B happens-before A, or neither. If neither, we know there exists a schedule where A and B are simultaneous. If properly tracked, happens-before does not lead to any false alarms. However, precise tracking can be difficult to achieve in practice, as discussed in Section 2.3.4.
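One standard way to track happens-before (our sketch, not the mechanism of any specific tool cited here) is with vector clocks: each thread and synchronization object carries one counter per thread, releases publish the releasing thread's clock, and acquires join it in. Two conflicting accesses race exactly when neither access's clock is contained in the other's:

    #include <stdbool.h>

    #define NTHREADS 4

    typedef struct { unsigned c[NTHREADS]; } VClock;

    /* True if clock a is componentwise <= clock b, i.e., everything that had
       happened at a was already visible at b. */
    static bool vc_leq(const VClock *a, const VClock *b) {
        for (int i = 0; i < NTHREADS; i++)
            if (a->c[i] > b->c[i]) return false;
        return true;
    }

    /* On a release, a thread publishes its clock into the sync object; on an
       acquire, the thread joins the object's clock into its own. */
    static void vc_join(VClock *dst, const VClock *src) {
        for (int i = 0; i < NTHREADS; i++)
            if (src->c[i] > dst->c[i]) dst->c[i] = src->c[i];
    }

    /* Conflicting accesses with clocks a and b race iff neither is ordered
       before the other. */
    static bool is_race(const VClock *a, const VClock *b) {
        return !vc_leq(a, b) && !vc_leq(b, a);
    }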
2.3.3. Lock Sets

When detecting races in programs that follow a strict and consistent locking discipline, using a lock-set approach can provide some benefits. The basic idea is to examine the lock set of each data access (that is, the set of locks held during the access) and then to take, for each memory location, the intersection of the lock sets of all accesses to it. If that intersection is empty, the variable is not consistently protected by any one lock and a warning is issued.

The main limitation of the lock-set approach is that it does not check for true data races but for violations of a specific locking discipline. Unfortunately, many applications (and in particular kernel code) use locking disciplines that are complex and use synchronization other than locks.

Whenever a program departs from a simple locking scheme in any of the above ways, lock-set-based race detectors will be forced either to issue false warnings or to use heuristics to suppress these warnings. The latter approach is common, especially in the form of state machines that track the "sharing status" of a variable [1,3]. Such heuristics are necessarily imperfect compromises, however (they always fail to suppress some false warnings and always suppress some correct warnings), and it is not clear how to tune them to be useful for a wide range of applications.
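The basic lock-set refinement can be sketched in a few lines (our illustration, in the style of [1], with lock sets represented as bitmasks):

    #include <stdbool.h>
    #include <stdint.h>

    /* Each bit names one lock; locks_held is the accessing thread's set. */
    typedef uint64_t LockSet;

    typedef struct {
        LockSet candidates;   /* locks that protected every access so far */
        bool    accessed;
    } VarState;

    /* On each access, intersect the variable's candidate set with the locks
       the accessing thread currently holds; warn when it becomes empty. */
    bool on_access(VarState *v, LockSet locks_held) {
        if (!v->accessed) {
            v->accessed   = true;
            v->candidates = locks_held;
        } else {
            v->candidates &= locks_held;    /* set intersection */
        }
        return v->candidates == 0;          /* true => no consistent lock: warn */
    }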
2.3.4. Problems with Tracking Synchronizations

Both lock-set and happens-before tracking require a thorough understanding of the synchronization semantics, lest they produce false alarms or miss races. There are two fundamental difficulties we encountered when trying to apply these techniques in the kernel:

    o Abstractions that we take for granted in user mode (such as threads) are no longer clearly defined in kernel mode.
    o The synchronization vocabulary of kernel code is much richer and may include complicated sequences and ordering mechanisms provided by the hardware.

For example, interrupts and interrupt handlers break the thread abstraction, as the handler code may execute in a thread context without being part of that thread in a logical sense. Similar problems arise when a thread calls into the kernel scheduler: the code executing in the scheduler is not logically part of that same thread.

Another example illustrating the difficulty of modeling synchronization inside the kernel is DMA accesses. Such accesses do not execute inside a thread (in fact, they do not even execute on a processor). Clearly, traditional monitoring techniques have a problem because they cannot "instrument" the DMA access.

A similar case holds for interrupt processing. For example, code may first write some data and then raise an interrupt, and then the same data is read by an interrupt handler. Lock sets would report a false alarm because the data is not locked. But even happens-before techniques are problematic, because they would need to precisely track the causality between the instruction that set the interrupt and the interrupt handler.

For these reasons, we decided to employ a design that entirely avoids modeling the happens-before ordering or lock-sets. As our results show, somewhat surprisingly, neither one is required to build an effective data-race detector.

2.3.5. Sampling to Reduce Overhead

To detect races, dynamic data-race detectors need to monitor the synchronizations and memory accesses performed at runtime. This is typically done by instrumenting the code and inserting extra monitoring code for each data access. As the monitoring code executes at every memory access, the overhead can be quite substantial.

One way to ameliorate this issue is to exclude some data accesses from processing. Prior work has identified several promising strategies: adaptive sampling that backs off hot locations [5] (the idea is that for such locations the monitoring can be less frequent and still detect races), or performing the full monitoring only for a fixed fraction of the time [4] (the idea is that the probability of catching a race is roughly proportional to this fraction multiplied by the number of times the race repeats). But these techniques still suffer from the cost of sampling, performed at every memory access. DataCollider avoids this problem by using hardware breakpoint mechanisms.
3. DataCollider Implementation

This section describes the implementation of the DataCollider algorithm for the Windows kernel on the x86 architecture. The implementation heavily uses the code and data breakpoint mechanisms available on x86. The techniques described in this paper can be extended to other architectures and to user-mode code, but we have not pursued this direction in this paper.

Figure 2 describes the basics of the DataCollider algorithm. DataCollider uses the sampling algorithm, described in Section 3.1, to process a small percentage of memory accesses for data-race detection. For each of the sampled memory accesses, DataCollider uses a conflict-detection mechanism, described in Section 3.2, to find data races involving the sampled access. After detecting data races, DataCollider uses several heuristics, described in Section 3.3, to prune benign data races.

3.1. The Sampling Algorithm

There are several challenges in designing a good sampling algorithm for data-race detection. First, data races involve two memory accesses, both of which need to be sampled to detect the race. If memory accesses are sampled independently, then the probability of finding the data race is a product of the individual sampling probabilities. DataCollider avoids this multiplicative effect by sampling the first access and using a data breakpoint to trap the second access. This allows DataCollider to be effective at low sampling rates.

Second, data races are rare events – most executed instructions do not result in a data race. The sampling algorithm should weed out the small percentage of racing accesses from the majority of non-racing accesses. The key intuition behind the sampling algorithm is that if a program location is buggy and fails to use the right synchronization when accessing shared data, then every dynamic execution of that buggy code is likely to participate in a data race. Accordingly, DataCollider performs static sampling of program locations rather than dynamic sampling of executed instructions. A static sampler gives equal preference to rarely executed instructions (which are likely to have bugs hidden in them) and frequently executed instructions.

3.1.1. Static Sampling Using Code Breakpoints

The static sampling algorithm works as follows. Given a program binary, DataCollider disassembles the binary to generate a sampling set consisting of all program locations that access memory. The tool currently requires the debugging symbols of the program binary to perform this disassembly. This requirement can be relaxed by using sophisticated disassemblers [19] in the future.

DataCollider performs a simple static analysis to identify instructions that are guaranteed to only touch thread-local stack locations and removes them from the sampling set. Similarly, DataCollider removes synchronizing instructions from the sampling set by removing instructions that access memory locations tagged as "volatile" or that use hardware synchronization primitives, such as interlocked instructions. This prevents DataCollider from reporting races on synchronization variables. However, DataCollider can still detect a data race between a synchronization access and a regular data access, if the latter is in the sampling set.

DataCollider samples program locations from the sampling set by inserting code breakpoints. The initial breakpoints are set at a small number of program locations chosen uniformly at random from the sampling set. If and when a code breakpoint fires, DataCollider performs conflict detection for the memory access at that breakpoint. Then, DataCollider chooses another program location uniformly at random from the sampling set and sets a breakpoint at that location.

This algorithm uniformly samples all program locations in the sampling set irrespective of the frequency with which the program executes these locations. This is because the choice of inserting a code breakpoint is performed uniformly at random over all locations in the sampling set. Over a period of time, the breakpoints will tend to reside at rarely executed program locations, increasing the likelihood that those locations are sampled the next time they execute.

If DataCollider has information on which program locations are likely to participate in a race, either through user annotations or through prior analysis [20], then the tool can prioritize those locations by biasing their selection from the sampling set.
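The following sketch (ours; the sampling-set construction and breakpoint helpers are hypothetical) shows the shape of this static sampler: build the sampling set once from the binary, then keep breakpoints at uniformly chosen locations, moving each one after it fires.

    #include <stdlib.h>

    typedef struct { void *pc; } Instr;

    extern Instr  sampling_set[];    /* memory-accessing instructions, minus   */
    extern size_t sampling_set_len;  /* thread-local and synchronizing ones    */
    extern void   set_code_breakpoint(void *pc);
    extern void   detect_conflicts_at(void *pc);  /* Figure 2's DetectConflicts */

    static void *random_location(void) {
        return sampling_set[rand() % sampling_set_len].pc;  /* uniform over set */
    }

    /* Called from the breakpoint handler: sample this access, then move the
       breakpoint to another uniformly chosen program location. */
    void on_code_breakpoint(void *pc) {
        detect_conflicts_at(pc);
        set_code_breakpoint(random_location());
    }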
3.1.2. Controlling the Sampling Rate

While the program cannot affect the sampling distribution over program locations, the sampling rate is intimately tied to how frequently the program executes locations with a code breakpoint. In the worst case, if all of the breakpoints are set on dead code, DataCollider will stop performing data-race detection altogether. To avoid this, and to better control the sampling rate, DataCollider periodically checks the number of breakpoints fired each second and adjusts the number of breakpoints set in the program based on whether the observed sampling rate is higher or lower than the target rate.
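A minimal sketch of such a feedback loop (our illustration; the paper does not spell out the exact adjustment policy, so the multiplicative rule and the constants below are assumptions):

    extern unsigned breakpoints_fired_last_second(void);  /* hypothetical counter */
    extern void     arm_random_breakpoints(unsigned n);
    extern void     disarm_random_breakpoints(unsigned n);

    static unsigned armed = 64;            /* current number of code breakpoints */

    /* Run once per second: steer the observed firing rate toward the target. */
    void adjust_sampling_rate(unsigned target_per_second) {
        unsigned fired = breakpoints_fired_last_second();
        if (fired < target_per_second / 2 && armed < 4096) {
            arm_random_breakpoints(armed); /* double the breakpoint count */
            armed *= 2;
        } else if (fired > target_per_second * 2 && armed > 1) {
            disarm_random_breakpoints(armed / 2);
            armed /= 2;
        }
    }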
3.2. Conflict Detection

As described in the previous section, DataCollider picks a small percentage of memory accesses as likely candidates for data-race detection. For these sampled accesses, DataCollider pauses the current thread, waiting to see if another thread makes a conflicting access to the same memory location. It uses two strategies: data breakpoints and repeated reads. DataCollider uses these two strategies simultaneously, as each complements the weaknesses of the other.

3.2.1. Detecting Conflicts with Data Breakpoints

Modern hardware architectures provide a facility to trap when a processor reads or writes a particular memory location. This is crucial for efficient support of data breakpoints in debuggers. The x86 hardware supports four data breakpoint registers. DataCollider uses them to effectively monitor possible conflicting accesses to the currently sampled location.

When the current access is a write, DataCollider instructs the processor to trap on a read or write to the memory location. If the current access is a read, DataCollider instructs the processor to trap only on a write, as concurrent reads to the same location do not conflict. If no conflicting accesses are detected, DataCollider resumes the execution of the current thread after clearing the data breakpoint registers.

Each processor has its own set of data breakpoint registers. DataCollider uses an inter-processor interrupt to update the breakpoints on all processors atomically. This also synchronizes multiple threads attempting to sample different memory locations concurrently.

An x86 instruction can access memory of variable size. For 8-, 16-, or 32-bit accesses, DataCollider sets a breakpoint of the appropriate size. The x86 processor traps if another instruction accesses a memory location that overlaps with a given breakpoint. Luckily, this is precisely the semantics required for data-race detection. For accesses that span more than 32 bits, DataCollider uses more than one breakpoint, up to the maximum available of four. If DataCollider runs out of breakpoint registers, it simply resorts to the repeated-read strategy discussed below.

When a data breakpoint fires, DataCollider has successfully detected a race. More importantly, it has caught the racing threads "red-handed" – the two threads are at the point of executing conflicting accesses to the same memory location.

One particular shortcoming of data breakpoint support on x86 that we had to work around is the fact that, when paging is enabled, x86 performs the breakpoint comparisons based on the virtual address and has no mechanism to modify this behavior. Two concurrent accesses to the same virtual address but different physical addresses do not race. In Windows, most of the kernel resides in the same address space, with two exceptions.

Kernel threads accessing the user address space cannot conflict if the threads are executing in the context of different processes. If a sampled access lies in the user address space, DataCollider does not use breakpoints and defaults to the repeated-read strategy.

Similarly, a range of kernel address space, called session memory, is mapped to different address spaces based on the session the process belongs to. When a sampled access lies in the session memory space, DataCollider sets a data breakpoint but checks whether the conflicting accesses belong to the same session before reporting the conflict to the user.

Finally, a data breakpoint will miss conflicts if a processor uses a different virtual address mapped to the same physical address as the sampled access. Similarly, data breakpoints cannot detect conflicts arising from hardware devices directly accessing memory. The repeated-read strategy discussed below covers all these cases.
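For concreteness, this is roughly what arming a single x86 data breakpoint looks like at ring 0 (a simplified sketch of the architectural mechanism per the Intel manuals, not DataCollider's code; moves to debug registers are privileged, and a real implementation must repeat this on every processor, per the inter-processor interrupt discussion above):

    #include <stdint.h>

    /* DR7 field encodings (Intel SDM): for breakpoint slot 0,
       bit 0 = local enable, bits 16-17 = type, bits 18-19 = length. */
    #define DR7_L0          (1u << 0)
    #define DR7_RW0_WRITE   (0x1u << 16)   /* trap on data writes           */
    #define DR7_RW0_ACCESS  (0x3u << 16)   /* trap on data reads and writes */
    #define DR7_LEN0_4BYTE  (0x3u << 18)   /* watch 4 bytes                 */

    /* Arm breakpoint slot 0 on a 4-byte location. */
    static void arm_watchpoint_4(void *addr, int trap_reads_too) {
        uint64_t dr7 = DR7_L0 | DR7_LEN0_4BYTE |
                       (trap_reads_too ? DR7_RW0_ACCESS : DR7_RW0_WRITE);
        __asm__ volatile("mov %0, %%dr0" :: "r"(addr));   /* watched address */
        __asm__ volatile("mov %0, %%dr7" :: "r"(dr7));    /* enable slot 0   */
    }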
3.2.2. Detecting Conflicts with Repeated Reads

The repeated-read strategy relies on a simple insight: if a conflicting write changes the value of a memory location, DataCollider can detect this by repeatedly reading the memory location and checking for value changes. An obvious disadvantage of this approach is that it cannot detect conflicting reads. Similarly, it cannot detect multiple conflicting writes the last of which writes the same value as the initial value. Despite these shortcomings, we have found this strategy to be very useful in practice. This is the first strategy we implemented (as it is easier to implement than using data breakpoints), and we were able to find several kernel bugs with this approach.

However, the repeated-read strategy catches only one of the two threads "red-handed." This makes it harder to debug data races, as one does not know which thread or device was responsible for the conflicting write. This was our prime motivation for using data breakpoints.

3.2.3. Inserting Delays

For a sampled memory access, DataCollider attempts to detect a conflicting access to the same memory location by delaying the thread for a short amount of time. For DataCollider to be successful, this delay has to be long enough for the conflicting access to occur. On the other hand, delaying the thread for too long can be dangerous, especially if the thread holds some resource crucial for the proper functioning of the entire system. In general, it is impossible to predict how long a delay to insert. After experimenting with many values, we chose the following delay algorithm.

Depending on the IRQL (Interrupt Request Level) of the executing thread, DataCollider delays the thread for a preset maximum amount of time. At IRQLs higher than the DISPATCH level (the level at which the kernel scheduler operates), DataCollider does not insert any delay. We considered inserting a small window of delay at this level to identify possible data races between interrupt service routines, but we did not expect DataCollider to be effective at such short delays.

Threads running at the DISPATCH level cannot yield the processor to another thread. As such, the delay is simply a busy loop. We currently delay threads at this level for a random amount of time less than 1 ms. For lower IRQLs, DataCollider delays the thread for a maximum of 15 ms by spinning in a loop that yields the current time quantum. During this loop, the thread repeatedly checks whether other threads are making progress by inspecting the rate at which breakpoints fire. If progress is not detected, the waiting thread prematurely stops its wait.
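A sketch of this IRQL-dependent delay using documented WDK primitives (ours, not the paper's code; the per-level budgets follow the text, while the progress check and the randomness are simplified):

    #include <ntddk.h>

    /* Delay at a sampled access (sketch). Above DISPATCH_LEVEL: no delay.
       At PASSIVE_LEVEL: yield-based wait of up to ~15 ms. Otherwise (e.g.,
       DISPATCH_LEVEL, where the thread cannot yield): short busy-wait. */
    VOID DelayForConflictWindow(VOID)
    {
        KIRQL irql = KeGetCurrentIrql();

        if (irql > DISPATCH_LEVEL) {
            return;                           /* interrupt context: no delay */
        }

        if (irql == PASSIVE_LEVEL) {
            LARGE_INTEGER slice;
            slice.QuadPart = -10 * 1000;      /* 1 ms, relative, 100 ns units */
            for (int i = 0; i < 15; i++) {    /* up to 15 ms total */
                /* a real implementation also checks that other threads are
                   making progress and stops the wait early if they are not */
                KeDelayExecutionThread(KernelMode, FALSE, &slice);
            }
        } else {
            ULONGLONG start  = KeQueryInterruptTime();     /* 100 ns units */
            ULONGLONG budget = (start % 10000) + 1;        /* crude < 1 ms */
            while (KeQueryInterruptTime() - start < budget)
                YieldProcessor();             /* CPU pause hint; stays on CPU */
        }
    }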
3.3. Dealing with Benign Data Races

Whether a data race is harmful depends on what the racing data means to the program: for example, if a racy counter tracks the number of references to a shared object, then the data race could lead to a memory leak or a premature free of the object.

During the initial runs of the tool, we found that around 90% of the data-race reports are benign. Inspecting these, we identified the following patterns, which can be detected through simple static and/or dynamic analysis, and incorporated them into a post-processing pruning phase.

Statistics Counters: Around half of the benign data races involved conflicting updates to counters that maintain various statistics about the program behavior [21]. These counters are not necessarily write-only and can affect the control flow of the program. A common scenario is to use these counter values to perform periodic computations, such as flushing a log buffer. If DataCollider reports several data races involving an increment instruction and the value of the memory location consistently increases across these reports, then the pruning phase tags these data races as statistics-counter races. Checking for an increase in memory values helps the pruning phase distinguish these statistics counters from reference counters, which are usually both incremented and decremented.

Safe Flag Updates: The next prominent class of benign races involves a thread reading a flag bit in a memory location while another thread updates a different bit in the same memory location. By analyzing a few memory instructions before and after the memory access, the pruning phase identifies read-write conflicts that involve different bits. On the other hand, write-write conflicts can result in lost updates (as shown in Figure 1) and are not tagged as benign.

Special Variables: Some of the data races reported by DataCollider involve special variables in the kernel where races are expected. For instance, Windows maintains the current time in a variable that is read by many threads while being updated by the timer interrupt. The pruning phase has a database of such variables and prunes races involving them.
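The statistics-counter rule can be sketched as a simple post-processing predicate (our reconstruction of the heuristic as described above; the report fields are illustrative):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        void    *pc;            /* racing instruction              */
        bool     is_increment;  /* disassembles to an increment?   */
        uint64_t value;         /* value of the location at report */
    } RaceReport;

    /* Tag as a statistics counter only if every report for this instruction
       is an increment and the observed values strictly increase across the
       reports; reference counts fail this test because they also get
       decremented. */
    bool is_statistics_counter(const RaceReport *r, size_t n) {
        if (n < 2) return false;
        for (size_t i = 0; i < n; i++) {
            if (!r[i].is_increment) return false;
            if (i > 0 && r[i].value <= r[i - 1].value) return false;
        }
        return true;
    }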