I hope you enjoy it.
Our world is built on modular, composable software. We routinely expect that we don’t need to know about the internal implementation details of a library or plug-in in order to be able to use it correctly.
The problem is that locks, and most other kinds of synchronization we use today, are not modular or composable. That is, given two separately authored modules each of which is provably correct but uses a lock or similar synchronization internally, we generally can’t combine them and know that the result is still correct and will not deadlock. There are only two ways to combine such a pair of lock-using modules safely:
- Option 1 (generally impossible for code you don’t control): Each module must know about the complete internal implementation of any function it calls in the other module.
- Option 2: Each module must be careful not to call into the other module while the calling code is inside a critical section (e.g., while holding a lock).
Let’s examine why Option 2 is generally the only viable choice, and what consequences it has for how we write concurrent code. For convenience and familiarity, I’m going to cast the problem in terms of locks, but the general case arises whenever:
- The caller is inside one critical section.
- The callee directly or indirectly tries to enter another critical section, or performs another blocking call.
- Some other thread could try to enter the two critical sections in the opposite order.
Quick Recap: Deadlock
A deadlock (aka deadly embrace) can happen anytime two different threads can try to acquire the same two locks (or, more generally, acquire exclusive use of the resources protected by the same two synchronization objects) in opposite orders. Therefore anytime you write code where a thread holds one lock L1 and tries to acquire another lock L2, that code is liable to be one arm of a potential deadly embrace—unless you can eliminate the possibility that some other thread might try to acquire the locks in the other order.
We can use techniques such as lock hierarchies to guarantee that locks are taken in order. Unfortunately, those techniques do not compose either. For example, you can use a lock hierarchy and guarantee that the code you control follows the discipline, but you can’t guarantee that other people’s code will know anything about your lock levels, much less follow them correctly.
Example: Two Modules, One Lock Each
One fine day, you decide to write a new web browser that allows users to write plug-ins to customize the behavior or rendering of individual page elements.
Consider the following possible code, where we simply protect all the data structures representing elements on a given page using a single mutex mutPage:[3]
// Example 1: Thread 1 of a potential deadly embrace
//
class CoreBrowser {
… other methods …
private void RenderElements() {
mutPage.lock(); // acquire exclusion on the page elements
try {
for( each PageElement e on the page ) {
DoRender( e ); // do our own default processing
plugin.OnRender( e ); // let the plug-in have a crack at it
}
} finally {
mutPage.unlock(); // and then release it
}
}
}
Do you see the potential for deadlock? The trouble is that, if inside the call to plugin.OnRender the plug-in might acquire some internal lock of its own, then this could be one arm of a potential deadly embrace. For example, consider this plug-in implementation that just does some basic instrumentation of how many times certain actions have been performed, and it protects its internal data with a single mutex mutMyData:
class MyPlugin {
… other methods …
public void OnRender( PageElement e ) {
mutMyData.lock(); // acquire exclusion on some internal shared data
try {
renderCount[e]++; // update #times e has been rendered
} finally {
mutMyData.unlock(); // and then release it
}
}
}
Thread 1 can therefore acquire mutPage and mutMyData in that order. Thread 1 is potential deadlock-bait, but the trouble will only actually manifest if one fine day some other Thread 2 that could run concurrently with the above performs something like the following:
// Example 2: Thread 2 of a potential deadly embrace
//
class MyPlugin {
… other methods …
public void RefreshDisplay( PageElement e ) {
mutMyData.lock(); // acquire exclusion on some internal shared data
try { // display stuff in a debug window
for( each element e we’ve counted ) {
listRenderedCount.Add( e.Name(), renderCount[e] );
}
textHiddenCount = browser.CountHiddenElements();
} finally {
mutMyData.unlock(); // and then release it
}
}
}
Notice how the plugin calls code unknown to it, namely browser.CountHiddenElements? You can probably see the trouble coming on like a steamroller:
class CoreBrowser {
… other methods …
public int CountHiddenElements() {
mutPage.lock(); // acquire exclusion on the page elements
try {
int count = 0;
for( each PageElement e on the page ) {
if( e.Hidden() ) count++;
}
return count;
} finally {
mutPage.unlock(); // and then release it
}
}
}
Threads 1 and 2 can therefore acquire mutPage and mutMyData in the opposite order, and so this is a deadlock waiting to happen if Threads 1 and 2 can ever run concurrently. For added fun, note that each mutex is purely an internal implementation detail of its module that is never exposed in the interface; neither module knows anything about the internal lock being used within the other. (Nor, in a better programming world than the one we now inhabit, should it have to.)
Note: Both the CoreBrowser and MyPlugin violate the rule. For CoreBrowser, see below for workarounds. For the plug-in, it should easily move the call to browser.CountHiddenElements (which is code external to the plug-in) out before or after the critical section – it does need a lock for some of the work in the try block, but it doesn’t need the lock around all of that work, especially not the call out to unknown code.
Example: Two Modules, but Only One Has Locks
This kind of problem can arise even if both locks are in the same module, but control flow passes through another module so that you don’t know what locks are taken.
Consider the following modification, where the browser protects each page element using a separate mutex, which can be desirable to allow different parts of the page to be rendered concurrently:
// Example 3: Thread 1 of an alternative potential deadly embrace
//
class CoreBrowser {
… other methods …
private void RenderElements() {
for( each PageElement e on the page ) {
e.mut.lock(); // acquire exclusion on this page element
try {
DoRender( e ); // do our own default processing
plugin.OnRender( e ); // let the plug-in have a crack at it
} finally {
e.mut.unlock(); // and then release it
}
}
}
}
And consider a plug-in that does no locking of its own at all:
class MyPlugin {
… other methods …
public void OnRender( PageElement e ) {
GuiCoord cPrev = browser.GetElemPosition( e.Previous() );
GuiCoord cNext = browser.GetElemPosition( e.Next() );
Use( e, cPrev, cNext ); // do something with the coordinates
}
}
But which calls back into:
class CoreBrowser {
… other methods …
public GuiCoord GetElemPosition( PageElement e2 ) {
e2.mut.lock(); // acquire exclusion on this page element
try {
return FigureOutPositionFor( e2 );
} finally {
e2.mut.unlock(); // and then release it
}
}
}
The order of mutex acquisition is:
for each element e
acquire e.mut
acquire e.Previous().mut
release e.Previous().mut
acquire e.Next().mut
release e.Next().mut
release e.mut
Perhaps the most obvious issue is that any pair of locks on adjacent elements can be taken in either order by Thread 1; so this cannot possibly be part of a correct lock hierarchy discipline.
Because of the interference of the plug-in code, which does not even have any locks of its own, this code will have a latent deadlock if any other concurrently running thread (including perhaps another instance of Thread 1 itself) can take any two adjacent elements’ locks in any order. The deadlock-proneness is inherent in the design, which fails to guarantee a rigorous ordering of locks. In this respect, the original Example 1 was better, even though its locking was coarse-grained and less friendly to concurrent rendering of different page elements.
Consequences: What Is “Unknown Code”?
It’s one thing to say “avoid calling unknown code while holding a lock” or while inside a similar kind of critical section. It’s another to do it, because there are so many ways to get into “someone else’s code.” Let’s consider a few.
While inside a critical section, including while holding a lock:
- Avoid calling library functions. A library function is the classic case of “somebody else’s code.” Unless the library function is documented to not take any locks, deadlock problems can arise.
- Avoid calling plug-ins. Clearly a plug-in is “somebody else’s code.”
- Avoid calling other callbacks, function pointers, functors, delegates, etc. C function pointers, C++ functors, C# delegates, and the like can also fairly obviously lead to “somebody else’s code.” Sometimes you know that a function pointer, functor, or delegate will always lead to your own code, and calling it is safe; but if you don’t know that for certain, avoid calling it from inside a critical section.
- Avoid calling virtual methods. This may be less obvious and quite surprising, even Draconian; after all, virtual functions are common and pervasive. But every virtual function is an extensibility point that can lead to executing code that doesn’t even exist today. Unless you control all derived classes (for example, the virtual function is internal to your module and not exposed for others to derive from), or you can somehow enforce or document that overrides must not take locks, deadlock problems can arise if it is called while inside a critical section.
- Avoid calling generic methods, or methods on a generic type. When writing a C++ template or a Java or .NET generic, we have yet another way to parameterize and intentionally let “other people’s code” be plugged into our own. Remember that anything you call on a generic type or method can be provided by anyone, and unless you know and control all the types with which your template or generic can be instantiated (for example, your template or generic is internal to your module and so cannot be externally instantiated by someone else), avoid calling something generic from within a critical section.
Some of these restrictions may be obvious to you; others may be surprising at first.
Avoidance: Non-Critical Calls
So you want to remove a call to unknown code from a critical section. But how? What can you do? Four options are: (a) move the call out of the critical section if you didn’t need the exclusion anyway; (b) make copies of data inside the critical section and later pass the copies; (c) reduce the granularity or power of the critical section being held at the point of the call; or (d) instruct the callee sternly and hope for the best. Let’s look at each option in turn.
We can apply the first approach directly to Example 2. There is no reason the plugin needs to call browser.CountHiddenElements() while holding its internal lock. That call should simply be moved to before or after the critical section.
The second approach is to avoid the need for taking the lock by avoiding the use of mutable shared data. As shown in Items todo and todo, the two general approaches are: 1. Avoid shared data by passing copies of data, which solves the correctness problem by trading off space and some performance; variants of this approach include passing a subset of the data, and passing the copies via messages to run the callee asynchronously. 2. Avoid mutable data by using immutable snapshots of the state. todo refine this
To improve Example 1, for instance, it might be appropriate to change the RenderElements method to hold the lock only long enough to take copies of the necessary shared information in a local container, then doing processing outside the lock, passing the copied elements. However, this could be inappropriate if the data is very expensive to copy, or the callee needs to work on the real data. Alternatively, perhaps the callee doesn’t really need all the information it gets from being given direct access to the protected object, and it would be both sufficient and efficient to pass copies of just what parts of the data the callee does need.
The third option is to reduce the power or granularity of the critical section, which implicitly trades off ease of use because making your synchronization finer-tuned and/or finer-grained also makes it harder to code correctly. One example of reducing the power of the critical section is to replace a mutex with a reader-writer mutex so that multiple concurrent readers are allowed; if the only deadlocks could arise among threads that are only performing reads of the protected resources, then this can be a valid solution by enabling them to take a read-only lock instead of a read-write lock. And an example of making the critical section finer-grained is to replace a single mutex protecting a large data structure with mutexes protecting parts of the structure; if the only deadlocks could arise among threads that use different parts of the structure, then this can be a valid solution (note that Example 1 is not such a case).
The fourth option is to tell the callee not to block, which trades off enforceability. In particular, if you do enjoy the power to impose requirements on the callee (as you do with plug-ins to your software, but not with simple calls into existing third-party libraries), then you can require them to not take locks or otherwise perform blocking actions. Alas, these requirements are typically going to be limited to documentation, and are typically not enforceable automatically, but it can often be sufficient in practice to rely on programmer discipline to follow the rule. Tell the callee what (not) to do, and hope he follows the yellow brick road.
Summary
Be aware of the many opportunities modern languages give us to call “other people’s code,” and eliminate external opportunities for deadlock by not calling unknown code from within a critical section. If you additionally eliminate internal opportunities for deadlock by applying a lock hierarchy discipline within the code you control, your use of locks will be highly likely to be correct… and then we can move on to making it efficient, which we’ll consider in future articles.
Notes
[1] H. Sutter. “The Trouble With Locks” (C/C++ Users Journal, 23(3), March 2005). Available at http://www.gotw.ca/publications/mill36.htm.
[2] Note that calling unknown code from within a critical section is a problem even in single-threaded code, because while our code is inside a critical section we typically have broken invariants. For example, an object may be partway through the process of changing its values from one valid state to another, or we may be partway through a debit-credit transaction where one account has had money removed but the other account has not yet received it. If we call into unknown code while in such a situation, there’s the (usually remote) possibility that the code we call may in turn call other code, which in turn calls other code, which in turn calls into the original module and sees the broken invariant. Techniques like layering can minimize this problem by ensuring code at lower levels can’t “call back up” into higher-level code. The problem is greatly magnified, however, in the presence of concurrency, and we need the same kinds of “layering” tools, only now applied to locks and other ways to express critical sections, to ensure similar problems don’t occur. Unfortunately, those tools only work for code you control, which doesn’t help with modularity.
Finally, here are links to previous Effective Concurrency columns (based on the dates they hit the web, not the magazine print issue dates):