GotW #96: Oversharing

Following on from #95, let’s consider reasons and methods to avoid mutable sharing in the first place…

 

Problem

Consider the following code from GotW #95’s solution, where some_obj is a shared variable visible to multiple threads which then synchronize access to it.

// thread 1
{
    lock_guard hold(mut_some_obj);      // acquire lock
    code_that_reads_from( some_obj );   // passes some_obj by const &
}

// thread 2
{
    lock_guard hold(mut_some_obj);      // acquire lock
    code_that_modifies( some_obj );     // passes some_obj by non-const &
}

 

JG Questions

1. Why do mutable shared variables like some_obj make your code:

(a) more complex?

(b) more brittle?

(c) less scalable?

 

Guru Questions

2. Give an example of how the code that uses a mutable shared variable like some_obj can be changed so that the variable is:

(a) not shared.

(b) not mutable.

3. Let’s say we’re in a situation where we can’t apply the techniques from the answers to #2, so that the variable itself must remain shared and apparently mutable. Is there any way that the internal implementation of the variable can make the variable be physically not shared and/or not mutable, so that the calling code can treat it as a logically shared-and-mutable object yet not need to perform external synchronization? If so, explain. If not, why not?

16 thoughts on “GotW #96: Oversharing”

  1. If we assume that Herb wants thread 1 to only read and thread 2 to only write, we can make the variable itself an active object: on each write it duplicates its latest state and enqueues the copy into a thread-safe queue, and on each read it dequeues from the queue and processes that copy of the latest state (see the sketch below).

    To better approximate the synchronous read/write locking between the two threads, make sure the reading thread processes only the latest state when there are multiple states in the queue. A stack with a sequence number in each duplicated state might help performance.
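
    A minimal sketch of this idea (illustrative only, assuming a hypothetical State type and a hand-rolled thread-safe queue; any concurrent queue would do):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <utility>

    struct State { /* whatever some_obj actually holds */ };

    // A tiny thread-safe queue: the writer enqueues copies of the latest
    // state, the reader dequeues and works on its own private copy.
    class StateQueue {
    public:
        void push(State s) {
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                m_queue.push(std::move(s));
            }
            m_cv.notify_one();
        }
        State pop() {                          // blocks until a state is available
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cv.wait(lock, [this]{ return !m_queue.empty(); });
            State s = std::move(m_queue.front());
            m_queue.pop();
            return s;
        }
    private:
        std::mutex              m_mutex;
        std::condition_variable m_cv;
        std::queue<State>       m_queue;
    };

    // writer thread: publish a copy of the latest state after each modification
    //     q.push(current_state);
    // reader thread: take its own copy and process it without touching shared data
    //     State snapshot = q.pop();
    //     code_that_reads_from(snapshot);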

  2. 1. a. The increased ratio of boilerplate code to actual logic makes the code more complex.
    1. b. If the boilerplate code is forgotten, the problem is not seen immediately. Such buggy code will still compile and may pass unit tests.
    1. c. With the example as-is we cannot have multiple readers simultaneously. If the shared object is overly large (bad design anyway) with independently modifiable members, we get sub-optimal lock granularity.
    2. It requires a change to the algorithm that uses the shared object. If the algorithm relies on sharing and modification to work, the data must stay physically shared and modifiable; otherwise the algorithm must be changed to work on data that is no longer shared.
    3. Make the shared object internally synchronized, and mark the methods that access shared data as const. Thanks to knowledge of the internal structure this may be more efficient, though it can be hard to implement in practice. (A sketch follows below.)
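
    A minimal sketch of such an internally synchronized object (illustrative names only); the key ingredient is a mutable mutex so that even const member functions can lock:

    #include <mutex>
    #include <string>
    #include <utility>

    class SharedName {
    public:
        std::string get() const {               // logically const to callers
            std::lock_guard<std::mutex> lock(m_mutex);
            return m_name;                      // return a copy, not a reference
        }
        void set(std::string s) {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_name = std::move(s);
        }
    private:
        mutable std::mutex m_mutex;             // mutable: const members can still lock
        std::string        m_name;
    };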

  3. 1) The primary problem is that it is too easy for a user of some_obj to forget, or not even realize, that synchronization is required, especially as a program gets very large. some_obj will be relying on a comment that programmers won’t read to inform them that synchronization is needed. On the plus side, by not embedding synchronization into some_obj’s class, you won’t have to pay for it if you don’t need it on some_obj2.

    2, 3) The main goal is to make sure users can’t accidentally use a shared object without synchronization, and hopefully at the same time not to require synchronization if another instance is used strictly internally somewhere that is already internally synchronized. It would also be nice if the maintainer of the class of some_obj didn’t have to think about synchronization, keeping concerns separated. For this I like the handle/body idiom: no classes provide synchronization directly; instead a family of thread-safe handles can be used to wrap an instance in, when needed. We can also make use of this handle to cleanly synchronize when we need to make multiple calls to some_obj atomically. Code is below.

    
    #include <functional>
    #include <mutex>

    template<typename T>
    class Shared
    {
    public:
        void Lock(std::function<void (T& _instance)> _work)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            _work(m_instance);
        }
    private:
        std::mutex m_mutex;
        T m_instance;
    };

    // usage (e.g. inside a function); note the lambda takes the instance by
    // reference so the modifications apply to the shared object:
    Shared<Foo> some_obj;
    some_obj.Lock([](auto& _instance){
        _instance.foo1();
        _instance.foo2();
    });
    
    
  4. Question 3. If the variable is not physically shared, then threads won’t be able to see each other’s writes. But if that’s okay, then you can give each thread its own private copy to read and modify and then, at a later time, collect the results and merge them, a la map/reduce. Is that what you had in mind?
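
    A minimal sketch of that give-each-thread-its-own-copy-and-merge-later shape (purely illustrative): each worker writes only to its own private slot, and the merge happens after join(), which provides the needed happens-before edge.

    #include <cstddef>
    #include <numeric>
    #include <thread>
    #include <vector>

    long long parallel_sum(const std::vector<int>& data) {
        const std::size_t n = 2;                          // two worker threads
        std::vector<long long> partial(n, 0);             // one private slot per thread
        std::vector<std::thread> workers;

        for (std::size_t i = 0; i != n; ++i) {
            workers.emplace_back([&, i] {
                // each thread reads a disjoint chunk and writes only its own slot
                auto first = data.begin() + i * data.size() / n;
                auto last  = data.begin() + (i + 1) * data.size() / n;
                partial[i] = std::accumulate(first, last, 0LL);
            });
        }
        for (auto& t : workers) t.join();                 // happens-before the merge

        return partial[0] + partial[1];                   // the "reduce" step
    }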

  5. Question 2. Except for trivial cases, data has to be passed between threads. There are many ways to do it and some of them don’t require additional synchronization. I’m saying “additional” because there are two synchronization points in the life of every thread: its creation and its destruction.

    So you can create data before you pass it to the thread constructor and, if you can guarantee that you won’t access it afterwards, the thread is free to use it without synchronization. Similarly, a thread may create and manipulate data before it returns it (e.g., through a future) to the caller. The best guarantee of non-access is provided by move semantics. When you move your data to the constructor of the thread, or move it back from the thread, you are guaranteed that there is only one thread at a time that can access it.

    Another way to avoid synchronization is to use immutable data. In that case many threads may access the same data concurrently without synchronization. You have to make sure though that the data was created before the threads were started (or there is some other “happens before” edge between creation and use), because data creation almost always involves mutation.
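
    A minimal sketch of the hand-off by move and the hand-back through a future, using std::async for brevity (illustrative data only): the vector is owned by exactly one thread at a time, so neither side needs a lock.

    #include <cstddef>
    #include <future>
    #include <string>
    #include <utility>
    #include <vector>

    std::vector<std::string> load_words() {               // stand-in data source
        return {"shared", "mutable", "state"};
    }

    int main() {
        std::vector<std::string> words = load_words();

        // Move the data into the task; the caller must not touch 'words' afterwards.
        std::future<std::size_t> result =
            std::async(std::launch::async,
                       [](std::vector<std::string> w) {   // the task owns its data
                           std::size_t total = 0;
                           for (const auto& s : w) total += s.size();
                           return total;
                       },
                       std::move(words));

        std::size_t total = result.get();                 // get() synchronizes with the task
        return total == 0 ? 1 : 0;
    }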

  6. @Bartosz true :) but what is, in my opinion, often underrated is that even in the absence of any kind of threads, shared mutable values can quite easily exceed a human mind’s capacity to understand the consequences of possible interference…

    Coroutines are just a classic example, but they are also not a precondition for creating mind-boggling complexity.

    Even in the simplest purely procedural single-threaded programs, without event loops driving coroutines etc., you can do that by modifying global variables.

    And shared mutable state has a lot in common with non-const global variables.

    The only basic difference is that you know for sure that such globals are visible and writable from everywhere, whereas in the case of shared mutable state, in the general case you must assume that it is visible and writable from anywhere, which isn’t much of a difference.

    But luckily, shared mutable state at least allows you to encapsulate it in business logic that defines clear and concise rules for modification, thus artificially limiting your system’s complexity, which is what most of developing non-trivial systems is about after all :)

    Btw., @Herb, thanks for these wonderful questions, and I’m really looking forward to your answer! :)

  7. These are good questions with some obvious and non-obvious answers. Obviously, having to worry about one more thing adds complexity and fragility to your code. But what is it specifically about mutable shared variables that makes them hard to manage? Two things come to mind: non-locality and non-determinism. Just like gotos, mutable variables introduce dependencies between remote parts of code. When you’re reading a shared variable you are seeing the result of a write that happened in any of the threads that have access to it. And the threads can interleave differently every time you run your program. That makes both reasoning and testing of concurrent programs orders of magnitude harder. When we write code, we usually have one or two scenarios of execution in mind; with concurrent code that number is just not realistic. I’ve seen people arguing that a certain combination of events cannot happen only because they eliminated the few scenarios of execution that came to their minds. But there were other (often rare) disastrous scenarios that didn’t cross their minds. In concurrent programming the scenario generator in our minds just cannot keep up.

  8. 3. If the shared variable can be completely internally synchronized in a lock-free manner, then it can neither contribute to data races nor introduce potential deadlock.
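
    For example, a minimal sketch of a fully internally synchronized, lock-free counter (illustrative names): callers share it and “mutate” it freely, yet never take a lock and can neither race nor deadlock through it.

    #include <atomic>

    class HitCounter {
    public:
        void increment()      { m_count.fetch_add(1, std::memory_order_relaxed); }
        long get()      const { return m_count.load(std::memory_order_relaxed); }
    private:
        std::atomic<long> m_count{0};   // lock-free on mainstream platforms
    };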

  9. ad 3. In case a mutable datum is required that may be changed from different parts of a system, be it within the same thread or across multiple threads, as opposed to just safely sharing values (which is easily done by choosing an appropriate technique as described above), it really gets more interesting :)

    Such a capability only becomes a necessity when a core requirement of your project demands it.

    And implementing it demands thinking first about to what end it shall be implemented. Software transactional memory is often cited as a technological approach, but that does not cover the complexities inherent in the situation from which the need usually emerges in the first place.

    When there is a centrally stored value that requires manipulation from various places, various threads, and various sub-domains of a project’s aims, it is usually not independent of other parts of a common business logic :)

    Therefore, this case is in my honest opinion the hardest one to phrase a general technological approach for, and thus a bit out of reach of a somewhat language- and platform-centric discussion ;)

  10. Without looking at previous answers:

    1. Why do mutable shared variables like some_obj make your code:
    (a) more complex?

    Because it means that in almost all cases there must be a mutex associated with the variable. Often other invariants need to be established in order to reason about the behaviour of some code. Different implementations of concurrent queues are classic examples of this.

    (b) more brittle?

    It is hard to reason about shared state because it implies interdependencies of different parts of a program. Therefore, the code and logic of the application is usually not found in one place which makes the code harder to reason about. Often, when extending the functionality of a component more state needs to be added and it must be decided whether to protect the new state with the same mutex as well.

    (c) less scalable?

    Sharing is the root of contention. When many different threads are fighting for one mutex then this may easily lead to a performance bottleneck. If a thread is holding a mutex half of the time in a tight loop, then having multiple threads trying to do the same thing will at most double the performance. But multithreading might even worsen performance. Therefore, mutexes should be held only during very short actions which make some work take effect. For example such a short action might be changing a pointer or incrementing or decrementing a counter, etc.

    When programming for the strong exception guarantee, the “red line” technique is often employed, where the actual work is done off to the side (without side effects) in a first phase, and in a second phase that work is committed by performing some non-throwing mutating operations which are usually cheap (like a swap or a move). Often multithreaded code uses the same technique, with the difference that the mutex is held during the commit phase (sketched below). Designing the code that way can be a hard task. We get scalability at the price of brittleness and complexity.
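
    A minimal sketch of that two-phase shape, with hypothetical names: all the expensive, possibly-throwing work happens off to the side, and the mutex is held only for a cheap non-throwing swap.

    #include <mutex>
    #include <vector>

    std::vector<int> shared_data;                     // protected by data_mutex
    std::mutex       data_mutex;

    void rebuild() {
        // phase 1: do the expensive work privately, no lock held, no side effects
        std::vector<int> fresh(1000, 42);             // stand-in for the real work

        // phase 2: commit under the lock with a cheap, non-throwing operation
        std::lock_guard<std::mutex> lock(data_mutex);
        shared_data.swap(fresh);                      // lock held only for the swap
    }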

    2. Give an example of how the code that uses a mutable shared variable like some_obj can be changed so that the variable is:
    (a) not shared.
    (b) not mutable.

    Simple: make a copy. Since the concurrency revolution it is often better to avoid contention through references or pointers to shared objects and instead make copies. The function code_that_modifies() can be changed to have the signature

    	some_type code_that_modifies( some_type some_obj );
    

    That way an object can be moved into the function which might be cheaper and exception-safer. That very same object can also be moved out of the function as a return value. In an ideal case the function will have approximately the same performance as the original, if not even better. Better, I say? — Yes, because there’s less indirection!

    A second technique on top of that is to replace a variable of some_type by a shared_ptr to a const some_type. The pointee is immutable, but the pointer can be changed in a commit action while holding a mutex as mentioned above. This shortens the time the mutex is held. If the shared_ptr is the only variable protected by the mutex, then we might even be tempted to use atomic operations on shared_ptr as provided by the standard library and go without mutex.

    These techniques might not always be practical, because copying and moving might be expensive or impossible.
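
    A minimal sketch of the shared_ptr-to-const variation, with a hypothetical Config type: readers grab a reference-counted snapshot, writers publish a whole new immutable object, and the mutex protects nothing but the pointer itself.

    #include <memory>
    #include <mutex>
    #include <utility>

    struct Config { int timeout = 30; /* ... */ };

    std::shared_ptr<const Config> current_config;    // protected by cfg_mutex
    std::mutex                    cfg_mutex;

    std::shared_ptr<const Config> read_config() {
        std::lock_guard<std::mutex> lock(cfg_mutex);
        return current_config;                        // cheap: copies only the pointer
    }

    void update_config(int new_timeout) {
        auto fresh = std::make_shared<Config>();      // build the new value off to the side
        fresh->timeout = new_timeout;
        std::lock_guard<std::mutex> lock(cfg_mutex);
        current_config = std::move(fresh);            // commit: a pointer assignment only
    }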

    3. Let’s say we’re in a situation where we can’t apply the techniques from the answers to #2, so that the variable itself must remain shared and apparently mutable. Is there any way that the internal implementation of the variable can make the variable be physically not shared and/or not mutable, so that the calling code can treat it as a logically shared-and-mutable object yet not need to perform external synchronization? If so, explain. If not, why not?

    Sorry, Herb, I don’t understand what you have in mind with this question. What do you mean by *physically*? I’m thinking of thread-local storage in order to get variables that are physically not shared yet behave like one object (in object oriented design I have not found thread-local storage very helpful though). Alternatively, I’m thinking of a copy-on-write template class which makes sure that the data is immutable unless there is only one unique reference to it.

    Otherwise, I would say, it’s not possible because I’m out of ideas. ;)
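
    For what it’s worth, a minimal sketch of the copy-on-write wrapper mentioned above (illustrative only): the data stays shared and immutable as long as several owners reference it, and is cloned just before the first mutation. Note that concurrent writes to the *same* wrapper object would still need external synchronization.

    #include <memory>
    #include <utility>

    template <typename T>
    class CowValue {
    public:
        explicit CowValue(T v) : m_data(std::make_shared<T>(std::move(v))) {}

        const T& read() const { return *m_data; }   // shared, immutable view

        T& write() {                                 // clone before mutating shared data
            if (m_data.use_count() != 1)
                m_data = std::make_shared<T>(*m_data);
            return *m_data;
        }
    private:
        std::shared_ptr<T> m_data;
    };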

  11. P.S. please forgive my orthography – it makes me cringe to read my posts but I’m writing on my iPad with autocorrect turned off because turning it off looked like a smart idea at some point ;)

  12. ad my response to 3.: Such a mutable but thread-safe, internally immutable, and internally ref-counted data wrapper can very easily be made immutable by making it const, e.g. for holding it in a persistent collection. It’s possible because its mutability is VERY shallow :)

    Deeper mutability would in almost all cases be a show stopper here.

    About std::string: when the part of a persistent collection that holds a string instance is modified (i.e. replaced) in the course of a persistent update (i.e. creation of an updated instance), the string will be copied.

    Therefore, if strings tend to be large in such a scenario, it may make sense to wrap them as in my response to 3., especially if mutability is NOT required – the wrapped string then behaves somewhat like a Java or C# string, i.e., its value is immutable (the embedded reference only if const), and copying it is practically free, which is just what persistent collections crave when it comes to their element types :)

  13. 3. If the variable is of a class type, not a pointer, the internal representation may be a reference-counted smart pointer to an immutable value or persistent collection. Plus, the reference-counted smart pointer should either be thread-safe or be guarded in the assignment operator and copy constructor.

    Just another variant on the pimpl idiom with a ref counting smart pointer :)
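
    A minimal sketch of that variant, with hypothetical names: the impl is immutable and reference counted, so copying the handle is cheap, and the one thing that still needs guarding is concurrent access to the pointer inside a single handle object (done here with a small internal mutex).

    #include <memory>
    #include <mutex>
    #include <utility>

    class Widget {
        struct Impl { int value; };                  // immutable once published
    public:
        explicit Widget(int v = 0)
            : m_impl(std::make_shared<const Impl>(Impl{v})) {}

        Widget(const Widget& other) : m_impl(other.snapshot()) {}

        Widget& operator=(const Widget& other) {
            auto p = other.snapshot();               // bump the ref count outside our lock
            std::lock_guard<std::mutex> lock(m_mutex);
            m_impl = std::move(p);
            return *this;
        }

        int value() const { return snapshot()->value; }

        void set_value(int v) {                      // "mutation" = publish a new impl
            auto p = std::make_shared<const Impl>(Impl{v});
            std::lock_guard<std::mutex> lock(m_mutex);
            m_impl = std::move(p);
        }

    private:
        std::shared_ptr<const Impl> snapshot() const {
            std::lock_guard<std::mutex> lock(m_mutex);
            return m_impl;                           // the ref-count bump itself is thread-safe
        }
        mutable std::mutex          m_mutex;         // guards only the pointer member
        std::shared_ptr<const Impl> m_impl;
    };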

  14. 1a. - the logic is polluted with mutexes and locks. this reduces readability.
    
    1b. - possible to forget to lock leading to data races
    
        - possible deadlocks. for example:
            - high-level code calls low-level code while holding the lock
            - low level code invokes callback to high-level function which tries to lock
        - possibility of lifetime bugs (object which is destroyed in one thread
          while another thread is reading from it)    
          
    1c. - locking prevents parallelism
    
    2a. - writer thread owns the object and reader receives copies.
        - if your application has a message loop then a thread can run in the
          background and send updated copies to the main thread via "post" events
          (concurrent_queues can be used for a more general solution)
          
    2b. - using techniques from (pure) functional programming languages where you
          never change the object but always pass updated copies back and forth
          (a small sketch follows at the end of this comment).
          
    3. I'm not sure what you have in mind, but here are a few wild guesses:
        - maybe using software transactional memory? (not sure if transactions count
          as external synchronization)
        - perhaps a proxy type which stores a private copy of the real object? (state
          changes could be communicated to other proxies using some kind of
          messaging system.)
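
    For 2b, a minimal sketch of the functional-style update with a hypothetical Settings type: nothing is modified in place, each "change" produces a fresh value that is handed on.

    struct Settings { int volume = 5; bool muted = false; };

    // pure function: takes a copy, returns an updated copy, touches no shared state
    Settings with_volume(Settings s, int v) {
        s.volume = v;
        return s;
    }

    // usage: the "current" settings are simply the latest value passed along
    // Settings next = with_volume(current, 7);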
    
  15. 1.a. because interdependencies between otherwise independent parts of the code arise – not only in the face of multiple threads but also in the face of – at first glance more innocent – coroutines.

    1.b. because these interdependencies may manifest as side-effects of entirely unrelated things happening in other parts of code. Plus, these are often hard to find and almost always hard to debug when they emerge. Race conditions are but one popular example…

    1.c. because wait states emerge that take up more and more time as lock contention increases. Making data that is shared between vast parts of a project a source of lock contention can easily lead to a huge amount of it. Throwing a vast number of threads at the problem won’t work because they always consume space (for their stacks, plus indirectly on the heap (yes, “free store” in C++)) and time, because of the high cost of context switches. Plus, making even more threads wait all the time for the same resources wouldn’t help much in any case.

    2.a. make copies. Defensive copying is not the brightest idea in most cases but may actually help if there are more or less outlandish reasons that make them the only feasible approach (like an existing code base that just can’t be adapted, or, in the worst case, political reasons).

    2.b. use persistent collections plus immutable data objects, like custom classes, or const std::string etc. Immutable data is friendly to reference counting, and reference counting is well compatible with most of these non-circular tree-based persistent (i.e. immutable with low-cost copying) data structures. I really hope they will pop up in the STL as soon as possible so I can replace my home-brewed implementations (because the STL is ALWAYS more thoroughly tested :D)
