A simple C11 implementation of Vyukov multi-producer, single-consumer lockless queue[1].
make # Build tests
make run # Run tests
make benchmark # Run benchmarkThis is a single-header library, to be dropped and used in your project.
-
Multi-producer: multiple threads can write concurrently. Insertion is thread safe and costs one atomic exchange.
-
Single-consumer: only one thread can safely remove nodes from the queue.
-
Unbounded: The queue is a linked-list and does not limit the number of elements.
-
Wait-free writes: Writers will never wait for queue state sync when enqueuing.
-
Obstruction-free reads: The reader does not wait to see if a node is available in the queue. Peeking takes a bounded number of instructions. There is however no removal forward-guarantee, as it relies on other threads progressing. Livelock must be avoided with an out-of-band mechanism.
-
Intrusive: Queue elements are allocated as part of larger objects. Objects are retrieved by offset manipulation.
-
per-producer FIFO: Elements in the queue are kept in the order their producer inserted them. The consumer retrieves them in the same insertion order. When multiple producers insert at the same time, either will proceed.
This queue is well-suited for message passing between threads, where any number of thread can insert a message and a single thread is meant to receive and process them.
It could be used to implement the Actor concurrency model.
Note: this queue is serializable but not linearizable. After a series of insertion, the queue state remains consistent and the insertion order is compatible with their precedence. However, because one insertion consists in two separate memory transaction, the queue state can be found inconsistent within the series.
This has important implication regarding the concurrency environment this queue can be used with. To avoid livelocking the consumer thread, one must ensure that producer threads cannot be cancelled when inserting elements in the queue. Either cooperative threads should be used or insertions should be done outside cancellable sections.
A simple benchmark was implemented to compare several MPSC queue implementations.
Vyukov's queue is compared mainly against a basic doubly-linked intrusive list with
a lock, referenced as tailq during tests.
Additionally, a Treiber stack [2] is implemented, requiring only one Compare-And-Swap (CAS) per insertion, reversing the stack during element removal. This specific implementation is found very quickly insufficient and is only kept as a curiosity.
-
R. K. Treiber. Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, April 1986.