paulmck

RCU was accepted into C++26 a few years back, and C++26 should be coming out soon, this being 2026 and all.

The specification may be found in the draft C++ standard, but the last RCU working paper has background information and code examples that, though quite useful, are not appropriate for a standards-body specification.

But let's start by comparing the C++ API to that of the Linux kernel. This will help people who are familiar with Linux-kernel RCU. Others might be better served by the last RCU working paper linked to above.

The following table maps the relevant portion of the Linux-kernel RCU (and the userspace-RCU library) API to the C++26 RCU API. The C++ in this table is more verbose than necessary; this verbosity is intended to keep C++ novices from getting quite as many compiler diagnostics as they might otherwise. For example, you can use a C++ using declaration to avoid having to type std:: so many times.
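Here is a minimal reader sketch illustrating both styles, assuming the <rcu> header and the names shown in the table below; the read-side traversal is left as a comment because it depends on your data structure:

#include <mutex>
#include <rcu>   // header proposed by the RCU working paper

void reader_verbose()
{
    std::scoped_lock l(std::rcu_default_domain());
    // ... RCU-protected read-side traversal goes here ...
}

void reader_terse()
{
    using std::rcu_default_domain;
    using std::scoped_lock;

    scoped_lock l(rcu_default_domain());
    // ... RCU-protected read-side traversal goes here ...
}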

Linux kernel → C++26
guard(rcu)() → std::scoped_lock l(std::rcu_default_domain())
rcu_read_lock() → std::rcu_default_domain().lock()
rcu_read_unlock() → std::rcu_default_domain().unlock()
synchronize_rcu() → std::rcu_synchronize()
struct rcu_head → std::rcu_obj_base<T> *
call_rcu() → the std::rcu_obj_base<T> .retire() member function *
kfree_rcu_mightsleep() → std::rcu_retire() *
rcu_barrier() → std::rcu_barrier()

The entries marked with an asterisk (“*”) are approximate.

First, where you would include a struct rcu_head in your RCU-protected C-language structure, in C++26 you would instead make your RCU-protected structure (or class) inherit from the rcu_obj_base<T> class. For example, the following C-language structure:

struct foo { struct rcu_head rh; int data; struct foo *next; };

Might translate to C++26 as follows:

struct foo : std::rcu_obj_base<foo> { int data; struct foo *next; };

Then given a pointer fp, your C-language call_rcu(&fp->rh, my_cb) would become in C++26 fp->retire(my_cb) or just fp->retire() if your my_cb() function was a simple wrapper around kfree(). Or, perhaps better, you can think of an argument-free call to fp->retire() as the C++26 counterpart to kfree_rcu(fp, rh).
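For example, here is a hedged sketch of this translation using the default deleter; remove_foo() and the unlinking step are illustrative stand-ins, and a non-default deleter type would instead be specified via the second rcu_obj_base template parameter:

void remove_foo(foo *fp)
{
    // ... unlink fp from the enclosing RCU-protected structure,
    //     under whatever update-side synchronization you use ...
    fp->retire();   // rough counterpart of kfree_rcu(fp, rh): fp is deleted
                    // only after a subsequent grace period
}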

And yes, raw pointers such as struct foo *next are frowned upon by many C++ developers. However, it does make the comparison more straightforward, and C++ experts will have no problem converting to other more-favored facilities.

Second, C++ needs to run in minimal environments, including those that cannot tolerate the background threads that both the Linux kernel (using the term “thread” rather loosely) and the userspace RCU library use to invoke callbacks. This means that a C++26 RCU implementation is permitted to invoke RCU callbacks/deleters within the invocation of rcu_retire() or the .retire() member function. The benefit of this permission is greater portability, but the corresponding drawback is that you are not permitted to hold a mutex across any call to rcu_retire() or .retire() if that mutex is unconditionally acquired by any deleter passed to any rcu_retire() or .retire() anywhere in your program. To do so would deadlock, given that any call to rcu_retire() or .retire() might invoke any deleter whatsoever!
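Here is a hedged sketch of that deadlock scenario; struct bar, update_lock, and hazardous_updater() are all illustrative:

#include <mutex>
#include <rcu>

struct bar { int data; };   // illustrative RCU-protected type

std::mutex update_lock;     // illustrative update-side mutex

void hazardous_updater(bar *p)
{
    std::scoped_lock l(update_lock);
    // ... unlink p from the enclosing data structure ...
    std::rcu_retire(p, [](bar *q) {
        std::scoped_lock l2(update_lock);   // deleter unconditionally acquires update_lock
        delete q;
    });
    // A minimal-environment implementation may invoke the deleter from within
    // rcu_retire() itself, while update_lock is still held: deadlock!
}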

If this restriction becomes a serious problem, and it might well, perhaps there will be a function call identifying C++ RCU implementations that are immune to this deadlock situation. Or maybe there will be a C++ type trait added to rcu_obj_base<T> that forces either deadlock-free retirement on one hand or a compiler diagnostic on the other, depending on whether the underlying C++ implementation is capable of deadlock-free operation.

Third, if your C-language structure does not have a struct rcu_head:

struct foo { int data; struct foo *next; };

Then this exact same structure definition works in C++26. And your C-language kfree_rcu_mightsleep(fp) would become in C++26 rcu_retire(fp). In this case, C++26 also allows a callback function (deleter in C++26) to be specified, for example, rcu_retire(fp, my_cb).
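As a hedged sketch of this last translation, using the struct foo just above (the unlinking step is again illustrative):

void remove_plain_foo(foo *fp)
{
    // ... unlink fp from the enclosing RCU-protected structure ...
    std::rcu_retire(fp);             // rough counterpart of kfree_rcu_mightsleep(fp)
    // or, with an explicit deleter:
    // std::rcu_retire(fp, my_cb);
}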

The C++26 RCU API is way smaller than that of Linux-kernel RCU. The point is not to needlessly restrict C++26 RCU users, but to track userspace implementation experience. Section 1.1 (“Proposed Entry to C++26 IS”) of the last RCU working paper lists some possible directions for the future.

Implementations for production C++ standard libraries are in progress, but in the meantime the reference C++ RCU implementation may be used for experimentation. This prototype implementation creates a thin C++ wrapper around the userspace RCU library.

RCU has a way of popping up unexpectedly. In the words of Fedor Pikus, “In fact, you may already be using the RCU approach in your program without realizing it!” This post will describe uses of a couple of highly unorthodox (some might say “completely irresponsible”) types of RCU implementations, both accidental and otherwise.

Timed-Wait RCU

One very simple class of RCU implementations uses a fixed period of time for the grace period. This can clearly work well in hard real-time kernels and applications, though it has also been used in prototype non-real-time kernels. In an early 1990s verbal discussion, none other than Van Jacobson pointed out that a 15-second delay would suffice in the research version of a proprietary-UNIX OS that he was working with. I responded (also verbally) that in DYNIX/ptx interrupt handlers sometimes executed for more than a minute (as in more than 15 seconds), but that Jack Slingwine and I had a way to get the same low-overhead/high-scalability effect without the need for hard real-time constraints on readers. Van expressed interest, so I sent him an early draft of the first RCU conference paper. Some years later, I had the privilege of hearing Van say nice things about Linux-kernel RCU.

For the benefit of any long-time RCU users who (like me) have a “make it make sense” filter in their heads, here is a Linux-kernel implementation of a synchronize_rcu() for Van's RCU:

void synchronize_rcu(void) { schedule_timeout_uninterruptible(15 * HZ); }

In other words, synchronize_rcu() does a fixed wait of 15 seconds, after which it is assumed, without proof, that all pre-existing readers have completed.

In the mid-1990s, Aju John wrote the USENIX paper Dynamic vnodes — Design and Implementation, which proposed a fixed 10-minute wait time for reclaiming vnodes in a proprietary UNIX system, DEC OSF/1 Version 3.0.

This approach might actually make sense in a hard-real-time environment, but would of course be extremely dangerous even there. On the other hand, you have to admit that Van and Aju were taking a no-holds-barred approach to performance and scalability, and doing so very early in the game!

As far as I know, neither of these two proprietary-UNIX OSes remains in production use, but it seems likely that timed-wait RCU saw at least some production use back in the day. And maybe it is still being used, dangerous though use of timed-wait RCU might be outside of hard-real-time environments! On the other hand, this is a rare instance of hard real-time making something way simpler, at least as long as non-real-time threads are prohibited from using RCU read-side critical sections.

Fixed-Buffer RCU

The previous section covered RCU grace-period detection based purely on time. But a recent discussion with Hui Xie led me to wonder about instead basing it purely on space, as in address space. Is this possible?

The Linux-kernel address sanitizer (KASAN) keeps newly freed blocks of memory in a fixed-size quarantine. This is done to increase the probability that a use-after-free access occurs while the corresponding block of memory has not yet been reallocated, thus reducing the number of false negatives that could otherwise occur if freed memory was instead immediately reallocated.

However, this can be thought of as another corner-case RCU implementation, where the “readers” are assumed to complete before the corresponding block of memory is reallocated.

And rcutorture does something similar by deferring freeing of RCU_TORTURE_PIPE_LEN (AKA 10) of the blocks previously referenced by rcu_torture_current. In addition, there are 10 times that many elements total (AKA 100 of them). As with KASAN, the idea is to reduce the incidence of false negatives, so that an extremely long RCU reader combined with a series of extremely short RCU grace periods would still have roughly a 99% chance of each reader issuing a diagnostic for the too-short grace periods.

Although in these two examples, the intent of the fixed buffers is bug detection, they could instead be used to create an extremely limited and very sharp-edged RCU grace-period detection mechanism. So the answer to the question raised by the discussion with Hui Xie is “yes”!
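To make the idea concrete, here is a deliberately simplistic user-space sketch of such a fixed-buffer scheme; the buffer size and names are illustrative, updates are assumed to be serialized by the caller, and readers are simply assumed to finish before the buffer wraps around:

#include <cstdlib>

constexpr std::size_t QUARANTINE_LEN = 128;   // illustrative size

void *quarantine[QUARANTINE_LEN];
std::size_t quarantine_idx;

// Defer freeing of a malloc()ed block until QUARANTINE_LEN further blocks
// have been passed in.  The implied "grace period" is the time taken to
// cycle once around the buffer -- and nothing whatsoever enforces that
// readers finish within that time.
void fixed_buffer_free(void *p)
{
    void *old = quarantine[quarantine_idx];

    quarantine[quarantine_idx] = p;
    quarantine_idx = (quarantine_idx + 1) % QUARANTINE_LEN;
    if (old)
        std::free(old);
}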

Time vs. Space and Implications

In short, it is possible (though risky!) to implement RCU grace-period detection purely in terms of time on the one hand and purely in terms of space on the other. This provides a nice counterpoint to RCU using both time and space for its underlying synchronization, with rcu_read_lock(), rcu_read_unlock(), synchronize_rcu(), and call_rcu() handling time and rcu_assign_pointer() and rcu_dereference() handling space.

And who knows? Perhaps someone is using fixed-buffer RCU in production. But if you are that someone, please understand that you need to sharply and deterministically bound both reader duration and update rate. Either that, or you (and your users!) need to tolerate some probability of memory corruption.

Some historical perspective is provided by the rumored 1980s DEC VAX system running BSD UNIX whose systems administrator chose to use known-buggy patches that were said to provide about a 15% performance boost. And about one extra crash per week. At the time, some felt that this was the acme of systems-administration excellence, while others felt that this was completely foolhardy.

A less intentional example is provided by none other than Linux-kernel RCU shortly after suspend and hibernation were accepted into the kernel. These features forced CPU hotplug, which silently negated my attempts to test RCU in non-CPU-hotplug kernels. This meant that rcutorture failed to spot a stupid bug that I injected into RCU that, in non-CPU-hotplug kernels, caused RCU grace periods to be a fixed few-millisecond delay.

And nobody noticed.

Aside from a developer who tested hundreds of concurrent kernel builds on a two-CPU system, who reported the bug, but disappeared when I asked for his .config. And a few months after that, Nick Piggin, whose dentry cache testing tickled this bug. Nick found and fixed my stupid bug. I then changed rcutorture to pay more attention to the .config file for the kernel under test, and to apply stress in a more targeted manner. And given today's much more aggressive Linux-kernel testing, I suspect that a similar bug today would be noticed much more quickly.

So, as always, choose wisely! ;–)

History

  • Add example synchronize_rcu() timed-wait implementation to reinforce the unorthodox nature of these RCU implementation approaches.
  • Add 1980s rumors of a DEC VAX system choosing performance over reliability.

TL;DR: Unless you are doing very strange things with RCU (read-copy update), not much!!!

So why has the guy most responsible for Linux-kernel RCU spent so much time over the past five years working on the provenance-related lifetime-end pointer-zap problem within the C++ Standards Committee?

But first...

What is Pointer Provenance?

Back in the old days, provenance was for objets d'art and the like, and we did not need it for our pointers, no sirree!!! Pointers had bits, those bits formed memory addresses, and as often as not we didn't even need to worry about these addresses being translated. But life is more complicated now. On the other hand, computing life is also much bigger, faster, more reliable, and (usually) more productive, so be extremely careful what you wish for from back in the Good Old Days!

These days, pointers have provenance as well as addresses, and this has consequences. The C++ Standard (recent draft) states that when an object's storage duration ends, any pointers to that object become invalid. For its part, the C Standard states that when an object's storage duration ends, any pointers to that object become indeterminate. In both standards, the wording is more precise, but this will serve for our purposes.

For the remainder of this document, we will follow C++ and say “invalid”, which is shorter than “indeterminate”. We will balance this out by using C-language example code. Those preferring C++ will be happy to hear that this is the language that I use in my upcoming CPPCON presentation.

Neither standard places any constraints on what a compiler can do with an invalid pointer value, even if all you are doing is loading or storing that value.

Those of us who cut our teeth on assembly language might quite reasonably ask why anyone would even think to make pointers so grossly invalid that you cannot even load or store them. To see the historical reasons, let's start by looking at pointer comparisons using this code fragment:

p = kmalloc(...);
might_kfree(p);         // Pointer might become invalid (AKA "zapped")
q = kmalloc(...);       // Assume that the addresses of p and q are equal.
if (p == q)             // Compiler can optimize as "if (false)"!!!
    do_something();

Both p and q contain addresses, but the compiler also keeps track of the fact that their values were obtained from different invocations of kmalloc(). This information forms part of each pointer's provenance. This means that p and q have different provenance, which in turn means that the compiler does not need to generate any code for the p == q comparison. The two pointers' provenance differs, so no matter what the addresses might be, the result cannot be anything other than false.

And this is one motivation for pointer provenance and invalidity: The results of operations on invalid pointers are not guaranteed, which provides additional opportunities for optimization. This example perhaps seems a bit silly, but modern compilers can use pointer provenance and invalidity to carry out serious points-to and aliasing analysis.

Yes, you can have hardware provenance. Examples include ARM MTE, the CHERI research prototype (which last I checked had issues with C++'s requirement that pointers are trivially copyable), and the venerable IBM System i. Conventional systems provide pointer provenance of a sort via their page tables, which is used by a variety of memory-allocation-use debuggers, for but one example, the efence library. The pointer-provenance features of ARM MTE and IBM System i are not problematic, but last I checked, the jury was still out on CHERI.

Of course, using invalid (AKA “dangling”) pointers is known to be a bad idea. So why are we even talking about it???

Why Would Anyone Use Invalid/Dangling Pointers?

Please allow me to introduce you to the famous and frequently re-invented LIFO Push algorithm. You can find this in many places, but let's focus on the Linux kernel's llist_add_batch() and llist_del_all() functions. The former atomically pushes a list of elements on a linked-list stack, and the latter just as atomically removes the entire contents of the stack:

static inline bool llist_add_batch(struct llist_node *new_first,
                                   struct llist_node *new_last,
                                   struct llist_head *head)
{
    struct llist_node *first = READ_ONCE(head->first);

    do {
        new_last->next = first;
    } while (!try_cmpxchg(&head->first, &first, new_first));

    return !first;
}

static inline struct llist_node *llist_del_all(struct llist_head *head)
{
    return xchg(&head->first, NULL);
}

As lockless concurrent algorithms go, this one is pretty straightforward. The llist_add_batch() function reads the list header, fills in the ->next pointer, then does a compare-and-exchange operation to point the list header at the new first element. The llist_del_all() function is even simpler, doing a single atomic exchange operation to NULL out the list header and returning the elements that were previously on the list. This algorithm also has excellent forward-progress properties: the llist_add_batch() function is lock-free and the llist_del_all() function is wait-free.
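For purposes of the discussion below, here is a rough C++ analogue of this same algorithm using std::atomic; it is a sketch rather than actual library code, with names chosen to avoid colliding with the Linux-kernel ones:

#include <atomic>

struct llist_node_cpp {
    llist_node_cpp *next;
};

struct llist_head_cpp {
    std::atomic<llist_node_cpp *> first{nullptr};
};

// Atomically push the list running from new_first to new_last.
inline bool llist_add_batch_cpp(llist_node_cpp *new_first,
                                llist_node_cpp *new_last,
                                llist_head_cpp *head)
{
    llist_node_cpp *first = head->first.load(std::memory_order_relaxed);

    do {
        new_last->next = first;
    } while (!head->first.compare_exchange_weak(first, new_first,
                                                std::memory_order_release,
                                                std::memory_order_relaxed));
    return !first;
}

// Atomically remove and return the full contents of the stack.
inline llist_node_cpp *llist_del_all_cpp(llist_head_cpp *head)
{
    return head->first.exchange(nullptr, std::memory_order_acquire);
}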

So what is not to like?

In assembly language, or with a simple compiler, not much. But more heavily optimized languages have serious pointer-provenance issues with this code. To see them, consider the following sequence of events:

  1. CPU 0 allocates an llist_node B and passes it via both the new_first and new_last parameters of llist_add_batch().
  2. CPU 0 picks up the head->first pointer and places it in the first local variable, then assigns it to new_last->next. This new_last->next pointer now references llist_node A.
  3. CPU 1 invokes llist_del_all(), which returns a list containing llist_node A. The caller of llist_del_all() processes A and passes it to kfree().
  4. CPU 0's new_last->next pointer is now invalid due to llist_node A having been freed. But CPU 0 does not know this, though a sufficiently all-knowing compiler just might.
  5. CPU 1 allocates an llist_node C that happens to have the same address as the old llist_node A. It passes C via both the new_first and new_last parameters of llist_add_batch(), which runs to completion. The head pointer now points to llist_node C, which happens to have the same address as the now storage-duration-ended llist_node A. However, the two pointers reference objects created by different memory-allocation calls, and thus have different provenance, and thus are not necessarily equal.
  6. CPU 0 finally gets around to executing its try_cmpxchg(), which will succeed, courtesy of the fact that try_cmpxchg() compares only the bits actually represented in the pointer, and not any implicit pointer provenance (and please note that the same is true of both the C and C++ compare-and-exchange operations). The llist now contains an llist_node B that contains an invalid pointer to dead llist_node A, but whose address happens to reference the shiny new llist_node C. (We term this invalid pointer a “zombie pointer” because it has in some assembly-language sense come back from the dead.)
  7. Some CPU invokes llist_del_all() and gets back an llist containing an invalid ->next pointer.

One could argue that the Linux-kernel implementation of LIFO Push is simply buggy and should be fixed. Except that there is no reasonable way to fix it. Which of course raises the question...

What Are Unreasonable Fixes?

We can protect pointers from invalidity by storing them as integers, but:

  1. Suppose someone has an element that they are passing to a library function. They should not be required to convert all their ->next pointers to integers just because the library's developers decide to switch to the LIFO Push algorithm for some obscure internal operation.
  2. In addition, switching to integer defeats type-checking, because integers are integers no matter what type of pointer they came from.
  3. We could restore some type-checking capability by wrapping the integer into a differently named struct for each pointer type. Except that this requires a struct with some particular name to be treated as compatible with pointers of some type corresponding to that name, a notion that current compilers do not support.
  4. In C++, we could use template metaprogramming to wrap an integer into a class that converts automatically to and from compatibly typed pointers, as in the sketch following this list. But there would then be windows of time in which a real pointer exists, and during those windows the possibility of pointer invalidity would remain.
  5. All of the above hack-arounds put additional obstacles in the way of developers of concurrent software.
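Here is a minimal sketch of that item-4 wrapper; int_ptr is a hypothetical name, and nothing about it removes the window during which a real (and possibly invalid) pointer exists whenever the wrapper is converted for use:

#include <cstdint>

// Hypothetical wrapper that stores a pointer's bits as an integer and
// materializes a real pointer only on demand.  The usual lifetime-end
// invalidity rules apply during that materialization.
template<typename T>
class int_ptr {
    std::uintptr_t bits = 0;
public:
    int_ptr() = default;
    int_ptr(T *p) : bits(reinterpret_cast<std::uintptr_t>(p)) {}
    operator T *() const { return reinterpret_cast<T *>(bits); }
};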

Alternatively, in environments such as the Linux kernel that provide their own memory allocators, we can hide those allocators from the compiler. But this is not free: in fact, the patch that exposed the Linux kernel's memory allocators to the compiler resulted in a small but significant improvement.

However, it is fair to ask...

Why Do We Care About Strange New Algorithms???

Let's take a look at the history, courtesy of Maged Michael's diligent software archaeology.

In 1986, R. K. Treiber presented an assembly language implementation of the LIFO Push algorithm in technical report RJ 5118 entitled “Systems Programming: Coping with Parallelism” while at the IBM Almaden Research Center.

In 1975, an assembly language implementation of this same algorithm (except with pop() instead of popall(), but still having the same ABA properties) was presented in the IBM System 370 Principles of Operation as a method for managing a concurrent freelist.

US Patent 3,886,525 was filed in June 1973, just a few months before I wrote my first line of code, and contains a prior-art reference to the LIFO Push algorithm (again with pop() instead of popall()) as follows: “Conditional swapping of a single address is sufficient to program a last-in, first-out single-user-at-a-time sequencing mechanism.” (If you were to ask a patent attorney, you would likely be told that this 50-year-old patent has long since expired. Which should be no surprise, given that it is even older than Dennis Ritchie's setuid Patent 4,135,240.)

All three of these references describe LIFO push as if it was straightforward and well known.

So we don’t know who first invented LIFO Push or when they invented it, but it was well known in 1973. Which is well over a decade before C was first standardized, more than two decades before C++ was first standardized, and even longer before Rust was even thought of.

And its combination of (relative) simplicity and excellent forward-progress properties just might be why this algorithm was anonymously invented so long ago and why it is so persistently and repeatedly reinvented. This frequent reinvention puts paid to any notion that LIFO Push is strange.

So sorry, but LIFO Push is neither new nor strange.

Nor is it the only situation where lifetime-end pointer zap causes problems. Please see the “Zap-Susceptible Algorithms” section of P1726R5 (“Pointer lifetime-end zap and provenance, too”) for additional use cases.

So What Do We Do?

The lifetime-end pointer-zap story is not yet over, and we are in fact currently pushing for the changes described in four working papers.

Nondeterministic Pointer Provenance

P2434R4 (“Nondeterministic pointer provenance”) is the basis for the other three papers. It asks that when converting a pointer to an integer and back, the implementation must choose a qualifying pointed-to object (if there is one) whose storage duration began before or concurrently with the conversion back to a pointer. In particular, the implementation is free to ignore a qualifying pointed-to object when the conversion to pointer happens before the beginning of that object’s storage duration.

The “qualifying” qualifier includes compatible type, as well as sufficiently early and long storage duration.

But why restrict the qualifying pointed-to object's storage duration to begin before or concurrently with the conversion back to a pointer?

An instructive example by Hans Boehm may be found in P2434R4, which shows that reasonable (and, more important, very heavily used) optimizations would be invalidated without this restriction. Several examples that manage to be even more sobering may be found in David Goldblatt's P3292R0 (“Provenance and Concurrency”).

Pointer Lifetime-End Zap Proposed Solutions: Atomics and Volatile

P2414R10 (“Pointer lifetime-end zap proposed solutions: Atomics and volatile”) is motivated by the observation that atomic pointers are subject to update at any time by any thread, which means that the compiler cannot reasonably do much in the way of optimization. This paper therefore asks (1) that atomic operations be redefined to yield and to store prospective pointer values and (2) that operations on volatile pointers be defined to yield and to store prospective pointer values. The effect is as if atomic pointers were stored internally as integers. This includes the “old” pointer passed by reference to compare_exchange().

This helps, but is not a full solution because atomic pointers are converted to non-atomic pointers prior to use, at which point they are subject to lifetime-end pointer zap. And the standard does not even guarantee that a zapped pointer can be loaded, stored, passed to a function, or returned from a function. Which brings us to the next paper.

Pointer Lifetime-End Zap Proposed Solutions: Tighten IDB for Invalid Pointers

P3347R4 (“Pointer lifetime-end zap proposed solutions: Tighten IDB for invalid pointers”) therefore asks that all non-comparison non-arithmetic non-dereference computations involving pointers, specifically including normal loads and stores, are fully defined even if the pointers are invalid. This permits invalid pointers to be loaded, stored, passed as arguments, and returned. Fully defining comparisons would rule out optimizations, and fully defining arithmetic would be complex and thus far unneeded. Fully defining dereferencing of invalid pointers would of course be problematic.

If these first three papers are accepted into the standard, the C++ implementation of LIFO Push shown above becomes valid code. This is important because this algorithm has been re-invented many times over the past half century, and is often open coded. This frequent open coding makes it infeasible to construct tools that find LIFO Push implementations in existing code.

P3790R1: Pointer Lifetime-End Zap Proposed Solutions: Bag-of-Bits Pointer Class

P3790R1 (“Pointer lifetime-end zap proposed solutions: Bag-of-bits pointer class”) asks for (1) the addition to the C++ standard library of a function launder_ptr_bits() that takes a pointer argument and returns a prospective pointer value corresponding to its argument, and (2) the addition to the C++ standard library of a class template std::ptr_bits<T>, a pointer-like type that is still usable after the pointed-to object’s lifetime has ended. Of course, such a pointer still cannot be dereferenced unless there is a live object at that pointer's address. Furthermore, some systems, such as ARMv9 with memory tagging extensions (MTE) enabled, have provenance as well as address bits in the pointer, and on such systems dereferencing will fail unless the pointer's provenance bits happen to match those of the pointed-to object.

This function and class template are nevertheless quite useful; for example, they may be used to maintain hash maps keyed by pointers after the pointed-to object's lifetime has ended. These can be extremely useful for debugging, especially in cases where the overhead of full-up address sanitizers cannot be tolerated.

Unlike LIFO Push, source-code changes are required for these use cases. This is unfortunate, but we have thus far been unable to come up with a same-source-code approach.

Those who have participated in standards work (or even open-source work) will understand that the names launder_ptr_bits() and std::ptr_bits<T> just might still be subject to bikeshedding.

A Happy Lifetime-End Pointer Zap Ending?

It is still too early to say for certain, but thus far these proposals are making much better progress than did their predecessors. So who knows? Perhaps C++29 will address lifetime-end pointer zap.

This is part of the Kernel Recipes 2025 blog series.

The other posts in this series help with small improvements over a long time. But what do you do if you only have a few weeks until your presentation? Yes, it is best to avoid procrastination, but sometimes you simply don't have all that much notice.

First, have a very clear picture of what you want the audience to gain from your presentation. A carefully chosen and tight focus will save you time that might otherwise have been wasted on irrelevant details.

Second, do dry-run presentations, preferably to people who won't be shy about giving you honest feedback. If your dry-run audience has shy people, you can ask them questions to see if they picked up on the key points of your presentation. If you cannot scare up a human audience on short notice, record your presentation (on your smartphone if nothing else) and review it. In the old pre-smartphone days, we would do our audience-free dry runs in front of a mirror, which can still be useful, for example, if your smartphone's battery is empty.

Third, repeat the important portions of your presentation, which usually includes the opening, the conclusion, and any surprise “reveals” in the middle of the presentation. If it is an important presentation (but aren't they all?), do about 20 repetitions of the important portions. If it is an extremely important presentation, dry-run the entire presentation about 20 times. Yes, this can take time, but on the other hand, most of my extremely important presentations were quite short, on the order of 3-5 minutes.

Fourth and finally, get a good night's sleep before the day of the presentation.

This is part of the Kernel Recipes 2025 blog series.

I have been consciously working on speaking skills for more than half a century.  This section lists a few of the experiences along the way. My hope is that this motivates you to take the easier and faster approaches laid out in the rest of this blog series.

Comic Relief

A now-disgraced comedian who was immensely popular in the 1960s was said to have learned his craft at school.  They said that he discovered that if he could make the schoolyard bullies laugh, they would often forget about roughing him up.  I tried the same approach, though with just barely enough success to persist.  Part of my problem was that I spent most of my time focusing on academic skills, which certainly proved to be a wise choice longer term, but did limit the time available to improve my comedic capabilities.  I was also limited by my not-so-wise insistence on taking myself too seriously.  Choices, choices!

My classmates often told very funny jokes, and I firmly believed that making up jokes was a cognitive skill, and I just as firmly believed (and with some reason) that I was a cognitive standout.  If they could do it, so could I!!!

But for a very long time, my jokes were extremely weak compared to theirs.

Until one day, I told a joke that everyone laughed at.  Hard.  For a long time.  (And no, I do not remember that joke, but then again, it was a joke targeted towards seventh graders and you most likely are not in seventh grade.)

Once they recovered, one of them asked “What show did you see that on?”

Suddenly the awful truth dawned on me.  My classmates were not making up these jokes.  They were seeing them on television, and rushing to be the first to repeat them the next day.  Why was this not obvious to me?  Because my family did not have a television.

My surprise did not prevent me from replying “The Blank Wall”.  Which was the honest truth: I had in fact been staring at a blank wall the previous evening while composing my first successful joke.

The next day, my classmates asked me what channel “The Blank Wall” was on.  I of course gave evasive answers, but in a few minutes they figured out that I meant a literal blank wall.  They were not impressed with my attitude.  You saw jokes on television, after all, and no one in their right mind would even try to make one up!

I also did some grade-school acting, though my big role was Jonathan Brewster in a seventh-grade production of “Arsenic and Old Lace” rather than anything comedic.  The need to work prevented my acting in any high-school plays, though to be fair it is not clear that my acting abilities would have kept up with those of my classmates in any case.

Besides, those working in retail can attest that carefully deployed humor can be extremely useful.  So my high-school grocery-store job likely provided me with more and better experience than the high-school plays could possibly have done.  At least that is what I keep telling myself!

Speech Team

For reasons that were never quite clear to me, the high-school speech-team coach asked me to try out. I probably would have ignored her, but I well recalled my father telling me that those who have nothing to say, but can say it well, will often do better than those who have something to say but cannot say it. So, against my better 13-year-old judgment, I signed up.

I did quite well in extemporaneous speech during my first year due to my relatively deep understanding of the science behind the hot topic of that time, namely the energy crisis. During later years, the hot topics reverted to the usual political and evening-news fare, so the remaining three years were good practice, but did not result in wins. Until the end of my senior year, when the coach suggested that I try radio commentary, which had the great advantage of hiding my horribly geeky teenaged face from the judges. I did quite well, qualifying for district-level competition on the strength of my first-ever radio-commentary speech.

But I can only be thankful that my 17-year-old self decided to go to an engineering university as opposed to seeking employment at a local radio station.

University Coursework

I tested out of Freshman English Composition, but I did take a couple of courses on technical writing and technical presentation. A ca. 1980 mechanical-engineering presentation on ground-loop heat pumps featured my first use of cartoons in a technical presentation, courtesy of a teammate who knew a professional cartoonist. The four of us were quite proud of having kept the class’s attention during the full duration of our talk, which took place only a few days before the start of Christmas holidays.

1980s and 1990s Presentations

I did impromptu work-related presentations for my contract-programming work in the early 1980s. In the late 1980s, I joined a research institute where I was expected to do formal presentations, including at academic venues. I joined a startup in 1990, where I continued academic presentations, but focused mainly on internal training presentations.

Toastmasters

I became a founding member of a local Toastmasters club in 1993, and during the next seven years received CTM (“Competent Toastmaster”) and ATM (“Advanced Toastmaster”) certifications. There is very likely a Toastmasters club near you, and you can search here: https://www.toastmasters.org/.

The purpose of Toastmasters is to help people develop public-speaking skills in a friendly environment. The members of the club help each other, evaluating each others’ short speeches and providing topics for even shorter impromptu speeches. The CTM and ATM certifications each have a manual that guides the member through a series of different types of speeches. For example, the 1990s CTM manual starts with a 4-6-minute speech in which the member introduces themselves. This has the benefit of ensuring that the speaker is expert on the topic, though I have come across an amnesiac who was an exception that proves this rule.

For me, the best of Toastmasters was “table topics”, in which someone is designated to bring a topic to the next meeting. The topic is called out, and people are expected to volunteer to give a short speech (a minute or two) on that topic. This is excellent preparation for those times when someone calls you out during a meeting.

Benchmarking

By the year 2000, I felt very good about my speaking ability. I was aware of some shortcomings, for example, I had difficulty with audiences larger than about 100 people, but was doing quite well, both in my own estimation and that of others. In short, it was time to benchmark myself against a professional speaker.

In that year, I attended an event whose keynote was given by none other than one of the least articulate of the US Presidents, George H. W. Bush. Now, Bush’s speaking abilities might have been unfairly compared to the larger-than-life capabilities of his predecessor (Ronald Reagan, AKA “The Great Communicator”) and his successor (Bill Clinton, whose command of people skills is the stuff of legends). In contrast, here is Ann Richards’s assessment of Bush’s skills: “born with a silver foot in his mouth”.

As noted above, I had just completed seven years in Toastmasters, so I was more than ready to do a Toastmasters-style evaluation of Bush’s keynote. I would record all the defects in this speech and email the results to my Toastmasters group for their amusement.

Except that it didn’t turn out that way.

Bush gave a one-hour speech during which he did everything that I knew how to do, and did it effortlessly. Not only that, there were instances where he clearly expected a reaction from the audience, and got that reaction. I was watching him like a hawk the whole time and had absolutely no idea how he had made it happen.

Bush might well have been the most inarticulate of the US Presidents, but he was incomparably better than this software developer will ever be.

But that does not mean that I cannot continue to improve. In fact, I can now do a better job of presenting than Bush can. Not just due to my having spent the intervening decades practicing (practice makes perfect!), but mostly due to the fact that Bush has since passed away.

Linux Community

I joined the Linux community in 2001, where I faced large and diverse audiences. It quickly became obvious that I needed to apply my youthful Warner Brothers lessons, especially given that I was presenting things like RCU to audiences that were mostly innocent of any knowledge of or experience in concurrency.

This experience also gave me much-needed practice dealing with larger audiences, in a few cases, on the order of 1,000.

So I continue to improve, but there is much more for me to learn.

This is part of the Kernel Recipes 2025 blog series.

This blog series has covered why public speaking is important, ways and means, building bridges from your audience to where they need to go, who owns your words, telling stories, knowing your destination, use of humor, and speaking on short notice.

But if you would rather learn about what I actually did rather than what I advise you to do, please see here.

I close this series by reiterating the value and ubiquity of Toastmasters and the usefulness of both dry runs and reviewing videos of your past talks.

Best of everything in your presentations!

Acknowledgments

And last, but definitely not least, a big “thank you” (in chronological order) to Anne Nicolas, Willy Tarreau, Steven Rostedt, Gregory Price, and Michael Opendacker for their careful review of early versions of this series.

This is part of the Kernel Recipes 2025 blog series.

Humor is both difficult and dangerous, especially in a large and diverse group such as the audience for Kernel Recipes. My advice is to do many formal presentations before attempting much in the way of humor.

This section will nevertheless talk about use of humor in technical presentations.

One issue is that audience members have a wide range of languages and dialects, and a given joke in (say) American English might not go over well with (say) Welsh English speakers. And it might be completely mangled in translation to another language. For example, during a 1980s visit to China, George Bush Senior is said to have quipped “We are oriented to the Orient.” This translates to something like ”我们面向东方”, which translates back to something like “We face East”, completely destroying Bush’s oriented/Orient pun. So what did the poor translator say? “是笑话,笑吧”, which translates to something like “It is a joke, laugh.”

So if you tell jokes, keep translations to other cultures and languages firmly in mind. (To be fair, this is advice that I could do well to better heed myself!)

In addition, jokes make fun of some person or group or are based on what is considered to be abnormal, excessive, or unacceptable, all of which differ greatly across cultures. Besides which, given a large and diverse audience such as that of Kernel Recipes, there will almost certainly be someone in attendance who identifies with the person or group in question or who has strong feelings about the joke’s implications about abnormality, excessiveness, or unacceptability. That someone just might have a strong negative reaction. And this should be absolutely no surprise, given that humor is used with great effect as a weapon in social conflicts.

In my youth, there were outgroups that were frequently the butt of jokes. These were often groups that were not represented in my small community, but were just as often a single-person outgroup made up of some hapless fellow student. Then as now, the most cruel jokes all too often get the best laughs.

Yet humor can also make a speech much more enjoyable. So what is a speaker to do?

Outgroups are often used, with technical talks making jokes at the expense of managers, salespeople, marketing departments, lawyers, users, and occasionally even an especially incompetent techie. But these jokes always eventually find their way to the outgroup in question, sometimes with devastating consequences to the hapless speaker.

It is better to tell jokes where you yourself are the butt of the joke. This can be difficult at first: Let’s face it, most of us would prefer to be taken seriously. However, becoming comfortable with this is well worth the effort. For one thing, once you have demonstrated a willingness to make a joke at your own expense, the audience will usually be much more willing to accept their own shortcomings and need for improvement. Such an audience will usually also be more willing to learn, and the best technical talks are after all those that audiences learn from.

What jokes should you tell on yourself? I paraphrase advice from the late humorist Patrick McManus: The worst day of your life will make the audience laugh the hardest.

That said, you need to make sure that the audience can relate to the challenges you faced on that day. For example, my interactions with the legal profession would likely seem strange and irrelevant to a general audience. However, almost all members of a Kernel Recipes audience will have chased down a difficult bug, so a story about some idiotic mistake I made while chasing down an RCU bug will likely resonate. And this might be one way of entertaining a general audience while providing needed information to those wanting an RCU deep dive.

Or maybe you can figure out how to work some bathroom humor into your talk. Who is the butt of this joke? You decide! ;–)

Adding humor to your talk often does not come for free. Time spent telling jokes is not available for presenting on technology. This tradeoff can be tricky: Too much humor makes for a lightweight talk, and too little for a dry talk. Especially if you are just starting out, I strongly advise you to err in the direction of dryness. Instead, make your technical content be the source of your audience’s excitement.

Use of humor in technical talks is both difficult and dangerous, but careful use of humor can be a very powerful public-speaking tool.

Perhaps some day I, too, will master the use of humor.

This is part of the Kernel Recipes 2025 blog series.

An earlier section expounded on the importance of building your bridge starting from where your target audience already is.  The previous section talked about using stories to build your bridge.

However, it is just as important to understand where your bridge’s destination lies.  This might seem blindingly obvious, but suppose that you were just invited to speak.  That is right, a conference is going to give you a precious 30 minutes in front of their audience.  But where do you want to take them?  Where do they need to go?

I cannot decide this for you.  Instead, you must decide, based on your experiences and those of the audience.

But I can list the destinations that I chose for the example talks from the previous section:

  1. “What Happens When 4096 Cores All Do synchronize_rcu_expedited()?”: Demonstrate extreme scalability is possible, some techniques for scaling, and exposition of portions of Linux-kernel RCU.
  2. “RCU's First-Ever CVE, and How I Lived to Tell the Tale”: Show that ease of use is important even for low-level synchronization primitives, “a year in the life of the RCU maintainer”.
  3. “Bare-Metal Multicore Performance in a General-Purpose Operating System (Adventures in Ubiquity)”: Describe how extreme stress testing proves to not be all that extreme, introduction to RCU callback offloading and NOHZFULL, exposition of portions of Linux-kernel RCU.
  4. “Cautionary Tales on Implementing the Software That People Want”: “My users don’t know what they want” is not a valid excuse and never has been, connection between validation and natural selection, hazards of refusing to fix irrelevant bugs (never mind that we all too often have no choice).

Of course, the fact that we choose a particular destination does not necessarily mean that the audience will arrive there!

This is part of the Kernel Recipes 2025 blog series.

A well-crafted story covering a boring topic can be far more entertaining than a haphazard story covering an exciting topic. As they say, it is all in how you tell the story. People will usually be most interested in stories about themselves, and, failing that, stories that they can relate to and benefit from. Of course, the absolute best approach is to have a technical topic that is so interesting to your audience that they make up the stories themselves.

That bridge that an earlier section told you to build? You would be wise to build much if not all of it out of stories.

There are many books, courses, and videos on story-telling, which you should take full advantage of. For example, James Patterson has a video series that I watched and learned from on a long airline flight.

The key point is to make your story interesting to your audience. In a technical talk, you have an advantage in that the audience most likely expects that they will need the information that you are presenting. Back in the day, stories of this sort were termed “war stories” in which novice soldiers would listen carefully to stories from veterans, with the expectation that those stories’ lessons would increase the novices’ survival rate. We can all be thankful that most technical talks play for much lower stakes, but the “war story” pattern still encourages careful listening.

Other story patterns include overcoming obstacles, finding treasure, glorious failure, and many more. Describing them and others is beyond the scope of this document, but it turns out that there is much literature on this topic. In the meantime, here are some example talks of mine for each category:

  1. War story: “What Happens When 4096 Cores All Do synchronize_rcu_expedited()?”
  2. Overcoming obstacles: “RCU's First-Ever CVE, and How I Lived to Tell the Tale”
  3. Finding treasure: “Bare-Metal Multicore Performance in a General-Purpose Operating System (Adventures in Ubiquity)” where an attempt to spread wakeups across multiple tasks ended up reducing the number of wakeups by a factor of two. That said, LWN coverage termed this a war story.
  4. Glorious failure: “Cautionary Tales on Implementing the Software That People Want”, especially the “Eight-Bit CRM” section starting on slide 18 in which my highly acclaimed and innovative early 1980s software system was defeated in the market. By a filing cabinet.

Of course, better examples of these story patterns and many more may be found in the works of professional speakers and writers. Unlike these professionals, I tend to add the story pattern later in my preparations. This is because my first priority is not to tell a story, but rather to communicate some technical information. For me, the story is therefore simply a means by which to communicate technical information, not an end unto itself.

History

  1. Add qualifiers on the story's interest to the audience. October 9, 2025.

This is part of the Kernel Recipes 2025 blog series.

You own your words until the moment they leave your lips.  At that moment, ownership passes to your audience members, each of whom is free to interpret your words as they will.  In particular, they are free to misinterpret your words.  And trust me, they will do so.

If you give a presentation that is misinterpreted, it is tempting to blame the audience.  But this is the wrong reaction.  That misinterpretation is instead a bug report against your presentation.  Fix that bug as you prepare for the next instance of that presentation.

This applies doubly for documentation.

But what of audience members who intentionally misinterpret your wording?  This might still be a bug report.  Or it might be parody, and thus an opportunity to laugh at yourself.  And if laughter is good medicine, laughing at yourself is the best medicine, despite how it might feel at the time.  In fact, you might find it helpful to include that parody in the next instance of that presentation in order to spice things up a bit.

Does this mean that you must prepare each and every talk so as to be comprehensible and entertaining to anybody and everybody?

Not necessarily.

After all, university courses often have prerequisites, so that advanced classes can focus on new material rather than having to recap all of the prerequisites. And this can also be the case for presentations.

For example, during my time at Sequent in the 1990s, I was the guy who explained how to take kernel crash dumps. Initially, my target audience was kernel developers and technical service people, and so I targeted them, relying on their knowledge, skills, and abilities.

This worked well, but only for a few years, after which it became necessary for technical sales people and less-technical service people to take crash dumps. At that point, I needed to adjust in order to suit their knowledge, skills, and abilities. I did this by watching while technical sales people attempted to take crash dumps and adding material as needed until they were successful.

So you can restrict your audience, just as many university courses restrict theirs. And you will likely always need to. Returning to a previous example, it is probably not going to be all that useful for me to make a presentation on Linux-kernel RCU internals accessible to quantum physicists, any more than it would be for them to make their presentations accessible to me.

In short, understanding your audience will help you create a presentation that is more likely to have the desired effect. Your audience might well change over time, and if so, your presentation will also need to change.