{"@attributes":{"version":"2.0"},"channel":{"title":"artificial::mind blog","description":"Articles about Graphics, C++, Optimization, Game Development, Programming Languages, and more. Personal blog of Philip Tretter writing under the alias of Artificial Mind.\n","link":"https:\/\/artificial-mind.net","item":[{"title":"Moves in Returns","description":"Mini guide to 'when is my return a move?'","pubDate":"Sat, 23 Oct 2021 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2021\/10\/23\/return-moves","guid":"https:\/\/artificial-mind.net\/blog\/2021\/10\/23\/return-moves","content":"<p>Today we\u2019ll discuss code of the form:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">(<\/span><span class=\"cm\">\/* ... *\/<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"cm\">\/* ... *\/<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">x<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This is a classical \u201creturn-by-value\u201d and is (wrongfully) associated with copies and overhead.<\/p>\n\n<p>In many cases, this will actually <code class=\"language-plaintext highlighter-rouge\">move<\/code> the result instead of copying it.\nFor modern C++, one could even argue that this will move in <em>most<\/em> cases (or, as we will see, completely <em>elide<\/em> the copy and directly construct in the result memory).<\/p>\n\n<p>This post discusses several common patterns and whether they are moved, copied, or elided.<\/p>\n\n<blockquote>\n  <p>Side note: <em>technically<\/em> a move is a type of copy.\nFor example, <code class=\"language-plaintext highlighter-rouge\">T x = &lt;expr&gt;<\/code> performs a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/copy_initialization\">copy initialization<\/a>, which might select the move 
constructor during overload resolution.\nThe rest of this post will use the colloquial \u201cmove\u201d for \u201ccalls move ctor or assignment\u201d and \u201ccopy\u201d for \u201ccalls copy ctor or assignment\u201d.<\/p>\n<\/blockquote>\n\n<h1 id=\"tracking-construction-and-assignment\">Tracking Construction and Assignment<\/h1>\n\n<p>Reading the C++ standard (or cppreference) to reason about your code is valuable, but given the flood of information, it can be difficult to draw the correct inferences.\nThus, in addition to this <em>theoretical<\/em> understanding, I love to validate my findings on <a href=\"https:\/\/godbolt.org\/\">godbolt<\/a>.<\/p>\n\n<p>The following examples always use a type <code class=\"language-plaintext highlighter-rouge\">T<\/code>, defined as:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">T<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">T<\/span><span class=\"p\">();<\/span>                    <span class=\"c1\">\/\/ ctor<\/span>\n    <span class=\"o\">~<\/span><span class=\"n\">T<\/span><span class=\"p\">();<\/span>                   <span class=\"c1\">\/\/ dtor<\/span>\n    <span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">);<\/span>                 <span class=\"c1\">\/\/ move ctor<\/span>\n    <span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">);<\/span>            <span class=\"c1\">\/\/ copy ctor<\/span>\n    <span class=\"n\">T<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">);<\/span>      <span class=\"c1\">\/\/ move assignment<\/span>\n    <span 
class=\"n\">T<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ copy assignment<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>These functions are not implemented on purpose, so we will see the corresponding calls in the assembly.\nFurthermore, I\u2019ll mark the <code class=\"language-plaintext highlighter-rouge\">work<\/code> functions as <code class=\"language-plaintext highlighter-rouge\">__attribute__((noinline))<\/code> so that we can see which special function calls belong where (caller or callee).<\/p>\n\n<h1 id=\"constructing-objects-in-return\">Constructing Objects in Return<\/h1>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">T<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">use<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">obj<\/span> <span class=\"o\">=<\/span> <span class=\"n\">work<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Since C++17, this invokes <strong>mandatory<\/strong> <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/copy_elision\">copy elision<\/a>.\nNo copy or move constructor is called, even if they have side effects.<\/p>\n\n<p><a href=\"https:\/\/godbolt.org\/z\/sYx6h8Mq7\">Assembly<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">work<\/span><span 
class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<span class=\"n\">use<\/span><span class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">work<\/span><span class=\"p\">()<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::~<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">destructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><code class=\"language-plaintext highlighter-rouge\">T<\/code> is constructed in <code class=\"language-plaintext highlighter-rouge\">work<\/code> in a memory location that is provided by the caller <code class=\"language-plaintext highlighter-rouge\">use<\/code>.\nAt the end of <code class=\"language-plaintext highlighter-rouge\">use<\/code>, <code class=\"language-plaintext highlighter-rouge\">T<\/code> is destructed.\nNo temporary copies are created, nothing is moved.<\/p>\n\n<p>This \u201cchains\u201d in the sense that it also applies to <code class=\"language-plaintext highlighter-rouge\">return other_work();<\/code> where <code class=\"language-plaintext highlighter-rouge\">other_work<\/code> also returns a <code class=\"language-plaintext highlighter-rouge\">T<\/code> by value.<\/p>\n\n<p>With <code class=\"language-plaintext highlighter-rouge\">return T();<\/code>, <code 
class=\"language-plaintext highlighter-rouge\">work<\/code> will always call exactly one constructor and nothing else from <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nHowever, <code class=\"language-plaintext highlighter-rouge\">use<\/code> is only this simple because we initialize <code class=\"language-plaintext highlighter-rouge\">obj<\/code> with the result of <code class=\"language-plaintext highlighter-rouge\">work()<\/code>.\nIf we assign it to an existing object, we get a temporary:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">T<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">use<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">obj<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">obj<\/span> <span class=\"o\">=<\/span> <span class=\"n\">work<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><a href=\"https:\/\/godbolt.org\/z\/Whcro3vno\">Assembly<\/a>:<\/p>\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">work<\/span><span class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<span class=\"n\">use<\/span><span 
class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">work<\/span><span class=\"p\">()<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::~<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">destructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><code class=\"language-plaintext highlighter-rouge\">work<\/code> constructs a <code class=\"language-plaintext highlighter-rouge\">T<\/code> for which <code class=\"language-plaintext highlighter-rouge\">use<\/code> provides the stack space.\nThis temporary <code class=\"language-plaintext highlighter-rouge\">T<\/code> is then <em>moved<\/em> into <code class=\"language-plaintext highlighter-rouge\">obj<\/code> using <code class=\"language-plaintext highlighter-rouge\">T::operator=(T&amp;&amp;)<\/code>.\nFinally, the temporary <code class=\"language-plaintext highlighter-rouge\">T<\/code> is destroyed.<\/p>\n\n<p>Before C++17, this type of optimization was allowed, but optional.\nIn particular, if your object is neither copyable nor movable, you could run into compile errors depending on whether this optimization was applied or not (e.g. debug vs. 
release).<\/p>\n\n<blockquote>\n  <p>Note: In C++17, this direct creation of the result in space provided by the caller has the fancy name \u201cunmaterialized value passing\u201d.<\/p>\n<\/blockquote>\n\n<h1 id=\"returning-a-local-variable\">Returning a Local Variable<\/h1>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">T<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n    <span class=\"c1\">\/\/ ...<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">use<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">obj<\/span> <span class=\"o\">=<\/span> <span class=\"n\">work<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Interestingly enough, this results in the same <a href=\"https:\/\/godbolt.org\/z\/nq6zTPKEo\">assembly<\/a> as our previous case:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">work<\/span><span class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<span class=\"n\">use<\/span><span class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span 
class=\"n\">work<\/span><span class=\"p\">()<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::~<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">destructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>No temporary object is created, nothing is moved or copied.\nHowever, this form of <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/copy_elision\">copy elision<\/a> is not mandatory.\nThis is also known as \u201cnamed return value optimization\u201d or NRVO.\nNote that in this case, in contrast to the previous case, <code class=\"language-plaintext highlighter-rouge\">T<\/code> must be copyable or movable, even if the actual copy or move constructor is not called in the end.<\/p>\n\n<p>It gets more interesting if we have other ways out of the function:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">some_condition<\/span><span class=\"p\">())<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">T<\/span><span class=\"p\">();<\/span>\n\n    <span class=\"n\">T<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><a href=\"https:\/\/godbolt.org\/z\/o7Ws33fj1\">Assembly<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">work<\/span><span 
class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">some_condition<\/span><span class=\"p\">()<\/span>\n  <span class=\"k\">if<\/span><span class=\"o\">:<\/span>\n    <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n    <span class=\"p\">...<\/span>\n    <span class=\"n\">ret<\/span>\n  <span class=\"k\">else<\/span><span class=\"o\">:<\/span>\n    <span class=\"p\">...<\/span>\n    <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n    <span class=\"p\">...<\/span>\n    <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n    <span class=\"p\">...<\/span>\n    <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::~<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">destructor<\/span><span class=\"p\">]<\/span>\n    <span class=\"p\">...<\/span>\n    <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>If the condition is <code class=\"language-plaintext highlighter-rouge\">true<\/code>, we 
construct the result directly as before.\nHowever, the second case now creates a temporary <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nThis temporary is then move-constructed into the return value.\nAfterwards, the temporary is destructed.<\/p>\n\n<p>No copy elision was performed for <code class=\"language-plaintext highlighter-rouge\">obj<\/code> (though I am not 100% sure why. It should be allowed and possible here.)\nStill, the result is a <em>move<\/em>, not a <em>copy<\/em>.\nThis is a feature of the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/return\">return statement<\/a>:\nSince C++11, <code class=\"language-plaintext highlighter-rouge\">return x<\/code> (or <code class=\"language-plaintext highlighter-rouge\">return (x)<\/code> or <code class=\"language-plaintext highlighter-rouge\">return ((x))<\/code> for that matter) will try to use the move constructor if <code class=\"language-plaintext highlighter-rouge\">x<\/code> is a local variable or a function parameter.<\/p>\n\n<blockquote>\n  <p>Note: the actual rule has a few nuances, but this is a good first-order approximation.<\/p>\n<\/blockquote>\n\n<h1 id=\"moving-from-a-local-variable\">Moving from a Local Variable<\/h1>\n\n<p>You might have seen the following:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">T<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"n\">obj<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Many compilers, IDEs, and linters warn about this.\nGCC might say \u201cmoving a local object in a return statement prevents 
copy elision\u201d.\nAnd indeed, the <a href=\"https:\/\/godbolt.org\/z\/q6cbzWzrx\">assembly<\/a> is now:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">work<\/span><span class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::~<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">destructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The local <code class=\"language-plaintext highlighter-rouge\">obj<\/code> is now constructed separately, move-constructed into the return value, and then destructed.\nWithout the <code class=\"language-plaintext highlighter-rouge\">std::move<\/code>, we had no move construction at all.<\/p>\n\n<h1 id=\"returning-a-function-parameter\">Returning a Function Parameter<\/h1>\n\n<p>Local variables and function parameters have slightly different behavior.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> 
<span class=\"nf\">work<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span> <span class=\"n\">obj<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">use<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">obj<\/span> <span class=\"o\">=<\/span> <span class=\"n\">work<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">());<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>When <code class=\"language-plaintext highlighter-rouge\">obj<\/code> was a local variable, we had the freedom to change <em>where<\/em> it is allocated.\nIf all paths through the function end in <code class=\"language-plaintext highlighter-rouge\">return obj<\/code>, the compiler could use the caller-provided space for the return value, thus <em>eliding<\/em> any move into the result.<\/p>\n\n<p>However, function parameters are already allocated by the caller and distinct from the return value.\nLuckily, the rules for <code class=\"language-plaintext highlighter-rouge\">return<\/code> statements still apply and we get a move in the <a href=\"https:\/\/godbolt.org\/z\/xj3Wqo8Wb\">assembly<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">work<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span 
class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<span class=\"n\">use<\/span><span class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">work<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::~<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">destructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::~<\/span><span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">destructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Even passed-by-value, no <code class=\"language-plaintext highlighter-rouge\">T<\/code> is copied in this whole example.\nThe caller (<code class=\"language-plaintext highlighter-rouge\">use<\/code>) creates a <code class=\"language-plaintext highlighter-rouge\">T<\/code> where <code class=\"language-plaintext highlighter-rouge\">work<\/code> will expect it.\n<code class=\"language-plaintext highlighter-rouge\">work<\/code> itself only move-constructs 
<code class=\"language-plaintext highlighter-rouge\">T<\/code> in the return value.\n<code class=\"language-plaintext highlighter-rouge\">use<\/code> then destructs the argument <code class=\"language-plaintext highlighter-rouge\">T<\/code> (\u201cat the end of the statement\u201d), followed by destruction of the result of <code class=\"language-plaintext highlighter-rouge\">work<\/code> (\u201cat the end of the scope\u201d).<\/p>\n\n<h1 id=\"non-matching-types\">Non-Matching Types<\/h1>\n\n<p>Copy elision only works if the result type matches what we want to return.\nThis might not always be the case.\nSomething I find myself writing with decent frequency:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">optional<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">some_condition<\/span><span class=\"p\">())<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">nullopt<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">return<\/span> <span class=\"n\">T<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And the obvious question is whether the second <code class=\"language-plaintext highlighter-rouge\">return<\/code> <em>copies<\/em> or <em>moves<\/em> the <code class=\"language-plaintext highlighter-rouge\">T<\/code> into the <code class=\"language-plaintext highlighter-rouge\">optional&lt;T&gt;<\/code>.<\/p>\n\n<p>More generally, let\u2019s say we have a second type <code class=\"language-plaintext highlighter-rouge\">U<\/code> and <code class=\"language-plaintext highlighter-rouge\">T<\/code> has 
implicit conversions:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">T<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"p\">...<\/span>\n    <span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">U<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"k\">noexcept<\/span><span class=\"p\">;<\/span>      <span class=\"c1\">\/\/ \"move-convert\"<\/span>\n    <span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">U<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span> <span class=\"k\">noexcept<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ \"copy-convert\"<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Now we can ask the question what the following code will call:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">U<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>or<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">U<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Both result in the <a href=\"https:\/\/godbolt.org\/z\/cW5f1K65b\">same<\/a> <a href=\"https:\/\/godbolt.org\/z\/5jsbcWeKz\">assembly<\/a>:<\/p>\n\n<div class=\"language-cpp 
highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">work<\/span><span class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">U<\/span><span class=\"o\">::<\/span><span class=\"n\">U<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">T<\/span><span class=\"o\">::<\/span><span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">U<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">constructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">U<\/span><span class=\"o\">::~<\/span><span class=\"n\">U<\/span><span class=\"p\">()<\/span> <span class=\"p\">[<\/span><span class=\"n\">complete<\/span> <span class=\"n\">object<\/span> <span class=\"n\">destructor<\/span><span class=\"p\">]<\/span>\n  <span class=\"p\">...<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>A temporary <code class=\"language-plaintext highlighter-rouge\">U<\/code> is created, move-\u201cconverted\u201d into the result <code class=\"language-plaintext highlighter-rouge\">T<\/code>, and then destructed.\nNo copy is involved.<\/p>\n\n<blockquote>\n  <p>Fun fact: in \u201cvanilla\u201d C++11, the second variant (returning the local variable) created a copy.\nThe behavior was fixed in C++14 and back-ported via <a href=\"https:\/\/wg21.cmeerw.net\/cwg\/issue1579\">defect report<\/a>.\nThus, most compilers with C++14 support will emit the move, even if you explicitly compile for C++11.\nHowever, pre-C++14 compilers might emit the copy.<\/p>\n<\/blockquote>\n\n<h1 
id=\"where-copy\">Where Copy??<\/h1>\n\n<p>It delights me to see so many cases where the default (without any <code class=\"language-plaintext highlighter-rouge\">std::move<\/code> involved) will result in either move construction or even complete elision of copy or move.<\/p>\n\n<p>So, when will return-by-value actually copy?\nA small collection of patterns to look out for:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">struct<\/span> <span class=\"p\">{<\/span> <span class=\"n\">T<\/span> <span class=\"n\">t<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">v<\/span><span class=\"p\">.<\/span><span class=\"n\">t<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ COPY! returning a member<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"n\">T<\/span> <span class=\"n\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"n\">T<\/span> <span class=\"n\">globalT<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">globalT<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ COPY! not a local var<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"n\">T<\/span> <span class=\"n\">work<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">obj<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ COPY! 
T&amp; matches T const&amp;, not T&amp;&amp; <\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"n\">T<\/span> <span class=\"n\">work<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">obj<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ COPY!<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"n\">T<\/span> <span class=\"n\">work<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">obj<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ COPY! \"inside\" work, obj is an lvalue<\/span>\n                <span class=\"c1\">\/\/ NOTE: careful with lifetimes here<\/span>\n                <span class=\"c1\">\/\/ NOTE: is a move in C++20<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"n\">T<\/span> <span class=\"n\">work<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span> <span class=\"k\">const<\/span> <span class=\"n\">obj<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ COPY! 
T const cannot be moved<\/span>\n                <span class=\"c1\">\/\/ NOTE: const is ignored for the signature,<\/span>\n                <span class=\"c1\">\/\/       which is work(T) and not work(T const)<\/span>\n                <span class=\"c1\">\/\/       but has \"effect\" inside the function<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Maybe unexpectedly, the following <a href=\"https:\/\/godbolt.org\/z\/heveTn3EW\">is elided<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">T<\/span> <span class=\"k\">const<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ no copy or move involved<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Because while we cannot move <code class=\"language-plaintext highlighter-rouge\">T const<\/code>, we can still allocate it directly \u201cin\u201d the return value.\nStill, this is a bit brittle, as a slightly more complex function will create a copy:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">T<\/span> <span class=\"nf\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">some_condition<\/span><span class=\"p\">())<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">T<\/span><span class=\"p\">();<\/span>\n\n    <span class=\"n\">T<\/span> <span class=\"k\">const<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ now 
it's a copy<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"n\">T<\/span> <span class=\"n\">work<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">T<\/span> <span class=\"k\">const<\/span> <span class=\"n\">obj<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"n\">obj<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ also a copy<\/span>\n                           <span class=\"c1\">\/\/ T const&amp;&amp; matches T const&amp;, not T&amp;&amp;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h1 id=\"conclusion\">Conclusion<\/h1>\n\n<p>We saw many examples where modern C++ now naturally creates moves instead of copies or even elides them altogether and directly constructs the return value \u201cin the proper location\u201d.\nAs the last section showed, you still have to look out for references or non-local variables.\nThis is probably a good thing, because those tend to have multiple aliases, which might take offense if their data suddenly moved away.<\/p>\n\n<p>Most of the explanations in this post are somewhat simplified to make them palatable.\nIn particular, exceptions and <code class=\"language-plaintext highlighter-rouge\">volatile<\/code> variables can complicate the situation a lot.\nAlso, keep in mind that inside a lambda, captures are not considered local variables.<\/p>\n\n<p>The assembly shown in the examples can be considered a kind of worst case scenario, as the compiler has no access to the special member functions of <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nWhen these functions are visible, they can often be inlined and further optimized.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/unsplash.com\/photos\/3vlGNkDep4E\">unsplash<\/a><\/em>)<\/p>"},{"title":"Optimization without 
Inlining","description":"Even without inlining, the compiler does not always have to assume the worst case.","pubDate":"Sun, 17 Oct 2021 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2021\/10\/17\/optimize-without-inline","guid":"https:\/\/artificial-mind.net\/blog\/2021\/10\/17\/optimize-without-inline","content":"<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Inline_expansion\">Inlining<\/a> is one of the most important compiler optimizations.\nWe can often write abstractions and thin wrapper functions without incurring any performance penalty, because the compiler will expand the method for us at the call site.<\/p>\n\n<p>If a function is not inlined, conventional wisdom says that the compiler has to assume that the method can modify any global state and change the memory behind any pointer or reference that might have \u201cescaped\u201d.<\/p>\n\n<p>In this short post, I\u2019ll demonstrate exactly this effect.\nFurthermore, we will see that even if a function is not inlined, as long as the implementation is visible, some optimizations are still performed and sometimes to great effect.<\/p>\n\n<h1 id=\"example\">Example<\/h1>\n\n<p>Let us consider this simple <code class=\"language-plaintext highlighter-rouge\">test<\/code> function,<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ implemented in different TU or otherwise linked in<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">sum<\/span> <span 
class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n    <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>which results in the following <a href=\"https:\/\/godbolt.org\/z\/oE5ozq1b4\">assembly<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span><span class=\"o\">:<\/span>\n  <span class=\"n\">push<\/span> <span class=\"n\">rbp<\/span>\n  <span class=\"n\">push<\/span> <span class=\"n\">rbx<\/span>\n  <span class=\"n\">push<\/span> <span class=\"n\">rax<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">rbx<\/span><span class=\"p\">,<\/span> <span class=\"n\">rdi<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">edi<\/span><span class=\"p\">,<\/span> <span class=\"n\">dword<\/span> <span class=\"n\">ptr<\/span> <span class=\"p\">[<\/span><span class=\"n\">rdi<\/span><span class=\"p\">]<\/span> <span class=\"c1\">\/\/ load n from memory<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span>            <span class=\"c1\">\/\/ call foo<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">ebp<\/span><span class=\"p\">,<\/span> <span class=\"n\">eax<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">edi<\/span><span 
class=\"p\">,<\/span> <span class=\"n\">dword<\/span> <span class=\"n\">ptr<\/span> <span class=\"p\">[<\/span><span class=\"n\">rbx<\/span><span class=\"p\">]<\/span> <span class=\"c1\">\/\/ load n from memory _again_<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span>            <span class=\"c1\">\/\/ call foo<\/span>\n  <span class=\"n\">add<\/span> <span class=\"n\">eax<\/span><span class=\"p\">,<\/span> <span class=\"n\">ebp<\/span>\n  <span class=\"n\">add<\/span> <span class=\"n\">rsp<\/span><span class=\"p\">,<\/span> <span class=\"mi\">8<\/span>\n  <span class=\"n\">pop<\/span> <span class=\"n\">rbx<\/span>\n  <span class=\"n\">pop<\/span> <span class=\"n\">rbp<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Unsurprisingly, <code class=\"language-plaintext highlighter-rouge\">foo<\/code> is called twice.\nFor all we know, it has important side effects that have to be executed twice.\nMaybe a bit more subtle: <code class=\"language-plaintext highlighter-rouge\">n<\/code> had to be reloaded from memory, because the first call to <code class=\"language-plaintext highlighter-rouge\">foo<\/code> might have changed the memory.\n(For example, <code class=\"language-plaintext highlighter-rouge\">n<\/code> might be a reference to a global <code class=\"language-plaintext highlighter-rouge\">int<\/code> that <code class=\"language-plaintext highlighter-rouge\">foo<\/code> happens to increment.)<\/p>\n\n<p>This is really the worst case for the compiler.\nNothing about the inner workings of <code class=\"language-plaintext highlighter-rouge\">foo<\/code> is known.<\/p>\n\n<p>On the other end of the spectrum, we have a small, known <code class=\"language-plaintext highlighter-rouge\">foo<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">int<\/span> 
<span class=\"nf\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">n<\/span> <span class=\"o\">*<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n    <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>which results in the following <a href=\"https:\/\/godbolt.org\/z\/6T6qsd79x\">assembly<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span><span class=\"o\">:<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">eax<\/span><span class=\"p\">,<\/span> <span class=\"n\">dword<\/span> <span class=\"n\">ptr<\/span> <span class=\"p\">[<\/span><span class=\"n\">rdi<\/span><span class=\"p\">]<\/span> <span class=\"c1\">\/\/ load n from memory<\/span>\n  <span 
class=\"n\">imul<\/span> <span class=\"n\">eax<\/span><span class=\"p\">,<\/span> <span class=\"n\">eax<\/span>            <span class=\"c1\">\/\/ tmp = n*n<\/span>\n  <span class=\"n\">add<\/span> <span class=\"n\">eax<\/span><span class=\"p\">,<\/span> <span class=\"n\">eax<\/span>             <span class=\"c1\">\/\/ return tmp + tmp<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Not only is the call to <code class=\"language-plaintext highlighter-rouge\">foo<\/code> inlined, further optimizations made sure that <code class=\"language-plaintext highlighter-rouge\">n * n<\/code> is not computed twice.<\/p>\n\n<h1 id=\"to-inline-or-not-to-inline\">To Inline or not to Inline<\/h1>\n\n<p>If <code class=\"language-plaintext highlighter-rouge\">foo<\/code> is part of a different TU, the compiler obviously cannot inline the call.\nShould you have enough patience, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Interprocedural_optimization\">link-time optimization<\/a> might come to the rescue.\n(It\u2019s usually really expensive, so I can only really recommend it for CI building release versions.)<\/p>\n\n<p>However, there are many other reasons why <code class=\"language-plaintext highlighter-rouge\">foo<\/code> might not be inlined.\nIn general, inlining is a double-edged sword.\nEach inlined function increases the pressure on the <a href=\"https:\/\/en.wikipedia.org\/wiki\/CPU_cache\">instruction cache<\/a> and makes the parent function harder to analyze.\nMore local variables mean more <a href=\"https:\/\/en.wikipedia.org\/wiki\/Register_allocation\">register spills<\/a>.\nAll these can negatively affect performance and compilers have various heuristics to try to make good decisions.<\/p>\n\n<p>Most small functions will get inlined by default.\nAs the complexity of the function body rises, this will stop at some point.\nRecursion <em>usually<\/em> prevents inlining, though <a href=\"https:\/\/godbolt.org\/z\/Ehdqcq37c\">compilers are definitely 
able to inline recursive functions<\/a> via <a href=\"https:\/\/en.wikipedia.org\/wiki\/Tail_call\">tail-call elimination<\/a>.<\/p>\n\n<p>We can artificially prevent inlining by declaring the function <code class=\"language-plaintext highlighter-rouge\">__declspec(noinline)<\/code> (for msvc) or <code class=\"language-plaintext highlighter-rouge\">__attribute__((noinline))<\/code> (for gcc\/clang):<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">__attribute__<\/span><span class=\"p\">((<\/span><span class=\"n\">noinline<\/span><span class=\"p\">))<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">n<\/span> <span class=\"o\">*<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n    <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>which results in the following <a 
href=\"https:\/\/godbolt.org\/z\/v5bPd99bo\">assembly<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span><span class=\"o\">:<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">edi<\/span><span class=\"p\">,<\/span> <span class=\"n\">dword<\/span> <span class=\"n\">ptr<\/span> <span class=\"p\">[<\/span><span class=\"n\">rdi<\/span><span class=\"p\">]<\/span> <span class=\"c1\">\/\/ load n from memory<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span>            <span class=\"c1\">\/\/ tmp = foo(n)<\/span>\n  <span class=\"n\">add<\/span> <span class=\"n\">eax<\/span><span class=\"p\">,<\/span> <span class=\"n\">eax<\/span>             <span class=\"c1\">\/\/ return tmp + tmp<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>So, obviously <code class=\"language-plaintext highlighter-rouge\">foo<\/code> was not inlined.\nHowever, the compiler could still see and analyze it.\nIt came to the conclusion that <code class=\"language-plaintext highlighter-rouge\">foo<\/code> does not read or modify global state.\n<code class=\"language-plaintext highlighter-rouge\">foo<\/code> was flagged as a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pure_function\">pure function<\/a>.<\/p>\n\n<p>Pure functions are nice.\nPure functions return the same result if given the same input.\nPure functions do not modify global state, they work solely on their input values.\nThus, <code class=\"language-plaintext highlighter-rouge\">foo(n) + foo(n)<\/code> was simplified to <code class=\"language-plaintext highlighter-rouge\">tmp = foo(n); tmp + tmp<\/code>, even without inlining.<\/p>\n\n<h1 id=\"fun-with-loops\">Fun 
with Loops<\/h1>\n\n<p>The difference becomes even larger with loops:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ different TU<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"nf\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span> <span class=\"o\">++<\/span><span class=\"n\">i<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>which results in the following <a href=\"https:\/\/godbolt.org\/z\/4ErEvzecE\">assembly<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span><span class=\"o\">:<\/span>\n  <span 
class=\"n\">push<\/span> <span class=\"n\">rbp<\/span>\n  <span class=\"n\">push<\/span> <span class=\"n\">r14<\/span>\n  <span class=\"n\">push<\/span> <span class=\"n\">rbx<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">r14<\/span><span class=\"p\">,<\/span> <span class=\"n\">rdi<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">edi<\/span><span class=\"p\">,<\/span> <span class=\"n\">dword<\/span> <span class=\"n\">ptr<\/span> <span class=\"p\">[<\/span><span class=\"n\">rdi<\/span><span class=\"p\">]<\/span>\n  <span class=\"n\">xor<\/span> <span class=\"n\">ebp<\/span><span class=\"p\">,<\/span> <span class=\"n\">ebp<\/span>\n  <span class=\"n\">test<\/span> <span class=\"n\">edi<\/span><span class=\"p\">,<\/span> <span class=\"n\">edi<\/span> <span class=\"c1\">\/\/ n &lt; 0 case<\/span>\n  <span class=\"n\">js<\/span> <span class=\"p\">.<\/span><span class=\"n\">LBB0_3<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">ebx<\/span><span class=\"p\">,<\/span> <span class=\"o\">-<\/span><span class=\"mi\">1<\/span>\n<span class=\"p\">.<\/span><span class=\"n\">LBB0_2<\/span><span class=\"o\">:<\/span>         <span class=\"c1\">\/\/ loop body begin<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span>            <span class=\"c1\">\/\/ call foo<\/span>\n  <span class=\"n\">add<\/span> <span class=\"n\">ebp<\/span><span class=\"p\">,<\/span> <span class=\"n\">eax<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">edi<\/span><span class=\"p\">,<\/span> <span class=\"n\">dword<\/span> <span class=\"n\">ptr<\/span> <span class=\"p\">[<\/span><span class=\"n\">r14<\/span><span class=\"p\">]<\/span> <span class=\"c1\">\/\/ reload n from memory<\/span>\n  <span class=\"n\">inc<\/span> <span class=\"n\">ebx<\/span>\n  <span class=\"n\">cmp<\/span> <span class=\"n\">ebx<\/span><span class=\"p\">,<\/span> <span 
class=\"n\">edi<\/span>\n  <span class=\"n\">jl<\/span> <span class=\"p\">.<\/span><span class=\"n\">LBB0_2<\/span>     <span class=\"c1\">\/\/ loop body end<\/span>\n<span class=\"p\">.<\/span><span class=\"n\">LBB0_3<\/span><span class=\"o\">:<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">eax<\/span><span class=\"p\">,<\/span> <span class=\"n\">ebp<\/span>\n  <span class=\"n\">pop<\/span> <span class=\"n\">rbx<\/span>\n  <span class=\"n\">pop<\/span> <span class=\"n\">r14<\/span>\n  <span class=\"n\">pop<\/span> <span class=\"n\">rbp<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>With no information, the compiler has to call <code class=\"language-plaintext highlighter-rouge\">foo<\/code> (and load <code class=\"language-plaintext highlighter-rouge\">n<\/code> from memory) in each iteration.\nContrast this with:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">__attribute__<\/span><span class=\"p\">((<\/span><span class=\"n\">noinline<\/span><span class=\"p\">))<\/span> <span class=\"kt\">int<\/span> <span class=\"nf\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">n<\/span> <span class=\"o\">*<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span 
class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span> <span class=\"o\">++<\/span><span class=\"n\">i<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>which results in the following <a href=\"https:\/\/godbolt.org\/z\/96Mxvsv9b\">assembly<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span><span class=\"o\">:<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">edx<\/span><span class=\"p\">,<\/span> <span class=\"n\">dword<\/span> <span class=\"n\">ptr<\/span> <span class=\"p\">[<\/span><span class=\"n\">rdi<\/span><span class=\"p\">]<\/span>\n  <span class=\"n\">test<\/span> <span class=\"n\">edx<\/span><span class=\"p\">,<\/span> <span class=\"n\">edx<\/span>\n  <span class=\"n\">js<\/span> <span class=\"p\">.<\/span><span class=\"n\">L5<\/span> <span class=\"c1\">\/\/ special case n &lt; 0<\/span>\n  <span class=\"n\">mov<\/span> <span class=\"n\">edi<\/span><span class=\"p\">,<\/span> <span class=\"n\">edx<\/span>\n  <span class=\"n\">call<\/span> <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span>  <span class=\"c1\">\/\/ tmp1 = foo(n)<\/span>\n  <span class=\"n\">imul<\/span> <span class=\"n\">edx<\/span><span class=\"p\">,<\/span> <span 
class=\"n\">eax<\/span>  <span class=\"c1\">\/\/ tmp2 = tmp1 * n<\/span>\n  <span class=\"n\">add<\/span> <span class=\"n\">eax<\/span><span class=\"p\">,<\/span> <span class=\"n\">edx<\/span>   <span class=\"c1\">\/\/ return tmp1 + tmp2<\/span>\n  <span class=\"n\">ret<\/span>\n<span class=\"p\">.<\/span><span class=\"n\">L5<\/span><span class=\"o\">:<\/span>\n  <span class=\"n\">xor<\/span> <span class=\"n\">eax<\/span><span class=\"p\">,<\/span> <span class=\"n\">eax<\/span> <span class=\"c1\">\/\/ return 0<\/span>\n  <span class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>So gcc is able to optimize the whole thing to basically <code class=\"language-plaintext highlighter-rouge\">foo(n) * (n + 1)<\/code>, without inlining.\nFunnily enough, clang tries (and fails) to be clever with lots of SIMD.<\/p>\n\n<h1 id=\"conclusion\">Conclusion<\/h1>\n\n<p>This is not a long post, but it shows that while inlining is a very important optimization, a non-inlined function is not the end of <del>the world<\/del> optimization.\nAs long as the function implementation is visible, compilers can and will analyze it so that they don\u2019t have to assume the worst case.\nThis, in turn, re-enables many optimizations such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Value_numbering\">value numbering<\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Common_subexpression_elimination\">common subexpression elimination<\/a>, and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Loop-invariant_code_motion\">loop-invariant code motion<\/a>.<\/p>\n\n<p>PS: gcc and clang have a variety of <a href=\"https:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Common-Function-Attributes.html\">function attributes<\/a> that can be used to enable optimizations, even if the implementation is in a different TU.\nThe two most important ones are <code class=\"language-plaintext highlighter-rouge\">__attribute__((const))<\/code> and <code class=\"language-plaintext 
highlighter-rouge\">__attribute__((pure))<\/code>.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/unsplash.com\/photos\/Y-VYK0SDLxs\">unsplash<\/a><\/em>)<\/p>"},{"title":"Measuring std::unordered_map Badness","description":"When is a good hash a good hash?","pubDate":"Sat, 09 Oct 2021 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2021\/10\/09\/unordered-map-badness","guid":"https:\/\/artificial-mind.net\/blog\/2021\/10\/09\/unordered-map-badness","content":"<p>I probably could have generated more clicks by titling this \u201cHow I Improved <code class=\"language-plaintext highlighter-rouge\">std::unordered_map<\/code> Performance By 50x With This Little Trick\u201d.<\/p>\n\n<p>In reality, this post is mostly about my journey crafting the following snippet that allows you to measure the quality of your hash function on a concrete <code class=\"language-plaintext highlighter-rouge\">std::unordered_map<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">Map<\/span><span class=\"p\">&gt;<\/span> \n<span class=\"kt\">double<\/span> <span class=\"nf\">unordered_map_badness<\/span><span class=\"p\">(<\/span><span class=\"n\">Map<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">map<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"k\">const<\/span> <span class=\"n\">lambda<\/span> <span class=\"o\">=<\/span> <span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">()<\/span> <span class=\"o\">\/<\/span> <span class=\"kt\">double<\/span><span class=\"p\">(<\/span><span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket_count<\/span><span class=\"p\">());<\/span>\n\n    <span class=\"k\">auto<\/span> 
<span class=\"n\">cost<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"p\">[<\/span><span class=\"n\">k<\/span><span class=\"p\">,<\/span> <span class=\"n\">_<\/span><span class=\"p\">]<\/span> <span class=\"o\">:<\/span> <span class=\"n\">map<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">cost<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket_size<\/span><span class=\"p\">(<\/span><span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket<\/span><span class=\"p\">(<\/span><span class=\"n\">k<\/span><span class=\"p\">));<\/span>\n    <span class=\"n\">cost<\/span> <span class=\"o\">\/=<\/span> <span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">();<\/span>\n\n    <span class=\"k\">return<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">max<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.<\/span><span class=\"p\">,<\/span> <span class=\"n\">cost<\/span> <span class=\"o\">\/<\/span> <span class=\"p\">(<\/span><span class=\"mi\">1<\/span> <span class=\"o\">+<\/span> <span class=\"n\">lambda<\/span><span class=\"p\">)<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>It measures the expected overhead when working with keys that are already in the map.\n0 means that your hash distribution is close to optimal.\nIn my case, the initial hash function had a badness of 550.<\/p>\n\n<p>Note: this post is not about <a href=\"https:\/\/en.wikipedia.org\/wiki\/Perfect_hash_function\">perfect hashes<\/a>, which guarantee zero collisions.\nThose are also super 
interesting, but have different use cases.<\/p>\n\n<h1 id=\"origin-story\">Origin Story<\/h1>\n\n<p>Either from grad school or simply through years of osmosis, we all learned that hash maps are awesome associative containers that offer a staggering O(1) insert, delete, and lookup (albeit with some overhead).<\/p>\n\n<p>The design space of hash maps is quite large and, depending on the use case, the trade-off space can change radically.\n<code class=\"language-plaintext highlighter-rouge\">std::unordered_map<\/code> is (in)famous for having an API that basically forces implementers to use \u201cbuckets with linked lists\u201d, also known as <em>separate chaining<\/em>.\nMany performance-critical applications swear by <em>open addressing<\/em>, often storing keys and values directly in arrays (either together or separately).\nThese are often called <code class=\"language-plaintext highlighter-rouge\">flat_<\/code>maps.\nMany requirements and quality attributes influence which particular type is \u201cbest\u201d:<\/p>\n\n<ul>\n  <li>Is pointer stability required? (often rules out <code class=\"language-plaintext highlighter-rouge\">flat_<\/code> versions, unless only stability of <em>value<\/em> is required)<\/li>\n  <li>Can entries be deleted individually? (often not required, removing tombstone handling)<\/li>\n  <li>How big are the keys and values? (I\u2019ve seen all combinations, even huge-key-small-value is occasionally useful)<\/li>\n  <li>What are the relative frequencies of insert, delete, lookup-with-success, and lookup-without-success?<\/li>\n  <li>Is robustness against adversarial attacks required? (e.g. DoS attacks based on enforced collisions)<\/li>\n  <li>Is the hash collision-free? 
(keys might share the same bucket, but same hash implies same key)<\/li>\n<\/ul>\n\n<p>There is already a large corpus of constructive (and not so constructive) discussion on all these particulars.\nMany excellent general-purpose and special-purpose hash map implementations are available.\nI\u2019ve added a few links at the end of this post.<\/p>\n\n<p>However, before choosing a particular hash map implementation, there is an elephant in the room that I found myself investigating.\nYou see, hash maps require \u201cgood hashes\u201d.\nEveryone knows that.\nBenchmarks often work on random input data, which easily map to \u201cgood hashes\u201d.<\/p>\n\n<p>I have written a few bad hashes over the years and they are really not an issue.\nA really bad hash degrades O(1) to O(n).\nIf simply inserting 100k entries into a hash map takes half an hour, the problem basically detects itself.<\/p>\n\n<p>No, \u201calmost bad hashes\u201d and \u201cless-than-optimal hashes\u201d are the real issue, the silent killers.\nI recently investigated a piece of code that was too slow for my tastes, but not critically so.\nYou know, code that works on a few hundred thousand elements and takes a few seconds.\nNot suspicious per se, but it happened to be the next chunk I investigated.\nA napkin calculation later, the runtime cost was roughly 2000 CPU cycles per element.\nIt <em>felt<\/em> like too much for some simple floating point arithmetic, but knowing that a cold read from memory can take 200 cycles, I argued to myself that it might be an issue with <code class=\"language-plaintext highlighter-rouge\">std::unordered_map<\/code>, as we have probably all heard that it\u2019s \u201cbadly designed\u201d and \u201ctoo slow\u201d.<\/p>\n\n<p>I was half-way into pulling some Google-grade <code class=\"language-plaintext highlighter-rouge\">flat_map<\/code> into the project when, on a whim, I slightly modified my hash function.\nInstead of the three <code class=\"language-plaintext 
highlighter-rouge\">float<\/code>s whose bit pattern I scrambled together via <code class=\"language-plaintext highlighter-rouge\">boost::hash_combine<\/code>, I discretized the <code class=\"language-plaintext highlighter-rouge\">float<\/code>s into <code class=\"language-plaintext highlighter-rouge\">int<\/code>s before passing them to <code class=\"language-plaintext highlighter-rouge\">hash_combine<\/code>.<\/p>\n\n<p>The result: 50x improved performance.<\/p>\n\n<h1 id=\"hash-vs-bucket-collisions\">Hash vs. Bucket Collisions<\/h1>\n\n<p>I did not need more evidence that something with the hash was wrong.\nJust to provide the context, these were my hash functions:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">void<\/span> <span class=\"nf\">hash_add<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">hash<\/span><span class=\"p\">,<\/span> <span class=\"kt\">size_t<\/span> <span class=\"n\">new_hash<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ taken from boost::hash_combine<\/span>\n    <span class=\"n\">hash<\/span> <span class=\"o\">^=<\/span> <span class=\"n\">new_hash<\/span> <span class=\"o\">+<\/span> <span class=\"mh\">0x9e3779b9<\/span> <span class=\"o\">+<\/span> <span class=\"p\">(<\/span><span class=\"n\">hash<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"mi\">6<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"p\">(<\/span><span class=\"n\">hash<\/span> <span class=\"o\">&gt;&gt;<\/span> <span class=\"mi\">2<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">size_t<\/span> <span class=\"n\">myhash_float<\/span><span class=\"p\">(<\/span><span class=\"kt\">float<\/span> <span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span 
class=\"n\">y<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">z<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">size_t<\/span> <span class=\"n\">h<\/span> <span class=\"o\">=<\/span> <span class=\"cm\">\/* some fixed seed *\/<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">hash_add<\/span><span class=\"p\">(<\/span><span class=\"n\">h<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">bit_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">uint32_t<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">x<\/span><span class=\"p\">));<\/span>\n    <span class=\"n\">hash_add<\/span><span class=\"p\">(<\/span><span class=\"n\">h<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">bit_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">uint32_t<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">y<\/span><span class=\"p\">));<\/span>\n    <span class=\"n\">hash_add<\/span><span class=\"p\">(<\/span><span class=\"n\">h<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">bit_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">uint32_t<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">z<\/span><span class=\"p\">));<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">h<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">size_t<\/span> <span class=\"n\">myhash_int<\/span><span class=\"p\">(<\/span><span class=\"kt\">float<\/span> <span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">y<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">z<\/span><span 
class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">size_t<\/span> <span class=\"n\">h<\/span> <span class=\"o\">=<\/span> <span class=\"cm\">\/* some fixed seed *\/<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">hash_add<\/span><span class=\"p\">(<\/span><span class=\"n\">h<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int32_t<\/span><span class=\"p\">(<\/span><span class=\"mi\">256<\/span> <span class=\"o\">*<\/span> <span class=\"n\">x<\/span><span class=\"p\">));<\/span>\n    <span class=\"n\">hash_add<\/span><span class=\"p\">(<\/span><span class=\"n\">h<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int32_t<\/span><span class=\"p\">(<\/span><span class=\"mi\">256<\/span> <span class=\"o\">*<\/span> <span class=\"n\">y<\/span><span class=\"p\">));<\/span>\n    <span class=\"n\">hash_add<\/span><span class=\"p\">(<\/span><span class=\"n\">h<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int32_t<\/span><span class=\"p\">(<\/span><span class=\"mi\">256<\/span> <span class=\"o\">*<\/span> <span class=\"n\">z<\/span><span class=\"p\">));<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">h<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>So, 50x performance difference between <code class=\"language-plaintext highlighter-rouge\">myhash_int<\/code> and <code class=\"language-plaintext highlighter-rouge\">myhash_float<\/code>, eh?<\/p>\n\n<p>First, let me note that <code class=\"language-plaintext highlighter-rouge\">myhash_float<\/code> is not a bad hash per se and it is definitely the more versatile one.\n<code class=\"language-plaintext highlighter-rouge\">myhash_int<\/code> has many collisions if the inputs are too small or if different keys differ by only a small amount.\nIn my case, it worked due to the nature of the input data.<\/p>\n\n<p>I don\u2019t know the complete history and rationale of <code class=\"language-plaintext 
highlighter-rouge\">boost::hash_combine<\/code> but I guess it was not designed with <code class=\"language-plaintext highlighter-rouge\">float<\/code>s in mind and probably comes from the 32 bit era.\nOn my real-world data set with 90000 entries, I had 3% hash collisions with <code class=\"language-plaintext highlighter-rouge\">myhash_float<\/code> and only 0.2% with <code class=\"language-plaintext highlighter-rouge\">myhash_int<\/code>.\nWhile a \u201creal\u201d hash like <a href=\"https:\/\/github.com\/Cyan4973\/xxHash\"><code class=\"language-plaintext highlighter-rouge\">xxHash<\/code><\/a> realistically produces no collisions for this number of entries, a few collisions do not explain the large performance gap.<\/p>\n\n<blockquote>\n  <p>Side note: <code class=\"language-plaintext highlighter-rouge\">xxHash<\/code> and similar hash functions are mainly designed for high throughput when processing larger amounts of data, like complete files or buffers.\nThat being said, they still often try to guarantee good performance on small data, like keys for hash maps.\n<code class=\"language-plaintext highlighter-rouge\">xxHash<\/code> is also explicitly optimized for good throughput and latency on data consisting of only a few bytes.\nHowever, its higher hash quality is not free and the overhead can be an order of magnitude slowdown when compared to ad-hoc special-purpose hashes for hash maps, like <code class=\"language-plaintext highlighter-rouge\">myhash_xyz<\/code> above.<\/p>\n<\/blockquote>\n\n<p>Back to my issue.\nWhere do we lose 50x performance if the number of collisions is way too low to justify the difference?<\/p>\n\n<p>On a 64 bit desktop, hash maps tend to use 64 bit hashes, like a <code class=\"language-plaintext highlighter-rouge\">size_t<\/code>.\nHowever, the number of buckets is significantly smaller, typically within a factor of 2 of the number of entries.\nThus, each hash map has a way to map a hash <code class=\"language-plaintext 
highlighter-rouge\">h<\/code> to a bucket index <code class=\"language-plaintext highlighter-rouge\">i<\/code>.<\/p>\n\n<p>The naive mapping would be <code class=\"language-plaintext highlighter-rouge\">i = h % bucket_count<\/code>.\nFull 64 bit division and modulo are quite expensive, requiring 25-40 cycles on typical desktops.\nIf <code class=\"language-plaintext highlighter-rouge\">bucket_count<\/code> is a power of two, we can optimize the mapping to <code class=\"language-plaintext highlighter-rouge\">i = h &amp; (bucket_count - 1)<\/code>, which is effectively free.<\/p>\n\n<p>The discerning reader might already see the problem:\nThis mapping now amounts to throwing away most of the bits of <code class=\"language-plaintext highlighter-rouge\">h<\/code>.<\/p>\n\n<p>Imagine your key consists of <code class=\"language-plaintext highlighter-rouge\">uint32_t a, b<\/code> and your hash is <code class=\"language-plaintext highlighter-rouge\">(uint64_t(a) &lt;&lt; 32) | b<\/code>.\nThis hash is completely free of collisions.\nHowever, if you have fewer than 4 billion buckets, then the bucket index will completely ignore <code class=\"language-plaintext highlighter-rouge\">a<\/code>, leading to tons of actual collisions for keys that differ only in <code class=\"language-plaintext highlighter-rouge\">a<\/code>.<\/p>\n\n<p>As said earlier, these obvious cases are usually trivial to detect.\nIn my case, I witnessed a partial quality degradation of the <code class=\"language-plaintext highlighter-rouge\">key -&gt; hash -&gt; idx<\/code> mapping.\nThe input <code class=\"language-plaintext highlighter-rouge\">float<\/code>s came from decompressed 3D positions, so they only had a small range of exponents and a few mantissa patterns that really appeared.\nWith <code class=\"language-plaintext highlighter-rouge\">boost::hash_combine<\/code>, this was <em>somehow<\/em> assembled into a 64 bit hash.\nThe 3% <em>hash<\/em> collision rate probably means that <code class=\"language-plaintext 
highlighter-rouge\">hash_combine<\/code> did a less-than-perfect job scrambling the <code class=\"language-plaintext highlighter-rouge\">float<\/code> patterns.\nHowever, the real killer came from <code class=\"language-plaintext highlighter-rouge\">std::unordered_map<\/code>, mapping the hash to a bucket index.\nIt turned out that more than 98.6% of the keys had \u201cbucket collisions\u201d, i.e. had to share their bucket with other keys.\nWith <code class=\"language-plaintext highlighter-rouge\">myhash_int<\/code>, this was still 86.2%.<\/p>\n\n<h1 id=\"optimal-behavior\">Optimal Behavior<\/h1>\n\n<p>Before trying to quantify how bad my first hash was, let\u2019s briefly talk about what the best-case scenario is.\nZero bucket collisions are the realm of perfect hashes, which require heavy precomputation and in general only work with known input data.<\/p>\n\n<p>If only the input distribution (not the actual data) is known, we want a <code class=\"language-plaintext highlighter-rouge\">key -&gt; idx<\/code> mapping that is <em>uniform<\/em>.\nWithout getting too fancy in the math: if someone hands you a <code class=\"language-plaintext highlighter-rouge\">key<\/code> drawn from the input distribution, you want to hand back an <code class=\"language-plaintext highlighter-rouge\">idx<\/code> that has a roughly uniform distribution over <code class=\"language-plaintext highlighter-rouge\">0 .. bucket_count-1<\/code>.<\/p>\n\n<p>In reality, the user is responsible for the <code class=\"language-plaintext highlighter-rouge\">key -&gt; hash<\/code> mapping, while the hash map provides the <code class=\"language-plaintext highlighter-rouge\">hash -&gt; idx<\/code> mapping.\nSome cooperation is required to make the total mapping high-quality.\nIn theory, the hash map could always provide a strong <code class=\"language-plaintext highlighter-rouge\">hash -&gt; idx<\/code> mapping, e.g. 
via <code class=\"language-plaintext highlighter-rouge\">xxHash<\/code>, but that is usually considered net-negative for performance.\nIf the input data is already sufficiently uniform, always paying for this extra hashing is extremely wasteful.<\/p>\n\n<p>So, assuming a uniform mapping, what is the expected number of bucket collisions?<\/p>\n\n<p>Well, that depends on the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/container\/unordered_map\/load_factor\">load_factor<\/a>, i.e. the ratio of input keys to buckets.\nIf the <code class=\"language-plaintext highlighter-rouge\">load_factor<\/code> is 1, then we have an equal number of keys and buckets.\nHere, on average, 37% of buckets are empty, another 37% of buckets have exactly 1 key, 18% have 2 keys, 6% have 3, and 2% have 4 or more keys.\nThis follows a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Poisson_distribution\">Poisson distribution<\/a> where the load factor is lambda.<\/p>\n\n<p>What load factor is optimal is a different discussion, but given a fixed load factor (typically between 0.5 and 1.0), if our hash function produces roughly the same number of collisions as the corresponding Poisson distribution predicts, I would consider it \u201coptimal enough\u201d.<\/p>\n\n<h1 id=\"measuring-badness\">Measuring Badness<\/h1>\n\n<p>So, how does my hash compare to an optimal one?\nHow do we measure that?<\/p>\n\n<p>It turns out that <code class=\"language-plaintext highlighter-rouge\">std::unordered_map<\/code> has a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/container\/unordered_map#Bucket_interface\">bucket API<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">unordered_map<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Key<\/span><span class=\"p\">,<\/span> <span class=\"n\">Value<\/span><span class=\"p\">,<\/span> <span class=\"n\">Hasher<\/span><span 
class=\"o\">&gt;<\/span> <span class=\"n\">my_map<\/span><span class=\"p\">;<\/span>\n\n<span class=\"c1\">\/\/ number of buckets<\/span>\n<span class=\"kt\">size_t<\/span> <span class=\"n\">bcnt<\/span> <span class=\"o\">=<\/span> <span class=\"n\">my_map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket_count<\/span><span class=\"p\">();<\/span>\n\n<span class=\"c1\">\/\/ bucket index from key<\/span>\n<span class=\"kt\">size_t<\/span> <span class=\"n\">bi<\/span> <span class=\"o\">=<\/span> <span class=\"n\">my_map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket<\/span><span class=\"p\">(<\/span><span class=\"n\">some_key<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ number of keys in this bucket<\/span>\n<span class=\"kt\">size_t<\/span> <span class=\"n\">bsize<\/span> <span class=\"o\">=<\/span> <span class=\"n\">my_map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket_size<\/span><span class=\"p\">(<\/span><span class=\"n\">bi<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>With this API, we can measure how close (or far away) we are from the Poisson distribution.\nFor each key in the map, we sum up the corresponding <code class=\"language-plaintext highlighter-rouge\">bucket_size<\/code>.\nDivided by the number of keys, this is the average bucket size from the perspective of a key.\nHalf of this is the expected number of comparisons needed when looking up the key.<\/p>\n\n<p>In the optimal case, <code class=\"language-plaintext highlighter-rouge\">load_factor<\/code> is the average number of keys in a bucket (the expected value of the Poisson distribution).\nHowever, when we know that a given key is part of the map, this average increases to <code class=\"language-plaintext highlighter-rouge\">1 + load_factor<\/code>.<\/p>\n\n<p>The result is this little snippet:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span 
class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">Map<\/span><span class=\"p\">&gt;<\/span> \n<span class=\"kt\">double<\/span> <span class=\"nf\">unordered_map_badness<\/span><span class=\"p\">(<\/span><span class=\"n\">Map<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">map<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"k\">const<\/span> <span class=\"n\">lambda<\/span> <span class=\"o\">=<\/span> <span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">()<\/span> <span class=\"o\">\/<\/span> <span class=\"kt\">double<\/span><span class=\"p\">(<\/span><span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket_count<\/span><span class=\"p\">());<\/span>\n\n    <span class=\"k\">auto<\/span> <span class=\"n\">cost<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"p\">[<\/span><span class=\"n\">k<\/span><span class=\"p\">,<\/span> <span class=\"n\">_<\/span><span class=\"p\">]<\/span> <span class=\"o\">:<\/span> <span class=\"n\">map<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">cost<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket_size<\/span><span class=\"p\">(<\/span><span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">bucket<\/span><span class=\"p\">(<\/span><span class=\"n\">k<\/span><span class=\"p\">));<\/span>\n    <span class=\"n\">cost<\/span> <span class=\"o\">\/=<\/span> <span class=\"n\">map<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">();<\/span>\n\n    <span 
class=\"k\">return<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">max<\/span><span class=\"p\">(<\/span><span class=\"mf\">0.<\/span><span class=\"p\">,<\/span> <span class=\"n\">cost<\/span> <span class=\"o\">\/<\/span> <span class=\"p\">(<\/span><span class=\"mi\">1<\/span> <span class=\"o\">+<\/span> <span class=\"n\">lambda<\/span><span class=\"p\">)<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>I was too lazy to template it on all 5 types required for <code class=\"language-plaintext highlighter-rouge\">std::unordered_map<\/code>.\nOfficially, I call that a feature because now it supports all types with a compatible bucket API.<\/p>\n\n<p>The return value is slightly remapped.\n<code class=\"language-plaintext highlighter-rouge\">cost \/ (1 + lambda)<\/code> is the relative cost factor to the optimal distribution.\nWe subtract 1 and clamp it to 0 from below so it\u2019s a bit easier to read:\nA badness of roughly 0 means that the current bucket distribution is close to optimal.\n1 means that on average 100% more comparisons than optimal are required.<\/p>\n\n<h1 id=\"fixing-my-issue\">\u201cFixing\u201d my Issue<\/h1>\n\n<p>Turns out, <code class=\"language-plaintext highlighter-rouge\">myhash_float<\/code> has a badness of 550 on my data. 
Ouch.<\/p>\n\n<p>Of course, this does not mean that the performance gap to the optimal case is a factor of 550.\nSimilar to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Amdahl%27s_law\">Amdahl\u2019s law<\/a>, this factor is only realized if the program literally does nothing else (and we ignore caching and other effects that reality pesters us with).<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">myhash_int<\/code> still has a badness of 1.3.<\/p>\n\n<p>Not optimal, but so much better that my program sped up by a factor of 50+.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">xxHash<\/code> directly on my 3 input <code class=\"language-plaintext highlighter-rouge\">float<\/code>s has a badness of 0.<\/p>\n\n<p>Interestingly enough, using the result of <code class=\"language-plaintext highlighter-rouge\">myhash_float<\/code> as a seed for a single round of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Xorshift\">xorshift<\/a> has a badness of 0.04.\nSo, even if <code class=\"language-plaintext highlighter-rouge\">myhash_float<\/code> has a 3% innate collision rate, a cheap scrambling at the end is all it takes to get a near-optimal hash distribution.<\/p>\n\n<p>Xorshift consists of a state update and an output scramble.\nAs only the second step is needed, you can funnily enough fix most suboptimal hashes by simply calling:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">size_t<\/span> <span class=\"nf\">fix_my_hash<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">h<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">h<\/span> <span class=\"o\">*<\/span> <span class=\"mh\">0xd989bcacc137dcd5ull<\/span> <span class=\"o\">&gt;&gt;<\/span> <span class=\"mi\">32u<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h1 
id=\"conclusion\">Conclusion<\/h1>\n\n<p>In my opinion, hash maps are among the most interesting data structures.\nThe only container that is more useful is the array, but apart from abstractions like <code class=\"language-plaintext highlighter-rouge\">vector<\/code> and <code class=\"language-plaintext highlighter-rouge\">span<\/code>, arrays are quite compact in design space.\nHash maps have a plethora of useful variants and tradeoffs.<\/p>\n\n<p>This post is about the practical quality of hash functions and how to measure it.\nWith the bucket API of <code class=\"language-plaintext highlighter-rouge\">std::unordered_map<\/code>, we can actually quantify how far from optimal our concrete map is.<\/p>\n\n<p>My little snippet can be adapted to hash maps with open addressing by measuring the number of comparisons needed until a key is found.\nThis is usually not an exposed metric, though I suppose one could simply write a counting equality comparer for that.<\/p>\n\n<h1 id=\"further-reading\">Further Reading<\/h1>\n\n<ul>\n  <li><a href=\"https:\/\/tessil.github.io\/2016\/08\/29\/benchmark-hopscotch-map.html\">Benchmark of major hash maps implementations (2016)<\/a><\/li>\n  <li><a href=\"https:\/\/probablydance.com\/2017\/02\/26\/i-wrote-the-fastest-hashtable\/\">I Wrote The Fastest Hashtable (2017)<\/a><\/li>\n  <li><a href=\"https:\/\/martin.ankerl.com\/2019\/04\/01\/hashmap-benchmarks-01-overview\/\">Hashmaps Benchmarks - Overview (2019)<\/a><\/li>\n<\/ul>\n\n<p>Additional discussion and comments on <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/q551zg\/measuring_stdunordered_map_hash_badness\/\">reddit<\/a>.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/europe-travel-map-world-1264062\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"std::sort multiple ranges","description":"Sorting a range of keys while keeping a range of values in sync.","pubDate":"Sat, 28 Nov 2020 02:00:00 
+0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/11\/28\/std-sort-multiple-ranges","guid":"https:\/\/artificial-mind.net\/blog\/2020\/11\/28\/std-sort-multiple-ranges","content":"<p><code class=\"language-plaintext highlighter-rouge\">std::sort<\/code> is a great utility.\nYou can easily sort subranges and provide custom comparison functions.\nHowever, it struggles with the following scenario:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">keys<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">values<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">sort<\/span><span class=\"p\">(...);<\/span> <span class=\"c1\">\/\/ ???<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>We want to sort by <code class=\"language-plaintext highlighter-rouge\">keys<\/code> but keep the 1-on-1 correspondence with <code class=\"language-plaintext highlighter-rouge\">values<\/code>, i.e. 
keep the ranges \u201cin sync\u201d during sorting.\nA common solution is to allocate a vector of indices, sort these indices, and then apply the resulting permutation.\nHowever, the need for an additional allocation and bad cache locality due to indirection make this a suboptimal solution.<\/p>\n\n<p>In this post, we will design a custom iterator that allows us to sort the two ranges directly without allocation overhead.\nThe final usage will be:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">sort<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span><span class=\"p\">{<\/span>          <span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"n\">keys<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">(),<\/span> <span class=\"n\">values<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">()},<\/span>\n          <span class=\"n\">sort_it<\/span><span class=\"p\">{<\/span><span class=\"n\">keys<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">(),<\/span> <span class=\"n\">keys<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">(),<\/span> <span class=\"n\">values<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">()});<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>For clarity of exposition, we will assume <code class=\"language-plaintext highlighter-rouge\">int<\/code> keys, <code class=\"language-plaintext highlighter-rouge\">string<\/code> values, and contiguous ranges for this post.\nMaking the technique generic, variadic, and support all random access iterators is left as an exercise for the reader (or an additional blog post).<\/p>\n\n<h2 id=\"the-index-solution\">The Index Solution<\/h2>\n\n<p>Before we start, let\u2019s take a quick look at the 
mentioned index-based solution:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ make indices = {0, 1, 2, ...}<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">indices<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">keys<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">());<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">iota<\/span><span class=\"p\">(<\/span><span class=\"n\">indices<\/span><span class=\"p\">.<\/span><span class=\"n\">begin<\/span><span class=\"p\">(),<\/span> <span class=\"n\">indices<\/span><span class=\"p\">.<\/span><span class=\"n\">end<\/span><span class=\"p\">(),<\/span> <span class=\"mi\">0<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ sort indices while comparing keys<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">sort<\/span><span class=\"p\">(<\/span><span class=\"n\">indices<\/span><span class=\"p\">.<\/span><span class=\"n\">begin<\/span><span class=\"p\">(),<\/span> <span class=\"n\">indices<\/span><span class=\"p\">.<\/span><span class=\"n\">end<\/span><span class=\"p\">(),<\/span> <span class=\"p\">[<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">keys<\/span><span class=\"p\">](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">keys<\/span><span class=\"p\">[<\/span><span class=\"n\">a<\/span><span class=\"p\">]<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">keys<\/span><span 
class=\"p\">[<\/span><span class=\"n\">b<\/span><span class=\"p\">];<\/span>\n<span class=\"p\">});<\/span>\n\n<span class=\"c1\">\/\/ apply permutation<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">old_keys<\/span> <span class=\"o\">=<\/span> <span class=\"n\">keys<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ copy<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">old_values<\/span> <span class=\"o\">=<\/span> <span class=\"n\">values<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ copy<\/span>\n<span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">keys<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">();<\/span> <span class=\"o\">++<\/span><span class=\"n\">i<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"n\">keys<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">old_keys<\/span><span class=\"p\">[<\/span><span class=\"n\">indices<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]];<\/span>\n    <span class=\"n\">values<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">old_values<\/span><span class=\"p\">[<\/span><span class=\"n\">indices<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]];<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The first allocation is needed for the temporary <code class=\"language-plaintext highlighter-rouge\">indices<\/code> vector.\nLess obvious is the need for copies of <code class=\"language-plaintext highlighter-rouge\">keys<\/code> 
and <code class=\"language-plaintext highlighter-rouge\">values<\/code> before applying the permutation.\nA simple <code class=\"language-plaintext highlighter-rouge\">keys[i] = keys[indices[i]];<\/code> would yield the wrong result (imagine what happens if <code class=\"language-plaintext highlighter-rouge\">keys[indices[i]]<\/code> was already overwritten in a previous loop iteration).<\/p>\n\n<p>There are ways to avoid the key\/value copies but they are quite a bit more involved.\nPermutations can be decomposed into disjoint cycles and then applied via a cycling swap for each cycle.\nAlternatively, permutations can also be decomposed into a series of transpositions, i.e. a sequence of <code class=\"language-plaintext highlighter-rouge\">std::swap(keys[i], keys[j])<\/code>.<\/p>\n\n<h2 id=\"a-custom-iterator\">A Custom Iterator<\/h2>\n\n<p>Ok, let\u2019s design a non-allocating \u201cmulti-sort\u201d solution.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">std::sort<\/code> accepts any random access iterator (technically, the iterator must satisfy the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/named_req\/RandomAccessIterator\">LegacyRandomAccessIterator<\/a> and <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/named_req\/ValueSwappable\">ValueSwappable<\/a> concepts).\nThus, we should be able to write an iterator that basically \u201cbundles\u201d iterators into <code class=\"language-plaintext highlighter-rouge\">keys<\/code> and <code class=\"language-plaintext highlighter-rouge\">values<\/code>, compares <code class=\"language-plaintext highlighter-rouge\">keys<\/code>, and swaps both, thus keeping the ranges in sync.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">sort_it<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">size_t<\/span> <span class=\"n\">index<\/span><span class=\"p\">;<\/span>\n    <span 
class=\"kt\">int<\/span><span class=\"o\">*<\/span> <span class=\"n\">keys<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"o\">*<\/span> <span class=\"n\">values<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The idea is that the iterator only updates <code class=\"language-plaintext highlighter-rouge\">index<\/code> and keeps the pointers to <code class=\"language-plaintext highlighter-rouge\">keys<\/code> and <code class=\"language-plaintext highlighter-rouge\">values<\/code> unchanged.\nIterators need to provide a certain set of operations, and <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/named_req\/RandomAccessIterator\">LegacyRandomAccessIterator<\/a> requires a whole zoo of them.<\/p>\n\n<p>First, we define some types in our <code class=\"language-plaintext highlighter-rouge\">sort_it<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">using<\/span> <span class=\"n\">iterator_category<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">random_access_iterator_tag<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">difference_type<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">int64_t<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">value_type<\/span> <span class=\"o\">=<\/span> <span class=\"o\">???<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">pointer<\/span> <span class=\"o\">=<\/span> <span class=\"n\">value_type<\/span><span class=\"o\">*<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">reference<\/span> <span class=\"o\">=<\/span> <span class=\"o\">???<\/span><span 
class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"the-reference-and-value-problem\">The Reference and Value Problem<\/h2>\n\n<p>Usually, <code class=\"language-plaintext highlighter-rouge\">value_type<\/code> is some type <code class=\"language-plaintext highlighter-rouge\">T<\/code> and we would choose <code class=\"language-plaintext highlighter-rouge\">reference<\/code> to be <code class=\"language-plaintext highlighter-rouge\">T&amp;<\/code>.\nHowever, in our case, this doesn\u2019t work.\n<code class=\"language-plaintext highlighter-rouge\">value_type<\/code> would be something like <code class=\"language-plaintext highlighter-rouge\">pair&lt;int, string&gt;<\/code>.\nBut <code class=\"language-plaintext highlighter-rouge\">reference<\/code> cannot be <code class=\"language-plaintext highlighter-rouge\">pair&lt;int, string&gt;&amp;<\/code>, as we have two separate arrays, not a single array of pairs.\nThus, we have to roll our own, separate types for them.\nLet\u2019s call these <code class=\"language-plaintext highlighter-rouge\">val<\/code> and <code class=\"language-plaintext highlighter-rouge\">ref<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">val<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"n\">value<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">ref<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">int<\/span><span class=\"o\">*<\/span> <span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"o\">*<\/span> <span 
class=\"n\">value<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">sort_it<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"p\">...<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">value_type<\/span> <span class=\"o\">=<\/span> <span class=\"n\">val<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">reference<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ref<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">...<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><code class=\"language-plaintext highlighter-rouge\">std::sort<\/code> uses more than one sorting algorithm in most implementations.\nUsually, we see some <code class=\"language-plaintext highlighter-rouge\">O(n log n)<\/code> algorithm (e.g. quicksort) for large ranges and an <code class=\"language-plaintext highlighter-rouge\">O(n^2)<\/code> algorithm (like insertion sort) for small ranges, as they tend to be faster for small <code class=\"language-plaintext highlighter-rouge\">n<\/code> in practice.\nSome of these use <code class=\"language-plaintext highlighter-rouge\">swap<\/code>, some do not.\nWhen reading the gcc implementation, I found that we need to support the following uses:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ case A:<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">iter_swap<\/span><span class=\"p\">(<\/span><span class=\"n\">it_a<\/span><span class=\"p\">,<\/span> <span class=\"n\">it_b<\/span><span class=\"p\">);<\/span>\n<span class=\"c1\">\/\/ ... 
which calls:<\/span>\n<span class=\"n\">swap<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">it_a<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"n\">it_b<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ case B:<\/span>\n<span class=\"n\">value_type<\/span> <span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">it<\/span><span class=\"p\">);<\/span>\n<span class=\"o\">*<\/span><span class=\"n\">it<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">other_it<\/span><span class=\"p\">);<\/span>\n<span class=\"o\">*<\/span><span class=\"n\">other_it<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Our iterator returns <code class=\"language-plaintext highlighter-rouge\">ref<\/code> via <code class=\"language-plaintext highlighter-rouge\">sort_it::operator*()<\/code>.\nThis is used directly in <code class=\"language-plaintext highlighter-rouge\">swap<\/code>, so we need to provide a <code class=\"language-plaintext highlighter-rouge\">swap(ref, ref)<\/code>.\nNote that <code class=\"language-plaintext highlighter-rouge\">ref<\/code> is passed by value and cannot be <code class=\"language-plaintext highlighter-rouge\">ref&amp;<\/code> as <code class=\"language-plaintext highlighter-rouge\">*it<\/code> is not an lvalue.\nThe desired semantics are, of course, to swap what <code class=\"language-plaintext highlighter-rouge\">ref<\/code> points to, not the pointers in <code class=\"language-plaintext 
highlighter-rouge\">ref<\/code> themselves.\nWe provide <code class=\"language-plaintext highlighter-rouge\">swap<\/code> via <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/friend\">(hidden) friend<\/a> in <code class=\"language-plaintext highlighter-rouge\">ref<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">ref<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"p\">...<\/span>\n\n    <span class=\"k\">friend<\/span> <span class=\"kt\">void<\/span> <span class=\"n\">swap<\/span><span class=\"p\">(<\/span><span class=\"n\">ref<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">ref<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"k\">using<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">swap<\/span><span class=\"p\">;<\/span>\n        <span class=\"n\">swap<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">);<\/span>\n        <span class=\"n\">swap<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>That solves case A.<\/p>\n\n<p>Case B needs a bit more attention.<\/p>\n\n<p>First, we have <code class=\"language-plaintext highlighter-rouge\">value_type v = std::move(*it);<\/code>, where <code 
class=\"language-plaintext highlighter-rouge\">val<\/code> has to be implicitly constructible from <code class=\"language-plaintext highlighter-rouge\">ref&amp;&amp;<\/code>.\nThe intention is to move a value temporarily out of the range and later \u201creturn\u201d it via <code class=\"language-plaintext highlighter-rouge\">*other_it = std::move(v);<\/code>. \nThis implicit conversion can be simply provided by a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/cast_operator\">user-defined conversion operator<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">ref<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"p\">...<\/span>\n\n    <span class=\"k\">operator<\/span> <span class=\"n\">val<\/span><span class=\"p\">()<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">key<\/span><span class=\"p\">),<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">value<\/span><span class=\"p\">)};<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Note the <code class=\"language-plaintext highlighter-rouge\">&amp;&amp;<\/code> at the end of the signature.\nThis is a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/member_functions#ref-qualified_member_functions\">ref-qualified member function<\/a> and basically means that the <code class=\"language-plaintext highlighter-rouge\">ref<\/code> to <code class=\"language-plaintext highlighter-rouge\">val<\/code> conversion is only allowed for rvalue <code class=\"language-plaintext 
highlighter-rouge\">ref&amp;&amp;<\/code>s.\nThus, we can safely move <code class=\"language-plaintext highlighter-rouge\">key<\/code> and <code class=\"language-plaintext highlighter-rouge\">value<\/code> into our <code class=\"language-plaintext highlighter-rouge\">val<\/code> and no <code class=\"language-plaintext highlighter-rouge\">std::string<\/code> is copied in the process.<\/p>\n\n<p>Alternatively, we could have added a <code class=\"language-plaintext highlighter-rouge\">val(ref&amp;&amp;)<\/code> constructor to <code class=\"language-plaintext highlighter-rouge\">val<\/code>.<\/p>\n\n<p>The \u201cinverse\u201d of the <code class=\"language-plaintext highlighter-rouge\">ref&amp;&amp;<\/code> to <code class=\"language-plaintext highlighter-rouge\">val<\/code> conversion happens at <code class=\"language-plaintext highlighter-rouge\">*other_it = std::move(v);<\/code>.\nThis is a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/operator_assignment\">simple assignment operator<\/a>, accepting <code class=\"language-plaintext highlighter-rouge\">val&amp;&amp;<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">ref<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"p\">...<\/span>\n\n    <span class=\"n\">ref<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">val<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">v<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"o\">*<\/span><span class=\"n\">key<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">);<\/span>\n        <span 
class=\"o\">*<\/span><span class=\"n\">value<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"p\">);<\/span>\n        <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"k\">this<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>There is one, slightly weird assignment left: <code class=\"language-plaintext highlighter-rouge\">*it = std::move(*other_it);<\/code>.\nHere, we assign <code class=\"language-plaintext highlighter-rouge\">ref&amp;&amp;<\/code> to <code class=\"language-plaintext highlighter-rouge\">ref<\/code> and the semantics is to move what <code class=\"language-plaintext highlighter-rouge\">*other_it<\/code> points to into what <code class=\"language-plaintext highlighter-rouge\">*it<\/code> points to.\nImplementation-wise, this is another simple assignment:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">ref<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"p\">...<\/span>\n\n    <span class=\"n\">ref<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">ref<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"o\">*<\/span><span class=\"n\">key<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">);<\/span>\n        <span 
class=\"o\">*<\/span><span class=\"n\">value<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"p\">);<\/span>\n        <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"k\">this<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And with this, we have finally modelled the \u201creference semantics\u201d of <code class=\"language-plaintext highlighter-rouge\">ref<\/code> and proper interactions with <code class=\"language-plaintext highlighter-rouge\">val<\/code>.<\/p>\n\n<h2 id=\"remaining-operators\">Remaining Operators<\/h2>\n\n<p>There are two classes of operators missing.<\/p>\n\n<p>First, we want a properly defined default comparison, i.e. by default, <code class=\"language-plaintext highlighter-rouge\">std::sort<\/code> should sort by <code class=\"language-plaintext highlighter-rouge\">operator&lt;<\/code> of our keys.\nUnfortunately, almost all combinations of <code class=\"language-plaintext highlighter-rouge\">ref<\/code> and <code class=\"language-plaintext highlighter-rouge\">val<\/code> are actually used.\nWe have <code class=\"language-plaintext highlighter-rouge\">*it_a &lt; *it_b<\/code>, <code class=\"language-plaintext highlighter-rouge\">v &lt; *it<\/code>, and <code class=\"language-plaintext highlighter-rouge\">*it &lt; v<\/code>.\nThe only missing combination is <code class=\"language-plaintext highlighter-rouge\">val &lt; val<\/code>.\nOur implicit conversion from <code class=\"language-plaintext highlighter-rouge\">ref&amp;&amp;<\/code> to <code class=\"language-plaintext highlighter-rouge\">val<\/code> is moving values, which we obviously don\u2019t want for a comparison.\nThus, we simply define all needed 
combinations:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">ref<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">val<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">val<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">ref<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span> <span class=\"o\">&lt;<\/span> <span class=\"o\">*<\/span><span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">ref<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">ref<\/span> <span 
class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span> <span class=\"o\">&lt;<\/span> <span class=\"o\">*<\/span><span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Secondly, we need the previously mentioned zoo of random access iterator operators.\nThe only slightly interesting one in our case is <code class=\"language-plaintext highlighter-rouge\">operator*<\/code> for producing <code class=\"language-plaintext highlighter-rouge\">ref<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">sort_it<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"p\">...<\/span>\n\n    <span class=\"n\">ref<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">keys<\/span> <span class=\"o\">+<\/span> <span class=\"n\">index<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span> <span class=\"o\">+<\/span> <span class=\"n\">index<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>All the others are the typical <code class=\"language-plaintext highlighter-rouge\">==<\/code>, <code class=\"language-plaintext highlighter-rouge\">!=<\/code>, <code class=\"language-plaintext highlighter-rouge\">+<\/code>, <code class=\"language-plaintext highlighter-rouge\">-<\/code>, <code class=\"language-plaintext highlighter-rouge\">++<\/code>, <code class=\"language-plaintext 
highlighter-rouge\">--<\/code>, <code class=\"language-plaintext highlighter-rouge\">&lt;<\/code>, etc. that you\u2019d expect of a random access iterator.\nI was too lazy to implement all of them and only did what I needed for <code class=\"language-plaintext highlighter-rouge\">std::sort<\/code> on gcc:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">sort_it<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"p\">...<\/span>\n\n    <span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">==<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">index<\/span> <span class=\"o\">==<\/span> <span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n    <span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">!=<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">index<\/span> <span class=\"o\">!=<\/span> <span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n\n    <span class=\"n\">sort_it<\/span> <span class=\"k\">operator<\/span><span class=\"o\">+<\/span><span class=\"p\">(<\/span><span class=\"n\">difference_type<\/span> <span class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span 
class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">index<\/span> <span class=\"o\">+<\/span> <span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">keys<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n    <span class=\"n\">sort_it<\/span> <span class=\"k\">operator<\/span><span class=\"o\">-<\/span><span class=\"p\">(<\/span><span class=\"n\">difference_type<\/span> <span class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">index<\/span> <span class=\"o\">-<\/span> <span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">keys<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n\n    <span class=\"n\">difference_type<\/span> <span class=\"k\">operator<\/span><span class=\"o\">-<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> \n    <span class=\"p\">{<\/span> \n        <span class=\"k\">return<\/span> <span class=\"n\">difference_type<\/span><span class=\"p\">(<\/span><span class=\"n\">index<\/span><span class=\"p\">)<\/span> <span class=\"o\">-<\/span> <span class=\"n\">difference_type<\/span><span class=\"p\">(<\/span><span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">);<\/span> \n    <span class=\"p\">}<\/span>\n\n    <span class=\"n\">sort_it<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">++<\/span><span class=\"p\">()<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"o\">++<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span>\n        <span 
class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"k\">this<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n    <span class=\"n\">sort_it<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">--<\/span><span class=\"p\">()<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"o\">--<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span>\n        <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"k\">this<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n\n    <span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">index<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>With the constrained algorithms of C++20 (e.g. <code class=\"language-plaintext highlighter-rouge\">std::ranges::sort<\/code>), we would need to implement all operators as the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/named_req\/RandomAccessIterator\">associated concept<\/a> would be checked.\nOf course, in a production-grade implementation, we would also implement them all.<\/p>\n\n<p>And with this, we\u2019re done.<\/p>\n\n<p>Now, the example in the introduction works:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">keys<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span 
class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">values<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">sort<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span><span class=\"p\">{<\/span>          <span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"n\">keys<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">(),<\/span> <span class=\"n\">values<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">()},<\/span>\n          <span class=\"n\">sort_it<\/span><span class=\"p\">{<\/span><span class=\"n\">keys<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">(),<\/span> <span class=\"n\">keys<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">(),<\/span> <span class=\"n\">values<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">()});<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"final-version\">Final Version<\/h2>\n\n<p>A fully working example can be found <a href=\"https:\/\/godbolt.org\/z\/fb76e1\">here on godbolt<\/a>.\nI\u2019ve added a <code class=\"language-plaintext highlighter-rouge\">main<\/code> function, so be sure to check the output.<\/p>\n\n<p>Our custom iterator, value, and reference types together come in below 100 LOC, so it\u2019s almost a compact solution!<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">val<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">std<\/span><span 
class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"n\">value<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">ref<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">int<\/span><span class=\"o\">*<\/span> <span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"o\">*<\/span> <span class=\"n\">value<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"n\">ref<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">ref<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"o\">*<\/span><span class=\"n\">key<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">);<\/span>\n        <span class=\"o\">*<\/span><span class=\"n\">value<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"p\">);<\/span>\n        <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"k\">this<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n\n    <span class=\"n\">ref<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">val<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span>\n    <span 
class=\"p\">{<\/span>\n        <span class=\"o\">*<\/span><span class=\"n\">key<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">);<\/span>\n        <span class=\"o\">*<\/span><span class=\"n\">value<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"p\">);<\/span>\n        <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"k\">this<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n\n    <span class=\"k\">friend<\/span> <span class=\"kt\">void<\/span> <span class=\"n\">swap<\/span><span class=\"p\">(<\/span><span class=\"n\">ref<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">ref<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">swap<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">);<\/span>\n        <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">swap<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"p\">,<\/span> <span class=\"o\">*<\/span><span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">}<\/span>\n\n  
  <span class=\"k\">operator<\/span> <span class=\"n\">val<\/span><span class=\"p\">()<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">key<\/span><span class=\"p\">),<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">move<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">value<\/span><span class=\"p\">)};<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">ref<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">val<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">val<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">ref<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">a<\/span><span class=\"p\">.<\/span><span 
class=\"n\">key<\/span> <span class=\"o\">&lt;<\/span> <span class=\"o\">*<\/span><span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">ref<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">ref<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span> <span class=\"o\">&lt;<\/span> <span class=\"o\">*<\/span><span class=\"n\">b<\/span><span class=\"p\">.<\/span><span class=\"n\">key<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">sort_it<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">iterator_category<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">random_access_iterator_tag<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">difference_type<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">int64_t<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">value_type<\/span> <span class=\"o\">=<\/span> <span class=\"n\">val<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">pointer<\/span> <span class=\"o\">=<\/span> <span class=\"n\">value_type<\/span><span class=\"o\">*<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">reference<\/span> <span class=\"o\">=<\/span> <span 
class=\"n\">ref<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"kt\">size_t<\/span> <span class=\"n\">index<\/span><span class=\"p\">;<\/span>\n    <span class=\"kt\">int<\/span><span class=\"o\">*<\/span> <span class=\"n\">keys<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"o\">*<\/span> <span class=\"n\">values<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">==<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">index<\/span> <span class=\"o\">==<\/span> <span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n    <span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">!=<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">index<\/span> <span class=\"o\">!=<\/span> <span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n\n    <span class=\"n\">sort_it<\/span> <span class=\"k\">operator<\/span><span class=\"o\">+<\/span><span class=\"p\">(<\/span><span class=\"n\">difference_type<\/span> <span class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">index<\/span> <span 
class=\"o\">+<\/span> <span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">keys<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n    <span class=\"n\">sort_it<\/span> <span class=\"k\">operator<\/span><span class=\"o\">-<\/span><span class=\"p\">(<\/span><span class=\"n\">difference_type<\/span> <span class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">index<\/span> <span class=\"o\">-<\/span> <span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">keys<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n    \n    <span class=\"n\">difference_type<\/span> <span class=\"k\">operator<\/span><span class=\"o\">-<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> \n    <span class=\"p\">{<\/span> \n        <span class=\"k\">return<\/span> <span class=\"n\">difference_type<\/span><span class=\"p\">(<\/span><span class=\"n\">index<\/span><span class=\"p\">)<\/span> <span class=\"o\">-<\/span> <span class=\"n\">difference_type<\/span><span class=\"p\">(<\/span><span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">);<\/span> \n    <span class=\"p\">}<\/span>\n    \n    <span class=\"n\">sort_it<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">++<\/span><span class=\"p\">()<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"o\">++<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span>\n        <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span 
class=\"k\">this<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n    <span class=\"n\">sort_it<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">--<\/span><span class=\"p\">()<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"o\">--<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span>\n        <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"k\">this<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n\n    <span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">sort_it<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">r<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">index<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">r<\/span><span class=\"p\">.<\/span><span class=\"n\">index<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n\n    <span class=\"n\">ref<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">keys<\/span> <span class=\"o\">+<\/span> <span class=\"n\">index<\/span><span class=\"p\">,<\/span> <span class=\"n\">values<\/span> <span class=\"o\">+<\/span> <span class=\"n\">index<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>We set out to write a custom iterator that is able to sort two ranges in parallel, treating one as the \u201ckey\u201d and the other as \u201cvalue\u201d that has to be kept in sync.\n<code class=\"language-plaintext highlighter-rouge\">swap<\/code> alone was not sufficient to get the correct 
behavior, as some parts of the <code class=\"language-plaintext highlighter-rouge\">std::sort<\/code> implementation operate on <code class=\"language-plaintext highlighter-rouge\">value_type<\/code> and <code class=\"language-plaintext highlighter-rouge\">reference<\/code> directly.\nThus, we created custom <code class=\"language-plaintext highlighter-rouge\">ref<\/code> and <code class=\"language-plaintext highlighter-rouge\">val<\/code> structs and implemented all required operators to model the proper semantics.\nNote that we could not use <code class=\"language-plaintext highlighter-rouge\">val&amp;<\/code> as the reference type because our ranges are independent and do not store elements of type <code class=\"language-plaintext highlighter-rouge\">val<\/code>.<\/p>\n\n<p>The result is a custom random access iterator <code class=\"language-plaintext highlighter-rouge\">sort_it<\/code> that keeps the two ranges in sync, as can be seen in <a href=\"https:\/\/godbolt.org\/z\/fb76e1\">this example on godbolt<\/a>.<\/p>\n\n<p>For a production-grade implementation, there are a few additional concerns:<\/p>\n\n<ul>\n  <li>all required operators for the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/named_req\/RandomAccessIterator\">random access iterator<\/a> should be implemented<\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">sort_it<\/code>, <code class=\"language-plaintext highlighter-rouge\">ref<\/code>, and <code class=\"language-plaintext highlighter-rouge\">val<\/code> should be templated on the key and value types<\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">sort_it<\/code> and <code class=\"language-plaintext highlighter-rouge\">ref<\/code> should not be restricted to pointers, but work with other random access iterators<\/li>\n  <li>the whole system could be made variadic and support arbitrarily many value ranges<\/li>\n<\/ul>\n\n<p>(<em>Title image from <a 
href=\"https:\/\/pixabay.com\/photos\/noodles-pasta-colorful-pasta-food-1312384\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"consteval in C++17","description":"Forcing compile time evaluation in C++17","pubDate":"Sat, 14 Nov 2020 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/11\/14\/cpp17-consteval","guid":"https:\/\/artificial-mind.net\/blog\/2020\/11\/14\/cpp17-consteval","content":"<p>A common misconception is that <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> functions are evaluated during compilation and not during runtime.\nIn reality, a <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> function makes it <em>possible<\/em> to be evaluated at compile time without guaranteeing it.<\/p>\n\n<p>The rest of this post motivates why <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> is sometimes not enough and how the following snippet can be used for a stronger compile-time evaluation guarantee without jeopardizing the ability to pass runtime values:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">auto<\/span> <span class=\"n\">V<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">static<\/span> <span class=\"k\">constexpr<\/span> <span class=\"k\">auto<\/span> <span class=\"n\">force_consteval<\/span> <span class=\"o\">=<\/span> <span class=\"n\">V<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<hr \/>\n\n<p>Consider the following <code class=\"language-plaintext highlighter-rouge\">stringhash<\/code> function:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">constexpr<\/span> <span class=\"kt\">size_t<\/span> <span class=\"nf\">stringhash<\/span><span class=\"p\">(<\/span><span class=\"kt\">char<\/span> <span 
class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"kt\">size_t<\/span> <span class=\"n\">h<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">while<\/span> <span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">s<\/span><span class=\"p\">)<\/span> \n    <span class=\"p\">{<\/span>\n        <span class=\"n\">h<\/span> <span class=\"o\">=<\/span> <span class=\"n\">h<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">6364136223846793005ULL<\/span> <span class=\"o\">+<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span> <span class=\"o\">+<\/span> <span class=\"mh\">0xda3e39cb94b95bdbULL<\/span><span class=\"p\">;<\/span>\n        <span class=\"n\">s<\/span><span class=\"o\">++<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">h<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This function can be used to compute string hashes at compile time:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">stringhash<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">)<\/span> <span class=\"o\">!=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Without <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code>, we would get:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nl\">error:<\/span> <span class=\"n\">non<\/span><span class=\"o\">-<\/span><span class=\"n\">constant<\/span> <span class=\"n\">condition<\/span> <span 
class=\"k\">for<\/span> <span class=\"k\">static<\/span> <span class=\"n\">assertion<\/span>\n   <span class=\"mi\">14<\/span> <span class=\"o\">|<\/span> <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">stringhash<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">)<\/span> <span class=\"o\">!=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">);<\/span>\n      <span class=\"o\">|<\/span>               <span class=\"o\">~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>When the compiler is forced to, <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> functions can be evaluated at compile time.\nIf not forced, there is no guarantee:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">size_t<\/span> <span class=\"nf\">test<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">stringhash<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>With <code class=\"language-plaintext highlighter-rouge\">-O0<\/code>, <a href=\"https:\/\/godbolt.org\/z\/K5xP78\">neither clang nor gcc bother to evaluate the call at compile time<\/a>.\nFortunately, <a href=\"https:\/\/godbolt.org\/z\/zGxdqY\">with optimizations enabled, they actually do<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">test<\/span><span class=\"p\">()<\/span><span class=\"o\">:<\/span>\n  <span class=\"n\">movabsq<\/span> <span class=\"err\">$<\/span><span class=\"o\">-<\/span><span class=\"mi\">9068320177771933951<\/span><span class=\"p\">,<\/span> <span class=\"o\">%<\/span><span class=\"n\">rax<\/span>\n  <span 
class=\"n\">ret<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>However, this strongly depends on how the <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> function is written.\nA small variation, e.g. a recursive implementation, can be enough to change the compiler\u2019s mood:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">constexpr<\/span> <span class=\"kt\">size_t<\/span> <span class=\"nf\">stringhash<\/span><span class=\"p\">(<\/span><span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"o\">!*<\/span><span class=\"n\">s<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">stringhash<\/span><span class=\"p\">(<\/span><span class=\"n\">s<\/span> <span class=\"o\">+<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">6364136223846793005ULL<\/span> <span class=\"o\">+<\/span> <span class=\"o\">*<\/span><span class=\"n\">s<\/span> <span class=\"o\">+<\/span> <span class=\"mh\">0xda3e39cb94b95bdbULL<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Now, <a href=\"https:\/\/godbolt.org\/z\/cfjrb8\">clang no longer feels obligated to optimize it<\/a>, even at <code class=\"language-plaintext highlighter-rouge\">-O2<\/code> or <code class=\"language-plaintext highlighter-rouge\">-O3<\/code>.<\/p>\n\n<p>In C++20, <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/consteval\">we could use <code class=\"language-plaintext highlighter-rouge\">consteval<\/code><\/a> to specify that <code class=\"language-plaintext 
highlighter-rouge\">stringhash<\/code> must always produce a compile time constant expression.\nNote that this would mean that we cannot use <code class=\"language-plaintext highlighter-rouge\">stringhash<\/code> with runtime values anymore.<\/p>\n\n<p>One way to force compile time evaluation is to declare a <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> variable:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">size_t<\/span> <span class=\"nf\">test<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">constexpr<\/span> <span class=\"k\">auto<\/span> <span class=\"n\">h<\/span> <span class=\"o\">=<\/span> <span class=\"n\">stringhash<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">h<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This guarantees that <code class=\"language-plaintext highlighter-rouge\">h<\/code> is initialized by a constant expression, which in turn means <a href=\"https:\/\/godbolt.org\/z\/K1qoTE\">compile time evaluation in practice<\/a>, even for <code class=\"language-plaintext highlighter-rouge\">-O0<\/code>.<\/p>\n\n<p>Declaring additional <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> variables for every call makes this quite cumbersome to use.\nBy using <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/variable_template\">variable templates<\/a> and <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/template_parameters#Non-type_template_parameter\"><code class=\"language-plaintext highlighter-rouge\">auto<\/code> non-type template parameters<\/a>, there is another, quite elegant solution:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span 
class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">auto<\/span> <span class=\"n\">V<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">static<\/span> <span class=\"k\">constexpr<\/span> <span class=\"k\">auto<\/span> <span class=\"n\">force_consteval<\/span> <span class=\"o\">=<\/span> <span class=\"n\">V<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>It can be used as follows:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">size_t<\/span> <span class=\"nf\">test<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">force_consteval<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">stringhash<\/span><span class=\"p\">(<\/span><span class=\"s\">\"hello world\"<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>I like this solution because it enforces compile-time evaluation at both <code class=\"language-plaintext highlighter-rouge\">-O0<\/code> and <code class=\"language-plaintext highlighter-rouge\">-O2<\/code> and <a href=\"https:\/\/godbolt.org\/z\/nrozxb\">does not even introduce additional symbols<\/a>.\nIf <code class=\"language-plaintext highlighter-rouge\">force_consteval<\/code> were not a <code class=\"language-plaintext highlighter-rouge\">static<\/code> variable but rather <code class=\"language-plaintext highlighter-rouge\">inline<\/code>, a function, or a type, then symbols might get emitted to ensure the value has the same address in each translation unit.\nThis would have a negative impact on binary size, especially if many different values are used across the program.<\/p>\n\n<p>The <code class=\"language-plaintext highlighter-rouge\">auto<\/code> means that every type allowed for a non-type template parameter is supported, e.g. 
any integral type, enum, or pointers.\nWith C++20 we also get floating-point types and literal types (including \u201cuser-defined <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> classes\u201d).<\/p>\n\n<p>This solution is still a bit verbose.\nIn C++17, it could be hidden behind a macro:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#define STRINGHASH(str) force_consteval&lt;stringhash(str)&gt;\n<\/span><\/code><\/pre><\/div><\/div>\n\n<p>In C++20, we could <a href=\"https:\/\/ctrpeach.io\/posts\/cpp20-string-literal-template-parameters\/\">use custom literals<\/a> to build a <code class=\"language-plaintext highlighter-rouge\">stringhash&lt;\"hello world\"&gt;<\/code>.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/binoculars-watch-observation-2474698\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"Approximating 'constexpr for'","description":"'if constexpr' is awesome, but can we do 'constexpr for'?","pubDate":"Sat, 31 Oct 2020 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/10\/31\/constexpr-for","guid":"https:\/\/artificial-mind.net\/blog\/2020\/10\/31\/constexpr-for","content":"<p>In ancient times, template metaprogramming was all about recursive templates and template specialization.\nWith each version after C++11, the set of tools for convenient compile-time programming became larger and larger.\nMany compile-time computations can now be solved elegantly via <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/constexpr\">various <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> constructs<\/a>.\n<a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/fold\">Fold expressions<\/a> massively simplify operations on variadic templates.<\/p>\n\n<p>Traditionally, switching between different implementations based on compile-time computed properties has been done via static functions in 
template specializations, <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/sfinae\">SFINAE<\/a>, or <a href=\"https:\/\/arne-mertz.de\/2016\/10\/tag-dispatch\/\">tag dispatch<\/a>.\nThese require writing at least one function per implementation.\nSince C++17, <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/if\">constexpr if<\/a> enables us to coalesce different implementations into the same (templated) function.\nThis has many advantages, from improving compile times to reducing DRY violations by reusing parts of the implementation and not having to repeat the function signature.<\/p>\n\n<p>One notable feature is still missing: <code class=\"language-plaintext highlighter-rouge\">constexpr for<\/code>.<\/p>\n\n<p>While we can use <code class=\"language-plaintext highlighter-rouge\">for<\/code> loops in <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> functions, the loop variable cannot be used in a constant expression itself:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span> <span class=\"n\">N<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">array<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"n\">N<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">make_data<\/span><span class=\"p\">();<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span> <span class=\"n\">Start<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">End<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">float<\/span> <span class=\"nf\">data_sum<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span 
class=\"kt\">float<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Start<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">End<\/span><span class=\"p\">;<\/span> <span class=\"o\">++<\/span><span class=\"n\">i<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"k\">auto<\/span> <span class=\"n\">data<\/span> <span class=\"o\">=<\/span> <span class=\"n\">make_data<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">i<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">();<\/span>\n        <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">v<\/span> <span class=\"o\">:<\/span> <span class=\"n\">data<\/span><span class=\"p\">)<\/span>\n            <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Here, <code class=\"language-plaintext highlighter-rouge\">make_data<\/code> produces data with a predefined size <code class=\"language-plaintext highlighter-rouge\">N<\/code>.\nIn <code class=\"language-plaintext highlighter-rouge\">data_sum<\/code>, we want to sum up all provided data for a certain range of values <code class=\"language-plaintext highlighter-rouge\">N<\/code>.\n<code class=\"language-plaintext highlighter-rouge\">Start<\/code> and <code class=\"language-plaintext highlighter-rouge\">End<\/code> are known at compile time, so <a href=\"https:\/\/godbolt.org\/z\/K9rddb\">why does <code class=\"language-plaintext 
highlighter-rouge\">make_data&lt;i&gt;()<\/code> not work<\/a>?<\/p>\n\n<p>Well, <code class=\"language-plaintext highlighter-rouge\">i<\/code> changes its value in each iteration, which changes the type of <code class=\"language-plaintext highlighter-rouge\">data<\/code>, which means we need different code for the loop body in each iteration.\nIntuitively, we long for a <code class=\"language-plaintext highlighter-rouge\">constexpr for<\/code> that does a kind of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Loop_unrolling\">loop unrolling<\/a>.<\/p>\n\n<p>Note that in this example, the <code class=\"language-plaintext highlighter-rouge\">constexpr for<\/code> would be useful even if the functions were not <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code>: <code class=\"language-plaintext highlighter-rouge\">Start<\/code> and <code class=\"language-plaintext highlighter-rouge\">End<\/code> are compile-time constants, and the roadblock here is that the type of <code class=\"language-plaintext highlighter-rouge\">data<\/code> depends on <code class=\"language-plaintext highlighter-rouge\">i<\/code>.<\/p>\n\n<p>In this post I will present three approximations to <code class=\"language-plaintext highlighter-rouge\">constexpr for<\/code> that solve some common use cases.<\/p>\n\n<h2 id=\"classical-integral-for\">Classical Integral For<\/h2>\n\n<p>First, let\u2019s fix the introductory example.\nIn the absence of a better name, I call these \u201cintegral for\u201d loops:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Start<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">End<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">+=<\/span> 
<span class=\"n\">Inc<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">i<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Where <code class=\"language-plaintext highlighter-rouge\">Start<\/code>, <code class=\"language-plaintext highlighter-rouge\">End<\/code>, and <code class=\"language-plaintext highlighter-rouge\">Inc<\/code> are compile-time constants of integral type, e.g. <code class=\"language-plaintext highlighter-rouge\">int<\/code>s.<\/p>\n\n<p>We need to solve two problems:\nFirst, we must ensure that <code class=\"language-plaintext highlighter-rouge\">i<\/code> is also a compile-time constant.\nSecond, a different loop body must be instantiated for each <code class=\"language-plaintext highlighter-rouge\">i<\/code>; otherwise it\u2019s impossible to use types that depend on <code class=\"language-plaintext highlighter-rouge\">i<\/code> inside the body.<\/p>\n\n<p>The solution is surprisingly simple using <code class=\"language-plaintext highlighter-rouge\">if constexpr<\/code>, recursive instantiation, and <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/types\/integral_constant\">std::integral_constant<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">auto<\/span> <span class=\"n\">Start<\/span><span class=\"p\">,<\/span> <span class=\"k\">auto<\/span> <span class=\"n\">End<\/span><span class=\"p\">,<\/span> <span class=\"k\">auto<\/span> <span class=\"n\">Inc<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">constexpr_for<\/span><span class=\"p\">(<\/span><span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">f<\/span><span 
class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"k\">constexpr<\/span> <span class=\"p\">(<\/span><span class=\"n\">Start<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">End<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">integral_constant<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">Start<\/span><span class=\"p\">),<\/span> <span class=\"n\">Start<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">());<\/span>\n        <span class=\"n\">constexpr_for<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Start<\/span> <span class=\"o\">+<\/span> <span class=\"n\">Inc<\/span><span class=\"p\">,<\/span> <span class=\"n\">End<\/span><span class=\"p\">,<\/span> <span class=\"n\">Inc<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Since C++17, we can use <code class=\"language-plaintext highlighter-rouge\">auto<\/code> as a non-type template parameter to support different loop variable types.\nThe code should be largely self-explanatory but the use of <code class=\"language-plaintext highlighter-rouge\">std::integral_constant<\/code> might be non-obvious.\nIt should become clear if we take a look at how this solves our initial example:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span> <span class=\"n\">N<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span 
class=\"n\">array<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"n\">N<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">make_data<\/span><span class=\"p\">();<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span> <span class=\"n\">Start<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">End<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">float<\/span> <span class=\"nf\">data_sum<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">float<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">constexpr_for<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Start<\/span><span class=\"p\">,<\/span> <span class=\"n\">End<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">sum<\/span><span class=\"p\">](<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span><span class=\"p\">){<\/span>\n        <span class=\"k\">auto<\/span> <span class=\"n\">data<\/span> <span class=\"o\">=<\/span> <span class=\"n\">make_data<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">i<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">();<\/span>\n        <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">v<\/span> <span class=\"o\">:<\/span> <span class=\"n\">data<\/span><span class=\"p\">)<\/span>\n            <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">});<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n<span 
class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>First of all, <a href=\"https:\/\/godbolt.org\/z\/TPTxEW\">this works as intended and unrolls the loop<\/a>.\nEven in a full <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> context, <a href=\"https:\/\/godbolt.org\/z\/e3soa1\">it compiles and can be fully precomputed<\/a>.<\/p>\n\n<p>So\u2026 why <code class=\"language-plaintext highlighter-rouge\">std::integral_constant<\/code>?<\/p>\n\n<p>Because that\u2019s our way to instantiate a different function for each iteration <em>and<\/em> have a constant expression <code class=\"language-plaintext highlighter-rouge\">i<\/code>.\nWe pass a generic lambda with <code class=\"language-plaintext highlighter-rouge\">(auto i)<\/code> as its parameter.\nThus, each different value for <code class=\"language-plaintext highlighter-rouge\">i<\/code> causes a new instantiation, because <code class=\"language-plaintext highlighter-rouge\">std::integral_constant&lt;int, 1&gt;<\/code> and <code class=\"language-plaintext highlighter-rouge\">std::integral_constant&lt;int, 2&gt;<\/code> are different types.\n<code class=\"language-plaintext highlighter-rouge\">make_data&lt;i&gt;()<\/code> works because <code class=\"language-plaintext highlighter-rouge\">std::integral_constant&lt;class T, T v&gt;<\/code> has an implicit <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> conversion to <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nIt becomes clearer if you think about it as <code class=\"language-plaintext highlighter-rouge\">make_data&lt;decltype(i)::value&gt;()<\/code>.\nOf course, the shorter <code class=\"language-plaintext highlighter-rouge\">make_data&lt;i.value&gt;()<\/code> would have worked as well, since static members can also be accessed via dot syntax.<\/p>\n\n<blockquote>\n  <p>In C++20, we could have used <code class=\"language-plaintext highlighter-rouge\">[&amp;sum]&lt;int I&gt;() { ... 
}<\/code> instead.<\/p>\n<\/blockquote>\n\n<h2 id=\"parameter-packs\">Parameter Packs<\/h2>\n\n<p>Another common use case is iterating over a parameter pack:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span><span class=\"o\">...<\/span> <span class=\"nc\">Args<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">print_all<\/span><span class=\"p\">(<\/span><span class=\"n\">Args<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">...<\/span> <span class=\"n\">args<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span> <span class=\"o\">:<\/span> <span class=\"n\">args<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">a<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>But it doesn\u2019t actually work that way.\nSame problem as before: each loop iteration needs a different instantiation as the types of <code class=\"language-plaintext highlighter-rouge\">args<\/code> are heterogeneous.\nThe generic solution is almost trivial:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span><span class=\"o\">...<\/span> 
<span class=\"nc\">Args<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">constexpr_for<\/span><span class=\"p\">(<\/span><span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"n\">Args<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">...<\/span> <span class=\"n\">args<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"p\">(<\/span><span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">forward<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Args<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">args<\/span><span class=\"p\">)),<\/span> <span class=\"p\">...);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Here, we used a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/fold\">fold expression<\/a> over the comma operator to call <code class=\"language-plaintext highlighter-rouge\">f<\/code> separately with each argument.\nAn empty parameter pack leads to zero invocations of <code class=\"language-plaintext highlighter-rouge\">f<\/code>, in which case the fold expression evaluates to <code class=\"language-plaintext highlighter-rouge\">void()<\/code>.\nNote that the perfect forwarding here allows us to properly handle rvalues as well, e.g. 
passing a <code class=\"language-plaintext highlighter-rouge\">unique_ptr<\/code> by value.\nThis can now be used as:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span><span class=\"o\">...<\/span> <span class=\"nc\">Args<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">print_all<\/span><span class=\"p\">(<\/span><span class=\"n\">Args<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">...<\/span> <span class=\"n\">args<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">constexpr_for<\/span><span class=\"p\">([](<\/span><span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n        <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">},<\/span> <span class=\"n\">args<\/span><span class=\"p\">...);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Again, <a href=\"https:\/\/godbolt.org\/z\/53ebaP\">the compiler can unroll and properly call each function<\/a>.<\/p>\n\n<blockquote>\n  <p>For more fold expression magic, I can wholeheartedly recommend <a href=\"https:\/\/foonathan.net\/2020\/05\/fold-tricks\/\">nifty fold expression tricks<\/a> by Jonathan M\u00fcller.<\/p>\n<\/blockquote>\n\n<h2 id=\"tuples-and-tuple-likes\">Tuples and Tuple-Likes<\/h2>\n\n<p><code class=\"language-plaintext highlighter-rouge\">std::tuple<\/code>, <code class=\"language-plaintext 
highlighter-rouge\">std::array<\/code>, and <code class=\"language-plaintext highlighter-rouge\">std::pair<\/code> have specializations for <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/utility\/tuple\/tuple_size\">std::tuple_size<\/a>.\nThis is also the recommended customization point when adding <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/structured_binding\">structured binding<\/a> support for non-trivial custom types.<\/p>\n\n<p>We can easily define a <code class=\"language-plaintext highlighter-rouge\">constexpr for<\/code> approximation for tuple-like types using our \u201cintegral for\u201d helper from the beginning:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">Tuple<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">void<\/span> <span class=\"nf\">constexpr_for_tuple<\/span><span class=\"p\">(<\/span><span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"n\">Tuple<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">tuple<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">constexpr<\/span> <span class=\"kt\">size_t<\/span> <span class=\"n\">cnt<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">tuple_size_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">decay_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Tuple<\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"n\">constexpr_for<\/span><span class=\"o\">&lt;<\/span><span 
class=\"kt\">size_t<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">),<\/span> <span class=\"n\">cnt<\/span><span class=\"p\">,<\/span> <span class=\"kt\">size_t<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n        <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">get<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">i<\/span><span class=\"p\">.<\/span><span class=\"n\">value<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">tuple<\/span><span class=\"p\">));<\/span>\n    <span class=\"p\">});<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Which can then be used to iterate over tuple-like objects:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">constexpr_for_tuple<\/span><span class=\"p\">([](<\/span><span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n        <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">},<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">make_tuple<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"sc\">'c'<\/span><span 
class=\"p\">,<\/span> <span class=\"nb\">true<\/span><span class=\"p\">));<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>While there is currently no <code class=\"language-plaintext highlighter-rouge\">constexpr for<\/code> in C++, we can easily write a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/algorithm\/for_each\">std::for_each<\/a>-like approximation for it.\nThe important part is that each iteration gets its own instantiation, which makes it possible to use different types per iteration.\nThis can be elegantly handled by a generic lambda.<\/p>\n\n<p>For the \u201cintegral for\u201d loop, it may also be important to have access to the index as a constant expression, so it can be used as a non-type template parameter.\nThis can be supported by using <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/types\/integral_constant\">std::integral_constant<\/a> to pass the value.<\/p>\n\n<p>In the end, we have three <code class=\"language-plaintext highlighter-rouge\">constexpr for<\/code> approximations for different use cases:<\/p>\n\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">for (auto i = Start; i &lt; End; i += Inc)<\/code> (\u201cintegral for\u201d)<\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">for (auto&amp;&amp; a : args...)<\/code> (\u201cparameter pack for\u201d)<\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">for (auto&amp;&amp; v : tuple_like_obj)<\/code> (\u201ctuple-like for\u201d)<\/li>\n<\/ul>\n\n<p>Of course, you should typically prefer a normal <code class=\"language-plaintext highlighter-rouge\">for<\/code> loop, as per-iteration instantiation is not gentle on compile times.\nHowever, sometimes this is not possible because we either need the loop index as a non-type template parameter or the variable type might change in each iteration.<\/p>\n\n<h2 id=\"further-reading\">Further Reading<\/h2>\n\n<ul>\n  <li>There is an official proposal <a 
href=\"http:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2019\/p1306r1.pdf\">P1306<\/a> for <code class=\"language-plaintext highlighter-rouge\">for...<\/code><\/li>\n  <li><a href=\"https:\/\/quuxplusone.github.io\/blog\/2019\/02\/28\/expansion-statements\/\">Thoughts on P1306<\/a> by Arthur O\u2019Dwyer<\/li>\n  <li><a href=\"https:\/\/www.boost.org\/doc\/libs\/1_65_1\/libs\/hana\/doc\/html\/index.html\">Boost.Hana<\/a>, especially <a href=\"https:\/\/www.boost.org\/doc\/libs\/1_63_0\/libs\/hana\/doc\/html\/group__group-Foldable.html#ga2af382f7e644ce3707710bbad313e9c2\">hana::for_each<\/a><\/li>\n  <li><a href=\"https:\/\/vittorioromeo.info\/index\/blog\/cpp20_lambdas_compiletime_for.html\">compile-time iteration with C++20 lambdas<\/a> by Vittorio Romeo<\/li>\n<\/ul>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/rollercoaster-looping-amusement-801833\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"Multi-Level Break in C++ via IIFE","description":"Ever wanted to break more than one level at once?","pubDate":"Wed, 28 Oct 2020 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/10\/28\/multi-level-break-iife","guid":"https:\/\/artificial-mind.net\/blog\/2020\/10\/28\/multi-level-break-iife","content":"<p>I guess we all have been at this point.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">:<\/span> <span class=\"p\">...)<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">j<\/span> <span class=\"o\">:<\/span> <span class=\"p\">...)<\/span>\n        <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">condition<\/span><span class=\"p\">(<\/span><span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">j<\/span><span 
class=\"p\">))<\/span>\n        <span class=\"p\">{<\/span>\n            <span class=\"k\">break<\/span> <span class=\"n\">outer<\/span><span class=\"o\">???<\/span>\n        <span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>You want to search something, and for one reason or another you end up with a nested loop.\nYou find what you searched for and now want to <code class=\"language-plaintext highlighter-rouge\">break<\/code> all the way to the outer loop.<\/p>\n\n<p>If only we had multi-level <code class=\"language-plaintext highlighter-rouge\">breaks<\/code>.<\/p>\n\n<p>But we don\u2019t.<\/p>\n\n<p>So people introduce flags:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">found<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">false<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">:<\/span> <span class=\"p\">...)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">j<\/span> <span class=\"o\">:<\/span> <span class=\"p\">...)<\/span>\n        <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">condition<\/span><span class=\"p\">(<\/span><span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">j<\/span><span class=\"p\">))<\/span>\n        <span class=\"p\">{<\/span>\n            <span class=\"n\">found<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">true<\/span><span class=\"p\">;<\/span>\n            <span class=\"k\">break<\/span><span class=\"p\">;<\/span>\n        <span class=\"p\">}<\/span>\n\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">found<\/span><span class=\"p\">)<\/span>\n        <span class=\"k\">break<\/span><span 
class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Which introduces quite the clutter and can be error-prone if the loops contain more code and the second <code class=\"language-plaintext highlighter-rouge\">break<\/code> is somehow missed or not executed.<\/p>\n\n<p>Or people (*gasp*) introduce <code class=\"language-plaintext highlighter-rouge\">goto<\/code>s:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">:<\/span> <span class=\"p\">...)<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">j<\/span> <span class=\"o\">:<\/span> <span class=\"p\">...)<\/span>\n        <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">condition<\/span><span class=\"p\">(<\/span><span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">j<\/span><span class=\"p\">))<\/span>\n        <span class=\"p\">{<\/span>\n            <span class=\"k\">goto<\/span> <span class=\"n\">next<\/span><span class=\"p\">;<\/span>\n        <span class=\"p\">}<\/span>\n<span class=\"nl\">next:<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>But C is the enemy, isn\u2019t it?<\/p>\n\n<p>Okay, so we clearly need a multi-level break appropriate for modern C++.<\/p>\n\n<p>I mean, we could propose new syntax\u2026 <code class=\"language-plaintext highlighter-rouge\">co_break<\/code>, anyone?<\/p>\n\n<blockquote>\n  <p>Yeah, yeah, I get it. 
Has been done too many times already.<\/p>\n<\/blockquote>\n\n<p>But behold!\nWe already have shiny <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/lambda\">lambda expressions<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Immediately_invoked_function_expression\">immediately invoked function expressions (IIFEs)<\/a>.\nAnd they are more than adequate to solve our problem:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">[<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">:<\/span> <span class=\"p\">...)<\/span>\n        <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">j<\/span> <span class=\"o\">:<\/span> <span class=\"p\">...)<\/span>\n            <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">condition<\/span><span class=\"p\">(<\/span><span class=\"n\">i<\/span><span class=\"p\">,<\/span> <span class=\"n\">j<\/span><span class=\"p\">))<\/span>\n            <span class=\"p\">{<\/span>\n                <span class=\"k\">return<\/span><span class=\"p\">;<\/span>\n            <span class=\"p\">}<\/span>\n<span class=\"p\">}();<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>No need to introduce additional flags, identifiers, labels. 
Just good old <code class=\"language-plaintext highlighter-rouge\">[&amp;]{}();<\/code> and <code class=\"language-plaintext highlighter-rouge\">return<\/code>.<\/p>\n\n<blockquote>\n  <p>This post is tagged as <code class=\"language-plaintext highlighter-rouge\">C++<\/code> and <code class=\"language-plaintext highlighter-rouge\">fun<\/code> and is best suited as some mid-week entertainment in turbulent times.<\/p>\n<\/blockquote>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/auto-repair-workshop-brake-disc-1954643\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"range_ref<T>","description":"A fast, lightweight, non-owning view of a range","pubDate":"Sat, 24 Oct 2020 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/10\/24\/range_ref","guid":"https:\/\/artificial-mind.net\/blog\/2020\/10\/24\/range_ref","content":"<p>Passing references to functions is great.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">some_user_type<\/span><span class=\"p\">;<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"nf\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">some_user_type<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">v<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ freely read from v<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Memory management and lifetime handling are done by the caller.\nUsers of your function \/ API have a liberating amount of freedom in how they organize their data: on the stack, on the heap, in smart pointers, in vectors, it doesn\u2019t matter.\nThey can pass a reference to your function.\nNo (potentially expensive) copy is performed.<\/p>\n\n<p>From an API perspective, C++ references are views on a single object.\nWe already have a few \u201cview types\u201d for more complex 
needs:<\/p>\n\n<ul>\n  <li><a href=\"https:\/\/en.cppreference.com\/w\/cpp\/string\/basic_string_view\">std::string_view<\/a> for \u201cviews on strings\u201d (C++17)<\/li>\n  <li><a href=\"https:\/\/en.cppreference.com\/w\/cpp\/container\/span\">std::span<\/a> for \u201cviews on contiguous ranges\u201d (C++20)<\/li>\n  <li><a href=\"http:\/\/open-std.org\/JTC1\/SC22\/WG21\/docs\/papers\/2020\/p0009r10.html\">mdspan<\/a> for a multidimensional version of <code class=\"language-plaintext highlighter-rouge\">std::span<\/code> (proposed)<\/li>\n  <li><a href=\"https:\/\/foonathan.net\/2017\/01\/function-ref-implementation\/\">function_ref<\/a> for \u201cviews on callables\u201d<\/li>\n<\/ul>\n\n<blockquote>\n  <p>These types can quickly lead to dangling references and should mainly be used as function parameters.\nArthur O\u2019Dwyer calls them <a href=\"https:\/\/quuxplusone.github.io\/blog\/2018\/03\/27\/string-view-is-a-borrow-type\">borrow types or parameter-only types<\/a>.\nA more general term is \u201cview type\u201d.\nI personally avoid calling them \u201creference types\u201d as that only ever leads to confusion.<\/p>\n<\/blockquote>\n\n<p>In this blog post I propose and present a new view type that I find quite useful:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">void<\/span> <span class=\"nf\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">range_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">values<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">values<\/span><span class=\"p\">.<\/span><span class=\"n\">for_each<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span 
class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> \n        <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">s<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">});<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ usage examples:<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">list<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*&gt;<\/span> <span class=\"n\">l<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">l<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">span<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">s<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">set<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span 
class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">st<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">st<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">a<\/span><span class=\"p\">[]<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">foo<\/span><span class=\"p\">({<\/span><span class=\"s\">\"hello\"<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"world\"<\/span><span class=\"p\">});<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>A <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> is a non-owning, lightweight view of a range whose element type is <em>convertible<\/em> to <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nWhen it is used as a function parameter, we can pass any object that supports <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/range-for\">range-based for<\/a> and where <code class=\"language-plaintext highlighter-rouge\">T(*std::begin(my_obj))<\/code> is valid.<\/p>\n\n<p>In the previous example, <code class=\"language-plaintext highlighter-rouge\">foo<\/code> is not templated and can be implemented in a source file.\nHowever, callers still have a great deal of freedom in what they pass, making for a delightful API.<\/p>\n\n<p>The best thing: <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> is completely non-allocating and quite fast (though not as fast as a templated implementation can be).<\/p>\n\n<h2 id=\"motivation\">Motivation<\/h2>\n\n<p>Before we start with <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>, I want to motivate the need for view types a 
bit further.\n(If you already drank the view type kool-aid, you can safely skip to the next section.)<\/p>\n\n<p>On a basic level, we use <code class=\"language-plaintext highlighter-rouge\">T&amp;<\/code> for a mutable view on a single object, and, correspondingly, <code class=\"language-plaintext highlighter-rouge\">T const&amp;<\/code> for a readonly view.\n(Don\u2019t confuse this with a view on an immutable object, as the underlying object might get changed through a different alias.)<\/p>\n\n<p>So, why do we even bother with more complex view types?<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">void<\/span> <span class=\"nf\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ do some parsing, maybe?<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Isn\u2019t <code class=\"language-plaintext highlighter-rouge\">std::string const&amp;<\/code> already a view on a string?<\/p>\n\n<p>Yes and no.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">std::string const&amp;<\/code> is a view on a string, but not all views on a string have to be <code class=\"language-plaintext highlighter-rouge\">std::string const&amp;<\/code>.\n<code class=\"language-plaintext highlighter-rouge\">(char const*, size_t)<\/code> is also a perfectly fine view on a string.<\/p>\n\n<p>One could say that <code class=\"language-plaintext highlighter-rouge\">std::string const&amp;<\/code> is \u201ctoo concrete\u201d.\nIt forces the caller to use a specific type to manage their strings.\nEven if the actual call of <code class=\"language-plaintext highlighter-rouge\">foo<\/code> does not allocate, the caller is forced to convert 
their strings into a <code class=\"language-plaintext highlighter-rouge\">std::string<\/code> (either explicitly or implicitly), potentially doing a short-lived heap allocation.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">foo(\"'sup\")<\/code> will compile, but constructs a temporary string.<\/p>\n\n<blockquote>\n  <p>Most standard libraries actually implement <a href=\"https:\/\/akrzemi1.wordpress.com\/2014\/04\/14\/common-optimizations\/\">small<\/a> <a href=\"https:\/\/shaharmike.com\/cpp\/std-string\/\">string<\/a> <a href=\"https:\/\/blogs.msmvps.com\/gdicanio\/2016\/11\/17\/the-small-string-optimization\/\">optimization<\/a>.\nDepending on the actual implementation, strings up to 23 characters might actually not allocate.<\/p>\n<\/blockquote>\n\n<p>Views don\u2019t have to be references to concrete types.\nEspecially if there are multiple choices for representation.\nMost complex view types operate on a slightly higher level of abstraction and wrap enough information to support a larger class of concrete types.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">std::string_view<\/code> is the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/string\/basic_string_view\">standard library\u2019s view on strings<\/a>.<\/p>\n\n<p>From an implementation perspective, it\u2019s just a glorified <code class=\"language-plaintext highlighter-rouge\">std::pair&lt;char const*, size_t&gt;<\/code>.<\/p>\n\n<p>From an API perspective, <code class=\"language-plaintext highlighter-rouge\">std::string_view<\/code> is a view on any contiguous range of <code class=\"language-plaintext highlighter-rouge\">char<\/code>s.\nAs a non-owning type, lifetime is handled by the caller and any string-like type that can provide <code class=\"language-plaintext highlighter-rouge\">char const*<\/code> and size can be passed via <code class=\"language-plaintext highlighter-rouge\">std::string_view<\/code> without allocation.\nThis includes <code 
class=\"language-plaintext highlighter-rouge\">std::string<\/code>, but also C strings, <code class=\"language-plaintext highlighter-rouge\">std::vector&lt;char&gt;<\/code>, <code class=\"language-plaintext highlighter-rouge\">std::array&lt;char, 20&gt;<\/code>, and even subranges of these.<\/p>\n\n<p>With C++20, we get <code class=\"language-plaintext highlighter-rouge\">std::span&lt;T&gt;<\/code>, the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/container\/span\">view on contiguous ranges of <code class=\"language-plaintext highlighter-rouge\">T<\/code><\/a>, or view on array-like types.\n<code class=\"language-plaintext highlighter-rouge\">std::span&lt;T&gt;<\/code> is a really great type.\nToo often I see APIs that take <code class=\"language-plaintext highlighter-rouge\">std::vector&lt;float&gt; const&amp; values<\/code>, because they want to accept a range of <code class=\"language-plaintext highlighter-rouge\">float<\/code>s.\nIf the values live in a local array, the caller has to construct a temporary <code class=\"language-plaintext highlighter-rouge\">vector<\/code> to call the function.\nPassing <code class=\"language-plaintext highlighter-rouge\">{1.5f, 2.5f, 3.5f}<\/code> constructs a temporary <code class=\"language-plaintext highlighter-rouge\">vector<\/code>.<\/p>\n\n<p>Sometimes, there is an additional overload that takes <code class=\"language-plaintext highlighter-rouge\">float const*<\/code> and <code class=\"language-plaintext highlighter-rouge\">size_t<\/code>, either to acknowledge the inappropriateness of <code class=\"language-plaintext highlighter-rouge\">std::vector&lt;float&gt; const&amp;<\/code> or to channel their inner C programmer.\nWhile this does solve the performance issues, it doesn\u2019t make the API easier to use:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">array<\/span><span 
class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"mi\">10<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">get_values<\/span><span class=\"p\">();<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">process_values<\/span><span class=\"p\">(<\/span><span class=\"kt\">float<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">values<\/span><span class=\"p\">,<\/span> <span class=\"kt\">size_t<\/span> <span class=\"n\">size<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ what we have to do:<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">vals<\/span> <span class=\"o\">=<\/span> <span class=\"n\">get_values<\/span><span class=\"p\">();<\/span>\n<span class=\"n\">process_values<\/span><span class=\"p\">(<\/span><span class=\"n\">vals<\/span><span class=\"p\">.<\/span><span class=\"n\">data<\/span><span class=\"p\">(),<\/span> <span class=\"n\">vals<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">());<\/span>\n\n<span class=\"c1\">\/\/ what we would love to do:<\/span>\n<span class=\"n\">process_values<\/span><span class=\"p\">(<\/span><span class=\"n\">get_values<\/span><span class=\"p\">());<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>With <code class=\"language-plaintext highlighter-rouge\">std::span<\/code>, we can!<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">void process_values(std::span&lt;float const&gt; values)<\/code> is exactly the abstraction we want to use here.<\/p>\n\n<h2 id=\"the-need-for-range_reft\">The Need for <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code><\/h2>\n\n<p>So \u2026<\/p>\n\n<p>Why <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>?<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">std::span&lt;T&gt;<\/code> has two big limitations: the objects have to be contiguous in memory and they have to 
match <code class=\"language-plaintext highlighter-rouge\">T<\/code> exactly (apart from <code class=\"language-plaintext highlighter-rouge\">const<\/code> qualification).\nYou can pass neither a <code class=\"language-plaintext highlighter-rouge\">std::set&lt;double&gt;<\/code> (not contiguous) nor a <code class=\"language-plaintext highlighter-rouge\">std::vector&lt;int&gt;<\/code> (wrong element type) to a <code class=\"language-plaintext highlighter-rouge\">std::span&lt;double const&gt;<\/code>.<\/p>\n\n<p>Consider the following function:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"nf\">concatenate<\/span><span class=\"p\">(<\/span><span class=\"o\">???<\/span> <span class=\"n\">strings<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"n\">result<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">s<\/span> <span class=\"o\">:<\/span> <span class=\"n\">strings<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">result<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>What type would you give <code class=\"language-plaintext highlighter-rouge\">strings<\/code>?<\/p>\n\n<p>Well, we can certainly avoid this question by templating the function:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">StringRange<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> 
<span class=\"nf\">concatenate<\/span><span class=\"p\">(<\/span><span class=\"n\">StringRange<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">strings<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"n\">result<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">s<\/span> <span class=\"o\">:<\/span> <span class=\"n\">strings<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">result<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>However, this most likely forces us to implement <code class=\"language-plaintext highlighter-rouge\">concatenate<\/code> in the header and will probably increase compile time.\nFor more complex functions, we might have to include additional headers, leading to increased header dependency.<\/p>\n\n<p>Before C++17 we might have been tempted to pass <code class=\"language-plaintext highlighter-rouge\">std::vector&lt;std::string&gt; const&amp; strings<\/code>.\nAfter C++17\/20, you might consider:<\/p>\n\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">std::vector&lt;std::string_view&gt; const&amp; strings<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">std::span&lt;std::string const&gt; strings<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">std::span&lt;std::string_view const&gt; strings<\/code><\/li>\n<\/ul>\n\n<p>Especially the last one seems great, no? 
A view of views sounds delicious.<\/p>\n\n<p>Well, you cannot pass a <code class=\"language-plaintext highlighter-rouge\">std::vector&lt;std::string&gt;<\/code> to the first or the third.<\/p>\n\n<p>And the second would not accept a <code class=\"language-plaintext highlighter-rouge\">std::array&lt;char const*, 3&gt;<\/code>.<\/p>\n\n<h2 id=\"designing-range_reft\">Designing <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code><\/h2>\n\n<p>You are hopefully convinced by now that a proper type for the <code class=\"language-plaintext highlighter-rouge\">strings<\/code> in <code class=\"language-plaintext highlighter-rouge\">concatenate<\/code> is missing.<\/p>\n\n<p>I call this type <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>. It is a view on:<\/p>\n\n<ul>\n  <li>any type that supports <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/range-for\">range-based for<\/a><\/li>\n  <li>whose elements are <em>convertible<\/em> to <code class=\"language-plaintext highlighter-rouge\">T<\/code><\/li>\n<\/ul>\n\n<p>It should be non-owning, non-allocating, and fast.<\/p>\n\n<p>The idea is to provide a type-erased wrapper of the following templated function:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">Range<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">Callback<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">call_for_each<\/span><span class=\"p\">(<\/span><span class=\"n\">Range<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">range<\/span><span class=\"p\">,<\/span> <span class=\"n\">Callback<\/span><span class=\"o\">&amp;&amp;<\/span> <span 
class=\"n\">callback<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">:<\/span> <span class=\"n\">range<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">callback<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">));<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> type-erases <code class=\"language-plaintext highlighter-rouge\">Range<\/code> in the sense that, while it accepts a generic <code class=\"language-plaintext highlighter-rouge\">Range<\/code> from the caller, neither the <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> itself nor the API author knows the concrete <code class=\"language-plaintext highlighter-rouge\">Range<\/code> at compile time.<\/p>\n\n<p>The <code class=\"language-plaintext highlighter-rouge\">T<\/code> is our element type contract: only elements convertible to <code class=\"language-plaintext highlighter-rouge\">T<\/code> are allowed.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">callback<\/code>, on the other hand, is hidden from the caller and provided by the consumer of a <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>.<\/p>\n\n<p>Let\u2019s assume that we have a <code class=\"language-plaintext highlighter-rouge\">function_ref&lt;ReturnT(ArgsT...)&gt;<\/code> type as described by <a href=\"https:\/\/vittorioromeo.info\/index\/blog\/passing_functions_to_functions.html\">Vittorio Romeo<\/a> or <a href=\"https:\/\/foonathan.net\/2017\/01\/function-ref-implementation\/\">Jonathan M\u00fcller<\/a>.\nThis is basically a function pointer that also allows capturing 
lambdas and other callables (a non-owning view of a callable, implemented roughly via function pointer plus <code class=\"language-plaintext highlighter-rouge\">void*<\/code>).<\/p>\n\n<p>With this we can formulate our first version of <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">range_ref<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">void<\/span> <span class=\"n\">for_each<\/span><span class=\"p\">(<\/span><span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">callback<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"n\">_for_each<\/span><span class=\"p\">(<\/span><span class=\"n\">_range<\/span><span class=\"p\">,<\/span> <span class=\"n\">callback<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">}<\/span>\n\n<span class=\"nl\">private:<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">range_fun_t<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span> <span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)(<\/span><span class=\"kt\">void<\/span><span class=\"o\">*<\/span><span class=\"p\">,<\/span> <span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">);<\/span>\n\n    <span class=\"kt\">void<\/span><span class=\"o\">*<\/span> <span class=\"n\">_range<\/span> <span 
class=\"o\">=<\/span> <span class=\"nb\">nullptr<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">range_fun_t<\/span> <span class=\"n\">_for_each<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">nullptr<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Before we see how to construct this type, we can already look at its consumer-site API:\nThe type itself is only templated on <code class=\"language-plaintext highlighter-rouge\">T<\/code>, so it is agnostic to the actual range type and also to the callback type.<\/p>\n\n<p>We now have a solution for our <code class=\"language-plaintext highlighter-rouge\">concatenate<\/code> example:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"nf\">concatenate<\/span><span class=\"p\">(<\/span><span class=\"n\">range_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">strings<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"n\">result<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">strings<\/span><span class=\"p\">.<\/span><span class=\"n\">for_each<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> \n        <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span> \n    <span class=\"p\">});<\/span>\n    <span class=\"k\">return<\/span> <span 
class=\"n\">result<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<blockquote>\n  <p>Unfortunately, we cannot use range-based for with <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>.\nWhile it is possible to design <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> to support it, it adds quite some overhead and can easily lead to dangling references.\nIn the callback pattern, it is guaranteed that any temporary that we convert to <code class=\"language-plaintext highlighter-rouge\">std::string_view<\/code> outlives the callback function.<\/p>\n<\/blockquote>\n\n<p>However, we are still missing the construction of <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">Range<\/span><span class=\"p\">,<\/span> \n          <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">is_compatible_range<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Range<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span><span class=\"p\">&gt;<\/span><span class=\"o\">::<\/span><span class=\"n\">value<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"n\">range_ref<\/span><span class=\"p\">(<\/span><span class=\"n\">Range<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">range<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">_range<\/span> <span class=\"o\">=<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">range<\/span><span 
class=\"p\">;<\/span>\n    <span class=\"n\">_for_each<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">void<\/span><span class=\"o\">*<\/span> <span class=\"n\">r<\/span><span class=\"p\">,<\/span> <span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">callback<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n        <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">:<\/span> <span class=\"o\">*<\/span><span class=\"k\">reinterpret_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">range<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">r<\/span><span class=\"p\">))<\/span>\n            <span class=\"n\">callback<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">};<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Here, we accept any range that is compatible, i.e. 
whose elements are convertible to <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nWe use <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/sfinae\">SFINAE<\/a> to reject incompatible ranges.\nThis way we can, for example, overload functions on <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;double&gt;<\/code> and <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;std::string_view&gt;<\/code> without any problems.\nNote <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/types\/enable_if#Notes\">the particular form of the <code class=\"language-plaintext highlighter-rouge\">std::enable_if<\/code><\/a>.\nWhile we only have one constructor for now, the <code class=\"language-plaintext highlighter-rouge\">std::enable_if_t&lt;cond, int&gt; = 0<\/code> pattern prevents errors when overloading multiple function templates.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">decltype(&amp;range)<\/code> is used to recover the pointer to the correct range type.\n<code class=\"language-plaintext highlighter-rouge\">Range*<\/code> would not work when an lvalue is passed: <code class=\"language-plaintext highlighter-rouge\">Range<\/code> is then deduced as a reference type, and pointers to references are forbidden.<\/p>\n\n<p>A possible implementation of <code class=\"language-plaintext highlighter-rouge\">is_compatible_range<\/code> is:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">RangeT<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">ElementT<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">is_compatible_range<\/span> <span class=\"o\">:<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span 
class=\"n\">false_type<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">RangeT<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">ElementT<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">is_compatible_range<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RangeT<\/span><span class=\"p\">,<\/span> <span class=\"n\">ElementT<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">void_t<\/span><span class=\"o\">&lt;<\/span>\n        <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">ElementT<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">begin<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">declval<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RangeT<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">()))),<\/span>\n        <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">end<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">declval<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RangeT<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">()))<\/span><span class=\"o\">&gt;<\/span>\n    <span class=\"o\">&gt;<\/span>\n     <span class=\"o\">:<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">true_type<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This uses <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/types\/void_t\">the C++17 helper <code class=\"language-plaintext 
highlighter-rouge\">std::void_t<\/code><\/a> that simplifies partial specialization SFINAE.\n<code class=\"language-plaintext highlighter-rouge\">std::begin<\/code> and <code class=\"language-plaintext highlighter-rouge\">std::end<\/code> make sure that C arrays work.\nWe ensure that the element type is convertible to our target type via <code class=\"language-plaintext highlighter-rouge\">ElementT(*std::begin(...))<\/code>.<\/p>\n\n<p>One small problem arises when passing <code class=\"language-plaintext highlighter-rouge\">const&amp;<\/code> ranges: <code class=\"language-plaintext highlighter-rouge\">_range = &amp;range;<\/code> is invalid because it tries to convert <code class=\"language-plaintext highlighter-rouge\">Range const*<\/code> to <code class=\"language-plaintext highlighter-rouge\">void*<\/code>.\nDepending on your preference for purity, we can either use a <code class=\"language-plaintext highlighter-rouge\">const_cast<\/code> (which is fine as the later <code class=\"language-plaintext highlighter-rouge\">reinterpret_cast<\/code> will re-add the <code class=\"language-plaintext highlighter-rouge\">const<\/code>) or a <code class=\"language-plaintext highlighter-rouge\">union<\/code> of <code class=\"language-plaintext highlighter-rouge\">void*<\/code> and <code class=\"language-plaintext highlighter-rouge\">void const*<\/code>.<\/p>\n\n<p>Congratulations, we have our first working version of <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>!<\/p>\n\n<p>And we gained a lot of freedom when passing types to <code class=\"language-plaintext highlighter-rouge\">concatenate<\/code>.\nA small collection of possible caller types that are supported and do not cause additional allocations:<\/p>\n\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">std::vector&lt;std::string&gt;<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">std::set&lt;std::string_view&gt;<\/code><\/li>\n  <li><code 
class=\"language-plaintext highlighter-rouge\">std::list&lt;char const*&gt;<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">std::array&lt;std::string, 5&gt;<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">char const* strings[10]<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">std::span&lt;std::string&gt;<\/code><\/li>\n<\/ul>\n\n<p>We can even support <code class=\"language-plaintext highlighter-rouge\">concatenate({\"hello\", \" \", \"world\"})<\/code> if we add an <code class=\"language-plaintext highlighter-rouge\">std::initializer_list<\/code> ctor:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">U<\/span><span class=\"p\">,<\/span> \n          <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_convertible_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">U<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span><span class=\"p\">&gt;,<\/span> <span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"n\">range_ref<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">initializer_list<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">U<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">range<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ or via union depending on preference<\/span>\n    <span 
class=\"n\">_range<\/span> <span class=\"o\">=<\/span> <span class=\"k\">const_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"o\">*&gt;<\/span><span class=\"p\">(<\/span><span class=\"k\">static_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span> <span class=\"k\">const<\/span><span class=\"o\">*&gt;<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">range<\/span><span class=\"p\">));<\/span>\n    <span class=\"n\">_for_each<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">void<\/span><span class=\"o\">*<\/span> <span class=\"n\">r<\/span><span class=\"p\">,<\/span> <span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">f<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n        <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">:<\/span> <span class=\"o\">*<\/span><span class=\"k\">static_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">range<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">r<\/span><span class=\"p\">))<\/span>\n            <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">};<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>It is important to note that the initializer list must be passed via <code class=\"language-plaintext highlighter-rouge\">const&amp;<\/code> because <code class=\"language-plaintext highlighter-rouge\">&amp;range<\/code> would 
otherwise be a pointer to a local variable and be invalid when we later call <code class=\"language-plaintext highlighter-rouge\">for_each<\/code>.<\/p>\n\n<p>Finally, it might be interesting to define a default constructed <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> as the empty range:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">range_ref<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">_for_each<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">void<\/span><span class=\"o\">*<\/span><span class=\"p\">,<\/span> <span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">)<\/span> <span class=\"p\">{};<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"composition\">Composition<\/h2>\n\n<p>Another neat aspect of <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> is that it composes properly with respect to nesting:\n<code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> is any range whose elements are convertible to T, <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;range_ref&lt;T&gt;&gt;<\/code> is any range whose elements are ranges whose elements are convertible to <code class=\"language-plaintext highlighter-rouge\">T<\/code>.<\/p>\n\n<p>Note that <code class=\"language-plaintext highlighter-rouge\">std::span<\/code> does not have this property.\n<code class=\"language-plaintext highlighter-rouge\">std::span&lt;int&gt;<\/code> is a contiguous range of <code class=\"language-plaintext highlighter-rouge\">int<\/code>s.\n<code class=\"language-plaintext 
highlighter-rouge\">std::span&lt;std::span&lt;int&gt;&gt;<\/code> is NOT a contiguous range of contiguous ranges of <code class=\"language-plaintext highlighter-rouge\">int<\/code>s, but only a contiguous range of <code class=\"language-plaintext highlighter-rouge\">std::span&lt;int&gt;<\/code>.<\/p>\n\n<p>With <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>, we can for example define the following function:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"nf\">make_html_table<\/span><span class=\"p\">(<\/span><span class=\"n\">range_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">range_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span><span class=\"o\">&gt;&gt;<\/span> <span class=\"n\">rows<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"n\">result<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"s\">\"&lt;table&gt;\"<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">rows<\/span><span class=\"p\">.<\/span><span class=\"n\">for_each<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"n\">range_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">cols<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n        <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"s\">\"&lt;tr&gt;\"<\/span><span class=\"p\">;<\/span>\n        <span class=\"n\">cols<\/span><span 
class=\"p\">.<\/span><span class=\"n\">for_each<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">entry<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n            <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"s\">\"&lt;td&gt;\"<\/span><span class=\"p\">;<\/span>\n            <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">entry<\/span><span class=\"p\">;<\/span>\n            <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"s\">\"&lt;\/td&gt;\"<\/span><span class=\"p\">;<\/span>\n        <span class=\"p\">});<\/span>\n        <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"s\">\"&lt;\/tr&gt;\"<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">});<\/span>\n    <span class=\"n\">result<\/span> <span class=\"o\">+=<\/span> <span class=\"s\">\"&lt;\/table&gt;\"<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">result<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And now it doesn\u2019t matter if we want to pass <code class=\"language-plaintext highlighter-rouge\">std::vector&lt;std::vector&lt;std::string&gt;&gt;<\/code> or <code class=\"language-plaintext highlighter-rouge\">std::list&lt;std::array&lt;char const*, 3&gt;&gt;<\/code>.<\/p>\n\n<p>Unfortunately, while <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> works nicely with nested ranges, it itself does not support range-based for and thus does not compose with itself.\nFor example, a <code class=\"language-plaintext highlighter-rouge\">std::vector&lt;range_ref&lt;int&gt;&gt;<\/code> could not be passed as <code class=\"language-plaintext 
highlighter-rouge\">range_ref&lt;range_ref&lt;int&gt;&gt;<\/code>.\nIt remains future work to fix this shortcoming without compromising other design goals.\nNote that <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> typically appears in parameters and is usually not stored in other data structures.<\/p>\n\n<h2 id=\"final-version\">Final Version<\/h2>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">RangeT<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">ElementT<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">is_compatible_range<\/span> <span class=\"o\">:<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">false_type<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">RangeT<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">ElementT<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">is_compatible_range<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RangeT<\/span><span class=\"p\">,<\/span> <span class=\"n\">ElementT<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">void_t<\/span><span class=\"o\">&lt;<\/span>\n        <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">ElementT<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">begin<\/span><span 
class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">declval<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RangeT<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">()))),<\/span>\n        <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">end<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">declval<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RangeT<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">()))<\/span><span class=\"o\">&gt;<\/span>\n    <span class=\"o\">&gt;<\/span>\n     <span class=\"o\">:<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">true_type<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n\n<span class=\"c1\">\/\/ a non-owning, lightweight view of a range<\/span>\n<span class=\"c1\">\/\/ whose element types are convertible to T<\/span>\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">range_ref<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ iterates over the viewed range and invokes callback for each element<\/span>\n    <span class=\"kt\">void<\/span> <span class=\"n\">for_each<\/span><span class=\"p\">(<\/span><span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">callback<\/span><span class=\"p\">)<\/span> \n    <span class=\"p\">{<\/span> \n        <span class=\"n\">_for_each<\/span><span class=\"p\">(<\/span><span class=\"n\">_range<\/span><span class=\"p\">,<\/span> <span class=\"n\">callback<\/span><span class=\"p\">);<\/span> \n  
  <span class=\"p\">}<\/span>\n\n    <span class=\"c1\">\/\/ empty range<\/span>\n    <span class=\"n\">range_ref<\/span><span class=\"p\">()<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"n\">_for_each<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">void<\/span><span class=\"o\">*<\/span><span class=\"p\">,<\/span> <span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">)<\/span> <span class=\"p\">{};<\/span>\n    <span class=\"p\">}<\/span>\n\n    <span class=\"c1\">\/\/ any compatible range<\/span>\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">Range<\/span><span class=\"p\">,<\/span> \n              <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">is_compatible_range<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Range<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span><span class=\"p\">&gt;<\/span><span class=\"o\">::<\/span><span class=\"n\">value<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"o\">&gt;<\/span>\n    <span class=\"n\">range_ref<\/span><span class=\"p\">(<\/span><span class=\"n\">Range<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">range<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"n\">_range<\/span> <span class=\"o\">=<\/span> <span class=\"k\">const_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"o\">*&gt;<\/span><span class=\"p\">(<\/span><span class=\"k\">static_cast<\/span><span class=\"o\">&lt;<\/span><span 
class=\"kt\">void<\/span> <span class=\"k\">const<\/span><span class=\"o\">*&gt;<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">range<\/span><span class=\"p\">));<\/span>\n        <span class=\"n\">_for_each<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">void<\/span><span class=\"o\">*<\/span> <span class=\"n\">r<\/span><span class=\"p\">,<\/span> <span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">callback<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n            <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">:<\/span> <span class=\"o\">*<\/span><span class=\"k\">reinterpret_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">range<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">r<\/span><span class=\"p\">))<\/span>\n                <span class=\"n\">callback<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">);<\/span>\n        <span class=\"p\">};<\/span>\n    <span class=\"p\">}<\/span>\n\n    <span class=\"c1\">\/\/ {initializer, list, syntax}<\/span>\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">U<\/span><span class=\"p\">,<\/span> \n              <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_convertible_v<\/span><span class=\"o\">&lt;<\/span><span 
class=\"n\">U<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span><span class=\"p\">&gt;,<\/span> <span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"o\">&gt;<\/span>\n    <span class=\"n\">range_ref<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">initializer_list<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">U<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">range<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"n\">_range<\/span> <span class=\"o\">=<\/span> <span class=\"k\">const_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"o\">*&gt;<\/span><span class=\"p\">(<\/span><span class=\"k\">static_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span> <span class=\"k\">const<\/span><span class=\"o\">*&gt;<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">range<\/span><span class=\"p\">));<\/span>\n        <span class=\"n\">_for_each<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">void<\/span><span class=\"o\">*<\/span> <span class=\"n\">r<\/span><span class=\"p\">,<\/span> <span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">f<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n            <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">:<\/span> <span class=\"o\">*<\/span><span 
class=\"k\">static_cast<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"o\">&amp;<\/span><span class=\"n\">range<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">r<\/span><span class=\"p\">))<\/span>\n                <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">);<\/span>\n        <span class=\"p\">};<\/span>\n    <span class=\"p\">}<\/span>\n\n<span class=\"nl\">private:<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">range_fun_t<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span> <span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)(<\/span><span class=\"kt\">void<\/span><span class=\"o\">*<\/span><span class=\"p\">,<\/span> <span class=\"n\">function_ref<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">);<\/span>\n\n    <span class=\"c1\">\/\/ or via union depending on const_cast preference<\/span>\n    <span class=\"kt\">void<\/span><span class=\"o\">*<\/span> <span class=\"n\">_range<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">nullptr<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">range_fun_t<\/span> <span class=\"n\">_for_each<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">nullptr<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Note that you need a <code class=\"language-plaintext highlighter-rouge\">function_ref<\/code>, e.g. 
from <a href=\"https:\/\/foonathan.net\/2017\/01\/function-ref-implementation\/\">here<\/a> or <a href=\"https:\/\/vittorioromeo.info\/index\/blog\/passing_functions_to_functions.html\">here<\/a>.<\/p>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>In this post I proposed a new view type: <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code>.<\/p>\n\n<p>This type accepts any range that has elements that are <em>convertible<\/em> to <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nIt is non-owning, non-allocating, and quite lightweight: only two pointers, same size as <code class=\"language-plaintext highlighter-rouge\">std::string_view<\/code> or <code class=\"language-plaintext highlighter-rouge\">std::span&lt;T&gt;<\/code>.\nPerformance-wise, this solution is slower than a templated function but still quite fast:\nThe range-based for loop itself (increment, dereference, condition) can be inlined (in the <code class=\"language-plaintext highlighter-rouge\">_for_each<\/code> function defined in the <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> ctor).\nThe loop body is behind a <code class=\"language-plaintext highlighter-rouge\">function_ref<\/code>, which translates to a single, perfectly predictable function pointer call.<\/p>\n\n<p>If there is interest, I might do a follow-up post in the future that includes:<\/p>\n\n<ul>\n  <li>benchmarks<\/li>\n  <li>how to model cancellation<\/li>\n  <li>automatic deref and the case for <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T const&amp;&gt;<\/code><\/li>\n  <li>a full reference implementation on github<\/li>\n  <li>exploring a <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> that supports range-based for<\/li>\n<\/ul>\n\n<p>Additional discussion and comments on <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/jhtso1\/range_reft_a_fast_nonowning_view_on_a_range\/\">reddit<\/a>.<\/p>\n\n<h3 
id=\"update-2020-10-26\">Update 2020-10-26:<\/h3>\n\n<p>So, <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;T&gt;<\/code> might not be the best name for this view type.\nWhile it models all kinds of ranges, it is itself not a range, thus creating some confusion.\nThough I don\u2019t have a preferred version yet, alternative names include:<\/p>\n\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">sequence_ref&lt;T&gt;<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">iterable_ref&lt;T&gt;<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">foreachable_ref&lt;T&gt;<\/code><\/li>\n<\/ul>\n\n<p><a href=\"https:\/\/www.boost.org\/doc\/libs\/1_67_0\/libs\/range\/doc\/html\/range\/reference\/ranges\/any_range.html\">Boost has any_range<\/a> which is itself a range and type erases increment, dereference, comparison.\nrange-v3 has a similar <code class=\"language-plaintext highlighter-rouge\">any_view&lt;T&gt;<\/code>.\nHowever, they suffer from the mentioned lifetime issue (apart from being slower because they have to type erase \u201cmore\u201d).\nConsider for example:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">int<\/span> <span class=\"n\">values<\/span><span class=\"p\">[]<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span><span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">r<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">ranges<\/span><span class=\"o\">::<\/span><span class=\"n\">views<\/span><span class=\"o\">::<\/span><span class=\"n\">transform<\/span><span class=\"p\">(<\/span><span class=\"n\">values<\/span><span class=\"p\">,<\/span> <span 
class=\"p\">[](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">to_string<\/span><span class=\"p\">(<\/span><span class=\"n\">i<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">});<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Now, if you try to wrap <code class=\"language-plaintext highlighter-rouge\">r<\/code> into an <code class=\"language-plaintext highlighter-rouge\">any_view&lt;std::string_view&gt;<\/code> and iterate over it, the <code class=\"language-plaintext highlighter-rouge\">string_view<\/code> will bind to a temporary <code class=\"language-plaintext highlighter-rouge\">std::string<\/code>.\nThis string is then destroyed <em>before<\/em> you use it in the loop, leading to a dangling reference.\nIn contrast, my <code class=\"language-plaintext highlighter-rouge\">range_ref&lt;std::string_view&gt;<\/code> would also create a <code class=\"language-plaintext highlighter-rouge\">string_view<\/code> from a temporary <code class=\"language-plaintext highlighter-rouge\">std::string<\/code>.\nHowever, it immediately passes the <code class=\"language-plaintext highlighter-rouge\">string_view<\/code> to the callback function, which can safely use it as the temporary <code class=\"language-plaintext highlighter-rouge\">std::string<\/code> is destroyed at the end of the expression, i.e. 
<em>after<\/em> the callback finished.<\/p>\n\n<p>Currently, I do not know how to provide a view on a range that works with view type elements and is itself a range again.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/nature-animals-butterflies-2769471\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"Static Registration Macros","description":"Central registration of types or functions just doesn't feel very DRY sometimes.","pubDate":"Sat, 17 Oct 2020 04:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/10\/17\/static-registration-macro","guid":"https:\/\/artificial-mind.net\/blog\/2020\/10\/17\/static-registration-macro","content":"<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">TEST<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my test case\"<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">a<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1<\/span> <span class=\"o\">+<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">2<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">CHECK<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span> <span class=\"o\">==<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>You might have seen similar code in a testing framework <a href=\"https:\/\/github.com\/google\/googletest\">of<\/a> <a href=\"https:\/\/github.com\/catchorg\/Catch2\">your<\/a> <a href=\"https:\/\/github.com\/onqtam\/doctest\">choice<\/a>.\nThis example even sports a nifty <a href=\"\/blog\/2020\/09\/19\/destructuring-assertions\">destructuring assertion<\/a> that will print the values of <code class=\"language-plaintext highlighter-rouge\">a<\/code> and <code 
class=\"language-plaintext highlighter-rouge\">b<\/code> should the check fail, though that is not the focus today.<\/p>\n\n<p>One question that arises from time to time for code like this: how is this code even executed when it doesn\u2019t contain a <code class=\"language-plaintext highlighter-rouge\">main<\/code> function and is never referenced elsewhere?\nWorse, you can put that in a source file that has zero overlap with your other translation units, potentially not even a shared header, and it works.<\/p>\n\n<p>So in today\u2019s episode of How It\u2019s Made, we will construct such a <em>static registration macro<\/em> from scratch.\nThis will touch many intermediate C++ concepts that might be trivial to some, but worth repeating to others.<\/p>\n\n<p>The end result is a macro that enables decentralized static registration of functions or types and can be used to reduce code duplication and unnecessary file coupling.\nIt can also reduce errors by keeping definition and registration code close to each other.\nOf course, one drawback is that there is no central location where you can see all registrations.\nThus, as always, it is the responsibility of the developer to decide if this technique represents the local optimum in the particular trade-off space at hand.\nIn my opinion, it is a worthwhile tool to have at one\u2019s disposal.<\/p>\n\n<p>While creating this tool, we will (re)visit the following topics:<\/p>\n\n<ul>\n  <li>how to execute code before main<\/li>\n  <li>what is <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/storage_duration#Storage_duration\">static storage duration<\/a>?<\/li>\n  <li>evading the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/siof\">static initialization order fiasco<\/a> using the <em>construct on first use<\/em> idiom<\/li>\n  <li>using <code class=\"language-plaintext highlighter-rouge\">static<\/code> and <a 
href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/namespace#Unnamed_namespaces\">unnamed namespaces<\/a> to prevent <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/definition\">ODR<\/a> violations and linker goulash<\/li>\n  <li>use <code class=\"language-plaintext highlighter-rouge\">__LINE__<\/code> to support multiple registrations per file<\/li>\n  <li>writing a macro that concatenates two identifiers that might contain further macros<\/li>\n  <li>adding parameters to the registered function<\/li>\n  <li>approximating extensible named arguments<\/li>\n  <li>a minimal boilerplate macro-free version<\/li>\n<\/ul>\n\n<p>This post is deliberately a little meandering, briefly explaining relevant C++ concepts during each step.\nIf you are just interested in the result, you can skip directly to the <a href=\"#final-version\">final version<\/a>.<\/p>\n\n<h2 id=\"most-basic-version\">Most Basic Version<\/h2>\n\n<p>Let\u2019s start with the foundation:\nHow can one write code that is executed without being referenced?<\/p>\n\n<p>The answer is surprisingly simple:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"s\">\"look ma, before main!\"<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"c1\">\/\/ outside of a function:<\/span>\n<span class=\"n\">foo<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Here, <code class=\"language-plaintext 
highlighter-rouge\">f<\/code> is declared at namespace level, e.g. the global namespace, and thus has <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/storage_duration#Storage_duration\">static storage duration<\/a>.\nObjects with static storage duration are allocated (and their constructor is called) <em>before<\/em> the first statement of your <code class=\"language-plaintext highlighter-rouge\">main()<\/code> function.\nConsequently, they are destroyed when the program ends.\n(Initialization in C++ is, of course, a mess, so <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/initialization#Non-local_variables\">some exceptions apply<\/a>)<\/p>\n\n<p>Thus, if we want to register code automatically, before <code class=\"language-plaintext highlighter-rouge\">main<\/code>, we might be tempted to write:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">using<\/span> <span class=\"n\">fun_t<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)();<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">;<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"nf\">my_fun<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ user-code ...<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">.<\/span><span class=\"n\">push_back<\/span><span class=\"p\">(<\/span><span class=\"n\">my_fun<\/span><span class=\"p\">);<\/span> <span 
class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"n\">foo<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"n\">main<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">f<\/span> <span class=\"o\">:<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">f<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<blockquote>\n  <p>Did you know? The <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/main_function\"><code class=\"language-plaintext highlighter-rouge\">main<\/code> function<\/a> in C++ is the only non-void function that has defined behavior when you don\u2019t return: an implicit <code class=\"language-plaintext highlighter-rouge\">return 0<\/code>.<\/p>\n<\/blockquote>\n\n<h2 id=\"initialization-order\">Initialization Order<\/h2>\n\n<p>We now want to \u201cscale up\u201d our previous code and use our technique among multiple files.\nThus, we write:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ A.hh<\/span>\n<span class=\"cp\">#pragma once\n<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">fun_t<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)();<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">fun_t<\/span> <span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">execute_registered_functions<\/span><span class=\"p\">();<\/span>\n\n\n<span class=\"c1\">\/\/ A.cc<\/span>\n<span class=\"cp\">#include<\/span> <span 
class=\"cpf\">&lt;A.hh&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">;<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"nf\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">fun_t<\/span> <span class=\"n\">f<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span> \n    <span class=\"n\">registered_functions<\/span><span class=\"p\">.<\/span><span class=\"n\">push_back<\/span><span class=\"p\">(<\/span><span class=\"n\">f<\/span><span class=\"p\">);<\/span> \n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">execute_registered_functions<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">f<\/span> <span class=\"o\">:<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">f<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n\n\n<span class=\"c1\">\/\/ B.cc<\/span>\n<span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;A.hh&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"kt\">void<\/span> <span class=\"n\">my_fun<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ user-code ...<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"n\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">my_fun<\/span><span class=\"p\">);<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"n\">foo<\/span> 
<span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n\n\n<span class=\"c1\">\/\/ main.cc<\/span>\n<span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;A.hh&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"kt\">int<\/span> <span class=\"n\">main<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">execute_registered_functions<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Congratulations! \nWe are now a victim of the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/siof\">static initialization order fiasco<\/a>.\nObjects with static storage duration are initialized before <code class=\"language-plaintext highlighter-rouge\">main()<\/code>, but their relative order is not specified.\nInside the same translation unit, it is top-to-bottom.\nOutside, it might depend on the order in which the files are passed to the compiler (\u2026 or the current day of the week).<\/p>\n\n<p>In our case, <code class=\"language-plaintext highlighter-rouge\">foo f<\/code> in <code class=\"language-plaintext highlighter-rouge\">B.cc<\/code> might get initialized before <code class=\"language-plaintext highlighter-rouge\">registered_functions<\/code> in <code class=\"language-plaintext highlighter-rouge\">A.cc<\/code>, thus either crash on start (as <code class=\"language-plaintext highlighter-rouge\">registered_functions<\/code> could contain uninitialized memory), ignore the registration (if the default ctor of <code class=\"language-plaintext highlighter-rouge\">registered_functions<\/code> \u201cclears\u201d the vector), or even occasionally work (if a zero-initialized vector is valid and its default ctor does nothing).\nAccessing <code class=\"language-plaintext highlighter-rouge\">registered_functions<\/code> before it\u2019s initialized is undefined behavior but in practice, your code might sometimes work, sometimes not, sometimes crash.<\/p>\n\n<p>An effective 
solution is the so-called <em>construct on first use<\/em> idiom, similar to how singletons are often implemented.<\/p>\n\n<p>My preferred implementation of this idiom uses function-local <code class=\"language-plaintext highlighter-rouge\">static<\/code> variables.\nThese also have static storage duration, but <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/storage_duration#Static_local_variables\">are initialized when the declaration \u201cis executed\u201d<\/a>.\nWhile not relevant in our case, note that function-local <code class=\"language-plaintext highlighter-rouge\">static<\/code> initialization is guaranteed to be thread-safe since C++11.<\/p>\n\n<p>Okay, let\u2019s fix the initialization order problem:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ A.cc<\/span>\n<span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;A.hh&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;&amp;<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">fun_t<\/span> <span class=\"n\">f<\/span><span class=\"p\">)<\/span> \n<span class=\"p\">{<\/span> \n    <span class=\"n\">registered_functions<\/span><span class=\"p\">().<\/span><span 
class=\"n\">push_back<\/span><span class=\"p\">(<\/span><span class=\"n\">f<\/span><span class=\"p\">);<\/span> \n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">execute_registered_functions<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">f<\/span> <span class=\"o\">:<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">())<\/span>\n        <span class=\"n\">f<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Now it doesn\u2019t matter in which order the translation units are initialized: <code class=\"language-plaintext highlighter-rouge\">registered_functions()<\/code> will always construct the <code class=\"language-plaintext highlighter-rouge\">static std::vector&lt;fun_t&gt; v<\/code> on the first call.<\/p>\n\n<h2 id=\"odr-violations\">ODR Violations<\/h2>\n\n<p>While we fixed the initialization order problem, we still violate one of C++\u2019s most (in)famous rules: the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/definition#One_Definition_Rule\">One Definition Rule<\/a>:<\/p>\n\n<blockquote>\n  <p>Only one definition of any variable, function, class type, enumeration type, concept, or template is allowed in any one translation unit (some of these may have multiple declarations, but only one definition is allowed).<\/p>\n<\/blockquote>\n\n<p>The most notable exceptions are functions that are either <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/inline\">explicitly or implicitly defined <code class=\"language-plaintext highlighter-rouge\">inline<\/code><\/a>.\nImplicit <code class=\"language-plaintext highlighter-rouge\">inline<\/code> is surprisingly common: functions that are directly defined inside a class\/struct\/union, <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> functions, 
fully defined templated functions, <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code> static data members.<\/p>\n\n<blockquote>\n  <p>Opinion: I consider <code class=\"language-plaintext highlighter-rouge\">inline<\/code> one of the most confusing keywords for newcomers, though <code class=\"language-plaintext highlighter-rouge\">static<\/code> comes close.\nI keep being surprised how many people (even those that have used C++ for years) still believe <code class=\"language-plaintext highlighter-rouge\">inline<\/code> is for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Inline_expansion\">inlining functions<\/a>.\nWhile it might have been the original intent and <a href=\"https:\/\/blog.tartanllama.xyz\/inline-hints\/\">some modern compilers still take it as an optimization hint<\/a>, it is mostly misleading.\n<code class=\"language-plaintext highlighter-rouge\">inline<\/code> is mainly about telling the linker to shut up about multiple definitions and pinky-promising that all definitions are exactly the same.\nSecondary purpose is to guarantee that function pointers and inline variable addresses are the same across translation units.<\/p>\n<\/blockquote>\n\n<p>So, how did we violate the ODR?<\/p>\n\n<p>Not yet, but we are almost begging for it.\nConsider what happens if we add a new file:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ C.cc<\/span>\n<span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;A.hh&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">my_fun<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ more user-code ...<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span 
class=\"n\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">my_fun<\/span><span class=\"p\">);<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"n\">foo<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><code class=\"language-plaintext highlighter-rouge\">foo<\/code>, <code class=\"language-plaintext highlighter-rouge\">f<\/code>, and <code class=\"language-plaintext highlighter-rouge\">my_fun<\/code> exist in <code class=\"language-plaintext highlighter-rouge\">B.cc<\/code> and <code class=\"language-plaintext highlighter-rouge\">C.cc<\/code> and do different things.\nBut neither \u201csees\u201d the other version, so this will most likely compile.\nODR violations are <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/ndr\">no diagnostic required<\/a> and, in my experience, they are often not diagnosed.\nModern linkers have become better at diagnosing this kind of problem, though not with 100% accuracy.\nIn this case, the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Gold_(linker)\">ld.gold linker<\/a> actually complains:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">\/<\/span><span class=\"n\">usr<\/span><span class=\"o\">\/<\/span><span class=\"n\">bin<\/span><span class=\"o\">\/<\/span><span class=\"n\">ld<\/span><span class=\"p\">.<\/span><span class=\"n\">gold<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"n\">B<\/span><span class=\"p\">.<\/span><span class=\"n\">cc<\/span><span class=\"p\">.<\/span><span class=\"n\">o<\/span><span class=\"o\">:<\/span> <span class=\"n\">multiple<\/span> <span class=\"n\">definition<\/span> <span class=\"n\">of<\/span> <span class=\"err\">'<\/span><span class=\"n\">my_fun<\/span><span class=\"p\">()<\/span><span class=\"err\">'<\/span>\n<span class=\"o\">\/<\/span><span class=\"n\">usr<\/span><span 
class=\"o\">\/<\/span><span class=\"n\">bin<\/span><span class=\"o\">\/<\/span><span class=\"n\">ld<\/span><span class=\"p\">.<\/span><span class=\"n\">gold<\/span><span class=\"o\">:<\/span> <span class=\"n\">C<\/span><span class=\"p\">.<\/span><span class=\"n\">cc<\/span><span class=\"p\">.<\/span><span class=\"n\">o<\/span><span class=\"o\">:<\/span> <span class=\"n\">previous<\/span> <span class=\"n\">definition<\/span> <span class=\"n\">here<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>(Note that the <em>compiler<\/em> is often unable to diagnose this, but the <em>linker<\/em> can.)<\/p>\n\n<p>The problem here is that <code class=\"language-plaintext highlighter-rouge\">f<\/code> and <code class=\"language-plaintext highlighter-rouge\">my_fun<\/code> have <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/storage_duration#external_linkage\">external linkage<\/a>.\nThey are available to other translation units and are exported as symbols that are resolved by the linker.<\/p>\n\n<p>However, we never intended those to be visible to other TUs.\nThey are only a vehicle to implement our static registration.\nEven <code class=\"language-plaintext highlighter-rouge\">my_fun<\/code> should not be visible, we just want to access the function pointer later.<\/p>\n\n<p>The solution is to use <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/storage_duration#internal_linkage\">internal linkage<\/a>.\nNames with internal linkage are \u201clocal\u201d to a translation unit and are neither exported as symbols nor do they conflict with identical names from other translation units.<\/p>\n\n<p>There are two main mechanisms to switch to internal linkage:\n<code class=\"language-plaintext highlighter-rouge\">static<\/code> functions or variables, and <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/namespace#Unnamed_namespaces\">unnamed namespaces<\/a>, also known as anonymous namespaces.<\/p>\n\n<p>Note that <code class=\"language-plaintext 
highlighter-rouge\">foo::foo()<\/code> is defined inside <code class=\"language-plaintext highlighter-rouge\">struct foo<\/code> and thus is implicitly <code class=\"language-plaintext highlighter-rouge\">inline<\/code>.\nHowever, it still has external linkage and will conflict with <code class=\"language-plaintext highlighter-rouge\">foo<\/code>s defined in other TUs.\nWorse, because they are <code class=\"language-plaintext highlighter-rouge\">inline<\/code>, you will not get a <code class=\"language-plaintext highlighter-rouge\">multiple definition<\/code> error but the linker will arbitrarily pick one definition, almost always leading to weird errors.<\/p>\n\n<p>Thus, we arrive at the first, safely usable version:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ A.cc<\/span>\n<span class=\"p\">...<\/span>\n<span class=\"k\">static<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;&amp;<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<span class=\"p\">...<\/span>\n\n\n<span class=\"c1\">\/\/ B.cc \/ C.cc<\/span>\n<span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;A.hh&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"k\">namespace<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"kt\">void<\/span> <span class=\"n\">my_fun<\/span><span class=\"p\">()<\/span>\n<span 
class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ more user-code ...<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"n\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">my_fun<\/span><span class=\"p\">);<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"n\">foo<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"reducing-boilerplate-via-macro\">Reducing Boilerplate via Macro<\/h2>\n\n<p>Macros don\u2019t have the best reputation in C++ as they are <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hygienic_macro\">rather unhygienic<\/a>, lead to <a href=\"https:\/\/gcc.gnu.org\/onlinedocs\/cpp\/Duplication-of-Side-Effects.html\">double expansion errors<\/a>, <a href=\"http:\/\/www.suodenjoki.dk\/us\/archive\/2010\/min-max.htm\">don\u2019t respect namespaces<\/a>, interact poorly with IDE features such as renaming, among others.<\/p>\n\n<p>Still, they are sometimes the local optimum for reducing boilerplate and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Don%27t_repeat_yourself\">DRY<\/a> violations.<\/p>\n\n<p>In this case, I\u2019d argue that a macro is justified to hide the noisy implementation detail:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#define REGISTER(Name)                                   \\\n    static_assert(true, Name \" must be string literal\"); \\\n    static void my_fun();                                \\\n    namespace                                            \\\n    {                                                    \\\n    struct foo                                           \\\n    {                                                    
\\\n        foo() { register_function(Name, my_fun); }       \\\n    } f;                                                 \\\n    }                                                    \\\n    static void my_fun()\n<\/span><\/code><\/pre><\/div><\/div>\n\n<p>which can then be used as:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">REGISTER<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my fun\"<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ user code<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Apart from introducing a macro, I made some slight adjustments.<\/p>\n\n<p>Firstly, we often want to associate a name with whatever we registered, so I assumed that we can now register via <code class=\"language-plaintext highlighter-rouge\">void register_function(char const* name, fun_t f)<\/code>.\nThe <code class=\"language-plaintext highlighter-rouge\">static_assert(true, Name \" must be string literal\");<\/code> is an optional safeguard to guarantee that only string literals are used in the <code class=\"language-plaintext highlighter-rouge\">REGISTER<\/code> macro:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">8<\/span><span class=\"o\">:<\/span><span class=\"mi\">5<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"n\">expected<\/span> <span class=\"n\">string<\/span> <span class=\"n\">literal<\/span> <span class=\"k\">for<\/span> <span class=\"n\">diagnostic<\/span> <span class=\"n\">message<\/span> <span class=\"n\">in<\/span> <span class=\"k\">static_assert<\/span>\n<span class=\"n\">REGISTER<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span 
class=\"p\">)<\/span>\n         <span class=\"o\">^<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Its use is optional, some people prefer to use <code class=\"language-plaintext highlighter-rouge\">register_function(#Name, my_fun);<\/code> to stringify <code class=\"language-plaintext highlighter-rouge\">Name<\/code> or even extend it to <code class=\"language-plaintext highlighter-rouge\">REGISTER(...)<\/code> and <code class=\"language-plaintext highlighter-rouge\">#__VA_ARGS__<\/code>.<\/p>\n\n<p>Secondly, for brevity, we can write <code class=\"language-plaintext highlighter-rouge\">struct foo { \/* ... *\/ } f;<\/code> to define a <code class=\"language-plaintext highlighter-rouge\">struct foo<\/code> and a variable <code class=\"language-plaintext highlighter-rouge\">foo f<\/code> at the same time.<\/p>\n\n<p>And finally, by forward declaring <code class=\"language-plaintext highlighter-rouge\">my_fun<\/code> (which must be visible in <code class=\"language-plaintext highlighter-rouge\">foo::foo()<\/code>) and defining it at the end without <code class=\"language-plaintext highlighter-rouge\">{ \/* ... *\/ }<\/code>, we enable the quite intuitive and readable <code class=\"language-plaintext highlighter-rouge\">REGISTER(\"my fun\") { \/* ... 
*\/<\/code> syntax.<\/p>\n\n<h2 id=\"increasing-macro-hygiene\">Increasing Macro Hygiene<\/h2>\n\n<p>While we have cleaned up the registration code, it still introduces (TU-local) identifiers that can easily conflict with other user code (<code class=\"language-plaintext highlighter-rouge\">my_fun<\/code>, <code class=\"language-plaintext highlighter-rouge\">f<\/code>, <code class=\"language-plaintext highlighter-rouge\">foo<\/code> are not <em>that<\/em> exotic).\nBehind the macro, their definitions are now basically invisible.\nThough unlikely to introduce silent errors, it can still lead to unexpected and arbitrary-seeming compile errors.<\/p>\n\n<p>Worse, we can currently only register a single function per TU.\nTrying to use <code class=\"language-plaintext highlighter-rouge\">REGISTER(\"some name\")<\/code> twice, even with different names, leads to duplicated definitions for <code class=\"language-plaintext highlighter-rouge\">foo<\/code>, <code class=\"language-plaintext highlighter-rouge\">f<\/code>, and <code class=\"language-plaintext highlighter-rouge\">my_fun<\/code>.<\/p>\n\n<p>We start by choosing identifiers that are less likely to conflict.\nWe don\u2019t have to fully uglify our code like the <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/h0flxv\/why_is_std_implementation_so_damn_ugly\/\">standard library has to<\/a>.\nEven if we wanted to, <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/identifiers#In_declarations\">we are technically not allowed to<\/a>.<\/p>\n\n<p>However, this will not solve the multiple registrations per file problem.\nFor that we need identifiers that are unique within a file.\nUsing <code class=\"language-plaintext highlighter-rouge\">a##b<\/code>, we can concatenate identifiers in macros.\nWhile we cannot use <code class=\"language-plaintext highlighter-rouge\">Name<\/code> (a string literal), we can use <code class=\"language-plaintext highlighter-rouge\">__LINE__<\/code>, the current line number, to create our 
unique names.\nThat will allow us to register any number of functions per file, as long as we don\u2019t register two functions on the same line.<\/p>\n\n<p>In our naivety, we write:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#define REGISTER(Name) \\\n  ...\n<\/span>  <span class=\"k\">static<\/span> <span class=\"kt\">void<\/span> <span class=\"n\">my_fun_<\/span><span class=\"err\">##<\/span><span class=\"n\">__LINE__<\/span><span class=\"p\">();<\/span>\n  <span class=\"p\">...<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>\u2026 which <a href=\"https:\/\/godbolt.org\/z\/8W8vch\">does not work<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">19<\/span><span class=\"o\">:<\/span><span class=\"mi\">1<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"n\">redefinition<\/span> <span class=\"n\">of<\/span> <span class=\"err\">'<\/span><span class=\"n\">foo__LINE__<\/span><span class=\"err\">'<\/span>\n<span class=\"n\">REGISTER<\/span><span class=\"p\">(<\/span><span class=\"s\">\"fun b\"<\/span><span class=\"p\">)<\/span>\n<span class=\"o\">^<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">8<\/span><span class=\"o\">:<\/span><span class=\"mi\">12<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span> <span class=\"n\">expanded<\/span> <span class=\"n\">from<\/span> <span class=\"n\">macro<\/span> <span class=\"err\">'<\/span><span class=\"n\">REGISTER<\/span><span class=\"err\">'<\/span>\n    <span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span><span class=\"err\">##<\/span><span class=\"n\">__LINE__<\/span>\n           <span 
class=\"o\">^<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Turns out, <code class=\"language-plaintext highlighter-rouge\">a##b<\/code> does not expand <code class=\"language-plaintext highlighter-rouge\">a<\/code> or <code class=\"language-plaintext highlighter-rouge\">b<\/code> if they are macros themselves.<\/p>\n\n<blockquote>\n  <p>A ## operator between any two successive identifiers in the replacement-list runs parameter replacement on the two identifiers <strong>(which are not macro-expanded first)<\/strong> and then concatenates the result. This operation is called \u201cconcatenation\u201d or \u201ctoken pasting\u201d.<\/p>\n<\/blockquote>\n\n<p>Thus, we need the popular \u201ctwo-step\u201d macro concatenation:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#define CONCAT_IMPL(a, b) a##b\n#define CONCAT(a, b) CONCAT_IMPL(a, b)\n<\/span><\/code><\/pre><\/div><\/div>\n\n<p>The details are <a href=\"https:\/\/gcc.gnu.org\/onlinedocs\/cpp\/Macro-Arguments.html\">somewhat<\/a> <a href=\"https:\/\/gcc.gnu.org\/onlinedocs\/cpp\/Argument-Prescan.html\">arcane<\/a>.\nThe gist is: if you want macros to be expanded before they are concatenated, you need to go \u201cone level deeper\u201d:\ncalling <code class=\"language-plaintext highlighter-rouge\">CONCAT_IMPL(my_fun, __LINE__)<\/code> directly creates <code class=\"language-plaintext highlighter-rouge\">my_fun__LINE__<\/code>, but <code class=\"language-plaintext highlighter-rouge\">CONCAT(my_fun, __LINE__)<\/code> does not apply <code class=\"language-plaintext highlighter-rouge\">##<\/code> to its own arguments, so they first go through the so-called <a href=\"https:\/\/gcc.gnu.org\/onlinedocs\/cpp\/Argument-Prescan.html\">argument prescan<\/a>, which expands them completely. The result is <code class=\"language-plaintext highlighter-rouge\">CONCAT_IMPL(my_fun, 17)<\/code> and finally <code class=\"language-plaintext highlighter-rouge\">my_fun17<\/code> (assuming we call the 
macro on line 17).<\/p>\n\n<p>For our hygienic version, I\u2019ve decided to put all declarations into a <code class=\"language-plaintext highlighter-rouge\">detail::<\/code> namespace and additionally choose long-ish names starting with an underscore (which is allowed as long as it\u2019s not the global namespace and the second character is neither underscore nor capital):<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#define REGISTER(Name)                                                   \\\n    static_assert(true, Name \" must be string literal\");                 \\\n    namespace detail                                                     \\\n    {                                                                    \\\n    <\/span><span class=\"cm\">\/* function we later define *\/<\/span><span class=\"cp\">                                       \\\n    static void CONCAT(_registered_fun_, __LINE__)();                    \\\n                                                                         \\\n    namespace <\/span><span class=\"cm\">\/* ensure internal linkage for struct *\/<\/span><span class=\"cp\">                   \\\n    {                                                                    \\\n    <\/span><span class=\"cm\">\/* helper struct for static registration in ctor *\/<\/span><span class=\"cp\">                  \\\n    struct CONCAT(_register_struct_, __LINE__)                           \\\n    {                                                                    \\\n        CONCAT(_register_struct_, __LINE__)()                            \\\n        { <\/span><span class=\"cm\">\/* called once before main *\/<\/span><span class=\"cp\">                                  \\\n            register_function(Name, CONCAT(_registered_fun_, __LINE__)); \\\n        }                                                                \\\n    } 
CONCAT(_register_struct_instance_, __LINE__);                      \\\n    }                                                                    \\\n    }                                                                    \\\n    <\/span><span class=\"cm\">\/* now actually defined to allow REGISTER(\"name\") { ... } syntax *\/<\/span><span class=\"cp\">  \\\n    void detail::CONCAT(_registered_fun_, __LINE__)()\n<\/span><\/code><\/pre><\/div><\/div>\n\n<p>With this, we can <a href=\"https:\/\/godbolt.org\/z\/cb36d7\">register multiple functions per TU<\/a>.\nNote that <code class=\"language-plaintext highlighter-rouge\">_registered_fun_<\/code> must be static and cannot be inside the unnamed namespace as <a href=\"https:\/\/godbolt.org\/z\/19aKPW\">our out-of-line definition would not work otherwise<\/a>.<\/p>\n\n<p>Finally, it is possible to use <code class=\"language-plaintext highlighter-rouge\">__COUNTER__<\/code> instead of <code class=\"language-plaintext highlighter-rouge\">__LINE__<\/code>; it is supported by the major compilers.\nHowever, simply replacing <code class=\"language-plaintext highlighter-rouge\">__LINE__<\/code> with it does not work: every expansion of <code class=\"language-plaintext highlighter-rouge\">__COUNTER__<\/code> yields a new value, so the different <code class=\"language-plaintext highlighter-rouge\">CONCAT(_registered_fun_, __COUNTER__)<\/code> instances inside our macro would get different names.\nA solution would be to create a helper <code class=\"language-plaintext highlighter-rouge\">REGISTER_IMPL(Name, ID)<\/code> macro and <code class=\"language-plaintext highlighter-rouge\">#define REGISTER(Name) REGISTER_IMPL(Name, __COUNTER__)<\/code>.\nI usually don\u2019t need to register more than one function per line and go with the <code class=\"language-plaintext highlighter-rouge\">__LINE__<\/code> version.<\/p>\n\n<p>In production, it is often useful to add <code class=\"language-plaintext highlighter-rouge\">__LINE__<\/code> and <code class=\"language-plaintext highlighter-rouge\">__FILE__<\/code> to <code 
class=\"language-plaintext highlighter-rouge\">register_function<\/code>.\nFor example, this can be used to output where a test was defined when an assertion failed.<\/p>\n\n<h2 id=\"adding-parameters\">Adding Parameters<\/h2>\n\n<p>We now have a basic, hygienic static registration macro.\nThis is often already enough for the desired purposes (e.g. declaring tests or registering polymorphic types in a deserialization system).<\/p>\n\n<p>However, sometimes the registered functions themselves have parameters.\nFor example, in my own testing library, one can declare fuzz tests via:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">FUZZ_TEST<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my test\"<\/span><span class=\"p\">)(<\/span><span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">rng<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">rng<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">float<\/span> <span class=\"n\">a<\/span> <span class=\"o\">=<\/span> <span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"n\">rng<\/span><span class=\"p\">,<\/span> <span class=\"o\">-<\/span><span class=\"mf\">10.<\/span><span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"mf\">10.<\/span><span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n    <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"n\">rng<\/span><span class=\"p\">,<\/span> <span class=\"o\">-<\/span><span class=\"mf\">10.<\/span><span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"mf\">10.<\/span><span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n    <span class=\"n\">CHECK<\/span><span class=\"p\">((<\/span><span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span><span 
class=\"p\">)<\/span> <span class=\"o\">*<\/span> <span class=\"p\">(<\/span><span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"n\">a<\/span> <span class=\"o\">*<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span> <span class=\"o\">*<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Here, <code class=\"language-plaintext highlighter-rouge\">tg::rng&amp;<\/code> is a pseudorandom number generator provided by the testing library.\nThe fuzz tests should be deterministic relative to the <code class=\"language-plaintext highlighter-rouge\">rng<\/code>, which makes it possible to exactly reproduce failing random tests by providing the same seed again.<\/p>\n\n<p>Of course, parameters can simply be added to the macro, either by hard-coding them in both declaration and definition, or by using <code class=\"language-plaintext highlighter-rouge\">#define REGISTER(Name, ...)<\/code> and placing <code class=\"language-plaintext highlighter-rouge\">__VA_ARGS__<\/code> in both declaration and definition:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ solution 1: (tg::rng&amp; rng) is hard-coded in REGISTER(Name)<\/span>\n<span class=\"n\">REGISTER<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my test\"<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">a<\/span> <span class=\"o\">=<\/span> <span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"n\">rng<\/span><span class=\"p\">,<\/span> <span class=\"o\">-<\/span><span class=\"mf\">10.<\/span><span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"mf\">10.<\/span><span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n    <span 
class=\"n\">CHECK<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">abs<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span><span class=\"p\">)<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">10<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ solution 2: __VA_ARGS__ is used to change declaration and definition<\/span>\n<span class=\"n\">REGISTER<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my test\"<\/span><span class=\"p\">,<\/span> <span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">rng<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">rng<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">a<\/span> <span class=\"o\">=<\/span> <span class=\"n\">uniform<\/span><span class=\"p\">(<\/span><span class=\"n\">rng<\/span><span class=\"p\">,<\/span> <span class=\"o\">-<\/span><span class=\"mf\">10.<\/span><span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"mf\">10.<\/span><span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n    <span class=\"n\">CHECK<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">abs<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span><span class=\"p\">)<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">10<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>I dislike solution 1 because <code class=\"language-plaintext highlighter-rouge\">rng<\/code> becomes invisible at use-site.\nReaders have to know that it\u2019s available and how it\u2019s called.<\/p>\n\n<p>The second solution works fine and modern IDEs often even provide decent support for refactoring (e.g. 
renaming <code class=\"language-plaintext highlighter-rouge\">rng<\/code>), though it is far from guaranteed.<\/p>\n\n<p>This can also be used to support different signatures with the same macro:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">register_function<\/span><span class=\"p\">(<\/span><span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">name<\/span><span class=\"p\">,<\/span> <span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"k\">constexpr<\/span> <span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_invocable_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">F<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">register_int_version<\/span><span class=\"p\">(<\/span><span class=\"n\">name<\/span><span class=\"p\">,<\/span> <span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">else<\/span> <span class=\"k\">if<\/span> <span class=\"k\">constexpr<\/span> <span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_invocable_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">F<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">register_float_version<\/span><span class=\"p\">(<\/span><span class=\"n\">name<\/span><span class=\"p\">,<\/span> <span 
class=\"n\">f<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">else<\/span>\n        <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">always_false<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">F<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"only int and float versions are supported\"<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>(Where <code class=\"language-plaintext highlighter-rouge\">always_false&lt;T&gt;<\/code> is <a href=\"\/blog\/2020\/10\/03\/always-false\">the helper I\u2019ve blogged about before<\/a>.)<\/p>\n\n<p>While the second solution is definitely not bad, I like to use the variadic macro for named options instead (see next section).\nFor the <code class=\"language-plaintext highlighter-rouge\">FUZZ_TEST<\/code> macro, I only hard-coded the <code class=\"language-plaintext highlighter-rouge\">static void CONCAT(_registered_fun_, __LINE__)(tg::rng&amp;);<\/code> declaration, so the user has to provide the parameters for the definition <em>outside<\/em> the macro.\nThis allows choosing a different name for the parameter but NOT a different type, which may or may not be desirable.<\/p>\n\n<p>If the parameters should be fully user-defined (e.g. 
with a templated or overloaded <code class=\"language-plaintext highlighter-rouge\">register_function<\/code>) <strong>and<\/strong> the parameters should be outside the macro invocation, then the only way I could think of uses lambdas and requires a <code class=\"language-plaintext highlighter-rouge\">;<\/code> after the closing <code class=\"language-plaintext highlighter-rouge\">}<\/code>.\nThe idea is:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">namespace<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"n\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">Name<\/span><span class=\"p\">,<\/span> <span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"p\">}<\/span>\n<span class=\"k\">static<\/span> <span class=\"n\">foo<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[]<\/span> <span class=\"cm\">\/* end of macro *\/<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>which can then be used as:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">REGISTER<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my fun\"<\/span><span class=\"p\">)(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">b<\/span><span 
class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ user code<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>(Unfortunately, clang format butchers the formatting.)<\/p>\n\n<blockquote>\n  <p>If you find a way to have user-specified parameters <em>outside<\/em> the macro and <em>without<\/em> a closing <code class=\"language-plaintext highlighter-rouge\">;<\/code>, please let me know!<\/p>\n<\/blockquote>\n\n<h2 id=\"extensible-named-argument-approximation\">Extensible Named-Argument Approximation<\/h2>\n\n<p>Continuing with my test framework example, sometimes we want to disable a test, start it with a specific seed, or always run it after some other test.\nWith a variadic register macro, we can realize the following:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">TEST<\/span><span class=\"p\">(<\/span><span class=\"s\">\"test A\"<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"p\">...<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"n\">FUZZ_TEST<\/span><span class=\"p\">(<\/span><span class=\"s\">\"test B\"<\/span><span class=\"p\">,<\/span> <span class=\"n\">after<\/span><span class=\"p\">(<\/span><span class=\"s\">\"test A\"<\/span><span class=\"p\">),<\/span> <span class=\"n\">seed<\/span><span class=\"p\">(<\/span><span class=\"mi\">123456<\/span><span class=\"p\">))(<\/span><span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">rng<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">rng<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"p\">...<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"n\">TEST<\/span><span class=\"p\">(<\/span><span class=\"s\">\"test C\"<\/span><span class=\"p\">,<\/span> <span class=\"n\">disabled<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"p\">...<\/span> <span 
class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<blockquote>\n  <p>Note: I quite like this syntax, though I accept that it is a bit \u201ctoo much syntactic sugar\u201d for some people.\nIts discoverability is not as high as for normal member functions, but it is low-noise, flexible, and extensible.<\/p>\n<\/blockquote>\n\n<p>Let\u2019s say our test framework has the namespace <code class=\"language-plaintext highlighter-rouge\">tf<\/code>.\nWe define the following:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">namespace<\/span> <span class=\"n\">tf<\/span><span class=\"o\">::<\/span><span class=\"n\">config<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">after<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">explicit<\/span> <span class=\"n\">after<\/span><span class=\"p\">(<\/span><span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">p<\/span><span class=\"p\">)<\/span> <span class=\"o\">:<\/span> <span class=\"n\">pattern<\/span><span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"p\">}<\/span>\n    <span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">pattern<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">static<\/span> <span class=\"k\">constexpr<\/span> <span class=\"k\">struct<\/span> <span class=\"nc\">disabled_t<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"p\">}<\/span> <span class=\"n\">disabled<\/span><span class=\"p\">;<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">seed<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">explicit<\/span> <span class=\"n\">seed<\/span><span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">v<\/span><span 
class=\"p\">)<\/span> <span class=\"o\">:<\/span> <span class=\"n\">value<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"p\">}<\/span>\n    <span class=\"kt\">size_t<\/span> <span class=\"n\">value<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This is our extensible config namespace.\nInstead of a <code class=\"language-plaintext highlighter-rouge\">register_function<\/code>, we have:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">tf<\/span><span class=\"o\">::<\/span><span class=\"n\">Test<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">register_test<\/span><span class=\"p\">(<\/span><span class=\"n\">fun_t<\/span> <span class=\"n\">f<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This allocates a <code class=\"language-plaintext highlighter-rouge\">Test<\/code> object and returns a reference to it; the name is set later, together with the options.\nWe add:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">namespace<\/span> <span class=\"n\">tf<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">void<\/span> <span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">Test<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">test<\/span><span class=\"p\">,<\/span> <span class=\"n\">config<\/span><span class=\"o\">::<\/span><span class=\"n\">after<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">);<\/span>\n    <span class=\"kt\">void<\/span> <span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">Test<\/span><span 
class=\"o\">&amp;<\/span> <span class=\"n\">test<\/span><span class=\"p\">,<\/span> <span class=\"n\">config<\/span><span class=\"o\">::<\/span><span class=\"n\">disabled_t<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">);<\/span>\n    <span class=\"kt\">void<\/span> <span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">Test<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">test<\/span><span class=\"p\">,<\/span> <span class=\"n\">config<\/span><span class=\"o\">::<\/span><span class=\"n\">seed<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">);<\/span>\n\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span><span class=\"o\">...<\/span> <span class=\"nc\">Args<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"kt\">void<\/span> <span class=\"n\">do_configure<\/span><span class=\"p\">(<\/span><span class=\"n\">Test<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">test<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">name<\/span><span class=\"p\">,<\/span> <span class=\"n\">Args<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">...<\/span> <span class=\"n\">args<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"n\">test<\/span><span class=\"p\">.<\/span><span class=\"n\">setName<\/span><span class=\"p\">(<\/span><span class=\"n\">name<\/span><span class=\"p\">);<\/span>\n        <span class=\"p\">(<\/span><span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">test<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">forward<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Args<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span 
class=\"n\">args<\/span><span class=\"p\">)),<\/span> <span class=\"p\">...);<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Where each <code class=\"language-plaintext highlighter-rouge\">configure<\/code> sets \/ changes appropriate members in <code class=\"language-plaintext highlighter-rouge\">Test<\/code>.\n<code class=\"language-plaintext highlighter-rouge\">do_configure<\/code> is a variadic helper that sets the test name and calls <code class=\"language-plaintext highlighter-rouge\">configure<\/code> for each (perfectly forwarded) argument.\nThe final piece is our variadic macro <code class=\"language-plaintext highlighter-rouge\">REGISTER_TEST(...)<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ previous ctor:<\/span>\n<span class=\"n\">CONCAT<\/span><span class=\"p\">(<\/span><span class=\"n\">_register_struct_<\/span><span class=\"p\">,<\/span> <span class=\"n\">__LINE__<\/span><span class=\"p\">)()<\/span>\n<span class=\"p\">{<\/span> <span class=\"cm\">\/* called once before main *\/<\/span>\n    <span class=\"n\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">Name<\/span><span class=\"p\">,<\/span> <span class=\"n\">CONCAT<\/span><span class=\"p\">(<\/span><span class=\"n\">_registered_fun_<\/span><span class=\"p\">,<\/span> <span class=\"n\">__LINE__<\/span><span class=\"p\">));<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ new ctor:<\/span>\n<span class=\"n\">CONCAT<\/span><span class=\"p\">(<\/span><span class=\"n\">_register_struct_<\/span><span class=\"p\">,<\/span> <span class=\"n\">__LINE__<\/span><span class=\"p\">)()<\/span>\n<span class=\"p\">{<\/span> <span class=\"cm\">\/* called once before main *\/<\/span>\n    <span class=\"k\">auto<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">test<\/span> <span class=\"o\">=<\/span> <span 
class=\"n\">tf<\/span><span class=\"o\">::<\/span><span class=\"n\">register_test<\/span><span class=\"p\">(<\/span><span class=\"n\">CONCAT<\/span><span class=\"p\">(<\/span><span class=\"n\">_registered_fun_<\/span><span class=\"p\">,<\/span> <span class=\"n\">__LINE__<\/span><span class=\"p\">));<\/span>\n\n    <span class=\"k\">using<\/span> <span class=\"k\">namespace<\/span> <span class=\"n\">tf<\/span><span class=\"o\">::<\/span><span class=\"n\">config<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">tf<\/span><span class=\"o\">::<\/span><span class=\"n\">do_configure<\/span><span class=\"p\">(<\/span><span class=\"n\">test<\/span><span class=\"p\">,<\/span> <span class=\"n\">__VA_ARGS__<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Here we can see why <code class=\"language-plaintext highlighter-rouge\">do_configure<\/code> also sets the name: if the variadic part did not include the name (e.g. <code class=\"language-plaintext highlighter-rouge\">#define REGISTER_TEST(Name, ...)<\/code>), then the call to <code class=\"language-plaintext highlighter-rouge\">do_configure<\/code> would not compile when no options are passed: <code class=\"language-plaintext highlighter-rouge\">tf::do_configure(test, );<\/code> is not valid C++.\nThere are compiler extensions (supported by GCC and Clang, for example) that make <code class=\"language-plaintext highlighter-rouge\">tf::do_configure(test,##__VA_ARGS__);<\/code> behave as desired, and in C++20 one can use <code class=\"language-plaintext highlighter-rouge\">__VA_OPT__(,)<\/code>.<\/p>\n\n<p>With <code class=\"language-plaintext highlighter-rouge\">using namespace tf::config<\/code> we can directly pass <code class=\"language-plaintext highlighter-rouge\">disabled<\/code> instead of <code class=\"language-plaintext highlighter-rouge\">tf::config::disabled<\/code>, while <code class=\"language-plaintext highlighter-rouge\">tf::do_configure(test, __VA_ARGS__)<\/code> applies all options to the test (and 
sets the name).<\/p>\n\n<p>This mechanism is extensible because the <code class=\"language-plaintext highlighter-rouge\">tf::config<\/code> namespace is a customization point.\nNew options can be added, even by end-users of this library.\nThey are found by straightforward overload resolution.\nIn its current version, new options can also be added in custom namespaces as long as an appropriate <code class=\"language-plaintext highlighter-rouge\">configure<\/code> can be found via <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/adl\">ADL<\/a>.<\/p>\n\n<h2 id=\"macro-free-version\">Macro-Free Version<\/h2>\n\n<p>Finally, inspired by the \u201clambda trick\u201d of a <a href=\"#adding-parameters\">previous section<\/a>, we can also create a macro-free version that has very little boilerplate.\nUsing a templated constructor, we only need to define a single variable for registration.<\/p>\n\n<p>We have a common test framework header:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">namespace<\/span> <span class=\"n\">tf<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">test<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">name<\/span><span class=\"p\">,<\/span> <span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"c1\">\/\/ registration code as before ...<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>and then for each 
test in some source file:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">static<\/span> <span class=\"k\">auto<\/span> <span class=\"n\">my_test<\/span> <span class=\"o\">=<\/span> <span class=\"n\">tf<\/span><span class=\"o\">::<\/span><span class=\"n\">test<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my test\"<\/span><span class=\"p\">,<\/span> <span class=\"p\">[]<\/span> <span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ test code<\/span>\n<span class=\"p\">});<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Similar to a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/thread\/lock_guard\">std::lock_guard<\/a>, we need to assign a variable name, even if it is never actually used.\nApart from that, this method has surprisingly little \u201cnoise\u201d (for a non-macro approach).<\/p>\n\n<p>In C++20 we can even add back line and file information using <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/utility\/source_location\">std::source_location<\/a>.<\/p>\n\n<h2 id=\"final-version\">Final Version<\/h2>\n\n<p>This version of our \u201cfunction framework\u201d (namespace <code class=\"language-plaintext highlighter-rouge\">ff<\/code>) summarizes everything in this post except the macro-free version.\nFor demonstration purposes, I assume that the functions to register have a signature of <code class=\"language-plaintext highlighter-rouge\">(int, float)<\/code> and can be configured with <code class=\"language-plaintext highlighter-rouge\">some_flag<\/code> or <code class=\"language-plaintext highlighter-rouge\">some_cnt(n)<\/code>.\nWe start with the shared framework header that must be included whenever a function should be registered:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">namespace<\/span> <span class=\"n\">ff<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"c1\">\/\/\/ function 
signature that we want to register<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">fun_t<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span> <span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)(<\/span><span class=\"kt\">int<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/\/ wrapper class for a \"configured function\"<\/span>\n<span class=\"k\">class<\/span> <span class=\"nc\">RegisteredFunction<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"nl\">public:<\/span>\n    <span class=\"n\">RegisteredFunction<\/span><span class=\"p\">(<\/span><span class=\"n\">fun_t<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">line<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">file<\/span><span class=\"p\">);<\/span>\n\n    <span class=\"kt\">void<\/span> <span class=\"n\">setName<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">name<\/span><span class=\"p\">);<\/span>\n\n<span class=\"nl\">private:<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">_name<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">fun_t<\/span> <span class=\"n\">_fun<\/span><span class=\"p\">;<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">_line<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">_file<\/span><span class=\"p\">;<\/span>\n    <span class=\"c1\">\/\/ ... 
more options<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"c1\">\/\/\/ registers a new function<\/span>\n<span class=\"c1\">\/\/\/ returns a reference to the wrapper class so that we can configure it later<\/span>\n<span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">fun_t<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">line<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">file<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/\/ \"config\" namespace that contains our named argument approximation<\/span>\n<span class=\"k\">namespace<\/span> <span class=\"n\">config<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"k\">static<\/span> <span class=\"k\">constexpr<\/span> <span class=\"k\">struct<\/span> <span class=\"nc\">some_flag_t<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"p\">}<\/span> <span class=\"n\">some_flag<\/span><span class=\"p\">;<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">some_cnt<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">explicit<\/span> <span class=\"n\">some_cnt<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"o\">:<\/span> <span class=\"n\">value<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"p\">{}<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">value<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">f<\/span><span 
class=\"p\">,<\/span> <span class=\"n\">config<\/span><span class=\"o\">::<\/span><span class=\"n\">some_flag_t<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"n\">config<\/span><span class=\"o\">::<\/span><span class=\"n\">some_cnt<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">cnt<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/\/ the variadic do_configure sets test name and dispatches options to their configure(f, option)<\/span>\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span><span class=\"o\">...<\/span> <span class=\"nc\">Args<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"n\">do_configure<\/span><span class=\"p\">(<\/span><span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">name<\/span><span class=\"p\">,<\/span> <span class=\"n\">Args<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">...<\/span> <span class=\"n\">args<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">f<\/span><span class=\"p\">.<\/span><span class=\"n\">setName<\/span><span class=\"p\">(<\/span><span class=\"n\">name<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">(<\/span><span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">forward<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">Args<\/span><span class=\"o\">&gt;<\/span><span 
class=\"p\">(<\/span><span class=\"n\">args<\/span><span class=\"p\">)),<\/span> <span class=\"p\">...);<\/span>\n<span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/\/ two-step macro concatenation to make CONCAT(a, __LINE__) work<\/span>\n<span class=\"cp\">#define CONCAT_IMPL(a, b) a##b\n#define CONCAT(a, b) CONCAT_IMPL(a, b)\n<\/span>\n<span class=\"c1\">\/\/\/ the main registration macro that registers the function on program startup<\/span>\n<span class=\"cp\">#define REGISTER_FUN(...)                                                                            \\\n    namespace detail                                                                                 \\\n    {                                                                                                \\\n    <\/span><span class=\"cm\">\/* function we later define *\/<\/span><span class=\"cp\">                                                                   \\\n    static void CONCAT(_registered_fun_, __LINE__)(int, float);                                      \\\n                                                                                                     \\\n    namespace <\/span><span class=\"cm\">\/* ensure internal linkage for struct *\/<\/span><span class=\"cp\">                                               \\\n    {                                                                                                \\\n    <\/span><span class=\"cm\">\/* helper struct for static registration in ctor *\/<\/span><span class=\"cp\">                                              \\\n    struct CONCAT(_register_struct_, __LINE__)                                                       \\\n    {                                                                                                \\\n        CONCAT(_register_struct_, __LINE__)()                                                        \\\n        { <\/span><span class=\"cm\">\/* called once before main 
*\/<\/span><span class=\"cp\">                                                              \\\n            auto&amp; f = ff::register_function(CONCAT(_registered_fun_, __LINE__), __LINE__, __FILE__); \\\n                                                                                                     \\\n            using namespace ff::config;                                                              \\\n            ff::do_configure(f, __VA_ARGS__);                                                        \\\n        }                                                                                            \\\n    } CONCAT(_register_struct_instance_, __LINE__);                                                  \\\n    }                                                                                                \\\n    }                                                                                                \\\n    <\/span><span class=\"cm\">\/* now actually defined to allow REGISTER(\"name\") { ... 
} syntax *\/<\/span><span class=\"cp\">                              \\\n    void detail::CONCAT(_registered_fun_, __LINE__)\n<\/span><\/code><\/pre><\/div><\/div>\n\n<p>Then we have a single translation unit that manages the registered functions:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">namespace<\/span> <span class=\"n\">ff<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"c1\">\/\/\/ static vector of all registered functions<\/span>\n<span class=\"c1\">\/\/\/ (while avoiding static initialization order fiasco)<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">unique_ptr<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&gt;&gt;&amp;<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">unique_ptr<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&gt;&gt;<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">register_function<\/span><span class=\"p\">(<\/span><span class=\"n\">fun_t<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">line<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span 
class=\"n\">string_view<\/span> <span class=\"n\">file<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">funs<\/span> <span class=\"o\">=<\/span> <span class=\"n\">registered_functions<\/span><span class=\"p\">();<\/span>\n    <span class=\"n\">funs<\/span><span class=\"p\">.<\/span><span class=\"n\">push_back<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">make_unique<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"n\">line<\/span><span class=\"p\">,<\/span> <span class=\"n\">file<\/span><span class=\"p\">));<\/span>\n    <span class=\"k\">return<\/span> <span class=\"o\">*<\/span><span class=\"n\">funs<\/span><span class=\"p\">.<\/span><span class=\"n\">back<\/span><span class=\"p\">();<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"n\">config<\/span><span class=\"o\">::<\/span><span class=\"n\">some_flag_t<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ set up f ...<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">configure<\/span><span class=\"p\">(<\/span><span class=\"n\">RegisteredFunction<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"n\">config<\/span><span class=\"o\">::<\/span><span class=\"n\">some_cnt<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">cnt<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span 
class=\"c1\">\/\/ set up f ...<\/span>\n<span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And finally, new functions can be registered in any file:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">REGISTER_FUN<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my fun\"<\/span><span class=\"p\">)(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ user code<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"n\">REGISTER_FUN<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my fun 2\"<\/span><span class=\"p\">,<\/span> <span class=\"n\">some_flag<\/span><span class=\"p\">,<\/span> <span class=\"n\">some_cnt<\/span><span class=\"p\">(<\/span><span class=\"mi\">17<\/span><span class=\"p\">))(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">y<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ user code<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>This post got a bit longer than intended, but I hope it can still provide value to a broad audience.\nThe initial motivation was to create our own \u201ctest\/function registration macros\u201d like popular test frameworks do:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">TEST<\/span><span class=\"p\">(<\/span><span class=\"s\">\"my test case\"<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">a<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1<\/span> <span 
class=\"o\">+<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">2<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">CHECK<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span> <span class=\"o\">==<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The main ingredient was constructors of namespace-level objects.\nWe then started a journey of avoiding the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/siof\">static initialization order fiasco<\/a>, protecting against accidental <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/definition\">ODR violations<\/a>, and concatenating identifiers that are themselves macros.<\/p>\n\n<p>I would say our original goal was accomplished after we finished the <a href=\"#increasing-macro-hygiene\">hygienic macro version<\/a>.\nIf that version satisfies your needs, go for it.<\/p>\n\n<p>The next sections are basically stretch goals: how to handle parametric functions\/tests and how to approximate named arguments.\nWe even have a macro-free version that shows that, while more convenient, macros are not strictly required to have low-boilerplate function or test registration.<\/p>\n\n<p>Finally, not all \u201cfunction registration scenarios\u201d should be handled using one of the presented techniques.\nThis post describes <em>decentralized<\/em> systems for registration, which makes it easy and convenient to add new entries, i.e. 
something desirable for test frameworks.\nHowever, sometimes a central place where all functions are registered is more appropriate.<\/p>\n\n<p>As always: know your problem and choose the best tool.<\/p>\n\n<p>Additional discussion and comments on <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/jdgxcl\/static_registration_macros_eg_test_casefoo\/\">reddit<\/a>.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/old-cash-register-finance-money-1645813\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"Overloading by Return Type in C++","description":"Everyone overloads by argument types. But can you do return type?","pubDate":"Sat, 10 Oct 2020 04:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/10\/10\/return-type-overloading","guid":"https:\/\/artificial-mind.net\/blog\/2020\/10\/10\/return-type-overloading","content":"<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ this is OK<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"nf\">to_string<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">i<\/span><span class=\"p\">);<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"nf\">to_string<\/span><span class=\"p\">(<\/span><span class=\"kt\">bool<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"n\">si<\/span> <span class=\"o\">=<\/span> <span class=\"n\">to_string<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">);<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span> <span class=\"n\">sb<\/span> <span class=\"o\">=<\/span> <span class=\"n\">to_string<\/span><span class=\"p\">(<\/span><span 
class=\"nb\">true<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ this is not OK<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"nf\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"7\"<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"false\"<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Overloading by argument types is a pretty straightforward feature of many imperative languages.\nHowever, most of them don\u2019t support overloading by return types.\nIn particular, C++ does not.\nFor example, <a href=\"https:\/\/godbolt.org\/z\/ddWcE8\">clang complains<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">4<\/span><span class=\"o\">:<\/span><span class=\"mi\">6<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"n\">functions<\/span> <span class=\"n\">that<\/span> <span class=\"n\">differ<\/span> <span class=\"n\">only<\/span> <span class=\"n\">in<\/span> <span class=\"n\">their<\/span> <span class=\"k\">return<\/span> <span class=\"n\">type<\/span> <span 
class=\"n\">cannot<\/span> <span class=\"n\">be<\/span> <span class=\"n\">overloaded<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"nf\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"o\">~~~~<\/span> <span class=\"o\">^<\/span>\n\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">3<\/span><span class=\"o\">:<\/span><span class=\"mi\">5<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span> <span class=\"n\">previous<\/span> <span class=\"n\">declaration<\/span> <span class=\"n\">is<\/span> <span class=\"n\">here<\/span>\n<span class=\"kt\">int<\/span> <span class=\"nf\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"o\">~~~<\/span> <span class=\"o\">^<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>So\u2026 what if I told you we actually <em>can<\/em> overload by return type in C++?<\/p>\n\n<p>\u2026<\/p>\n\n<p>By a slight misuse of user-defined conversion operators.<\/p>\n\n<h2 id=\"a-proof-of-concept\">A Proof-of-Concept<\/h2>\n\n<p><a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/cast_operator\">Conversion operators can be user-defined<\/a> in C++.\nThey allow us to add custom implicit or explicit conversion to our types.\nThese conversions themselves can also be overloaded, which leads us to a simple prototype:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">to_string_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span 
class=\"n\">s<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">operator<\/span> <span class=\"kt\">int<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"p\">;<\/span>  <span class=\"c1\">\/\/ int  from_string(std::string_view s);<\/span>\n    <span class=\"k\">operator<\/span> <span class=\"kt\">bool<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ bool from_string(std::string_view s);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">to_string_t<\/span><span class=\"p\">{<\/span><span class=\"s\">\"7\"<\/span><span class=\"p\">};<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"n\">to_string_t<\/span><span class=\"p\">{<\/span><span class=\"s\">\"true\"<\/span><span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><a href=\"https:\/\/godbolt.org\/z\/f3chze\">Looking at godbolt<\/a>, this compiles and calls the desired conversion operators.\nAn important point to note here is that the compiler needs to know the <em>target<\/em> type for the conversion.\nThus, <code class=\"language-plaintext highlighter-rouge\">auto i = to_string_t{\"7\"};<\/code> does not work as intended.\n<code class=\"language-plaintext highlighter-rouge\">i<\/code> will be of type <code class=\"language-plaintext highlighter-rouge\">to_string_t<\/code> and not <code class=\"language-plaintext highlighter-rouge\">int<\/code>.<\/p>\n\n<h2 id=\"packaging-and-usage\">Packaging and Usage<\/h2>\n\n<p>We can achieve the original goal of an overloaded function by simply returning <code class=\"language-plaintext highlighter-rouge\">to_string_t<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">to_string_t<\/span> <span 
class=\"nf\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">to_string_t<\/span><span class=\"p\">{<\/span><span class=\"n\">s<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"kt\">int<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"7\"<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"true\"<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Alternatively, one can adhere to <a href=\"https:\/\/herbsutter.com\/2013\/08\/12\/gotw-94-solution-aaa-style-almost-always-auto\/\">almost always auto<\/a> and write:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">int<\/span><span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"7\"<\/span><span class=\"p\">));<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">bool<\/span><span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"true\"<\/span><span class=\"p\">));<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This technique also works when calling other functions:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">void<\/span> <span class=\"nf\">foo<\/span><span 
class=\"p\">(<\/span><span class=\"kt\">bool<\/span> <span class=\"n\">b<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">i<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"false\"<\/span><span class=\"p\">),<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"0\"<\/span><span class=\"p\">));<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And even interacts properly with more complex, potentially templated objects:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">vec<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">map<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"p\">,<\/span> <span class=\"kt\">bool<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">map<\/span><span class=\"p\">;<\/span>\n\n<span class=\"n\">vec<\/span><span class=\"p\">.<\/span><span class=\"n\">push_back<\/span><span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"11\"<\/span><span class=\"p\">));<\/span>\n<span class=\"n\">map<\/span><span class=\"p\">[<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"3\"<\/span><span class=\"p\">)]<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"true\"<\/span><span class=\"p\">);<\/span>\n<span class=\"n\">vec<\/span><span class=\"p\">.<\/span><span class=\"n\">emplace_back<\/span><span class=\"p\">(<\/span><span 
class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"5\"<\/span><span class=\"p\">));<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Note how <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/container\/vector\/emplace_back\"><code class=\"language-plaintext highlighter-rouge\">emplace_back<\/code> is templated<\/a> and internally constructs an <code class=\"language-plaintext highlighter-rouge\">int<\/code> from our <code class=\"language-plaintext highlighter-rouge\">to_string_t<\/code>.<\/p>\n\n<p>Finally, <code class=\"language-plaintext highlighter-rouge\">if (cond)<\/code> tries to convert <code class=\"language-plaintext highlighter-rouge\">cond<\/code> to <code class=\"language-plaintext highlighter-rouge\">bool<\/code> and thus also works:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"true\"<\/span><span class=\"p\">))<\/span>\n    <span class=\"n\">on_true<\/span><span class=\"p\">();<\/span>\n<span class=\"k\">else<\/span>\n    <span class=\"nf\">on_false<\/span><span class=\"p\">();<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>These examples can be seen in action <a href=\"https:\/\/godbolt.org\/z\/1x5de4\">at godbolt<\/a>.<\/p>\n\n<h2 id=\"where-it-doesnt-work\">Where It Doesn\u2019t Work<\/h2>\n\n<p>While we can achieve overloading by return type in many cases using the conversion operator technique, it doesn\u2019t always apply.\nAs previously mentioned, the compiler needs to know the target type to choose the proper conversion operator and will not convert unless forced to.\nWe already saw the simplest case where no conversion is applied:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span 
class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"7\"<\/span><span class=\"p\">);<\/span>\n<span class=\"c1\">\/\/ decltype(i) is to_string_t, not int<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Another problematic case arises when the technique is combined with normal overloading, which leads to ambiguities:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">void<\/span> <span class=\"nf\">bar<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">bar<\/span><span class=\"p\">(<\/span><span class=\"kt\">bool<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">bar<\/span><span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"true\"<\/span><span class=\"p\">));<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Which results in:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nl\">error:<\/span> <span class=\"n\">call<\/span> <span class=\"n\">to<\/span> <span class=\"err\">'<\/span><span class=\"n\">bar<\/span><span class=\"err\">'<\/span> <span class=\"n\">is<\/span> <span class=\"n\">ambiguous<\/span>\n<span class=\"nf\">bar<\/span><span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"true\"<\/span><span class=\"p\">));<\/span>\n<span class=\"o\">^~~<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">41<\/span><span class=\"o\">:<\/span><span class=\"mi\">6<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span> <span class=\"n\">candidate<\/span> <span class=\"n\">function<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">bar<\/span><span class=\"p\">(<\/span><span 
class=\"kt\">int<\/span><span class=\"p\">);<\/span>\n     <span class=\"o\">^<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">42<\/span><span class=\"o\">:<\/span><span class=\"mi\">6<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span> <span class=\"n\">candidate<\/span> <span class=\"n\">function<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">bar<\/span><span class=\"p\">(<\/span><span class=\"kt\">bool<\/span><span class=\"p\">);<\/span>\n     <span class=\"o\">^<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Similarly, this means that <code class=\"language-plaintext highlighter-rouge\">std::cout &lt;&lt; from_string(\"2\") &lt;&lt; std::endl;<\/code> does not work.\n(The error message for that is slightly ghastly as we have at least 16 candidate overloads.)<\/p>\n\n<p>Finally, only one user-defined conversion can be applied implicitly, so the following <a href=\"https:\/\/godbolt.org\/z\/Gxq1rb\">doesn\u2019t work<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">bar<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"n\">bar<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">i<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"kt\">void<\/span> <span class=\"n\">test_bar<\/span><span class=\"p\">(<\/span><span class=\"n\">bar<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">test_bar<\/span><span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"3\"<\/span><span class=\"p\">));<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The compiler only tries to directly convert <code class=\"language-plaintext highlighter-rouge\">to_string_t<\/code> to <code 
class=\"language-plaintext highlighter-rouge\">bar<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">22<\/span><span class=\"o\">:<\/span><span class=\"mi\">5<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"n\">no<\/span> <span class=\"n\">matching<\/span> <span class=\"n\">function<\/span> <span class=\"k\">for<\/span> <span class=\"n\">call<\/span> <span class=\"n\">to<\/span> <span class=\"err\">'<\/span><span class=\"n\">test_bar<\/span><span class=\"err\">'<\/span>\n    <span class=\"n\">test_bar<\/span><span class=\"p\">(<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"3\"<\/span><span class=\"p\">));<\/span>\n    <span class=\"o\">^~~~~~~~<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">18<\/span><span class=\"o\">:<\/span><span class=\"mi\">6<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span> <span class=\"n\">candidate<\/span> <span class=\"n\">function<\/span> <span class=\"n\">not<\/span> <span class=\"n\">viable<\/span><span class=\"o\">:<\/span> <span class=\"n\">no<\/span> <span class=\"n\">known<\/span> <span class=\"n\">conversion<\/span> <span class=\"n\">from<\/span> <span class=\"err\">'<\/span><span class=\"n\">to_string_t<\/span><span class=\"err\">'<\/span> <span class=\"n\">to<\/span> <span class=\"err\">'<\/span><span class=\"n\">bar<\/span><span class=\"err\">'<\/span> <span class=\"k\">for<\/span> <span class=\"mi\">1<\/span><span class=\"n\">st<\/span> <span class=\"n\">argument<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">test_bar<\/span><span class=\"p\">(<\/span><span class=\"n\">bar<\/span> <span class=\"n\">b<\/span><span 
class=\"p\">);<\/span>\n     <span class=\"o\">^<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>All these cases can be resolved by explicitly adding a cast to the desired type, e.g. <code class=\"language-plaintext highlighter-rouge\">int(from_string(\"10\"))<\/code>.<\/p>\n\n<h2 id=\"extensibility\">Extensibility<\/h2>\n\n<p>One important aspect of normal function overloading in C++ is the extensibility of the overload set.\nIndependent authors and libraries can add to the same overload set simply by providing a function with the proper name in the same namespace.\nWe might also add to the overload set via <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/adl\">argument-dependent lookup<\/a>, though whether this should be considered a feature or a bug is slightly controversial.<\/p>\n\n<p>In its base form, our conversion operator approach is not extensible.\nUser-defined conversion functions must be member functions and we cannot add members to other classes post-hoc.\nThus, if our library defines<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">to_string_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">operator<\/span> <span class=\"kt\">int<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"p\">;<\/span>  <span class=\"c1\">\/\/ int  from_string(std::string_view s);<\/span>\n    <span class=\"k\">operator<\/span> <span class=\"kt\">bool<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ bool from_string(std::string_view s);<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Then this means \u201coverload for return types <code class=\"language-plaintext 
highlighter-rouge\">int<\/code> and <code class=\"language-plaintext highlighter-rouge\">bool<\/code>\u201d and other libraries \/ files cannot add to that.<\/p>\n\n<p>There is a way to fix this and add extensibility.\nThis will add some implementation complexity and for more specialized use cases, extensibility might not actually be desired.\nHowever, I would argue that <code class=\"language-plaintext highlighter-rouge\">from_string<\/code> should be designed with extensibility in mind.<\/p>\n\n<blockquote>\n  <p>Note: the rest of this section focuses more on metaprogramming and API design than the return-type overloading.\nYou can skip to the next section to see the final version.<\/p>\n<\/blockquote>\n\n<p>The solution here is that conversion functions can be templated.\nWe will use that to delegate the conversion to a template specialization, which is then properly extensible:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_impl<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">always_false<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"conversion to T not supported\"<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span 
class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"k\">operator<\/span> <span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;::<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"n\">to_string_t<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">to_string_t<\/span><span class=\"p\">{<\/span><span class=\"n\">s<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The conversion in <code class=\"language-plaintext highlighter-rouge\">to_string_t<\/code> is now templated and always calls <code class=\"language-plaintext highlighter-rouge\">to_string_impl&lt;T&gt;::from_string(s)<\/code>.\n<code class=\"language-plaintext highlighter-rouge\">to_string_impl&lt;T&gt;<\/code> is a class template that is specialized for all supported conversions.\nShould a non-supported conversion be called, the <a href=\"\/blog\/2020\/10\/03\/always-false\"><code class=\"language-plaintext highlighter-rouge\">always_false&lt;T&gt;<\/code> produces a nice(-ish) error message<\/a>.\nWe can now add our supported conversions via:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_impl<\/span><span 
class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">bool<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"kt\">bool<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And similarly, other authors or the end user can add conversions for custom types.\nSometimes, it is useful to conditionally add conversions.\nFor example <code class=\"language-plaintext highlighter-rouge\">my_range&lt;T&gt;<\/code> might only be supported if <code class=\"language-plaintext highlighter-rouge\">T<\/code> itself supports <code class=\"language-plaintext highlighter-rouge\">from_string<\/code>.\nThus, it is customary to add a second template argument to the base template:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span 
class=\"nc\">to_string_impl<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">always_false<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"conversion to T not supported\"<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This enables our imaginary end user to write:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">my_range<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;&gt;&gt;<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"n\">my_range<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ e.g. 
\"[1, 2, 3]\"<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The partial specialization is only \u201cactive\u201d, if <code class=\"language-plaintext highlighter-rouge\">T<\/code> itself satisfies <code class=\"language-plaintext highlighter-rouge\">has_from_string&lt;T&gt;<\/code>.\n(This, of course, is an example of <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/sfinae\">SFINAE<\/a>).<\/p>\n\n<p>Such a <code class=\"language-plaintext highlighter-rouge\">has_from_string&lt;T&gt;<\/code> might look like this:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">auto<\/span> <span class=\"nf\">impl_has_from_string<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span> <span class=\"o\">-&gt;<\/span> <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span>\n    <span class=\"n\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;::<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">declval<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">()),<\/span> \n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">true_type<\/span><span class=\"p\">{});<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">false_type<\/span> <span class=\"nf\">impl_has_from_string<\/span><span 
class=\"p\">(<\/span><span class=\"kt\">char<\/span><span class=\"p\">);<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">bool<\/span> <span class=\"n\">has_from_string<\/span> <span class=\"o\">=<\/span> <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">impl_has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">))<\/span><span class=\"o\">::<\/span><span class=\"n\">value<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Here, we use <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/sfinae#Expression_SFINAE\">Expression SFINAE<\/a> to disable the first <code class=\"language-plaintext highlighter-rouge\">impl_has_from_string<\/code> overload if <code class=\"language-plaintext highlighter-rouge\">to_string_impl&lt;T&gt;::from_string<\/code> does not exist.\n<code class=\"language-plaintext highlighter-rouge\">impl_has_from_string<\/code> itself is overloaded on <code class=\"language-plaintext highlighter-rouge\">int<\/code> and <code class=\"language-plaintext highlighter-rouge\">char<\/code> and called via <code class=\"language-plaintext highlighter-rouge\">impl_has_from_string&lt;T&gt;(0)<\/code>.\nThis is a cheap way to say \u201ctry the <code class=\"language-plaintext highlighter-rouge\">int<\/code> overload first and if it doesn\u2019t apply, take the <code class=\"language-plaintext highlighter-rouge\">char<\/code> overload\u201d.\nHowever, if we try to check <code class=\"language-plaintext highlighter-rouge\">has_from_string&lt;T&gt;<\/code> for some type that has no <code class=\"language-plaintext highlighter-rouge\">from_string<\/code>, we trigger the <code class=\"language-plaintext 
highlighter-rouge\">static_assert(always_false&lt;T&gt;);<\/code> in the base template.\nThus, we move the <code class=\"language-plaintext highlighter-rouge\">static_assert<\/code> to <code class=\"language-plaintext highlighter-rouge\">to_string_t::operator T()<\/code> (see next section).<\/p>\n\n<p>Note that the templated <code class=\"language-plaintext highlighter-rouge\">to_string_impl<\/code> class is not the only option.\nWe could also use <a href=\"https:\/\/arne-mertz.de\/2016\/10\/tag-dispatch\/\">tag dispatch<\/a> or even normal overloading, e.g. by delegating to (a user-extensible) <code class=\"language-plaintext highlighter-rouge\">void convert_to(std::string_view s, T&amp; v)<\/code> that is overloaded on the second parameter.<\/p>\n\n<h2 id=\"final-version\">Final Version<\/h2>\n\n<p>For reference, our extensible and checkable version of return-type overloading in C++:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ base template, specialize and provide a static from_string method<\/span>\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_impl<\/span> \n<span class=\"p\">{<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">namespace<\/span> <span class=\"n\">detail<\/span> <span class=\"c1\">\/\/ hide impl detail<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">has_from_string<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span> <span 
class=\"o\">-&gt;<\/span> <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span>\n    <span class=\"n\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;::<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">declval<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">()),<\/span> \n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">true_type<\/span><span class=\"p\">{});<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">false_type<\/span> <span class=\"n\">has_from_string<\/span><span class=\"p\">(<\/span><span class=\"kt\">char<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ check if T has a from_string<\/span>\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">bool<\/span> <span class=\"n\">has_from_string<\/span> <span class=\"o\">=<\/span> <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">detail<\/span><span class=\"o\">::<\/span><span class=\"n\">has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">))<\/span><span class=\"o\">::<\/span><span class=\"n\">value<\/span><span class=\"p\">;<\/span>\n\n<span class=\"c1\">\/\/ return-type overload mechanism<\/span>\n<span class=\"k\">struct<\/span> <span 
class=\"nc\">to_string_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"k\">operator<\/span> <span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span> \n    <span class=\"p\">{<\/span>\n        <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"conversion to T not supported\"<\/span><span class=\"p\">);<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;::<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span> \n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"c1\">\/\/ convenience wrapper to provide a \"return-type overloaded function\"<\/span>\n<span class=\"n\">to_string_t<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">to_string_t<\/span><span class=\"p\">{<\/span><span class=\"n\">s<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Anyone can register new types, optionally using SFINAE to conditionally support them:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre 
class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">bool<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"kt\">bool<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">my_range<\/span> <span class=\"p\">{<\/span> <span class=\"cm\">\/* ... 
*\/<\/span> <span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">my_range<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;&gt;&gt;<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static<\/span> <span class=\"n\">my_range<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><code class=\"language-plaintext highlighter-rouge\">has_from_string&lt;T&gt;<\/code> can be used to test (at compile time) if a <code class=\"language-plaintext highlighter-rouge\">from_string<\/code> is available for a certain type:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">);<\/span>\n<span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"o\">!<\/span><span class=\"n\">has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">char<\/span><span class=\"o\">&gt;<\/span><span 
class=\"p\">);<\/span>\n<span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">my_range<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"p\">);<\/span>\n<span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"o\">!<\/span><span class=\"n\">has_from_string<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">my_range<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;&gt;<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Finally, we still retain the original usage that looks like a return-type overloaded function:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">int<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"7\"<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"true\"<\/span><span class=\"p\">);<\/span>\n<span class=\"n\">my_range<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">r<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"[0, 1, 2]\"<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>As always, <a href=\"https:\/\/godbolt.org\/z\/chfGa1\">a godbolt link to back up my claims<\/a>.<\/p>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>Overloading by argument types is ubiquitous in modern imperative languages but overloading by return type is usually not supported.\nHowever, we can emulate it in C++ by (mis)using <a 
href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/cast_operator\">user-defined conversion operators<\/a>.\nAs long as the target type is known, the proper \u201coverload\u201d is selected.\nThe basic version is simple:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">to_string_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">operator<\/span> <span class=\"kt\">int<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"p\">;<\/span>  <span class=\"c1\">\/\/ int  from_string(std::string_view s);<\/span>\n    <span class=\"k\">operator<\/span> <span class=\"kt\">bool<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ bool from_string(std::string_view s);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"n\">to_string_t<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">to_string_t<\/span><span class=\"p\">{<\/span><span class=\"n\">s<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>By default, this solution does not have the extensibility of normal by-argument-type overloading.\nHowever, we can restore it via a templated conversion operator that delegates to a templated class that can be specialized.\nIn the process, we can also define a <code class=\"language-plaintext highlighter-rouge\">has_from_string&lt;T&gt;<\/code> to help with diagnostics or SFINAE.<\/p>\n\n<p>Additional discussion 
and comments on <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/j94jd8\/overloading_by_return_type_in_c\/\">reddit<\/a> and <a href=\"https:\/\/news.ycombinator.com\/item?id=24752527\">hacker news<\/a>.<\/p>\n\n<h3 id=\"update-2020-10-12\">Update 2020-10-12:<\/h3>\n\n<p>Ok full disclosure: this post was written in a semi-serious mindset but turned out to be something that can be misunderstood as \u201chow to use return-type overloading in production\u201d.\nI think <a href=\"https:\/\/gunshowcomic.com\/513\">this comic<\/a> summarizes my original feelings.\nStill, I think there are some valid use cases for this technique.<\/p>\n\n<p>One example I like is an adaptation from redditor <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/j94jd8\/overloading_by_return_type_in_c\/g8hmaye\">Skoparov<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">namespace<\/span> <span class=\"n\">limits<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"k\">struct<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"k\">constexpr<\/span> <span class=\"k\">operator<\/span> <span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">numeric_limits<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;::<\/span><span class=\"n\">max<\/span><span class=\"p\">();<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span> <span class=\"n\">max<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Now, <code class=\"language-plaintext highlighter-rouge\">int i = 
limits::max;<\/code> works like a return-type deduced constant.<\/p>\n\n<p>While the example <code class=\"language-plaintext highlighter-rouge\">from_string<\/code> might be questionable, my personal library contains a <code class=\"language-plaintext highlighter-rouge\">uniform(rng)<\/code> function that auto-converts to (and uniformly samples from) types with known, finite domains.\nMakes it very convenient to write <code class=\"language-plaintext highlighter-rouge\">color3 c = uniform(rng)<\/code>, <code class=\"language-plaintext highlighter-rouge\">angle a = uniform(rng)<\/code>, or <code class=\"language-plaintext highlighter-rouge\">if (uniform(rng)) { ... }<\/code>.<\/p>\n\n<p>Finally, yes, storing the returned <code class=\"language-plaintext highlighter-rouge\">to_string_t<\/code> can easily turn into a lifetime problem.\nHowever, this can easily be diagnosed with very high accuracy if one allows the conversion only for rvalue references:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">to_string_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">operator<\/span> <span class=\"kt\">int<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">;<\/span>  <span class=\"c1\">\/\/ int  from_string(std::string_view s);<\/span>\n    <span class=\"k\">operator<\/span> <span class=\"kt\">bool<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ bool from_string(std::string_view s);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"n\">to_string_t<\/span> <span class=\"n\">from_string<\/span><span 
class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">to_string_t<\/span><span class=\"p\">{<\/span><span class=\"n\">s<\/span><span class=\"p\">};<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ user code:<\/span>\n<span class=\"kt\">int<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"7\"<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ still works<\/span>\n\n<span class=\"k\">auto<\/span> <span class=\"n\">j<\/span> <span class=\"o\">=<\/span> <span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string<\/span><span class=\"p\">(<\/span><span class=\"s\">\"10\"<\/span><span class=\"p\">));<\/span>\n<span class=\"kt\">int<\/span> <span class=\"n\">k<\/span> <span class=\"o\">=<\/span> <span class=\"n\">j<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ in the original version, this uses a dangling reference<\/span>\n           <span class=\"c1\">\/\/ now it gives a compile error<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>In the full solution, one could even do:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">to_string_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">s<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"k\">operator<\/span> <span class=\"n\">T<\/span><span 
class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">to_string_impl<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;::<\/span><span class=\"n\">from_string<\/span><span class=\"p\">(<\/span><span class=\"n\">s<\/span><span class=\"p\">);<\/span> <span class=\"p\">}<\/span>\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"k\">operator<\/span> <span class=\"n\">T<\/span><span class=\"p\">()<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"p\">{<\/span> <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">always_false<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"must not be stored (for lifetime reasons)\"<\/span><span class=\"p\">);<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/india-merchant-dealer-ox-goods-4780853\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"always_false<T>","description":"The working version of static_assert(false);","pubDate":"Sat, 03 Oct 2020 04:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/10\/03\/always-false","guid":"https:\/\/artificial-mind.net\/blog\/2020\/10\/03\/always-false","content":"<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span 
class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"nb\">false<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"must use correct specialization\"<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ valid use of foo&lt;T&gt; ...<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Sometimes, good intentions are punished by <a href=\"https:\/\/godbolt.org\/z\/YdK3vc\">the compiler<\/a>:<\/p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>&lt;source&gt;:4:19: error: static assertion failed: must use correct specialization\n    4 |     static_assert(false, \"must use correct specialization\");\n      |                   ^~~~~\n<\/code><\/pre><\/div><\/div>\n\n<p>To be fair, it\u2019s just doing <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/static_assert\">its job<\/a>:<\/p>\n\n<blockquote>\n  <p>If <code class=\"language-plaintext highlighter-rouge\">bool_constexpr<\/code> returns <code class=\"language-plaintext highlighter-rouge\">true<\/code>, this declaration has no effect. 
Otherwise a compile-time error is issued, and the text of message, if any, is included in the diagnostic message.<\/p>\n<\/blockquote>\n\n<p>Our intention was to trigger the static assertion only when <code class=\"language-plaintext highlighter-rouge\">foo&lt;T&gt;<\/code> is instantiated with an unsupported type.\nHowever, while processing declarations, the compiler sees <code class=\"language-plaintext highlighter-rouge\">static_assert(false, ...)<\/code>, which it can immediately evaluate and thus issues an error.<\/p>\n\n<p>In today\u2019s (rather short) post, we will use <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/dependent_name#Type-dependent_expressions\">type-dependent expressions<\/a> to realize our intentions.<\/p>\n\n<h2 id=\"a-simple-always_falset\">A simple <code class=\"language-plaintext highlighter-rouge\">always_false&lt;T&gt;<\/code><\/h2>\n\n<p>Our initial attempt does not work because the compiler can immediately evaluate the condition.\nJust making the expression more complex is <a href=\"https:\/\/godbolt.org\/z\/n1oMxa\">not a solution<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"must use correct specialization\"<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Still generates an error immediately.<\/p>\n\n<blockquote>\n  <p>Side note: am I the only one who constantly tries to write <code class=\"language-plaintext highlighter-rouge\">static_assert&lt;cond&gt;<\/code> instead of <code class=\"language-plaintext highlighter-rouge\">static_assert(cond)<\/code>?<\/p>\n<\/blockquote>\n\n<p>So, how can we \u201cdefer\u201d the evaluation until the class is 
actually instantiated?<\/p>\n\n<p>Using a similar solution to what we did in <a href=\"\/blog\/2020\/09\/26\/dont-deduce\">the <code class=\"language-plaintext highlighter-rouge\">dont_deduce&lt;T&gt;<\/code> post<\/a>: taking away the compiler\u2019s ability to reason about an expression without an actual type for <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nThe technical term for what we need is a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/dependent_name#Type-dependent_expressions\">type-dependent expression<\/a>.\nInstead of <code class=\"language-plaintext highlighter-rouge\">static_assert(false)<\/code>, you sometimes see the following patterns:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span> <span class=\"o\">!=<\/span> <span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">));<\/span>\n    <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"mi\">1<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">0<\/span><span class=\"p\">);<\/span>\n    <span 
class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"nb\">false<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">));<\/span>\n\n    <span class=\"c1\">\/\/ does not clutter error message for incomplete types:<\/span>\n    <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"nb\">false<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"o\">*<\/span><span class=\"p\">));<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Now, the conditions formally depend on <code class=\"language-plaintext highlighter-rouge\">T<\/code>, even if their actual value will always be false.\nStill, this is enough to <a href=\"https:\/\/godbolt.org\/z\/Eb3PTe\">silence the compiler<\/a>.\nOnly an actual instantiation will trigger the <a href=\"https:\/\/godbolt.org\/z\/jr3af8\">error<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">foo<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">bool<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n\n<span class=\"c1\">\/\/ leads to:<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span> <span class=\"n\">In<\/span> <span class=\"n\">instantiation<\/span> <span class=\"n\">of<\/span> <span class=\"err\">'<\/span><span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">bool<\/span><span class=\"o\">&gt;<\/span><span class=\"err\">'<\/span><span class=\"o\">:<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">13<\/span><span class=\"o\">:<\/span><span class=\"mi\">11<\/span><span 
class=\"o\">:<\/span>   <span class=\"n\">required<\/span> <span class=\"n\">from<\/span> <span class=\"n\">here<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">4<\/span><span class=\"o\">:<\/span><span class=\"mi\">25<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"k\">static<\/span> <span class=\"n\">assertion<\/span> <span class=\"n\">failed<\/span><span class=\"o\">:<\/span> <span class=\"n\">must<\/span> <span class=\"n\">use<\/span> <span class=\"n\">correct<\/span> <span class=\"n\">specialization<\/span>\n    <span class=\"mi\">4<\/span> <span class=\"o\">|<\/span>     <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"nb\">false<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"k\">sizeof<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span><span class=\"p\">),<\/span> <span class=\"s\">\"must use correct specialization\"<\/span><span class=\"p\">);<\/span>\n      <span class=\"o\">|<\/span>                   <span class=\"o\">~~~~~~^~~~~~~~~~~~<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>In my libraries, I usually formulate this as <a href=\"https:\/\/godbolt.org\/z\/5nYhc6\">a small helper<\/a> for consistency and clarity of intent:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span><span class=\"o\">...<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">bool<\/span> <span class=\"n\">always_false<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">false<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>which is then used as:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre 
class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"n\">always_false<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"must use correct specialization\"<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The definition of <code class=\"language-plaintext highlighter-rouge\">always_false<\/code> is variadic so that multiple types can be provided.<\/p>\n\n<h2 id=\"use-with-if-constexpr\">Use with <code class=\"language-plaintext highlighter-rouge\">if constexpr<\/code><\/h2>\n\n<p>I have two main use cases for <code class=\"language-plaintext highlighter-rouge\">always_false&lt;T&gt;<\/code>.<\/p>\n\n<p>The first is the already mentioned class template specialization when the \u201cbase case\u201d is not supported.\nA solution I sometimes see is to only declare but not define the template:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ only define the specializations<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>However, the <a href=\"https:\/\/godbolt.org\/z\/dT9EhE\">resulting error message<\/a> will confuse users of your API:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">&lt;<\/span><span 
class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">10<\/span><span class=\"o\">:<\/span><span class=\"mi\">11<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"n\">aggregate<\/span> <span class=\"err\">'<\/span><span class=\"n\">foo<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">bool<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">f<\/span><span class=\"err\">'<\/span> <span class=\"n\">has<\/span> <span class=\"n\">incomplete<\/span> <span class=\"n\">type<\/span> <span class=\"n\">and<\/span> <span class=\"n\">cannot<\/span> <span class=\"n\">be<\/span> <span class=\"n\">defined<\/span>\n   <span class=\"mi\">10<\/span> <span class=\"o\">|<\/span> <span class=\"n\">foo<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">bool<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n      <span class=\"o\">|<\/span>           <span class=\"o\">^<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The second use case is in combination with <code class=\"language-plaintext highlighter-rouge\">if constexpr<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">foo<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"k\">constexpr<\/span> <span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_same_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"c1\">\/\/ handle int 
case<\/span>\n    <span class=\"p\">}<\/span>\n    <span class=\"k\">else<\/span> <span class=\"k\">if<\/span> <span class=\"k\">constexpr<\/span> <span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_same_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"c1\">\/\/ handle float case<\/span>\n    <span class=\"p\">}<\/span>\n    <span class=\"c1\">\/\/ ... other cases<\/span>\n    <span class=\"k\">else<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"k\">static_assert<\/span><span class=\"p\">(<\/span><span class=\"nb\">false<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"T not supported\"<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This has the <a href=\"https:\/\/godbolt.org\/z\/51Gbde\">exact same problem<\/a>: the static assertion triggers even without any use or instantiation of <code class=\"language-plaintext highlighter-rouge\">foo<\/code>.\nLuckily it also has the same solution: <code class=\"language-plaintext highlighter-rouge\">static_assert(always_false&lt;T&gt;, \"T not supported\");<\/code><\/p>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p><code class=\"language-plaintext highlighter-rouge\">static_assert(false)<\/code> always immediately triggers (unless <code class=\"language-plaintext highlighter-rouge\">#ifdef<\/code>d) even if our intention is only to trigger on wrong instantiations.<\/p>\n\n<p>The solution is to use <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/dependent_name#Type-dependent_expressions\">type-dependent expressions<\/a> and make the condition depend on our template parameters.\nThe main use cases are templated classes, where the \u201cbase case\u201d should be forbidden 
and only \u201capproved\u201d specializations are allowed, and (chained) <code class=\"language-plaintext highlighter-rouge\">if constexpr<\/code>, where we want to communicate unsupported cases via static assertions.<\/p>\n\n<p>Popular ad-hoc solutions include <code class=\"language-plaintext highlighter-rouge\">sizeof(T) + 1 == 0<\/code> or <code class=\"language-plaintext highlighter-rouge\">false &amp;&amp; sizeof(T)<\/code>.\nI usually prefer a simple helper that clearly communicates intent:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span><span class=\"o\">...<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">constexpr<\/span> <span class=\"kt\">bool<\/span> <span class=\"n\">always_false<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">false<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Additional discussion and comments on <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/j4gsj4\/always_falset\/\">reddit<\/a>.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/sign-street-road-road-signs-2454791\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"dont_deduce<T>","description":"Reducing API friction by a seemingly useless typedef","pubDate":"Sat, 26 Sep 2020 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/09\/26\/dont-deduce","guid":"https:\/\/artificial-mind.net\/blog\/2020\/09\/26\/dont-deduce","content":"<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">foo_t<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">using<\/span> <span 
class=\"n\">type<\/span> <span class=\"o\">=<\/span> <span class=\"n\">T<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">foo<\/span> <span class=\"o\">=<\/span> <span class=\"k\">typename<\/span> <span class=\"n\">foo_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;::<\/span><span class=\"n\">type<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Now that\u2019s a pretty useless snippet.<\/p>\n\n<p>\u2026 or is it??<\/p>\n\n<h2 id=\"controlling-type-deduction\">Controlling Type Deduction<\/h2>\n\n<p>Spoiler alert: it\u2019s not useless.<\/p>\n\n<p>In particular, it allows us to control <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/template_argument_deduction\">template argument deduction<\/a> to a certain extent.<\/p>\n\n<p>In my libraries, I usually define the typedef as follows:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">dont_deduce_t<\/span> \n<span class=\"p\">{<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">type<\/span> <span class=\"o\">=<\/span> <span class=\"n\">T<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">dont_deduce<\/span> <span class=\"o\">=<\/span> <span class=\"k\">typename<\/span> <span class=\"n\">dont_deduce_t<\/span><span 
class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;::<\/span><span class=\"n\">type<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This clearly communicates our intent: we want to disable type deduction for a certain parameter.<\/p>\n\n<blockquote>\n  <p>In C++20, the same functionality is provided in <code class=\"language-plaintext highlighter-rouge\">&lt;type_traits&gt;<\/code> under <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/types\/type_identity\">std::type_identity<\/a> (though I find this name significantly less clear in a function declaration). \nAlso note that I prefer a different convention than the C++ standard: the implementation type ends with <code class=\"language-plaintext highlighter-rouge\">_t<\/code> while the typedef is \u201cclean\u201d.<\/p>\n<\/blockquote>\n\n<p>Okay okay, not so fast.\nWhat problem are we trying to solve here?<\/p>\n\n<h2 id=\"motivating-example-vector-math\">Motivating Example: Vector Math<\/h2>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">vec3<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">T<\/span> <span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">y<\/span><span class=\"p\">,<\/span> <span class=\"n\">z<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"n\">vec3<\/span><span 
class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"p\">{<\/span><span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">x<\/span> <span class=\"o\">*<\/span> <span class=\"n\">b<\/span><span class=\"p\">,<\/span> <span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">y<\/span> <span class=\"o\">*<\/span> <span class=\"n\">b<\/span><span class=\"p\">,<\/span> <span class=\"n\">a<\/span><span class=\"p\">.<\/span><span class=\"n\">z<\/span> <span class=\"o\">*<\/span> <span class=\"n\">b<\/span><span class=\"p\">};<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>That looks like a reasonable definition of <code class=\"language-plaintext highlighter-rouge\">operator*<\/code>, doesn\u2019t it?<\/p>\n\n<p>Turns out, it doesn\u2019t provide the smooth API that we\u2019d like to have.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"n\">v<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">3<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ that'd be a cool API, right?<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><a href=\"https:\/\/godbolt.org\/z\/78z8M1\">GCC 10.2 politely refuses this code<\/a> but not without a proper explanation:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span 
class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">16<\/span><span class=\"o\">:<\/span><span class=\"mi\">11<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"n\">no<\/span> <span class=\"n\">match<\/span> <span class=\"k\">for<\/span> <span class=\"err\">'<\/span><span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"err\">'<\/span> <span class=\"p\">(<\/span><span class=\"n\">operand<\/span> <span class=\"n\">types<\/span> <span class=\"n\">are<\/span> <span class=\"err\">'<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span><span class=\"err\">'<\/span> <span class=\"n\">and<\/span> <span class=\"err\">'<\/span><span class=\"kt\">int<\/span><span class=\"err\">'<\/span><span class=\"p\">)<\/span>\n   <span class=\"mi\">16<\/span> <span class=\"o\">|<\/span>     <span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"n\">v<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">3<\/span><span class=\"p\">;<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"o\">~<\/span> <span class=\"o\">^<\/span> <span class=\"o\">~<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"o\">|<\/span>   <span class=\"o\">|<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"o\">|<\/span>   <span class=\"kt\">int<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">8<\/span><span class=\"o\">:<\/span><span class=\"mi\">9<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span> <span class=\"n\">candidate<\/span><span class=\"o\">:<\/span> <span 
class=\"err\">'<\/span><span class=\"k\">template<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span> <span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"k\">const<\/span> <span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;&amp;<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span><span class=\"p\">)<\/span><span class=\"err\">'<\/span>\n    <span class=\"mi\">8<\/span> <span class=\"o\">|<\/span> <span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"o\">^~~~~~~~<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">8<\/span><span class=\"o\">:<\/span><span class=\"mi\">9<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span>   <span class=\"k\">template<\/span> <span class=\"n\">argument<\/span> <span class=\"n\">deduction<\/span><span class=\"o\">\/<\/span><span class=\"n\">substitution<\/span> <span class=\"n\">failed<\/span><span class=\"o\">:<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"p\">&gt;<\/span><span class=\"o\">:<\/span><span class=\"mi\">16<\/span><span class=\"o\">:<\/span><span 
class=\"mi\">13<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span>   <span class=\"n\">deduced<\/span> <span class=\"n\">conflicting<\/span> <span class=\"n\">types<\/span> <span class=\"k\">for<\/span> <span class=\"n\">parameter<\/span> <span class=\"sc\">'T'<\/span> <span class=\"p\">(<\/span><span class=\"err\">'<\/span><span class=\"kt\">float<\/span><span class=\"err\">'<\/span> <span class=\"n\">and<\/span> <span class=\"err\">'<\/span><span class=\"kt\">int<\/span><span class=\"err\">'<\/span><span class=\"p\">)<\/span>\n   <span class=\"mi\">16<\/span> <span class=\"o\">|<\/span>     <span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"n\">v<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">3<\/span><span class=\"p\">;<\/span>\n      <span class=\"o\">|<\/span>             <span class=\"o\">^<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>What happens is that <code class=\"language-plaintext highlighter-rouge\">operator*<\/code> is called with a <code class=\"language-plaintext highlighter-rouge\">vec3&lt;float&gt;<\/code> and <code class=\"language-plaintext highlighter-rouge\">int<\/code>.\nThe compiler then tries to <em>deduce<\/em> a <code class=\"language-plaintext highlighter-rouge\">T<\/code> such that the signature <code class=\"language-plaintext highlighter-rouge\">(vec3&lt;T&gt; const&amp;, T)<\/code> is satisfied.\nFor the first argument it figures <code class=\"language-plaintext highlighter-rouge\">T = float<\/code> might be a good match while for the second, <code class=\"language-plaintext highlighter-rouge\">T = int<\/code> is the natural choice.\nThus it responds: <code class=\"language-plaintext highlighter-rouge\">deduced conflicting types for parameter 'T' ('float' and 'int')<\/code>.<\/p>\n\n<p>The compiler is only happy if the deductions for all arguments agree.\nHowever, our intention was more along the lines of:\n\u201cDeduce <code class=\"language-plaintext 
highlighter-rouge\">T<\/code> from <code class=\"language-plaintext highlighter-rouge\">vec3&lt;T&gt; const&amp;<\/code> and then try to convert <code class=\"language-plaintext highlighter-rouge\">b<\/code> to <code class=\"language-plaintext highlighter-rouge\">T<\/code>, preferably with an error if this doesn\u2019t work\u201d.<\/p>\n\n<p>We <em>could<\/em> make this work with additional template arguments and <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/sfinae\">SFINAE<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">B<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_convertible_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">B<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span><span class=\"p\">&gt;,<\/span> <span class=\"kt\">int<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">B<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>However, in my opinion, the superior solution is to \u201cdisable 
deduction\u201d for <code class=\"language-plaintext highlighter-rouge\">b<\/code> by turning its type into a so-called <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/template_argument_deduction#Non-deduced_contexts\">non-deduced context<\/a>.\n<code class=\"language-plaintext highlighter-rouge\">dont_deduce&lt;T&gt;<\/code> is not a simple typedef of <code class=\"language-plaintext highlighter-rouge\">T<\/code>.\nRather, it is \u201cpiped through\u201d the templated class <code class=\"language-plaintext highlighter-rouge\">dont_deduce_t&lt;T&gt;<\/code> (via a typedef <code class=\"language-plaintext highlighter-rouge\">::type<\/code> that just maps <code class=\"language-plaintext highlighter-rouge\">T<\/code> to itself).\nBecause template specialization can arbitrarily mess with templated classes, deduction does <em>not<\/em> work \u201cthrough\u201d <code class=\"language-plaintext highlighter-rouge\">dont_deduce_t&lt;T&gt;::type<\/code>.\nIn particular, just because the compiler sees that <code class=\"language-plaintext highlighter-rouge\">dont_deduce_t&lt;T&gt;::type<\/code> should be <code class=\"language-plaintext highlighter-rouge\">float<\/code>, it cannot deduce that <code class=\"language-plaintext highlighter-rouge\">T<\/code> must be <code class=\"language-plaintext highlighter-rouge\">float<\/code> as well.\nJust imagine someone writing a template specialization where <code class=\"language-plaintext highlighter-rouge\">dont_deduce_t&lt;some_user_type&gt;::type<\/code> is <code class=\"language-plaintext highlighter-rouge\">float<\/code>: then <code class=\"language-plaintext highlighter-rouge\">T<\/code> could be either type.<\/p>\n\n<blockquote>\n  <p>Shower thought: How about user-defined per-function deduction guides in C++3x?<\/p>\n<\/blockquote>\n\n<p>Anyways, by using <code class=\"language-plaintext highlighter-rouge\">dont_deduce&lt;T&gt;<\/code> we take away the compiler\u2019s ability to reason about <code class=\"language-plaintext highlighter-rouge\">T<\/code>, allowing us to write a rather clean 
API:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And voil\u00e0, now <code class=\"language-plaintext highlighter-rouge\">v * 3<\/code> <a href=\"https:\/\/godbolt.org\/z\/a5E6Yv\">just works<\/a>.<\/p>\n\n<p>As a non-deduced context, the second argument of <code class=\"language-plaintext highlighter-rouge\">operator*<\/code> is not used for template type deduction.\nThus, only <code class=\"language-plaintext highlighter-rouge\">vec3&lt;T&gt; const&amp;<\/code> is matched against the <code class=\"language-plaintext highlighter-rouge\">vec3&lt;float&gt;<\/code>, resulting in an unambiguous <code class=\"language-plaintext highlighter-rouge\">T = float<\/code>.\nAfter the typedef is resolved, we have <code class=\"language-plaintext highlighter-rouge\">operator*(vec3&lt;float&gt; const&amp;, float)<\/code> which is called with <code class=\"language-plaintext highlighter-rouge\">vec3&lt;float&gt;<\/code> and <code class=\"language-plaintext highlighter-rouge\">int<\/code>, which is perfectly fine as there obviously is a conversion from <code class=\"language-plaintext 
highlighter-rouge\">int<\/code> to <code class=\"language-plaintext highlighter-rouge\">float<\/code>.<\/p>\n\n<p>If the second argument is not convertible (e.g. <code class=\"language-plaintext highlighter-rouge\">v * \"foo\"<\/code>), we get <a href=\"https:\/\/godbolt.org\/z\/zzord3\">a nice error message<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">25<\/span><span class=\"o\">:<\/span><span class=\"mi\">11<\/span><span class=\"o\">:<\/span> <span class=\"n\">error<\/span><span class=\"o\">:<\/span> <span class=\"n\">no<\/span> <span class=\"n\">match<\/span> <span class=\"k\">for<\/span> <span class=\"err\">'<\/span><span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"err\">'<\/span> <span class=\"p\">(<\/span><span class=\"n\">operand<\/span> <span class=\"n\">types<\/span> <span class=\"n\">are<\/span> <span class=\"err\">'<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span><span class=\"err\">'<\/span> <span class=\"n\">and<\/span> <span class=\"err\">'<\/span><span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"p\">[<\/span><span class=\"mi\">4<\/span><span class=\"p\">]<\/span><span class=\"err\">'<\/span><span class=\"p\">)<\/span>\n   <span class=\"mi\">25<\/span> <span class=\"o\">|<\/span>     <span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"n\">v<\/span> <span class=\"o\">*<\/span> <span class=\"s\">\"foo\"<\/span><span class=\"p\">;<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"o\">~<\/span> <span class=\"o\">^<\/span> <span class=\"o\">~~~~~<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"o\">|<\/span>   <span class=\"o\">|<\/span>\n      <span class=\"o\">|<\/span>         <span 
class=\"o\">|<\/span>   <span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"p\">[<\/span><span class=\"mi\">4<\/span><span class=\"p\">]<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">17<\/span><span class=\"o\">:<\/span><span class=\"mi\">9<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span> <span class=\"n\">candidate<\/span><span class=\"o\">:<\/span> <span class=\"err\">'<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"k\">const<\/span> <span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;&amp;<\/span><span class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">)<\/span> <span class=\"p\">[<\/span><span class=\"n\">with<\/span> <span class=\"n\">T<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">float<\/span><span class=\"p\">;<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">float<\/span><span class=\"p\">]<\/span><span class=\"err\">'<\/span>\n   <span class=\"mi\">17<\/span> <span class=\"o\">|<\/span> <span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span 
class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n      <span class=\"o\">|<\/span>         <span class=\"o\">^~~~~~~~<\/span>\n<span class=\"o\">&lt;<\/span><span class=\"n\">source<\/span><span class=\"o\">&gt;:<\/span><span class=\"mi\">17<\/span><span class=\"o\">:<\/span><span class=\"mi\">52<\/span><span class=\"o\">:<\/span> <span class=\"n\">note<\/span><span class=\"o\">:<\/span>   <span class=\"n\">no<\/span> <span class=\"n\">known<\/span> <span class=\"n\">conversion<\/span> <span class=\"k\">for<\/span> <span class=\"n\">argument<\/span> <span class=\"mi\">2<\/span> <span class=\"n\">from<\/span> <span class=\"err\">'<\/span><span class=\"k\">const<\/span> <span class=\"kt\">char<\/span> <span class=\"p\">[<\/span><span class=\"mi\">4<\/span><span class=\"p\">]<\/span><span class=\"err\">'<\/span> <span class=\"n\">to<\/span> <span class=\"err\">'<\/span><span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span><span class=\"err\">'<\/span> <span class=\"p\">{<\/span><span class=\"n\">aka<\/span> <span class=\"err\">'<\/span><span class=\"kt\">float<\/span><span class=\"err\">'<\/span><span class=\"p\">}<\/span>\n   <span class=\"mi\">17<\/span> <span class=\"o\">|<\/span> <span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span 
class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n      <span class=\"o\">|<\/span>                                     <span class=\"o\">~~~~~~~~~~~~~~~^<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This also highlights a subtle difference between the SFINAE and the <code class=\"language-plaintext highlighter-rouge\">dont_deduce&lt;T&gt;<\/code> solution:\nWith SFINAE, the function overload does not really exist (though modern compilers still give <a href=\"https:\/\/godbolt.org\/z\/PhnMqT\">reasonable, though often confusing error messages<\/a>).\nWith <code class=\"language-plaintext highlighter-rouge\">dont_deduce&lt;T&gt;<\/code>, the function exists and it\u2019s like calling a function with the wrong type of parameters.<\/p>\n\n<p>Also, SFINAE tends to blow up compile times while there should be no measurable negative impact of using <code class=\"language-plaintext highlighter-rouge\">dont_deduce&lt;T&gt;<\/code>.<\/p>\n\n<h2 id=\"other-useful-examples\">Other Useful Examples<\/h2>\n\n<p>That\u2019s all that I wanted to explain about <code class=\"language-plaintext highlighter-rouge\">dont_deduce&lt;T&gt;<\/code>.\nWhat follows are a few additional examples where it makes for a better API (in my opinion) if some arguments are not deduced.<\/p>\n\n<h3 id=\"clamping\">Clamping<\/h3>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">T<\/span> <span class=\"nf\">clamp<\/span><span class=\"p\">(<\/span><span class=\"n\">T<\/span> <span class=\"n\">value<\/span><span class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span 
class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">min<\/span><span class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">max<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">value<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">min<\/span> <span class=\"o\">?<\/span> <span class=\"n\">min<\/span> <span class=\"o\">:<\/span> <span class=\"n\">value<\/span> <span class=\"o\">&gt;<\/span> <span class=\"n\">max<\/span> <span class=\"o\">?<\/span> <span class=\"n\">max<\/span> <span class=\"o\">:<\/span> <span class=\"n\">value<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ otherwise this wouldn't work:<\/span>\n<span class=\"kt\">float<\/span> <span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">v<\/span> <span class=\"o\">=<\/span> <span class=\"n\">clamp<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">,<\/span> <span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"mi\">1<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h3 id=\"contains-with-epsilon\">Contains with Epsilon<\/h3>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"nf\">contains<\/span><span class=\"p\">(<\/span><span class=\"n\">sphere3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">sphere<\/span><span class=\"p\">,<\/span> <span 
class=\"n\">pos3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">p<\/span><span class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">eps<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">distance<\/span><span class=\"p\">(<\/span><span class=\"n\">sphere<\/span><span class=\"p\">.<\/span><span class=\"n\">center<\/span><span class=\"p\">,<\/span> <span class=\"n\">p<\/span><span class=\"p\">)<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"n\">sphere<\/span><span class=\"p\">.<\/span><span class=\"n\">radius<\/span> <span class=\"o\">+<\/span> <span class=\"n\">eps<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ otherwise this wouldn't work:<\/span>\n<span class=\"n\">sphere3<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">s<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">pos3<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">p<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">contains<\/span><span class=\"p\">(<\/span><span class=\"n\">s<\/span><span class=\"p\">,<\/span> <span class=\"n\">p<\/span><span class=\"p\">,<\/span> <span class=\"mf\">1e-5<\/span><span class=\"p\">))<\/span> <span class=\"p\">...;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h3 id=\"queries-with-defaults\">Queries with Defaults<\/h3>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> 
<span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">T<\/span> <span class=\"nf\">get_property_or<\/span><span class=\"p\">(<\/span><span class=\"n\">property_handle<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">prop<\/span><span class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">default_val<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ otherwise this wouldn't work:<\/span>\n<span class=\"n\">property_handle<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">p<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">s<\/span> <span class=\"o\">=<\/span> <span class=\"n\">get_property_or<\/span><span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"p\">,<\/span> <span class=\"s\">\"&lt;no value&gt;\"<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Note that in this example the handle should dictate the type, not the default value.<\/p>\n\n<h3 id=\"containers-and-spans\">Containers and Spans<\/h3>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">add_range<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;&amp;<\/span> <span class=\"n\">vec<\/span><span class=\"p\">,<\/span> <span 
class=\"n\">dont_deduce<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">span<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span> <span class=\"k\">const<\/span><span class=\"o\">&gt;&gt;<\/span> <span class=\"n\">values<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">vec<\/span><span class=\"p\">.<\/span><span class=\"n\">reserve<\/span><span class=\"p\">(<\/span><span class=\"n\">vec<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">()<\/span> <span class=\"o\">+<\/span> <span class=\"n\">values<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">());<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">v<\/span> <span class=\"o\">:<\/span> <span class=\"n\">values<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">vec<\/span><span class=\"p\">.<\/span><span class=\"n\">push_back<\/span><span class=\"p\">(<\/span><span class=\"n\">v<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ otherwise this wouldn't work:<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">vecA<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">vecB<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">add_range<\/span><span class=\"p\">(<\/span><span class=\"n\">vecA<\/span><span class=\"p\">,<\/span> <span class=\"n\">vecB<\/span><span 
class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>The (seemingly useless) <code class=\"language-plaintext highlighter-rouge\">dont_deduce&lt;T&gt;<\/code> (or <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/types\/type_identity\">std::type_identity<\/a>) can be used to selectively disable <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/template_argument_deduction\">template argument deduction<\/a>.<\/p>\n\n<p>This is a valuable tool for reducing API friction:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ (A)<\/span>\n\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">*<\/span><span class=\"p\">(<\/span><span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">dont_deduce<\/span><span 
class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ (B)<\/span>\n\n<span class=\"n\">vec3<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">v<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">v<\/span> <span class=\"o\">*<\/span> <span class=\"mi\">3<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ works with (B) but not with (A)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Additional discussion and comments on <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/j0pgxh\/controlling_template_argument_deduction_via_dont\/\">reddit<\/a>.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/murder-the-scene-investigation-5294706\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"Destructuring Assertions","description":"How can we make ASSERT(a == b) print the values of a and b?","pubDate":"Sat, 19 Sep 2020 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/09\/19\/destructuring-assertions","guid":"https:\/\/artificial-mind.net\/blog\/2020\/09\/19\/destructuring-assertions","content":"<p>Assertions are a major tool in defensive programming and I consider it a symbol of a mature programmer when their code is liberally accompanied by assertions.\nThey embody a fail-fast mentality and serve as additional documentation, making many assumptions explicit that the programmer made during the implementation.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;cassert&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"kt\">float<\/span> <span class=\"nf\">dot_product<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">span<\/span><span class=\"o\">&lt;<\/span><span 
class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">lhs<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">span<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">rhs<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">assert<\/span><span class=\"p\">(<\/span><span class=\"n\">lhs<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">()<\/span> <span class=\"o\">==<\/span> <span class=\"n\">rhs<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">());<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">sum<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.<\/span><span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"kt\">size_t<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">lhs<\/span><span class=\"p\">.<\/span><span class=\"n\">size<\/span><span class=\"p\">();<\/span> <span class=\"o\">++<\/span><span class=\"n\">i<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">sum<\/span> <span class=\"o\">+=<\/span> <span class=\"n\">lhs<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">]<\/span> <span class=\"o\">*<\/span> <span class=\"n\">rhs<\/span><span class=\"p\">[<\/span><span class=\"n\">i<\/span><span class=\"p\">];<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">sum<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>In this post we will remedy a shortcoming of the traditional C assertion:<\/p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre 
class=\"highlight\"><code>Assertion `lhs.size() == rhs.size()' failed.\n<\/code><\/pre><\/div><\/div>\n\n<p>Okay, our assertion failed, our code (or assumption) is buggy.<\/p>\n\n<p>But what are the sizes of <code class=\"language-plaintext highlighter-rouge\">lhs<\/code> and <code class=\"language-plaintext highlighter-rouge\">rhs<\/code>?<\/p>\n\n<p>Test frameworks like <a href=\"https:\/\/github.com\/catchorg\/Catch2\">Catch2<\/a> or <a href=\"https:\/\/github.com\/onqtam\/doctest\">doctest<\/a> are (seemingly magically) able to display the values of <code class=\"language-plaintext highlighter-rouge\">lhs<\/code> and <code class=\"language-plaintext highlighter-rouge\">rhs<\/code> when their assertions \/ checking macros fail:<\/p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Example.cpp:7: FAILED:\n  REQUIRE( lhs.size() == rhs.size() )\nwith expansion:\n  100 == 300\n<\/code><\/pre><\/div><\/div>\n\n<p>Let\u2019s assume we have an assertion of the form <code class=\"language-plaintext highlighter-rouge\">ASSERT(a == b)<\/code>.<\/p>\n\n<p>The rest of this post explains how to display the values of <code class=\"language-plaintext highlighter-rouge\">a<\/code> and <code class=\"language-plaintext highlighter-rouge\">b<\/code>.<\/p>\n\n<blockquote>\n  <p>SPOILER: we\u2019re going to exploit operator precedence and break some macro hygiene. 
<code class=\"language-plaintext highlighter-rouge\">a == b<\/code> will be expanded to <code class=\"language-plaintext highlighter-rouge\">assert_t{} &lt; a == b<\/code>, which is then parsed as <code class=\"language-plaintext highlighter-rouge\">(assert_t{} &lt; a) == b<\/code>, allowing access to <code class=\"language-plaintext highlighter-rouge\">a<\/code> and <code class=\"language-plaintext highlighter-rouge\">b<\/code>.<\/p>\n<\/blockquote>\n\n<h2 id=\"typical-assertion-anatomy\">Typical Assertion Anatomy<\/h2>\n\n<p>Before we start destructuring the assertion expression, let\u2019s take a look at how assertions are typically implemented.\nA super naive version would be:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#define ASSERT(expr) if (!expr) \\\n                         on_assert_failed(#expr, __FILE__, __LINE__, __FUNCTION__);\n<\/span><\/code><\/pre><\/div><\/div>\n\n<p>However, looking at a <a href=\"https:\/\/github.com\/lattera\/glibc\/blob\/master\/assert\/assert.h#L89\">standard library implementation<\/a> we find something similar to<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">void<\/span> <span class=\"nf\">on_assert_failed<\/span><span class=\"p\">(<\/span><span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">expr<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">file<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">line<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">fun<\/span><span class=\"p\">);<\/span>\n\n<span class=\"cp\">#define ASSERT(expr) (static_cast&lt;bool&gt;(expr) ?                    
              \\\n                      void(0) :                                                  \\\n                      on_assert_failed(#expr, __FILE__, __LINE__, __FUNCTION__))\n<\/span><\/code><\/pre><\/div><\/div>\n\n<p>There is already something noteworthy going on here.\nMost of this is basic macro hygiene but it cannot hurt to repeat it.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">!expr<\/code> is dangerous as <code class=\"language-plaintext highlighter-rouge\">ASSERT(a == b)<\/code> would expand to <code class=\"language-plaintext highlighter-rouge\">if (!a == b)<\/code>.\nThe common fix of <code class=\"language-plaintext highlighter-rouge\">!(expr)<\/code> is slightly better but still dangerous as the additional parentheses silence warnings, e.g. for the typo in <code class=\"language-plaintext highlighter-rouge\">ASSERT(a = b)<\/code>.\nA better solution is <code class=\"language-plaintext highlighter-rouge\">!static_cast&lt;bool&gt;(expr)<\/code> which preserves most warnings.<\/p>\n\n<p>Secondly, in function-like macros, it is common courtesy to make them behave as if they were normal functions.\nOn the one hand, this means requiring a semicolon at the end.\nOn the other hand, it means that one should implement <code class=\"language-plaintext highlighter-rouge\">ASSERT<\/code> as an <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/expressions\">expression<\/a>, not a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/statements\">statement<\/a>.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ should error due to missing ;<\/span>\n<span class=\"n\">ASSERT<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span> <span class=\"o\">==<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> \n\n<span class=\"c1\">\/\/ should work as expected<\/span>\n<span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span 
class=\"n\">some_condition<\/span><span class=\"p\">)<\/span>\n    <span class=\"n\">ASSERT<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span> <span class=\"o\">==<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n<span class=\"k\">else<\/span>\n    <span class=\"nf\">ASSERT<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span> <span class=\"o\">!=<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Note that the <code class=\"language-plaintext highlighter-rouge\">else<\/code> would attach to the <code class=\"language-plaintext highlighter-rouge\">if (!expr)<\/code> of the naive version, NOT the expected <code class=\"language-plaintext highlighter-rouge\">if (some_condition)<\/code>.\nThis is the reason why expressions are preferred.\nFor the assertion the ternary operator <code class=\"language-plaintext highlighter-rouge\">cond ? true_expr : false_expr<\/code> is sufficient.\nIf you want to execute multiple statements, the <code class=\"language-plaintext highlighter-rouge\">do { ... statements ... } while(0)<\/code> construct is popular.\nIt is not an expression but at least it interacts properly with other control flow structures.<\/p>\n\n<h2 id=\"destructuring-simple-expressions\">Destructuring Simple Expressions<\/h2>\n\n<p>So, now that we know how a basic assertion works, how do we \u201canalyze\u201d the asserted expression to get <code class=\"language-plaintext highlighter-rouge\">a<\/code> and <code class=\"language-plaintext highlighter-rouge\">b<\/code> in <code class=\"language-plaintext highlighter-rouge\">ASSERT(a == b)<\/code>?\nThe metaprogramming capabilities of C++ do not allow us to inspect arbitrary expressions as, for example, <a href=\"https:\/\/hookrace.net\/blog\/introduction-to-metaprogramming-in-nim\/#macros\">Nim Macros<\/a> are able to. 
\nWhat can we do instead?<\/p>\n\n<p>If <code class=\"language-plaintext highlighter-rouge\">ASSERT<\/code> were a normal function, <code class=\"language-plaintext highlighter-rouge\">a == b<\/code> would be evaluated before calling the function and there would be no chance to get the values of <code class=\"language-plaintext highlighter-rouge\">a<\/code> and <code class=\"language-plaintext highlighter-rouge\">b<\/code>.\nHowever, we are in a macro setting where the expression is \u201cembedded\u201d into the macro body via token substitution.\nWhile we cannot change <code class=\"language-plaintext highlighter-rouge\">a == b<\/code>, we can control its surroundings.<\/p>\n\n<p>How does this help us?<\/p>\n\n<p>Our goal is to \u201csnatch\u201d <code class=\"language-plaintext highlighter-rouge\">a<\/code> from <code class=\"language-plaintext highlighter-rouge\">a == b<\/code>, store its value AND string representation, then compare against <code class=\"language-plaintext highlighter-rouge\">b<\/code>, while also storing <code class=\"language-plaintext highlighter-rouge\">b<\/code>\u2019s string representation.\nIf the comparison fails, we call the \u201cassertion failed\u201d handler while passing the representation of <code class=\"language-plaintext highlighter-rouge\">a<\/code> and <code class=\"language-plaintext highlighter-rouge\">b<\/code>.<\/p>\n\n<p>As spoiled earlier, we will exploit <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/operator_precedence\">operator precedence<\/a>.<\/p>\n\n<p>We are going to surround <code class=\"language-plaintext highlighter-rouge\">a == b<\/code> with <code class=\"language-plaintext highlighter-rouge\">assert_t{} OP a == b<\/code> where <code class=\"language-plaintext highlighter-rouge\">assert_t<\/code> is a helper type and <code class=\"language-plaintext highlighter-rouge\">OP<\/code> is our \u201csnatching\u201d operator.\nComparisons associate left-to-right, so <code class=\"language-plaintext 
highlighter-rouge\">OP<\/code> must have the same or higher precedence than the comparisons <code class=\"language-plaintext highlighter-rouge\">==, !=, &lt;, &lt;=, &gt;, &gt;=<\/code>.\nHowever, when its precedence is too high, it will interfere with more complex assertions such as <code class=\"language-plaintext highlighter-rouge\">ASSERT(a + b == c)<\/code> where we want to \u201csnatch\u201d <code class=\"language-plaintext highlighter-rouge\">a + b<\/code> and not only <code class=\"language-plaintext highlighter-rouge\">a<\/code>.\nLooking at the <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/operator_precedence\">precedence table<\/a>, this leaves us with the shift operators or <code class=\"language-plaintext highlighter-rouge\">&lt;, &lt;=, &gt;, &gt;=<\/code> (ignoring the C++20 <code class=\"language-plaintext highlighter-rouge\">&lt;=&gt;<\/code> spaceship).<\/p>\n\n<p>For no particular reason I\u2019ll continue with <code class=\"language-plaintext highlighter-rouge\">&lt;<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">void<\/span> <span class=\"nf\">set_assert_vars<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">b<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">string_view<\/span> <span class=\"n\">comp<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">on_assert_failed<\/span><span class=\"p\">(<\/span><span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">expr<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span 
class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">file<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">line<\/span><span class=\"p\">,<\/span> <span class=\"kt\">char<\/span> <span class=\"k\">const<\/span><span class=\"o\">*<\/span> <span class=\"n\">fun<\/span><span class=\"p\">);<\/span>\n\n<span class=\"cp\">#define ASSERT(expr) ((assert_t{} &lt; expr) ?                                       \\\n                       void(0) :                                                  \\\n                       on_assert_failed(#expr, __FILE__, __LINE__, __FUNCTION__))\n<\/span>\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">A<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">check_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">A<\/span> <span class=\"n\">a<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">B<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">==<\/span><span class=\"p\">(<\/span><span class=\"n\">B<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">a<\/span> <span class=\"o\">==<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span>\n            <span class=\"k\">return<\/span> <span class=\"nb\">true<\/span><span class=\"p\">;<\/span>\n\n        <span class=\"n\">set_assert_vars<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">to_string<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span><span class=\"p\">),<\/span> <span 
class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">to_string<\/span><span class=\"p\">(<\/span><span class=\"n\">b<\/span><span class=\"p\">),<\/span> <span class=\"s\">\"==\"<\/span><span class=\"p\">);<\/span>\n        <span class=\"k\">return<\/span> <span class=\"nb\">false<\/span><span class=\"p\">;<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">assert_t<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">A<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"n\">check_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">A<\/span><span class=\"o\">&gt;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">A<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">a<\/span><span class=\"p\">)<\/span>\n    <span class=\"p\">{<\/span>\n        <span class=\"c1\">\/\/ this code prevents copies<\/span>\n        <span class=\"c1\">\/\/ if a is an lvalue ref, A is also an lvalue ref, e.g. 
int&amp;<\/span>\n        <span class=\"c1\">\/\/ if a is an rvalue ref, A is not a ref and a is moved into check_t<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">check_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">A<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">{<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">forward<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">A<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span><span class=\"p\">)};<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Consider for example <code class=\"language-plaintext highlighter-rouge\">ASSERT(1 + 1 == 3);<\/code>.\nInside the macro, this expands to the condition <code class=\"language-plaintext highlighter-rouge\">assert_t{} &lt; 1 + 1 == 3<\/code>, which is parsed as <code class=\"language-plaintext highlighter-rouge\">(assert_t{} &lt; 1 + 1) == 3<\/code>.\nThis calls <code class=\"language-plaintext highlighter-rouge\">operator&lt;<\/code> of <code class=\"language-plaintext highlighter-rouge\">assert_t<\/code>, returning a <code class=\"language-plaintext highlighter-rouge\">check_t&lt;int&gt;<\/code> with member <code class=\"language-plaintext highlighter-rouge\">a<\/code> set to <code class=\"language-plaintext highlighter-rouge\">2<\/code>.\n<code class=\"language-plaintext highlighter-rouge\">check_t<\/code> in turn has an <code class=\"language-plaintext highlighter-rouge\">operator==<\/code> that is called with <code class=\"language-plaintext highlighter-rouge\">3<\/code> as its right-hand side.\nThe comparison <code class=\"language-plaintext highlighter-rouge\">if (a == b)<\/code> fails, at which point <code class=\"language-plaintext highlighter-rouge\">set_assert_vars<\/code> is called.\nOnly now are <code class=\"language-plaintext highlighter-rouge\">a<\/code> and <code class=\"language-plaintext 
highlighter-rouge\">b<\/code> converted to strings.\nThis is important because <code class=\"language-plaintext highlighter-rouge\">to_string<\/code> is kinda expensive and we don\u2019t want to slow down runtime performance when the assertion is not failing.\nWe know that the \u201cassertion failed\u201d handler will be called immediately afterwards, so <code class=\"language-plaintext highlighter-rouge\">set_assert_vars<\/code> can simply store copies of its arguments in <code class=\"language-plaintext highlighter-rouge\">thread_local<\/code> global variables that the handler will then display (copies, because the <code class=\"language-plaintext highlighter-rouge\">string_view<\/code> parameters refer to temporary strings that are gone by the time the handler runs).<\/p>\n\n<p>See <a href=\"https:\/\/godbolt.org\/z\/MsGvM3\">here<\/a> for a fully working example.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">ASSERT<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span> <span class=\"o\">+<\/span> <span class=\"mi\">1<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">3<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>assertion '1 + 1 == 3' failed\n  in .\/example.cpp:55 (main)\n  expansion: 2 == 3\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"next-steps\">Next Steps<\/h2>\n\n<p>To make this production-ready I would recommend the following:<\/p>\n\n<ul>\n  <li>add all desired comparison operators to <code class=\"language-plaintext highlighter-rouge\">check_t<\/code><\/li>\n  <li>add an <code class=\"language-plaintext highlighter-rouge\">operator bool()<\/code> to <code class=\"language-plaintext highlighter-rouge\">check_t<\/code> to support <code class=\"language-plaintext highlighter-rouge\">ASSERT(some_bool)<\/code><\/li>\n  <li>add <code class=\"language-plaintext highlighter-rouge\">static_assert<\/code>s to <code class=\"language-plaintext highlighter-rouge\">check_t<\/code> that check if the comparison between <code 
class=\"language-plaintext highlighter-rouge\">A<\/code> and <code class=\"language-plaintext highlighter-rouge\">B<\/code> actually works (nicer compile errors)<\/li>\n  <li>move <code class=\"language-plaintext highlighter-rouge\">assert_t<\/code> and <code class=\"language-plaintext highlighter-rouge\">check_t<\/code> into \u201c<code class=\"language-plaintext highlighter-rouge\">detail::<\/code>\u201d or \u201c<code class=\"language-plaintext highlighter-rouge\">impl::<\/code>\u201d scopes<\/li>\n  <li>write a user-extensible version of <code class=\"language-plaintext highlighter-rouge\">std::to_string<\/code> so that user types can register their own formatter<\/li>\n  <li>allow types without <code class=\"language-plaintext highlighter-rouge\">to_string<\/code> (e.g. print <code class=\"language-plaintext highlighter-rouge\">???<\/code>)<\/li>\n  <li>try to remove the dependency on <code class=\"language-plaintext highlighter-rouge\">&lt;string&gt;<\/code> as this is a rather expensive header (for example, the custom <code class=\"language-plaintext highlighter-rouge\">to_string<\/code> might return a <code class=\"language-plaintext highlighter-rouge\">char const*<\/code> that was allocated via <code class=\"language-plaintext highlighter-rouge\">new[]<\/code> and is <code class=\"language-plaintext highlighter-rouge\">delete[]<\/code>d by <code class=\"language-plaintext highlighter-rouge\">on_assert_failed<\/code>; this is not performance critical)<\/li>\n  <li>add <code class=\"language-plaintext highlighter-rouge\">operator&amp;&amp;<\/code> and <code class=\"language-plaintext highlighter-rouge\">operator||<\/code> to <code class=\"language-plaintext highlighter-rouge\">check_t<\/code> and <code class=\"language-plaintext highlighter-rouge\">assert_t<\/code> that cause <code class=\"language-plaintext highlighter-rouge\">static_assert<\/code> failures (we cannot destructure chained expressions so this should be forbidden. 
There is an escape hatch via <code class=\"language-plaintext highlighter-rouge\">ASSERT((a || b))<\/code> without destructuring)<\/li>\n  <li>only store a reference in <code class=\"language-plaintext highlighter-rouge\">check_t<\/code> so that types need not even be movable (lifetime is fine as the reference doesn\u2019t outlive the assert expression)<\/li>\n  <li>add an optional general message to the assertion, supporting a format-like syntax (e.g. <code class=\"language-plaintext highlighter-rouge\">ASSERTF(a == f(b), \"xyz is not fulfilled and b is {}\", b);<\/code>)<\/li>\n  <li>proper integration with logging, stack traces, custom assert handlers<\/li>\n  <li>optimize the performance so that assertions can also be enabled in <code class=\"language-plaintext highlighter-rouge\">Release with Debug Info<\/code> mode (or even <code class=\"language-plaintext highlighter-rouge\">Release<\/code>) with minimal runtime impact<\/li>\n<\/ul>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>While the metaprogramming capabilities of C++ are not expressive enough to analyze expression ASTs, we nevertheless can achieve simple \u201cdestructuring\u201d of comparisons to implement assertions that report the compared values:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">a<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">b<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">2<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">c<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">2<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">d<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">3<\/span><span class=\"p\">;<\/span>\n\n<span class=\"n\">ASSERT<\/span><span class=\"p\">(<\/span><span 
class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span> <span class=\"o\">==<\/span> <span class=\"n\">c<\/span> <span class=\"o\">+<\/span> <span class=\"n\">d<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ assertion 'a + b == c + d' failed<\/span>\n<span class=\"c1\">\/\/   in .\/example.cpp:55 (main)<\/span>\n<span class=\"c1\">\/\/   expansion: 3 == 5<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This works by deliberately breaking macro hygiene (<code class=\"language-plaintext highlighter-rouge\">expr<\/code> is left unparenthesized) and using operator precedence to \u201csnatch\u201d the compared value before it is actually compared.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">ASSERT<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span> <span class=\"o\">==<\/span> <span class=\"n\">c<\/span> <span class=\"o\">+<\/span> <span class=\"n\">d<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ is expanded to<\/span>\n<span class=\"p\">((<\/span><span class=\"n\">assert_t<\/span><span class=\"p\">{}<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span> <span class=\"o\">==<\/span> <span class=\"n\">c<\/span> <span class=\"o\">+<\/span> <span class=\"n\">d<\/span><span class=\"p\">)<\/span> <span class=\"o\">?<\/span> <span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">)<\/span> <span class=\"o\">:<\/span>\n                                 <span class=\"n\">on_assert_failed<\/span><span class=\"p\">(...));<\/span>\n\n<span class=\"c1\">\/\/ is parsed as<\/span>\n<span class=\"p\">(((<\/span><span class=\"n\">assert_t<\/span><span class=\"p\">{}<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"o\">==<\/span> <span class=\"n\">c<\/span> <span 
class=\"o\">+<\/span> <span class=\"n\">d<\/span><span class=\"p\">)<\/span> <span class=\"o\">?<\/span> <span class=\"kt\">void<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">)<\/span> <span class=\"o\">:<\/span>\n                                   <span class=\"n\">on_assert_failed<\/span><span class=\"p\">(...));<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Additional discussion and comments on <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/ivql73\/destructuring_assertions\/\">reddit<\/a>.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/photos\/blueprint-ruler-architecture-964630\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"Recursive Lambdas in C++","description":"Ever wondered how to make our beloved [](){}s call themselves?","pubDate":"Sat, 12 Sep 2020 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/09\/12\/recursive-lambdas","guid":"https:\/\/artificial-mind.net\/blog\/2020\/09\/12\/recursive-lambdas","content":"<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">fib<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span 
class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"mi\">7<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>If only it were that simple.<\/p>\n\n<p>Obviously, any performance-conscious programmer will compute Fibonacci numbers iteratively (or even <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fibonacci_number#Closed-form_expression\">explicitly<\/a>), but this solution will serve as an example for an underappreciated tool: <em>recursive lambdas<\/em>.<\/p>\n\n<p>Lambdas are one of my favorite features in any programming language and while I long for a <a href=\"http:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2017\/p0573r1.html\">shorter syntax in C++<\/a>, I still use them quite ubiquitously, especially for local functions.\nThey allow us to abstract behavior into a function while still accessing local variables (through captures) and without leaking new names into the surrounding namespace.\nWhile already plenty powerful, sometimes we might want to call a lambda recursively.<\/p>\n\n<p>The Fibonacci sequence is an artificial example but I have encountered plenty of scenarios where you just want to traverse some recursive data structure real quick and a recursive lambda would have been the best solution.\nBut alas, the above example does not compile because the name <code class=\"language-plaintext highlighter-rouge\">fib<\/code> is not accessible within the lambda.<\/p>\n\n<blockquote>\n  <p>It\u2019s funny how the <code class=\"language-plaintext highlighter-rouge\">x<\/code> in <code class=\"language-plaintext highlighter-rouge\">int x = x + 1;<\/code> refers to the newly declared variable and is basically never what you want but the <code class=\"language-plaintext highlighter-rouge\">fib<\/code> in our example does not refer to the declared lambda even though it is exactly what 
we want.<\/p>\n<\/blockquote>\n\n<h2 id=\"a-suboptimal-solution\">A Suboptimal Solution<\/h2>\n\n<p>Before we get to the good stuff, let\u2019s examine a common, yet unsatisfactory solution first:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;functional&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">function<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">fib<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">fib<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Essentially, by declaring <code class=\"language-plaintext highlighter-rouge\">fib<\/code> beforehand, we are able to reference it inside the lambda.\nHowever, <code class=\"language-plaintext highlighter-rouge\">fib<\/code> now requires an explicit type and as 
each lambda expression has its own compiler-generated type, you\u2019ll have a hard time naming it (it\u2019s a kind of <a href=\"http:\/\/videocortex.io\/2017\/Bestiary\/#-voldemort-types\">Voldemort Type<\/a>).\nInstead, an <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> is often the go-to type to store lambdas.<\/p>\n\n<p>So, why do I consider this solution inferior?<\/p>\n\n<ul>\n  <li>first of all, look at <a href=\"https:\/\/godbolt.org\/z\/fTTj7r\">the assembly<\/a>! A monster, compared to <a href=\"https:\/\/godbolt.org\/z\/3E5anW\">a normal recursive function<\/a><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">std::function<\/code> is type erased and often allocates (though some standard libraries perform <em>small function optimization<\/em> and don\u2019t allocate if the size of the lambda is small, i.e. it doesn\u2019t capture too much)<\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;functional&gt;<\/code> is a big and costly header, basically <a href=\"https:\/\/artificial-mind.net\/projects\/compile-health\/\">costing 200ms+<\/a> just to include it<\/li>\n  <li>it cannot be made <code class=\"language-plaintext highlighter-rouge\">constexpr<\/code><\/li>\n  <li>it requires writing the function signature twice<\/li>\n<\/ul>\n\n<h2 id=\"generic-lambdas-to-the-rescue\">Generic Lambdas to the Rescue??<\/h2>\n\n<p>Let me present my preferred solution:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">fib<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">fib<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span 
class=\"n\">n<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"mi\">7<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Oof. A generic lambda? Templates? 
Calling <code class=\"language-plaintext highlighter-rouge\">fib<\/code> with itself?<\/p>\n\n<p>Let me explain!<\/p>\n\n<p>So, the problem with our opening example was that <code class=\"language-plaintext highlighter-rouge\">fib<\/code> is not a visible name inside the lambda.\nWe simply remedy that by passing <code class=\"language-plaintext highlighter-rouge\">fib<\/code> as an additional parameter.\nOf course, we don\u2019t know the type of <code class=\"language-plaintext highlighter-rouge\">fib<\/code> yet, so we use <code class=\"language-plaintext highlighter-rouge\">auto&amp;&amp;<\/code> and turn it into a generic lambda.\nAlso, no, <code class=\"language-plaintext highlighter-rouge\">decltype(fib)&amp;&amp;<\/code> wouldn\u2019t work.\nIf we could access <code class=\"language-plaintext highlighter-rouge\">fib<\/code>, we wouldn\u2019t have this problem in the first place!\nFinally, because we now have an additional parameter, we have to pass <code class=\"language-plaintext highlighter-rouge\">fib<\/code> to itself every time we call it.<\/p>\n\n<p>This solution has none of the disadvantages of the previous solution.\nCompared to <a href=\"https:\/\/godbolt.org\/z\/3E5anW\">a normal recursive function<\/a>, we have <a href=\"https:\/\/godbolt.org\/z\/fG3TdW\">one additional jump in the assembly<\/a> and of course the slight syntactic inconvenience of having to pass an additional parameter.<\/p>\n\n<p>If you use the recursive lambda many times in the remainder of the function you can simply wrap it again to make the call more natural:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span 
class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span> <span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"mi\">7<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This still produces <a href=\"https:\/\/godbolt.org\/z\/86WdEr\">good assembly<\/a>.<\/p>\n\n<h2 id=\"desugaring-the-lambda\">Desugaring the Lambda<\/h2>\n\n<p>Okay, okay, I get it.\nThis might still be too much magic to fully comprehend how the lambda works.\nIs it instantiated for every recursion depth?\nHow would this work with arbitrarily deep recursion?\nSomething is not making sense here.<\/p>\n\n<p>A step back.<\/p>\n\n<p>Lambdas are not a magical feature.\nThey are simply syntactic sugar for a local <code class=\"language-plaintext highlighter-rouge\">struct<\/code> that has an <code class=\"language-plaintext highlighter-rouge\">operator()<\/code> and each capture as a member (capturing by reference creates reference members):<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">k<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">7<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">k<\/span><span class=\"p\">](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span> <span class=\"o\">+<\/span> <span class=\"n\">k<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span>\n<span class=\"k\">return<\/span> <span class=\"n\">f<\/span><span 
class=\"p\">(<\/span><span class=\"mi\">3<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>is basically equivalent to:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">k<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">7<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">lambda_obj<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">k<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ captured by value<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"k\">operator<\/span><span class=\"p\">()(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span> <span class=\"o\">+<\/span> <span class=\"n\">k<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">lambda_obj<\/span><span class=\"p\">{<\/span><span class=\"n\">k<\/span><span class=\"p\">};<\/span>\n<span class=\"k\">return<\/span> <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"mi\">3<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Our <em>recursive lambda<\/em> is a bit more complex, but not much.\nGeneric lambdas simply have a templated <code class=\"language-plaintext highlighter-rouge\">operator()<\/code>, the rest is the same:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">fib<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span 
class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">fib<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"mi\">7<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>is basically equivalent to:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">lambda_obj<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">&gt;<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"k\">operator<\/span><span class=\"p\">()(<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">fib<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span>\n    
<span class=\"p\">{<\/span>\n        <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">fib<\/span> <span class=\"o\">=<\/span> <span class=\"n\">lambda_obj<\/span><span class=\"p\">{};<\/span> <span class=\"c1\">\/\/ no capture<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"mi\">7<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The only reason you cannot write this by hand is that templates are forbidden at function scope: a local class cannot have member templates.\nGeneric lambdas have a special exemption from that rule.<\/p>\n\n<p>This also answers the question about infinite instantiation:\nThe only template that is instantiated is the templated function <code class=\"language-plaintext highlighter-rouge\">lambda_obj::operator()<\/code> and its only instantiation is <code class=\"language-plaintext highlighter-rouge\">int lambda_obj::operator()&lt;lambda_obj&amp;&gt;(int n, lambda_obj&amp; fib) const<\/code>.\nCalling <code 
class=\"language-plaintext highlighter-rouge\">fib<\/code> inside this function is actually the same instantiation! (<code class=\"language-plaintext highlighter-rouge\">fib<\/code> still has the type <code class=\"language-plaintext highlighter-rouge\">lambda_obj&amp;<\/code>)<\/p>\n\n<h2 id=\"another-example-tree-recursion\">Another Example: Tree Recursion<\/h2>\n\n<p>Okay, that\u2019s cool and all, but how does it help in real life?<\/p>\n\n<p>Let\u2019s say we have a simple recursive data structure, for example a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Binary_space_partitioning\">BSP tree<\/a> whose nodes are stored flat in an <code class=\"language-plaintext highlighter-rouge\">std::vector<\/code> (or some other contiguous container) for memory efficiency:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">node<\/span> <span class=\"c1\">\/\/ only represents inner nodes<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ dividing plane<\/span>\n    <span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">vec3<\/span> <span class=\"n\">plane_normal<\/span><span class=\"p\">;<\/span>\n    <span class=\"kt\">float<\/span> <span class=\"n\">plane_distance<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"c1\">\/\/ idx for child on positive \/ negative side<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">child_pos<\/span><span class=\"p\">;<\/span>\n    <span class=\"kt\">int<\/span> <span class=\"n\">child_neg<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"kt\">bool<\/span> <span class=\"n\">is_on_positive_side<\/span><span class=\"p\">(<\/span><span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">pos3<\/span> <span class=\"n\">p<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> \n    <span class=\"p\">{<\/span> \n        <span class=\"k\">return<\/span> <span 
class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"p\">,<\/span> <span class=\"n\">plane_normal<\/span><span class=\"p\">)<\/span> <span class=\"o\">&gt;<\/span> <span class=\"n\">plane_distance<\/span><span class=\"p\">;<\/span> \n    <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The two members <code class=\"language-plaintext highlighter-rouge\">child_pos<\/code> and <code class=\"language-plaintext highlighter-rouge\">child_neg<\/code> store the topological information of the tree.\nIf they are positive, they point to another inner node.\nIf they are negative, they point into leaf data (leaf index <code class=\"language-plaintext highlighter-rouge\">i<\/code> is stored as <code class=\"language-plaintext highlighter-rouge\">-i - 1<\/code>).<\/p>\n\n<h3 id=\"point-queries\">Point Queries<\/h3>\n\n<p>The first example function is a <em>point query<\/em>, i.e. given a 3D position, return the data stored in the leaf cell:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">LeafT<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"n\">LeafT<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">get_data_at<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">span<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">node<\/span> <span class=\"k\">const<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">span<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">LeafT<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">leaf_data<\/span><span class=\"p\">,<\/span> <span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">pos3<\/span> <span class=\"n\">p<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    
<span class=\"k\">auto<\/span> <span class=\"n\">recurse<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">node_idx<\/span><span class=\"p\">,<\/span> <span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">)<\/span> <span class=\"o\">-&gt;<\/span> <span class=\"n\">LeafT<\/span><span class=\"o\">&amp;<\/span> <span class=\"p\">{<\/span>\n        <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">node_idx<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">)<\/span> <span class=\"c1\">\/\/ leaf node<\/span>\n            <span class=\"k\">return<\/span> <span class=\"n\">leaf_data<\/span><span class=\"p\">[<\/span><span class=\"o\">-<\/span><span class=\"n\">node_idx<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">];<\/span>\n\n        <span class=\"c1\">\/\/ visit proper child<\/span>\n        <span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">n<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"n\">node_idx<\/span><span class=\"p\">];<\/span>\n        <span class=\"k\">return<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">.<\/span><span class=\"n\">is_on_positive_side<\/span><span class=\"p\">(<\/span><span class=\"n\">p<\/span><span class=\"p\">)<\/span> <span class=\"o\">?<\/span> <span class=\"n\">n<\/span><span class=\"p\">.<\/span><span class=\"n\">child_pos<\/span> <span class=\"o\">:<\/span> <span class=\"n\">n<\/span><span class=\"p\">.<\/span><span class=\"n\">child_neg<\/span><span class=\"p\">,<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">);<\/span>\n    <span class=\"p\">};<\/span>\n    <span class=\"k\">return<\/span> <span 
class=\"n\">recurse<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ usage:<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">node<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">nodes<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">data<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">pos3<\/span> <span class=\"n\">query_pos<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n\n<span class=\"c1\">\/\/ NOTE: template arg cannot be deduced <\/span>\n<span class=\"c1\">\/\/      (because the compiler does not know vector&lt;float&gt; corresponds to span&lt;float&gt;)<\/span>\n<span class=\"k\">auto<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">d<\/span> <span class=\"o\">=<\/span> <span class=\"n\">get_data_at<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">,<\/span> <span class=\"n\">data<\/span><span class=\"p\">,<\/span> <span class=\"n\">query_pos<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h3 id=\"visitor--internal-iteration\">Visitor \/ Internal Iteration<\/h3>\n\n<p>The second example is a generic traversal operator that takes a direction and a callback.\nThe callback function is called for all leaf indices ordered ascendingly by the given direction.\nThis is for example useful to implement the <a 
href=\"https:\/\/en.wikipedia.org\/wiki\/Painter%27s_algorithm\">painter\u2019s algorithm<\/a> with render jobs stored in the BSP.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">\/\/ callback signature: (int leaf_idx) -&gt; void<\/span>\n<span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">visit_in_direction<\/span><span class=\"p\">(<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">span<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">node<\/span> <span class=\"k\">const<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">,<\/span> <span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">vec3<\/span> <span class=\"n\">dir<\/span><span class=\"p\">,<\/span> <span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">callback<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"n\">recurse<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">node_idx<\/span><span class=\"p\">,<\/span> <span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">)<\/span> <span class=\"o\">-&gt;<\/span> <span class=\"kt\">void<\/span> <span class=\"p\">{<\/span>\n        <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">node_idx<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">)<\/span> <span class=\"c1\">\/\/ leaf node<\/span>\n        <span class=\"p\">{<\/span>\n            <span class=\"n\">callback<\/span><span class=\"p\">(<\/span><span 
class=\"o\">-<\/span><span class=\"n\">node_idx<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">);<\/span>\n            <span class=\"k\">return<\/span><span class=\"p\">;<\/span>\n        <span class=\"p\">}<\/span>\n\n        <span class=\"k\">auto<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span> <span class=\"n\">n<\/span> <span class=\"o\">=<\/span> <span class=\"n\">nodes<\/span><span class=\"p\">[<\/span><span class=\"n\">node_idx<\/span><span class=\"p\">];<\/span>\n        <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">dot<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">.<\/span><span class=\"n\">plane_normal<\/span><span class=\"p\">,<\/span> <span class=\"n\">dir<\/span><span class=\"p\">)<\/span> <span class=\"o\">&gt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">)<\/span> <span class=\"c1\">\/\/ points in same direction<\/span>\n        <span class=\"p\">{<\/span>\n            <span class=\"n\">recurse<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">.<\/span><span class=\"n\">child_neg<\/span><span class=\"p\">,<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">);<\/span>\n            <span class=\"n\">recurse<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">.<\/span><span class=\"n\">child_pos<\/span><span class=\"p\">,<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">);<\/span>\n        <span class=\"p\">}<\/span>\n        <span class=\"k\">else<\/span> <span class=\"c1\">\/\/ points in different direction<\/span>\n        <span class=\"p\">{<\/span>\n            <span class=\"n\">recurse<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span><span class=\"p\">.<\/span><span class=\"n\">child_pos<\/span><span class=\"p\">,<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">);<\/span>\n            <span class=\"n\">recurse<\/span><span class=\"p\">(<\/span><span 
class=\"n\">n<\/span><span class=\"p\">.<\/span><span class=\"n\">child_neg<\/span><span class=\"p\">,<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">);<\/span>\n        <span class=\"p\">}<\/span>\n    <span class=\"p\">};<\/span>\n    <span class=\"n\">recurse<\/span><span class=\"p\">(<\/span><span class=\"mi\">0<\/span><span class=\"p\">,<\/span> <span class=\"n\">recurse<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ usage:<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">vector<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">node<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">nodes<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n<span class=\"n\">tg<\/span><span class=\"o\">::<\/span><span class=\"n\">vec3<\/span> <span class=\"n\">view_dir<\/span> <span class=\"o\">=<\/span> <span class=\"p\">...;<\/span>\n\n<span class=\"n\">visit_in_direction<\/span><span class=\"p\">(<\/span><span class=\"n\">nodes<\/span><span class=\"p\">,<\/span> <span class=\"n\">view_dir<\/span><span class=\"p\">,<\/span> <span class=\"p\">[<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">leaf_idx<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ render \/ process leaf_idx<\/span>\n<span class=\"p\">});<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Note: the trailing return type <code class=\"language-plaintext highlighter-rouge\">-&gt; void<\/code> seems to be mandatory here, otherwise my clang complains that it cannot deduce the return type.<\/p>\n\n<h2 id=\"conclusion\">Conclusion<\/h2>\n\n<p>\u2026 or rather a late TL;DR?<\/p>\n\n<p>Our goal was to make the following <em>recursive lambda<\/em> work:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span 
class=\"n\">fib<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"mi\">7<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>While this is not directly possible, we can get really close by just adding a parameter!<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"n\">fib<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">,<\/span> <span class=\"k\">auto<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">fib<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span 
class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"mi\">7<\/span><span class=\"p\">,<\/span> <span class=\"n\">fib<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The recipe is simple:\nIf you want to call a lambda recursively, just add an <code class=\"language-plaintext highlighter-rouge\">auto&amp;&amp;<\/code> parameter taking the function again and call that.\nThis produces basically optimal assembly and can be used in combination with capturing.<\/p>\n\n<p>Additional discussion and comments on <a href=\"https:\/\/www.reddit.com\/r\/cpp\/comments\/irupel\/recursive_lambdas_in_c\/\">reddit<\/a>.<\/p>\n\n<h3 id=\"update-2020-09-13\">Update 2020-09-13:<\/h3>\n\n<p>If the lambda does not capture anything, it can be declared <code class=\"language-plaintext highlighter-rouge\">static<\/code> and the following works:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">using<\/span> <span class=\"n\">fib_t<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">int<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)(<\/span><span class=\"kt\">int<\/span><span class=\"p\">);<\/span>\n<span class=\"k\">static<\/span> <span class=\"n\">fib_t<\/span> <span class=\"n\">fib<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span 
class=\"kt\">int<\/span> <span class=\"n\">n<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span>\n    <span class=\"k\">if<\/span> <span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">&lt;=<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"k\">return<\/span> <span class=\"n\">n<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">1<\/span><span class=\"p\">)<\/span> <span class=\"o\">+<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"n\">n<\/span> <span class=\"o\">-<\/span> <span class=\"mi\">2<\/span><span class=\"p\">);<\/span>\n<span class=\"p\">};<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"n\">fib<\/span><span class=\"p\">(<\/span><span class=\"mi\">7<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Note that <code class=\"language-plaintext highlighter-rouge\">auto<\/code> does not work here because the compiler needs to know the type of <code class=\"language-plaintext highlighter-rouge\">fib<\/code> before calling it.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/pixabay.com\/illustrations\/menger-fractal-design-cube-702863\/\">pixabay<\/a><\/em>)<\/p>"},{"title":"C++ Compile Health Watchdog","description":"Ever wondered what is slowing down your C++ builds?","pubDate":"Thu, 16 Apr 2020 00:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2020\/04\/16\/cpp-compile-health","guid":"https:\/\/artificial-mind.net\/blog\/2020\/04\/16\/cpp-compile-health","content":"<p>So anyways, why is my build so slow?<\/p>\n\n<p><strong>TL;DR:<\/strong> I made a <a href=\"\/projects\/compile-health\">website<\/a> to check the build impact of standard or external libraries. 
This post describes the motivation and methodology behind it.<\/p>\n\n<p><strong>NOTE:<\/strong> This is <em>NOT<\/em> a public shaming of C++ libraries. I wanted this website to help me make sensible decisions and keep myself accountable. \nThe <a href=\"https:\/\/isocpp.org\/blog\/2020\/04\/results-summary-2020-global-developer-survey-lite\">latest isocpp survey results<\/a> indicate that I\u2019m not the only one caring about compile times.<\/p>\n\n<h2 id=\"easter-project-2020\">Easter Project 2020<\/h2>\n\n<p>My C++ life has changed dramatically ever since I discovered <a href=\"https:\/\/github.com\/yrnkrn\/zapcc\">zapcc<\/a> and got my incremental build times to below 1 second.\nI\u2019ve developed a kind of fetish for fast builds and sometimes rewrote or wrapped several external libraries purely to get the compile times under control.<\/p>\n\n<p>As the old saying goes, there are really only three ways to optimize anything: <em>measure<\/em>, <em>measure<\/em>, and <em>measure<\/em>.<\/p>\n\n<p>Thus, over the Great Easter Lockdown 2020 I set out and created the <a href=\"\/projects\/compile-health\">C++ Compile Health Watchdog<\/a> website to help us make informed decisions about which headers to include publicly, which to wrap, and maybe tip the scale in the choice of a library.\nThe code for creating the data that feeds this table can be found in <a href=\"https:\/\/github.com\/Philip-Trettner\/cpp-compile-overhead\">my github repo<\/a>, contributions are more than welcome!<\/p>\n\n<p>The rest of this post explains my methodology in detail:<\/p>\n\n<h2 id=\"the-impact-of-a-header\">The Impact of a Header<\/h2>\n\n<p>Each row in the table is either a single header or source file, compiled with a certain configuration (compiler, build type, C++ version).\nA <a href=\"https:\/\/github.com\/Philip-Trettner\/cpp-compile-overhead\/blob\/master\/scripts\/analyze-file.py\">python script<\/a> is responsible for running various tests and extracting the metrics we care 
about.<\/p>\n\n<p>I\u2019m using the compiler (clang and gcc for now) in two modes:<\/p>\n\n<ul>\n  <li>Compile: <code class=\"language-plaintext highlighter-rouge\">-c main.cc -o main.o<\/code> (+ extra args)<\/li>\n  <li>Preprocess: <code class=\"language-plaintext highlighter-rouge\">-E main.cc -o main.o<\/code> (+ extra args)<\/li>\n<\/ul>\n\n<p>The extra args depend on the configuration, for example Clang 8 (<code class=\"language-plaintext highlighter-rouge\">\/usr\/bin\/clang++-8<\/code>) for C++17 in RelWithDebInfo results in <code class=\"language-plaintext highlighter-rouge\">-std=c++17 -O2 -g -DNDEBUG -march=skylake<\/code>. I\u2019ve added <code class=\"language-plaintext highlighter-rouge\">-march<\/code> mainly for consistency.\nThe <code class=\"language-plaintext highlighter-rouge\">main.cc<\/code> we\u2019re compiling looks like this:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;file-to-test&gt;<\/span><span class=\"cp\">\n<\/span><span class=\"kt\">int<\/span> <span class=\"nf\">main<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>I\u2019m also testing a <code class=\"language-plaintext highlighter-rouge\">baseline.cc<\/code> without the include.<\/p>\n\n<p>First, we\u2019re counting the <em>Lines of Code<\/em> in the preprocess-only file to get two metrics: <code class=\"language-plaintext highlighter-rouge\">line_count_raw<\/code> (all lines) and <code class=\"language-plaintext highlighter-rouge\">line_count<\/code> (only those lines containing <code class=\"language-plaintext highlighter-rouge\">[a-zA-Z0-9_]<\/code>).<\/p>\n\n<p>The file size of the compiled <code class=\"language-plaintext highlighter-rouge\">main.o<\/code> gives us <code 
class=\"language-plaintext highlighter-rouge\">object_size<\/code> (with a <code class=\"language-plaintext highlighter-rouge\">_base<\/code> version for the baseline).\nWe continue by invoking <code class=\"language-plaintext highlighter-rouge\">nm -a -S main.o<\/code> to collect information about the contained symbols (number and size of undefined, data, code, weak, and debug symbols as well as the size of the <em>names<\/em> of the symbols).\nThen we invoke <code class=\"language-plaintext highlighter-rouge\">strings main.o<\/code> to get the number and size of all contained strings, followed by <code class=\"language-plaintext highlighter-rouge\">size -B main.o<\/code> to get the size of the <code class=\"language-plaintext highlighter-rouge\">.text<\/code>, <code class=\"language-plaintext highlighter-rouge\">.data<\/code>, and <code class=\"language-plaintext highlighter-rouge\">.bss<\/code> sections.\nBig binaries slow down the build process by increasing the load on the linker.\n\u201cSymbol explosions\u201d can happen relatively easily by accident.<\/p>\n\n<p>Finally, we measure how long the compile and the preprocessing commands take for <code class=\"language-plaintext highlighter-rouge\">main.cc<\/code> and <code class=\"language-plaintext highlighter-rouge\">baseline.cc<\/code>.\nI\u2019ve experimented with a few different approaches and my current local optimum for accuracy, reproducibility, and sane data generation time is:\nExecute each command 10 times and take the lowest elapsed wall-clock time.\nCompiling is more or less deterministic so higher times are likely due to OS interruptions and other processes.\nThe whole benchmark runs on a single thread as multi-core compilation will likely skew the results.\nI\u2019m currently testing 36 configurations per file and some files can take pretty long.\nRelative accuracy is probably already good enough for slower files so I only execute the commands 3 times if the file takes more than 500 ms to 
compile.\nFor reference: 36 configurations, compiled 10 times each for a file that takes 1s is at least 6 minutes benchmarking for a single file!<\/p>\n\n<h3 id=\"impact-score\">Impact Score<\/h3>\n\n<p>On the <a href=\"\/projects\/compile-health\">website<\/a>, all the measured metrics are available by hovering over the table cells.\nHowever, I also wanted a single visible number that summarizes the \u201cimpact\u201d of including a given header.\nI call this the <em>Impact Score<\/em> <code class=\"language-plaintext highlighter-rouge\">S<\/code> and it is, of course, highly opinionated.\nIt is based on the increase in compile time <code class=\"language-plaintext highlighter-rouge\">t<\/code> and the increase in binary size <code class=\"language-plaintext highlighter-rouge\">s<\/code>.<\/p>\n\n<p>\\[ S_{\\text{time}} = 100 \\cdot \\left( 1 - \\frac{200\\ \\text{ms}}{200\\ \\text{ms} + t} \\right) \\]\n\\[ S_{\\text{binary}} = 100 \\cdot \\left( 1 - \\frac{5\\ \\text{MB}}{5\\ \\text{MB} + s} \\right) \\]\n\\[ S = \\min(100, S_{\\text{time}} + S_{\\text{binary}}) \\]<\/p>\n\n<p>Each score is designed to be between 0 (zero compile time or binary increase) and 100 (compile time or binary increase goes to infinity).\n50% impact is reached at 200 ms or 5 MB.<\/p>\n\n<h2 id=\"roadmap\">Roadmap<\/h2>\n\n<p>This project is not finished and I might broaden its scope considerably.<\/p>\n\n<p>There are a few features that I want to implement for the website, like filtering configurations and uploading your own data.\nIt would be nice if the data generator can directly use a JSON Compilation Database (e.g. 
via <code class=\"language-plaintext highlighter-rouge\">CMAKE_EXPORT_COMPILE_COMMANDS<\/code>) to analyze your own projects without hassle.\nAnd of course I need to add Visual Studio to the mix.<\/p>\n\n<p>There are also things <a href=\"https:\/\/github.com\/Philip-Trettner\/cpp-compile-overhead#contributing\">you can help me with<\/a>.\nLet me know which libraries I should add to the table!\nIs there additional data you would like to see for each compilation?<\/p>"},{"title":"PyTorch Setup (C++17, zapcc, QtCreator, Debian, user-space)","description":"Harder than it should be.","pubDate":"Thu, 24 Oct 2019 01:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2019\/10\/24\/pytorch-cpp","guid":"https:\/\/artificial-mind.net\/blog\/2019\/10\/24\/pytorch-cpp","content":"<p>This post documents my <a href=\"https:\/\/pytorch.org\">PyTorch<\/a> C++ setup.\nIt is intended as a brief how-to.<\/p>\n\n<h2 id=\"goals\">Goals<\/h2>\n\n<ul>\n  <li>Works with C++17 code (no pre-C++11 ABI)<\/li>\n  <li>Works with the <a href=\"https:\/\/github.com\/yrnkrn\/zapcc\">zapcc<\/a> compiler (personal favorite)<\/li>\n  <li>Works with QtCreator (currently my favored IDE on linux)<\/li>\n  <li>Works with Debian without sudo rights (work constraint)<\/li>\n  <li>Works with CUDA (only realistic way to train larger networks)<\/li>\n<\/ul>\n\n<h2 id=\"steps\">Steps<\/h2>\n\n<p>(All versions were current at the time of writing)<\/p>\n\n<h3 id=\"installing-pytorch-c\">Installing PyTorch C++<\/h3>\n\n<ul>\n  <li>go to <a href=\"https:\/\/pytorch.org\/\">https:\/\/pytorch.org\/<\/a><\/li>\n  <li>scroll down to configure download<\/li>\n  <li>select:\n    <ul>\n      <li>Build: <code class=\"language-plaintext highlighter-rouge\">Stable (1.3)<\/code><\/li>\n      <li>OS: <code class=\"language-plaintext highlighter-rouge\">Linux<\/code><\/li>\n      <li>Package: <code class=\"language-plaintext highlighter-rouge\">LibTorch<\/code><\/li>\n      <li>Language: <code class=\"language-plaintext 
highlighter-rouge\">C++<\/code><\/li>\n      <li>CUDA: <code class=\"language-plaintext highlighter-rouge\">10.1<\/code><\/li>\n    <\/ul>\n  <\/li>\n  <li>download with <code class=\"language-plaintext highlighter-rouge\">cxx11 ABI<\/code> (<em>Important!<\/em>)<\/li>\n<\/ul>\n\n<p>Example link: <a href=\"https:\/\/download.pytorch.org\/libtorch\/cu101\/libtorch-cxx11-abi-shared-with-deps-1.3.0.zip\">https:\/\/download.pytorch.org\/libtorch\/cu101\/libtorch-cxx11-abi-shared-with-deps-1.3.0.zip<\/a><\/p>\n\n<h3 id=\"installing-cuda-toolkit-101\">Installing Cuda Toolkit 10.1<\/h3>\n\n<ul>\n  <li>go to <a href=\"https:\/\/developer.nvidia.com\/cuda-downloads\">https:\/\/developer.nvidia.com\/cuda-downloads<\/a><\/li>\n  <li>\n    <p>select <code class=\"language-plaintext highlighter-rouge\">Linux -&gt; x86_64 -&gt; Ubuntu -&gt; 18.04 -&gt; runfile (local)<\/code><\/p>\n\n    <p>(should give you <code class=\"language-plaintext highlighter-rouge\">wget &lt;link&gt;<\/code>)<\/p>\n  <\/li>\n  <li>instead of executing, extract it via <code class=\"language-plaintext highlighter-rouge\">xyz.run --tar mxvf<\/code><\/li>\n  <li>run the <code class=\"language-plaintext highlighter-rouge\">cuda-installer<\/code> in the extracted folder\n    <ul>\n      <li>do not install driver, samples, demo suite<\/li>\n      <li>go to <code class=\"language-plaintext highlighter-rouge\">Options -&gt; Toolkit Options<\/code>\n        <ul>\n          <li>change Toolkit Install Path to <code class=\"language-plaintext highlighter-rouge\">\/local\/something\/cuda-10.1<\/code><\/li>\n          <li>do not create links<\/li>\n          <li>do not install manpage<\/li>\n        <\/ul>\n      <\/li>\n      <li>go to <code class=\"language-plaintext highlighter-rouge\">Options -&gt; Library install path<\/code>\n        <ul>\n          <li>change to <code class=\"language-plaintext highlighter-rouge\">\/local\/something\/cuda-10.1<\/code><\/li>\n        <\/ul>\n      <\/li>\n    <\/ul>\n  
<\/li>\n<\/ul>\n\n<h3 id=\"installing-cudnn\">Installing CuDNN<\/h3>\n\n<ul>\n  <li>go to <a href=\"https:\/\/developer.nvidia.com\/cudnn\">https:\/\/developer.nvidia.com\/cudnn<\/a><\/li>\n  <li>download CuDNN<\/li>\n  <li>extract file<\/li>\n  <li>copy <code class=\"language-plaintext highlighter-rouge\">lib64<\/code> and <code class=\"language-plaintext highlighter-rouge\">include<\/code> folders to <code class=\"language-plaintext highlighter-rouge\">\/local\/something\/cuda-10.1\/<\/code><\/li>\n<\/ul>\n\n<h3 id=\"cmake\">CMake<\/h3>\n\n<h4 id=\"fix-caffe2targetscmake\">Fix <code class=\"language-plaintext highlighter-rouge\">Caffe2Targets.cmake<\/code><\/h4>\n\n<p>Caffe2 contains hard-coded CUDA paths that are wrong in our installation.\nSearch <code class=\"language-plaintext highlighter-rouge\">Caffe2Targets.cmake<\/code> for <code class=\"language-plaintext highlighter-rouge\">lib64\/libcudart.so<\/code> and replace all absolute CUDA paths by your local installation.<\/p>\n\n<h4 id=\"pytorch\">PyTorch<\/h4>\n\n<p>I chose to explicitly provide a hint to the pytorch path.<\/p>\n\n<div class=\"language-cmake highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\"># pytorch<\/span>\n<span class=\"nb\">find_package<\/span><span class=\"p\">(<\/span>Torch REQUIRED\n    <span class=\"c1\"># path containing bin\/lib\/include\/share<\/span>\n    PATHS <span class=\"s2\">\"\/path\/to\/libtorch\/\"<\/span>\n<span class=\"p\">)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><em>NOTE<\/em>: the cmake variable <code class=\"language-plaintext highlighter-rouge\">TORCH_LIBRARY<\/code> was set to a wrong value anyways.\nIn QtCreator <code class=\"language-plaintext highlighter-rouge\">Projects -&gt; Build -&gt; CMake -&gt; TORCH_LIBRARY<\/code> change the value to <code class=\"language-plaintext highlighter-rouge\">\/path\/to\/libtorch\/lib\/libtorch.so<\/code>.<\/p>\n\n<p><em>NOTE<\/em>: the cmake variable <code 
class=\"language-plaintext highlighter-rouge\">CUDA_TOOLKIT_ROOT_DIR<\/code> was missing in my case.\nIn QtCreator <code class=\"language-plaintext highlighter-rouge\">Projects -&gt; Build -&gt; CMake -&gt; CUDA_TOOLKIT_ROOT_DIR<\/code> change the value to <code class=\"language-plaintext highlighter-rouge\">\/path\/to\/cuda-10.1\/<\/code>.<\/p>\n\n<h4 id=\"openmp\">OpenMP<\/h4>\n\n<p>When using zapcc, openmp is not found by default.\nWe fix this by providing a clang-7 openmp (which zapcc is based on).<\/p>\n\n<div class=\"language-cmake highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\"># OpenMP<\/span>\n<span class=\"nb\">find_package<\/span><span class=\"p\">(<\/span>OpenMP<span class=\"p\">)<\/span>\n<span class=\"nb\">if<\/span> <span class=\"p\">(<\/span>OPENMP_CXX_FOUND<span class=\"p\">)<\/span>\n    <span class=\"nb\">set<\/span><span class=\"p\">(<\/span>OPENMP_TARGET OpenMP::OpenMP_CXX<span class=\"p\">)<\/span>\n<span class=\"nb\">elseif<\/span><span class=\"p\">(<\/span>NOT MSVC<span class=\"p\">)<\/span>\n    <span class=\"nb\">add_library<\/span><span class=\"p\">(<\/span>openmp INTERFACE<span class=\"p\">)<\/span>\n    <span class=\"nb\">target_link_libraries<\/span><span class=\"p\">(<\/span>openmp INTERFACE \/usr\/lib\/llvm-7\/lib\/libomp.so<span class=\"p\">)<\/span>\n    <span class=\"nb\">target_include_directories<\/span><span class=\"p\">(<\/span>openmp INTERFACE \/usr\/lib\/llvm-7\/include\/openmp<span class=\"p\">)<\/span>\n    <span class=\"nb\">set<\/span><span class=\"p\">(<\/span>OPENMP_TARGET openmp<span class=\"p\">)<\/span>\n<span class=\"nb\">else<\/span><span class=\"p\">()<\/span>\n    <span class=\"nb\">set<\/span><span class=\"p\">(<\/span>OPENMP_TARGET <span class=\"s2\">\"\"<\/span><span class=\"p\">)<\/span>\n<span class=\"nb\">endif<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h4 id=\"our-project\">Our Project<\/h4>\n\n<div class=\"language-cmake 
highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">target_link_libraries<\/span><span class=\"p\">(<\/span>OurProjectName PUBLIC\n    <span class=\"c1\"># ... other dependencies<\/span>\n    <span class=\"si\">${<\/span><span class=\"nv\">TORCH_LIBRARIES<\/span><span class=\"si\">}<\/span>\n    <span class=\"si\">${<\/span><span class=\"nv\">OPENMP_TARGET<\/span><span class=\"si\">}<\/span>\n<span class=\"p\">)<\/span>\n\n<span class=\"nb\">if<\/span> <span class=\"p\">(<\/span>NOT MSVC<span class=\"p\">)<\/span>\n    <span class=\"nb\">target_compile_options<\/span><span class=\"p\">(<\/span>OurProjectName PUBLIC\n        <span class=\"c1\"># ... other options<\/span>\n        -fopenmp\n    <span class=\"p\">)<\/span>\n<span class=\"nb\">endif<\/span><span class=\"p\">()<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"running-the-program\">Running The Program<\/h2>\n\n<p>If <code class=\"language-plaintext highlighter-rouge\">LD_LIBRARY_PATH<\/code> does not include <code class=\"language-plaintext highlighter-rouge\">\/path\/to\/cuda-10.1\/lib64<\/code>, some libraries will not be found when running the program.<\/p>\n\n<p>In QtCreator, this can be added in <code class=\"language-plaintext highlighter-rouge\">Projects -&gt; Run -&gt; Run Environment -&gt; Add<\/code>.<\/p>\n\n<p>The example code at <a href=\"https:\/\/pytorch.org\/cppdocs\/frontend.html\">https:\/\/pytorch.org\/cppdocs\/frontend.html<\/a> should work now.<\/p>"},{"title":"Consider deleting your rvalue ref-qualified assignment operators","description":"Why is foo{} = foo{} working anyways?","pubDate":"Sun, 22 Sep 2019 08:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2019\/09\/22\/delete-your-rvalue-ref-assignments","guid":"https:\/\/artificial-mind.net\/blog\/2019\/09\/22\/delete-your-rvalue-ref-assignments","content":"<p>The title might sound like an incantation to summon some mid-tier C++ god but it addresses a very real everyday 
pitfall:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span> <span class=\"p\">{<\/span> <span class=\"p\">...<\/span> <span class=\"p\">};<\/span>\n<span class=\"n\">foo<\/span> <span class=\"n\">get_my_foo<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"p\">...<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ .. some code later:<\/span>\n<span class=\"n\">foo<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">get_my_foo<\/span><span class=\"p\">()<\/span> <span class=\"o\">=<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This <a href=\"https:\/\/godbolt.org\/z\/oh3vFA\">compiles<\/a> \u2026 and does nothing useful.<\/p>\n\n<p>We\u2019ve assigned <code class=\"language-plaintext highlighter-rouge\">f<\/code> to a temporary <code class=\"language-plaintext highlighter-rouge\">foo<\/code>.\nNo error, no warning.<\/p>\n\n<h2 id=\"a-real-life-example\">A Real-Life Example<\/h2>\n\n<p>In the math library I\u2019m writing we have a <code class=\"language-plaintext highlighter-rouge\">mat<\/code> struct for matrices and <code class=\"language-plaintext highlighter-rouge\">vec<\/code> for vectors.\nMatrices are stored column-major, i.e. 
as an array of column vectors.\nNow, sometimes you want to get a row of such a matrix and thus <code class=\"language-plaintext highlighter-rouge\">mat<\/code> has a function <code class=\"language-plaintext highlighter-rouge\">vec mat::row(int)<\/code> that returns the specified row.\nIt has to return the <code class=\"language-plaintext highlighter-rouge\">vec<\/code> by value because only columns are stored contiguously in <code class=\"language-plaintext highlighter-rouge\">mat<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span> <span class=\"o\">&lt;<\/span><span class=\"kt\">int<\/span> <span class=\"n\">C<\/span><span class=\"p\">,<\/span> <span class=\"kt\">int<\/span> <span class=\"n\">R<\/span><span class=\"p\">,<\/span> <span class=\"k\">class<\/span> <span class=\"nc\">ScalarT<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">mat<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">col_t<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vec<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">R<\/span><span class=\"p\">,<\/span> <span class=\"n\">ScalarT<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">using<\/span> <span class=\"n\">row_t<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vec<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">C<\/span><span class=\"p\">,<\/span> <span class=\"n\">ScalarT<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"n\">col_t<\/span> <span class=\"n\">columns<\/span><span class=\"p\">[<\/span><span class=\"n\">C<\/span><span class=\"p\">];<\/span> <span class=\"c1\">\/\/ column-major matrix<\/span>\n\n    <span class=\"n\">row_t<\/span> <span class=\"n\">row<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span> <span 
class=\"n\">i<\/span><span class=\"p\">)<\/span> <span class=\"k\">const<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"p\">...;<\/span> <span class=\"p\">}<\/span>\n<span class=\"p\">};<\/span>\n\n<span class=\"k\">using<\/span> <span class=\"n\">mat3<\/span> <span class=\"o\">=<\/span> <span class=\"n\">mat<\/span><span class=\"o\">&lt;<\/span><span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">;<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">vec3<\/span> <span class=\"o\">=<\/span> <span class=\"n\">vec<\/span><span class=\"o\">&lt;<\/span><span class=\"mi\">3<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>And then someone writes:<\/p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>mat3 m;\nm.row(1) = {1, 2, 3};\n<\/code><\/pre><\/div><\/div>\n\n<p>Looks perfectly reasonable, compiles without warnings, \u2026 and <a href=\"https:\/\/godbolt.org\/z\/azm1BP\">does nothing<\/a>.<\/p>\n\n<h2 id=\"solution-a\">Solution A<\/h2>\n\n<p>The problem in both cases is that it is totally fine to call <code class=\"language-plaintext highlighter-rouge\">operator=<\/code> on an rvalue reference (<code class=\"language-plaintext highlighter-rouge\">foo&amp;&amp;<\/code> or <code class=\"language-plaintext highlighter-rouge\">vec3&amp;&amp;<\/code>).\nThe compiler-generated definition looks something like:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span 
class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Note that these member functions are not <code class=\"language-plaintext highlighter-rouge\">const<\/code>-qualified as assigning to a <code class=\"language-plaintext highlighter-rouge\">const<\/code> object doesn\u2019t really make any sense.\nAn easy solution to our \u201ceasy to use accidentally wrong\u201d API problem is thus:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span> <span class=\"p\">{<\/span> <span class=\"p\">...<\/span> <span class=\"p\">};<\/span>\n<span class=\"k\">const<\/span> <span class=\"n\">foo<\/span> <span class=\"n\">get_my_foo<\/span><span class=\"p\">()<\/span> <span class=\"p\">{<\/span> <span class=\"p\">...<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ .. 
some code later:<\/span>\n<span class=\"n\">foo<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">get_my_foo<\/span><span class=\"p\">()<\/span> <span class=\"o\">=<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ ERROR: cannot assign to const foo<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<h2 id=\"solution-b\">Solution B<\/h2>\n\n<p>Most of the time a type is used way more often than it is declared.\nOur first solution requires each use of our type as a return value to be <code class=\"language-plaintext highlighter-rouge\">const<\/code>-qualified.\nIsn\u2019t there a solution that is write-once and then works any time <code class=\"language-plaintext highlighter-rouge\">foo<\/code> is used as a return value?<\/p>\n\n<p>Turns out there is.<\/p>\n\n<p>And by the way, the following problem cannot be solved by <code class=\"language-plaintext highlighter-rouge\">const<\/code>-qualification:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">foo<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">foo<\/span><span class=\"p\">{}<\/span> <span class=\"o\">=<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ no error?<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The <code class=\"language-plaintext highlighter-rouge\">const<\/code>-solution works for return values but doesn\u2019t really prevent the core of the problem: \nassigning to a temporary.<\/p>\n\n<p>Since C++11 it is possible to add <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/member_functions#const-.2C_volatile-.2C_and_ref-qualified_member_functions\">reference qualifiers to member functions<\/a>.\nThese so-called ref-qualified member functions allow us to overload member functions not only on <code class=\"language-plaintext highlighter-rouge\">const<\/code> and non-<code 
class=\"language-plaintext highlighter-rouge\">const<\/code> but also on which type of reference <code class=\"language-plaintext highlighter-rouge\">this<\/code> is (<code class=\"language-plaintext highlighter-rouge\">&amp;<\/code> for lvalue, <code class=\"language-plaintext highlighter-rouge\">&amp;&amp;<\/code> for rvalue).<\/p>\n\n<p>Simplifying a bit (a lot?), lvalues are \u201cthings with names\u201d, like local variables.\nMost of the time, rvalues are temporaries or at least things we want to consider temporaries.<\/p>\n\n<p>Thus, our second solution is to delete the assignment operators for rvalue ref-qualified <code class=\"language-plaintext highlighter-rouge\">foo<\/code>s:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;<\/span> <span class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"o\">=<\/span> <span class=\"k\">delete<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;<\/span> <span 
class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"o\">=<\/span> <span class=\"k\">delete<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Unfortunately, declaring these <a href=\"https:\/\/foonathan.net\/blog\/2019\/02\/26\/special-member-functions.html\">special member functions<\/a> causes the compiler-generated copy and move constructors to disappear.\nThat means for a full solution (see below) we also need to explicitly default copy and move ctors.\nDeclaring any constructor makes the implicitly declared default constructor disappear, so we need to declare that as well.<\/p>\n\n<p>A minor consequence of deleting this assignment operator is that <code class=\"language-plaintext highlighter-rouge\">std::is_assignable&lt;foo, foo&gt;<\/code> becomes <code class=\"language-plaintext highlighter-rouge\">false<\/code>.\nThe reason is that <code class=\"language-plaintext highlighter-rouge\">std::is_assignable&lt;foo, foo&gt;<\/code> is actually <code class=\"language-plaintext highlighter-rouge\">std::is_assignable&lt;foo&amp;&amp;, foo&gt;<\/code>.\nYou typically only want <code class=\"language-plaintext highlighter-rouge\">std::is_assignable&lt;foo&amp;, foo&gt;<\/code> anyways, which is still <code class=\"language-plaintext highlighter-rouge\">true<\/code>.<\/p>\n\n<h2 id=\"conclusion\">Conclusion<\/h2>\n\n<p>If you suspect that a type of yours is susceptible to accidental assignment-to-temporary (like our <code class=\"language-plaintext highlighter-rouge\">vec<\/code> is), consider deleting the rvalue ref-qualified assignments:<\/p>\n\n<div class=\"language-cpp 
highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">foo<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"c1\">\/\/ default ctor<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">()<\/span> <span class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"c1\">\/\/ copy and move ctor<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"c1\">\/\/ assignment ops<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;<\/span> <span class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span> <span class=\"k\">const<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"o\">=<\/span> <span class=\"k\">delete<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span><span 
class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;<\/span> <span class=\"o\">=<\/span> <span class=\"k\">default<\/span><span class=\"p\">;<\/span>\n    <span class=\"n\">foo<\/span><span class=\"o\">&amp;<\/span> <span class=\"k\">operator<\/span><span class=\"o\">=<\/span><span class=\"p\">(<\/span><span class=\"n\">foo<\/span><span class=\"o\">&amp;&amp;<\/span><span class=\"p\">)<\/span> <span class=\"o\">&amp;&amp;<\/span> <span class=\"o\">=<\/span> <span class=\"k\">delete<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">};<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>So maybe the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Rule_of_three_(C%2B%2B_programming)\"><del>rule-of-three<\/del> rule-of-five<\/a> should now be rule-of-seven?<\/p>"},{"title":"Performance of std::function","description":"How bad is it really?","pubDate":"Sat, 07 Sep 2019 12:20:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2019\/09\/07\/std-function-performance","guid":"https:\/\/artificial-mind.net\/blog\/2019\/09\/07\/std-function-performance","content":"<p>Popular folklore demands that you avoid <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> if you care about performance.<\/p>\n\n<p>But is it really true?\nHow bad is it?<\/p>\n\n<h2 id=\"nanobenchmarking-stdfunction\">Nanobenchmarking <code class=\"language-plaintext highlighter-rouge\">std::function<\/code><\/h2>\n\n<p>Benchmarking is hard.\nMicrobenchmarking is a dark art.\nMany people insist that nanobenchmarking is out of reach for us mortals.<\/p>\n\n<p>But that won\u2019t stop us:\nlet\u2019s benchmark the overhead of creating and calling a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code>.<\/p>\n\n<p>We have to tread extra carefully here.\nModern desktop CPUs are insanely complex, often with deep pipelines, out-of-order execution, sophisticated branch prediction, prefetching, multiple levels of caches, hyperthreading, and many 
more arcane performance-enhancing features.<\/p>\n\n<p>The other enemy is the compiler.<\/p>\n\n<blockquote>\n  <p>Any sufficiently advanced optimizing compiler is indistinguishable from magic.<\/p>\n<\/blockquote>\n\n<p>We\u2019ll have to make sure that our code-to-be-benchmarked is not being optimized away.\nLuckily, <code class=\"language-plaintext highlighter-rouge\">volatile<\/code> is still not fully deprecated and can be (ab)used to prevent many optimizations.\nIn this post we will only measure throughput (how long does it take to call the same function 1000000 times?).\nWe\u2019re going to use the following scaffold:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">template<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">F<\/span><span class=\"p\">&gt;<\/span>\n<span class=\"kt\">void<\/span> <span class=\"nf\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">F<\/span><span class=\"o\">&amp;&amp;<\/span> <span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">a_in<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.0<\/span><span class=\"n\">f<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b_in<\/span> <span class=\"o\">=<\/span> <span class=\"mf\">0.0<\/span><span class=\"n\">f<\/span><span class=\"p\">)<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"k\">constexpr<\/span> <span class=\"n\">count<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">1'000'000<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">volatile<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">a<\/span> <span class=\"o\">=<\/span> <span class=\"n\">a_in<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">volatile<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span> <span 
class=\"o\">=<\/span> <span class=\"n\">b_in<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">volatile<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">r<\/span><span class=\"p\">;<\/span>\n\n    <span class=\"k\">auto<\/span> <span class=\"k\">const<\/span> <span class=\"n\">t_start<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">chrono<\/span><span class=\"o\">::<\/span><span class=\"n\">high_resolution_clock<\/span><span class=\"o\">::<\/span><span class=\"n\">now<\/span><span class=\"p\">();<\/span>\n    <span class=\"k\">for<\/span> <span class=\"p\">(<\/span><span class=\"k\">auto<\/span> <span class=\"n\">i<\/span> <span class=\"o\">=<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"n\">i<\/span> <span class=\"o\">&lt;<\/span> <span class=\"n\">count<\/span><span class=\"p\">;<\/span> <span class=\"o\">++<\/span><span class=\"n\">i<\/span><span class=\"p\">)<\/span>\n        <span class=\"n\">r<\/span> <span class=\"o\">=<\/span> <span class=\"n\">f<\/span><span class=\"p\">(<\/span><span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"n\">b<\/span><span class=\"p\">);<\/span>\n    <span class=\"k\">auto<\/span> <span class=\"k\">const<\/span> <span class=\"n\">t_end<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">chrono<\/span><span class=\"o\">::<\/span><span class=\"n\">high_resolution_clock<\/span><span class=\"o\">::<\/span><span class=\"n\">now<\/span><span class=\"p\">();<\/span>\n\n    <span class=\"k\">auto<\/span> <span class=\"k\">const<\/span> <span class=\"n\">dt<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">chrono<\/span><span class=\"o\">::<\/span><span class=\"n\">duration<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">double<\/span><span class=\"o\">&gt;<\/span><span 
class=\"p\">(<\/span><span class=\"n\">t_end<\/span> <span class=\"o\">-<\/span> <span class=\"n\">t_start<\/span><span class=\"p\">).<\/span><span class=\"n\">count<\/span><span class=\"p\">();<\/span>\n    <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">dt<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">count<\/span> <span class=\"o\">*<\/span> <span class=\"mf\">1e9<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"s\">\" ns \/ op\"<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Double checking with <a href=\"https:\/\/godbolt.org\/z\/fjBN1a\">godbolt<\/a> we can verify that the compiler is not optimizing away the function body even though we only compute <code class=\"language-plaintext highlighter-rouge\">0.0f + 0.0f<\/code> in a loop.\nThe loop itself has some overhead and sometimes the compiler will unroll parts of the loop.<\/p>\n\n<h2 id=\"baseline\">Baseline<\/h2>\n\n<p>Our test system in the following benchmarks is an Intel Core i9-9900K running at 4.8 GHz (a modern high-end consumer CPU at the time of writing).\nThe code is compiled with <code class=\"language-plaintext highlighter-rouge\">clang-7<\/code> and the <code class=\"language-plaintext highlighter-rouge\">libstdc++<\/code> standard library using <code class=\"language-plaintext highlighter-rouge\">-O2<\/code> and <code class=\"language-plaintext highlighter-rouge\">-march=native<\/code>.<\/p>\n\n<p>We start with a few basic tests:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">benchmark<\/span><span class=\"p\">([](<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"p\">)<\/span> 
<span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"mf\">0.0<\/span><span class=\"n\">f<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span>      <span class=\"c1\">\/\/ 0.21 ns \/ op (1 cycle \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([](<\/span><span class=\"kt\">float<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 0.22 ns \/ op (1 cycle \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([](<\/span><span class=\"kt\">float<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">a<\/span> <span class=\"o\">\/<\/span> <span class=\"n\">b<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 0.62 ns \/ op (3 cycles \/ op)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The baseline is about 1 cycle per operation and the <code class=\"language-plaintext highlighter-rouge\">a \/ b<\/code> test verifies that we can reproduce the throughput of basic operations (a good reference is <a href=\"https:\/\/asmjit.com\/asmgrid\/\">AsmGrid<\/a>, X86 Perf on the upper right).\n(I\u2019ve repeated all benchmarks multiple times and chose the mode of the distribution.)<\/p>\n\n<h2 id=\"calling-functions\">Calling Functions<\/h2>\n\n<p>The first thing we want to know: How expensive is a function call?<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">using<\/span> <span class=\"n\">fun_t<\/span> <span 
class=\"o\">=<\/span> <span class=\"kt\">float<\/span><span class=\"p\">(<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ inlineable direct call<\/span>\n<span class=\"kt\">float<\/span> <span class=\"nf\">funA<\/span><span class=\"p\">(<\/span><span class=\"kt\">float<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ non-inlined direct call<\/span>\n<span class=\"n\">__attribute__<\/span><span class=\"p\">((<\/span><span class=\"n\">noinline<\/span><span class=\"p\">))<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">funB<\/span><span class=\"p\">(<\/span><span class=\"kt\">float<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span><span class=\"p\">;<\/span> <span class=\"p\">}<\/span>\n\n<span class=\"c1\">\/\/ non-inlined indirect call<\/span>\n<span class=\"n\">fun_t<\/span><span class=\"o\">*<\/span> <span class=\"n\">funC<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ set externally to funA<\/span>\n\n<span class=\"c1\">\/\/ visible lambda<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">funD<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">float<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span 
class=\"k\">return<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span>\n\n<span class=\"c1\">\/\/ std::function with visible function<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">funE<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">function<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">funA<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ std::function with non-inlined function<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">funF<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">function<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">funB<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ std::function with function pointer<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">funG<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">function<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">funC<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ std::function with visible lambda<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">funH<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">function<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">(<\/span><span class=\"n\">funD<\/span><span class=\"p\">);<\/span>\n\n<span class=\"c1\">\/\/ std::function with direct lambda<\/span>\n<span class=\"k\">auto<\/span> <span 
class=\"n\">funI<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">function<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">fun_t<\/span><span class=\"o\">&gt;<\/span><span class=\"p\">([](<\/span><span class=\"kt\">float<\/span> <span class=\"n\">a<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span> <span class=\"n\">b<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"n\">a<\/span> <span class=\"o\">+<\/span> <span class=\"n\">b<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The results:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">funA<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 0.22 ns \/ op (1 cycle  \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">funB<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 1.04 ns \/ op (5 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">funC<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 1.04 ns \/ op (5 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">funD<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 0.22 ns \/ op (1 cycle  \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">funE<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 1.67 ns \/ op (8 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">funF<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 1.67 ns \/ op (8 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span 
class=\"n\">funG<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 1.67 ns \/ op (8 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">funH<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 1.25 ns \/ op (6 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">(<\/span><span class=\"n\">funI<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ 1.25 ns \/ op (6 cycles \/ op)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>This suggests that only A and D are inlined and that there is some additional optimization possible when using <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> with a lambda.<\/p>\n\n<h2 id=\"constructing-a-stdfunction\">Constructing a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code><\/h2>\n\n<p>We can also measure how long it takes to construct or copy a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">function<\/span><span class=\"o\">&lt;<\/span><span class=\"kt\">float<\/span><span class=\"p\">(<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"p\">)<\/span><span class=\"o\">&gt;<\/span> <span class=\"n\">f<\/span><span class=\"p\">;<\/span>\n\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{};<\/span> <span class=\"p\">});<\/span>   <span class=\"c1\">\/\/ 0.42 ns \/ op ( 2 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span 
class=\"n\">funA<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 4.37 ns \/ op (21 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">funB<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 4.37 ns \/ op (21 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">funC<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 4.37 ns \/ op (21 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">funD<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 1.46 ns \/ op ( 7 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">funE<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 5.00 ns \/ op (24 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">funF<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 5.00 ns \/ op (24 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span 
class=\"n\">funG<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 5.00 ns \/ op (24 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">funH<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 4.37 ns \/ op (21 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"n\">funI<\/span><span class=\"p\">;<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 4.37 ns \/ op (21 cycles \/ op)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>The result of <code class=\"language-plaintext highlighter-rouge\">f = funD<\/code> suggests that constructing a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> directly from a lambda is pretty fast.\nLet\u2019s check that when using different capture sizes:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">b4<\/span> <span class=\"p\">{<\/span> <span class=\"kt\">int32_t<\/span> <span class=\"n\">x<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">b8<\/span> <span class=\"p\">{<\/span> <span class=\"kt\">int64_t<\/span> <span class=\"n\">x<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span>\n<span class=\"k\">struct<\/span> <span class=\"nc\">b16<\/span> <span class=\"p\">{<\/span> <span class=\"kt\">int64_t<\/span> <span class=\"n\">x<\/span><span class=\"p\">,<\/span> <span class=\"n\">y<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span>\n\n<span class=\"n\">benchmark<\/span><span 
class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[](<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span> <span class=\"p\">});<\/span>          <span class=\"c1\">\/\/ 1.46 ns \/ op ( 7 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">b4<\/span><span class=\"p\">{}](<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span> <span class=\"p\">});<\/span>  <span class=\"c1\">\/\/ 4.37 ns \/ op (21 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">b8<\/span><span class=\"p\">{}](<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span> <span class=\"p\">});<\/span>  <span class=\"c1\">\/\/ 4.37 ns \/ op (21 cycles \/ op)<\/span>\n<span class=\"n\">benchmark<\/span><span class=\"p\">([<\/span><span 
class=\"o\">&amp;<\/span><span class=\"p\">]{<\/span> <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"n\">x<\/span> <span class=\"o\">=<\/span> <span class=\"n\">b16<\/span><span class=\"p\">{}](<\/span><span class=\"kt\">float<\/span><span class=\"p\">,<\/span> <span class=\"kt\">float<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> <span class=\"k\">return<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"p\">};<\/span> <span class=\"p\">});<\/span> <span class=\"c1\">\/\/ 1.66 ns \/ op ( 8 cycles \/ op)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>I didn\u2019t have the patience to untangle the assembly or the <code class=\"language-plaintext highlighter-rouge\">libstdc++<\/code> implementation to check where this behavior originates.\nYou obviously have to pay for the capture, and I think what we see here is a strange interaction between some kind of small function optimization and the compiler hoisting the construction of <code class=\"language-plaintext highlighter-rouge\">b16{}<\/code> out of our measurement loop.<\/p>\n\n<h2 id=\"summary\">Summary<\/h2>\n\n<p>I think there is a lot of fearmongering regarding <code class=\"language-plaintext highlighter-rouge\">std::function<\/code>, and not all of it is justified.<\/p>\n\n<p>My benchmarks suggest that on a modern microarchitecture the following overhead can be expected on hot data and instruction caches:<\/p>\n\n<table>\n  <tbody>\n    <tr>\n      <td>calling a non-inlined function<\/td>\n      <td>4 cycles<\/td>\n    <\/tr>\n    <tr>\n      <td>calling a function pointer<\/td>\n      <td>4 cycles<\/td>\n    <\/tr>\n    <tr>\n      <td>calling a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> of a lambda<\/td>\n      <td>5 cycles<\/td>\n    <\/tr>\n    <tr>\n      <td>calling a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> of a function or function pointer<\/td>\n    
  <td>7 cycles<\/td>\n    <\/tr>\n    <tr>\n      <td>constructing an empty <code class=\"language-plaintext highlighter-rouge\">std::function<\/code><\/td>\n      <td>7 cycles<\/td>\n    <\/tr>\n    <tr>\n      <td>constructing a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> from a function or function pointer<\/td>\n      <td>21 cycles<\/td>\n    <\/tr>\n    <tr>\n      <td>copying a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code><\/td>\n      <td>21..24 cycles<\/td>\n    <\/tr>\n    <tr>\n      <td>constructing a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> from a non-capturing lambda<\/td>\n      <td>7 cycles<\/td>\n    <\/tr>\n    <tr>\n      <td>constructing a <code class=\"language-plaintext highlighter-rouge\">std::function<\/code> from a capturing lambda<\/td>\n      <td>21+ cycles<\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n\n<p>A word of caution: the benchmarks really only represent the overhead relative to <code class=\"language-plaintext highlighter-rouge\">a + b<\/code>.\nDifferent functions show slightly different overhead behavior as they might use different scheduler ports and execution units that might overlap differently with what the loop requires.\nAlso, a lot of this depends on how willing the compiler is to inline.<\/p>\n\n<p>We\u2019ve only measured the throughput.\nThe results are only valid for \u201ccalling the same function many times with different arguments\u201d, not for \u201ccalling many different functions\u201d.\nBut that is a topic for another post.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/unsplash.com\/photos\/sb7RUrRMaC4\">unsplash<\/a><\/em>)<\/p>"},{"title":"Basic Floating Point Optimizations","description":"Why is \"f + 0.0\" slower than \"f - 0.0\"?","pubDate":"Fri, 09 Aug 2019 01:00:00 
+0000","link":"https:\/\/artificial-mind.net\/blog\/2019\/08\/09\/floating-point-optimizations","guid":"https:\/\/artificial-mind.net\/blog\/2019\/08\/09\/floating-point-optimizations","content":"<p>Ever seen some people write <code class=\"language-plaintext highlighter-rouge\">f * 0.5<\/code> when they mean <code class=\"language-plaintext highlighter-rouge\">f \/ 2<\/code>?<\/p>\n\n<p>Or wondered if the compiler is able to optimize the <code class=\"language-plaintext highlighter-rouge\">f * 1.0<\/code> that you added for clarity?<\/p>\n\n<p>Maybe you wrote <code class=\"language-plaintext highlighter-rouge\">f + f<\/code> instead of <code class=\"language-plaintext highlighter-rouge\">f * 2<\/code> as a clever optimization?<\/p>\n\n<p>Modern compilers are basically magic, <em>but do they actually perform these optimizations?<\/em>\nAnd, more importantly, <em>why is <code class=\"language-plaintext highlighter-rouge\">f + 0.0<\/code> slower than <code class=\"language-plaintext highlighter-rouge\">f - 0.0<\/code>?<\/em><\/p>\n\n<h2 id=\"preliminaries\">Preliminaries<\/h2>\n\n<p>Many posts have been written about elaborate magic involving IEEE 754 floating point numbers.\nSome of my favorites include the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fast_inverse_square_root#History_and_investigation\">fast inverse square root<\/a> (commonly attributed to John Carmack though the method is much older), <a href=\"https:\/\/randomascii.wordpress.com\/category\/floating-point\/\">basically everything from the Random ASCII blog<\/a>, and the <a href=\"https:\/\/developer.nvidia.com\/content\/depth-precision-visualized\">reverse-Z depth test<\/a> for rendering.<\/p>\n\n<p>In this post I want to take a look at less flashy, more foundational things.\nWe\u2019re going over a couple of small optimizations that we probably expect from our compilers.\nSome of them are performed but others are not actually legal or only work due to slightly arcane rules.<\/p>\n\n<p>While this might apply 
to other languages as well, I\u2019m most familiar with C++, which I\u2019ll be using for this post.\nWe will assume that our C++ implementation uses:<\/p>\n\n<ul>\n  <li>Two\u2019s complement for integers<\/li>\n  <li>IEEE 754 for floating point numbers<\/li>\n<\/ul>\n\n<p>(This is currently implementation-defined behavior, but at least <a href=\"http:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2018\/p0907r1.html\">the first might change<\/a>)<\/p>\n\n<p>As always, it is highly instructive to look at the generated assembly, for which we will use the excellent page by <a href=\"https:\/\/godbolt.org\/\">Matt Godbolt<\/a>.<\/p>\n\n<h2 id=\"the-curios-case-of-f--00\">The Curious Case of <code class=\"language-plaintext highlighter-rouge\">f + 0.0<\/code><\/h2>\n\n<p>Most programmers are taught that floating point math is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Floating-point_arithmetic#Accuracy_problems\">not associative<\/a> (<a href=\"https:\/\/godbolt.org\/z\/4MPMs-\">for example<\/a> <code class=\"language-plaintext highlighter-rouge\">-1e9f + 1e9f + 1 == 1<\/code> vs. 
<code class=\"language-plaintext highlighter-rouge\">-1e9f + (1e9f + 1) == 0<\/code>).\nThe same goes for integers if overflows might be involved.<\/p>\n\n<p>Sometimes one might come across trivial computations like <code class=\"language-plaintext highlighter-rouge\">i + 0<\/code>, <code class=\"language-plaintext highlighter-rouge\">f * 1.0<\/code>, <code class=\"language-plaintext highlighter-rouge\">i * 1<\/code>, etc.\nWe might want to spell them out for clarity or consistency or because it\u2019s part of a generic, templated algorithm that happens to be instantiated for these values.\nIt is very tempting to think \u201cmy compiler is smart and will do the-right-thing (tm)\u201d.<\/p>\n\n<p><em>Does it though?<\/em><\/p>\n\n<p>As any good lawyer will tell you: it depends.<\/p>\n\n<p>Integers are easier to reason about (for humans and computers alike) and have fewer surprises than floats, so <code class=\"language-plaintext highlighter-rouge\">i + 0<\/code>, <code class=\"language-plaintext highlighter-rouge\">i - 0<\/code>, <code class=\"language-plaintext highlighter-rouge\">i * 1<\/code>, and <code class=\"language-plaintext highlighter-rouge\">i \/ 1<\/code> will all be optimized to just <code class=\"language-plaintext highlighter-rouge\">i<\/code>.\nSometimes, the compiler can do a strength reduction and for example replace <code class=\"language-plaintext highlighter-rouge\">i * 2<\/code> with <code class=\"language-plaintext highlighter-rouge\">i + i<\/code>, which is semantically identical but faster.\n<code class=\"language-plaintext highlighter-rouge\">i * -1<\/code> can be replaced with <code class=\"language-plaintext highlighter-rouge\">-i<\/code> but only because signed integer overflows are undefined behavior. 
<code class=\"language-plaintext highlighter-rouge\">std::numeric_limits&lt;int&gt;::min() * -1<\/code> is not representable but because it\u2019s undefined behavior the compiler is free to assume that this case will not happen.<\/p>\n\n<p>In contrast, nothing is ever easy in floating point.\nThe typical gang that will ruin your special-case-free reasoning is <code class=\"language-plaintext highlighter-rouge\">\u00b1Inf<\/code>, <code class=\"language-plaintext highlighter-rouge\">NaN<\/code>, and <code class=\"language-plaintext highlighter-rouge\">\u00b10<\/code>.\nWhile <code class=\"language-plaintext highlighter-rouge\">f - 0.0<\/code>, <code class=\"language-plaintext highlighter-rouge\">f * 1.0<\/code>, and <code class=\"language-plaintext highlighter-rouge\">f \/ 1.0<\/code> are optimized to <code class=\"language-plaintext highlighter-rouge\">f<\/code>, you might be surprised to see that <code class=\"language-plaintext highlighter-rouge\">f + 0.0<\/code> <a href=\"https:\/\/godbolt.org\/z\/dMvu7D\">is not<\/a>.\nIt turns out that IEEE 754 has a special rule for sums of values with equal magnitude but different signs:<\/p>\n\n<blockquote>\n  <p><strong>6.3 The sign bit<\/strong><\/p>\n\n  <p>When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be <code class=\"language-plaintext highlighter-rouge\">+0<\/code> in all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be <code class=\"language-plaintext highlighter-rouge\">\u22120<\/code>.<\/p>\n\n  <p>However, <code class=\"language-plaintext highlighter-rouge\">x + x == x \u2212 (\u2212x)<\/code> retains the same sign as x even when x is zero.<\/p>\n<\/blockquote>\n\n<p>Let\u2019s make a small table of what this means for different operations:<\/p>\n\n<table>\n  <thead>\n    <tr>\n      <th>f<\/th>\n      <th>f + 
0.0<\/th>\n      <th>f - 0.0<\/th>\n      <th>f + -0.0<\/th>\n      <th>f - -0.0<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <td>+0.0<\/td>\n      <td>+0.0<\/td>\n      <td>+0.0<\/td>\n      <td>+0.0<\/td>\n      <td>+0.0<\/td>\n    <\/tr>\n    <tr>\n      <td>-0.0<\/td>\n      <td><strong>+0.0<\/strong><\/td>\n      <td>-0.0<\/td>\n      <td>-0.0<\/td>\n      <td><strong>+0.0<\/strong><\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n\n<p>(Cases where the expression cannot be legally replaced by <code class=\"language-plaintext highlighter-rouge\">f<\/code> are <strong>bold<\/strong>.)<\/p>\n\n<blockquote>\n  <p>Yes, <code class=\"language-plaintext highlighter-rouge\">f + 0.0<\/code> is slower than <code class=\"language-plaintext highlighter-rouge\">f - 0.0<\/code>.<\/p>\n<\/blockquote>\n\n<p>Even better, <code class=\"language-plaintext highlighter-rouge\">f - -0.0<\/code> is as \u201cslow\u201d as <code class=\"language-plaintext highlighter-rouge\">f + 0.0<\/code> while <code class=\"language-plaintext highlighter-rouge\">f - -0<\/code> is optimized to <code class=\"language-plaintext highlighter-rouge\">f<\/code>.\n(Because <code class=\"language-plaintext highlighter-rouge\">-0<\/code> is an integer which is converted to <code class=\"language-plaintext highlighter-rouge\">+0.0<\/code>, not <code class=\"language-plaintext highlighter-rouge\">-0.0<\/code>.)<\/p>\n\n<p>One might wonder if this whole <code class=\"language-plaintext highlighter-rouge\">+0.0<\/code> vs <code class=\"language-plaintext highlighter-rouge\">-0.0<\/code> business makes any difference.\nRemember that <code class=\"language-plaintext highlighter-rouge\">1.0 \/ +0.0 == Inf<\/code> and <code class=\"language-plaintext highlighter-rouge\">1.0 \/ -0.0 == -Inf<\/code> and you can easily construct pathological cases:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">double<\/span> <span 
class=\"nf\">clamp<\/span><span class=\"p\">(<\/span><span class=\"kt\">double<\/span> <span class=\"n\">vmin<\/span><span class=\"p\">,<\/span> <span class=\"kt\">double<\/span> <span class=\"n\">vmax<\/span><span class=\"p\">,<\/span> <span class=\"kt\">double<\/span> <span class=\"n\">v<\/span><span class=\"p\">)<\/span> <span class=\"p\">{<\/span> \n    <span class=\"k\">return<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">max<\/span><span class=\"p\">(<\/span><span class=\"n\">vmin<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">min<\/span><span class=\"p\">(<\/span><span class=\"n\">vmax<\/span><span class=\"p\">,<\/span> <span class=\"n\">v<\/span><span class=\"p\">));<\/span> \n<span class=\"p\">}<\/span>\n\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">clamp<\/span><span class=\"p\">(<\/span><span class=\"mf\">2.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">5.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">1.0<\/span> <span class=\"o\">\/<\/span> <span class=\"p\">(<\/span><span class=\"o\">-<\/span><span class=\"mf\">1e-200<\/span> <span class=\"o\">\/<\/span> <span class=\"mf\">1e200<\/span> <span class=\"o\">+<\/span> <span class=\"mf\">0.0<\/span><span class=\"p\">))<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ 5<\/span>\n<span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">clamp<\/span><span class=\"p\">(<\/span><span class=\"mf\">2.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">5.0<\/span><span class=\"p\">,<\/span> <span class=\"mf\">1.0<\/span> <span class=\"o\">\/<\/span> <span class=\"p\">(<\/span><span 
class=\"o\">-<\/span><span class=\"mf\">1e-200<\/span> <span class=\"o\">\/<\/span> <span class=\"mf\">1e200<\/span> <span class=\"o\">-<\/span> <span class=\"mf\">0.0<\/span><span class=\"p\">))<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ 2<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Though overall, the usefulness of these special values can be doubted and if you are not actively expecting <code class=\"language-plaintext highlighter-rouge\">Inf<\/code>s or <code class=\"language-plaintext highlighter-rouge\">NaN<\/code>s you might be well advised to sprinkle a few <code class=\"language-plaintext highlighter-rouge\">assert(isfinite(f))<\/code> across your code.<\/p>\n\n<h2 id=\"wall-of-fame-and-shame\">Wall of Fame and Shame<\/h2>\n\n<p>Here are the results of a few common \u201ctrivial\u201d operations: (<a href=\"https:\/\/godbolt.org\/z\/1v0-RC\">godbolt<\/a>)<\/p>\n\n<table>\n  <thead>\n    <tr>\n      <th>expression<\/th>\n      <th>compiled<\/th>\n      <th>notes<\/th>\n    <\/tr>\n  <\/thead>\n  <tbody>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f + 0.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">f + 0.0<\/code><\/td>\n      <td>optimized if <code class=\"language-plaintext highlighter-rouge\">-fno-signed-zeros<\/code><\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f - 0.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">f<\/code><\/td>\n      <td>\u00a0<\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f + -0.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">f<\/code><\/td>\n      <td>\u00a0<\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f - -0.0<\/code><\/td>\n      
<td><code class=\"language-plaintext highlighter-rouge\">f + 0.0<\/code><\/td>\n      <td>optimized if <code class=\"language-plaintext highlighter-rouge\">-fno-signed-zeros<\/code><\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f * 1.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">f<\/code><\/td>\n      <td>\u00a0<\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f \/ 1.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">f<\/code><\/td>\n      <td>\u00a0<\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f \/ 2.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">f * 0.5<\/code><\/td>\n      <td>\u00a0<\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f \/ 3.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">f \/ 3.0<\/code><\/td>\n      <td>cannot guarantee same rounding if <code class=\"language-plaintext highlighter-rouge\">f * 0.333..<\/code> were used<\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f * 0.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">f * 0.0<\/code><\/td>\n      <td>optimized if <code class=\"language-plaintext highlighter-rouge\">-fno-signed-zeros<\/code> and <code class=\"language-plaintext highlighter-rouge\">-ffinite-math-only<\/code><\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f * -1.0<\/code><\/td>\n      <td><code class=\"language-plaintext highlighter-rouge\">-f<\/code><\/td>\n      <td>actually implemented via flipping the sign bit with <code class=\"language-plaintext highlighter-rouge\">xor<\/code><\/td>\n    <\/tr>\n    <tr>\n      <td><code class=\"language-plaintext highlighter-rouge\">f - f<\/code><\/td>\n      <td><code 
class=\"language-plaintext highlighter-rouge\">f - f<\/code><\/td>\n      <td>optimized if <code class=\"language-plaintext highlighter-rouge\">-ffinite-math-only<\/code><\/td>\n    <\/tr>\n  <\/tbody>\n<\/table>\n\n<h2 id=\"conclusion\">Conclusion<\/h2>\n\n<p>At some point people tend to give up trying to understand floats and accept that they are \u201cstrange\u201d or even \u201cunreliable\u201d.\n<a href=\"https:\/\/randomascii.wordpress.com\/2013\/07\/16\/floating-point-determinism\/\">And it\u2019s not exactly false<\/a>.\nBut they might be throwing the baby out with the bathwater.\nIt is true that floats involve a good amount of complexity but most rules are actually designed to <em>help<\/em> make numerical algorithms reliable.<\/p>\n\n<p>Don\u2019t forget that <a href=\"https:\/\/randomascii.wordpress.com\/2017\/06\/19\/sometimes-floating-point-math-is-perfect\/\">IEEE 754 has a few useful guarantees<\/a>.\nFor example, the five basic operations <code class=\"language-plaintext highlighter-rouge\">+<\/code>, 
<code class=\"language-plaintext highlighter-rouge\">-<\/code>, <code class=\"language-plaintext highlighter-rouge\">*<\/code>, <code class=\"language-plaintext highlighter-rouge\">\/<\/code>, and <code class=\"language-plaintext highlighter-rouge\">sqrt<\/code> are guaranteed to give \u201cexact\u201d results, which means the closest representable number corresponding to the current rounding mode.\nThis especially implies that everything that is representable will be computed exactly:<\/p>\n\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">1.0 + 2.0 == 3.0<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">3.0 \/ 4.0 == 0.75<\/code><\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">sqrt(5.0625) == 2.25<\/code><\/li>\n<\/ul>\n\n<p><code class=\"language-plaintext highlighter-rouge\">float<\/code>s have a 23-bit mantissa (plus one implicit leading bit), meaning integer computations with <code class=\"language-plaintext highlighter-rouge\">float<\/code> will be exact as long as inputs and outputs stay within 2<sup>24<\/sup> (roughly 16.7 million) in magnitude.\n<code class=\"language-plaintext highlighter-rouge\">double<\/code> has enough precision that any <code class=\"language-plaintext highlighter-rouge\">int32_t<\/code> computation will be exact.<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/unsplash.com\/photos\/Nl-GCtizDHg\">unsplash<\/a><\/em>)<\/p>"},{"title":"Special Treatment for Literal Zero","description":"Or: how to make \"foo(0)\" and \"foo(1)\" call different functions","pubDate":"Thu, 01 Aug 2019 02:00:00 +0000","link":"https:\/\/artificial-mind.net\/blog\/2019\/08\/01\/special-treatment-for-literal-zero","guid":"https:\/\/artificial-mind.net\/blog\/2019\/08\/01\/special-treatment-for-literal-zero","content":"<h2 id=\"tldr\">TL;DR:<\/h2>\n\n<p>It is possible in C++ to make <code class=\"language-plaintext highlighter-rouge\">foo(0)<\/code> and <code class=\"language-plaintext highlighter-rouge\">foo(1)<\/code> call different functions.\nWe used that to design an API 
where <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> calls an optimized check without branching.<\/p>\n\n<p>Trigger warning: this post contains mild forms of C++ gore.<\/p>\n\n<h2 id=\"a-tale-of-0-and-1\">A Tale of <code class=\"language-plaintext highlighter-rouge\">0<\/code> and <code class=\"language-plaintext highlighter-rouge\">1<\/code><\/h2>\n\n<p>I recently stumbled upon the little fact that just because <code class=\"language-plaintext highlighter-rouge\">x &lt; 1<\/code> compiles you cannot assume that <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> would as well (<a href=\"https:\/\/godbolt.org\/z\/BJIDFx\">godbolt<\/a>).<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">X<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"kt\">long<\/span><span class=\"p\">);<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"kt\">void<\/span><span class=\"o\">*<\/span><span class=\"p\">);<\/span>\n\n<span class=\"n\">X<\/span> <span class=\"n\">x<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ fine<\/span>\n<span class=\"n\">x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ ambiguous overload<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Let\u2019s start simple.\n<code class=\"language-plaintext highlighter-rouge\">0<\/code> and <code class=\"language-plaintext 
highlighter-rouge\">1<\/code> are integer literals and without any suffix their type is good ol\u2019 <code class=\"language-plaintext highlighter-rouge\">int<\/code>.\n(<a href=\"https:\/\/godbolt.org\/z\/z7YlBy\">Lack of a suffix alone is not a guarantee though.<\/a>)\n<code class=\"language-plaintext highlighter-rouge\">int<\/code>s can be implicitly converted to <code class=\"language-plaintext highlighter-rouge\">long<\/code>s, which is why <code class=\"language-plaintext highlighter-rouge\">x &lt; 1<\/code> happily calls the first overload.\nSo far so good.<\/p>\n\n<p>Back in the day (technical term for the period of time before C++11), there was no <code class=\"language-plaintext highlighter-rouge\">nullptr<\/code>.\nPeople either used <code class=\"language-plaintext highlighter-rouge\">0<\/code> or <code class=\"language-plaintext highlighter-rouge\">NULL<\/code> (<a href=\"https:\/\/en.cppreference.com\/w\/cpp\/types\/NULL\">a fancy macro<\/a>).\nThis worked because there is a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/implicit_conversion#Pointer_conversions\">special rule<\/a> that the literal <code class=\"language-plaintext highlighter-rouge\">0<\/code> can be converted to any pointer type and yields the null pointer.<\/p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> is thus ambiguous because both overloads are valid and they are equally good (both are one conversion away).<\/p>\n\n<h2 id=\"an-idea-forms\">An Idea Forms<\/h2>\n\n<p>This curious little gem led me to rethink a recent API design problem we faced.<\/p>\n\n<p>We were writing a custom 256-bit integer type, basically a <code class=\"language-plaintext highlighter-rouge\">BigInteger<\/code> but with a fixed representation using 4 <code class=\"language-plaintext highlighter-rouge\">uint64_t<\/code>s internally.\nThis was performance-sensitive code and we wanted everything to be maximally efficient.<\/p>\n\n<p>Our <code class=\"language-plaintext 
highlighter-rouge\">i256<\/code> type uses two\u2019s complement and the check <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> can be done by simply testing if the highest bit is set.\nHowever, <code class=\"language-plaintext highlighter-rouge\">x &lt; 1<\/code> or <code class=\"language-plaintext highlighter-rouge\">x &lt; y<\/code> was a bit more expensive because you basically have to test <code class=\"language-plaintext highlighter-rouge\">x - y &lt; 0<\/code>.<\/p>\n\n<p>In our initial design we opted for a <code class=\"language-plaintext highlighter-rouge\">is_below_zero(i256)<\/code> function that performs the optimized test and <code class=\"language-plaintext highlighter-rouge\">operator&lt;(i256, i256)<\/code> that does the generic test.\nUnfortunately, people tend to write <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> and I can\u2019t blame them because it\u2019s more readable and more concise than <code class=\"language-plaintext highlighter-rouge\">is_below_zero(x)<\/code>.<\/p>\n\n<p>Looping back to our initial observation:\nIf it is possible to generate a compile error for <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> but not for <code class=\"language-plaintext highlighter-rouge\">x &lt; 1<\/code> then maybe it is also possible to redirect <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> to a different function.<\/p>\n\n<blockquote>\n  <p>What if we could automatically (and statically) use <code class=\"language-plaintext highlighter-rouge\">is_below_zero(x)<\/code> whenever <code class=\"language-plaintext highlighter-rouge\">operator&lt;<\/code> is called with a literal <code class=\"language-plaintext highlighter-rouge\">0<\/code> and otherwise fall back to the generic code without using a runtime branch?<\/p>\n<\/blockquote>\n\n<h2 id=\"the-initial-design\">The Initial Design<\/h2>\n\n<p>Usually, whenever we want to check if something would compile 
without actually introducing a compile error, we reach for <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/sfinae\">SFINAE<\/a>.\nThe idea would be to disable <code class=\"language-plaintext highlighter-rouge\">operator&lt;(long)<\/code> for <code class=\"language-plaintext highlighter-rouge\">0<\/code>.\nIn our case, this doesn\u2019t work because SFINAE usually deals in types where <code class=\"language-plaintext highlighter-rouge\">0<\/code> and <code class=\"language-plaintext highlighter-rouge\">1<\/code> are not distinguishable anymore.<\/p>\n\n<p>However, we don\u2019t actually need to disable <code class=\"language-plaintext highlighter-rouge\">operator&lt;(long)<\/code>, we just need to break the ambiguity.\nIf there were a way to make <code class=\"language-plaintext highlighter-rouge\">operator&lt;(long)<\/code> a worse candidate than <code class=\"language-plaintext highlighter-rouge\">operator&lt;(void*)<\/code> (for <code class=\"language-plaintext highlighter-rouge\">0<\/code>) then this meets our requirements.<\/p>\n\n<p>Lo and behold, there is a way!<\/p>\n\n<p>User conversions are <a href=\"https:\/\/stackoverflow.com\/questions\/44086269\/why-does-my-variant-convert-a-stdstring-to-a-bool\">famously<\/a> <a href=\"https:\/\/stackoverflow.com\/questions\/44021989\/implicit-cast-from-const-string-to-bool\">known<\/a> for being \u201cweaker\u201d than standard conversions.<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">struct<\/span> <span class=\"nc\">not_literal_zero<\/span> <span class=\"p\">{<\/span> <span class=\"n\">not_literal_zero<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">);<\/span> <span class=\"p\">};<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">literal_zero<\/span> <span class=\"o\">=<\/span> <span class=\"kt\">void<\/span><span class=\"o\">*<\/span><span class=\"p\">;<\/span>\n\n<span 
class=\"k\">struct<\/span> <span class=\"nc\">X<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">not_literal_zero<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ (1)<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">literal_zero<\/span><span class=\"p\">);<\/span>     <span class=\"c1\">\/\/ (2)<\/span>\n\n<span class=\"n\">X<\/span> <span class=\"n\">x<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ calls (1)<\/span>\n<span class=\"n\">x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ calls (2)<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><code class=\"language-plaintext highlighter-rouge\">1<\/code> cannot be converted to <code class=\"language-plaintext highlighter-rouge\">void*<\/code> but has an implicit conversion to <code class=\"language-plaintext highlighter-rouge\">not_literal_zero<\/code>.\n<code class=\"language-plaintext highlighter-rouge\">0<\/code> can be converted to both, but the conversion to <code class=\"language-plaintext highlighter-rouge\">void*<\/code> is a standard conversion while the one to <code class=\"language-plaintext highlighter-rouge\">not_literal_zero<\/code> is a user-defined conversion, making (2) the unambiguous call target (<a href=\"https:\/\/godbolt.org\/z\/Ke8rWa\">godbolt<\/a>).<\/p>\n\n<h2 id=\"banishing-footguns\">Banishing Footguns<\/h2>\n\n<p>Are we done yet?<\/p>\n\n<p>Well\u2026 yes, and no.<\/p>\n\n<p>We achieved our initial goal of making <code class=\"language-plaintext 
highlighter-rouge\">x &lt; 0<\/code> call our optimized version while <code class=\"language-plaintext highlighter-rouge\">x &lt; 1<\/code> and <code class=\"language-plaintext highlighter-rouge\">x &lt; y<\/code> call the generic version.\nAPIs should not be needlessly surprising but we\u2019re good law-abiding citizens:\nBoth versions are semantically equivalent for <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code>.\nThe call is statically resolved, so there is no performance overhead in any case and a performance win for the common <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> test.<\/p>\n\n<p>However, we introduced a few unnecessary <a href=\"https:\/\/en.wiktionary.org\/wiki\/footgun\">footguns<\/a>:<\/p>\n\n<ol>\n  <li><code class=\"language-plaintext highlighter-rouge\">x &lt; nullptr<\/code> compiles<\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">x &lt; &amp;x<\/code> compiles<\/li>\n  <li><code class=\"language-plaintext highlighter-rouge\">x &lt; \"y tho?\"<\/code> almost compiles<\/li>\n<\/ol>\n\n<p>The third case would compile if we used <code class=\"language-plaintext highlighter-rouge\">void const*<\/code> instead of <code class=\"language-plaintext highlighter-rouge\">void*<\/code>, something we might be tempted to do for the sake of const-correctness.<\/p>\n\n<p>Let\u2019s fix the first case.\nThis is actually pretty simple because <code class=\"language-plaintext highlighter-rouge\">nullptr<\/code> has a special type <code class=\"language-plaintext highlighter-rouge\">std::nullptr_t<\/code>.\nWe cannot add an overload with <code class=\"language-plaintext highlighter-rouge\">std::nullptr_t<\/code> as that would make <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> ambiguous again.\nHowever, we can disable it with SFINAE:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#include<\/span> <span 
class=\"cpf\">&lt;type_traits&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"k\">template<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">T<\/span><span class=\"p\">,<\/span> \n         <span class=\"k\">class<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_same_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">T<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">nullptr_t<\/span><span class=\"p\">&gt;<\/span><span class=\"o\">&gt;&gt;<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">T<\/span><span class=\"p\">)<\/span> <span class=\"o\">=<\/span> <span class=\"k\">delete<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>For <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code>, <code class=\"language-plaintext highlighter-rouge\">T<\/code> will be <code class=\"language-plaintext highlighter-rouge\">int<\/code> and this overload is disabled via <code class=\"language-plaintext highlighter-rouge\">std::enable_if<\/code>.\nFor <code class=\"language-plaintext highlighter-rouge\">x &lt; nullptr<\/code>, <code class=\"language-plaintext highlighter-rouge\">T<\/code> is deduced to be <code class=\"language-plaintext highlighter-rouge\">std::nullptr_t<\/code> and the overload is valid.\nIt requires no additional conversion and thus matches better than our <code class=\"language-plaintext highlighter-rouge\">void*<\/code> overload.\nThe compile error is not ideal (<code class=\"language-plaintext highlighter-rouge\">use of deleted function 'operator&lt;'<\/code>) and could be 
improved with a <code class=\"language-plaintext highlighter-rouge\">static_assert<\/code> if desired.<\/p>\n\n<p>The final issue is that <code class=\"language-plaintext highlighter-rouge\">x &lt; &amp;x<\/code> compiles.\nThe reason is that many pointer types can be converted to <code class=\"language-plaintext highlighter-rouge\">void*<\/code>, which is a pretty general type.\nWhile we cannot fix all corner cases, we can make it incredibly hard to accidentally trigger it.<\/p>\n\n<p>First, we change <code class=\"language-plaintext highlighter-rouge\">void*<\/code> into a <a href=\"https:\/\/en.cppreference.com\/w\/cpp\/language\/pointer#Pointers_to_data_members\">pointer-to-data-member<\/a>, let\u2019s say <code class=\"language-plaintext highlighter-rouge\">X* X::*<\/code> (a pointer to a member of <code class=\"language-plaintext highlighter-rouge\">X<\/code> that is itself a pointer to <code class=\"language-plaintext highlighter-rouge\">X<\/code>).\n<code class=\"language-plaintext highlighter-rouge\">0<\/code> can still initialize this pointer, but the only other way would be <code class=\"language-plaintext highlighter-rouge\">&amp;X::m<\/code> (where <code class=\"language-plaintext highlighter-rouge\">m<\/code> is an actual member of <code class=\"language-plaintext highlighter-rouge\">X<\/code> of type <code class=\"language-plaintext highlighter-rouge\">X*<\/code>) or via explicit casts.<\/p>\n\n<p>If you really don\u2019t want to take any chances, you can also reach for a <a href=\"http:\/\/videocortex.io\/2017\/Bestiary\/#-voldemort-types\">Voldemort Type<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">auto<\/span> <span class=\"nf\">hidden_ptr_to_member<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">struct<\/span> <span class=\"nc\">H<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n    <span 
class=\"n\">H<\/span><span class=\"o\">*<\/span> <span class=\"n\">H<\/span><span class=\"o\">::*<\/span> <span class=\"n\">p<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">nullptr<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">p<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"k\">using<\/span> <span class=\"n\">literal_zero<\/span> <span class=\"o\">=<\/span> <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">hidden_ptr_to_member<\/span><span class=\"p\">());<\/span>\n\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">literal_zero<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Outside of <code class=\"language-plaintext highlighter-rouge\">hidden_ptr_to_member<\/code> there is no syntax to name the inner <code class=\"language-plaintext highlighter-rouge\">struct H<\/code>, thus making it a <em>type-that-cannot-be-named<\/em>.\nThe only way to actually refer to it is by using <code class=\"language-plaintext highlighter-rouge\">decltype(...)<\/code>.\nApart from the inability to name it, it is just a pointer-to-data-member and thus suitable for our <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> overload.<\/p>\n\n<p>For a final touch and maybe as an appeal to our inner humanity, we move <code class=\"language-plaintext highlighter-rouge\">hidden_ptr_to_member<\/code> to <code class=\"language-plaintext highlighter-rouge\">namespace impl<\/code>.\n<code class=\"language-plaintext highlighter-rouge\">impl::<\/code> and <code class=\"language-plaintext highlighter-rouge\">detail::<\/code> namespaces are basically C++ bro code for \u201cI have to expose this because of C++ rules but if you actively use anything inside it, you\u2019re on your own\u201d.<\/p>\n\n<h2 
id=\"summary\">Summary<\/h2>\n\n<p>And with that, we have our <a href=\"https:\/\/godbolt.org\/z\/r1sYsE\">final version<\/a>:<\/p>\n\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">#include<\/span> <span class=\"cpf\">&lt;type_traits&gt;<\/span><span class=\"cp\">\n<\/span>\n<span class=\"k\">namespace<\/span> <span class=\"n\">impl<\/span>\n<span class=\"p\">{<\/span>\n<span class=\"k\">auto<\/span> <span class=\"n\">hidden_ptr_to_member<\/span><span class=\"p\">()<\/span>\n<span class=\"p\">{<\/span>\n    <span class=\"k\">struct<\/span> <span class=\"nc\">H<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n    <span class=\"n\">H<\/span><span class=\"o\">*<\/span> <span class=\"n\">H<\/span><span class=\"o\">::*<\/span> <span class=\"n\">p<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">nullptr<\/span><span class=\"p\">;<\/span>\n    <span class=\"k\">return<\/span> <span class=\"n\">p<\/span><span class=\"p\">;<\/span>\n<span class=\"p\">}<\/span>\n<span class=\"p\">}<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">not_literal_zero<\/span> <span class=\"p\">{<\/span> <span class=\"n\">not_literal_zero<\/span><span class=\"p\">(<\/span><span class=\"kt\">int<\/span><span class=\"p\">);<\/span> <span class=\"p\">};<\/span>\n<span class=\"k\">using<\/span> <span class=\"n\">literal_zero<\/span> <span class=\"o\">=<\/span> <span class=\"k\">decltype<\/span><span class=\"p\">(<\/span><span class=\"n\">impl<\/span><span class=\"o\">::<\/span><span class=\"n\">hidden_ptr_to_member<\/span><span class=\"p\">());<\/span>\n\n<span class=\"k\">struct<\/span> <span class=\"nc\">X<\/span> <span class=\"p\">{<\/span> <span class=\"p\">};<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span 
class=\"n\">not_literal_zero<\/span><span class=\"p\">);<\/span> <span class=\"c1\">\/\/ (1)<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">literal_zero<\/span><span class=\"p\">);<\/span>     <span class=\"c1\">\/\/ (2)<\/span>\n\n<span class=\"k\">template<\/span><span class=\"o\">&lt;<\/span><span class=\"k\">class<\/span> <span class=\"nc\">forbid_nullptr<\/span><span class=\"p\">,<\/span> \n         <span class=\"k\">class<\/span> <span class=\"o\">=<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">enable_if_t<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">is_same_v<\/span><span class=\"o\">&lt;<\/span><span class=\"n\">forbid_nullptr<\/span><span class=\"p\">,<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">nullptr_t<\/span><span class=\"p\">&gt;<\/span><span class=\"o\">&gt;&gt;<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">forbid_nullptr<\/span><span class=\"p\">)<\/span> <span class=\"o\">=<\/span> <span class=\"k\">delete<\/span><span class=\"p\">;<\/span> <span class=\"c1\">\/\/ (3)<\/span>\n\n<span class=\"n\">X<\/span> <span class=\"n\">x<\/span><span class=\"p\">;<\/span>\n<span class=\"n\">x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span>       <span class=\"c1\">\/\/ calls (1)<\/span>\n<span class=\"n\">x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"mi\">0<\/span><span class=\"p\">;<\/span>       <span class=\"c1\">\/\/ calls (2)<\/span>\n<span class=\"n\">x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"nb\">nullptr<\/span><span 
class=\"p\">;<\/span> <span class=\"c1\">\/\/ calls (3) which is deleted<\/span>\n<span class=\"n\">x<\/span> <span class=\"o\">&lt;<\/span> <span class=\"o\">&amp;<\/span><span class=\"n\">x<\/span><span class=\"p\">;<\/span>      <span class=\"c1\">\/\/ no matching call<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p><code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> and <code class=\"language-plaintext highlighter-rouge\">x &lt; 1<\/code> call different functions and we can safely optimize the <code class=\"language-plaintext highlighter-rouge\">x &lt; 0<\/code> case.\nWith creative use of some slightly more arcane C++ rules we were able to remove a few pitfalls (like being able to call <code class=\"language-plaintext highlighter-rouge\">x &lt; y<\/code> with <code class=\"language-plaintext highlighter-rouge\">nullptr<\/code> or various other pointer types).<\/p>\n\n<p>I probably have to say that I consider this thing more in the category of \u201cfun with C++\u201d than \u201cproduction-ready code\u201d, so please use your own judgement before adopting it.\nA possible middle ground could be<\/p>\n<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">[[<\/span><span class=\"n\">deprecated<\/span><span class=\"p\">(<\/span><span class=\"s\">\"use is_below_zero(X) instead\"<\/span><span class=\"p\">)]]<\/span>\n<span class=\"kt\">bool<\/span> <span class=\"k\">operator<\/span><span class=\"o\">&lt;<\/span><span class=\"p\">(<\/span><span class=\"n\">X<\/span><span class=\"p\">,<\/span> <span class=\"n\">literal_zero<\/span><span class=\"p\">);<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>(<em>Title image from <a href=\"https:\/\/unsplash.com\/photos\/rfpSOlH1JlQ\">unsplash<\/a><\/em>)<\/p>"},{"title":"Hello Blog","description":"Welcome to my blog!","pubDate":"Wed, 31 Jul 2019 01:00:00 
+0000","link":"https:\/\/artificial-mind.net\/blog\/2019\/07\/31\/hello-blog","guid":"https:\/\/artificial-mind.net\/blog\/2019\/07\/31\/hello-blog","content":"<div class=\"language-cpp highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">cout<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"s\">\"Hello Blog\"<\/span> <span class=\"o\">&lt;&lt;<\/span> <span class=\"n\">std<\/span><span class=\"o\">::<\/span><span class=\"n\">endl<\/span><span class=\"p\">;<\/span>\n<\/code><\/pre><\/div><\/div>\n\n<p>Welcome to my blog!<\/p>\n\n<p>Over the years I have had to solve many technical challenges and have developed quite strong opinions about various design choices in library development.\nMy colleagues are my go-to target when discussing and designing APIs or solving intricate problems, though I plan to change that:<\/p>\n\n<p>Whiteboard discussions are ephemeral while blog posts are eternal\u2014or so they say.<\/p>\n\n<p>I want to take the step and write down the results of the more interesting discussions and document the rationales behind various design decisions.\nSometimes I want to share interesting results or just show a cool rendering or two.<\/p>\n\n<p>I\u2019m currently doing my PhD in computer graphics with a heavy focus on shape analysis, geometry processing, and optimization.\nI\u2019ve written various C++ libraries for rendering with OpenGL, mesh processing, vector math, benchmarking, and more.\nMy interests are broad and this will probably be reflected in the blog.<\/p>\n\n<p>All in all I hope this makes for some interesting topics for some of you.\nIn that spirit:<\/p>\n\n<p>Let\u2019s start this journey!<\/p>\n\n<p>(<em>Title image from <a href=\"https:\/\/unsplash.com\/photos\/68ZlATaVYIo\">unsplash<\/a><\/em>)<\/p>"}]}}