September | 2018 | Possibly Wrong

This past week’s Riddler puzzle on FiveThirtyEight asks for the number of different paths of minimum length from a starting intersection of city streets to a destination $m$ blocks east and $n$ blocks north. Put another way, moving on the 2D integer lattice graph, how many paths are there from the origin $(0,0)$ to vertex $(m,n)$ that are of minimum length?

Constraining the paths to minimum length greatly simplifies the problem. So let’s generalize, and instead ask for the number of paths from $(0,0)$ to $(m,n)$ of length $k$. The original problem asks for the particular case $k=|m|+|n|$, but what if we allow longer paths, where we sometimes move in the “wrong” direction away from the destination?

I think this is a nice problem, with an elegant solution only slightly more complex than the original posed in the Riddler column. As a hint, the animation below visualizes the result, where the path length $k$ increases with each frame, showing the probability distribution of the endpoint of a 2D random walk.

Probability distribution of endpoint of 2D lattice random walk, vs. number of steps.

Perhaps as another hint, note the checkerboard pattern to the distribution; only “half” of the vertices are reachable for a particular path length $k$, and which half is reachable alternates as $k$ increases.
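For readers who want to experiment before working out a formula, here is a minimal brute-force sketch that counts paths of exactly $k$ steps by dynamic programming. The function name count_paths is a hypothetical helper for illustration, not code from the puzzle or from this blog, and it uses plain 64-bit counters, so it is only suitable for small $k$.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Count lattice paths of exactly k unit steps from (0,0) to (m,n).
// Brute-force dynamic programming over the (2k+1) x (2k+1) grid of
// vertices with coordinates in -k..k. Hypothetical helper; the
// counters overflow uint64_t once k gets large.
std::uint64_t count_paths(int m, int n, int k)
{
    if (k < 0 || std::abs(m) > k || std::abs(n) > k) return 0;
    const int w = 2 * k + 1;
    std::vector<std::uint64_t> cur(w * w, 0), next(w * w, 0);
    cur[k * w + k] = 1; // with zero steps, only the origin is reachable
    for (int step = 0; step < k; ++step)
    {
        std::fill(next.begin(), next.end(), 0);
        for (int x = 0; x < w; ++x)
        {
            for (int y = 0; y < w; ++y)
            {
                const std::uint64_t c = cur[x * w + y];
                if (c == 0) continue;
                // Push the count to each of the four neighbors.
                if (x + 1 < w) next[(x + 1) * w + y] += c;
                if (x > 0)     next[(x - 1) * w + y] += c;
                if (y + 1 < w) next[x * w + (y + 1)] += c;
                if (y > 0)     next[x * w + (y - 1)] += c;
            }
        }
        cur.swap(next);
    }
    return cur[(m + k) * w + (n + k)];
}
```

For example, count_paths(2, 1, 3) counts the minimum-length paths of the original puzzle for $m=2$, $n=1$, while count_paths(1, 0, 2) returns zero, reflecting the checkerboard parity noted above.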

Introduction

This is a follow-up to a post from several years ago describing a C++ implementation of arbitrary-precision unsigned integer arithmetic. This weekend I extended this to also support arbitrary-precision signed integers and rational numbers. Although this started as an educational tool, it now feels a bit more complete, and actually usable for the combinatorics and probability applications of the sort that are frequently discussed here.

I tried to stick to the original objectives of relatively simple and hopefully readable code, with stand-alone, header-only implementation, as freely available in the public domain as legally possible.

The code is available here, as well as on GitHub, in three header files:

#include "math_Unsigned.h" defines a math::Unsigned type representing the natural numbers with all of the sensible arithmetic, bitwise, and relational operators, essentially everything except bitwise one’s complement… although more on this shortly.
#include "math_Integer.h" defines an Integer type with a sign-and-magnitude implementation in terms of Unsigned, with all corresponding operators, including bitwise operators having two’s complement semantics assuming “infinite sign extension.”
#include "math_Rational.h" defines a Rational type implemented in terms of Integer numerator and denominator.

This was a fun exercise; there were interesting challenges in developing each of the three classes. As discussed previously, the unsigned type handles the actual arbitrary-precision representation (implemented as a vector<uint32_t> of digits in base $2^{32}$), where division is by far the most complex operation to implement efficiently.
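To make the base-$2^{32}$ representation concrete, here is a minimal sketch of schoolbook addition with carry on such digit vectors. This is not the code from math_Unsigned.h: the Digits alias, the add function, and the least-significant-digit-first ordering are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Digits in base 2^32, least significant first (an assumed ordering):
// value = sum over i of digits[i] * 2^(32 * i).
using Digits = std::vector<std::uint32_t>;

// Schoolbook addition, using a 64-bit accumulator to hold each
// digit sum plus the incoming carry.
Digits add(const Digits& a, const Digits& b)
{
    Digits sum;
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size() || i < b.size() || carry != 0; ++i)
    {
        std::uint64_t t = carry;
        if (i < a.size()) t += a[i];
        if (i < b.size()) t += b[i];
        sum.push_back(static_cast<std::uint32_t>(t)); // keep the low 32 bits
        carry = t >> 32;                               // the rest carries over
    }
    return sum;
}
```

Adding 1 to the single digit 0xFFFFFFFF, for instance, produces the two digits 0 and 1, that is, $2^{32}$. Multiplication follows the same pattern with a 64-bit product per digit pair; division is where most of the difficulty lies, as noted above.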

The implementation of the signed integer type is relatively straightforward… except for the bitwise operators. Assuming a sign-and-magnitude representation (using an Unsigned under the hood), it is an interesting exercise to work out how to implement bitwise and, or, xor, and not, so that they have two’s complement semantics even for negative operands. In the process, I had to add an “AND NOT” operator to the original underlying unsigned type (there is actually a built-in operator &^ for this in Go).
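As one illustration of the idea (a sketch under stated assumptions, not the implementation in math_Integer.h), consider bitwise and on a sign-and-magnitude pair. With infinite sign extension, the two's complement bit pattern of $-b$ is the complement of $b-1$, so each sign combination reduces to operations on magnitudes, and the mixed-sign case is exactly where an "AND NOT" on the unsigned type is needed. The SignMag struct, the fixed 64-bit magnitude standing in for Unsigned, and the helper names are all hypothetical.

```cpp
#include <cstdint>

// Sign-and-magnitude integer with a fixed 64-bit magnitude standing in
// for an arbitrary-precision Unsigned. Assumes negative implies
// magnitude > 0 (no negative zero) and magnitudes well below 2^63.
struct SignMag
{
    bool negative;
    std::uint64_t magnitude;
};

SignMag from_int(std::int64_t v)
{
    return v < 0 ? SignMag{true, static_cast<std::uint64_t>(-v)}
                 : SignMag{false, static_cast<std::uint64_t>(v)};
}

std::int64_t to_int(SignMag s)
{
    const std::int64_t m = static_cast<std::int64_t>(s.magnitude);
    return s.negative ? -m : m;
}

// Bitwise AND with two's complement semantics, using the identity
// that -b has the same bits as ~(b - 1).
SignMag bit_and(SignMag x, SignMag y)
{
    const std::uint64_t a = x.magnitude, b = y.magnitude;
    if (!x.negative && !y.negative) return {false, a & b};
    if (!x.negative) return {false, a & ~(b - 1)}; // a AND NOT (b - 1)
    if (!y.negative) return {false, b & ~(a - 1)}; // b AND NOT (a - 1)
    // Both negative: ~(a-1) & ~(b-1) == ~((a-1) | (b-1)) by De Morgan,
    // which is the negative number with magnitude ((a-1) | (b-1)) + 1.
    return {true, ((a - 1) | (b - 1)) + 1};
}
```

On a platform with two's complement native integers, to_int(bit_and(from_int(x), from_int(y))) should agree with x & y for small x and y. The or and xor cases can be worked out the same way.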

With this machinery in place, the rational type is the simplest to implement. The only wrinkle here is that a few additional constructors are needed, since user-defined conversions from the more primitive integral types (e.g., Rational from Integer, Integer from int32_t, etc.) are only implicitly applied “one level deep.”
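The "one level deep" rule can be seen in a small stand-alone sketch. The types Int, RatA, and RatB below are hypothetical stand-ins, not the actual Integer and Rational classes: an implicit conversion sequence may contain at most one user-defined conversion, so int to Int to RatA is rejected unless the target type provides its own constructor from int.

```cpp
#include <type_traits>

// Stand-in for Integer: implicitly constructible from int.
struct Int
{
    Int(int) {}
};

// Stand-in for a Rational with only a constructor from Int.
struct RatA
{
    RatA(Int) {}
};

// Same, plus the extra constructor that forwards from int.
struct RatB
{
    RatB(Int) {}
    RatB(int v) : RatB(Int(v)) {}
};

// One user-defined conversion is applied implicitly...
static_assert(std::is_convertible<Int, RatA>::value,
              "Int -> RatA uses a single user-defined conversion");
// ...but two in a row (int -> Int -> RatA) are not.
static_assert(!std::is_convertible<int, RatA>::value,
              "int -> Int -> RatA would need two user-defined conversions");
// The additional constructor closes the gap.
static_assert(std::is_convertible<int, RatB>::value,
              "int -> RatB is a single user-defined conversion");
```

This matters for copy-initialization and mixed arithmetic: a statement like Rational q = 0; in the example below compiles only if an int can be converted to Rational in a single step.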

Example application: Are seven riffle shuffles enough?

To test and demonstrate use of these classes, consider riffle shuffling a standard poker deck of 52 playing cards. How many shuffles are sufficient to “fully randomize” the deck? A popular rule of thumb, attributed to Bayer and Diaconis, is that seven riffle shuffles are enough. (See a longer list of references here, along with some simpler counting arguments that at least six shuffles are certainly necessary.)

This recommendation is based on analysis of the Gilbert-Shannon-Reeds model of a single riffle shuffle, and of the total variation distance between probability distributions $Q^m$ and $U$, where $Q^m$ is the distribution of arrangements of the deck after $m$ GSR riffle shuffles, and $U$ is the desired uniform distribution where every arrangement is equally likely. We can compute this total variation distance exactly as a function of the number $m$ of shuffles, as demonstrated in the following example code:

#include "math_Rational.h"
#include <iostream>
using namespace math;

Integer factorial(int n)
{
    Integer f = 1;
    for (int k = 1; k <= n; ++k)
    {
        f *= k;
    }
    return f;
}

Integer binomial(int n, int k)
{
    if (0 <= k && k <= n)
    {
        return factorial(n) / factorial(k) / factorial(n - k);
    }
    else
    {
        return 0;
    }
}

Integer power(int base, int exp)
{
    Integer n = 1;
    for (int k = 0; k < exp; ++k)
    {
        n *= base;
    }
    return n;
}

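// Eulerian number A(n, k): the number of permutations of n elements
// with exactly k descents, computed from the alternating-sum formula.
// (The final term, j == k + 1, contributes zero for n > 0.)
Integer eulerian(int n, int k)
{
    Integer r = 0;
    for (int j = 0; j < k + 2; ++j)
    {
        r += (power(-1, j) * binomial(n + 1, j) * power(k + 1 - j, n));
    }
    return r;
}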

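// Exact total variation distance between the distribution of deck
// arrangements after the given number of GSR shuffles and uniform.
Rational total_variation_distance(int cards, int shuffles)
{
    Rational q = 0;
    // r is the number of rising sequences in an arrangement.
    for (int r = 1; r <= cards; ++r)
    {
        // a = (probability of one arrangement with r rising sequences)
        //     - (uniform probability 1 / cards!)
        Rational a = Rational(
            binomial((1 << shuffles) + cards - r, cards),
            Integer(1) << (cards * shuffles)) - Rational(1, factorial(cards));
        // eulerian(cards, r - 1) arrangements have r rising sequences.
        q += (eulerian(cards, r - 1) * (a < 0 ? -a : a));
    }
    return q / 2;
}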

int main()
{
    int cards = 52;
    for (int shuffles = 0; shuffles <= 15; ++shuffles)
    {
        std::cout << shuffles << " " <<
            total_variation_distance(cards, shuffles).to_double() << std::endl;
    }
}
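For reference, the sum evaluated by total_variation_distance above can be written out as follows, with $n$ the number of cards, $m$ the number of shuffles, and $A(n,k)$ the Eulerian number computed by eulerian. The notation $\|Q^m-U\|$ is introduced here for convenience. The formula relies on the result from Bayer and Diaconis's analysis that the probability of an arrangement after $m$ GSR shuffles depends only on its number $r$ of rising sequences.

$$\|Q^m-U\| = \frac{1}{2}\sum_{r=1}^{n} A(n,r-1)\left|\frac{\binom{2^m+n-r}{n}}{2^{mn}}-\frac{1}{n!}\right|$$

As a small check by hand, for $n=3$ and $m=1$ the three terms are $1\cdot|1/2-1/6|$, $4\cdot|1/8-1/6|$, and $1\cdot|0-1/6|$, which sum to $2/3$, giving a distance of $1/3$.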

The following figure shows the results. Total variation distance ranges from a maximum of one (between discrete distributions with disjoint support) to a minimum of zero, in this case corresponding to an exactly uniform distribution of arrangements of the deck.

Total variation distance vs. number of GSR riffle shuffles of a standard 52-card deck.

We can see the sharp threshold behavior, where total variation distance transitions from near one to near zero over just a few shuffles, first dropping below 1/2 at seven shuffles.
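As a rough cross-check that does not need the arbitrary-precision headers, the same sum can be approximated in double precision. This is a sketch under stated assumptions, not the code used to produce the figure: the function name tv_distance_approx is hypothetical, and the sum is rearranged so that each term is the uniform probability of having $r$ rising sequences, $A(n,r-1)/n!$, times $|Q^m(\pi)/U(\pi)-1|$, which avoids subtracting two numbers near $1/52!$.

```cpp
#include <cmath>
#include <vector>

// Floating-point approximation of the total variation distance after
// GSR riffle shuffles. Hypothetical helper; requires cards >= 1.
double tv_distance_approx(int cards, int shuffles)
{
    const int n = cards;
    if (n < 1) return 0.0;

    // e[k] = A(i, k) / i!, updated in place row by row using
    // A(i, k) = (k + 1) A(i - 1, k) + (i - k) A(i - 1, k - 1).
    std::vector<double> e(n, 0.0);
    e[0] = 1.0; // row i = 1
    for (int i = 2; i <= n; ++i)
    {
        // Descending k, so e[k - 1] still holds the previous row.
        for (int k = i - 1; k >= 0; --k)
        {
            const double prev = (k > 0) ? e[k - 1] : 0.0;
            e[k] = ((k + 1) * e[k] + (i - k) * prev) / i;
        }
    }

    const double two_m = std::ldexp(1.0, shuffles); // 2^shuffles
    double sum = 0.0;
    for (int r = 1; r <= n; ++r)
    {
        // ratio = binomial(2^m + n - r, n) * n! / 2^(m n), as a product
        // of n factors; it is zero whenever r > 2^m.
        double ratio = 1.0;
        for (int i = 1; i <= n; ++i)
        {
            ratio *= (two_m + i - r) / two_m;
        }
        sum += e[r - 1] * std::fabs(ratio - 1.0);
    }
    return sum / 2;
}
```

Calling tv_distance_approx(52, shuffles) for successive values of shuffles should reproduce the threshold described above, with the value for six shuffles above 1/2 and the value for seven below it. The exact rational computation remains the reference; this version only approximates it.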

A lattice path puzzle

Arbitrary-precision rational arithmetic in C++