November | 2013 | Possibly Wrong

While recently helping to fix some buggy simulation software, I encountered a nasty problem involving comparison of floating-point values. I was vaguely aware of the potential for this sort of problem to occur, but this was the first time that I had actually seen it in the wild.

The program was an event-driven simulation, with a periodic “update” event that was supposed to occur, well, periodically (5 times per simulated second). But it would consistently hang after a couple of minutes, getting stuck in an infinite loop because the periodic update event stopped triggering. Here is what the code looked like, with the actual time values that caused the problem:

double curr_time = 128.1253990141936;
double prev_time = 127.9253990141936;
if (curr_time >= prev_time + 0.200) {
    update();
} else {
    // We shouldn't be here.
}

The if-condition was expected to evaluate to true, indicating that enough time had passed since the previous update. But the condition failed, thus preventing the update and associated time advance. See if you can identify the cause of the problem.

At this point, the usual from-the-hip diagnosis is, “Floating-point arithmetic is only an approximation; you should never test values for equality, only for differences within some tolerance. You should probably read What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg.”

This is mostly good advice… but when reading online discussions about this issue, I often get the impression that part of Goldberg’s story is missed. That is, it is certainly important to recognize that floating-point values have limited precision, so that arithmetic and comparisons that we might expect to be valid for real numbers on a cocktail napkin are not valid when evaluated using limited floating-point precision. For example, evaluating the expression $1.1+2.2==3.3$ typically yields false, since none of the three values can be represented exactly as a rational number with a denominator that is a power of 2. So instead of the equality you want, the computer is instead thinking:

$\frac{2476979795053773}{2251799813685248} + \frac{2476979795053773}{1125899906842624} \neq \frac{3715469692580659}{1125899906842624}$

But in the case of the simulation bug described above, this wasn’t the problem. To see this, let’s add some debugging output to the program:

double curr_time = 128.1253990141936;
double prev_time = 127.9253990141936;
if (curr_time >= prev_time + 0.200) {
    update();
} else {
    double test_time = prev_time + 0.200;
    std::printf("curr_time = %a\n", curr_time);
    std::printf("test_time = %a\n", test_time);
}

Output:
curr_time = 0x1.004034p+7
test_time = 0x1.004034p+7

The two values being compared are exactly equal! That is, both the current time (128.1253990141936) and the test time (the sum 127.9253990141936+0.200) are represented by exactly the same 64 bits in the underlying IEEE-754 double-precision representation.

So what’s going on? If two floating-point values are indeed equal, how can they fail a weak inequality comparison?

It turns out that the problem is not that precision is limited, but that precision is varying, and varying in ways that the programmer cannot predict or control. In the if-condition, the compiler has decided to evaluate the sum on the right-hand side in extended precision (something more than 64 bits, probably 80 or 96), and then perform the comparison, still in that extended precision register, without first rounding back down to 64-bit double precision. As a result, those additional extended-precision bits make the sum slightly greater than the “truncated” double-precision value of the current time on the left-hand side.

This can be a particularly nasty problem to troubleshoot, because the behavior is very dependent on platform, compiler, and even “code context.” (In this case, the simulation was running on Linux using gcc 3.4.4. I have been unable to reproduce the problem anywhere else.) By “code context” I mean that the compiler has a lot of leeway in determining if/when to jump back and forth between extended and double precision, and those decisions can change depending on surrounding code! For example, we “masked” the behavior somewhat by our unsuccessful attempt at debugging output above, since the compiler is forced to eventually round the sum and store it in a double-precision variable so that we can print it.

Goldberg does describe this issue, although the reader must make it 80% of the way through a 70-page paper to find it. But the C++ FAQ also contains a section (Cline calls it the “newbie section,” but it’s really more of an attic for anecdotes without a better home) that I think does a great job of describing the problem as well.

Edit: The Random ASCII blog has a great series of articles on floating-point issues. The one most directly relevant to the problem discussed here is titled “Floating-Point Determinism,” in particular the section on composing larger expressions.

ValidSudokuQ[s_, n_, {i_, j_}] := And @@ Map[ Count[#, n] <= 1 &, { s[[i, All]], s[[All, j]], Partition[s, {3, 3}][[Ceiling[i/3], Ceiling[j/3]]] // Flatten } ] $RecursionLimit = Infinity; SolveSudoku[s_] := Module[ {p, r, n}, p = Position[s, 0]; If[p === {}, Print[s], p = First[p]; Do[ r = ReplacePart[s, n, p]; If[ValidSudokuQ[r, n, p], SolveSudoku[r]], {n, 9} ] ] ]

Possibly Wrong

On science, mathematics, and computing

Monthly Archives: November 2013

Floating-point equality: It’s worse than you think

Sudoku Solver