
Signed overflow optimization hazards in the kernel

August 15, 2012

This article was contributed by Paul McKenney

A recent LWN article described a couple of the hazards that compiler optimizations can pose for multithreaded code. This article takes a different approach, looking at a compiler-optimization hazard that can also strike sequential code. This hazard stems from an annoying aspect of the C11 standard, namely that signed-integer overflow is undefined (Section 3.4.3).

Overflow is a consequence of the fact that a computer's native arithmetic capability is quite limited. For example, if a C program running on a typical Linux system tries adding one int variable with value 2,147,483,647 to another int with value 1, the result will be -2,147,483,648—which might surprise people who naively expect the mathematically correct value of +2,147,483,648. This deviation from mathematical correctness occurs because the machine cannot represent the correct value of 2,147,483,648 in a 32-bit twos-complement integer. Therefore, any attempt to compute this number without help from software will result in overflow.
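
The wrapped value can be seen without invoking undefined behavior by doing the arithmetic in an unsigned type and then converting the resulting bit pattern back to int. Here is a minimal sketch (not from the kernel; note that the conversion back to int is itself implementation-defined, although GCC defines it as modulo reduction):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
            /* Do the addition in unsigned arithmetic, where 2,147,483,648 is representable. */
            unsigned int u = (unsigned int)INT_MAX + 1u;

            /* Converting this out-of-range value back to int is implementation-defined;
             * GCC reduces it modulo 2^N (N = 32 here), which yields INT_MIN. */
            printf("%u reinterpreted as a 32-bit twos-complement int is %d\n", u, (int)u);
            return 0;
    }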

Quick Quiz 1: Yecch!!! Why can't CPU designers come up with something better?
Answer

The number -2,147,483,648 is “unusual” in that adding it to itself (again, using twos-complement 32-bit integers) results in zero. Furthermore, it is its own negative: negating -2,147,483,648 results in -2,147,483,648. Therefore, this weird number is (1) equal to half of zero and (2) both positive and negative but (3) not equal to zero. This weirdness earned this number a special mention in the C standard, which says that its handling is implementation-defined (Section 6.2.6.2, Paragraph 2).

Unfortunately, relying on signed integer overflow for both normal and weird values is extremely convenient when working with free-running counters. For example, suppose that our program is dealing with a succession of work items, each designated by an integer. Suppose further that this code might be called upon to report on the past and future 25 work items. This situation will likely require a fast way to distinguish between past, current, and future work. Signed twos-complement integers make this easy. If current is the integer corresponding to the current work item,

    (current - other > 0)

will evaluate to true if the other work item is from the past. This works, even if the counter hits the maximum value and wraps around, due to the circular nature of twos-complement integers: adding one to the largest positive number that can be represented results in the smallest negative number. As a result, there is no need to write special-case code to handle counter overflows.
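
As a concrete sketch of this idiom (the helper name below is invented for illustration, not taken from any kernel code), the past-versus-future test might be packaged like this; note that it leans on exactly the signed-overflow wrap that the rest of this article shows to be undefined:

    /* Hypothetical helper: nonzero if "other" designates a past work item.
     * Relies on twos-complement wrap of the signed subtraction, which the
     * C standard leaves undefined; this is the idiom under discussion. */
    static inline int work_item_is_past(int current, int other)
    {
            return current - other > 0;
    }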

The simplicity of this approach has caused coders to willfully ignore the undefined nature of C signed-integer overflow since well before the dawn of Linux. For example, I was happily relying on twos-complement semantics from C signed-integer overflow in the early 1980s, and the only reason I wasn't doing so earlier was that I wasn't using C any earlier. Nor am I alone. Here are a couple of representative code fragments from version 3.5 of the Linux kernel:

    if (auth_vnode->acl_order - acl_order > 0) {

    return (int)(tcmp - __raw_readl(timer_base + MX1_2_TCN)) < 0 ?  -ETIME : 0;

The first is from afs_cache_permit() in the AFS filesystem, which is using this pattern to sort out the order of events in a distributed filesystem. The second example is from mx1_2_set_next_event() in the ARM architecture, which is using a variation on this theme to determine whether the requested event time really is in the future. Here the actual subtraction is unsigned, but the result is cast to a signed integer. Because unsigned longs are always positive, the only way that the result can be negative (when interpreted as a signed value) is overflow, which the compiler is permitted to assume never happens. The compiler is therefore within its rights to unconditionally evaluate the test as false and return zero, which might fatally disappoint the caller.

In addition, there used to be several instances of this pattern in the Linux kernel's RCU implementation, where it was used to figure out whether a given request had been implicitly satisfied due to the efforts undertaken to fulfill concurrent requests of the same type. These have since been converted to use unsigned arithmetic using the technique described below.

One might well ask: is there really a problem here? All systems running Linux are twos complement, so we really should not worry about clauses in the C standard designed to handle the wider variety of arithmetic that was available a few decades ago, right?

Unfortunately, wrong. The C compiler can and does make use of undefined behavior when optimizing. To see this, consider the following code:

     1 void long_cmp_opt(const int a, const int b)
     2 {
     3   if (a > 0) {
     4     do_something();
     5     if (b < 0) {
     6       do_something_else();
     7       if ((a - b) > 0)
     8         do_another_thing();
     9     }
    10   }
    11 }

At line 7 the compiler knows that the variable a is positive and the variable b is negative. Therefore, ignoring the possibility of integer overflow, the compiler knows that this “if” condition will always evaluate to true, meaning that the compiler is within its rights to invoke do_another_thing() unconditionally, without actually doing the subtraction and comparison. In contrast, if a is (say) 2,147,483,647 and b is -2,147,483,648, the unoptimized code would avoid invoking do_another_thing(). Therefore, this optimization has significantly changed the program's behavior.

Quick Quiz 2: But just how often is the compiler going to know the sign of both the values???
Answer

Of course, in real life, overflow really can occur. But because the C standard says that signed overflow is undefined, the compiler is permitted to do whatever it wishes in the overflow case. And GCC 4.6.1 really does omit the subtraction and comparison when compiling this example for x86 at optimization levels of -O2 or higher.

Fortunately for the Linux kernel, GCC will generate the subtraction and comparison for -O1 or less. But optimizations can migrate to lower optimization levels over time, and there may come a time when either performance or energy-efficiency considerations motivate the Linux kernel to move to higher optimization levels. If that happens, what can be done?

Quick Quiz 3: First you were talking about overflowing, now about wrapping. Consistent terminology, please?
Answer

One approach is to move to unsigned integers for free-running counters. The C standard defines unsigned integers to use modular arithmetic, so that wrapping the counter is fully defined (Section 6.2.5 Paragraph 9).
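
As a quick sanity check (not from the original article), the guaranteed modular behavior can even be verified at compile time:

    #include <limits.h>

    /* C defines unsigned arithmetic to wrap modulo 2^N, so this always holds. */
    _Static_assert(ULONG_MAX + 1UL == 0UL, "unsigned arithmetic wraps to zero");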

Of course, checking for counter wrap must be done differently. For purposes of comparison, here is the (undefined) signed version:

    if ((a - b) < 0)

And here is the corresponding version for unsigned long types:

    if (ULONG_MAX / 2 < a - b)

This version relies on the fact that, bit for bit, twos-complement addition and subtraction are identical to their unsigned counterparts. Now, the bitwise representation of the constant ULONG_MAX / 2 is a zero bit followed by all one-bits, which is the largest value that does not have the most-significant bit set. Therefore, if the result of computing a - b is greater than this constant, we know that this result has its uppermost bit set. Because the uppermost bit is the sign bit for twos-complement numbers, we are guaranteed that the signed and unsigned versions compute identical results.

Of course, the unsigned version is more characters to type, but that is what inline functions and C-preprocessor macros are for. But what about the code that the compiler generates? After all, the Linux kernel absolutely does not need extra instructions loading large constants for each comparison!

The good news is that GCC actually generates exactly the same code for both of the above versions when compiled with -O1 on both x86 and PowerPC:

   /* x86 code. */
      e:	8b 44 24 04          	mov    0x4(%esp),%eax
     12:	2b 44 24 08          	sub    0x8(%esp),%eax
     16:	c1 e8 1f             	shr    $0x1f,%eax
   /* PowerPC code. */
     1c:   7c 64 18 50     subf    r3,r4,r3
     20:   78 63 0f e0     rldicl  r3,r3,1,63
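
To keep call sites readable, the unsigned test can be hidden behind a small helper, as suggested above. Here is a hypothetical sketch (the names are invented for illustration; this is not actual kernel code):

    #include <assert.h>
    #include <limits.h>

    /* Hypothetical helper: nonzero if free-running counter "a" is before "b",
     * using only well-defined unsigned (modular) arithmetic. */
    static inline int ulong_before(unsigned long a, unsigned long b)
    {
            return ULONG_MAX / 2 < a - b;
    }

    int main(void)
    {
            assert(ulong_before(5UL, 10UL));              /* 5 comes before 10 */
            assert(!ulong_before(10UL, 5UL));             /* 10 does not come before 5 */
            assert(ulong_before(ULONG_MAX - 1UL, 2UL));   /* still correct across the wrap */
            return 0;
    }

Calls such as ulong_before(snap, cur) then read almost as naturally as the signed version while staying entirely within defined behavior.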

Of course, there will be times when the Linux kernel absolutely must rely on undefined behavior. However, this is not one of those times: As shown above, there are straightforward ways to avoid relying on signed integer overflow. Removing the kernel's reliance on signed integer overflow could avoid our getting burned by increasingly aggressive optimization, and might further allow use of higher optimization levels to improve performance and battery lifetime. So it is not too early to start future-proofing the Linux kernel by removing its reliance on signed integer overflow!

Answers to Quick Quizzes

Quick Quiz 1: Yecch!!! Why can't CPU designers come up with something better?

Answer: CPU designers have come up with a variety of schemes over the decades. However, in my experience, each scheme has its own peculiarities. I have used ones complement and twos complement, and dealing with the peculiarities of twos complement proved easier for me than those of ones complement.

That said, I suspect that the dominance of twos complement was not due to ease of use, but rather due to the fact that it allows a single hardware adder to perform both signed and unsigned computations.

Back to Quick Quiz 1.

Quick Quiz 2: But just how often is the compiler going to know the sign of both the values???

Answer: The more inline functions we add, the higher the probability that the compiler will be able to infer all sorts of things about the values in question, including their sign. And it only takes one unwanted optimization for the Linux kernel to fail.

Back to Quick Quiz 2.

Quick Quiz 3: First you were talking about overflowing, now about wrapping. Consistent terminology, please?

Answer: Interestingly enough, the C standard does not define overflow for unsigned integers. Instead, it defines the unsigned integral types to use modular arithmetic so as to eliminate the possibility of overflow. Aside from things like division by zero, that is. The term “wrap” works regardless.

Back to Quick Quiz 3.


Index entries for this article
Kernel: GCC
GuestArticles: McKenney, Paul E.



Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 3:40 UTC (Thu) by gmaxwell (guest, #30048) [Link] (14 responses)

It's worth pointing out that the optimizations resulting from "signed counters can't overflow" can be pretty useful, since that assumption can allow the compiler to figure out loop sizes, e.g. "This loop runs either 8 times exactly, or it overflows; so it runs 8 times— unrolling and vectorizing".

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 21:53 UTC (Thu) by daglwn (guest, #65432) [Link] (3 responses)

Bingo. This is also why it's particularly bad to use unsigned loop counters.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:19 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

So the kernel's current use of -fno-strict-overflow is presumably disabling a number of optimizations. How much performance would you two guess that the kernel is giving up?

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:45 UTC (Thu) by gmaxwell (guest, #30048) [Link] (1 responses)

It's hard to say—

First, for code in the kernel, I assume that vectorizing is off the table because IIRC the kernel doesn't normally use the XMM registers, to avoid having to save them... so that's a chunk of the benefit that wouldn't exist in the kernel.

Branch prediction is often effective enough to make the looping cost small. But not always. I've seen easily measurable gains in numerical code performance just from reorganizing things so the compiler could find limits and unroll, but it doesn't always matter.

The most obvious thing to do would be to compile the kernel without the switches and hope that it runs long enough to run a few benchmarks. :)

Another issue is that overflow is _usually_ a bug where it exists (yes, there are plenty of times where a programmer is using it intentionally, but they are far less common than cases where an overflow is a bug). By preferring signed values and preventing overflow where you won't handle it, and using unsigned only where you must for range or where you intend overflow, you make tools that catch bugs with dynamic instrumentation of overflow far more powerful (e.g. http://embed.cs.utah.edu/ioc/). Though since no one is applying these sorts of tools to the kernel, I guess it doesn't matter currently. :)

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 17:20 UTC (Sat) by ppisa (subscriber, #67307) [Link]

It would be worth compiling and running the kernel for MIPS with the GCC option -ftrapv, if its implementation is not broken in the current GCC version. MIPS has both wrapping (addu, addiu) and signed-overflow-trapping (add, addi) variants of its instructions. But the wrapping variants are used even for signed types, to keep compatibility with the way C code is usually written.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:47 UTC (Thu) by cmccabe (guest, #60281) [Link] (9 responses)

I really doubt that using -fnowrapv or the equivalent provides much of a performance boost in practice.

Most of the examples I've seen have been of the form "aha! I can (incorrectly) assume that this if statement that the programmer put here is a no-op!" In all of these cases I've seen, the "optimization" is something that a good programmer could have and probably would have done manually anyway.

I'd really like to see some real-world performance numbers about the effects of this optimization. Based on everything I've seen so far, this is a case where compiler writers got so carried away considering what they _could_ do, that they didn't stop to think if they _should_.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 23:10 UTC (Thu) by cmccabe (guest, #60281) [Link] (2 responses)

I still feel a little confused by this. If you have a loop like this:

> int i;
> for (i = 0; i < 5; i++) {
> [expression not involving i]
> }

I don't see how -fnowrapv will prevent you from unrolling the loop. You know that starting at 0 and adding 1 will get you to 5 before it will get you to overflow.

I guess you could come up with a scenario where you don't know the initial value of the counter, but you do have a constant positive increment and a set upper limit. So something like this:

> for (i = function_from_another_translation_unit(); i < 5; i++) {
> [expression not involving i]
> }

Even in this case, though, you still know that either you'll run the loop no times, or there will be no overflow. You have to get to something like this to start seeing a problem:

> for (i = function_from_another_translation_unit(); i < 0x7ffffffe; i += 2) {
> [expression not involving i]
> }

This is pretty exotic scenario, though. If i starts off as 0x7fffffff and -fwrapv is enabled, the loop will never terminate, whereas with -fnowrapv it's undefined. To me, this feels like buggy code in the first place, so I don't really care if it's not optimized.

Am I missing something here? Is there a good example of a real-world scenario where -fnowrapv helps well-written code?

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 22:56 UTC (Sat) by jzbiciak (guest, #5246) [Link] (1 responses)

Where I've heard it coming up is when you have code that effectively looks like this:

    for (i = x; i < x + 16; i++)
    {
        /* code that does not modify either x or i */
    }

The compiler wants to know it's safe to assume this loop goes around 16 times. That isn't true if "x + 16" could overflow.

Signed overflow optimization hazards in the kernel

Posted Aug 20, 2012 7:22 UTC (Mon) by jezuch (subscriber, #52988) [Link]

> The compiler wants to know it's safe to assume this loop goes around 16 times. That isn't true if "x + 16" could overflow.

If you want to make it explicit, there's a flag for this in GCC: -funsafe-loop-optimizations (along with -Wunsafe-loop-optimizations if you want to know when it happens; it's a warning, though, so beware of build environments which insist on -Werror). AFAICT there are two cases handled by this flag: assuming that the loop counter does not overflow, and assuming that the loop is not infinite. Don't know the exact implementation details, though.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 23:10 UTC (Thu) by klossner (subscriber, #30046) [Link]

Sure, a good programmer should have done that. But the C compiler is often presented with automatically-generated code that has never been seen by human eyes, for example after several levels of macro expansion have done their work. Folding out the resulting dead code is often a substantial win.

Signed overflow optimization hazards in the kernel

Posted Aug 17, 2012 19:09 UTC (Fri) by iabervon (subscriber, #722) [Link]

int find(char *s, char c, int lim)
{
  for (int i = 0; i != lim && s[i]; i++)
    if (s[i] == c)
      return i;
  return -1;
}

int main()
{
  find("foo", 'o', 1);
  find("foo", 'o', -1);
}

If the compiler inlines the second call, it can drop the "i != lim" test by assuming that overflow is impossible.

Signed overflow optimization hazards in the kernel

Posted Aug 20, 2012 16:45 UTC (Mon) by daglwn (guest, #65432) [Link] (3 responses)

You don't believe vectorization can significantly speed up code?

Signed overflow optimization hazards in the kernel

Posted Aug 21, 2012 7:58 UTC (Tue) by jezuch (subscriber, #52988) [Link] (2 responses)

As someone who insists on rebuilding some of Debian's packages for my own machine (for fun and not for profit), I can tell that vectorization very rarely has any significant impact. Yes, it can provide a great boost in some synthetic benchmarks or maybe in HPC code (but then, HPC code relies more on optimization by hand and leaves little to the compiler, as I understand it), but very few loops in real-world code are well-formed enough to be suitable for auto-vectorization. It won't make your browser visibly faster :)

Signed overflow optimization hazards in the kernel

Posted Aug 21, 2012 14:30 UTC (Tue) by daglwn (guest, #65432) [Link] (1 responses)

> but then, HPC code relies more on optimization by hand and leaves little
> to the compiler, as I understand it)

Not true. The Intel compiler, for example, will vectorize its brains out automatically.

A code using lots of non-restrict pointers will certainly be difficult to vectorize. Such code can sometimes be auto-parallelized but I don't believe gcc has that capability.

gcc's vectorization is also pretty weak, though it is getting better. That may be part of what you're seeing.

Signed overflow optimization hazards in the kernel

Posted Aug 22, 2012 10:39 UTC (Wed) by jezuch (subscriber, #52988) [Link]

> A code using lots of non-restrict pointers will certainly be difficult to vectorize. Such code can sometimes be auto-parallelized but I don't believe gcc has that capability.

It sorta-kinda does (-ftree-parallelize-loops) but I haven't tested it much. But I expect it to be even weaker than vectorization.

> gcc's vectorization is also pretty weak, though it is getting better. That may be part of what you're seeing.

It may be that. Or it may be that most of the code in non-HPC world does not lend itself to vectorization easily. I actually don't know, I'm just an amateur and hobbyist :)

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 5:32 UTC (Thu) by jmspeex (subscriber, #51639) [Link] (5 responses)

> This hazard stems from an annoying aspect of the C11 standard, namely that signed-integer overflow is undefined (Section 3.4.3).

Actually, signed-integer overflow was undefined long before C11. It was definitely undefined in C99 and I'm pretty sure it also was in C89. After all, there *are* machines that handle it in different ways. There's (nearly extinct) one's complement and two's complement, but there's also saturating arithmetic that many DSPs use, where (for 16-bit type), 32767+1 = 32767.

Also, when it comes to undefined arithmetic, there's a lot more to worry about. For example, shifting *by* a negative value is undefined. So is shifting a negative value left (even if the shift is by a positive number of bits). Fortunately, clang now has a feature that can add run-time checks to your source code and detect these undefined arithmetic operations.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 21:10 UTC (Thu) by daney (guest, #24551) [Link] (3 responses)

> Actually, signed-integer overflow was undefined long before C11.
> It was definitely undefined in C99 and I'm pretty sure it also was in C89.

The first edition of K & R explicitly states that signed-integer overflow is undefined. This has carried through to the present.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:48 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

Understood. And when it was suggested within the C11 Standards committee that signed-integer overflow be given twos-complement semantics, the discussion was both emphatic and short. ;-)

Signed overflow optimization hazards in the kernel

Posted Aug 17, 2012 16:07 UTC (Fri) by josh (subscriber, #17465) [Link] (1 responses)

What did the arguments against it say, other than "that would remove compiler optimization possibilities"?

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 18:19 UTC (Sat) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

One objection was that there really are still non-twos-complement machines in common use. As was noted by the comment to this article discussing saturating adders, where 32767+1==32767. But this would be addressed by "implementation defined" rather than "undefined".

Another objection was that there are systems still in common use that trap on signed integer overflow. If the C standard required wrapping, compilers for such systems would require special edge-case checks on pretty much any signed integer operation.

And there was of course also the objection that signed integer overflow always has been undefined. ;-)

Signed overflow optimization hazards in the kernel

Posted Aug 17, 2012 19:16 UTC (Fri) by gmaxwell (guest, #30048) [Link]

JM, the clang integer overflow checker we've used on our projects isn't part of clang proper, it's a (very useful) patch: http://embed.cs.utah.edu/ioc/

Beyond the optimization possibilities, the existence of tools like this is also a reason for keeping the undefined behavior, e.g. continue using signed values for counters that don't need the extra unsigned range: Most of the time overflow that you didn't expect (and thus couldn't wrap in a casting macro) is a sign of a logic error. By keeping it invalid you gain the possibility of dynamic instrumentation to catch those errors.

(Though I don't know if anyone has managed to get tools like this working with the kernel yet!)

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 7:13 UTC (Thu) by stevenb (guest, #11536) [Link] (1 responses)

Why not just use -fwrapv in the kernel?

`-fwrapv'
This option instructs the compiler to assume that signed arithmetic
overflow of addition, subtraction and multiplication wraps around
using twos-complement representation. This flag enables some
optimizations and disables others. This option is enabled by
default for the Java front-end, as required by the Java language
specification.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 13:28 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Very cool, thank you for the info! Very welcome information, especially in light of the fact that the kernel compiles at -O2, not -O1 as I incorrectly stated. Apparently -fwrapv has made its way into other compilers' command lines as well, including that of clang.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 7:25 UTC (Thu) by hobocken (guest, #85839) [Link] (4 responses)

»Fortunately for the Linux kernel, GCC will generate the subtraction and comparison for -O1 or less. But optimizations can migrate to lower optimization levels over time, and there may come a time when either performance or energy-efficiency considerations motivate the Linux kernel to move to higher optimization levels. If that happens, what can be done?«

I don't understand the paragraph above. The kernel is already compiled
with -O2 or -Os.
From the top level Makefile:

ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os
else
KBUILD_CFLAGS += -O2
endif

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 9:02 UTC (Thu) by ken (subscriber, #625) [Link] (3 responses)

Exactly, I noticed that too. It's been -O2 as far back as I can remember.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 13:19 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

And you are absolutely right: -O2 and no sign of -fwrapv. Perhaps this is more urgent than I was thinking...

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 13:58 UTC (Thu) by abatters (✭ supporter ✭, #6932) [Link] (1 responses)

http://marc.info/?t=124806309500001&r=1&w=2

The kernel currently uses -fno-strict-overflow because -fwrapv was buggy in some versions of gcc. Unfortunately -fno-strict-overflow is also buggy in some other versions of gcc. Depending on the version of gcc I compile with, I have to patch the makefile to use one or the other.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:49 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Whew! It goes back to being non-urgent. ;-)

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 14:04 UTC (Thu) by baldrick (subscriber, #4123) [Link] (12 responses)

> The second example is from mx1_2_set_next_event() in the ARM architecture,
> which is using a variation on this theme to determine whether the
> requested event time really is in the future. Here the actual subtraction
> is unsigned, but the result is cast to an signed integer. Because unsigned
> longs are always positive, the only way that the result can be negative
> (when interpreted as a signed value) is overflow, which the compiler is
> permitted to assume never happens. The compiler is therefore within its
> rights to unconditionally evaluate the test as false and return zero,
> which might fatally disappoint the caller.

As far as I know this is wrong: there is no problem in the second example. The misunderstanding is in "Because unsigned longs are always positive, the only way that the result can be negative (when interpreted as a signed value) is overflow, which the compiler is permitted to assume never happens". Since the numbers have unsigned type, subtracting them always has a well defined result. The fact that when you think of these numbers as being signed you see in your head that a signed overflow must have occurred is of no relevance. The cast to a signed type is also always well defined. In fact I would say the right way to avoid this issue is to copy this example and do something like
(signed) ((unsigned)x - (unsigned)y) < 0
rather than do the baroque comparison recommended by the article.

For more on this kind of thing I recommend http://blog.regehr.org/archives/213

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:46 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (7 responses)

Hmmm... If I compile the following with gcc 4.6.1 with -O2:
unsigned long long signed_cast(unsigned long long a, unsigned long long b)
{
        return (signed)((unsigned)a - (unsigned)b);
}

The following x86 object code is generated:

 120:   8b 54 24 04             mov    0x4(%esp),%edx
 124:   2b 54 24 0c             sub    0xc(%esp),%edx
 128:   89 d0                   mov    %edx,%eax
 12a:   c1 fa 1f                sar    $0x1f,%edx
 12d:   c3                      ret

This is a 32-bit subtraction on 64-bit quantities. And when I plugged 2^32=4294967296 for a and 2^31-1=2147483647 for b, a-b evaluated to 2147483649 (as expected), but signed_cast(a, b) evaluated to -2147483647.

So unless I misunderstood what you are suggesting, I will be sticking with the baroque comparisons for RCU.

Signed overflow optimization hazards in the kernel

Posted Aug 17, 2012 2:20 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Though I could make the macros type-generic by (ab)using sizeof(). Not sure it is worth it, though.

Signed overflow optimization hazards in the kernel

Posted Aug 17, 2012 5:05 UTC (Fri) by jimparis (guest, #38647) [Link] (3 responses)

> Hmmm... If I compile the following with gcc 4.6.1 with -O2:
> unsigned long long signed_cast(unsigned long long a, unsigned long long b)
> {
>         return (signed)((unsigned)a - (unsigned)b);
> }
> ...
> This is a 32-bit subtraction on 64-bit quantities.
Well, yeah, that's because "(signed)" is casting 32-bit when compiled with -m32. I might be misunderstanding your issue here but it seems you wanted:
unsigned long long signed_cast(unsigned long long a, unsigned long long b)
{
        return (signed long long)((unsigned long long)a - (unsigned long long)b);
}

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 18:07 UTC (Sat) by ppisa (subscriber, #67307) [Link] (2 responses)

I hope that the behavior of unsigned-to-signed conversion stays defined (at least for GCC and GCC-compatible compilers: LLVM, Intel, etc.). GCC defines the behavior in the current version of its manual:

http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation...

I use ((signed type)((unsigned type)a - (unsigned type)b) > 0) often in our embedded code, and in fact I am probably the author or coauthor of that MX1 example, if that line has not been rewritten by somebody.

The behavior of Paul's example is correct as far as I know: (unsigned) is equivalent to (unsigned int), i.e. on a 32-bit target a 32-bit subtraction is evaluated, the result is converted to (signed int), and then the sign is extended to 64 bits because a 64-bit signed type is required.

A common trap is that plain

(unsigned type) a - (unsigned type) b

is not treated as (type) or (signed type). When compared with zero it behaves as a signed type big enough to hold the additional bit. So an additional cast to a signed type the same size as, or smaller than, both inputs (a and b) has to be used.

I use the following mechanism to allow cyclic comparison between different hardware, position, time, and state-generation counters, etc., in our code.

http://ulan.git.sourceforge.net/git/gitweb.cgi?p=ulan/ulut;...

The library is licensed GPL, LGPL, and MPL, but the code fragment can be taken as public domain if it helps somebody.

#ifndef ul_cyclic_gt
#define ul_cyclic_gt(x,y) \
((sizeof(x)>=sizeof(long long))&&(sizeof(y)>=sizeof(long long))? \
(long long)((unsigned long long)(x)-(unsigned long long)(y))>0: \
(sizeof(x)>=sizeof(long))&&(sizeof(y)>=sizeof(long))? \
(long)((unsigned long)(x)-(unsigned long)(y))>0: \
(sizeof(x)>=sizeof(int))&&(sizeof(y)>=sizeof(int))? \
(int)((unsigned int)(x)-(unsigned int)(y))>0: \
(sizeof(x)>=sizeof(short))&&(sizeof(y)>=sizeof(short))? \
(short)((unsigned short)(x)-(unsigned short)(y))>0: \
(signed char)((unsigned char)(x)-(unsigned char)(y))>0 \
)
#endif /*ul_cyclic_gt*/

#ifndef ul_cyclic_ge
#define ul_cyclic_ge(x,y) \
((sizeof(x)>=sizeof(long long))&&(sizeof(y)>=sizeof(long long))? \
(long long)((unsigned long long)(x)-(unsigned long long)(y))>=0: \
(sizeof(x)>=sizeof(long))&&(sizeof(y)>=sizeof(long))? \
(long)((unsigned long)(x)-(unsigned long)(y))>=0: \
(sizeof(x)>=sizeof(int))&&(sizeof(y)>=sizeof(int))? \
(int)((unsigned int)(x)-(unsigned int)(y))>=0: \
(sizeof(x)>=sizeof(short))&&(sizeof(y)>=sizeof(short))? \
(short)((unsigned short)(x)-(unsigned short)(y))>=0: \
(signed char)((unsigned char)(x)-(unsigned char)(y))>=0 \
)
#endif /*ul_cyclic_ge*/

Please let me know if you know about some target, compiler, or intention to break the assumption (which at least current GCC versions hopefully guarantee) that unsigned-to-signed conversion reinterprets the MSB as a sign bit. As for correctness of the code, there could be a problem if a target specifies some basic arithmetic type with some bits unused, e.g. a short that is 16 bits but stored in a 32-bit entity. But none of our targets has that problem.

The code is used on many targets, some of them safety-grade applications, so notice of possible (even future) breakage is critical for me and its users.

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 19:02 UTC (Sat) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (1 responses)

Cool, that was the sort of thing I was thinking of with my "sizeof()" earlier, though I still do feel more comfortable with using the constants than relying on casting.

But why the casts to unsigned integral types? I would instead have expected a requirement that the caller's cyclic arithmetic be carried out in unsigned integers, so that the casts were unnecessary.

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 22:34 UTC (Sat) by ppisa (subscriber, #67307) [Link]

Hmm, the cast to unsigned used to work even for signed types, and in practice it still works: a+=20 is translated into a single add instruction on all targets I know of. The other reason for casting is that sometimes you can store in an object only the shorter part of a timestamp or generation counter, if you know that the live period is small enough or if you only check for change. Casting both to the smaller of the two makes the subtraction possibly cheaper; the result has to be cast to the smaller one anyway.

But the main reason for the casting is to ensure that things work in existing code with signed types.

Best wishes,

Pavel

Signed overflow optimization hazards in the kernel

Posted Aug 19, 2012 17:00 UTC (Sun) by baldrick (subscriber, #4123) [Link] (1 responses)

By "signed" I meant "signed integer type of the same size" and by "unsigned" I meant "unsigned integer type of the same size". The "same size" means: the same number of bits as the original integer type, so in the case of example 2 this means "signed long long" and "unsigned long long". Sorry for not being clear.

Signed overflow optimization hazards in the kernel

Posted Aug 20, 2012 0:44 UTC (Mon) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

No problem!

I must admit that it would be nice if "(signed typeof(a))" and "(unsigned typeof(b))" flipped the signedness of "a" and "b", but my version of gcc really doesn't like either variant. ;-)

Signed overflow optimization hazards in the kernel

Posted Aug 17, 2012 6:45 UTC (Fri) by wahern (subscriber, #37304) [Link] (3 responses)

You have that in reverse. Conversion to unsigned is always well defined. Conversion to signed where the value cannot be represented is implementation-defined:

C99 6.3.1.3 Signed and unsigned integers
  1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
  2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.49)
  3. Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Signed overflow optimization hazards in the kernel

Posted Aug 17, 2012 21:57 UTC (Fri) by pdewacht (subscriber, #47633) [Link] (2 responses)

But given that Linux is only intended to be compiled by gcc, we can rely on its implementation-defined behavior:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.

(and I don't see how any compiler for a two's complement computer could define different behavior.)

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 19:48 UTC (Sat) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

True enough!

However, the Linux kernel's code can be legitimately used in any GPLv2 project, including those that might run on systems with non-twos-complement signed integer arithmetic. This sharing of code among compatibly licensed projects is a very good thing, in my view.

Which means in this case, where there is a solution that meets the C standard, and which loses nothing by doing so (at least on x86 and Power), it only makes sense to use that C-standard solution.

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 23:58 UTC (Sat) by giraffedata (guest, #1954) [Link]

> The result of ... converting an integer to a signed integer type when the value cannot be represented in an object of that type ...

> For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type;

I must be reading that wrongly, because that's not at all what GCC does. With -m32, int is a signed integer type of width 32. UINT_MAX reduced modulo 2^32 is UINT_MAX, which is not within the range of int. So this does not describe what (int)UINT_MAX does.

Rather, what GCC does appears to be the opposite of what the standard requires for conversion from negative number to unsigned integer (add UINT_MAX+1 until it fits): it subtracts UINT_MAX+1 until the value is within the range of int (in this case -1).

About undefined behaviour...

Posted Aug 16, 2012 16:45 UTC (Thu) by juhl (guest, #33245) [Link] (3 responses)

For people interested in (or concerned about) undefined behaviour, the problems it can cause, the optimizations it enables, etc., I can greatly recommend reading the following articles:

What Every C Programmer Should Know About Undefined Behavior #1/3
What Every C Programmer Should Know About Undefined Behavior #2/3
What Every C Programmer Should Know About Undefined Behavior #3/3
A Guide to Undefined Behavior in C and C++, Part 1
A Guide to Undefined Behavior in C and C++, Part 2
A Guide to Undefined Behavior in C and C++, Part 3
Sequence Points and Expression Evaluation in C++

About undefined behaviour...

Posted Aug 16, 2012 22:50 UTC (Thu) by hummassa (guest, #307) [Link]

Great links! Thank you very much!!!

About undefined behaviour...

Posted Aug 16, 2012 22:52 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

Thank you, highly recommended!

About undefined behaviour...

Posted Aug 23, 2012 16:03 UTC (Thu) by pwood (guest, #86373) [Link]

John Regehr's blog posts on undefined behaviour are always worth reading (his research group developed the IOC patch to clang that a couple of people have linked to above.) In the context of integer behaviour in C he has a quiz here which is worth a look. I'm not sure that you ever want to rely on undefined behaviour as suggested in the article as it means that the execution of your program is undefined and can change at the compiler's will. That is very different from relying on implementation defined behaviour.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 23:19 UTC (Thu) by klossner (subscriber, #30046) [Link]

» I suspect that the dominance of twos complement was not due to ease of use, but rather due to the fact that it allows a single hardware adder to perform both signed and unsigned computations. «

My recollection of those days is that the greater motivation was to get away from systems with two different representations of zero. There were ancient Fortran codes run on CDC mainframes that had to test results for both positive and negative zero.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 23:34 UTC (Thu) by mmorrow (guest, #83845) [Link]

Here's my favorite undefined-signed-overflow example:
  int f(int x){while(x < x + 1) x += 1; return x;}
gives:
  f:
  .L2:
    jmp .L2
    .ident  "GCC: (GNU) 4.8.0 20120408 (experimental)"

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 22:51 UTC (Sat) by jzbiciak (guest, #5246) [Link] (1 responses)

This seems like it argues for a macro to hide the ugliness. Name it something obvious such as "before()". It makes the intent obvious and ensures the ugly math is done correctly. Something like:

/* return true if timestamp 'a' is earlier than timestamp 'b' */
#define before(a,b) (ULONG_MAX/2 < ((unsigned long)(a) - (unsigned long)(b)))

Or is this just an invitation to more problems?

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 23:04 UTC (Sat) by jzbiciak (guest, #5246) [Link]

I see I missed ppisa's impressive macros upthread. Those certainly seem to take into account a range of integer types as well!

Signed overflow optimization hazards in the kernel

Posted Aug 20, 2012 8:59 UTC (Mon) by etienne (guest, #25256) [Link] (2 responses)

> So it is not too early to start future-proofing the Linux kernel by removing its reliance on signed integer overflow!

It would be nice to have a GCC option for ia32 so that any "add" used for signed arithmetic is followed by "into" (exception on overflow).
It would not slow execution down too much ("into" will not be predicted as a taken jump), and it could help locate potential problems...

Signed overflow optimization hazards in the kernel

Posted Aug 22, 2012 8:51 UTC (Wed) by mpr22 (subscriber, #60784) [Link] (1 responses)

-ftrapv is an architecture-independent compiler option to GCC. I don't know whether it works.

Signed overflow optimization hazards in the kernel

Posted Aug 22, 2012 13:21 UTC (Wed) by etienne (guest, #25256) [Link]

It seems like -ftrapv can work, but it is far from just adding an "into" instruction after each signed "add" on ia32; note also that "into" has disappeared from the amd64 instruction set.

$ cat test.c
int a, b, c;
void main (void) {
c = a + b;
}
$ gcc -m32 -O2 -ftrapv test.c
$ objdump -d a.out

a.out: file format elf32-i386
....
080483f0 <main>:
push %ebp
mov %esp,%ebp
and $0xfffffff0,%esp
sub $0x10,%esp
mov 0x804a01c,%eax
mov %eax,0x4(%esp)
mov 0x804a020,%eax
mov %eax,(%esp)
call 8048420 <__addvsi3>
mov %eax,0x804a024
leave
ret
....
08048420 <__addvsi3>:
push %ebp
mov %esp,%ebp
push %ebx
sub $0x4,%esp
mov 0xc(%ebp),%ecx
mov 0x8(%ebp),%edx
call 804845c <__i686.get_pc_thunk.bx>
add $0x1bc2,%ebx
test %ecx,%ecx
lea (%ecx,%edx,1),%eax
js 8048450 <__addvsi3+0x30>
cmp %edx,%eax
setl %dl
test %dl,%dl
jne 8048457 <__addvsi3+0x37>
add $0x4,%esp
pop %ebx
pop %ebp
ret
xchg %ax,%ax
cmp %edx,%eax
setg %dl
jmp 8048444 <__addvsi3+0x24>
call 80482fc <abort@plt>

Without ftrapv:
080483c0 <main>:
mov 0x804a01c,%eax
add 0x804a018,%eax
push %ebp
mov %esp,%ebp
mov %eax,0x804a020
pop %ebp
ret

Signed overflow optimization hazards in the kernel

Posted Aug 23, 2012 9:47 UTC (Thu) by reddit (guest, #86331) [Link]

The sensible way to do this is (int)(a - b) < 0 where a and b are unsigned.

Signed overflow optimization hazards in the kernel

Posted Aug 23, 2012 9:54 UTC (Thu) by georgm (subscriber, #19574) [Link] (1 responses)

A question:

What is the reason to write "if((a-b) < 0) ..." instead of just "if(a<b)"?

Signed overflow optimization hazards in the kernel

Posted Aug 23, 2012 10:28 UTC (Thu) by georgm (subscriber, #19574) [Link]

The question concerns signed values, where wraps are not common. The benefit for unsigned values is clear.


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds