-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[APFloat] Wrong result when truncating double to float #55838
Comments
@rotateright @efriedma-quic pinging you as people that seem familiar with that code. Not sure where to even begin looking to understand why it currently behaves as it does and what is the appropriate fix. |
Perhaps the biggest indication that this is a bug in conversion function is that if I hide conversion behind non-inlinable helper (preventing constant folding) the result is expected (0): https://godbolt.org/z/cehTrKrcj |
llvm-project/llvm/lib/Support/APFloat.cpp Line 2187 in 5fee179
|
Just a note: in C the
or
|
Here's what Alive2 has to say (https://alive2.llvm.org/ce/z/TFYgXt): %a = bitcast i32 503489155 to float
%b = fpext float %a to double
%c = fptrunc double %b to float
; Evaluate to:
float %a = #x1e02a283 (0.000000000000?)
double %b = #x3bc0545060000000 (0.000000000000?)
float %c = #x1e02a283 (0.000000000000?) |
@nunoplopes The original testcase is equivalent to something more like this:
See this alive link |
Thanks! Uhm, the output truncates precision in the optimized IR so it almost looks correct. |
@pmor, the original test case this was reduced from was in IR so had no aliasing concerns. I didn't want to bother with memcpy/std::bit_cast for a simple C++ reproducer. The problematic place in IR is more or less just a bitcast + fptrunc, like in @efriedma-quic example. |
Or in C++20, we have In this case, that seems not to be the source of the problem, since GCC and clang make minor efforts to support some basic idioms like this. But |
Looks like the wrong result only appears when |
Does anyone know/understand why this whole code is guarded by llvm-project/llvm/lib/Support/APFloat.cpp Lines 1383 to 1422 in 5fee179
I don't know about the if (exponentChange > 0) {
lostFraction lf;
/* Shift right and capture any new lost fraction. */
lf = shiftSignificandRight(exponentChange);
lost_fraction = combineLostFractions(lf, lost_fraction);
/* Keep OMSB up-to-date. */
if (omsb > (unsigned) exponentChange)
omsb -= exponentChange;
else
omsb = 0;
} so |
I haven't made sense of that code either yet, but all double denorms should be 0.0 or -0.0 when rounded to float, so maybe we can bail out sooner. And that's really any sufficiently small magnitude double - as long as the top 2 exponent bits are not set, it's going to be rounded to 0.0 if this is correct: |
"normalize" is the last step of most floating-point operations in APFloat. It ensures that the most significant bit is in the correct position, and performs rounding. "omsb" is zero if the significand is exactly zero. I assume the idea here is that if the result is zero, we can't do exponent adjustment: there is no "most significant bit" because all the bits are zero. So there's nothing to adjust, and the meaning of the lost_fraction is ambiguous; it's not clear what it supposed to be half of, since half of zero is just zero. Probably the right fix here is to change IEEEFloat::convert so when the operand is denormal, it performs a smaller shift, so at least one bit of the significand is still set. Then normalize() should do the right thing. And maybe normalize() should assert that if the signficand is exactly zero, the lost_fraction is also exactly zero. |
Hm, so normalize can't be given non-zero lost fraction with zero significand in other codepaths? |
I was thinking the same as your last option, but it's not clear to me if shifting less as Eli suggested is necessary:
|
@rotateright I'll submit a patch then, but I think lostFraction should be changed to /* The fcZero case is a denormal that underflowed to zero. */
return (opStatus) (opUnderflow | opInexact); Also, I don't know whether we can rely on |
Sounds good - I added some IR regression tests, but we might want to supplement that with unit tests for APFloat directly: |
I'd probably say it's a problem with how it's called? I mean, in practice, I don't think any of the other callers can call normalize() with both a zero significand and a lost fraction. And the expected semantics aren't really clear. I don't think it's true in general that converting a denormal to a float with narrower significand storage necessarily produces zero. Consider conversions between half and bfloat. (I don't think there's any way to write that in LLVM IR at the moment, but the APFloat API allows it.) |
A little off-topic. Why in LLVM rounding is a part of normalization?
In other compilers I see that normalization and rounding are separate operations. |
I'm not the original author... but I guess it just made sense to put them together because they both need to happen after every floating-point computation. |
For the following C(++) code Clang produces output that differs from every other major compiler (and from internal tests it was reduced from):
output:
expected:
I've pinpointed the issue to Val.convert in FPTrunc case of
llvm::ConstantFoldCastInstruction
which for0x0.000001e02a283p-1022
double returns0x1p-149
float, whereas one would expect (assuming the other compilers are correct here) returned value to be0
.Furthermore, it is represented as
float 0x36A0000000000000
in IR which may or may not be expected (if it was hex value for simple float, when there are eight extra zeroes, but if it uses double width even for regular floats, then it's ok.godbolt: https://godbolt.org/z/YM3rvY3hM
The text was updated successfully, but these errors were encountered: