BUG: Fix from_float_positional errors for huge pads#28149
Conversation
|
Hi, I'm a new contributor to NumPy and I would appreciate any guidance you may have. |
seberg
left a comment
There was a problem hiding this comment.
Introducing the error return looks great. A few comments, mainly:
- It seems a lot easier to just inline the error. It would be fine to do a single error for everything, although I would probably inline the error anyway with a
goto buffer_too_small:or so. - This fixes the error, but a lot of care needs to be taken about overflows unfortunately:
- Can be fixed with re-arranging some of the length calculations, another way might be to use
int64for some of these checks. - There are probably a lot more errors here: Every time
maxPrintLenis checked and a special path taken, the result returned is currently wrong in some way and we need to raise an/the error!
Would you be up to auditing that code as well?
- Can be fixed with re-arranging some of the length calculations, another way might be to use
For simplicity, I really suggest to just raise one single error as noted in one of the comments. In practice, no-one will ever see this error (if they did we should just support it...), so we can be lazy and worry about raising a particularly helpful error.
(I suppose the "no-one" should ever see this error makes me lean towards a RuntimeError.)
numpy/_core/src/multiarray/dragon4.c
Outdated
| /* handle overflow condition safely*/ | ||
| if (NPY_TRUE == right_overflow) { | ||
| buffer[pos] = '\0'; | ||
| return RIGHT_PADDING_OVERFLOW; |
There was a problem hiding this comment.
I don't think there is any reason to have multiple error returns here. You can just set the Python error here and return -1 as the only error value.
numpy/_core/arrayprint.py
Outdated
| many characters are to the left of the decimal point. | ||
| many characters are to the left of the decimal point. Total length of | ||
| string representation of the floating point value cannot exceed 16384, | ||
| including pad_right, if any. |
There was a problem hiding this comment.
I feel this is a lot of detail, nobody should ever see. If we want it, maybe we should put it into the Exceptions section.
There was a problem hiding this comment.
I've simplified this part and moved to a separate section below. Is this what you've meant?
numpy/_core/src/multiarray/dragon4.c
Outdated
|
|
||
|
|
||
| /* provided digits_rights is too large, can't create the string required*/ | ||
| if (pos + count > maxPrintLen) { |
There was a problem hiding this comment.
I suspect this (and likely others) can still overflow for ridiculous inputs like 2**31-1.
| if (pos + count > maxPrintLen) { | |
| if (count > maxPrintLen - pos) { |
re-arranging it should keep us safe?
There was a problem hiding this comment.
I agree. I've fixed this where it's relevant.
numpy/_core/src/multiarray/dragon4.c
Outdated
|
|
||
| /* provided digits_rights is too large, can't create the string required*/ | ||
| if (pos + count > maxPrintLen) { | ||
| right_overflow = NPY_TRUE; |
There was a problem hiding this comment.
Why continue and not set the error and return here right away, we shouldn't use the result, so it doesn't matter if the buffer is null terminated.
There was a problem hiding this comment.
Fixed according to your suggestion
numpy/_core/src/multiarray/dragon4.c
Outdated
| error_msg = "An unexpected error occurred";\ | ||
| break;\ | ||
| }\ | ||
| PyErr_SetString(error_type, error_msg);\ |
There was a problem hiding this comment.
If we set the error above, this code can stay unchanged. Also Dragon4_PrintFloat_##format is used below a second time and both must stay aligned.
We should make sure there is a test for both calls.
There was a problem hiding this comment.
I've reverted the code for this function and and switched to raising errors in FormatPositional
| assert_equal(fpos(tp('1.001'), precision=1, trim='-'), "1") | ||
|
|
||
| assert_raises_regex(ValueError, "Left padding exceeds buffer size of 16384", fpos, tp('1.047'), precision=2, pad_left=int(1e5)) | ||
| assert_raises_regex(ValueError, "Right padding exceeds buffer size of 16384", fpos, tp('1.047'), precision=2, pad_right=int(1e5)) |
There was a problem hiding this comment.
@charris what is our preferred line length? It seems a bit that the new linter setup ignores line-length?
For here: Would be nice to break it. That is easiest if you use the nicer pattern of:
with pytest.raises(..., match="..."):
(there is a lot of code not using it, but that is mostly because it predates pytest.)
There was a problem hiding this comment.
I've split it into several lines to match the expected line-length
There was a problem hiding this comment.
@seberg Preferred line length is now <= 88 for Python code. Clang-format is still at 80, but we should probably increase that. Linux is, IIRC, now <= 90 suggested, but not inforced.
There was a problem hiding this comment.
Yeah, the thing is that the linter job should have complained: I guess it just means adding a specific line length somewhere.
|
Thanks @seberg for your review. I've implemented your suggestions. Is there anything else I should add? |
seberg
left a comment
There was a problem hiding this comment.
Thanks! You missed one > maxPrintLen re-order (I think there are no more, but you could check briefly). We need to fix that and also add tests for huge input values (please do try if it crashes without the last fix!).
Otherwise, a few smaller things, it would be nice if you can refactor the test a bit more at least.
Personally, I might not document the error at all, but don't care. Either way, no need to change (I might yet decide to remove it via the "suggests" before merging).
Anyway, almost good to go in, thanks a lot!
numpy/_core/src/multiarray/dragon4.c
Outdated
| count = maxPrintLen - pos; | ||
| /* integer part too long for buffer, avoid overflow */ | ||
| if (count > maxPrintLen - pos) { | ||
| PyErr_SetString(PyExc_RuntimeError, "String buffer overflow"); |
There was a problem hiding this comment.
Small suggestion to write something that is easier to understand for Python programmers. Maybe along: "float formating result too large." (in all places of course)
numpy/_core/src/multiarray/dragon4.c
Outdated
|
|
||
|
|
||
| /* provided digits_left is too large, can't create the string required*/ | ||
| if (count + shift > maxPrintLen) { |
There was a problem hiding this comment.
This is similar to above shift is unbounded in size, so count + shift might overflow. So rephrase as count > maPrintLen - shift.
| assert_raises_regex(RuntimeError, "String buffer overflow", fpos, | ||
| tp('1.047'), precision=2, pad_left=int(1e5)) | ||
| assert_raises_regex(RuntimeError, "String buffer overflow", fpos, | ||
| tp('1.047'), precision=2, pad_right=int(1e5)) |
There was a problem hiding this comment.
Thanks for reflowing. There is one important thing to do, I think. And that is we should add a test for all of these with the padding set to np.iinfo("uint32").max because this hits those count + size > maxPrintlen overflows.
(Ideally you add the tests first and confirm that the test crashes NumPy!)
Then I have a few smaller asks/suggestions that would be great:
- I slightly prefer
10**5rather thanint(1e5). - It would be much clearer to use
with pytest.raises(...): - Very long tests aren't nice, it would be much better to split out the test into its own function. Ideally, you use
@pytest.mark.parametrizeinstead of the loop for[np.float16, np.float32, np.float64].
(You should be able to find good examples for these if you search the NumPy code.)
There was a problem hiding this comment.
I've implemented all of your suggestions:
- I've refactored the test into several smaller tests
- I've parametrized on type
- I've switched to 10**5
- I've added
np.iinfo("int32").maxas a test value,instead of np.iinfo("uint32").maxbecause the unsigned value is too large to be passed as `npy_int32' and tests detect this and fail
numpy/_core/src/multiarray/dragon4.c
Outdated
| npy_int32 count = pos; | ||
|
|
||
|
|
||
| /* provided digits_left is too large, can't create the string required*/ |
There was a problem hiding this comment.
I don't think these comments are needed here because with the Python error text, the code reason is just as clear already, IMO. So I would slightly prefer if you remove them again.
There was a problem hiding this comment.
Agreed, removed comment
56e83bd to
fa0273d
Compare
|
After implementing all suggestions, I see that my PR fails one of the tests - |
seberg
left a comment
There was a problem hiding this comment.
Thanks @yakovdan. The Cygwin failure is indeed universal, I think so let's ignore it for now 🤞.
Thanks for cleaning the tests up a bit more than necessary even! Definitely a nice improvement.
Mostly think some of the comments are unnecessary, so will just remove them (and that doc note, just because I lean to it, but others may well disagree).
I think this is all done though, will just apply and merge soon or when the tests pass.
from_float_positional errors for huge pads
This PR adds graceful error handling for np.format_float_positional when provided pad_left or pad_right arguments are too large. * TST: Added tests for correct handling of overflow * BUG: fixed pad_left and pad_right causing overflow if too large * TST: added overflow test and fixed formatting * BUG: fixed overflow checks and simplified error handling * BUG: rewritten excpetion message and fixed overflow check * TST: split test into smaller tests, added large input value * Apply suggestions from code review --------- Co-authored-by: Sebastian Berg <[email protected]>
This PR adds graceful error handling for np.format_float_positional when provided pad_left or pad_right arguments are too large. * TST: Added tests for correct handling of overflow * BUG: fixed pad_left and pad_right causing overflow if too large * TST: added overflow test and fixed formatting * BUG: fixed overflow checks and simplified error handling * BUG: rewritten excpetion message and fixed overflow check * TST: split test into smaller tests, added large input value * Apply suggestions from code review --------- Co-authored-by: Sebastian Berg <[email protected]> Signed-off-by: Filipe Laíns <[email protected]>
This PR adds graceful error handling for
np.format_float_positionalwhen providedpad_leftorpad_rightarguments are too large, leading to an overflow of the string buffer.In such cases, a
ValueErrorexception is raised.Added tests and updated the doc to describe limits on
pad_leftandpad_rightfixes #28068