BUG: Fix ``from_float_positional`` errors for huge pads by yakovdan · Pull Request #28149 · numpy/numpy

yakovdan · 2025-01-11T18:36:03Z

This PR adds graceful error handling for np.format_float_positional when provided pad_left or pad_right arguments are too large, leading to an overflow of the string buffer.

In such cases, a ValueError exception is raised.
Added tests and updated the doc to describe limits on pad_left and pad_right

fixes #28068

yakovdan · 2025-01-12T13:04:58Z

Hi, I'm a new contributor to NumPy and I would appreciate any guidance you may have.

seberg

Introducing the error return looks great. A few comments, mainly:

It seems a lot easier to just inline the error. It would be fine to do a single error for everything, although I would probably inline the error anyway with a goto buffer_too_small: or so.
This fixes the error, but a lot of care needs to be taken about overflows unfortunately:
- Can be fixed with re-arranging some of the length calculations, another way might be to use int64 for some of these checks.
- There are probably a lot more errors here: Every time maxPrintLen is checked and a special path taken, the result returned is currently wrong in some way and we need to raise an/the error!
  Would you be up to auditing that code as well?

For simplicity, I really suggest to just raise one single error as noted in one of the comments. In practice, no-one will ever see this error (if they did we should just support it...), so we can be lazy and worry about raising a particularly helpful error.
(I suppose the "no-one" should ever see this error makes me lean towards a RuntimeError.)

numpy/_core/src/multiarray/dragon4.c

seberg · 2025-01-13T08:26:25Z

numpy/_core/src/multiarray/dragon4.c

+        /* handle overflow condition safely*/
+        if (NPY_TRUE == right_overflow) {
+            buffer[pos] = '\0';
+            return RIGHT_PADDING_OVERFLOW;


I don't think there is any reason to have multiple error returns here. You can just set the Python error here and return -1 as the only error value.

Agreed, fixed

seberg · 2025-01-13T08:44:10Z

numpy/_core/arrayprint.py

-        many characters are to the left of the decimal point.
+        many characters are to the left of the decimal point. Total length of 
+        string representation of the floating point value cannot exceed 16384, 
+        including pad_right, if any.


I feel this is a lot of detail, nobody should ever see. If we want it, maybe we should put it into the Exceptions section.

I've simplified this part and moved to a separate section below. Is this what you've meant?

seberg · 2025-01-13T08:47:33Z

numpy/_core/src/multiarray/dragon4.c

-
+
+        /* provided digits_rights is too large, can't create the string required*/
        if (pos + count > maxPrintLen) {


I suspect this (and likely others) can still overflow for ridiculous inputs like 2**31-1.

Suggested change

if (pos + count > maxPrintLen) {

if (count > maxPrintLen - pos) {

re-arranging it should keep us safe?

I agree. I've fixed this where it's relevant.

seberg · 2025-01-13T08:49:24Z

numpy/_core/src/multiarray/dragon4.c

+
+        /* provided digits_rights is too large, can't create the string required*/
        if (pos + count > maxPrintLen) {
+            right_overflow = NPY_TRUE;


Why continue and not set the error and return here right away, we shouldn't use the result, so it doesn't matter if the buffer is null terminated.

Fixed according to your suggestion

seberg · 2025-01-13T08:53:39Z

numpy/_core/src/multiarray/dragon4.c

+                error_msg = "An unexpected error occurred";\
+                break;\
+        }\
+        PyErr_SetString(error_type, error_msg);\


If we set the error above, this code can stay unchanged. Also Dragon4_PrintFloat_##format is used below a second time and both must stay aligned.

We should make sure there is a test for both calls.

I've reverted the code for this function and and switched to raising errors in FormatPositional

seberg · 2025-01-13T08:55:34Z

numpy/_core/tests/test_scalarprint.py

            assert_equal(fpos(tp('1.001'), precision=1, trim='-'), "1")
+
+            assert_raises_regex(ValueError, "Left padding exceeds buffer size of 16384", fpos, tp('1.047'), precision=2, pad_left=int(1e5))            
+            assert_raises_regex(ValueError, "Right padding exceeds buffer size of 16384", fpos, tp('1.047'), precision=2, pad_right=int(1e5))


@charris what is our preferred line length? It seems a bit that the new linter setup ignores line-length?

For here: Would be nice to break it. That is easiest if you use the nicer pattern of:

with pytest.raises(..., match="..."):

(there is a lot of code not using it, but that is mostly because it predates pytest.)

I've split it into several lines to match the expected line-length

@seberg Preferred line length is now <= 88 for Python code. Clang-format is still at 80, but we should probably increase that. Linux is, IIRC, now <= 90 suggested, but not inforced.

Yeah, the thing is that the linter job should have complained: I guess it just means adding a specific line length somewhere.

yakovdan · 2025-01-15T07:41:41Z

Thanks @seberg for your review. I've implemented your suggestions. Is there anything else I should add?

seberg

Thanks! You missed one > maxPrintLen re-order (I think there are no more, but you could check briefly). We need to fix that and also add tests for huge input values (please do try if it crashes without the last fix!).

Otherwise, a few smaller things, it would be nice if you can refactor the test a bit more at least.

Personally, I might not document the error at all, but don't care. Either way, no need to change (I might yet decide to remove it via the "suggests" before merging).

Anyway, almost good to go in, thanks a lot!

seberg · 2025-01-16T09:00:29Z

numpy/_core/src/multiarray/dragon4.c

-                count = maxPrintLen - pos;
+            /* integer part too long for buffer, avoid overflow */
+            if (count > maxPrintLen - pos) {
+                PyErr_SetString(PyExc_RuntimeError, "String buffer overflow");


Small suggestion to write something that is easier to understand for Python programmers. Maybe along: "float formating result too large." (in all places of course)

seberg · 2025-01-16T09:02:39Z

numpy/_core/src/multiarray/dragon4.c

-
+
+        /* provided digits_left is too large, can't create the string required*/
        if (count + shift > maxPrintLen) {


This is similar to above shift is unbounded in size, so count + shift might overflow. So rephrase as count > maPrintLen - shift.

seberg · 2025-01-16T09:12:56Z

numpy/_core/tests/test_scalarprint.py

+            assert_raises_regex(RuntimeError, "String buffer overflow", fpos, 
+                                tp('1.047'), precision=2, pad_left=int(1e5))
+            assert_raises_regex(RuntimeError, "String buffer overflow", fpos, 
+                                tp('1.047'), precision=2, pad_right=int(1e5))


Thanks for reflowing. There is one important thing to do, I think. And that is we should add a test for all of these with the padding set to np.iinfo("uint32").max because this hits those count + size > maxPrintlen overflows.
(Ideally you add the tests first and confirm that the test crashes NumPy!)

Then I have a few smaller asks/suggestions that would be great:

I slightly prefer 10**5 rather than int(1e5).

It would be much clearer to use with pytest.raises(...):

Very long tests aren't nice, it would be much better to split out the test into its own function. Ideally, you use @pytest.mark.parametrize instead of the loop for [np.float16, np.float32, np.float64].

(You should be able to find good examples for these if you search the NumPy code.)

I've implemented all of your suggestions:

I've refactored the test into several smaller tests

I've parametrized on type

I've switched to 10**5

I've added np.iinfo("int32").max as a test value, instead of np.iinfo("uint32").max because the unsigned value is too large to be passed as `npy_int32' and tests detect this and fail

seberg · 2025-01-16T09:15:58Z

numpy/_core/src/multiarray/dragon4.c

        npy_int32 count = pos;
-
+
+        /* provided digits_left is too large, can't create the string required*/


I don't think these comments are needed here because with the Python error text, the code reason is just as clear already, IMO. So I would slightly prefer if you remove them again.

Agreed, removed comment

yakovdan · 2025-01-18T15:06:33Z

After implementing all suggestions, I see that my PR fails one of the tests - cygwin_build_test, probably related to the recent changes made to this test. Is there anything required on my part to pass this test?

seberg

Thanks @yakovdan. The Cygwin failure is indeed universal, I think so let's ignore it for now 🤞.

Thanks for cleaning the tests up a bit more than necessary even! Definitely a nice improvement.

Mostly think some of the comments are unnecessary, so will just remove them (and that doc note, just because I lean to it, but others may well disagree).

I think this is all done though, will just apply and merge soon or when the tests pass.

numpy/_core/arrayprint.py

numpy/_core/src/multiarray/dragon4.c

numpy/_core/tests/test_scalarprint.py

This PR adds graceful error handling for np.format_float_positional when provided pad_left or pad_right arguments are too large. * TST: Added tests for correct handling of overflow * BUG: fixed pad_left and pad_right causing overflow if too large * TST: added overflow test and fixed formatting * BUG: fixed overflow checks and simplified error handling * BUG: rewritten excpetion message and fixed overflow check * TST: split test into smaller tests, added large input value * Apply suggestions from code review --------- Co-authored-by: Sebastian Berg <[email protected]>

This PR adds graceful error handling for np.format_float_positional when provided pad_left or pad_right arguments are too large. * TST: Added tests for correct handling of overflow * BUG: fixed pad_left and pad_right causing overflow if too large * TST: added overflow test and fixed formatting * BUG: fixed overflow checks and simplified error handling * BUG: rewritten excpetion message and fixed overflow check * TST: split test into smaller tests, added large input value * Apply suggestions from code review --------- Co-authored-by: Sebastian Berg <[email protected]> Signed-off-by: Filipe Laíns <[email protected]>

yakovdan added 3 commits January 11, 2025 20:17

DOC: added limits to pad_left, pad_right

32b5ccb

TST: Added tests for correct handling of overflow

1e4fe5c

BUG: fixed pad_left and pad_right causing overflow if too large

b530a08

github-actions bot added the 00 - Bug label Jan 11, 2025

seberg reviewed Jan 13, 2025

View reviewed changes

yakovdan added 3 commits January 15, 2025 09:12

DOC: modified doc for simpler exception

e5e13e3

TST: added overflow test and fixed formatting

82d1481

BUG: fixed overflow checks and simplified error handling

3d85e35

seberg reviewed Jan 16, 2025

View reviewed changes

seberg added the 09 - Backport-Candidate PRs tagged should be backported label Jan 16, 2025

yakovdan added 2 commits January 18, 2025 16:03

BUG: rewritten excpetion message and fixed overflow check

b02ef67

TST: split test into smaller tests, added large input value

fa0273d

yakovdan force-pushed the bug_segfault_format_float branch from 56e83bd to fa0273d Compare January 18, 2025 14:25

yakovdan mentioned this pull request Jan 18, 2025

MAINT: testing: specify python executable to use in extbuild #28183

Merged

seberg approved these changes Jan 20, 2025

View reviewed changes

Apply suggestions from code review

6e6882b

seberg changed the title ~~BUG: numpy.format_float_positional causes Segmentaion Fault when accepting a large pad_left~~ BUG: Fix from_float_positional errors for huge pads Jan 21, 2025

seberg merged commit 1256527 into numpy:main Jan 21, 2025
67 checks passed

charris mentioned this pull request Jan 21, 2025

BUG: Fix from_float_positional errors for huge pads #28208

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jan 21, 2025



		/* provided digits_rights is too large, can't create the string required*/
		if (pos + count > maxPrintLen) {

	if (pos + count > maxPrintLen) {
	if (count > maxPrintLen - pos) {



		/* provided digits_left is too large, can't create the string required*/
		if (count + shift > maxPrintLen) {

		npy_int32 count = pos;


		/* provided digits_left is too large, can't create the string required*/

Uh oh!

Conversation

yakovdan commented Jan 11, 2025

Uh oh!

yakovdan commented Jan 12, 2025

Uh oh!

seberg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yakovdan Jan 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yakovdan commented Jan 15, 2025

Uh oh!

seberg left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yakovdan commented Jan 18, 2025

Uh oh!

seberg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yakovdan Jan 15, 2025 •

edited

Loading