BUG: Order percentile monotonically #16273

CloseChoice · 2020-05-17T13:49:03Z

Order output of np.quantile/np.percentile monotonically if quantiles/percentiles are ordered monotonically.

Change the way of performing the linear interpolation from (2) to (1) as given in
https://math.stackexchange.com/questions/907327/accurate-floating-point-linear-interpolation

eric-wieser · 2020-05-17T13:55:37Z

This is a duplicate of #15098. For reviewers:

* `master` uses `lerp(a, b, t) = a*(1-t) + b*t`

* This PR uses `lerp(a, b, t) = a + (b-a)*t`

* #15098 (BUG ensure monotonic property of lerp in numpy.percentile) uses `lerp(a, b, t) = a + (b-a)*t if t < 0.5 else b - (b-a)*(1-t)`
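For readers who want to compare the three formulas side by side, here is a minimal sketch in plain Python floats. The function names `lerp_master`, `lerp_pr` and `lerp_15098` are made up for illustration and are not NumPy APIs. The inputs are chosen so that `b - a` rounds, which shows one way the endpoint behaviour can differ.

```python
def lerp_master(a, b, t):
    # Formula used on master at the time of this PR.
    return a * (1 - t) + b * t


def lerp_pr(a, b, t):
    # Formula first proposed in this PR.
    return a + (b - a) * t


def lerp_15098(a, b, t):
    # Piecewise formula from #15098.
    diff = b - a
    return a + diff * t if t < 0.5 else b - diff * (1 - t)


# 1.0 - 1e16 is not representable as a double and rounds to -1e16,
# so a + (b - a) * 1.0 lands on 0.0 instead of b.
a, b = 1e16, 1.0
for name, lerp in [("master", lerp_master),
                   ("this PR", lerp_pr),
                   ("#15098", lerp_15098)]:
    print(name, lerp(a, b, 0.0), lerp(a, b, 1.0))
```

With these inputs only the `a + (b-a)*t` form misses `b` at `t = 1`. This is a single hand-picked case, not evidence about how often the formulas disagree in practice.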

CloseChoice · 2020-05-17T13:59:39Z

This is a duplicate of #15098. For reviewers:

* `master` uses `lerp(a, b, t) = a*(1-t) + b*t`

* This PR uses `lerp(a, b, t) = a + (b-a)*t`

* #15098 uses `lerp(a, b, t)  = a + (b-a)*t if t < 0.5 else b - (b-a)*(1-t)`

Hi,
thanks for the answer; you're totally correct concerning my formula and master's. I only realized that there was already an open PR once I was done writing my code, so I thought I'd give it a chance anyway. I can close this PR, though, if you advise me to.

CloseChoice · 2020-05-17T14:03:20Z

numpy/lib/tests/test_function_base.py

I actually thought of writing this as a hypothesis test, but the hypothesis module was never imported and I don't know your conventions. I can certainly change it to hypothesis if that is desired.

We recently decided we were willing to accept hypothesis tests - if you think the test is easier to read with hypothesis, please use it, so that maintainers like me who haven't used it can get the hang of it.

I added a hypothesis test and am curious about your feedback. The purpose of this test is not to improve readability but to cover more cases without writing multiple tests, and to find failing edge cases we might not have thought of.

CloseChoice · 2020-05-17T14:04:17Z

numpy/lib/function_base.py

I did not understand what `out` means in this context. If you can give me a hint I might be able to simplify the code even more. I guess it is possible to get rid of `x1 = x1.squeeze(0)`.

In working out what hint to give you, I ended up doing some cleanup of this code myself. Thanks for drawing it to my attention. It might be best for you to wait until my cleanup goes through, and then to rebase your tests.

Thanks for your patience - my cleanup is in. It's unlikely you'll be able to usefully merge the implementation, but you should be able to keep the tests.

Your cleanup made it way easier to implement the monotonic lerp. Thanks for that.

eric-wieser · 2020-05-20T05:22:58Z

Thanks for your patience - my cleanup is in. It's unlikely you'll be able to usefully merge the implementation, but you should be able to keep the tests.

eric-wieser · 2020-05-21T11:04:38Z

numpy/lib/function_base.py

Suggested change

#import pdb; pdb.set_trace()

eric-wieser · 2020-05-21T11:05:13Z

numpy/lib/tests/test_function_base.py

Suggested change

equals_sorted = np.sort(quantile) == quantile

assert equals_sorted.all()

assert_equal(np.sort(quantile), quantile)

eric-wieser · 2020-05-21T11:09:18Z

My fear with this patch is that we also care about the other properties in #14685 (comment) (vs #15098 which claims to satisfy all of them).

Do you think you could put together a hypothesis tests that can prove that those properties are not satisfied with your solution?

eric-wieser · 2020-05-26T19:14:24Z

numpy/lib/tests/test_function_base.py

Suggested change

assert np.lib.function_base._lerp(a, b, t) == np.lib.function_base._lerp(b, a, t)

assert np.lib.function_base._lerp(a, b, t) == np.lib.function_base._lerp(b, a, 1-t)

Or more likely,

Suggested change

assert np.lib.function_base._lerp(a, b, t) == np.lib.function_base._lerp(b, a, t)

assert np.lib.function_base._lerp(a, b, 1 - (1- t)) == np.lib.function_base._lerp(b, a, 1 - t)

To ensure that the precision of t is the same in both directions
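The precision remark can be illustrated with a small sketch in plain Python floats. The variable names are illustrative only. It shows that `1 - (1 - t)` need not equal `t`, and that the round-tripped weight and its complement are exact complements of each other, which is what the second suggested assertion relies on.

```python
t = 0.1
s = 1 - t            # rounds to the double nearest 0.9
t_roundtrip = 1 - s  # computed exactly, but it is no longer 0.1

print(t, t_roundtrip, t == t_roundtrip)

# After the round trip, the weight and its complement match exactly,
# so both lerp calls see weights of the same precision.
print(1 - t_roundtrip == s)
```

Comparing `_lerp(a, b, t)` with `_lerp(b, a, 1 - t)` directly would instead feed the two calls weights that carry different rounding, so an exact `==` can fail for reasons unrelated to the interpolation formula.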

CloseChoice · 2020-05-26T19:23:00Z

My fear with this patch is that we also care about the other properties in #14685 (comment) (vs #15098 which claims to satisfy all of them).

Do you think you could put together a hypothesis tests that can prove that those properties are not satisfied with your solution?

I added the tests but both solutions (mine and #15098) don't satisfy the exact symmetry condition but of course both satisfy the condition with np.isclose.

EDIT: This is outdated. When we apply assert np.lib.function_base._lerp(a, b, 1 - (1- t)) == np.lib.function_base._lerp(b, a, 1 - t) #15098 passes the test while my initial formula fails it

eric-wieser · 2020-05-26T19:27:49Z

Even with #16273 (comment)?

numpy/lib/tests/test_function_base.py

CloseChoice · 2020-05-26T19:33:50Z

Even with #16273 (comment)?

Only with #16273 (comment). Using assert np.lib.function_base._lerp(a, b, t) == np.lib.function_base._lerp(b, a, (1 - t)) fails it.


eric-wieser · 2020-05-27T19:21:49Z

numpy/lib/tests/test_function_base.py

+                                  width=32),
+                      b= st.floats(allow_nan=False, allow_infinity=False,
+                                   width=32))


These need to be 64 for this test to be interesting.

Changing to 64-bit leads to overflows in the basic add and subtract methods:

b=1.7976931348623157e+308
a=-1.7976931348623157e+308
np.subtract(b, a)
*** RuntimeWarning: overflow encountered in subtract

That's potentially a problem, the previous implementation did not fail like that for those cases.

For now, can you do something like clip a and b to reasonable magnitudes? We still want to test the precision of 64-bit, even if we decide we don't care about the range.

I changed the range to run from -1e300 up to 1e300, which seems to work fine.

Are you sure that it wasn't a problem before? The overflow happens in the basic add and subtract methods and should be reproducible with specific combinations for the former implementation as well, since we are still using those methods.
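The specific input quoted above can be checked with plain Python floats. This is a sketch under that assumption: Python floats overflow silently to `inf`, whereas NumPy scalars emit the `RuntimeWarning` shown in the thread. It covers only this one pair of inputs and does not settle whether the weighted form can overflow for others.

```python
import sys

big = sys.float_info.max  # 1.7976931348623157e+308
a, b, t = -big, big, 0.5

# Difference form: b - a exceeds the double range and becomes inf.
diff = b - a
via_difference = a + diff * t

# Weighted form: each product stays within range and the sum cancels.
via_weights = a * (1 - t) + b * t

print(diff, via_difference, via_weights)
```

For this input the difference form yields `inf` while the weighted form yields `0.0`.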


eric-wieser

Looks good to me, assuming @seberg is happy.

Co-authored-by: Eric Wieser <[email protected]>

seberg

LGTM, may just merge soon. I cannot say I love the way the indentation works out for the hypothesis tests. I think I would prefer a hanging indent, maybe of 8 spaces. There may be long lines.

I also wonder if those tests (except the symmetric one maybe?) are actually likely to stress the issue. At first sight, it feels like we may need an absurd number of fuzzing trials before the test could prove that the old version was bad.

EDIT: To be clear, I am not willing to stress on code style in the tests here...

seberg · 2020-06-27T16:34:41Z

Thanks @CloseChoice, let's put this in :). I think the tests should cover the relevant paths well, even if I am not sure that all the tests actually stress the monotonicity very strongly.

EDIT: To be clear, if someone feels like doing style/test touch-ups that is of course welcome, but I don't feel it's important enough to delay/stress out over.

CloseChoice · 2020-06-28T16:00:59Z

@seberg : Thanks for merging ;)

glemaitre · 2020-06-29T07:49:19Z

@CloseChoice @seberg @eric-wieser Thanks for taking care of this issue, and sorry to have disappeared from the map.
This will be really useful.

HyukjinKwon · 2021-05-28T04:45:49Z

numpy/lib/function_base.py

 def _lerp(a, b, t, out=None):
    """ Linearly interpolate from a to b by a factor of t """
-    return add(a*(1 - t), b*t, out=out)
+    diff_b_a = subtract(b, a)


Just in case this was overlooked mistakenly, seems like this now disallows booleans in percentile:

>>> np.percentile([True, False, False], q=0.5)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in percentile
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

which worked before (NumPy 1.19):

0.0

FWIW, this behaviour change can also be triggered by pandas's:

>>> pd.DataFrame({"i": [0, 1, 2], "b": [False, False, True], "s": ["x", "y", "z"]}).quantile(q=0.5, numeric_only=True)

Before:

i    1.0
b    0.0
Name: 0.5, dtype: float64

After:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../python3.8/site-packages/pandas/core/frame.py", line 9266, in quantile
    result = data._mgr.quantile(
  File "/.../python3.8/site-packages/pandas/core/internals/managers.py", line 491, in quantile
    block = b.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/.../python3.8/site-packages/pandas/core/internals/blocks.py", line 1592, in quantile
    result = nanpercentile(
  File "/.../python3.8/site-packages/pandas/core/nanops.py", line 1675, in nanpercentile
    return np.percentile(values, q, axis=axis, interpolation=interpolation)
  File "<__array_function__ internals>", line 5, in percentile
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3818, in percentile
    return _quantile_unchecked(
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3937, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3515, in _ureduce
    r = func(a, **kwargs)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 4064, in _quantile_ureduce_func
    r = _lerp(x_below, x_above, weights_above, out=out)
  File "/.../python3.8/site-packages/numpy/lib/function_base.py", line 3961, in _lerp
    diff_b_a = subtract(b, a)
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
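The root cause visible in both tracebacks can be isolated in a few lines. This is a sketch assuming NumPy is installed; the variable names are illustrative. Casting to float at the end is only one possible caller-side workaround, not something proposed in this thread.

```python
import numpy as np

a, b, t = np.array(False), np.array(True), 0.5

# Old formula: multiplying by a float promotes the booleans to float64
# before anything is added, so no boolean arithmetic is attempted.
old = np.add(a * (1 - t), b * t)

# New formula begins with subtract(b, a), which NumPy rejects for booleans.
try:
    np.subtract(b, a)
    subtract_failed = False
except TypeError:
    subtract_failed = True

# Converting the data to float before calling percentile avoids
# the boolean subtraction altogether.
result = np.percentile(np.asarray([True, False, False], dtype=float), 0.5)

print(old, subtract_failed, result)
```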

It might be better to open an issue instead of commenting on the merge PR. I am not sure that it will draw attention.

The problem is that I am not sure whether this is intended behaviour or a bug. It could make sense either way: (1) bool is not numeric, but (2) bool inherits from int in Python.

I believe I provided enough details with a self-contained reproducer so I would defer to other maintainers here.

Actually opening an issue with the regression tag would make sense here. What do you think @eric-wieser @seberg ?

Please open the issue, otherwise the discussion will get lost even quicker. If we figure that it is OK if this breaks now, we can just close the issue again. Thanks!

done: #19154

…mns_should_be_discarded_if_numeric_only_is_true

### What changes were proposed in this pull request?

This PR proposes to fix and reenable `test_stats_on_non_numeric_columns_should_be_discarded_if_numeric_only_is_true`, which was disabled when we upgraded to Python 3.9 in CI at #32657. It seems this is caused by a behaviour change in the latest NumPy; see also `https://github.com/numpy/numpy/pull/16273#discussion_r641264085`. pandas inherits this behaviour, but it doesn't make sense when `numeric_only` is set to `True` in pandas. I will track and follow the status of the issue between pandas and NumPy. For the time being, I propose to exclude only the boolean case in the percentile/quartile test case.

### Why are the changes needed?

To keep the test coverage.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

I roughly tested it locally, but it should pass in CI.

Closes #32690 from HyukjinKwon/SPARK-35510.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>

CloseChoice force-pushed the BUG-order_percentile-monotonically branch from 717997a to 229f14c Compare May 17, 2020 13:51

CloseChoice commented May 17, 2020


eric-wieser mentioned this pull request May 17, 2020

MAINT: cleanups to quantile #16274

Merged

charris changed the title ~~Bug order percentile monotonically~~ BUG: Order percentile monotonically May 17, 2020

charris added 00 - Bug component: numpy.lib labels May 17, 2020

CloseChoice force-pushed the BUG-order_percentile-monotonically branch from af5dd48 to be592a4 Compare May 21, 2020 10:30

CloseChoice requested a review from eric-wieser May 21, 2020 10:34


CloseChoice force-pushed the BUG-order_percentile-monotonically branch from 90eba5f to d895113 Compare May 26, 2020 19:46

CloseChoice and others added 7 commits May 27, 2020 19:35

BUG: np.quantile ordering not monotonic

3b0516a

add hypothesis test, fix bug of non-monotonic ordering of quantile fu…

54de868

…nction

remove pdb; add hypothesis tests for monotony, boundedness and symmet…

708798b

…ry of lerp

fix symmetry test

2ffcb11

use symmetric lerp function

bf65d6b

Update numpy/lib/tests/test_function_base.py

1397597

Co-authored-by: Eric Wieser <[email protected]>

fix lerp function and corresponding tests

214e830

CloseChoice force-pushed the BUG-order_percentile-monotonically branch from 18815e4 to 214e830 Compare May 27, 2020 17:51

remove debug statements

eddef43

eric-wieser reviewed May 27, 2020


CloseChoice added 4 commits June 10, 2020 01:11

make lerp be able to handle 0d cases

e760360

check for greater-equal in lerp monotony test

172b7d3

check for greater-equal in lerp monotony test

961d1b6

fix _scalar_or_0d in _lerp

2c05353

eric-wieser reviewed Jun 12, 2020


Update numpy/lib/function_base.py

2b7f671

Co-authored-by: Eric Wieser <[email protected]>

eric-wieser reviewed Jun 12, 2020


CloseChoice and others added 2 commits June 12, 2020 22:17

Update numpy/lib/tests/test_function_base.py

45605b2

Co-authored-by: Eric Wieser <[email protected]>

limit test_quantile_monotonic

88acbc8

eric-wieser mentioned this pull request Jun 17, 2020

ENH: Allow no conversion to scalar guarantee in ufunc and ufunc.outer #14489

Closed

eric-wieser requested a review from seberg June 17, 2020 07:38

eric-wieser approved these changes Jun 17, 2020


seberg approved these changes Jun 18, 2020


seberg self-requested a review June 18, 2020 18:05

seberg merged commit 4d5b255 into numpy:master Jun 27, 2020

seberg mentioned this pull request Jun 27, 2020

BUG ensure monotonic property of lerp in numpy.percentile #15098

Closed

charris added and then removed the 09 - Backport-Candidate label Jun 27, 2020

eric-wieser mentioned this pull request Jul 14, 2020

Possible wrong on TestQuantile #16856

Closed

This was referenced Feb 2, 2021

[BUG] Sporadic KBinsDiscretizer pytests fail with quantile strategy rapidsai/cuml#2933

Open

Percentile output non-monotonic for monotonically increasing percentiles cupy/cupy#4607

Closed

Improve floating point accuracy in percentile cupy/cupy#4617

Merged

HyukjinKwon reviewed May 28, 2021


This was referenced Jun 3, 2021

Broken bool supports in NumPy's percentile #19154

Closed

BUG: Broken bool supports in pandas' quantile by NumPy's percentile behaviour change pandas-dev/pandas#41792

Closed

