ENH: stats.quantile: add array API compatible quantile function by mdhaber · Pull Request #22352 · scipy/scipy

mdhaber · 2025-01-17T22:53:28Z

Reference issue

What does this implement/fix?

Some SciPy functions will need an array API compatible quantile or median function if they are to be translated to the array API.
Looking toward RFC: stats: sunsetting scipy.stats.mstats #22194, we'll need a function to replace mquantiles, which is one of the most-used functions in stats.mstats. scoreatpercentile is not be a great substitute for a few reasons, including the unfamiliar name.
Based on RFC: array-agnostic quantile data-apis/array-api#795, its not looking like the array API standard will converge on a quantile function that would do everything we need it to do.

This PR adds an array API compatible quantile function with native support for axis, nan_policy (for the data array), and keepdims. Rather than perform a cartesian product operation like numpy.quantile (compute all quantiles for all slices), it follows more familiar and flexible broadcasting rules. NaNs in the probability array produce NaNs in the output rather than causing the entire operation to fail.

Additional information

I imagine that support for weights will be requested. If so, we can discuss that in a follow-up issue.
In the first commit, this does not support masked arrays, but it would be trivial to support marrays natively, and we could support NumPy masked arrays by either converting them to marrays or setting masked elements to NaN and using nan_policy='omit'. (The latter, of course, would not work if the desired nan_policy='propagate'.)
Initially, I've only implemented methods 4-9. It will be easy to add 1-3; I just didn't. We could also allow the method to be specified as a tuple of floats, which could be interpreted as the alphap and betap of mquantiles.

mdhaber · 2025-01-22T07:53:48Z

Harrell-Davis quantiles can easily be incorporated as method='harrell-davis' as shown in mdhaber/scipy@quantile...hdquantile.

ev-br

Looks all reasonable.

One thing might be worth doing upfront is to explicitly decide (and lightly test) the policy on float32 inputs: for f32 inputs, is the output f32 for f64. I personally don't have a strong preference either way, as long as there's a decision.

Historically we were "f64 is enough for everybody", now that there are array libraries that default to f32, new functions better make a decision. As long as this one is private, I guess whichever works better for our internal usage is fine.

ev-br · 2025-01-30T17:00:25Z

+    return res
+
+
+@skip_xp_backends('array_api_strict', reason="No take_along_axis yet.")


Could test-drive the main branch of array_api_strict which has it. Or wait until the release --- but the exact timeline is blocked on the array api 2024 release. Which is "soon" (tm)

I am happy waiting until it's released. An update when it's released is no problem.

ev-br · 2025-01-30T17:08:04Z

+        x = xp_ravel(x)
+        p = xp_ravel(p)
+        axis = 0
+    elif np.iterable(axis) or int(axis) != axis:


np.iterable, ouch. Is there ever a reason to allow axis to be anything other than a tuple or (maybe, just maybe), tuple or a list.

Well, currently this will raise an informative error for non-tuple iterables.

The reason I'm not bothering to a handle tuples right now is that the plan is to handle them with a decorator for all stats functions. Currently, the axis_nan_policy decorator does this, but we haven't (and won't, IMO) update it to be array API compatible. I'll write a much lighter, array API compatible one for axis tuple support at some point.

ev-br · 2025-01-31T09:21:18Z

+    if not keepdims:
+        res = xp.squeeze(res, axis=axis)
+
+    return res[()]


This won't work on array-api-strict, correct? In general, this is not compliant unless res is strictly 1D

Correct. When we can test with array-api-strict, we will do the usual res[()] if res.ndim == 0 else res. Surely there will be a handful of other unfortunate verbosities I forgot. The test suite will let us know, and I can make the code uglier at that time. Update: went ahead and made it uglier now.

mdhaber · 2025-01-31T12:13:33Z

One thing might be worth doing upfront

Explicitly, this function is to preserve float32 throughout if that's the result type of x and p. ~~If I didn't add a dtype test already, I will shortly, and at that time I'll fix any related mistakes.~~ I'm already testing this in test_against_reference. (This is the policy for all array API compatible stats functions.)

As long as this one is private,

This is not to be private. If this is looking close, I'll go ahead and post to the mailing list after a commit. (Done.)

mdhaber · 2025-01-31T22:11:23Z

I think only one test failed for an understandable reason, and then the GPU job reported failure for all subsequent tests. Is that supposed to happen?

Anyway, fixed that. Forum post created.

rgommers · 2025-02-01T11:34:36Z

I think only one test failed for an understandable reason, and then the GPU job reported failure for all subsequent tests. Is that supposed to happen?

That can happen when a GPU gets into a bad state. If the log gets eaten then it's a problem, because you cannot see where the problem is - this happened during setting up the GPU job and turned out to be a Cirrus virtualization bug. If the tests complete and failures are reported normally by pytest, then it is usually normal.

mdhaber · 2025-02-01T14:09:08Z

That can happen when a GPU gets into a bad state

Sure, but I didn't see how the failed assertion should cause the GPU to get into a bad state. I think it was just an incorrect assumption about dtypes in the test, so the result dtype was different than expected; I don't think anything else went wrong. Would a failed assertion cause a GPU to get into a bad state?

rgommers · 2025-02-03T17:18:13Z

Sure, but I didn't see how the failed assertion should cause the GPU to get into a bad state

Well, the first failure contains a pretty good clue:

../../../../.github/workflows/.pixi/envs/torch-cuda/lib/python3.12/site-packages/torch/testing/_comparison.py:711: in compare
    self._compare_values(actual, expected)
       actual     = <[RuntimeError('CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some ...pile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f37083759a0>

So it's not a assert xxx executed by pytest that failed, but an assert executed on the GPU. That tends to result in this kind of behavior.

mdhaber · 2025-02-03T17:44:50Z

I didn't think it was a pytest assertion. The stack trace suggests that it's happening in torch.testing, and I know that would be working on the GPU. I was still surprised by the idea that failure of a pytorch.testing assert would cause this sort of error state, even when executed on the GPU, so I dug a bit deeper. It turns out that the "device-side assert" actually occured earlier than the test assertion shown in the stack trace due to pytorch/pytorch#146211. The stack trace was either incorrect (as it said it might be) or somehow the code continued to execute until it got to the assertion (at which point it finally decided to raise).

ev-br

This now LGTM, and the discourse thread is silent for about a week. Let's give it at least one more weekend, I'd say, and then interpret it as a lazy consensus.

rgommers · 2025-02-09T19:09:58Z

That makes sense, thanks for digging in deeper @mdhaber.

mdhaber · 2025-02-10T16:15:29Z

https://github.com/mdhaber/scipy/tree/hdquantile now adds methods 1-3, a Harwell-Davis method, and marray support. I'll open that up as soon as this lands.

ev-br · 2025-02-10T18:11:27Z

Okay, the discourse thread is still silent, the code LGTM, let's merge and keep the ball rolling. Thanks Matt, Ralf.

ENH: stats.quantile: add array API compatible quantile function

e07ab69

mdhaber added scipy.stats enhancement A new feature or improvement array types Items related to array API support and input array validation (see gh-18286) labels Jan 17, 2025

mdhaber requested a review from rgommers as a code owner January 17, 2025 22:53

github-actions bot added the Meson Items related to the introduction of Meson as the new build system for SciPy label Jan 17, 2025

mdhaber mentioned this pull request Jan 17, 2025

RFC: array-agnostic quantile data-apis/array-api#795

Closed

mdhaber commented Jan 18, 2025

View reviewed changes

Comment thread scipy/stats/_quantile.py

mdhaber commented Jan 18, 2025

View reviewed changes

Comment thread scipy/stats/_quantile.py

This was referenced Jan 22, 2025

MAINT: fix domain edge error for betainc a or b ==0 case is 1. #11788

Closed

ENH: add take_along_axis mdhaber/marray#77

Merged

Update version to 0.0.7 mdhaber/marray#79

Merged

ev-br reviewed Jan 31, 2025

View reviewed changes

mdhaber commented Jan 31, 2025

View reviewed changes

Comment thread scipy/stats/_quantile.py Outdated

mdhaber added 2 commits January 31, 2025 10:14

Merge remote-tracking branch 'upstream/main' into quantile

17d68e3

MAINT: stats.quantile: updates per review

657c63b

mdhaber commented Jan 31, 2025

View reviewed changes

Comment thread scipy/stats/tests/test_quantile.py Outdated

Update scipy/stats/tests/test_quantile.py

f3648f9

mdhaber mentioned this pull request Jan 31, 2025

Negative index support for take_along_dim pytorch/pytorch#146211

Closed

MAINT: stats.quantile: avoid pytorch/pytorch#146211

f5ed1ba

mdhaber added 2 commits January 31, 2025 14:51

TST: stats.quantile: skip Dask

4f4577c

TST: stats.quantile: adjust for varying default dtype

25f943e

mdhaber requested a review from ev-br February 7, 2025 18:31

ev-br approved these changes Feb 7, 2025

View reviewed changes

mdhaber requested a review from ev-br February 10, 2025 16:18

ev-br merged commit 4326e9e into scipy:main Feb 10, 2025

ev-br added this to the 1.16.0 milestone Feb 10, 2025

lorentzenchr mentioned this pull request Feb 15, 2025

MNT _weighted_percentile supports np.nan values scikit-learn/scikit-learn#29034

Merged

mdhaber mentioned this pull request Feb 19, 2025

RFC: stats: sunsetting scipy.stats.mstats #22194

Open

mdhaber mentioned this pull request Mar 3, 2025

ENH: stats: add array API-support #20544

Open

zitongzhoueric mentioned this pull request Apr 2, 2025

TST: stats.quantile: add edge test case for axis=None && keepdims=True #22788

Merged

lucyleeow mentioned this pull request May 21, 2025

Add array API support to median_absolute_error scikit-learn/scikit-learn#31406

Merged

This was referenced Jun 30, 2025

ENH add Array API support for d2_pinball_score and d2_absolute_error_score scikit-learn/scikit-learn#31671

Merged

ENH: add quantile data-apis/array-api-extra#340

Open

lucyleeow mentioned this pull request Oct 31, 2025

DOC Fix comment in _median scikit-learn/scikit-learn#32616

Merged

		return res


		@skip_xp_backends('array_api_strict', reason="No take_along_axis yet.")

Uh oh!

Conversation

mdhaber commented Jan 17, 2025

Reference issue

What does this implement/fix?

Additional information

Uh oh!

Uh oh!

Uh oh!

mdhaber commented Jan 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ev-br left a comment

Choose a reason for hiding this comment

Uh oh!

ev-br Jan 30, 2025

Choose a reason for hiding this comment

Uh oh!

mdhaber Jan 31, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ev-br Jan 30, 2025

Choose a reason for hiding this comment

Uh oh!

mdhaber Jan 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ev-br Jan 31, 2025

Choose a reason for hiding this comment

Uh oh!

mdhaber Jan 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mdhaber commented Jan 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mdhaber commented Jan 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rgommers commented Feb 1, 2025

Uh oh!

mdhaber commented Feb 1, 2025

Uh oh!

rgommers commented Feb 3, 2025

Uh oh!

mdhaber commented Feb 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ev-br left a comment

Choose a reason for hiding this comment

Uh oh!

rgommers commented Feb 9, 2025

Uh oh!

mdhaber commented Feb 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ev-br commented Feb 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mdhaber commented Jan 22, 2025 •

edited

Loading

mdhaber Jan 31, 2025 •

edited

Loading

mdhaber Jan 31, 2025 •

edited

Loading

mdhaber commented Jan 31, 2025 •

edited

Loading

mdhaber commented Jan 31, 2025 •

edited

Loading

mdhaber commented Feb 3, 2025 •

edited

Loading

mdhaber commented Feb 10, 2025 •

edited

Loading