ENH: stats.quantile: add array API compatible quantile function#22352
ENH: stats.quantile: add array API compatible quantile function#22352ev-br merged 7 commits intoscipy:mainfrom
Conversation
|
Harrell-Davis quantiles can easily be incorporated as |
ev-br
left a comment
There was a problem hiding this comment.
Looks all reasonable.
One thing might be worth doing upfront is to explicitly decide (and lightly test) the policy on float32 inputs: for f32 inputs, is the output f32 for f64. I personally don't have a strong preference either way, as long as there's a decision.
Historically we were "f64 is enough for everybody", now that there are array libraries that default to f32, new functions better make a decision. As long as this one is private, I guess whichever works better for our internal usage is fine.
| return res | ||
|
|
||
|
|
||
| @skip_xp_backends('array_api_strict', reason="No take_along_axis yet.") |
There was a problem hiding this comment.
Could test-drive the main branch of array_api_strict which has it. Or wait until the release --- but the exact timeline is blocked on the array api 2024 release. Which is "soon" (tm)
There was a problem hiding this comment.
I am happy waiting until it's released. An update when it's released is no problem.
| x = xp_ravel(x) | ||
| p = xp_ravel(p) | ||
| axis = 0 | ||
| elif np.iterable(axis) or int(axis) != axis: |
There was a problem hiding this comment.
np.iterable, ouch. Is there ever a reason to allow axis to be anything other than a tuple or (maybe, just maybe), tuple or a list.
There was a problem hiding this comment.
Well, currently this will raise an informative error for non-tuple iterables.
The reason I'm not bothering to a handle tuples right now is that the plan is to handle them with a decorator for all stats functions. Currently, the axis_nan_policy decorator does this, but we haven't (and won't, IMO) update it to be array API compatible. I'll write a much lighter, array API compatible one for axis tuple support at some point.
| if not keepdims: | ||
| res = xp.squeeze(res, axis=axis) | ||
|
|
||
| return res[()] |
There was a problem hiding this comment.
This won't work on array-api-strict, correct? In general, this is not compliant unless res is strictly 1D
There was a problem hiding this comment.
Correct. When we can test with array-api-strict, we will do the usual res[()] if res.ndim == 0 else res. Surely there will be a handful of other unfortunate verbosities I forgot. The test suite will let us know, and I can make the code uglier at that time. Update: went ahead and made it uglier now.
Explicitly, this function is to preserve
This is not to be private. If this is looking close, I'll go ahead and post to the mailing list after a commit. (Done.) |
|
I think only one test failed for an understandable reason, and then the GPU job reported failure for all subsequent tests. Is that supposed to happen? Anyway, fixed that. Forum post created. |
That can happen when a GPU gets into a bad state. If the log gets eaten then it's a problem, because you cannot see where the problem is - this happened during setting up the GPU job and turned out to be a Cirrus virtualization bug. If the tests complete and failures are reported normally by |
Sure, but I didn't see how the failed assertion should cause the GPU to get into a bad state. I think it was just an incorrect assumption about dtypes in the test, so the result dtype was different than expected; I don't think anything else went wrong. Would a failed assertion cause a GPU to get into a bad state? |
Well, the first failure contains a pretty good clue: So it's not a |
|
I didn't think it was a |
ev-br
left a comment
There was a problem hiding this comment.
This now LGTM, and the discourse thread is silent for about a week. Let's give it at least one more weekend, I'd say, and then interpret it as a lazy consensus.
|
That makes sense, thanks for digging in deeper @mdhaber. |
|
https://github.com/mdhaber/scipy/tree/hdquantile now adds methods 1-3, a Harwell-Davis method, and |
|
Okay, the discourse thread is still silent, the code LGTM, let's merge and keep the ball rolling. Thanks Matt, Ralf. |
Reference issue
data-apis/array-api#795
gh-22194
What does this implement/fix?
scipy.stats.mstats#22194, we'll need a function to replacemquantiles, which is one of the most-used functions instats.mstats.scoreatpercentileis not be a great substitute for a few reasons, including the unfamiliar name.quantiledata-apis/array-api#795, its not looking like the array API standard will converge on aquantilefunction that would do everything we need it to do.This PR adds an array API compatible
quantilefunction with native support foraxis,nan_policy(for the data array), andkeepdims. Rather than perform a cartesian product operation likenumpy.quantile(compute all quantiles for all slices), it follows more familiar and flexible broadcasting rules. NaNs in the probability array produce NaNs in the output rather than causing the entire operation to fail.Additional information
weightswill be requested. If so, we can discuss that in a follow-up issue.marrays or setting masked elements to NaN and usingnan_policy='omit'. (The latter, of course, would not work if the desirednan_policy='propagate'.)alphapandbetapofmquantiles.