ENH: Add Array API 2023.12 version support#26724
Conversation
862c8ec to
005e41e
Compare
|
Hi @seberg, This PR will add support for 2023.12 Array API standard that NumPy aims to implement.
With these items merged we will be 2023.12 compliant for 2.1.0 release. |
mhvk
left a comment
There was a problem hiding this comment.
Some in-line comments. Main ones are that any new or changed API should be properly documented, including a changelog entry.
For cumulative_sum, arguably a better way would be to implement include_initial - though that is obviously a lot more work, changing the implementation of accumulate for the ufuncs. But it would imply one can actually include any initial.
numpy/_core/_methods.py
Outdated
| if min is None and max is None: | ||
| raise ValueError("One of max or min must be given") | ||
|
|
||
| # TODO: is there a better way to return identity? |
There was a problem hiding this comment.
This seems fine.
But it implies a (small) API change, which perhaps is worth a changelog entry?
There was a problem hiding this comment.
But then out argument is ignored. But what if a user passed it? Throw an error?
I wonder about something like:
def identity(x, out=None):
if out is None:
return x
else:
# TODO: move x contents to out
...
return outThere was a problem hiding this comment.
Ah, yes, you are right. And perhaps in general one would expect a copy, like for np.minimum and np.maximum. In that case, how about return np.positive(x, out=out)?
There was a problem hiding this comment.
np.positive will work here - thank you!
numpy/_core/fromnumeric.py
Outdated
|
|
||
| @array_function_dispatch(_clip_dispatcher) | ||
| def clip(a, a_min, a_max, out=None, **kwargs): | ||
| def clip(a, a_min=_NotPassed, a_max=_NotPassed, out=None, *, min=_NotPassed, |
There was a problem hiding this comment.
Maybe just use np._NoValue - that is meant for cases like these (and looks better in the documentation!)
numpy/_core/fromnumeric.py
Outdated
| Array API compatible cumulative_sum | ||
| """ | ||
| if out is not None and include_initial: | ||
| raise ValueError("not supported") |
There was a problem hiding this comment.
The error message should be a bit clearer about what is not supported.
There was a problem hiding this comment.
OK, yes, should have seen that this is a draft!
numpy/_core/fromnumeric.py
Outdated
| raise ValueError("not supported") | ||
|
|
||
| if x.ndim >= 2 and axis is None: | ||
| raise ValueError("requires axis arg") |
There was a problem hiding this comment.
Again, expand the error message.
numpy/_core/fromnumeric.py
Outdated
| initial_shape = list(x.shape) | ||
| initial_shape[axis] = 1 | ||
| res = np.concat( | ||
| [np.zeros(initial_shape, dtype=res.dtype), res], axis=axis |
There was a problem hiding this comment.
Hmm, seems a bit of a missed opportunity to actually allow initial to be passed on too. But easy to adjust if that were to happen.
There was a problem hiding this comment.
Sure! Let's keep additive and multiplicative identity as initial for cumulative_sum and cumulative_prod for now.
There was a problem hiding this comment.
If the plan is to update accumulate, or add a separate exclusive scan method, I would include this behavior there. I see cumulative_sum/cumsum as a shorthand for the common sum.accumulate, where more advanced cases should be accessed by using the ufunc methods.
And anyways, you can always do cumulative_sum(x, include_initial=True) + initial.
There was a problem hiding this comment.
I think that in this PR we can ship include_initial in cumulative_sum only to make it to 2.1.0 release with 2023.12 compatibility. Updating accumulate ufunc to also accept include_initial could be addressed separately as ufunc enhancement.
numpy/_core/fromnumeric.py
Outdated
| def cumulative_sum(x, /, *, axis=None, dtype=None, out=None, | ||
| include_initial=False): | ||
| """ | ||
| Array API compatible cumulative_sum |
There was a problem hiding this comment.
Since this is presumably going to be exposed in the main namespace, this needs a proper docstring, including the differences with regular cumsum.
|
@mhvk could you take another look? |
mhvk
left a comment
There was a problem hiding this comment.
Thanks, some comments inline, but a larger one on passing in out and include_initial: this is not too hard to do, with the following replacement,
def _cumulative_func(x, func, axis, dtype, out, include_initial):
x_ndim = ndim(x)
if axis is None:
if x_ndim >= 2:
raise ValueError("For arrays which have more than one dimension "
"``axis`` argument is required.")
axis = 0
if out is not None and include_initial:
item = [slice(None)] * x_ndim
item[axis] = slice(1, None)
func.accumulate(x, axis=axis, dtype=dtype, out=out[tuple(item)])
item[axis] = 0
out[tuple(item)] = func.identity
return out
res = func.accumulate(x, axis=axis, dtype=dtype, out=out)
if include_initial:
initial_shape = list(x.shape)
initial_shape[axis] = 1
res = np.concat(
[np.full_like(res, func.identity, shape=initial_shape), res],
axis=axis
)
return res
.github/workflows/linux.yml
Outdated
| python -m pip install -r requirements/build_requirements.txt | ||
| python -m pip install -r requirements/test_requirements.txt | ||
| python -m pip install -r array-api-tests/requirements.txt | ||
| python -m pip install --upgrade hypothesis |
There was a problem hiding this comment.
Is this necessary? Why not set a minimum version instead? (just coriosity, but the changes to the workflow seem unrelated to the PR)
There was a problem hiding this comment.
In test_requirements.txt there's hypothesis==6.81.1. array-api-tests suite requires newer version so I updated it to the latest hypothesis==6.104.1.
@ngoldbaum I see that you updated the version a year ago. Is there a reason not to use >= here?
There was a problem hiding this comment.
At one point NumPy used a bot to keep the test dependencies regularly updated, but it caused a lot of noise on forks of NumPy due to a bug and we decided to turn that off. That means the test requirements are only updated as people notice they are out of date. Please feel free to update the hypothesis version in the test requirements and if it doesn't affect anything else in CI it should be fine.
We do want pinned versions though to make it less likely that an upstream version change will suddenly break our CI.
| compatible alternatives for `numpy.cumsum` and `numpy.cumprod`. | ||
| * `numpy.clip` now supports ``max`` and ``min`` keyword arguments which are meant | ||
| to replace ``a_min`` and ``a_max``. Also, for ``np.clip(a)`` or ``np.clip(a, None, None)`` | ||
| an identity will be returned instead of raising an error. |
There was a problem hiding this comment.
Maybe "a copy of the input array will be ..."
numpy/_core/fromnumeric.py
Outdated
| return (a, a_min, a_max) | ||
| def _clip_dispatcher(a, a_min=None, a_max=None, out=None, *, min=None, | ||
| max=None, **kwargs): | ||
| return (a,) |
There was a problem hiding this comment.
This seems wrong: there is no reason one couldn't dispatch on the minimum and maximum too (they can be arrays), so I think this should become return (a, a_min, a_max, min, max)
There was a problem hiding this comment.
Right, I made it (a, a_min, a_max, out, min, max) as other functions here also put out here. Done!
numpy/_core/fromnumeric.py
Outdated
|
|
||
| """ | ||
| if a_min is np._NoValue and a_max is np._NoValue: | ||
| a_min = None if min == np._NoValue else min |
There was a problem hiding this comment.
Replace == with is here and on the next line.
numpy/_core/fromnumeric.py
Outdated
| raise ValueError("Passing ``out`` and ``include_initial=True`` " | ||
| "at the same time is not supported.") | ||
|
|
||
| x = asarray(x) |
There was a problem hiding this comment.
Might as well use np.asanyarray(x) here, so that for astropy I don't have to override these new functions!
Alternatively, omit this whole line and use if np.ndim(x) >= 2 below.
There was a problem hiding this comment.
I used np.atleast1d(x) as for scalars func.accumulate throws an error. Internally atleast1d uses asanyarray. Used np.ndim(x) here also anyway.
numpy/_core/fromnumeric.py
Outdated
| for more details. | ||
| include_initial : bool, optional | ||
| Boolean indicating whether to include the initial value (zeros) as | ||
| the first value in the output. ``include_initial=True`` changes |
There was a problem hiding this comment.
Why is this constraint necessary?
numpy/_core/fromnumeric.py
Outdated
|
|
||
| Examples | ||
| -------- | ||
| >>> a = np.array([1,2,3,4,5,6]) |
There was a problem hiding this comment.
Do insert spaces after the , in the examples.
numpy/_core/fromnumeric.py
Outdated
| 1000000.0050000029 | ||
|
|
||
| """ | ||
| initial_func = np.zeros if include_initial else None |
There was a problem hiding this comment.
With my suggestion, the full implementation becomes,
return _cumulative_func(x, um.add, axis, dtype, out, include_initial)
numpy/_core/fromnumeric.py
Outdated
|
|
||
|
|
||
| def _cumulative_func(x, func, axis, dtype, out, initial_func): | ||
| if out is not None and initial_func is not None: |
There was a problem hiding this comment.
This is not really necessary, since we know where the output is placed (see below).
numpy/_core/fromnumeric.py
Outdated
|
|
||
| res = func(x, axis=axis, dtype=dtype, out=out) | ||
|
|
||
| if initial_func: |
There was a problem hiding this comment.
This would become if include_initial with my suggestion.
numpy/_core/numeric.py
Outdated
| raise TypeError( | ||
| f"Input should be a NumPy array. It is a {type(x)} instead." | ||
| ) | ||
| if device not in ["cpu", None]: |
There was a problem hiding this comment.
Nitpick: make it device is not None and device != "cpu" to speed up the 99.999999% of the cases where the default is used.
|
@mhvk Thank you for a thorough review! I applied all your comments. |
mhvk
left a comment
There was a problem hiding this comment.
Looks good but for a few very minor nits. Maybe squash commits while you are at it?
numpy/_core/fromnumeric.py
Outdated
|
|
||
| def _cumulative_func(x, func, axis, dtype, out, include_initial): | ||
| x = np.atleast_1d(x) | ||
| x_ndim = ndim(x) |
There was a problem hiding this comment.
Good catch on atleast_1d. But now using ndim is no longer required, so I would just replace with x.ndim
| @@ -0,0 +1,6 @@ | |||
| * `numpy.cumulative_sum` and `numpy.cumulative_prod` were added as Array API | |||
| compatible alternatives for `numpy.cumsum` and `numpy.cumprod`. | |||
There was a problem hiding this comment.
Is it useful to mention the differences? I.e., that one cannot pass in an initial, but can include_initial in the result?
There was a problem hiding this comment.
I added one line about include_initial functionality. Done!
numpy/_core/fromnumeric.py
Outdated
|
|
||
| Examples | ||
| -------- | ||
| >>> a = np.array([1,2,3]) |
There was a problem hiding this comment.
While you are at it, one more space after comma...
numpy/_core/fromnumeric.py
Outdated
|
|
||
| The cumulative product for each column (i.e., over the rows) of ``b``: | ||
|
|
||
| >>> b = np.array([[1,2,3], [4,5,6]]) |
Numpy recently merged support for the 2023.12 revision of the Array API: numpy/numpy#26724 This breaks two of our tests and I've chosen to skip those tests for now: 1. The first breakage was caused by differences in how numpy and JAX cast negative floats to `uint8`. Specifically `np.float32(-1).astype(np.uint8)` returns `np.uint8(255)` whereas `jnp.float32(-1).astype(jnp.uint8)` produces `Array(0, dtype=uint8)`. We don't make any promises about consistency with casting floats to ints, noting that this can even be backend dependent. I don't believe this failure is identifying any unexpected behavior, and we test many other dtypes properly so I'm not concerned about skipping this test. 2. The second failure was caused by the fact that the approach we took in jax-ml#20550 to support backwards compatibility and the Array API for `clip` differs from the one used in numpy/numpy#26724. Again, the behavior is consistent, but it produces a different signature. I've skipped checking `clip`'s signature, but we should revisit it once the `a_min` and `a_max` parameters have been removed from JAX. Fixes jax-ml#22251
Numpy recently merged support for the 2023.12 revision of the Array API: numpy/numpy#26724 This breaks two of our tests: 1. The first breakage was caused by differences in how numpy and JAX cast negative floats to `uint8`. Specifically `np.float32(-1).astype(np.uint8)` returns `np.uint8(255)` whereas `jnp.float32(-1).astype(jnp.uint8)` produces `Array(0, dtype=uint8)`. We don't make any promises about consistency with casting floats to ints, noting that this can even be backend dependent. To fix our test, we now only generate positive inputs when the output dtype is unsigned. 2. The second failure was caused by the fact that the approach we took in jax-ml#20550 to support backwards compatibility and the Array API for `clip` differs from the one used in numpy/numpy#26724. Again, the behavior is consistent, but it produces a different signature. I've skipped checking `clip`'s signature, but we should revisit it once the `a_min` and `a_max` parameters have been removed from JAX. Fixes jax-ml#22251
This PR upgrades Array API and
array-api-testssuite to2023.12version:np.cumulative_sumandnp.cumulative_prod.minandmaxkeyword arguments tonp.clip. Allows passingNones for botha_minanda_max.devicekeyword argument tonp.astype.