API: Deprecate `np.int_` and `np.uint` #24794

mtsokol · 2023-09-25T11:47:28Z

Addresses #24743.

This PR deprecates np.int_ with np.uint, and restores np.long with np.ulong.

seberg · 2023-09-25T13:25:23Z

Hmmm, I am a bit worried how this will turn out with gh-24224 (although I missed the typing changes there). intp is a reasonable replacement after that for many of the long changes here, but may also be awkward sometimes?

mtsokol · 2023-09-25T13:42:30Z

Hmmm, I am a bit worried how this will turn out with gh-24224 (although I missed the typing changes there). intp is a reasonable replacement after that for many of the long changes here, but may also be awkward sometimes?

@seberg In this PR I just renamed int_ to long and uint to ulong, so a potential change long -> intp can be treated as int_ -> intp change, in my opinion.

seberg · 2023-09-25T13:48:01Z

True, just going to have to touch a lot of this twice, made me wonder if it really is so bad to have np.int_ TBH, but I can't form a strong enough opinion to really push fo rthat.

mtsokol · 2023-10-06T10:22:28Z

All downstream libs PRs have been merged.

ngoldbaum

LGTM! In the interest of avoiding breaking downstream CI on a Friday I'd like to merge this early next week.

eric-wieser · 2023-10-06T17:11:54Z

This seems pretty disruptive to me; when deprecating something in numpy, usually there is a replacement that already exists; but in this case, np.long has only just been added (as far as I can tell). This leaves downstream projects with no way to work on old and new numpy versions simultaneously.

Maybe that's fine here due to this targeting 2.0, and so the normal approach of PendingDeprecationWarning / an intermediate release with the replacement but no warning isn't appropriate?

eric-wieser · 2023-10-06T17:15:14Z

doc/source/release/1.21.0-notes.rst

 A mypy_ plugin is now available for automatically assigning the (platform-dependent)
 precisions of certain `~numpy.number` subclasses, including the likes of
-`~numpy.int_`, `~numpy.intp` and `~numpy.longlong`. See the documentation on
+`~numpy.long`, `~numpy.intp` and `~numpy.longlong`. See the documentation on


I think this is misleading, as in 1.21 np.long meant something different.

Ah right, I added this change by accident. Reverted!

eric-wieser · 2023-10-06T17:16:18Z

doc/release/upcoming_changes/24794.deprecation.rst

@@ -0,0 +1,3 @@
+* ``np.int_`` has been deprecated. Use ``np.long`` instead.


Suggested change

* ``np.int_`` has been deprecated. Use ``np.long`` instead.

* ``np.int_`` as the spelling of C's `long` has been deprecated. Use the new ``np.long`` instead.

Sure, changed!

ngoldbaum · 2023-10-06T17:46:33Z

This leaves downstream projects with no way to work on old and new numpy versions simultaneously.

I had envisioned something like:

try:
    np_long = np.long
except AttributeError
    np_long = np.int_

Or equivalently something that switches on numpy version.

I'd be fine with leaving behind an np.int_ and np.uint_ that emits a PendingDeprecationWarning in 2.0 and DeprecationWarning in 2.1 if you feel like that would help but with this NEP 52 stuff we've been trying to rip off bandaids when possible so we don't delay pain unnecessarily.

rgommers · 2023-10-06T18:16:01Z

The other relevant argument is that 8 or 9 out of 10 times, the use of np.int_ in downstream code is a mistake. It should typically be int64. Also, dtype=np.int_ can be replaced with dtype=int is one really wants to avoid a try-except. Overall, there isn't really significant disruption here I'd say.

ngoldbaum · 2023-10-06T18:27:34Z

I missed earlier - this needs an update to the migration guide I think.

In the vein of Ralf's comment, the release note, migration guide, and maybe the error message should probably indicate that another possible replacement is to use the python int builtin or equivalently in NumPy 2.0 (not the main branch yet), int64.

eric-wieser · 2023-10-06T23:27:10Z

Also, dtype=np.int_ can be replaced with dtype=int is one really wants to avoid a try-except

I don't think this is a good idea if we're planning to change the meaning of int to mean int64 in future instead of C long. I think this might require a fairly detailed explanation in the migration guide regarding whether the intent is just "be an integer" (9/10) or "be a C long" (1/10).

rgommers · 2023-10-07T06:56:34Z

I don't think this is a good idea if we're planning to change the meaning of int to mean int64 in future instead of C long.

Good point, I agree. I didn't consider that we'd want to do this in the future, but it does seem like a logical change to make.

seberg · 2023-10-07T06:59:47Z

The only way I can think of it is that np.int_ refers to the default integer which happens to be a C-long, but as far as I am aware there is agreement that we should try to make it an intp/ssize_t at the very least to fix windows 64 bit (unless someone wants to make it a 64bit always).

So that doesn't actually fix the problem because even if you keep np.int_ as the default integer, it isn't the same as long on windows 64bit anymore.

Now dtype=int then isn't the right workaround when you need a long. It is a decent spelling for "default integer", though. Fortunately, in most cases dtype="long" always worked as does np.dtype("long") so the try/except is hopefully less pressing.

I don't care too much about what happens with np.int_ because users don't normally pass things to C unchecked so the problem is really only the Cython int_t type which has no reasonable meaning. If a user starts using 64bit rather than long, it might break something, but at least it should be loud about it (and frankly, there is a lot of software just assuming default int is 64bit and while that software may be fine with ignoring 32bit platforms, they simply shrug off the fact that they don't work on windows).

mtsokol · 2023-10-09T07:58:44Z

Just FYI, this comment (scipy/scipy#19310 (comment)) shows the source where np.dtype(int) is interpreted as C-Long right now.

seberg · 2023-10-09T08:48:56Z

My appetite for removing np.int_ is really not increasing. What Eric said above (and I mentioned a few times), is that in the majority of cases replacing np.int_ with long is counterproductive. Because what you mean is "give me an integer" which should be the default integer which is only coincidentally long and we are very actively talking about changing that.
(Which is frankly far more important than this change, unless we decide against it.)

Yes, there are a few places where np.int_ does represent long. But that is mainly in C "heavy" modules like SciPy and even there it is true only so much of the time.

So the point remains: If this nudges even a fraction of users to using long rather than np.dtype(int).type then I am not sure there is much gain? The advantage of removing np.int_ seems mainly to force reviewing every use of it, but the likely error path here is that windows users of some cython/C extensions will get errors because the C/Cython side needs to be fixed for our change of what the default integer is. (Which is annoying, but there will also be many such will be less broken by the change.)

ngoldbaum · 2023-10-09T14:21:42Z

In that case would you be ok with restoring np.float_ and np.complex_ to keep the API consistent?

eric-wieser · 2023-10-09T14:41:14Z

Yes, there are a few places where np.int_ does represent long. But that is mainly in C "heavy" modules like SciPy and even there it is true only so much of the time.

Whether we remove np.int_ or not, I think we should certainly add np.long and np.ulong to the public API so that we are free to eventually point int_ at something else.

seberg · 2023-10-09T14:46:09Z

In that case would you be ok with restoring np.float_ and np.complex_ to keep the API consistent?

I don't really care either way (so yes, I am OK with it). Those don't have the confusion about what is the default.

I mainly dislike that it seems to become quite hard to spell "default integer" and that seems to lead to confusion where we replace np.int_ with np.long when the code should be using np.dtype(int).type() or np.intp once we change that. But either of those seems rather confusing to me!
There could be an argument that int_ is a terrible name default integer and that we should just introduce a new one.

rgommers · 2023-10-09T15:24:59Z

Because what you mean is "give me an integer" which should be the default integer

It depends on what that means, but as a concept this is not well-defined in NumPy, while in the Array API standard it is (and it should be int64 rather than anything platform-dependent): https://data-apis.org/array-api/latest/API_specification/data_types.html#default-data-types. So let's not try to use that exact term for anything but that. Same for float_ -> should be float64.

Even on 32-bit systems, the "default integer" should be int64 , that works just fine. It may be slower for some things, but prevent overflows for numerical code (and since 32-bit is often untested in CI, writing code that may overflow on 32-bit systems happens quite a bit). In general, platform-dependent "defaults" are bad. If you know you want a platform-specific thing, you should make that choice explicitly by using long (a C-like name) or some other name that makes it clear that that is what you intended.

The way I see it, we should recommend users to use the following:

I'm writing generic code for typical numpy operations -> default to using np.int64 (also named the "default integer dtype")
I know these are small numbers and really want to save some memory -> use np.int32
I'm using something for indexing -> use np.intp
I on purpose want to use a platform-dependent type here matching C's long (or long long) -> use a "C like type" like np.long / np.longlong

In that case would you be ok with restoring np.float_ and np.complex_ to keep the API consistent?

These are not useful, so let's please not restore them. They're only counterproductive.

rgommers · 2023-10-09T15:28:02Z

We can perhaps postpone removal of int_ for a few releases if it's really clear that removing it now is too annoying (that's a toss-up I'd say), but we should aim to remove it and guide users better - because they're using it now only because we had np.int and then deprecated them and pointed them at np.int_. Neither name/dtype was a good idea.

rkern · 2023-10-09T15:38:39Z

It depends on what that means, but as a concept this is not well-defined in NumPy,

Sure it is! We've talked about Numeric/numarray/numpy having default integer and floating point types since the beginning. Essentially, the default integer type is whatever dtype np.array([1]) turns out to be. Up until now that has been defined to be whatever dtype maps to a signed C long on the platform (following Python 1 and 2's definition of a default integer type). We're looking to change that definition now (wisely), but "default integer dtype" is a concept that exists and has an existing definition to change.

rgommers · 2023-10-09T15:59:21Z

but "default integer dtype" is a concept that exists and has an existing definition to change.

You're right, thanks for the correction - it exists indeed, it just isn't very clearly documented (but that's true for other things as well).

seberg · 2023-10-09T16:03:17Z

Let's split out the "what should the default be" to gh-24890. I thought since nobody has really disagreed said something else the many times I have said "intp seems the pragmatic choice to do something here", I thought that was consensus.
And I even thought we had consensus that this was a 2.0 candidate (yes, I also dislike that np.long doesn't exist on post versions for easier backcompat).

ngoldbaum · 2023-10-13T03:53:35Z

We ended up landing at the last community meeting that the most sensible thing might be to close this PR, leave int_ and uint alone, and live with the inconsistency of not having float_ and cfloat.

In principle we could replace int_ with intp but that's not actually any clearer of a name (p? pointer? what does that have to do with anything - all I care about is ints...). Granted the underscore isn't great either, but int_ has the virtue of existing already.

I think reintroducing np.long is uncontroversial, @mtsokol are you ok with closing this and making a followup just doing that?

seberg · 2023-10-13T07:59:44Z

Hmmm, yeah, it would probably be good to split out adding long (and maybe tests) from my PR to help move it along.
I will probably try to do that today unless you have time to take a shot @mtsokol (then I we can push that forward quickly)? That makes the rest easier there, changing the default is pretty simple there but the random changes are trickier so reducing it a bit is nice.

mtsokol · 2023-10-13T08:02:08Z

@seberg Sure! I will make a separate np.long PR out of it.

mtsokol self-assigned this Sep 25, 2023

github-actions bot added the 30 - API label Sep 25, 2023

mtsokol added the Numpy 2.0 API Changes label Sep 25, 2023

pllim mentioned this pull request Oct 3, 2023

TST: In the future np.long will be defined as the corresponding NumPy scalar astropy/astropy#15419

Closed

mtsokol force-pushed the deperace-int_-and-uint branch 2 times, most recently from 8ff7807 to eb525b5 Compare October 6, 2023 09:39

mtsokol requested a review from rgommers October 6, 2023 10:06

mtsokol added 4 commits October 6, 2023 16:57

API: Deprecate np.int_ and np.uint

b439470

Include missing renames

98d9c3d

Add changelog file

e1461ad

Fix after rebase

ed36420

mtsokol force-pushed the deperace-int_-and-uint branch from eb525b5 to ed36420 Compare October 6, 2023 15:08

ngoldbaum approved these changes Oct 6, 2023

View reviewed changes

eric-wieser reviewed Oct 6, 2023

View reviewed changes

Apply review changes

0c156c9

seberg mentioned this pull request Oct 9, 2023

DISCUSS: What should the default integer type/dtype be #24890

Closed

mtsokol closed this Oct 13, 2023

mtsokol deleted the deperace-int_-and-uint branch October 13, 2023 08:01

This was referenced Oct 13, 2023

Revert np.int_ and np.uint changes pandas-dev/pandas#55505

Closed

Revert np.int_ and np.uint changes scipy/scipy#19382

Closed

		@@ -0,0 +1,3 @@
		* ``np.int_`` has been deprecated. Use ``np.long`` instead.

	* ``np.int_`` has been deprecated. Use ``np.long`` instead.
	* ``np.int_`` as the spelling of C's `long` has been deprecated. Use the new ``np.long`` instead.

Uh oh!

API: Deprecate np.int_ and np.uint #24794

API: Deprecate np.int_ and np.uint #24794

Uh oh!

Conversation

mtsokol commented Sep 25, 2023

Uh oh!

seberg commented Sep 25, 2023

Uh oh!

mtsokol commented Sep 25, 2023

Uh oh!

seberg commented Sep 25, 2023

Uh oh!

mtsokol commented Oct 6, 2023

Uh oh!

ngoldbaum left a comment

Choose a reason for hiding this comment

Uh oh!

eric-wieser commented Oct 6, 2023

Uh oh!

eric-wieser Oct 6, 2023

Choose a reason for hiding this comment

Uh oh!

mtsokol Oct 9, 2023

Choose a reason for hiding this comment

Uh oh!

eric-wieser Oct 6, 2023

Choose a reason for hiding this comment

Uh oh!

mtsokol Oct 9, 2023

Choose a reason for hiding this comment

Uh oh!

ngoldbaum commented Oct 6, 2023

Uh oh!

rgommers commented Oct 6, 2023

Uh oh!

ngoldbaum commented Oct 6, 2023

Uh oh!

eric-wieser commented Oct 6, 2023

Uh oh!

rgommers commented Oct 7, 2023

Uh oh!

seberg commented Oct 7, 2023

Uh oh!

mtsokol commented Oct 9, 2023

Uh oh!

seberg commented Oct 9, 2023

Uh oh!

ngoldbaum commented Oct 9, 2023

Uh oh!

eric-wieser commented Oct 9, 2023

Uh oh!

seberg commented Oct 9, 2023

Uh oh!

rgommers commented Oct 9, 2023

Uh oh!

rgommers commented Oct 9, 2023

Uh oh!

rkern commented Oct 9, 2023

Uh oh!

rgommers commented Oct 9, 2023

Uh oh!

seberg commented Oct 9, 2023

Uh oh!

ngoldbaum commented Oct 13, 2023

Uh oh!

seberg commented Oct 13, 2023

Uh oh!

mtsokol commented Oct 13, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

API: Deprecate `np.int_` and `np.uint` #24794

API: Deprecate `np.int_` and `np.uint` #24794