Skip to content

Conversation

@mtsokol
Copy link
Member

@mtsokol mtsokol commented Sep 25, 2023

Addresses #24743.

This PR deprecates np.int_ with np.uint, and restores np.long with np.ulong.

@seberg
Copy link
Member

seberg commented Sep 25, 2023

Hmmm, I am a bit worried how this will turn out with gh-24224 (although I missed the typing changes there). intp is a reasonable replacement after that for many of the long changes here, but may also be awkward sometimes?

@mtsokol
Copy link
Member Author

mtsokol commented Sep 25, 2023

Hmmm, I am a bit worried how this will turn out with gh-24224 (although I missed the typing changes there). intp is a reasonable replacement after that for many of the long changes here, but may also be awkward sometimes?

@seberg In this PR I just renamed int_ to long and uint to ulong, so a potential change long -> intp can be treated as int_ -> intp change, in my opinion.

@seberg
Copy link
Member

seberg commented Sep 25, 2023

True, just going to have to touch a lot of this twice, made me wonder if it really is so bad to have np.int_ TBH, but I can't form a strong enough opinion to really push fo rthat.

@mtsokol
Copy link
Member Author

mtsokol commented Oct 6, 2023

All downstream libs PRs have been merged.

@mtsokol mtsokol force-pushed the deperace-int_-and-uint branch from eb525b5 to ed36420 Compare October 6, 2023 15:08
Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! In the interest of avoiding breaking downstream CI on a Friday I'd like to merge this early next week.

@eric-wieser
Copy link
Member

This seems pretty disruptive to me; when deprecating something in numpy, usually there is a replacement that already exists; but in this case, np.long has only just been added (as far as I can tell). This leaves downstream projects with no way to work on old and new numpy versions simultaneously.

Maybe that's fine here due to this targeting 2.0, and so the normal approach of PendingDeprecationWarning / an intermediate release with the replacement but no warning isn't appropriate?

A mypy_ plugin is now available for automatically assigning the (platform-dependent)
precisions of certain `~numpy.number` subclasses, including the likes of
`~numpy.int_`, `~numpy.intp` and `~numpy.longlong`. See the documentation on
`~numpy.long`, `~numpy.intp` and `~numpy.longlong`. See the documentation on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is misleading, as in 1.21 np.long meant something different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I added this change by accident. Reverted!

@@ -0,0 +1,3 @@
* ``np.int_`` has been deprecated. Use ``np.long`` instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* ``np.int_`` has been deprecated. Use ``np.long`` instead.
* ``np.int_`` as the spelling of C's `long` has been deprecated. Use the new ``np.long`` instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, changed!

@ngoldbaum
Copy link
Member

This leaves downstream projects with no way to work on old and new numpy versions simultaneously.

I had envisioned something like:

try:
    np_long = np.long
except AttributeError
    np_long = np.int_

Or equivalently something that switches on numpy version.

I'd be fine with leaving behind an np.int_ and np.uint_ that emits a PendingDeprecationWarning in 2.0 and DeprecationWarning in 2.1 if you feel like that would help but with this NEP 52 stuff we've been trying to rip off bandaids when possible so we don't delay pain unnecessarily.

@rgommers
Copy link
Member

rgommers commented Oct 6, 2023

The other relevant argument is that 8 or 9 out of 10 times, the use of np.int_ in downstream code is a mistake. It should typically be int64. Also, dtype=np.int_ can be replaced with dtype=int is one really wants to avoid a try-except. Overall, there isn't really significant disruption here I'd say.

@ngoldbaum
Copy link
Member

I missed earlier - this needs an update to the migration guide I think.

In the vein of Ralf's comment, the release note, migration guide, and maybe the error message should probably indicate that another possible replacement is to use the python int builtin or equivalently in NumPy 2.0 (not the main branch yet), int64.

@eric-wieser
Copy link
Member

Also, dtype=np.int_ can be replaced with dtype=int is one really wants to avoid a try-except

I don't think this is a good idea if we're planning to change the meaning of int to mean int64 in future instead of C long. I think this might require a fairly detailed explanation in the migration guide regarding whether the intent is just "be an integer" (9/10) or "be a C long" (1/10).

@rgommers
Copy link
Member

rgommers commented Oct 7, 2023

I don't think this is a good idea if we're planning to change the meaning of int to mean int64 in future instead of C long.

Good point, I agree. I didn't consider that we'd want to do this in the future, but it does seem like a logical change to make.

@seberg
Copy link
Member

seberg commented Oct 7, 2023

The only way I can think of it is that np.int_ refers to the default integer which happens to be a C-long, but as far as I am aware there is agreement that we should try to make it an intp/ssize_t at the very least to fix windows 64 bit (unless someone wants to make it a 64bit always).

So that doesn't actually fix the problem because even if you keep np.int_ as the default integer, it isn't the same as long on windows 64bit anymore.

Now dtype=int then isn't the right workaround when you need a long. It is a decent spelling for "default integer", though. Fortunately, in most cases dtype="long" always worked as does np.dtype("long") so the try/except is hopefully less pressing.

I don't care too much about what happens with np.int_ because users don't normally pass things to C unchecked so the problem is really only the Cython int_t type which has no reasonable meaning. If a user starts using 64bit rather than long, it might break something, but at least it should be loud about it (and frankly, there is a lot of software just assuming default int is 64bit and while that software may be fine with ignoring 32bit platforms, they simply shrug off the fact that they don't work on windows).

@mtsokol
Copy link
Member Author

mtsokol commented Oct 9, 2023

Just FYI, this comment (scipy/scipy#19310 (comment)) shows the source where np.dtype(int) is interpreted as C-Long right now.

@seberg
Copy link
Member

seberg commented Oct 9, 2023

My appetite for removing np.int_ is really not increasing. What Eric said above (and I mentioned a few times), is that in the majority of cases replacing np.int_ with long is counterproductive. Because what you mean is "give me an integer" which should be the default integer which is only coincidentally long and we are very actively talking about changing that.
(Which is frankly far more important than this change, unless we decide against it.)

Yes, there are a few places where np.int_ does represent long. But that is mainly in C "heavy" modules like SciPy and even there it is true only so much of the time.

So the point remains: If this nudges even a fraction of users to using long rather than np.dtype(int).type then I am not sure there is much gain? The advantage of removing np.int_ seems mainly to force reviewing every use of it, but the likely error path here is that windows users of some cython/C extensions will get errors because the C/Cython side needs to be fixed for our change of what the default integer is. (Which is annoying, but there will also be many such will be less broken by the change.)

@ngoldbaum
Copy link
Member

In that case would you be ok with restoring np.float_ and np.complex_ to keep the API consistent?

@eric-wieser
Copy link
Member

Yes, there are a few places where np.int_ does represent long. But that is mainly in C "heavy" modules like SciPy and even there it is true only so much of the time.

Whether we remove np.int_ or not, I think we should certainly add np.long and np.ulong to the public API so that we are free to eventually point int_ at something else.

@seberg
Copy link
Member

seberg commented Oct 9, 2023

In that case would you be ok with restoring np.float_ and np.complex_ to keep the API consistent?

I don't really care either way (so yes, I am OK with it). Those don't have the confusion about what is the default.

I mainly dislike that it seems to become quite hard to spell "default integer" and that seems to lead to confusion where we replace np.int_ with np.long when the code should be using np.dtype(int).type() or np.intp once we change that. But either of those seems rather confusing to me!
There could be an argument that int_ is a terrible name default integer and that we should just introduce a new one.

@rgommers
Copy link
Member

rgommers commented Oct 9, 2023

Because what you mean is "give me an integer" which should be the default integer

It depends on what that means, but as a concept this is not well-defined in NumPy, while in the Array API standard it is (and it should be int64 rather than anything platform-dependent): https://data-apis.org/array-api/latest/API_specification/data_types.html#default-data-types. So let's not try to use that exact term for anything but that. Same for float_ -> should be float64.

Even on 32-bit systems, the "default integer" should be int64 , that works just fine. It may be slower for some things, but prevent overflows for numerical code (and since 32-bit is often untested in CI, writing code that may overflow on 32-bit systems happens quite a bit). In general, platform-dependent "defaults" are bad. If you know you want a platform-specific thing, you should make that choice explicitly by using long (a C-like name) or some other name that makes it clear that that is what you intended.

The way I see it, we should recommend users to use the following:

  • I'm writing generic code for typical numpy operations -> default to using np.int64 (also named the "default integer dtype")
  • I know these are small numbers and really want to save some memory -> use np.int32
  • I'm using something for indexing -> use np.intp
  • I on purpose want to use a platform-dependent type here matching C's long (or long long) -> use a "C like type" like np.long / np.longlong

In that case would you be ok with restoring np.float_ and np.complex_ to keep the API consistent?

These are not useful, so let's please not restore them. They're only counterproductive.

@rgommers
Copy link
Member

rgommers commented Oct 9, 2023

We can perhaps postpone removal of int_ for a few releases if it's really clear that removing it now is too annoying (that's a toss-up I'd say), but we should aim to remove it and guide users better - because they're using it now only because we had np.int and then deprecated them and pointed them at np.int_. Neither name/dtype was a good idea.

@rkern
Copy link
Member

rkern commented Oct 9, 2023

It depends on what that means, but as a concept this is not well-defined in NumPy,

Sure it is! We've talked about Numeric/numarray/numpy having default integer and floating point types since the beginning. Essentially, the default integer type is whatever dtype np.array([1]) turns out to be. Up until now that has been defined to be whatever dtype maps to a signed C long on the platform (following Python 1 and 2's definition of a default integer type). We're looking to change that definition now (wisely), but "default integer dtype" is a concept that exists and has an existing definition to change.

@rgommers
Copy link
Member

rgommers commented Oct 9, 2023

but "default integer dtype" is a concept that exists and has an existing definition to change.

You're right, thanks for the correction - it exists indeed, it just isn't very clearly documented (but that's true for other things as well).

@seberg
Copy link
Member

seberg commented Oct 9, 2023

Let's split out the "what should the default be" to gh-24890. I thought since nobody has really disagreed said something else the many times I have said "intp seems the pragmatic choice to do something here", I thought that was consensus.
And I even thought we had consensus that this was a 2.0 candidate (yes, I also dislike that np.long doesn't exist on post versions for easier backcompat).

@ngoldbaum
Copy link
Member

We ended up landing at the last community meeting that the most sensible thing might be to close this PR, leave int_ and uint alone, and live with the inconsistency of not having float_ and cfloat.

In principle we could replace int_ with intp but that's not actually any clearer of a name (p? pointer? what does that have to do with anything - all I care about is ints...). Granted the underscore isn't great either, but int_ has the virtue of existing already.

I think reintroducing np.long is uncontroversial, @mtsokol are you ok with closing this and making a followup just doing that?

@mtsokol mtsokol closed this Oct 13, 2023
@seberg
Copy link
Member

seberg commented Oct 13, 2023

Hmmm, yeah, it would probably be good to split out adding long (and maybe tests) from my PR to help move it along.
I will probably try to do that today unless you have time to take a shot @mtsokol (then I we can push that forward quickly)? That makes the rest easier there, changing the default is pretty simple there but the random changes are trickier so reducing it a bit is nice.

@mtsokol mtsokol deleted the deperace-int_-and-uint branch October 13, 2023 08:01
@mtsokol
Copy link
Member Author

mtsokol commented Oct 13, 2023

@seberg Sure! I will make a separate np.long PR out of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants