
Conversation

@SwayamInSync commented Sep 30, 2025

closes #27231

Summary

This pull request removes the legacy MachAr class and related files from NumPy, streamlining the codebase and updating the internal dtype API to support querying numerical constants directly from dtypes. The changes modernize how machine limits and constants are handled, moving away from Python-side introspection to a more robust and extensible C API.

Removal of legacy Python code:

  • Removed the entire numpy/_core/_machar.py file, which contained the deprecated MachAr class for floating-point machine parameter introspection.
  • Removed the associated type stub file numpy/_core/_machar.pyi and its references from the build configuration, fully eliminating Python-side support for MachAr.
  • Cleaned up imports in numpy/_core/__init__.py to remove _machar, reflecting the removal of the module.
  • Removed initialization of legacy limits in numpy/__init__.py as they are no longer needed with the new approach.

Enhancements to dtype API for numerical constants:

  • Added new constant IDs (e.g., NPY_CONSTANT_zero, NPY_CONSTANT_one, NPY_CONSTANT_maximum_finite, etc.) and a new PyArrayDTypeMeta_GetConstant function to the dtype API, enabling direct querying of numerical constants from dtypes in C.
  • Updated internal offset definitions and macros in numpy/_core/include/numpy/dtype_api.h to accommodate the new constants and API changes.

Miscellaneous codebase improvements:

  • Added #include <float.h> in numpy/_core/src/multiarray/arraytypes.c.src to ensure access to floating-point limits in C.
  • Included the updated numpy/dtype_api.h header in arraytypes.c.src to support the new constant querying API.
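
For orientation, the user-facing finfo surface is unchanged by this refactor; the constants are simply sourced from C now. A minimal sketch using standard numpy (none of this is new API):

import numpy as np

fi = np.finfo(np.float64)
# These values are now filled in directly from the dtype's C implementation
# instead of being derived by Python-side MachAr introspection.
print(fi.eps)                 # 2.220446049250313e-16
print(fi.max)                 # 1.7976931348623157e+308
print(fi.smallest_normal)     # 2.2250738585072014e-308
print(fi.smallest_subnormal)  # 5e-324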

seberg and others added 6 commits September 22, 2025 09:18
@seberg left a comment


Thanks, a few small comments. It seems like at least one run fails around querying the subnormal information from the macros.
Not sure what to do about that, I suppose either hard-code or maybe use nextafter for them.
(if we do that, it might be nice to see if we can find out whether subnormals work in that step, no idea what nextafter does. OTOH, the typical thing is FTZ mode, which I suspect still allows manual creation with nextafter... so it's really a user problem and we would need an ftz attribute on finfo rather than changing what smallest_subnormal gives.)
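
For illustration only, a quick numpy sketch of the nextafter idea and a crude subnormal probe; np.nextafter is standard numpy, but reading a zero result as FTZ is an assumption:

import numpy as np

# Stepping from 0 toward 1 lands on the smallest subnormal of the type.
tiny_sub = np.nextafter(np.float32(0), np.float32(1))
print(tiny_sub)                   # 1e-45 on IEEE hardware
# If this compared equal to zero, it would hint that subnormals are
# being flushed (FTZ) in the current environment.
print(tiny_sub == np.float32(0))  # False when subnormals work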

return NULL;
}
PyObject *finfo = PyTuple_GetItem(args, 0);
if (finfo == NULL || finfo == Py_None) {

Can't be NULL, but doesn't matter. I might not bother, it'll just say "can't set attributes of NoneType" without this, but happy either way.

#define NPY_CONSTANT_finfo_nmant 13
#define NPY_CONSTANT_finfo_min_exp 14
#define NPY_CONSTANT_finfo_max_exp 15
#define NPY_CONSTANT_finfo_decimal_digits 16

Would be nice to see opinions about the choice of constant here, and also the constants derived in Python.
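
For reference, the Python-visible finfo fields these IDs correspond to; values shown are for float64 on a typical IEEE platform:

import numpy as np

fi = np.finfo(np.float64)
print(fi.nmant)      # 52    (explicit mantissa bits)
print(fi.minexp)     # -1022
print(fi.maxexp)     # 1024
print(fi.precision)  # 15    (decimal digits)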

@SwayamInSync commented Sep 30, 2025

Thanks, a few small comments. It seems like at least one run fails around querying the subnormal information from the macros. […]

I tried modifying the template to use nextafter when the macro is not defined, and only hardcoded the 16-bit case (it was coming out as 0)

edit: wait, maybe I can do better, let me fix all the other error cases first

@SwayamInSync commented Sep 30, 2025

Any reason why, for the double-double format, tiny was set to NaN instead of DBL_MIN?

float_dd_ma = MachArLike(ld,
                         machep=-105,
                         negep=-106,
                         minexp=-1022,
                         maxexp=1024,
                         it=105,
                         ...
                         eps=exp2(ld(-105)),  # This is 2^-105!
                         epsneg=exp2(ld(-106)),
                         huge=nextafter(ld(inf), ld(0)),
                         tiny=nan,  # Set to NaN for double-double!
                         smallest_subnormal=nextafter(0., 1.))
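
For context, a small self-contained check of why eps is 2^-105 here: 1 + 2^-105 is exactly representable as the pair hi=1.0, lo=2^-105. Fraction is used only to do the arithmetic exactly:

from fractions import Fraction

# A double-double value is the exact, unevaluated sum of two float64s hi + lo.
hi, lo = 1.0, 2.0 ** -105
exact = Fraction(hi) + Fraction(lo)        # exact rational value of the pair
assert exact == 1 + Fraction(1, 2 ** 105)  # so the gap above 1.0 is 2**-105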

@seberg commented Sep 30, 2025

Any reason why, for the double-double format, tiny was set to NaN

IIRC it was some weird/wrong value, then things were changed and nobody wanted to figure out the right value, so it was set to NaN because that still seemed better than being nonsense.
I think the difficulty is that the tiny definition is a bit trickier, because the precision of double-double changes over the full range.
Anyway, have a brief look at gh-19511 and then set it to something reasonable if you find a value (I would love to use the one from the headers, but dunno whether even those are reliable...).

@SwayamInSync commented Sep 30, 2025

I would love to use the one from the headers, but dunno whether even those are reliable

I think tiny can be taken from the headers (the underflow checks pass), but the huge value will need to be what machar set above: nextafter(ld(inf), ld(0))

Let me confirm this and post here whatever suits best

@SwayamInSync
Only overriding EPSILON to 2^-105 and MAX_FINITE to npy_nextafterl(NPY_INFINITY, 0.0L) seems to work for native ppc64le

@SwayamInSync commented Sep 30, 2025

For this particular issue with the array_repr test:

def test_array_repr():
    o = 1 + LD_INFO.eps
    a = np.array([o])
    b = np.array([1], dtype=np.longdouble)
    if not np.all(a != b):
        raise ValueError("precision loss creating arrays")
    assert_(repr(a) != repr(b))

On platforms where long double == double (float64), this test previously passed only because of the difference in dtype names and the default print precision of 8:

"precision": 8, # precision of floating point representations

Old behaviour on Windows (before the refactor), resulting in the following two arrays:

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '2.3.3'

In [3]: LD_INFO = np.finfo(np.longdouble)

In [4]: o = 1 + LD_INFO.eps

In [5]: a = np.array([o])

In [6]: b = np.array([1], dtype=np.longdouble)

In [7]: a, b
Out[7]: (array([1.]), array([1.], dtype=float64))

As you can see, the only difference is the dtype annotation, which makes repr(a) != repr(b), so the assert (and thus the test) passes.

With the new behaviour:

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '2.4.0.dev0+git20250930.d8508a2'

In [3]: LD_INFO = np.finfo(np.longdouble)

In [4]: o = 1 + LD_INFO.eps

In [5]: a = np.array([o])

In [6]: b = np.array([1], dtype=np.longdouble)

In [7]: a, b
Out[7]: (array([1.], dtype=float64), array([1.], dtype=float64))

Since both now get dispatched from the same dtype, the reprs are identical and the test fails. I am not sure whether testing this was the intention behind the test. Anyway, the hack to retain the original behaviour is as follows,

inside getlimits.py

# On platforms where longdouble is the same size as double (e.g., Windows),
# use the double descriptor to populate constants for backward compatibility.
# The old MachArLike code would match the float64 signature on such platforms
# and return float64 scalars.
if (self.dtype.type == ntypes.longdouble and
        self.dtype.itemsize == numeric.dtype(ntypes.double).itemsize):
    populate_dtype = numeric.dtype(ntypes.double)
else:
    populate_dtype = self.dtype
# Fills in all constants defined directly on the dtype (in C)
_populate_finfo_constants(self, populate_dtype)
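
As a sketch of the intended effect (expected on such platforms, not verified here):

import numpy as np

# On a platform where longdouble is really float64 (e.g. Windows/MSVC),
# the branch above populates the constants from the double descriptor,
# so finfo(longdouble) keeps handing out float64 scalars as before:
fi = np.finfo(np.longdouble)
print(type(fi.eps))  # expected: <class 'numpy.float64'> on such platforms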

The override above selects float64, which per dtype_is_implied (quoted below) is an implied dtype and so need not be spelled out explicitly in array representations:

_typelessdata = [int_, float64, complex128, _nt.bool]

def dtype_is_implied(dtype):
    """
    Determine if the given dtype is implied by the representation
    of its values.

    Parameters
    ----------
    dtype : dtype
        Data type

    Returns
    -------
    implied : bool
        True if the dtype is implied by the representation of its values.

    Examples
    --------
    >>> import numpy as np
    >>> np._core.arrayprint.dtype_is_implied(int)
    True
    >>> np.array([1, 2, 3], int)
    array([1, 2, 3])
    >>> np._core.arrayprint.dtype_is_implied(np.int8)
    False
    >>> np.array([1, 2, 3], np.int8)
    array([1, 2, 3], dtype=int8)
    """
    dtype = np.dtype(dtype)
    if format_options.get()['legacy'] <= 113 and dtype.type == np.bool:
        return False
    # not just void types can be structured, and names are not part of the repr
    if dtype.names is not None:
        return False
    # should care about endianness *unless size is 1* (e.g., int8, bool)
    if not dtype.isnative:
        return False
    return dtype.type in _typelessdata
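
A quick illustration of the implied-dtype behaviour this relies on:

import numpy as np

# float64 is in _typelessdata, so its repr omits the dtype; float32 is not.
print(repr(np.array([1.0])))                    # array([1.])
print(repr(np.array([1.0], dtype=np.float32)))  # array([1.], dtype=float32)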

@seberg what do you suggest? Keep the current way (which retains the past behaviour), or change the array-print behaviour by increasing the precision for the dtype actually used and ignoring the dtype name? (Not sure how useful that is; maybe it can be done in a different PR.)

@SwayamInSync
The QEMU/loongarch64 Docker image is not even building, so I am leaving that out for now. This PR is ready for review.

@seberg left a comment


Just leaving as comments. I like this, how happy are you with the approach?

Did you have a look at the derived values I had already created? The other question is whether we should rescue some of the old parts (much reduced) and slam them into tests?

Need to have another pass, but wanted to give these comments. But also, I think it's basically done.
(With this change, I assume there may be weird platform fallouts eventually, but that is something to deal with when it happens.)

#define NPY_CONSTANT_maximum_finite 4
#define NPY_CONSTANT_minimum_finite 5
#define NPY_CONSTANT_inf 6
#define NPY_CONSTANT_ninf 7

Could this use a clearer name, e.g. neg_inf?


Sounds good, I know I added it, but I guess we could even remove it and define it in Python as -inf. I think we removed np.ninf after all. (But I don't mind either way).


Might as well still change this I guess.

@SwayamInSync

@seberg I need an opinion here

From the macros: FLT_MANT_DIG=24 includes the implicit leading bit in the precision (IEEE specifies 23 explicit fraction bits but an effective 24-bit significand);
FLT_MIN_EXP=-125 reflects C's radix-agnostic definition, where the exponent is offset by +1 compared to IEEE's unbiased minimum of -126 for normalized numbers; similarly, FLT_MAX_EXP=128 offsets IEEE's maximum unbiased exponent of +127, marking the overflow threshold.
With MachAr these values were hardcoded, so in some places they follow IEEE (nmant, minexp) and in others the C macros (maxexp).
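
For a concrete view of the off-by-one, a minimal sketch using the current finfo values for float32:

import numpy as np

fi = np.finfo(np.float32)
# C macros for float: FLT_MANT_DIG=24, FLT_MIN_EXP=-125, FLT_MAX_EXP=128
print(fi.nmant)   # 23   == FLT_MANT_DIG - 1 (IEEE explicit fraction bits)
print(fi.minexp)  # -126 == FLT_MIN_EXP - 1  (IEEE unbiased minimum)
print(fi.maxexp)  # 128  == FLT_MAX_EXP      (kept as the C value)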

So do you prefer the complete original behaviour, or should we take whatever value comes from the header as finfo and adjust the derived values accordingly, with proper documentation around this?

@SwayamInSync
Also, the values machar originally used make more sense, at least for nmant, and sticking to them won't break any user's code: if someone were doing nmant- or minexp-dependent calculations, they would suddenly see float32 report 24 instead of 23 mantissa bits after upgrading past the refactor.
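
A sketch of the kind of user code that would break if nmant switched to the C convention; the 2**-nmant identity holds only with the IEEE count of 23:

import numpy as np

fi = np.finfo(np.float32)
# Hypothetical user code relying on nmant being the IEEE fraction-bit
# count (23); eps equals 2**-nmant only under that convention.
assert 2.0 ** -fi.nmant == fi.eps  # 2**-23 == 1.1920929e-07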

@seberg commented Oct 3, 2025

So do you prefer the complete original behaviour, or should we take whatever value comes from the header as finfo and adjust the derived values accordingly, with proper documentation around this?

Ufff, that is annoying that these definitions are so subtly different! For the Python side, we clearly can't change the values.

For nmant, I could see calling it mant_dig instead and using the C definition on the C side. However, for min_exp/max_exp that sleight of hand doesn't work as well, unless we name it _c_max_exp, but...
So I think I lean towards using the identical finfo_... naming and documenting that these unfortunately do not match the C definition (at least for the new C docs).

(I would like to still tag the integer constants, but if you like I can push that also.)


/*
Definition: minimum negative integer such that FLT_RADIX raised to a power one less than that integer is a normalized float, double, and long double, respectively.
reference: https://en.cppreference.com/w/c/types/limits.html

I left these comments for developers so that they can see why we are subtracting 1.
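
A quick numeric check of that subtraction for float64, where FLT_RADIX is 2 and DBL_MIN_EXP is -1021:

import numpy as np

# The smallest normalized float64 is FLT_RADIX ** (DBL_MIN_EXP - 1),
# i.e. 2 ** -1022.
assert 2.0 ** -1022 == np.finfo(np.float64).smallest_normal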

from numpy import exp2, log10


class MachArLike:

This file mimics the old machar behaviour for testing.


Kept in a separate file since machar is not going to be around, so as not to mix this with the other tests.

@SwayamInSync left a comment

These changes ensure the values remain as they originally were, and we can document in the new finfo docs what the C macros return and where we tweak them for backwards compatibility.

This way the user will be aware of the behaviour, and anyone who needs the true C macro value can simply add +1 to the respective fields.

@SwayamInSync
@seberg if this looks good enough, I can proceed with the documentation

@SwayamInSync
Hmm... the test failures don't look like they're coming from the finfo refactor

@seberg merged commit f001a70 into numpy:main Oct 5, 2025 (79 checks passed)
@seberg commented Oct 5, 2025

@SwayamInSync decided to just put it in. I feel this should be reasonable both from a general API perspective and from the changes to finfo.

(Although I expect there may be some follow-ups with finfo; it's always trickier than you expect.)

I think it's just as well to include whatever release notes you think are helpful in a separate PR to add docs (and at that point -- or earlier -- others will also get another chance to review the API choices here).

@SwayamInSync
Thanks @seberg, will do the doc PR ASAP

bwhitt7 pushed a commit to bwhitt7/numpy that referenced this pull request Oct 7, 2025
Introduce `NPY_DT_get_constant` as a slot and a corresponding function that fills in a single element pointer for an arbitrary (corresponding) dtype/descriptor.
Since we do want some integer values for `finfo`, some values are set to always fill in an `npy_intp` value instead.

The slot assumes that it will be used with the GIL held and on uninitialized data, since I suspect that this is the typical use-case (i.e. it is unlikely that we need to fetch a constant deep during a calculation, especially in a context where the correct type/value isn't known anyway).

This completely re-organizes and simplifies `finfo` by moving all definitions to C.

Co-authored-by: Sebastian Berg <[email protected]>
seberg pushed a commit that referenced this pull request Oct 13, 2025
…29836 (#29889)

This is a follow-up PR to the work done in gh-29836

* The deprecated MachAr runtime discovery mechanism has been removed.
* np.finfo fetches the constants provided by the compiler macros
* new slot to fetch the dtype related constants
@jorenham added this to the 2.4.0 release milestone Oct 14, 2025
mattip added a commit that referenced this pull request Oct 15, 2025
IndifferentArea pushed two commits to IndifferentArea/numpy that referenced this pull request Dec 7, 2025

Development

Successfully merging this pull request may close these issues:

ENH: np.finfo support for Numpy User DTypes