-
-
Notifications
You must be signed in to change notification settings - Fork 12k
MAINT: Refactor of core/_type_aliases.py
#24411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
91d92ec to
1a495ab
Compare
|
That's a very useful analysis and clear table, thanks @mtsokol. I think what may be useful for added insight here is explicitly listing all the names for the first two rows. I'd name those "canonical names for C types" and "canonical names in Python". There will be some adjustments here already, e.g. the actual C name
|
|
Sure! I included such table and I filled it with values coming from
|
|
Sorry, maybe I should have suggested 3 columns. What I meant was the Python "C-like" and "Pythonic" names. The actual type in C could be useful too. So for this one for example: in Python code we'd prefer if people wrote The
|
|
I updated the 2nd table with three columns ("Canonical Python", "C-like python", "compatible C type") - I wanted to clarify some rows.
As
Sure! Moved to separate rows. |
I think I'd write it like this:
So for C types that vary in size are exposed in the Python API, we don't have a canonical sized alias, and there should be one preferred name only. So C int becomes |
I updated the table so there are no multiple options depending on architecture (thus |
|
Getting there! Still some more to do for integers I think. What I'd like to get to is for each row to be unique. C
|
My previous update accidentally didn't get saved - I updated the table one more time. It should finally have unique values in each cell. |
|
This looks quite good to me now, thanks Mateusz. Last thoughts for now:
|
1a495ab to
109b1aa
Compare
6eae269 to
bdcb9ac
Compare
|
Closing as #24651 has been opened to hide outdated comments. |
Hi @rgommers @seberg @ngoldbaum,
Here I'm sharing a proposition of
numpy/core/_type_aliases.pyfile refactor.While working on removing other scalar aliases I had some difficulties understanding how
allTypes/sctypeDictdictionaries are being created. For example, I noticed that thefloatalias is originally present insctypeDict, then it's removed and then added again.I decided to try to refactor this file to decompose the creation of these dictionaries into well-defined stages.
I think the purpose of this file is straightforward: Read types from the compiled
multiarraymodule, and expose them as Python dictionaries, while mapping C naming conventions to Python conventions (e.g. interpretfloatasdoubleinstead ofsingle).I identified which "group" of entries is present in each dictionary (could it be modified for NumPy 2.0?):
words(e.g. "float", "cdouble", "int_")words+bits(e.g. "int16", "complex64")symbols(e.g. "p", "L")symbols+bytes(e.g. "c8", "b1")abstracts(e.g. "inexact", "integer")numbers(e.g. 1, 2, 3)aliases(e.g. "complex_", "longfloat")extra/other aliases(e.g. "bool")Additionally, here's a decomposition of
wordsandwords+bitsinto three columns (/symbol means that it may vary depending on 32 or 64-bit arch):In the refactored file I build each of these groups to then join them into final dicts. I do it in a loop in
def _build_dicts(), where sizedint&uintaliases must take under consideration a "priority" of types - in case at least two of them are same bitsize.After that I rename and add aliases to the created groups. In the last stage I clean entries that aren't present in the original implementation (these are custom cases).
The last, and separate, stage creates
sctypeswhich I copied, as the implementation was clear to me.Important to mention, this modifies a crucial piece of code that could be tricky to debug (e.g. canonical int names are aliased incorrectly on some architecture (I tested it locally only on macbook with Intel)).
I renamed the old implementation to
_expired_type_aliases.pyand added a testtest_new_vs_expired_type_aliasesthat checks if dictionaries created by the new implementation are exactly the same as done by the previous code. This should flag in the CI if the new implementation fails to mimic the previous one on a specific architecture. (After some time, the original impl could be completely removed)Please share your feedback!