Skip to content

Conversation

@iritkatriel
Copy link
Owner

@iritkatriel iritkatriel commented Aug 24, 2021

TODO:

  1. Add the tuples
  2. PREDICT stuff in ceval.c
  3. fix test_modulefinder
  4. replace LOAD_ASSERTION_ERROR
  5. Review the tests to see if any are now missing

Copy link

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that looks cool! Maybe the prediction machinery should be directed here too?

@iritkatriel
Copy link
Owner Author

the prediction machinery

The who?

@gvanrossum
Copy link

gvanrossum commented Aug 24, 2021 via email

@iritkatriel
Copy link
Owner Author

ah yes, things like this:
```
ADDOP(c, GET_YIELD_FROM_ITER);
ADDOP_LOAD_CONST(c, Py_None);


and then `GET_YIELD_FROM_ITER` predicts LOAD_CONST.. 

@gvanrossum
Copy link

gvanrossum commented Aug 24, 2021 via email

@iritkatriel
Copy link
Owner Author

Yes, I'll check each case. In the case I quoted the constant is None so it's a common const for sure.

@iritkatriel
Copy link
Owner Author

I have some issues with when/where I allocated the common consts data - refleaks tests show I allocated it more than once (and if I try not to then subinterpreters don't have it). I'll have to rethink where this actually sits (I put it next to the small_ints but allocate in another place because I need the AttributeError exception etc to be ready).

I need to go soon so will continue tomorrow.

@gvanrossum
Copy link

Did you make any progress on the leak? Where you put the initialization looks fine to me (but I'm no expert on that part of the interpreter).

@iritkatriel
Copy link
Owner Author

I'm just looking at the leak. I'm moving the dict to the interpreter state alongside the array, so that they have the same lifetime. And I'm putting the alloc.dealloc alongside PyLong_Init/Fini. (I'm aiming to make this alloc/dealloc like small_ints).

That said, I'm not sure why the small_ints array is allocated on the interpreter state. Wouldn't one systemwide copy be enough?

@gvanrossum
Copy link

That said, I'm not sure why the small_ints array is allocated on the interpreter state. Wouldn't one systemwide copy be enough?

Presumably in preparation for the future where each interpreter has its own set of standard objects (Py_None, exceptions, types, etc.) so they can each have their own GIL. This is presumably @ericsnowcurrently's work (with Victor).

@iritkatriel
Copy link
Owner Author

I made a todo list at the top of this page.

I wanted to check that AssertionError is really a constant. The c one is (but the python one is not):

>>> AssertionError = TypeError
>>> AssertionError
<class 'TypeError'>
>>> AssertionError(14)
TypeError(14)
>>> assert(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> 

@iritkatriel
Copy link
Owner Author

This is what I broke in modulefinder:

if (op == IMPORT_NAME and i >= 2

@gvanrossum
Copy link

This is what I broke in modulefinder:

if (op == IMPORT_NAME and i >= 2

I wonder what that code is doing that requires scanning the bytecode. It seems a weird tactic.

@iritkatriel
Copy link
Owner Author

It's looking for import statements in the code, this thing is loading "all modules used by a script".

I'm just thinking all this stuff should be encapsulated in dis, nobody should be munching on opcodes like this.

@gvanrossum
Copy link

Right, file a bpo about that.

@iritkatriel
Copy link
Owner Author

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdickinson This function implements obj --> key where typically key is (type(obj), obj) but for floats I also add a boolean sign.

The purpose of this PR is to reduce the size of code objects by moving common consts (see faster-cpython/ideas#65 (comment)) to a shared list from which they are looked up with LOAD_COMMON_CONSTS, instead of with LOAD_CONST from co_consts.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiran - this is where I applied the copysign function you suggested.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this works well with NaNs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iritkatriel This looks good to me for floats.

Do we need to handle complex constants, too? 0.0 + 0.0j should be treated as a different constant from 0.0 - 0.0j or -0.0 - 0.0j or -0.0 + 0.0j.

@tiran True, NaNs would be messy for this scheme. My initial thought was that there should be no way that you can get a NaN as a code constant, but of course that's not true:

>>> def f():
...     a = 1e300 * 1e300
...     b = (1e300 * 1e300) / (1e300 * 1e300)
...     c = -1e300 * 1e300
... 
>>> f.__code__.co_consts
(None, inf, nan, -inf)

But presumably this would "work" the same way as it currently does for code-object-specific constant tables: NaNs get repeated, but infinities don't. @iritkatriel I don't have wider context here - what are the criteria for deciding that something is a "common" constant? I assume that it would have to occur at least twice, and in that case NaNs would just never make it into the common constant table.

So I think this should "just work" for NaNs, but it's worth some tests to check that NaNs are never shared.

Copy link

@mdickinson mdickinson Aug 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle complex constants, too?

For reference, here's the related part of codeobject.c: https://github.com/python/cpython/blob/d3eaf0cc5b311ad023fd13e367f817d528403306/Objects/codeobject.c#L1816-L1840

It seems as though there ought to be an opportunity for code re-use here, so that we don't have essentially the same logic in multiple places. But I imagine that's out of scope for this PR.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I can just use _PyCode_ConstantKey instead of adding a new obj_to_key. Right?

Copy link

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments specifically on the obj_to_key part; I haven't looked at the rest of the PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick-level comment: I'd suggest using Py_True and Py_False the other way around: Py_False for positive and Py_True for negative, just for consistency - it's how other things that deal with the sign bit tend to work (e.g. Decimal.is_signed, or C's signbit macro, or even the raw bit representation of the float).

Talking of signbit, that might also work in place of the copysign test: signbit(v) ? Py_True : Py_False. I suspect we didn't use it in the original code because it wasn't introduced until C99, and we weren't able to depend on C99 features at the time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iritkatriel This looks good to me for floats.

Do we need to handle complex constants, too? 0.0 + 0.0j should be treated as a different constant from 0.0 - 0.0j or -0.0 - 0.0j or -0.0 + 0.0j.

@tiran True, NaNs would be messy for this scheme. My initial thought was that there should be no way that you can get a NaN as a code constant, but of course that's not true:

>>> def f():
...     a = 1e300 * 1e300
...     b = (1e300 * 1e300) / (1e300 * 1e300)
...     c = -1e300 * 1e300
... 
>>> f.__code__.co_consts
(None, inf, nan, -inf)

But presumably this would "work" the same way as it currently does for code-object-specific constant tables: NaNs get repeated, but infinities don't. @iritkatriel I don't have wider context here - what are the criteria for deciding that something is a "common" constant? I assume that it would have to occur at least twice, and in that case NaNs would just never make it into the common constant table.

So I think this should "just work" for NaNs, but it's worth some tests to check that NaNs are never shared.

@iritkatriel
Copy link
Owner Author

I don't have wider context here - what are the criteria for deciding that something is a "common" constant?

It's a hard-coded list based on what constants appear the most in code objects in PyPl packages. Scroll down to _Py_InitCommonConsts to see where it is populated.

Thank you both for the comments - at the moment we don't have NaNs and complex numbers in the hard coded list, so I think it's ok if the lookup just always fails for them (as it would now, right?).

Perhaps I should add a check in add_common_float which raises an exception if you try to add NaN, Inf or complex number as a common constant, because this is currently not supported (and we don't think it is needed).

@mdickinson
Copy link

Thanks for the context!

so I think it's ok if the lookup just always fails for them (as it would now, right?)

Yep, sounds good to me.

@iritkatriel
Copy link
Owner Author

I don't know why it keeps saying this:

2021-08-27T13:10:22.9433879Z Generated files not up to date. Perhaps you forgot to run make regen-all or build.bat --regen ;)
2021-08-27T13:10:22.9435119Z  M Include/opcode.h
2021-08-27T13:10:22.9436000Z  M Python/opcode_targets.h
2021-08-27T13:10:22.9446511Z ##[error]Process completed with exit code 1.

Neither make regen-all nor build.bat --regen make any difference.

@iritkatriel iritkatriel force-pushed the experimentB-commonconst branch from ee4d361 to 743f2f9 Compare August 27, 2021 16:02
@iritkatriel
Copy link
Owner Author

I don't know why it keeps saying this:

2021-08-27T13:10:22.9433879Z Generated files not up to date. Perhaps you forgot to run make regen-all or build.bat --regen ;)
2021-08-27T13:10:22.9435119Z  M Include/opcode.h
2021-08-27T13:10:22.9436000Z  M Python/opcode_targets.h
2021-08-27T13:10:22.9446511Z ##[error]Process completed with exit code 1.

Neither make regen-all nor build.bat --regen make any difference.

I think it's to do with the fact that this PR is against my main branch (rather than cpython's). The build log shows that it compares the source and target branches of the PR, and those were out of sync.

@markshannon
Copy link

What's the justification for adding the seemingly arbitrary selection of tuples and strings rather than just a few simple constants?
Could we just start with the simple constants, and add the others later?

Why the procedural approach to building the table of constants, instead of declaring PyObject *common_consts[] = { ...?

@iritkatriel
Copy link
Owner Author

What's the justification for adding the seemingly arbitrary selection of tuples and strings rather than just a few simple constants?

This is the list Guido extracted of the common constants in PyPl:
faster-cpython/ideas#65 (comment)

Why the procedural approach to building the table of constants, instead of declaring PyObject *common_consts[] = { ...?

Error checking.

@markshannon
Copy link

Why the procedural approach to building the table of constants, instead of declaring PyObject *common_consts[] = { ...?

Error checking.

I don't understand. There can be no runtime errors in a declaration, only in executable code.

@iritkatriel
Copy link
Owner Author

Why the procedural approach to building the table of constants, instead of declaring PyObject *common_consts[] = { ...?

Error checking.

I don't understand. There can be no runtime errors in a declaration, only in executable code.

Are PyLong_FromLong() and PyFloat_FromFloat() not executable code?

@markshannon
Copy link

Are PyLong_FromLong() and PyFloat_FromFloat() not executable code?

Yes, they are. So don't use them 🙂

PyObject *common_consts[] = { Py_None, Py_Ellipsis, PyExc_AssertionError, ... };

You could add a few float constants, PyFloat_Zero, PyFloat_One.
Integers are probably best handled with their own instruction, e.g. LOAD_SMALL_INT.

@gvanrossum
Copy link

PyObject *common_consts[] = { Py_None, Py_Ellipsis, PyExc_AssertionError, ... };

Note that, as a static declaration, the Windows C compiler doesn’t accept that. (C++ does.)

@markshannon
Copy link

PyObject *common_consts[] = { Py_None, Py_Ellipsis, PyExc_AssertionError, ... };

Note that, as a static declaration, the Windows C compiler doesn’t accept that. (C++ does.)

Are you sure about that?
How does this compile?

@gvanrossum
Copy link

How does this compile?

Hm, it does compile. I am sure I saw a problem with references to type objects in C files (that I made go away by compiling as C++), but I must have misunderstood its root cause.

Or maybe we compile with some flag that made the error go away. I have a simple example where I initialize a struct member with &_Py_NoneStruct (i.e. the expansion of Py_None), and VS Code highlights the & character and tells me "expression must have a constant value C/C++(28)", but it compiles. So now I'm officially confused.

Possibly the root of my confusion is the common idiom of initializing type objects in extension modules using PyVarObject_HEAD_INIT(NULL, 0) or PyObject_HEAD_INIT(NULL), and then setting the actual type using PyType_Ready(). I somehow thought that was due to MSVC, but it may be due to the DLL linker or possibly needed for shared libraries in general.

Anyway, just ignore me, that's always been the best option.

@iritkatriel iritkatriel deleted the experimentB-commonconst branch July 25, 2023 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants