-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Implement bytecode callbacks with static, never-modified bytecode #13553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement bytecode callbacks with static, never-modified bytecode #13553
Conversation
ghost
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also makes the failures disappear and looks good to me.
|
This looks good! This approach does seem cleaner than either modifying bytecode or slicing up callbacks into 3-argument chunks, so I think this is a better fix than #13549. However, I think there's a bug in the current version: the old code, going through I think there's a simple fix: the interpreter should do a |
|
Stephen Dolan (2024/10/16 02:36 -0700):
I think there's a simple fix: the interpreter should do a `goto
check_stacks` right before entering the main loop, so that a stack
check occurs before any instructions are executed.
Won't that be too often then? I suppose many instructions do actually
not change the stack and it then feels a pity to me to do a check before
they are executed, leading to a performance loss. Or am I missing
something maybe?
|
|
I'm not suggesting a check inside the loop, before every instruction - that would indeed cause performance loss. I'm suggesting a single check, before the loop is entered, which would be once per invocation of |
|
Stephen Dolan (2024/10/16 05:55 -0700):
I'm not suggesting a check inside the loop, before every instruction -
that would indeed cause performance loss. I'm suggesting a single
check, _before_ the loop is entered, which would be once per
invocation of `caml_bytecode_interpreter`.
Ah okay, sorry I misunderstood. And you think such a check would be
enough, then?
|
On this subject, shouldn't |
Well spotted! See the latest commit on this PR.
Right now we rely on the "red zone" at the bottom of the stack to absorb the arguments. There's an assertion but it's not up to date (it assumes 256 words of red zone while it's only 32 words nowadays). Maybe it would be better to check and resize the stack in caml_callbackN_exn. Will look into it in a couple of days. |
|
Ah, I see, |
|
Yes, I pushed an alternative to @stedolan's suggestion, based on @NickBarnes' remark, where |
|
I added a test that triggers the stack resizing, and a Changes entry. This PR is ready for a final review. |
| domain_state->current_stack->sp -= narg + 4; | ||
| /* Ensure there's enough stack space */ | ||
| intnat req = narg + 3 + Stack_threshold_words; | ||
| if (domain_state->current_stack->sp - req < |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This almost calls for an internal version of caml_ensure_stack_capacity, but that can be done later if worth doing.
Support passing the initial values of env and extra_args as parameters to the bytecode interpretation function.
- Use the new `caml_bytecode_interpreter` API to jump straight to the function's code. - Avoid modifying bytecode in place. - Avoid per-thread bytecode. - Register bytecode early and only once. Fixes: ocaml#13402 Fixes: ocaml#13512 Closes: ocaml#13549
... before copying the arguments to the stack and calling the bytecode interpreter.
For bytecode, this tests stack resizing in caml_callbackN.
4df9b57 to
2a750b1
Compare
|
Cleaned up the history. Rebased to fix conflict. Knelt in front of check-typo. Time to merge! |
Implement bytecode callbacks with static, never-modified bytecode (cherry picked from commit c3092e7)
Implement bytecode callbacks with static, never-modified bytecode (cherry picked from commit c3092e7)
|
Cherry-picked to the 5.2 branch as 71d85e0 in prevision of the upcoming 5.2.1 release. |
This is an alternative to #13549. See this PR for the detailed description of the problem that needs to be fixed.
Like #13549, the present PR avoids modifying bytecode dynamically, and making it thread-local.
Unlike #13549, it preserves the ability to perform a callback on N arguments in one go, instead of having to cut it in slices of 1 to 3 arguments.