Skip to content

Conversation

@xavierleroy
Copy link
Contributor

This is an alternative to #13549. See this PR for the detailed description of the problem that needs to be fixed.

Like #13549, the present PR avoids modifying bytecode dynamically, and making it thread-local.

Unlike #13549, it preserves the ability to perform a callback on N arguments in one go, instead of having to cut it in slices of 1 to 3 arguments.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also makes the failures disappear and looks good to me.

@stedolan
Copy link
Contributor

This looks good! This approach does seem cleaner than either modifying bytecode or slicing up callbacks into 3-argument chunks, so I think this is a better fix than #13549.

However, I think there's a bug in the current version: the old code, going through Instruct(APPLY), used to do a goto check_stacks before executing the code from the callback. I think this stack check is important, as I think it's possible to reach caml_callback without much available stack (there's no stack check before Instruct(C_CALLn), so you can enter C code having already consumed part of your stack).

I think there's a simple fix: the interpreter should do a goto check_stacks right before entering the main loop, so that a stack check occurs before any instructions are executed.

@shindere
Copy link
Contributor

shindere commented Oct 16, 2024 via email

@stedolan
Copy link
Contributor

I'm not suggesting a check inside the loop, before every instruction - that would indeed cause performance loss. I'm suggesting a single check, before the loop is entered, which would be once per invocation of caml_bytecode_interpreter.

@dra27 dra27 linked an issue Oct 16, 2024 that may be closed by this pull request
@shindere
Copy link
Contributor

shindere commented Oct 16, 2024 via email

@NickBarnes
Copy link
Contributor

I think this stack check is important, as I think it's possible to reach caml_callback without much available stack (there's no stack check before Instruct(C_CALLn), so you can enter C code having already consumed part of your stack).

On this subject, shouldn't caml_callbackN_exn do a stack limit check before subtracting narg + 4 from sp? Maybe I'm missing something obvious about the bytecode stacks.

@xavierleroy
Copy link
Contributor Author

xavierleroy commented Oct 16, 2024

the interpreter should do a goto check_stacks right before entering the main loop

Well spotted! See the latest commit on this PR.

shouldn't caml_callbackN_exn do a stack limit check before subtracting narg + 4 from sp

Right now we rely on the "red zone" at the bottom of the stack to absorb the arguments. There's an assertion but it's not up to date (it assumes 256 words of red zone while it's only 32 words nowadays). Maybe it would be better to check and resize the stack in caml_callbackN_exn. Will look into it in a couple of days.

@NickBarnes
Copy link
Contributor

Ah, I see, Stack_threshold_words ?

@xavierleroy
Copy link
Contributor Author

Yes, Stack_threshold_words.

I pushed an alternative to @stedolan's suggestion, based on @NickBarnes' remark, where caml_callbackN checks for stack space and reallocates the stack if needed. I doubt the test suite exercises this code, though. I'll try to come up with a test later this week.

xavierleroy added a commit to xavierleroy/ocaml that referenced this pull request Oct 18, 2024
@xavierleroy xavierleroy added this to the 5.3 milestone Oct 18, 2024
@xavierleroy
Copy link
Contributor Author

I added a test that triggers the stack resizing, and a Changes entry. This PR is ready for a final review.

domain_state->current_stack->sp -= narg + 4;
/* Ensure there's enough stack space */
intnat req = narg + 3 + Stack_threshold_words;
if (domain_state->current_stack->sp - req <
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This almost calls for an internal version of caml_ensure_stack_capacity, but that can be done later if worth doing.

@jmid
Copy link
Member

jmid commented Oct 22, 2024

After a good round of testing this afternoon I can confirm that this fixes both #13512 and #13402.
Thank you both!

Support passing the initial values of env and extra_args as parameters
to the bytecode interpretation function.
- Use the new `caml_bytecode_interpreter` API to jump straight to the
  function's code.
- Avoid modifying bytecode in place.
- Avoid per-thread bytecode.
- Register bytecode early and only once.

Fixes:  ocaml#13402
Fixes:  ocaml#13512
Closes: ocaml#13549
... before copying the arguments to the stack and calling the bytecode
interpreter.
For bytecode, this tests stack resizing in caml_callbackN.
@xavierleroy xavierleroy force-pushed the tweak-bytecode-callbacks branch from 4df9b57 to 2a750b1 Compare October 23, 2024 09:16
@xavierleroy
Copy link
Contributor Author

Cleaned up the history. Rebased to fix conflict. Knelt in front of check-typo. Time to merge!

@xavierleroy xavierleroy merged commit c3092e7 into ocaml:trunk Oct 23, 2024
xavierleroy added a commit that referenced this pull request Oct 23, 2024
Implement bytecode callbacks with static, never-modified bytecode

(cherry picked from commit c3092e7)
Octachron pushed a commit that referenced this pull request Oct 23, 2024
Implement bytecode callbacks with static, never-modified bytecode

(cherry picked from commit c3092e7)
@Octachron
Copy link
Member

Cherry-picked to the 5.2 branch as 71d85e0 in prevision of the upcoming 5.2.1 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GC crashes under bytecode

6 participants