-
Notifications
You must be signed in to change notification settings - Fork 676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix atomic.wait, get wasi_ctx exit code and thread mgr issues #2024
Conversation
Noticed that every run of test |
Okay I missed that
What's the advantage? |
Ran each internal test 100 times compiled with classic_interpreter with this change under TSAN and filtered warnings about “LOAD/STORE” and all other data races seems to be gone. Great job! |
@@ -60,6 +57,11 @@ main(int argc, char **argv) | |||
assert(count != NULL && "Failed to call calloc"); | |||
assert(pthread_mutex_init(&mutex, NULL) == 0 && "Failed to init mutex"); | |||
|
|||
for (int i = 0; i < NUM_THREADS; i++) { | |||
vals[i] = malloc(sizeof(int *)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vals[i] = malloc(sizeof(int *)); | |
vals[i] = malloc(sizeof(int)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this part of the test was to try heap allocation from the spawned thread, that's why I was putting it in __wasi_thread_start_C
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that malloc
function is not only called here, but also called in __wasi_thread_spawn when allocating aux stack:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/libraries/thread-mgr/thread_manager.c#L147
Two threads may call malloc simultaneously.
If the case isn't modified, out of bounds memory access
exception is often thrown.
Would a race condition be possible between
I was updating the tests to avoid using pthread sync primitives (since they seem to cause all the load/store warnings in the sanitizer when in classic interpreter mode) and I noticed those. But I didn't finish validating the changes, I'll push a PR once I'm done and if necessary we can address possible issues separately. |
Letting the thread positively checking the suspend flags should be better than passively waiting, for the latter, there may be several potential issues: |
I don't think there will be, for the former, the exec_env_tls is the parent thread's exec_env, but here the |
That was a thing that me and @eloparco were concerned about, but it was sorted out in #2016. Let's say there is such a situation:
There are only 3 possibilities, because both critical sections for main and 3rd thread are under the same lock:
So a newly created thread won't wait in any case and the situation is not possible actually. In testing 100+ times it never hanged at all after the fix |
Yeah it would be bad. I suppose it's the only place where we now allocate memory after setting flag, but before notifying. Another way to avoid it would be allocating it in advance. Looks reasonable for me to merge then, but I would consider return to using |
OK, sounds reasonable, I was just concerned about that. It is good if this is not an issue. Thanks for the detailed explanation. |
Yes, but we cannot know the count of atomic waiting threads in advance, and don't know how much memory should be allocated previously. And even if we can, the memory allocation failure may occur also. My personal opinion is to use the new idea, one is to keep the same strategy with what we handled in poll_oneoff in libc-wasi: |
Okay, wait a bit before merge. @hritikgupta suppose he found some problems, I will try to reproduce |
At least next tests get stuck on this PR if run enough iterations
I checked with Fortunately these tests stuck the same way on the Also Below I attached the backtrace of the main thread from
|
@hritikgupta thanks for spotting the problem! If anyone would like to investigate the problem I suggest you do it without this change because it would reproduce way faster |
OK, thanks, let's merge this PR first and then investigate the reported issue later. |
Roughly checked the issue, all threads got stuck in |
Yes, underneath Actually, I was seeing the same problem yesterday in #2028 even when implementing |
@eloparco @g0djan @hritikgupta I think I found the root cause of hang, it is due to that the opcodes generated by wasi-libc's a_store are not atomic operations. I submitted a PR to wasi-libc to fix it: Now it seems the hang didn't occur again. |
…dealliance#2024) - Remove notify_stale_threads_on_exception and change atomic.wait to be interruptible by keep waiting and checking every one second, like the implementation of poll_oneoff in libc-wasi - Wait all other threads exit and then get wasi exit_code to avoid getting invalid value - Inherit suspend_flags of parent thread while creating new thread to avoid terminated flag isn't set for new thread - Fix wasi-threads test case update_shared_data_and_alloc_heap - Add "Lib wasi-threads enabled" prompt for cmake - Fix aot get exception, use aot_copy_exception instead
to be interruptible by keep waiting and checking every one second,
like the implementation of poll_oneoff in libc-wasi
getting invalid value
avoid terminated flag isn't set for new thread