crypto: thread: serialize concurrent joins#19433
crypto: thread: serialize concurrent joins#19433ckalina wants to merge 2 commits intoopenssl:masterfrom
Conversation
Multiple concurrent joins with a running thread suffer from a race condition that allows concurrent join calls to perform concurrent arch specific join calls, which is UB on POSIX, or to concurrently execute join and terminate calls. As soon as a thread T1 exists, one of the threads that joins with T1 is selected to perform the join, the remaining ones await completion. Once completed, the remaining calls immediately return. If the join failed, another thread is selected to attempt the join operation. Forcefully terminating a thread that is in the process of joining another thread is not supported. Common code from thread_posix and thread_win was refactored to use common wrapper that handles synchronization. Signed-off-by: Čestmír Kalina <[email protected]>
|
Unfortunately this did not seem to fix the MacOSX hang. |
While POSIX threads are cancellable and may be asynchronously cancelled, their cancellation is not guaranteed by the POSIX standard. test_thread_noreturn, which simulates a long-running possibly unresponsive thread: THREAD openssl#1 THREAD openssl#2 LOCK L1 SPAWN openssl#2 LOCK L1 On MacOS, cancelling such thread only queues cancellation request, but the following pthread_join hangs. Replace this implementation by an unbounded sequence of sleeps instead. Signed-off-by: Čestmír Kalina <[email protected]>
|
The MacOS hang was indeed independent of this and should now hopefully be fixed. While POSIX threads are explicitly set cancellable and may be asynchronously cancelled, their cancellation is not guaranteed by the POSIX standard. test_thread_noreturn, which simulates a long-running possibly unresponsive thread: On MacOS, cancelling such thread only queues cancellation request, but the following pthread_join hangs. Replace this implementation by an unbounded sequence of sleeps instead. One second sleeps are chosen, but anything smaller goes too. I'd prefer not to use larger values, though, to ensure that it's woken up often enough. |
|
This is urgent. The buildbot macosx failure is unrelated. |
|
Agree urgent |
|
I'll wait for the windows buildbot builds to finish and merge afterwards. |
|
For some reason on Windows the test still hangs when running on buildbot. I'll merge this anyway to unblock at least the macosx builds and we'll continue in another PR. |
|
Merged to master branch. Thank you. Assuming a PR fixing the Windows buildbot hang will follow. |
Multiple concurrent joins with a running thread suffer from a race condition that allows concurrent join calls to perform concurrent arch specific join calls, which is UB on POSIX, or to concurrently execute join and terminate calls. As soon as a thread T1 exists, one of the threads that joins with T1 is selected to perform the join, the remaining ones await completion. Once completed, the remaining calls immediately return. If the join failed, another thread is selected to attempt the join operation. Forcefully terminating a thread that is in the process of joining another thread is not supported. Common code from thread_posix and thread_win was refactored to use common wrapper that handles synchronization. Signed-off-by: Čestmír Kalina <[email protected]> Reviewed-by: Hugo Landau <[email protected]> Reviewed-by: Tomas Mraz <[email protected]> (Merged from #19433)
While POSIX threads are cancellable and may be asynchronously cancelled, their cancellation is not guaranteed by the POSIX standard. test_thread_noreturn, which simulates a long-running possibly unresponsive thread: THREAD #1 THREAD #2 LOCK L1 SPAWN #2 LOCK L1 On MacOS, cancelling such thread only queues cancellation request, but the following pthread_join hangs. Replace this implementation by an unbounded sequence of sleeps instead. Signed-off-by: Čestmír Kalina <[email protected]> Reviewed-by: Hugo Landau <[email protected]> Reviewed-by: Tomas Mraz <[email protected]> (Merged from #19433)
|
I confirm that this PR fixed the problem on MacOS. |
Multiple concurrent joins with a running thread suffer from a race condition that allows concurrent join calls to perform concurrent arch specific join calls, which is UB on POSIX, or to concurrently execute join and terminate calls. As soon as a thread T1 exists, one of the threads that joins with T1 is selected to perform the join, the remaining ones await completion. Once completed, the remaining calls immediately return. If the join failed, another thread is selected to attempt the join operation. Forcefully terminating a thread that is in the process of joining another thread is not supported. Common code from thread_posix and thread_win was refactored to use common wrapper that handles synchronization. Signed-off-by: Čestmír Kalina <[email protected]> Reviewed-by: Hugo Landau <[email protected]> Reviewed-by: Tomas Mraz <[email protected]> (Merged from openssl#19433)
While POSIX threads are cancellable and may be asynchronously cancelled, their cancellation is not guaranteed by the POSIX standard. test_thread_noreturn, which simulates a long-running possibly unresponsive thread: THREAD #1 THREAD #2 LOCK L1 SPAWN #2 LOCK L1 On MacOS, cancelling such thread only queues cancellation request, but the following pthread_join hangs. Replace this implementation by an unbounded sequence of sleeps instead. Signed-off-by: Čestmír Kalina <[email protected]> Reviewed-by: Hugo Landau <[email protected]> Reviewed-by: Tomas Mraz <[email protected]> (Merged from openssl#19433)
Multiple concurrent joins with a running thread suffer from a race condition that allows concurrent join calls to perform concurrent arch specific join calls, which is UB on POSIX, or to concurrently execute join and terminate calls. As soon as a thread T1 exists, one of the threads that joins with T1 is selected to perform the join, the remaining ones await completion. Once completed, the remaining calls immediately return. If the join failed, another thread is selected to attempt the join operation. Forcefully terminating a thread that is in the process of joining another thread is not supported. Common code from thread_posix and thread_win was refactored to use common wrapper that handles synchronization. Signed-off-by: Čestmír Kalina <[email protected]> Reviewed-by: Hugo Landau <[email protected]> Reviewed-by: Tomas Mraz <[email protected]> (Merged from openssl#19433)
While POSIX threads are cancellable and may be asynchronously cancelled, their cancellation is not guaranteed by the POSIX standard. test_thread_noreturn, which simulates a long-running possibly unresponsive thread: THREAD openssl#1 THREAD openssl#2 LOCK L1 SPAWN openssl#2 LOCK L1 On MacOS, cancelling such thread only queues cancellation request, but the following pthread_join hangs. Replace this implementation by an unbounded sequence of sleeps instead. Signed-off-by: Čestmír Kalina <[email protected]> Reviewed-by: Hugo Landau <[email protected]> Reviewed-by: Tomas Mraz <[email protected]> (Merged from openssl#19433)
Multiple concurrent joins with a running thread suffer from a race condition that allows concurrent join calls to perform concurrent arch specific join calls, which is UB on POSIX, or to concurrently execute join and terminate calls.
As soon as a thread T1 exists, one of the threads that joins with T1 is selected to perform the join, the remaining ones await completion. Once completed, the remaining calls immediately return. If the join failed, another thread is selected to attempt the join operation.
Forcefully terminating a thread that is in the process of joining another thread is not supported.
Common code from thread_posix and thread_win was refactored to use common wrapper that handles synchronization.
This should fix the timeout encountered while executing threads test (likely due to test_thread_native_multiple_joins).
Checklist