Description
Describe the bug
Running the OSAL select test, I ran into a deadlock situation where the "multi" test got stuck and never finished.
To Reproduce
Intermittent. Run the test repeatedly on a system that is under other load (e.g. parallel builds).
Expected behavior
The test should complete.
Code snips
Based on the test status and backtrace, two tasks (main and "Server_Fn") are both blocked in OS_BinSemTake. In particular, Server_Fn is stuck here:
osal/src/tests/select-test/select-test.c, line 162 in d698a4d:
    status = OS_BinSemTake(bin_sem_id);
Meanwhile, the main task is waiting in the teardown code (TestSelectMultipleRead has completed, and Teardown_Multi has been invoked, which in turn invokes Teardown_Single here):
osal/src/tests/select-test/select-test.c, line 273 in d698a4d:
    status = OS_BinSemTake(bin_sem_id2);
System observed on:
Ubuntu 20.04
Additional context
This is likely related to the use of OS_BinSemFlush. We should probably deprecate this function, as I cannot see how it can ever be used safely without introducing a race condition. VxWorks offers it, which (I think) is why OSAL also offers it, but it is a fundamentally broken concept: a flush only releases tasks that are already waiting, so a task that reaches the take after the flush has happened is never released.
Looking at the traceback in gdb, I can confirm that flush_count is indeed already 1, meaning the flush had already happened by the time Server_Fn entered the bin sem take routine.
Reporter Info
Joseph Hickey, Vantage Systems, Inc.