Race condition in "select" test #707

@jphickey

Describe the bug
Running the OSAL select test, I ran into a deadlock situation where the "multi" test got stuck and never finished.

To Reproduce
Intermittent. Run the test repeatedly on a system that is under other load (e.g. parallel builds).

Expected behavior
Test should complete

Code snips
Checking the test status and backtrace, it looks like two tasks (main and "Server_Fn") are both blocked in OS_BinSemTake. In particular, Server_Fn is stuck here:

status = OS_BinSemTake(bin_sem_id);

While the main task is waiting in the teardown code (the TestSelectMultipleRead has completed, and it has invoked Teardown_Multi which in turn invokes Teardown_Single here):

status = OS_BinSemTake(bin_sem_id2);

System observed on:
Ubuntu 20.04

Additional context
This is likely related to the use of OS_BinSemFlush. We should probably deprecate this function, as I cannot see how it can ever be used safely without introducing a race condition. VxWorks offers it, which (I think) is why OSAL also offers it, but it's a fundamentally broken concept.

Looking at the traceback in gdb, I can confirm that flush_count is already 1, meaning the flush had already happened by the time Server_Fn entered the binary semaphore take routine.

Reporter Info
Joseph Hickey, Vantage Systems, Inc.

Labels

bug
unit-test: Tickets related to the OSAL unit testing (functional and/or coverage)