xtimer: Fix race condition in xtimer_msg_receive_timeout #16374
kaspar030 merged 1 commit into RIOT-OS:master
This issue seems to be related to #13504.
I can confirm that with this PR, the case from #13345 (comment) no longer crashes.
The description also makes total sense.
kaspar030 left a comment:
ACK.
@MichelRottleuthner do you want to give this a second look?
(ztimer also removes the timer in any case.)
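To make the "remove in any case" point concrete, here is a minimal host-side sketch contrasting the two control-flow shapes. It assumes simplified stand-in types: fake_msg_t, fake_timer_t, FAKE_MSG_TIMEOUT and fake_timer_remove are hypothetical and are not RIOT's msg_t / xtimer API, and the functions only approximate the shape of the wait logic described in this PR, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT RIOT's msg_t / xtimer_t. */
#define FAKE_MSG_TIMEOUT 0x1      /* hypothetical type tag for timer messages */

typedef struct {
    int type;
    const void *ptr;              /* for timeouts: address of the timeout message */
} fake_msg_t;

typedef struct {
    bool linked;                  /* true while the timer is in the timer list */
} fake_timer_t;

static void fake_timer_remove(fake_timer_t *t)
{
    assert(t != 0);
    t->linked = false;            /* removing an unlinked timer is a harmless no-op */
}

/* Pre-fix shape: the timer is removed only when the message does not look
 * like our own timeout, so a stale timeout that matches 'tmsg' by address
 * leaves the timer linked. */
static int wait_before_fix(const fake_msg_t *m, const void *tmsg, fake_timer_t *t)
{
    if (m->type == FAKE_MSG_TIMEOUT && m->ptr == tmsg) {
        return -1;
    }
    fake_timer_remove(t);
    return 1;
}

/* Post-fix shape: always remove the timer, then classify the message. */
static int wait_after_fix(const fake_msg_t *m, const void *tmsg, fake_timer_t *t)
{
    fake_timer_remove(t);
    return (m->type == FAKE_MSG_TIMEOUT && m->ptr == tmsg) ? -1 : 1;
}
```

Given a stale timeout message whose ptr matches tmsg, both functions return -1, but only wait_after_fix unlinks the timer. The fixed shape still reports a spurious timeout for the stale message; it only guarantees that no timer is left linked.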
I now re-read the discussion in #13345 and I'm not sure whether there's still a race when the timeout triggers between … So I'd say let's go with this fix already (it was first discussed more than a year ago...).
The unconditional remove makes sense to me.
Yes, AIUI the unpleasant queuing of the timer message could still happen. But I also agree with your reasoning that a copied message is preferable to a pointer to invalid stack memory...
Yep.
Backport provided in #16376.
Contribution description
This PR fixes a rare race condition in xtimer_msg_receive_timeout which can lead to corruption of the timer list and subsequent hard faults. The race condition is triggered when:
1. A message is sent to the thread using msg_send.
2. The xtimer which sends the timeout message expires and executes before xtimer_remove(t) is called. This causes the thread's message queue to contain first the real message and second the timeout message. The timer is no longer queued, but the timeout message is still in the message queue.
3. The xtimer_msg_receive_timeout function is called again. This queues a new xtimer while the timeout message of the previous timer is still in the message queue.
4. _msg_wait sees the old timeout message, concludes that the current xtimer has already expired and does not remove the timer (the stale message can match because the new call's timeout message typically occupies the same stack address as the previous one). When xtimer_msg_receive_timeout returns, the timer is still queued, but since it is allocated on the stack it is no longer valid. timer_list_head now points to invalid memory, and crashes ensue.
Testing procedure
This bug was found, validated, and fixed using a proprietary application. I have not written a separate example application exhibiting the problem that I could publish.
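Since no reproducer was published, the following is only a host-side toy model of the four steps in the contribution description, not a test of RIOT itself. It assumes a single-threaded timer list and message queue; mock_timer_t, mock_msg_t, frame, receive_timeout and simulate are hypothetical names. The static frame stands in for a stack slot that consecutive calls reuse at the same address, and the race is injected by firing the timer between receiving the message and removing the timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; NOT the real RIOT xtimer/msg implementation. */
typedef struct mock_timer {
    struct mock_timer *next;
    const void *tmsg;              /* address of the timeout message to send */
} mock_timer_t;

typedef struct {
    bool is_timeout;               /* true if sent by a timer */
    const void *ptr;               /* for timeouts: address of the sender's tmsg */
} mock_msg_t;

#define QUEUE_SIZE 4

static mock_timer_t *timer_list_head;
static mock_msg_t queue[QUEUE_SIZE];
static size_t q_head, q_len;

/* Stands in for one stack frame that consecutive calls reuse at the same address. */
static struct {
    mock_timer_t t;
    int tmsg;
} frame;

static void queue_push(mock_msg_t m)
{
    assert(q_len < QUEUE_SIZE);
    queue[(q_head + q_len++) % QUEUE_SIZE] = m;
}

static mock_msg_t queue_pop(void)
{
    assert(q_len > 0);             /* the toy model never blocks */
    mock_msg_t m = queue[q_head];
    q_head = (q_head + 1) % QUEUE_SIZE;
    q_len--;
    return m;
}

static void timer_add(mock_timer_t *t)
{
    t->next = timer_list_head;
    timer_list_head = t;
}

static void timer_remove(mock_timer_t *t)
{
    for (mock_timer_t **p = &timer_list_head; *p != NULL; p = &(*p)->next) {
        if (*p == t) {
            *p = t->next;
            return;
        }
    }
}

/* Timer expiry: unlink the timer, then queue its timeout message. */
static void timer_fire(mock_timer_t *t)
{
    timer_remove(t);
    queue_push((mock_msg_t){ .is_timeout = true, .ptr = t->tmsg });
}

/* One receive-with-timeout call; 'race' fires the timer before the remove. */
static int receive_timeout(bool fixed, bool race)
{
    frame.t.tmsg = &frame.tmsg;    /* same address on every call */
    timer_add(&frame.t);
    mock_msg_t m = queue_pop();
    if (race) {
        timer_fire(&frame.t);      /* step 2: expiry before the remove */
    }
    bool own_timeout = m.is_timeout && m.ptr == &frame.tmsg;
    if (fixed || !own_timeout) {
        timer_remove(&frame.t);    /* the fixed variant removes unconditionally */
    }
    return own_timeout ? -1 : 1;
}

/* Returns true if the "stack" timer is still linked after the second call. */
static bool simulate(bool fixed)
{
    timer_list_head = NULL;
    q_head = q_len = 0;
    queue_push((mock_msg_t){ .is_timeout = false, .ptr = NULL }); /* step 1 */
    receive_timeout(fixed, true);  /* leaves a stale timeout queued */
    receive_timeout(fixed, false); /* steps 3-4: consumes the stale timeout */
    return timer_list_head != NULL;
}
```

In this model simulate(false) returns true: the timer in the reused frame is still linked after the second call returns, which in real code would leave timer_list_head pointing into a dead stack frame. simulate(true) returns false, although the second call still reports a spurious timeout (-1), in line with the review remark that the queuing of the timer message can still happen.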