sys/ztimer: implement ztimer_mbox_get_timeout() and use it to fix race in gnrc_sock_recv() #21113

Merged
maribu merged 3 commits into RIOT-OS:master from maribu:sys/ztimer/ztimer_mbox_get_timeout on Jan 10, 2025

maribu (Member) commented Dec 31, 2024

Contribution description

This implements ztimer_mbox_get_timeout() and salvages the test app from #18977 with minor tweaking.

On top of ztimer_mbox_get_timeout(), the timeout of gnrc_sock_recv() is now implemented race-free.

Testing procedure

Run the provided test app. Optionally, also set ENABLE_DEBUG to 1 in sys/ztimer/utils.c to confirm that the test app indeed triggers the race in which a message is received just in time but the timeout is not cancelled in time.

Also do some testing with GNRC's SOCK implementation and proper timeout handling.

Issues/PRs references

Better alternative to #18977

This function fetches a message from an mbox, possibly blocking if the
mbox has no message - but with a specified timeout.
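
To make the semantics described in this commit message concrete, here is a minimal host-side analogy built on POSIX threads rather than on RIOT's mbox and ztimer. It is a sketch, not the code from this PR: toy_mbox_t, toy_mbox_try_put() and toy_mbox_get_timeout() are hypothetical names, and the return values (0, -EAGAIN, -ETIMEDOUT) are this sketch's own convention. The point it illustrates is that message arrival and timeout are evaluated under the same lock, so a message that lands just as the deadline passes is still delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical single-slot mailbox for a POSIX host; not RIOT's mbox_t. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    int has_msg;
    int msg;
} toy_mbox_t;

static void toy_mbox_init(toy_mbox_t *mb)
{
    int res = pthread_mutex_init(&mb->lock, NULL);
    assert(res == 0);
    res = pthread_cond_init(&mb->nonempty, NULL);
    assert(res == 0);
    (void)res;
    mb->has_msg = 0;
    mb->msg = 0;
}

/* Non-blocking put: 0 on success, -EAGAIN if the slot is occupied. */
static int toy_mbox_try_put(toy_mbox_t *mb, int msg)
{
    int res = -EAGAIN;
    pthread_mutex_lock(&mb->lock);
    if (!mb->has_msg) {
        mb->msg = msg;
        mb->has_msg = 1;
        pthread_cond_signal(&mb->nonempty);
        res = 0;
    }
    pthread_mutex_unlock(&mb->lock);
    return res;
}

/* Blocking get with timeout: 0 if a message was fetched, -ETIMEDOUT otherwise.
 * Any wait error is treated like a timeout in this sketch. */
static int toy_mbox_get_timeout(toy_mbox_t *mb, int *msg, uint32_t timeout_ms)
{
    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int res = 0;
    pthread_mutex_lock(&mb->lock);
    while (!mb->has_msg && res == 0) {
        res = pthread_cond_timedwait(&mb->nonempty, &mb->lock, &deadline);
    }
    /* Decide the outcome while still holding the lock: a message that
     * arrived at the last moment wins over the expired deadline. */
    if (mb->has_msg) {
        *msg = mb->msg;
        mb->has_msg = 0;
        res = 0;
    }
    else {
        res = -ETIMEDOUT;
    }
    pthread_mutex_unlock(&mb->lock);
    return res;
}
```

A caller would initialise the mailbox once with toy_mbox_init(), let a producer thread call toy_mbox_try_put(), and have the consumer call toy_mbox_get_timeout(&mb, &value, 100) and branch on the return value.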
maribu requested a review from benpicco on December 31, 2024 00:03
github-actions bot added the labels Area: network, Area: tests, Area: timers, Area: sys on Dec 31, 2024
maribu added the labels Type: bug, CI: ready for build, Process: needs backport and removed the labels Area: network, Area: tests, Area: timers, Area: sys on Dec 31, 2024
riot-ci commented Dec 31, 2024

Murdock results

✔️ PASSED

56ea5cd sys/net/gnrc_sock: fix race in gnrc_sock_recv()

Success: 10270, Failures: 0, Total: 10271, Runtime: 18m:35s

maribu force-pushed the sys/ztimer/ztimer_mbox_get_timeout branch from aff3a6a to 97e862c on December 31, 2024 10:23
github-actions bot added the labels Area: network, Area: tests, Area: timers, Area: sys on Dec 31, 2024
maribu force-pushed the sys/ztimer/ztimer_mbox_get_timeout branch from 97e862c to a1fd9e3 on January 10, 2025 13:17
maribu (Member, Author) commented Jan 10, 2025

Fixed a typo found by codespell and squashed

benpicco (Contributor) left a comment

That's a much cleaner solution than what has been there before.

maribu enabled auto-merge on January 10, 2025 14:46

benpicco (Contributor) commented

Uh, the provided test fails on CI:

main(): This is RIOT! (Version: buildtest)
Testing ztimer_mbox_get_timeout()
=================================
testing mbox already full prior call: OK
testing timeout is reached: OK
testing timeout is reached despite message received (race): OK
Running test for reception prior timeout 1000 times: main.c:96 => failed condition
*** RIOT kernel panic:
CONDITION FAILED.

*** halted.

Implement the timeout using ztimer_mbox_get_timeout() to fix a race
condition.
maribu force-pushed the sys/ztimer/ztimer_mbox_get_timeout branch from a1fd9e3 to 56ea5cd on January 10, 2025 15:19

maribu (Member, Author) commented Jan 10, 2025

Let's try again with even more relaxed timeout on native.

maribu added this pull request to the merge queue on Jan 10, 2025
Merged via the queue into RIOT-OS:master with commit ade999a on Jan 10, 2025

maribu (Member, Author) commented Jan 10, 2025

Thx!

maribu deleted the sys/ztimer/ztimer_mbox_get_timeout branch on January 10, 2025 20:23
maribu added a commit to maribu/RIOT that referenced this pull request Jan 10, 2025
This reverts commit e3d0068, which
added a workaround for two bugs:

- ztimer triggering too early (fixed in
  RIOT-OS#20924)
- gnrc_sock_recv() returning when an old "timeout" message is still
  in the message queue (fixed in
  RIOT-OS#21113)

With those bugs fixed, the workaround should no longer be needed.
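
The second bug in the list above, a stale "timeout" message left in the message queue, can be illustrated with a small single-threaded toy model. This is not the GNRC code: the toy_* names, the message types and the return codes are hypothetical and chosen only for the illustration. The first receiver treats the timeout as a message in the same queue as the data, the second tracks it out of band for the duration of one call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message types and result codes for this toy model. */
enum { TOY_MSG_DATA = 1, TOY_MSG_TIMEOUT = 2 };
enum { TOY_OK = 0, TOY_TIMEOUT = -1, TOY_EMPTY = -2 };

#define TOY_QUEUE_SIZE 4

/* Tiny FIFO that stores only the message type. */
typedef struct {
    int items[TOY_QUEUE_SIZE];
    size_t head;
    size_t count;
} toy_queue_t;

static bool toy_push(toy_queue_t *q, int type)
{
    if (q->count == TOY_QUEUE_SIZE) {
        return false;
    }
    q->items[(q->head + q->count) % TOY_QUEUE_SIZE] = type;
    q->count++;
    return true;
}

static bool toy_pop(toy_queue_t *q, int *type)
{
    assert(q != NULL && type != NULL);
    if (q->count == 0) {
        return false;
    }
    *type = q->items[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_SIZE;
    q->count--;
    return true;
}

/* Model A: the timer reports expiry by posting a TIMEOUT message into the
 * same queue that carries data; the receiver returns whatever comes first. */
static int toy_recv_shared_queue(toy_queue_t *q)
{
    int type;
    if (!toy_pop(q, &type)) {
        return TOY_EMPTY;
    }
    return (type == TOY_MSG_TIMEOUT) ? TOY_TIMEOUT : TOY_OK;
}

/* Model B: expiry is a per-call flag, so it cannot outlive the call. */
static int toy_recv_separate_timeout(toy_queue_t *q, bool timer_fired)
{
    int type;
    if (toy_pop(q, &type)) {
        return TOY_OK; /* data wins even if the timer fired as well */
    }
    return timer_fired ? TOY_TIMEOUT : TOY_EMPTY;
}

/* Result of the SECOND receive call under model A. */
static int toy_second_call_shared(void)
{
    toy_queue_t q = { 0 };
    toy_push(&q, TOY_MSG_DATA);       /* packet 1 arrives just in time */
    toy_push(&q, TOY_MSG_TIMEOUT);    /* timer 1 fires before being cancelled */
    (void)toy_recv_shared_queue(&q);  /* call 1 returns packet 1 */
    toy_push(&q, TOY_MSG_DATA);       /* packet 2 arrives well before timeout 2 */
    return toy_recv_shared_queue(&q); /* pops the stale TIMEOUT first */
}

/* Result of the SECOND receive call under model B. */
static int toy_second_call_separate(void)
{
    toy_queue_t q = { 0 };
    toy_push(&q, TOY_MSG_DATA);
    (void)toy_recv_separate_timeout(&q, true); /* late timer 1 is forgotten */
    toy_push(&q, TOY_MSG_DATA);
    return toy_recv_separate_timeout(&q, false);
}
```

In this model toy_second_call_shared() reports a timeout even though data is waiting in the queue, while toy_second_call_separate() returns the data.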
maribu (Member, Author) commented Jan 10, 2025

> Let's try again with even more relaxed timeout on native.

The test is flaky on native when built with LLVM 😢

MrKevinWeiss added this to the Release 2025.01 milestone on Jan 20, 2025
dprigoshij pushed a commit to dprigoshij/RIOT that referenced this pull request Mar 24, 2025

Labels

Area: network, Area: sys, Area: tests, Area: timers, CI: ready for build, Process: needs backport, Type: bug
