sock_async_event: fix race-condition#13438
Conversation
If a new event is fired during the execution of the event callback, `event->type` might change. However as `event->type` is set to 0 after the execution of the callback, that leads to it being 0 on the next round of the event handler's execution, leading in the event getting lost.
|
Backport for 2020.01 probably not needed as no support for TCP for |
gschorcht
left a comment
There was a problem hiding this comment.
I cannot reproduce the race conditions. Therefore I have set it to 'request changes' to avoid accidental merging.
I don't understand your reasoning for blocking this then... Even if you can't reproduce, isn't it obvious that a race condition can occur and that this PR is a fix for that? Why should a merge be bad, even if accidentally? |
|
(can't reproduce it with #13427 either btw, most likely because my latest fixes hardened the backend itself against the race condition. Nevertheless, it still can occur and should be fixed.) |
Maybe, I missed something but I can't reproduce the race condition with PR #13427, neither with Without this PR I can observe following behavior when I leave I can reconnect afterwards. With this PR on top of PR #13427, I observer the following: But, at least I can reconnect afterwards. |
As said in #13438 (comment), the behavior seems to be more strange with this PR than before. |
This should have been fixed with 24ce2ba. Are you sure you used the most recent version? |
|
So... to prevent the garbage output and still have the race condition you need to revert 54069b1... |
OK, that's my fault, the last commit in PR #13427 was 54069b1. I will check again. |
|
Here is what I got:
|
|
(edited comment as I copy-pasted something wrongly 😅) |
|
Note the version strings that should make clear that I use the version I am actually claiming to use. Is this justification enough to have a bug fix for my own code merged 😉? |
|
I can confirm that #13427 resetted to aaa644b gives an error and I can't reconnect: I can further confirm that this PR on top of aaa644b gives some garbage but I can reconnect: I can also confirm that this PR on top of the last commit of PR #13427 works as expected. |
|
BTW, I saw this |
gschorcht
left a comment
There was a problem hiding this comment.
The change makes sense. The problem desribed in #13438 (comment) seems not to be related to this PR. ACK
|
I am pretty sure this PR has nothing to do with it or in the worst case just reveals a problem with #13427. From a code perspective this PR should be ready to go. |
Yes, but would you try to reproduce the behavior described in #13438 (comment)? |
Already was able to :-). It is definitely something being wrong with #13427 (the |
Contribution description
If a new event is fired during the execution of the event callback,
event->typemight change. However asevent->typeis set to 0 after the execution of the callback, that leads to it being 0 on the next round of the event handler's execution, leading in the event getting lost.Testing procedure
make -C tests/gnrc_sock_async_event flash testshould still work with a board of your choice.tests/lwipwith lwip: provide sock_async support #13427 merged and start a TCP server withtcp server start 1337netcat(e.g. nc fe80::0200:dead:beef:affe%tapbr0 1337 when usingnative)Ctrl+Cinnetcatis registered and printed to stdout. You can reconnect to the TCP server by starting a newnetcatprocess.Ctrl+Cinnetcatis not registered. Reconnecting to the TCP server is not possible (accept will error with an-ENOMEM).Issues/PRs references
Discovered in #13427.