gnrc_sixlowpan_frag: release packet rbuf_add error cases #10680
gschorcht merged 2 commits into RIOT-OS:master
Conversation
Otherwise, there will be leaks ;-).
Found another fix for that function (unrelated to the reassembly buffer's state, but still an error case in that function), so I fixed the title.
This fixes the indefinitely overfilled buffer for me. Pings still fail very frequently, but it recovers cleanly when the buffer runs full. Edit: And as I speak, with only one of my two boards still pinging, it runs into it again. `pktbuf`:
So I guess there must still be a missing free somewhere. It's interesting that those packets are all larger than the MTU.
Even together with #10679?
Are you sure? The chunks are not packets; there can be multiple of them. The packet buffer just "sees" the empty spaces between the allocated space, so the chunks are the inverse of that. Here is the beginning of the illustration I was preparing: green are packet snips marking 6Lo frames, cyan are packet snips marking
The test was done with this and #10679 applied.
Did you wait until no pinging is happening anymore and give the GC time to clean up? Sorry for asking, but I really can't reproduce this anymore, so this seems to be something ESP-NOW specific, if anything :-/.
Yes, when this happened, one of the two boards was already done sending pings entirely. The one where the error showed up had those things in the buffer for way longer than 3 seconds. Even after waiting for several minutes, `pktbuf` still looked like that, and any attempt to ping throws the usual error.
|
A first test provided some strange results. The test environment was two ESP32 nodes, one pinging, the other only answering. I have added the following. At the moment, I'm only able to describe what I can observe:
In addition to my comment #10680 (comment), here is an example trace on the pinging node that shows the order of errors quite well. After the test, I again had two hanging 6Lo fragments.
In addition to #10680 (comment): I could also observe one hanging 6Lo fragment on the answering node.
I have investigated the answering node side a bit more.
This is how it looks on the answering node on a successful ping (six fragments received successfully and six fragments sent successfully). That is, on each reception of a fragment,
But when the other node is still pinging, the packets you see might just be packets from the other node still in transit. I'm also seeing those. But when the other node is also done, both packet buffers are empty.
I have observed the same after the test had been finished for at least 10 seconds.
Even more strange: after a test with all the different errors, maybe there is something wrong with
@gschorcht ping timeouts and full packet buffers may still occur. Remember that we are in a high-load scenario, so if the node just doesn't have the resources, it drops the packet. The important thing is that no memory leaks are formed. I'll read your analysis and observations later, when I find the time, but here is what I've seen on native: at some point the reassembly buffer just runs full (due to fragments lost in transit etc.). Since the packets are so huge, this fills the packet buffer (packet buffer full, ping timeout on the other node). Due to the high load, it never really is able to free those until the ping is done. Only after the ping is done do the buffers get some time to breathe and finally clean up after a while.
That's clear; I didn't expect not to see them. I just wanted to share what I observed. The real problems I see are:
Concerning the second problem: I would expect that a single small ping should be possible without any problems once the test with heavy traffic has been finished for more than 3 seconds. But this is obviously not the case.
(Sorry, I didn't see your answers between my two answers; it only became clear to me then that you still observe leaks with those.) OK. But since I can't reproduce this on native, this seems to be a different problem from what I am trying to fix here. As I said, I'll read what you wrote when I find the time, but I would rather go for a new issue, since the steps to reproduce in #10672 are fixed. Also, we should merge this, as I suppose it still makes things better, right?
Yes, of course. I am very grateful for your support. I will try to investigate it further. |
Just to share my observation. Precondition:
Send one single small ping: on air, I can dump the following WiFi frame. On the answering node this leads to: Oops? Obviously, old information is used by 6Lo. This might be the reason for our problem. Seems like an
@gschorcht we should discuss this in a separate issue; otherwise this might get lost.
Yes.

Contribution description
Otherwise, there will be leaks ;-).
Testing procedure
Together with #10679, run the steps to reproduce outlined in #10672. The packet buffer may still run full at some point, but after the ping finishes it should be empty again (give the reassembly buffer 3 seconds to garbage-collect itself before you check).
Issues/PRs references
Together with #10679 this is a fix for #10672.