cpu/esp: esp_now_netdev fixes and optimizations#10700
cpu/esp: esp_now_netdev fixes and optimizations#10700MrKevinWeiss merged 12 commits intoRIOT-OS:masterfrom
Conversation
Simplifies the _recv receive function structure of esp_now_netdev according to the possible parameter values.
Newest version of API requires to drop the frame when if parameter len is smaller than the received packet.
Although it should not be possible to reenter the function esp_now_recv_cb, this is avoided by an additional boolean variable. It can not be realized by mutex because it would reenter from same thread context. If macro NDEBUG is not defined, an assertion is used.
Since the esp_now_recv_cb function is not called directly as an ISR through interrupts, it is not executed in the interrupt context. _recv can therefore be called directly. The treatment of the recv_event is no longer necessary.
Since it is not possible to reenter the function `esp_now_recv_cb`, and the functions netif::_ recv and esp_now_netdev::_ recv are called directly in the same thread context, neither a mutual exclusion has to be realized nor have the interrupts to be deactivated.
Since it is not possible to reenter the function `esp_now_recv_cb`, and the functions netif::_ recv and esp_now_netdev::_ recv are called directly in the same thread context we only need a buffer which contains one frame and its source mac address.
Since it is not possible to reenter the function `esp_now_recv_cb`, and the functions netif::_ recv and esp_now_netdev::_ recv are called directly in the same thread context we can read directly from the `buf` and don't need a receive buffer anymmore.
|
I don't have the hardware to test this :-/ |
|
@TimoRoth Sorry, I used the wrong issue by mistake when I was asking you to take a look to the changes for esp_now. |
I know, but it would be really great if you could take a look to the changes in @TimoRoth will test it. |
Timer restart was moved from esp_now_scan_peers_done to esp_now_scan_peers_start to avoid that the scan for peers isn't triggered anymore if the esp_now_scan_peers_done callback isn't called because of errors.
|
@TimoRoth Could you test the changes? I would like to make some progress with these changes since they increase the stability a lot. |
This comment has been minimized.
This comment has been minimized.
|
With master I get something a but different... node1 seems to transmit very fast. NODE0 NODE1 |
|
@MrKevinWeiss Strange, I have stress tested over hours, but I have never seen that the UART interface is affected. UART is realized in Hardware with a buffer oft 128 characters. |
|
I am not too worried about it, I am running things through a usb hub. Keep in mind it is a different board (node1=esp32-wrover-32). I will investigate! |
|
Yup, just my usb ports, switched to the ones directly connected to the cpu and have 0% packet loss with 1000 pings. The testing shows a noticeable improvement! 👍 |
|
@MrKevinWeiss Thanks for testing 😄 |
|
How do we continue with this PR? PR #9917 depends these changes. |
| * If the NDEBUG macro is undefined, an assertion is used instead for | ||
| * debugging purposes. | ||
| */ | ||
| assert(!_in_recv_cb); |
There was a problem hiding this comment.
this seems redundant in light of the following if, or not?
There was a problem hiding this comment.
It was added intentionally rather to see when this happens in stress tests during the development process than to suppress it silently.
Should I remove it?
There was a problem hiding this comment.
mhm, I'see ... the assert will result in a crash, you could a DEBUG message inside the if below if you want to avoid the crash an just see when (and how often) this happens.
TL;DR: depends on you intention and use case: if you want to the app to crash first time this happens use assert (so leave as is) otherwise better use DEBUG.
There was a problem hiding this comment.
The problem with a DEBUG message is that you would miss it in the set of other debug messages, especially during stress tests.
| /* | ||
| * Since we are not in the interrupt context, we do not have to pass | ||
| * `NETDEV_EVENT_ISR` first. We can call the receive function directly. | ||
| * But we have to unlock the mutex and enable interrupts before. |
There was a problem hiding this comment.
The comments is from the time when a mutex was used. I forgot to change it it when I removed 😉
There was a problem hiding this comment.
please remove then, afterwards this seems good to go
There was a problem hiding this comment.
Of course. Thank you for the review.
|
I did a really quick test of the changes, and they indeed do increase the stability to the point where I was unable to make anything bad or unexpected happen. Looking forward to this and in turn the esp8266 PR being merged. |
assert was added intentionally for debugging purposes. For released version it is enough that function returns.
|
@TimoRoth Thanks for testing again 😄 |
|
@TimoRoth thx for reporting, with the tests by @MrKevinWeiss I'd say this is good to go then! |
|
Thank you all for your support. |
Contribution description
This PR provides a fix of problem 2 in issue #10682.
As described in #10682, a single ping without data was not possible anymore after heavy load with ping timeouts. The reason was that ISR events got lost and messed up the ringbuffer used in
esp_now_netdev.This PR is rather an evolution than only a fix. All different steps are realized as separate commits which contain the following changes:
_recvfunction ofesp_now_netdevsimplified according to the possible parameter values._recvfunction ofesp_now_netdevupdated to drop the frame as documented in new API if the parameterlenis smaller than the received frame.NETDEV_EVENT_ISRevents inesp_now_netdevfixed.In further extensive stress tests the following could be figured out:
esp_now_recv_cbis executed in the context of thewifithread (this was already known).wifithread.wifithread then processes the events sequentially and executes registered callback functions likeesp_now_recv_cbasynchronously.NETDEV_EVENT_ISRapproach in GNRC.This results in the following facts:
esp_now_recv_cbwill not be reentered.esp_now_recv_cbis executed in the context of the highest priority thread and is therefore not interrupted or superseded by other threads.These facts allow the following optimizations which are provided step by step as separate commits.
NETDEV_EVENT_ISRevents are not needed. The execution ofnetif::_recvand thusesp_now_netdev::_recvcan be triggered directly withNETDEV_EVENT_RX_COMPLETEon the receiption of a frame inesp_now_recv_cb.esp_now_recv_cbandesp_now_netdev::_recvare executed sequentially in same thread context.esp_now_netdev::_recvis executed immediately inesp_now_recv_cband in same thread context, there is no need for a ring buffer anymore.wifithread.Although function
esp_now_recv_cbwill not be reentered, it is secured by an additional boolean variable to avoid potential data inconsistencies. If the NDEBUG macro is undefined, an assertion is used instead for debugging purposes.Testing procedure
Use at least two or more ESP32 nodes and flash them with
examples/gnrc_networking, e.g.:Use the
pingcommand to ping from each node another node to produce heavy load, either with two nodesor with three nodes
A further test is bombarding one node from two other nodes:
Issues/PRs references
This PR fixes #10682 problem 2, the stucked gnrc buffer problem remains.
This PR is a follow-on PR of #10581, #10679, #10680.