cpu/esp32/esp-idf: fix of a memory leak when using esp-now#21975
Conversation
ESP-IDF uses `pthread` local storage functions such as `pthread_setspecific` and `pthread_getspecific` to store a thread-associated counting semaphore used by `libieee80211.a` without creating actual `pthread`s with `pthread_create` function. This is realized in ESP-IDF by allocating additional space in the FreeRTOS task control block and implementing their own `pthread_setspecific` and `pthread_getspecific` to access it. This type of implementation isn't possible with the RIOT module `pthread`. However, it seems that only one such semaphore is used per thread, so we can simulate it using a simple semaphore array.
cfedcbf to
ab19124
Compare
| * However, it seems that only one such semaphore is used per thread, so we | ||
| * can simulate it using a simple semaphore array. | ||
| */ | ||
| static SemaphoreHandle_t _wifi_thread_sem[KERNEL_PID_LAST + 1] = { }; |
There was a problem hiding this comment.
I guess since there is only one WiFi thread, a single one would also do?
But yea, a SemaphoreHandle_t is just a pointer, so this isn't too bad.
Did I understand it correctly that ESP IDF does not use RIOT's pthread_setspecific() but instead provides it's own version?
There was a problem hiding this comment.
I guess since there is only one WiFi thread, a single one would also do?
The function wifi_thread_semphr_get_wrapper is called by ieee80211_ioctl (libnet80211.a) in different threads, namely in thread netif-esp-now when the scan for peers is started, and in thread sys_evt when the scan is complete.
To be honest, I have no idea what the purpose of the binary semaphore is, how it is used by ieee80211_ioctl and whether it really has to be thread-specific or not.
I tried to use a single semaphore before I realized that array which also works.
Did I understand it correctly that ESP IDF does not use RIOT's
pthread_setspecific()but instead provides it's own version?
ESP-IDF implements its own POSIX threads, but we don't use them/can't use them to avoid conflicts with RIOT's POSIX thread implementation. In fact, RIOT's POSIX Thread functions are also used by ESP-IDF.
There was a problem hiding this comment.
Maybe we should try a single semaphore and test it a bit more.
There was a problem hiding this comment.
The only comment for the code I found in ESP-IDF repository (espressif/esp-idf@197e0ae):
esp_wifi: fix some WiFi bugs
Fix following WiFi bugs:
1. Make smartconfig thread-safe
2. Fix WiFi stop/deinit memory leak
3. Refactor for WiFi init/deinit/ioctl etc
4. Add declaration for esp_wifi_internal_ioctl()
5. Fix the bug that WiFi stop leads to task watchdog
There was a problem hiding this comment.
Maybe we should try a single semaphore and test it a bit more.
would be great if that sufficed (maybe we need a 2nd one for the ESPs with two WiFi interfaces?)
I mean it's a hack, so we shouldn't try to make the hack too generic since that gives a false impression of what it does.
There was a problem hiding this comment.
I tried to debug it a little bit more to understand the purpose of these semaphores.
Function wifi_thread_semphr_get_wrapper respective riot_wifi_thread_semphr_get is called only during the WiFi station management, i.e. when scanning for APs (like in scanning for ESP-NOW peers), connecting to and disconnecting from an AP.
This means that the memory leak with esp-now caused by missing semaphores is also a problem with esp-wifi, but unlike esp-now, esp_wifi does not search for APs (peers) every 10 seconds, so we didn't notice this memory leak.
I really have no idea what sense it makes that each thread hold its own binary semaphore, unless the semaphore is cached in the depths of ieee80211_ioctl and xSemaphoreGive is later executed asynchronously for it, even if the thread has already switched again 🤔
It tried it with a single semaphore but esp-wifi seems to work only with a semaphore per thread. The main thread is blocked permanently in this case because it seems that it tries to take the single semaphore at some time during the initialization while the semaphore is already taken by something else. The node connects to the AP but the command prompt never appears. Unfortunately, I can't debug it, because connecting to the AP doesn't work if conditional breakpoints are set, probably due to timeouts.
That is, we need the array of semaphores.
There was a problem hiding this comment.
would be great if that sufficed (maybe we need a 2nd one for the ESPs with two WiFi interfaces?)
Both interfaces use the same WiFi interface and the same wifi thread.
There was a problem hiding this comment.
Border router example also still works with this fix.
benpicco
left a comment
There was a problem hiding this comment.
If you find that we really need the possibility to have a semaphore for each thread, go ahead and hit merge.
But if it's possible to narrow down the scope of the hack, I'd welcome that.
|
@benpicco Thanks for reviewing. |
Contribution description
This PR fixes a memory leak when using
esp-now.When using
esp-now, thelibieee80211.alibrary uses the ESP-IDF functionwifi_thread_semphr_get_wrapperwhen scanning for peers to get a thread-associated counting semaphore that is created automatically by this function if it doesn't exist.For that purpose function
wifi_thread_semphr_get_wrapperuses thepthreadlocal storage functionspthread_setspecificandpthread_getspecificwithout creating actualpthreads by thepthread_createfunction. This is realized in ESP-IDF by allocating additional space in the FreeRTOS task control block and implementing their ownpthread_setspecificandpthread_getspecificto access it. That is, each task has its local storage that can be accessed bypthreadlocal storage functions. The function also creates the counting semaphore automatically if thepthread_getspecificfunction returnsNULL.In RIOT,
pthreadlocal storage access only works if threads are created before aspthreads bypthread_create, which is usually not the case. Therefore,pthread_getspecificreturns alwaysNULLwhen executing the functionwifi_thread_semphr_get_wrapper, and a new counting semaphore is created each time it is executed.It seems that only one such semaphore is used per thread, so RIOT can simulate this behavior by using a simple semaphore array and replacing the function
wifi_thread_semphr_get_wrapperin ESP-IDF.I think it fixes issue #21923.
Testing procedure
Compile and flash the GNRC networking example to one node, for example:
Without the PR, the size of allocated memory increases by 80 bytes every 10 seconds (during scanning peers).
With the PR, the size of allocated memory should remain the same.
Issues/PRs references
Fixes issue #21923