Skip to content

cpu/esp32/esp-idf: fix of a memory leak when using esp-now#21975

Merged
gschorcht merged 2 commits intoRIOT-OS:masterfrom
gschorcht:cpu/esp32/fix_esp_idf_wifi_thread_semaphor
Jan 14, 2026
Merged

cpu/esp32/esp-idf: fix of a memory leak when using esp-now#21975
gschorcht merged 2 commits intoRIOT-OS:masterfrom
gschorcht:cpu/esp32/fix_esp_idf_wifi_thread_semaphor

Conversation

@gschorcht
Copy link
Copy Markdown
Contributor

@gschorcht gschorcht commented Jan 11, 2026

Contribution description

This PR fixes a memory leak when using esp-now.

When using esp-now, the libieee80211.a library uses the ESP-IDF function wifi_thread_semphr_get_wrapper when scanning for peers to get a thread-associated counting semaphore that is created automatically by this function if it doesn't exist.

For that purpose function wifi_thread_semphr_get_wrapper uses the pthread local storage functions pthread_setspecific and pthread_getspecific without creating actual pthreads by the pthread_create function. This is realized in ESP-IDF by allocating additional space in the FreeRTOS task control block and implementing their own pthread_setspecific and pthread_getspecific to access it. That is, each task has its local storage that can be accessed by pthread local storage functions. The function also creates the counting semaphore automatically if the pthread_getspecific function returns NULL.

In RIOT, pthread local storage access only works if threads are created before as pthreads by pthread_create, which is usually not the case. Therefore, pthread_getspecific returns always NULL when executing the function wifi_thread_semphr_get_wrapper, and a new counting semaphore is created each time it is executed.

It seems that only one such semaphore is used per thread, so RIOT can simulate this behavior by using a simple semaphore array and replacing the function wifi_thread_semphr_get_wrapper in ESP-IDF.

I think it fixes issue #21923.

Testing procedure

Compile and flash the GNRC networking example to one node, for example:

USEMODULE='shell_cmd_heap' BOARD=esp32s3-devkit make -C examples/networking/gnrc/networking flash term

Without the PR, the size of allocated memory increases by 80 bytes every 10 seconds (during scanning peers).

2026-01-11 12:05:45,940 # main(): This is RIOT! (Version: 2026.01-devel-317-g66a6e)
2026-01-11 12:05:45,945 # RIOT network stack example application
2026-01-11 12:05:45,947 # All up, running the shell now
> heap
2026-01-11 12:06:00,072 # heap
2026-01-11 12:06:00,082 # heap: 341668 (used 103200, free 238468) [bytes]
> heap
2026-01-11 12:06:10,532 # heap
2026-01-11 12:06:10,534 # heap: 341660 (used 103272, free 238388) [bytes]
> heap
2026-01-11 12:06:20,292 # heap
2026-01-11 12:06:20,294 # heap: 341652 (used 103344, free 238308) [bytes]
> heap
2026-01-11 12:06:30,319 # heap
2026-01-11 12:06:30,320 # heap: 341644 (used 103416, free 238228) [bytes]
> 

With the PR, the size of allocated memory should remain the same.

2026-01-11 12:08:47,999 # main(): This is RIOT! (Version: 2026.01-devel-319-gcfedc-cpu/esp32/fix_esp_idf_wifi_thread_semaphor)
2026-01-11 12:08:48,000 # RIOT network stack example application
2026-01-11 12:08:48,003 # All up, running the shell now
> heap
2026-01-11 12:09:00,382 # heap
2026-01-11 12:09:00,384 # heap: 341572 (used 102948, free 238624) [bytes]
> heap
2026-01-11 12:09:10,361 # heap
2026-01-11 12:09:10,373 # heap: 341572 (used 102948, free 238624) [bytes]
> heap
2026-01-11 12:09:20,325 # heap
2026-01-11 12:09:20,328 # heap: 341572 (used 102948, free 238624) [bytes]
> heap
2026-01-11 12:09:30,365 # heap
2026-01-11 12:09:30,367 # heap: 341572 (used 102948, free 238624) [bytes]
> 

Issues/PRs references

Fixes issue #21923

@github-actions github-actions bot added Area: pkg Area: External package ports Platform: ESP Platform: This PR/issue effects ESP-based platforms Area: cpu Area: CPU/MCU ports labels Jan 11, 2026
@gschorcht gschorcht added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jan 11, 2026
@gschorcht gschorcht requested a review from benpicco January 11, 2026 11:18
ESP-IDF uses `pthread` local storage functions such as `pthread_setspecific`
and `pthread_getspecific` to store a thread-associated counting semaphore
used by `libieee80211.a` without creating actual `pthread`s with
`pthread_create` function. This is realized in ESP-IDF by allocating
additional space in the FreeRTOS task control block and implementing
their own `pthread_setspecific` and `pthread_getspecific` to access it.

This type of implementation isn't possible with the RIOT module `pthread`.
However, it seems that only one such semaphore is used per thread, so we
can simulate it using a simple semaphore array.
@gschorcht gschorcht force-pushed the cpu/esp32/fix_esp_idf_wifi_thread_semaphor branch from cfedcbf to ab19124 Compare January 11, 2026 11:24
@riot-ci
Copy link
Copy Markdown

riot-ci commented Jan 11, 2026

Murdock results

✔️ PASSED

ab19124 pkg/esp32_sdk: replace wifi_thread_semphr_get_wrapper

Success Failures Total Runtime
10983 0 10984 10m:58s

Artifacts

* However, it seems that only one such semaphore is used per thread, so we
* can simulate it using a simple semaphore array.
*/
static SemaphoreHandle_t _wifi_thread_sem[KERNEL_PID_LAST + 1] = { };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess since there is only one WiFi thread, a single one would also do?
But yea, a SemaphoreHandle_t is just a pointer, so this isn't too bad.

Did I understand it correctly that ESP IDF does not use RIOT's pthread_setspecific() but instead provides it's own version?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess since there is only one WiFi thread, a single one would also do?

The function wifi_thread_semphr_get_wrapper is called by ieee80211_ioctl (libnet80211.a) in different threads, namely in thread netif-esp-now when the scan for peers is started, and in thread sys_evt when the scan is complete.

To be honest, I have no idea what the purpose of the binary semaphore is, how it is used by ieee80211_ioctl and whether it really has to be thread-specific or not.

I tried to use a single semaphore before I realized that array which also works.

Did I understand it correctly that ESP IDF does not use RIOT's pthread_setspecific() but instead provides it's own version?

ESP-IDF implements its own POSIX threads, but we don't use them/can't use them to avoid conflicts with RIOT's POSIX thread implementation. In fact, RIOT's POSIX Thread functions are also used by ESP-IDF.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should try a single semaphore and test it a bit more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only comment for the code I found in ESP-IDF repository (espressif/esp-idf@197e0ae):

esp_wifi: fix some WiFi bugs
Fix following WiFi bugs:
1. Make smartconfig thread-safe
2. Fix WiFi stop/deinit memory leak
3. Refactor for WiFi init/deinit/ioctl etc
4. Add declaration for esp_wifi_internal_ioctl()
5. Fix the bug that WiFi stop leads to task watchdog

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should try a single semaphore and test it a bit more.

would be great if that sufficed (maybe we need a 2nd one for the ESPs with two WiFi interfaces?)

I mean it's a hack, so we shouldn't try to make the hack too generic since that gives a false impression of what it does.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to debug it a little bit more to understand the purpose of these semaphores.

Function wifi_thread_semphr_get_wrapper respective riot_wifi_thread_semphr_get is called only during the WiFi station management, i.e. when scanning for APs (like in scanning for ESP-NOW peers), connecting to and disconnecting from an AP.

This means that the memory leak with esp-now caused by missing semaphores is also a problem with esp-wifi, but unlike esp-now, esp_wifi does not search for APs (peers) every 10 seconds, so we didn't notice this memory leak.

I really have no idea what sense it makes that each thread hold its own binary semaphore, unless the semaphore is cached in the depths of ieee80211_ioctl and xSemaphoreGive is later executed asynchronously for it, even if the thread has already switched again 🤔

It tried it with a single semaphore but esp-wifi seems to work only with a semaphore per thread. The main thread is blocked permanently in this case because it seems that it tries to take the single semaphore at some time during the initialization while the semaphore is already taken by something else. The node connects to the AP but the command prompt never appears. Unfortunately, I can't debug it, because connecting to the AP doesn't work if conditional breakpoints are set, probably due to timeouts.

That is, we need the array of semaphores.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great if that sufficed (maybe we need a 2nd one for the ESPs with two WiFi interfaces?)

Both interfaces use the same WiFi interface and the same wifi thread.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Border router example also still works with this fix.

Copy link
Copy Markdown
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you find that we really need the possibility to have a semaphore for each thread, go ahead and hit merge.

But if it's possible to narrow down the scope of the hack, I'd welcome that.

@gschorcht gschorcht added this pull request to the merge queue Jan 14, 2026
Merged via the queue into RIOT-OS:master with commit 96d9c50 Jan 14, 2026
34 checks passed
@gschorcht
Copy link
Copy Markdown
Contributor Author

@benpicco Thanks for reviewing.

@gschorcht gschorcht deleted the cpu/esp32/fix_esp_idf_wifi_thread_semaphor branch January 14, 2026 19:06
@leandrolanzieri leandrolanzieri added this to the Release 2026.01 milestone Jan 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: cpu Area: CPU/MCU ports Area: pkg Area: External package ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ESP Platform: This PR/issue effects ESP-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants