core/sched_switch: fix crash with no active thread#20878
core/sched_switch: fix crash with no active thread#20878maribu wants to merge 2 commits intoRIOT-OS:masterfrom
Conversation
The function `sched_switch()` was implemented with the assumption that there will always be an active thread. This was true until threadless idle was implemented for Cortex M MCUs: If no thread is runnable and the running thread exists, there will be no active thread. If from ISR a thread is then unblocked, `sched_switch()` will be called without an active thread. This handles the corner case explicitly when the module `core_idle_thread` is not used (in other words: for threadless idle). Fixes RIOT-OS#20812
| /* If a thread exists and no other thread is runnable, we may end up in | ||
| * a situation with no active thread. This can not occur if there is an | ||
| * idle thread, which is always runnable, though */ | ||
| if (!IS_USED(MODULE_CORE_IDLE_THREAD) && unlikely(active_thread == NULL)) { |
There was a problem hiding this comment.
This change is correct, bit I'm not sure this is holding it right. thread_yield_higher() can always be used and is actually the preferred choice. sched_switch() was an optimization path for cases like unlocking a mutex or waking up another thread, from some thread, to only invoke the full scheduler if needed (as the target thread / priority is basically known).
If this is actually called from ISRs, that is the only way there is no active_thread. It might make more sense to encode this properly. As in, if there is no active thread, the irq_is_in() below becomes redundant, that is implicit. And in that case, sched_context_switch_request = 1 might be the correct thing to do.
Or, if irq_is_in(), always trigger scheduler, otherwise, assume not in ISR and that there is an active_thread.
(In theory, this function can be used to speed up task switching by quite a bit for some cases, b/c we could directly switch to the receiving thread, skipping register storing for the calling thread. we never implemented that, though.)
There was a problem hiding this comment.
sched_switch()was an optimization path for cases like unlocking a mutex or waking up another thread, from some thread, to only invoke the full scheduler if needed (as the target thread / priority is basically known).
This makes it sound like an ugly hack / premature optimization. In normal workloads contention over a mutex is pretty unlikely and the hot path is the case a mutex_unlock() will not find any waiters. And for real time behavior: Adding a sched_switch() prior to the call to thread_yield_higher() increases the worst case latency at the hope to occasionally omit the call to thread_yield_higher().
I wonder if just getting rid of sched_switch() is the better approach here. See #20890 for removing it for mutex_unlock(). There are a few other calls to sched_switch() that I think might happen from IRQ context to also address to actually fix the issue.
|
I suppose we can close this now? |
|
If I recall correctly, the same issue could be triggered when calling So, not yet. Maybe calling |
Contribution description
The function
sched_switch()was implemented with the assumption that there will always be an active thread. This was true until threadless idle was implemented for Cortex M MCUs: If no thread is runnable and the running thread exists, there will be no active thread. If from ISR a thread is then unblocked,sched_switch()will be called without an active thread.This handles the corner case explicitly when the module
core_idle_threadis not used (in other words: for threadless idle).Implementation choice
I first wanted to avoid adding the check for this very unlikely corner case to this hot path, and rather have in
cpu_switch_context_exit()of the only offender to make sure to set the active thread to something reasonable. However, switching to a non-runnable thread without running it is something that I have no elegant, simple and maintainable solution for. So I decided to rather add a simple and maintainable check into a hot path.Testing procedure
See #20812
Issues/PRs references
Fixes #20812