fe310: Fix potential deadlock in thread_yield_higher#15277
fe310: Fix potential deadlock in thread_yield_higher#15277nmeum wants to merge 2 commits intoRIOT-OS:masterfrom
Conversation
Makes the code a bit more readable.
The software interrupt may be raised before the WFI instruction is reached, if it is and no other interrupt is raised afterwards the thread will deadlock as it will never awake from the WFI. I noticed this bug with a custom RISC-V simulator and cannot reproduce it on real hardware though. However, even on real hardware the WFI instruction may be implemented as a NOP, thus the current code is suboptimal either way. The change proposed here simply busy waits until the interrupt handler for the software interrupt is invoked (which sets CLINT_MSIP), since this should only take a few cycles it shouldn't have a big impact on power consumption.
|
Just tested this PR on hifive1b and it seems to work. To give some numbers, I ran
So we have ~6-7% performance drop apparently. |
|
Note that the implementation right now is not a RISC-V implementation, but specific to the fe310. So far we can rightfully assume to have the exact behavior this MCU has (including the timing). Hence, WFI cannot be a That said, obviously we want to support more RISC-V MCUs in the future. Getting rid of code relying on the implementation details of a specific MCU would certainly be a step in the right direction. Let me do an |
|
The machine code generated by this PR looks like this: Disassembly of section .text.thread_yield_higher:
00000000 <thread_yield_higher>:
0: 020007b7 lui a5,0x2000
4: 4705 li a4,1
6: c398 sw a4,0(a5)
8: 020006b7 lui a3,0x2000
c: 4785 li a5,1
0000000e <.L12>:
e: 4298 lw a4,0(a3)
10: fef70fe3 beq a4,a5,e <.L12>
14: 8082 retThat is suboptimal because the address of Disassembly of section .text.thread_yield_higher:
00000000 <thread_yield_higher>:
0: 4785 li a5,1
2: 02000737 lui a4,0x2000
6: c31c sw a5,0(a4)
00000008 <loop_until_softirq_handled>:
8: 4314 lw a3,0(a4)
a: fef68fe3 beq a3,a5,8 <loop_until_softirq_handled>
e: 8082 retpatch of
|
|
My local toolchain also generates inefficient machine code. I checked out the wrong branch by accident. |
You are still making assumptions about how many cycles it will take until the WFI instruction is reached and whether or not the interrupt is raised before it is which also depends on the code generated by the compiler et cetera. |
The function right now contains a plain volatile store and afterwards the WFI instruction. This is very valid to make some reasonable assumptions about this. |
|
I don't think this is actually that easy to reason about. For one thing, what makes you certain that an external interrupt is never raised between the store and the wfi? Regardless of whether this is classified as "bug" or not: I would personally be interested in getting this fixed as it prevents execution of vanilla RIOT on any HiFive1 simulator which doesn't have a cycle-accurate timing model. If this is just about "saving ticks": You can also rewrite the RISC-V context switch implementation to use a synchronous ECALL instead of an asynchronous software interrupt thereby eliminating the need for the busy wait. This is the technique employed by zephyr for context switching on RISC-V. I don't really see the point in optimizing the code executed before the busy wait as you suggested before. |
|
I opened a bug report in GCC regarding the additional 6 bytes of ROM usage due to not reusing the registers. I think this should eventually be fixed. |
maribu
left a comment
There was a problem hiding this comment.
ACK. It is IMO a good idea to not rely on implementation details of a specific CPU: In the future, we might want to support hardware that behaves differently. Additionally, we have users that would profit right now by being able to run this code in simulators that are not cycle accurate in their IRQ handling.
@aabadie, @bergzand: Any second thoughts? I agree with @nmeum that we should consider using ECALL rather than soft IRQs for context switches.
Contribution description
The software interrupt may be raised before the WFI instruction is reached, if it is and no other interrupt is raised afterwards the thread will deadlock as it will never awake from the WFI.
I noticed this bug with a custom RISC-V simulator and cannot reproduce it on real hardware though. However, even on real hardware the WFI instruction may be implemented as a NOP, thus the current code is (in my opinion) suboptimal either way. The change proposed here simply busy waits until the interrupt handler for the software interrupt is invoked (which sets
CLINT_MSIP), since this should only take a few cycles it shouldn't have a big impact on power consumption.Testing procedure
I encountered this bug with the following application. However, since this is a concurrency issue, which I personally cannot reproduce on real hardware (the HiFive1), it's probably difficult to test this PR using the application. I would instead suggest reasoning based on my observations and the suggested changes.
Due to the bug the bug in
thread_yield_higherthe mutex (used internally byxtimer) would deadlock here:RIOT/core/mutex.c
Lines 61 to 65 in 57264c5
I think this example also illustrates why the current implementation of
thread_yield_highermight be a bad idea, given thatWFImight be implemented as a NOP.Issues/PRs references
I have only tested this on the HiFive1, not the HiFive1b. Maybe this could explain some of the test failures on the HiFive1b (xtimer failures in #13086) were timing might differ from the HiFive1 in terms of interrupt delivery? Just a wild guess...