Skip to content

cpu: cortexm_common: set pendSV to default priority#3450

Closed
kaspar030 wants to merge 1 commit intoRIOT-OS:masterfrom
kaspar030:set_pendsv_to_default_priority
Closed

cpu: cortexm_common: set pendSV to default priority#3450
kaspar030 wants to merge 1 commit intoRIOT-OS:masterfrom
kaspar030:set_pendsv_to_default_priority

Conversation

@kaspar030
Copy link
Copy Markdown
Contributor

While debugging #3438, some strange, timing dependent behaviour caused the thread calling uart_stdio_read() to never return from mutex_lock(), even when that mutex gets unlocked from within the ISR.

This problem shows only at very high baudrates (e.g., 500000).

Setting pendSV makes it vanish.

Ideas?

@kaspar030 kaspar030 added Platform: ARM Platform: This PR/issue effects ARM-based platforms Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR labels Jul 19, 2015
@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 19, 2015

Works for me, but I can't say I know what this is about…

@jnohlgard
Copy link
Copy Markdown
Member

Is this really the correct solution to the problem?
PendSV is used to perform a context switch between different threads. I think that having that at the same IRQ priority as the other ISRs can cause problems with ISRs being delayed, or am I just imagining problems that don't exist?

From the description of this and #3438 I think @kaspar030 came to the conclusion that the UART ISR is running too often to let the PendSV ISR finish its task switching, is that correct?

@OlegHahm
Copy link
Copy Markdown
Member

Is this really the correct solution to the problem?

I would say it is certainly not, but may be a workaround.

PendSV is used to perform a context switch between different threads. I think that having that at the same IRQ priority as the other ISRs can cause problems with ISRs being delayed, or am I just imagining problems that don't exist?

I'm not sure if it would cause real issues, but definitely would decrease the performance.

From the description of this and #3438 I think @kaspar030 came to the conclusion that the UART ISR is running too often to let the PendSV ISR finish its task switching, is that correct?

Yes, I guess something like this could indeed cause the problem.

@cgundogan
Copy link
Copy Markdown
Member

I actually would be more happy with disabling/enabling interrupts during the PendSV instead of raising the priority of it in order to forbid nesting during PendSV.

IMHO the PendSV was really thought to be at the end of the chain and this might be what other people assume and expect when they use RIOT. But again, I am not expert on this matter and I am not sure how this would effect the performance.

@thomaseichinger
Copy link
Copy Markdown
Member

I vaguely remember problems with context switching while porting some cortex M platforms with pendSV not set to 0xff. Maybe @haukepetersen can recall better?

@jnohlgard
Copy link
Copy Markdown
Member

I think we need a longer explanation of why this change fixes the problems with high baud rate UART communication before we can continue.
@kaspar030 which threads are actually given time to run when you were debugging?

An hypothesis: Some other thread with a higher priority was getting in the way of the shell getchar and before they had finished executing, the next UART byte came, and therefore only the higher priority thread got to run at all.

@OlegHahm
Copy link
Copy Markdown
Member

@gebart, IIRC @kaspar030 just tested with the simplest possible application with only idle and main thread.

@kaspar030
Copy link
Copy Markdown
Contributor Author

@gebart I've used the tests/shell example, but using getchar+putchar instead of uart0.

@haukepetersen
Copy link
Copy Markdown
Contributor

Let me try to provide some more detail on our findings. The behavior we were able to observe is the following:

Running tests/shell on iot-lab, UART configured to run at baudrate 500000, interfacing the board with pyterm, which send the characters without delaying them.

When sending short commands (with <= 3 chars, as ps), the behavior was as expected. When sending longer commands (e.g. help), the RIOT shell will hang and it becomes un-reactive. This occurs sometimes on the first time the command is send, under some circumstances it can occur when the command is repeatedly send. Holding the return key pressed and spamming the node does seem to crash the shell reliably...

Some observations when debugging:

  • the node does not hardfault, but just stays in the idle thread
  • after hanging, the UART interrupts still trigger and are handled correctly
  • changing timing very slightly (e.g. calling one more gpio_toggle()), changes the observed behavior. Sometimes one more function call somewhere could get rid of the error...

As the main thread (which runs the shell) is blocking on a mutex, it is at some point not correctly woken up from the UART interrupt callback, which unlocks that mutex. Certainly, the error is triggered by some kind of a race condition. So what we believe, is that either the atomic operations are slightly broken, or that a preemtion of the PendSV interrupt are causing the observed behavior...

@jnohlgard
Copy link
Copy Markdown
Member

Are there any hidden critical sections inside the context switch or mutex unlock which may cause this "imaginary" deadlock?

@haukepetersen
Copy link
Copy Markdown
Contributor

Not that I know of. As I see it, the 100% explanation for this bug is still missing, but this PR solves the error for now.

@OlegHahm OlegHahm added the Community: Hack'n'ACK candidate This PR is a candidate for review and discussion during one of RIOT's monthly Hack'n'ACK parties label Jul 28, 2015
@kaspar030
Copy link
Copy Markdown
Contributor Author

If pendsv can be interrupted, so can sched_run, as it is called within the pendSV exception handler, correct?

That would mean that an ISR modifying thread state by, e.g., unlocking a mutex, could do that while sched_run is scheduling.

@OlegHahm
Copy link
Copy Markdown
Member

Well, although we agree that this is a rather ugly workaround, it doesn't seem to break anything. On the other hand, I was able to reproduce the problem with a non-responsive shell on IoT-LAB_M3 in various settings. ACK!

@OlegHahm OlegHahm added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jul 28, 2015
@miri64
Copy link
Copy Markdown
Member

miri64 commented Jul 28, 2015

Replaced by #3511 because OP is sleeping and because of time pressure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Community: Hack'n'ACK candidate This PR is a candidate for review and discussion during one of RIOT's monthly Hack'n'ACK parties Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR Platform: ARM Platform: This PR/issue effects ARM-based platforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants