[PoC] core: remove "volatile" from scheduler variables. by jcarrano · Pull Request #11073 · RIOT-OS/RIOT

jcarrano · 2019-02-27T15:00:15Z

Contribution description

NOTE: The main purpose of this PR is to test the change in the CI.
NOTE2: The explanations are too long to paste here, read the commit messages.

This removes "volatile" from

active_thread
active_pid
sched_context_switch_request
sched_threads .

There is no reason for these variables to be volatile:

In application (thread) context, the "active_thread" never changes- if thread X is reading sched_active_pid, the value is always X whenever thread X is active.
In kernel context, it is the kernel that changes the value, so again, it needs not be volatile.

With regards to the contents of the thread structure itself, sometimes it is important that it is re-read (for example in msg_* functions). Here, however, there are calls to extern-defined functions to yield, etc. Since the compiler cannot know if these functions alter global memory, it is forced to emit code re-read these variables, so again volatile is unncesessary (maybe if using LTO it can "smart up" and do something stupid.) If later it turns out that an explicit barrier is necessary, it can be added.

For more information on this discussion, see this thread:

https://lists.riot-os.org/pipermail/devel/2019-January/006077.html

And for more info on barriers vs volatile, see:

Testing procedure

Run ALL the tests.

Related Issues

#252, #10177, also see the thread in the devel mailing list.

There is no reason for these two variables to be volatile: * In application (thread) context, the "active_thread" never changes- if thread X is reading sched_active_pid, the value is always X whenever thread X is active. * In kernel context, it is the kernel that changes the value, so again, it needs not be volatile. With regards to the contents of the thread structure itself, sometimes it is important that it is re-read (for example in msg_* functions). Here, however, there are calls to extern-defined functions to yield, etc. Since the compiler cannot know if these functions alter global memory, it is forced to emit code re-read these variables, so again volatile is unncesessary (maybe if using LTO it can "smart up" and do something stupid.) If later it turns out that an explicit barrier is necessary, it can be added. For more information on this discussion, see this thread: * https://lists.riot-os.org/pipermail/devel/2019-January/006077.html And for more info on barriers vs volatile, see: * https://www.kernel.org/doc/html/v4.17/process/volatile-considered-harmful.html * https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html * https://yarchive.net/comp/linux/memory_barriers.html

sched_context_switch_request ---------------------------- sched_context_switch_request is only set and read from interrupt context, as a way to signal the kernel that after the handler has run it has to switch threads. Because the interrupt context is essentially single threaded, then all the regular guarantees of single-theaded C apply. In architectures with interrupt preemption this may not be true, but since the setting and querying the request flag happens in separate functions, it is always reread. It could still be possible that the setting and querying happen in inline functions and that could lead the compiler to think that no modification of this value is possible and optimize the read out. It is convenient, therefore, to insert a compiler barrier, that is cheap (or free) and solves the problem perfectly. sched_threads ------------- When running in the context of a thread, this array has all volatile elements, except for the one referring to the current thread. For obvious reasons, this is the only one that does not change and can safely be accessed as if it was not volatile. The contents of this thread structure, as well as the other members of the sched_threads array can change. But even in this case volatile does not solve anything: correctly accessing the members of a shared data structure needs either atomics or disabling interrupts. The latter is what's done in RIOT. In interrupt or kernel context, these structures should not change. For this reason this variable should not be volatile. The same considereations as with sched_context_switch_request apply with respect to preemmptible IRQs. Again, it may be necessary to insert barriers.

Now that the variables are no longer volatile, the definitions conflict. This makes the test compile again and hopefully validates that removing the volatile qualifier is OK.

maribu · 2019-03-06T08:55:42Z

@kaspar030, @keestux: You might want to have a look here. What would be the best way to test this? Are tests/thread_* fitting the bill?

keestux · 2019-03-17T17:18:31Z

core/include/thread.h

 static inline kernel_pid_t thread_getpid(void)
 {
-    extern volatile kernel_pid_t sched_active_pid;
+    extern kernel_pid_t sched_active_pid;


Why is this extern declaration needed anyway? We do include sched.h ...

You're right, it should not be needed. I'd guess it was needed once upon a time...

keestux · 2019-03-17T18:24:53Z

The changes look good to me. I've manually checked a few tests and they give the expected output.

keestux · 2019-03-17T18:27:03Z

How about grepping for (thread_t*)sched_active_thread and remove the casts? Something like
\(thread_t\s*\*\)\s*sched_active_thread gives 32 hits

keestux · 2019-03-17T18:37:24Z

More things to weed out.

sys/posix/pthread/pthread.c:285:    kernel_pid_t pid = sched_active_pid; /* sched_active_pid is volatile */

keestux · 2019-03-17T18:43:46Z

What's our thought on this one?

volatile thread_t *thread_get(kernel_pid_t pid);

Can we drop volatile here too? I think we can. And if we do, we can remove a few casts in the process, like this one

cpu/kinetis/periph/i2c.c:461:    thread_flags_set((thread_t *)thread_get(pid), THREAD_FLAG_KINETIS_I2C);

keestux · 2019-03-17T18:54:07Z

I see that the subject of the PR is PoC (Proof of Concept?). In that case we don't care too much about completeness, just that this is the right track. I hope we can soon come to an agreement that the changes are correct and that we can continue cleaning (spring cleaning :-)

keestux · 2019-03-17T19:21:43Z

tests/cond_order/main.c

 #define THREAD_FIRSTGROUP_NUMOF (3U)

-extern volatile thread_t *sched_active_thread;
+extern thread_t *sched_active_thread;


I suggest we include sched.h instead of doing this declaration. (same for tests/mutex_order)

I think there is an API call for that which should be used instead of accessing the variable directly

If there is an API then quite some code is ignoring that API.

I was wrong, there only is thread_get(kernel_pid_t pid). One could use thread_get(thread_getpid()), but would be less efficient. Maybe anothrr static inline function should be added for that use case?

One could use thread_get(thread_getpid()), but would be less efficient. Maybe anothrr static inline function should be added for that use case?

+1 for another function. Maybe thread_get_current()? That would be a nice PR of its own.

Would it be an abuse to make thread_get(KERNEL_PID_UNDEF) get the current thread?

I'd say that thread_get(KERNEL_PID_UNDEF) is less readable than thread_t * thread_get_current(void).

jcarrano · 2019-03-19T12:01:09Z

@keestux

More things to weed out.

Yes, there's quite some things. I was planning on that as the next step. git-grep should help here.

I hope we can soon come to an agreement that the changes are correct and that we can continue cleaning (spring cleaning :-)

That was what I was looking for.

miri64 · 2020-06-25T09:41:26Z

What is the state here? Obviously the discussion around this PoC died down, so can it be closed?

maribu · 2020-06-25T10:48:15Z

This PR needs some love and rigorous testing. It could very well be that the volatile does mask some bug. But the volatile type qualifier should definitely not be used here. This will prevent a lot of optimization from happening.

E.g. if one function would repeatedly read sched_active_pid, without volatile the compiler would be allowed to only read it the first time to a register and allow subsequent accesses being served from that register. (Note: Registers are saved and stored during context switches, so even if the value of sched_active_pid changes when a thread is interrupted, after the thread is scheduled again both sched_active_pid in memory as well as the register are restored. So from the perspective of a single thread, sched_active_pid is constant. Only the scheduler itself needs to take some care accessing it. But a memory barrier should be in place at every context switch anyway, which would spill all registers to SRAM. Only if this barrier is missing (which would be a bug), the volatile would indeed change behavior other than slowing RIOT down.)

fjmolinas · 2021-11-18T09:13:26Z

I think this kind of PR falls into what @maribu has been pushing for replacing by atomics, or irq_disable/enable calls, maybe you would want to take over?

maribu · 2021-11-18T12:45:03Z

I think this kind of PR falls into what @maribu has been pushing for replacing by atomics, or irq_disable/enable calls, maybe you would want to take over?

In fact, most of this fight was already won when we changed the public API to no longer show any trace of volatile. Getting rid of it should be relatively straight forward now.

Note that if irq_disable(), irq_restore(), and irq_enable() are missing (compiler) memory barriers, the drop of volatile will trigger bugs. I wonder if I can write a test the explicitly checks for those barriers. It wasn't too long ago that the RISC-V implementation was still missing this barrier that caused sporadic memory corruptions and failing tests, so there might be other implementations also having them.

mguetschow · 2024-06-11T10:49:01Z

Would you be up for picking this up again @maribu?

jcarrano changed the title ~~[PoC] core: remove "volatile" from active_thread, active_pid.~~ [PoC] core: remove "volatile" from scheduler variables. Feb 27, 2019

jcarrano added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Feb 27, 2019

tests: remove volatile qualifiers from sched_active_thread.

d3a0136

Now that the variables are no longer volatile, the definitions conflict. This makes the test compile again and hopefully validates that removing the volatile qualifier is OK.

jcarrano added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Feb 27, 2019

keestux reviewed Mar 17, 2019

View reviewed changes

maribu mentioned this pull request Apr 24, 2019

sys: Added simple memory barrier API #11438

Closed

keestux mentioned this pull request Jun 4, 2020

tests/thread_flood: account for all threads already in use #14181

Merged

maribu mentioned this pull request Jul 27, 2020

core: add comments to symbols used by OpenOCD #14617

Merged

MrKevinWeiss added this to the Release 2021.07 milestone Jun 22, 2021

MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021

Conversation

jcarrano commented Feb 27, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution description

Testing procedure

Related Issues

Uh oh!

maribu commented Mar 6, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

keestux commented Mar 17, 2019

Uh oh!

keestux commented Mar 17, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

keestux commented Mar 17, 2019

Uh oh!

keestux commented Mar 17, 2019

Uh oh!

keestux commented Mar 17, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jcarrano commented Mar 19, 2019

Uh oh!

miri64 commented Jun 25, 2020

Uh oh!

maribu commented Jun 25, 2020

Uh oh!

fjmolinas commented Nov 18, 2021

Uh oh!

maribu commented Nov 18, 2021

Uh oh!

mguetschow commented Jun 11, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

jcarrano commented Feb 27, 2019 •

edited

Loading

keestux commented Mar 17, 2019 •

edited

Loading