Description
findRunnable makes a copy of the slice header of allp because once findRunnable drops its P a STW can change allp without synchronization (changing length and possibly allocating a new backing array) via procresize (GOMAXPROCS change).
"Possibly allocating a new backing array" is the problem, since allp is simply a standard heap allocation. allpSnapshot is on the system stack of an M without a P. This means that (a) STW can proceed without stopping the M in findRunnable and (b) the GC will not scan allpSnapshot. Thus, we could have this sequence:
- M1 copies the allp slice header to allpSnapshot.
- M1 drops its P.
- M2 calls runtime.GOMAXPROCS.
- STW need not stop M1.
- procresize reallocates the allp backing array. The old array is now only referenced by M1's allpSnapshot.
- World restarts.
- M2 triggers a GC.
- STW need not stop M1.
- GC does not scan M1's system stack, so it does not find a reference to the old allp array.
- Old allp array is freed.
- World restarts.
- M2 allocates something which happens to reuse the same memory as the old allp array, which zeroes it (and then maybe writes to it).
- M1 reads from allpSnapshot, reading the now-clobbered array.
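To make the aliasing concrete, here is a minimal user-space sketch of the first half of that sequence. It is not runtime code: the p type, the resize function, and the sameBacking helper are hypothetical stand-ins, and resize only approximates what procresize does when it grows allp. In ordinary Go code the GC scans the snapshot and keeps the old array alive, so this program is safe; the bug arises only because the runtime's allpSnapshot lives on a system stack that the GC does not scan.

```go
package main

import "fmt"

// p is a simplified stand-in for the runtime's per-P structure (hypothetical).
type p struct{ id int }

// allp mimics the runtime's global slice of Ps.
var allp []*p

// resize is a rough analogue of procresize: it reuses the backing array
// when capacity allows and allocates a new one otherwise.
func resize(n int) {
	if n <= cap(allp) {
		allp = allp[:n]
	} else {
		nallp := make([]*p, n)
		copy(nallp, allp)
		allp = nallp
	}
	for i := range allp {
		if allp[i] == nil {
			allp[i] = &p{id: i}
		}
	}
}

// sameBacking reports whether two non-empty slices start at the same element.
func sameBacking(a, b []*p) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

func main() {
	resize(4)
	allpSnapshot := allp // copies only the slice header, like findRunnable
	resize(8)            // grows past capacity: new backing array

	// The snapshot still refers to the old array, which allp no longer does.
	fmt.Println(sameBacking(allpSnapshot, allp)) // false
	fmt.Println(len(allpSnapshot), len(allp))    // 4 8
}
```

After the second resize, the snapshot is the only remaining reference to the old array, which is exactly the state in which an unscanned reference becomes dangerous.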
This is only possible if GOMAXPROCS increases beyond the initial startup value (to trigger reallocation).
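For reference, a sketch of the kind of call that would satisfy this precondition from user code. The helper name raiseGOMAXPROCS is hypothetical; runtime.GOMAXPROCS(0) queries the current setting without changing it, and a positive argument sets a new value and returns the previous one. Whether a given increase actually reallocates allp depends on the slice's existing capacity inside the runtime, which this program cannot observe.

```go
package main

import (
	"fmt"
	"runtime"
)

// raiseGOMAXPROCS increases GOMAXPROCS by delta and returns the
// value before and after the change. (Hypothetical helper.)
func raiseGOMAXPROCS(delta int) (before, after int) {
	before = runtime.GOMAXPROCS(0)  // query only, no change
	runtime.GOMAXPROCS(before + delta) // stops the world and resizes the set of Ps
	after = runtime.GOMAXPROCS(0)
	return before, after
}

func main() {
	before, after := raiseGOMAXPROCS(1)
	fmt.Println(after == before+1)
}
```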
This also requires M1 to run really slowly to lose the race. M2 needs to do multiple stop-the-worlds and run an entire GC all before M1 manages to finish using allpSnapshot. That seems pretty far-fetched, but could be possible if the kernel deschedules M1.
cc @golang/runtime