runtime/runc: selectively lock events to reduce contention issues #8598
Conversation
Edit: I was running some more tests and this doesn't completely solve the issue – every now and then, `containerd/runtime/v2/runc/task/service.go:536` (at c7b9a95) locks everything else out. I've got some more changes to address that which I'll include in a separate commit (tomorrow, I need sleep 😓), but for right now I'll draft this PR.

Update:

The above write-up is correct, but misses another cause of delays in `containerd/runtime/v2/runc/task/service.go:526` (at 4b7145c). Specifically, `containerd/runtime/v2/runc/task/service.go:536` and `containerd/pkg/process/utils.go:52` (at 4b7145c) contend with `containerd/pkg/process/exec.go:173` (at 4b7145c):

```go
func (e *execProcess) start(ctx context.Context) (err error) {
	// The reaper may receive exit signal right after
	// the container is started, before the e.pid is updated.
	// In that case, we want to block the signal handler to
	// access e.pid until it is updated.
	e.pid.Lock()
	defer e.pid.Unlock()
```

This "preserving causality" business is annoying 😅 Alright, I've amended the commit to address this contention as well.
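To make the contention described above easier to picture, here is a minimal standalone sketch (illustrative only, not containerd code): a starter goroutine holds a mutex guarding the PID for the whole startup, so anything that needs to read the PID — the role `checkProcesses` plays — blocks until startup completes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// safePid is an illustrative mutex-guarded PID: readers must wait
// until the starter releases the lock after the PID has been set.
type safePid struct {
	sync.Mutex
	pid int
}

func (s *safePid) get() int {
	s.Lock()
	defer s.Unlock()
	return s.pid
}

func main() {
	var p safePid
	var wg sync.WaitGroup

	// "start": hold the lock for the duration of startup, like the
	// e.pid.Lock()/Unlock() span in execProcess.start above.
	wg.Add(1)
	go func() {
		defer wg.Done()
		p.Lock()
		defer p.Unlock()
		time.Sleep(200 * time.Millisecond) // simulate a slow runc exec
		p.pid = 1234
	}()

	time.Sleep(10 * time.Millisecond) // let the starter grab the lock first

	// "checkProcesses": reading the PID blocks until start finishes.
	begin := time.Now()
	fmt.Printf("pid=%d after %v\n", p.get(), time.Since(begin))
	wg.Wait()
}
```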
```go
		s.sendProcessEvent(e, p, c)
	}
	for p, c := range missing {
		go tryMissing(e, p, c)
```
The event channel is closed on shutdown and shutdown is protected by the same s.mu lock. Is completing this event outside of the lock an important part of reducing the contention issue? If not, it's probably safest just to ensure these complete before returning.
The most relevant change for resolving the contention issue is not blocking checkProcesses on processes that haven't started yet – so ensuring these complete before returning isn't a great solution and reintroduces a big part of the issue. However, we can move around where we lock on s.mu to make sure we don't send events outside of it.
I've made some changes, let me know if that LGTY @dmcgowan :)
The channel close/send behavior is pretty annoying in Go as it necessitates additional locking and status checking to avoid a possible panic. It might warrant an explicit shutdown state variable and mutex for send/close.
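As a rough sketch of the pattern being suggested (not code from this PR; the names here are made up), guarding both the send and the close with one mutex and an explicit shutdown flag avoids the send-on-closed-channel panic:

```go
package shim // illustrative package name

import "sync"

// eventSender guards channel send/close with a shutdown flag so a
// late send can never panic on a closed channel.
type eventSender struct {
	mu       sync.Mutex
	events   chan interface{} // assumed buffered and drained by a consumer,
	shutdown bool             // so sending under mu doesn't block closeEvents
}

func (s *eventSender) send(evt interface{}) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.shutdown {
		return false // drop (or redirect) events after shutdown
	}
	s.events <- evt
	return true
}

func (s *eventSender) closeEvents() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.shutdown {
		s.shutdown = true
		close(s.events)
	}
}
```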
Addresses performance issues caused by lock contention in `runtime/v2/runc/task/service.go` due to `checkProcesses()` and `Start()` both locking on `s.eventSendMU` (noticeable when starting a lot of tasks concurrently on the same container). Instead, we selectively lock on a per-process basis, preserving start-exit event order. Also resolves `checkProcesses()` performance issues caused by locking while fetching the PID for processes that are still starting: already-started processes are checked optimistically, and only if needed is a separate goroutine queued to wait for the starting process.

Signed-off-by: Laura Brehm <[email protected]>
corhere left a comment:
I spy a race condition.
```go
	if _, starting := s.processEventsLocker.Load(p.ID()); !starting {
		break
	}
	time.Sleep(50 * time.Millisecond)
```
We can do better than polling. Store a channel in s.processEventsLocker and close it when deleting the entry from the map. Then this function can receive on the channel to wait.
```go
if startedCh, ok := s.processEventsLocker.Load(p.ID()); ok {
	<-startedCh.(chan struct{})
}
```

```go
	containers:          make(map[string]*runc.Container),
	context:             ctx,
	events:              make(chan interface{}, 128),
	processEventsLocker: sync.Map{},
```
Suggested change:

```diff
-	processEventsLocker: sync.Map{},
```
You don't need to explicitly initialize a field to its zero value.
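For example (a generic illustration, not this PR's code), the zero value of sync.Map is an empty map that is ready to use without any initialization:

```go
package main

import (
	"fmt"
	"sync"
)

type service struct {
	// The zero value of sync.Map is an empty, usable map; no explicit
	// initialization is needed in the constructor.
	processEventsLocker sync.Map
}

func main() {
	s := &service{} // processEventsLocker is usable as-is
	s.processEventsLocker.Store("exec-1", struct{}{})
	_, ok := s.processEventsLocker.Load("exec-1")
	fmt.Println(ok) // true
}
```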
This used to be a map[... 😅 didn't catch that, thanks :)
```go
	if !container.HasPid(e.Pid) {
		continue
	}
	containers := s.containers
```
Maps are reference types, similar to slices. Copying a map by value only copies the reference to the underlying mutable data structure, so this does not make it safe to access containers without holding s.mu. You would have to deep-copy the map to make this work.
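A standalone illustration of the point (hypothetical names, not this PR's code): assigning a map to another variable shares the same backing storage, so a safe snapshot has to copy the entries while holding the lock.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.Mutex

	containers := map[string]string{"c1": "running"}

	// "Copying" the map only copies the map header; both variables
	// still point at the same underlying data.
	snapshot := containers
	containers["c2"] = "created"
	fmt.Println(len(snapshot)) // 2 — the "copy" sees the mutation

	// A safe snapshot copies the entries while holding the lock.
	mu.Lock()
	deep := make(map[string]string, len(containers))
	for id, state := range containers {
		deep[id] = state
	}
	mu.Unlock()
	fmt.Println(len(deep)) // 2, and now independent of containers
}
```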
```go
	lockKey := r.ExecID
	if lockKey == "" {
		lockKey = r.ID
	}
```
It looks like r.ExecID only needs to be unique for a given r.ID, so I'm pretty sure the map key would need to be the tuple (r.ID, r.ExecID).

```go
// Naming things is hard. Try to think of a better name than this.
type processEventsKey struct{ ID, ExecID string }

s.processEventsLocker.Store(processEventsKey{ID: r.ID, ExecID: r.ExecID}, ...)
```

Alternatively, have a separate map for each container, e.g. by moving it to the runc.Container struct?
Context
See: #8557 (comment) (writeup of the issue)
fixes #8557
What I did
Addresses the performance issue by reducing contention between `containerd/runtime/v2/runc/task/service.go:154` and `containerd/runtime/v2/runc/task/service.go:526` (both at 4b7145c). This is done by removing `containerd/runtime/v2/runc/task/service.go:104` (at 4b7145c) and instead individually keeping track of starting processes, which allows `checkProcesses()` to optimistically check unblocked/already-started processes and, only if necessary, separately queue events for blocked processes to be retried on a separate goroutine. This model significantly reduces contention around this area, allowing `containerd/runtime/v2/runc/task/service.go:526` (at 4b7145c) to process the incoming events, which decongests the line down to `containerd/sys/reaper/reaper_unix.go:64` (at 4b7145c).
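For readers skimming the diff, here is a heavily simplified sketch of the idea (names and structure are illustrative, not the actual patch, and it uses the channel-based wait suggested in review rather than polling): Start registers each process as "starting", and checkProcesses sends exit events immediately for processes that have already started, handing only the still-starting ones off to a goroutine that waits before sending.

```go
package task // illustrative package, not the real shim service

import "sync"

type exitEvent struct{ ID string; Pid, Status int }

type service struct {
	events chan interface{}
	// starting maps a process ID to a channel that is closed once the
	// process has finished starting (and its start event has been sent).
	starting sync.Map // map[string]chan struct{}
}

// markStarting is called at the beginning of Start().
func (s *service) markStarting(id string) {
	s.starting.Store(id, make(chan struct{}))
}

// markStarted is called once Start() has the PID and has sent the start event.
func (s *service) markStarted(id string) {
	if ch, ok := s.starting.LoadAndDelete(id); ok {
		close(ch.(chan struct{}))
	}
}

// sendExit is the checkProcesses() side: an optimistic fast path for
// processes that already started, and a goroutine only for the ones
// that are still starting.
func (s *service) sendExit(e exitEvent) {
	ch, stillStarting := s.starting.Load(e.ID)
	if !stillStarting {
		s.events <- e // fast path: no waiting, no service-wide lock
		return
	}
	go func() {
		<-ch.(chan struct{}) // wait for the start event to go out first
		s.events <- e        // preserves start-before-exit ordering
	}()
}
```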
Benchmarks
(using the repro steps from #8557 (comment))

[benchmark results: baseline vs. with this change]

With some instrumenting, we can also confirm that the intended effect is achieved in the notify goroutine:

[instrumentation results: baseline vs. with this change]
callout: I believe this solution preserves the intended behaviour re: ensuring start events are sent before exit events (however well this worked before), but PTAL/discuss/propose alternate ways of handling the issue.
(bonus) cute animal