runtime/runc: selectively lock events to reduce contention issues #8598
Conversation
Edit: I was running some more tests and this doesn't completely solve the issue – every now and then, `containerd/runtime/v2/runc/task/service.go:536` (at c7b9a95) locks everything else out. I've got some more changes to address that which I'll include in a separate commit (tomorrow, I need sleep 😓), but for right now I'll draft this PR.

Update:

The above write-up is correct, but misses another cause of delays in `containerd/runtime/v2/runc/task/service.go:526` (at 4b7145c). Specifically, `containerd/runtime/v2/runc/task/service.go:536` and `containerd/pkg/process/utils.go:52` (at 4b7145c) contend with `containerd/pkg/process/exec.go:173` (at 4b7145c):

```go
func (e *execProcess) start(ctx context.Context) (err error) {
	// The reaper may receive exit signal right after
	// the container is started, before the e.pid is updated.
	// In that case, we want to block the signal handler to
	// access e.pid until it is updated.
	e.pid.Lock()
	defer e.pid.Unlock()
```

This "preserving causality" business is annoying 😅 Alright, I've amended the commit to address this contention as well.
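To make the contention described above easier to picture, here is a minimal standalone sketch (illustrative only, not containerd code): a starter goroutine holds a mutex guarding the PID for the whole startup, so anything that needs to read the PID — the role `checkProcesses` plays — blocks until startup completes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// safePid is an illustrative mutex-guarded PID: readers must wait
// until the starter releases the lock after the PID has been set.
type safePid struct {
	sync.Mutex
	pid int
}

func (s *safePid) get() int {
	s.Lock()
	defer s.Unlock()
	return s.pid
}

func main() {
	var p safePid
	var wg sync.WaitGroup

	// "start": hold the lock for the duration of startup, like the
	// e.pid.Lock()/Unlock() span in execProcess.start above.
	wg.Add(1)
	go func() {
		defer wg.Done()
		p.Lock()
		defer p.Unlock()
		time.Sleep(200 * time.Millisecond) // simulate a slow runc exec
		p.pid = 1234
	}()

	time.Sleep(10 * time.Millisecond) // let the starter grab the lock first

	// "checkProcesses": reading the PID blocks until start finishes.
	begin := time.Now()
	fmt.Printf("pid=%d after %v\n", p.get(), time.Since(begin))
	wg.Wait()
}
```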
```go
		s.sendProcessEvent(e, p, c)
	}
	for p, c := range missing {
		go tryMissing(e, p, c)
```
The event channel is closed on shutdown and shutdown is protected by the same s.mu lock. Is completing this event outside of the lock an important part of reducing the contention issue? If not, it's probably safest just to ensure these complete before returning.
The most relevant change for resolving the contention issue is not blocking checkProcesses on processes that haven't started yet – so ensuring these complete before returning isn't a great solution and reintroduces a big part of the issue. However, we can move around where we lock on s.mu to make sure we don't send events outside of it.
I've made some changes, let me know if that LGTY @dmcgowan :)
The channel close/send behavior is pretty annoying in Go as it necessitates additional locking and status checking to avoid a possible panic. It might warrant an explicit shutdown state variable and mutex for send/close.
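As a rough sketch of the pattern being suggested (not code from this PR; the names here are made up), guarding both the send and the close with one mutex and an explicit shutdown flag avoids the send-on-closed-channel panic:

```go
package shim // illustrative package name

import "sync"

// eventSender guards channel send/close with a shutdown flag so a
// late send can never panic on a closed channel.
type eventSender struct {
	mu       sync.Mutex
	events   chan interface{} // assumed buffered and drained by a consumer,
	shutdown bool             // so sending under mu doesn't block closeEvents
}

func (s *eventSender) send(evt interface{}) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.shutdown {
		return false // drop (or redirect) events after shutdown
	}
	s.events <- evt
	return true
}

func (s *eventSender) closeEvents() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.shutdown {
		s.shutdown = true
		close(s.events)
	}
}
```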
Addresses performance issues caused by lock contention in `runtime/v2/runc/task/service.go` due to `checkProcesses()` and `Start()` both locking on `s.eventSendMU` (noticeable when starting a lot of tasks concurrently on the same container). Instead, we selectively lock on a per-process basis, preserving start-exit event order. Also resolves `checkProcesses()` performance issues caused by locking while fetching the PID for processes that are still starting: already-started processes are checked optimistically, and only if needed is a separate goroutine queued to wait for the starting process.

Signed-off-by: Laura Brehm <[email protected]>
corhere left a comment:
I spy a race condition.
```go
	if _, starting := s.processEventsLocker.Load(p.ID()); !starting {
		break
	}
	time.Sleep(50 * time.Millisecond)
```
We can do better than polling. Store a channel in s.processEventsLocker and close it when deleting the entry from the map. Then this function can receive on the channel to wait.
```go
if startedCh, ok := s.processEventsLocker.Load(p.ID()); ok {
	<-startedCh.(chan struct{})
}
```

```go
	containers:          make(map[string]*runc.Container),
	context:             ctx,
	events:              make(chan interface{}, 128),
	processEventsLocker: sync.Map{},
```
Suggested change:

```diff
-	processEventsLocker: sync.Map{},
```
You don't need to explicitly initialize a field to its zero value.
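For example (a generic illustration, not this PR's code), the zero value of sync.Map is an empty map that is ready to use without any initialization:

```go
package main

import (
	"fmt"
	"sync"
)

type service struct {
	// The zero value of sync.Map is an empty, usable map; no explicit
	// initialization is needed in the constructor.
	processEventsLocker sync.Map
}

func main() {
	s := &service{} // processEventsLocker is usable as-is
	s.processEventsLocker.Store("exec-1", struct{}{})
	_, ok := s.processEventsLocker.Load("exec-1")
	fmt.Println(ok) // true
}
```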
This used to be a map[... 😅 didn't catch that, thanks :)
```go
	if !container.HasPid(e.Pid) {
		continue
	}
	containers := s.containers
```
Maps are reference types, similar to slices. Copying a map by value only copies the reference to the underlying mutable data structure, so this does not make it safe to access containers without holding s.mu. You would have to deep-copy the map to make this work.
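A standalone illustration of the point (hypothetical names, not this PR's code): assigning a map to another variable shares the same backing storage, so a safe snapshot has to copy the entries while holding the lock.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.Mutex

	containers := map[string]string{"c1": "running"}

	// "Copying" the map only copies the map header; both variables
	// still point at the same underlying data.
	snapshot := containers
	containers["c2"] = "created"
	fmt.Println(len(snapshot)) // 2 — the "copy" sees the mutation

	// A safe snapshot copies the entries while holding the lock.
	mu.Lock()
	deep := make(map[string]string, len(containers))
	for id, state := range containers {
		deep[id] = state
	}
	mu.Unlock()
	fmt.Println(len(deep)) // 2, and now independent of containers
}
```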
```go
	lockKey := r.ExecID
	if lockKey == "" {
		lockKey = r.ID
	}
```
It looks like r.ExecID only needs to be unique for a given r.ID, so I'm pretty sure the map key would need to be the tuple (r.ID, r.ExecID).

```go
// Naming things is hard. Try to think of a better name than this.
type processEventsKey struct{ ID, ExecID string }

s.processEventsLocker.Store(processEventsKey{ID: r.ID, ExecID: r.ExecID}, ...)
```

Alternatively, have a separate map for each container, e.g. by moving it to the runc.Container struct?
Context
See: #8557 (comment) (writeup of the issue)
fixes #8557
What I did
Addresses the performance issue by reducing contention between `containerd/runtime/v2/runc/task/service.go:154` and `containerd/runtime/v2/runc/task/service.go:526` (both at 4b7145c). This is done by removing `containerd/runtime/v2/runc/task/service.go:104` (at 4b7145c) and instead individually keeping track of starting processes, which allows `checkProcesses()` to optimistically check unblocked/already-started processes and, only if necessary, separately queue events for blocked processes to be retried on a separate goroutine. This model significantly reduces contention around this area, allowing `containerd/runtime/v2/runc/task/service.go:526` (at 4b7145c) to process the incoming events, which decongests the line down to `containerd/sys/reaper/reaper_unix.go:64` (at 4b7145c).
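For readers skimming the diff, here is a heavily simplified sketch of the idea (names and structure are illustrative, not the actual patch, and it uses the channel-based wait suggested in review rather than polling): Start registers each process as "starting", and checkProcesses sends exit events immediately for processes that have already started, handing only the still-starting ones off to a goroutine that waits before sending.

```go
package task // illustrative package, not the real shim service

import "sync"

type exitEvent struct{ ID string; Pid, Status int }

type service struct {
	events chan interface{}
	// starting maps a process ID to a channel that is closed once the
	// process has finished starting (and its start event has been sent).
	starting sync.Map // map[string]chan struct{}
}

// markStarting is called at the beginning of Start().
func (s *service) markStarting(id string) {
	s.starting.Store(id, make(chan struct{}))
}

// markStarted is called once Start() has the PID and has sent the start event.
func (s *service) markStarted(id string) {
	if ch, ok := s.starting.LoadAndDelete(id); ok {
		close(ch.(chan struct{}))
	}
}

// sendExit is the checkProcesses() side: an optimistic fast path for
// processes that already started, and a goroutine only for the ones
// that are still starting.
func (s *service) sendExit(e exitEvent) {
	ch, stillStarting := s.starting.Load(e.ID)
	if !stillStarting {
		s.events <- e // fast path: no waiting, no service-wide lock
		return
	}
	go func() {
		<-ch.(chan struct{}) // wait for the start event to go out first
		s.events <- e        // preserves start-before-exit ordering
	}()
}
```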
Benchmarks
(using the repro steps from #8557 (comment))

[benchmark results: baseline vs. with this change]

With some instrumenting, we can also confirm that the intended effect is achieved in the notify goroutine:

[instrumentation results: baseline vs. with this change]
callout: I believe this solution preserves the intended behaviour re: ensuring start events are sent before exit events (however well this worked before), but PTAL/discuss/propose alternate ways of handling the issue.
(bonus) cute animal