Description
This is related to #10634, but I now have more information about what is going wrong.
I have a large internal docker-compose project with a lot of dependencies.
I run docker build and then press ctrl+c.
The terminal becomes broken (as described here).
The docker-build plugin process does not terminate. It is reparented to PID 1 (it becomes a child of PID 1).
It does not react to TERM signals. It seems that it will keep running forever.
I compiled compose with debug symbols, and the stack traces show that the process
always hangs while resolving dependencies, in code related to nodeCh.
I observed that it gets stuck in 2 different places, sometimes variant 1,
sometimes variant 2.
Variant 1: stuck when receiving from nodeCh
No graphTraversal.run goroutines are running.
1 goroutine executes graphTraversal.visit; it hangs at for node := range nodeCh {.
It will hang there forever because there are no goroutines left that could send something to the channel.
My wild guess is:
Some run functions skipped sending something to nodeCh because this condition matched:
    if len(t.filterAdjacentByStatusFn(graph, node.Key, t.adjacentServiceStatusToSkip)) != 0 {
        continue
    }
It matches because cancelling the ctx cancelled the start of the child services.
No result for those services is sent to nodeCh.
The expect counter in visit() then cannot reach 0, nodeCh does not get closed, and visit() gets stuck in the for node := range nodeCh { loop.
In a run where it happened, expect had the value 99.
Stack traces of one occurrence:
Details
(dlv) goroutines
Goroutine 1 - User: /usr/lib/go/src/runtime/sema.go:62 sync.runtime_Semacquire (0x472ae7) [semacquire 32698615744479]
Goroutine 2 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [force gc (idle) 32698285413127]
Goroutine 3 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [GC sweep wait]
Goroutine 4 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [GC scavenge wait]
Goroutine 5 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 32701968645867]
Goroutine 6 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [select 32698303227839]
Goroutine 7 - User: ./pkg/progress/tty.go:54 github.com/docker/compose/v2/pkg/progress.(*ttyWriter).Start (0x181282d) [select]
Goroutine 8 - User: /usr/lib/go/src/runtime/sema.go:62 sync.runtime_Semacquire (0x472ae7) [semacquire 32698667539124]
Goroutine 9 - User: /home/fho/go/pkg/mod/github.com/moby/[email protected]/util/progress/progressui/display.go:58 github.com/moby/buildkit/util/progress/progressui.DisplaySolveStatus (0x140464a) [select]
Goroutine 11 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32698667539124]
Goroutine 12 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32698667539124]
Goroutine 18 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [finalizer wait 32699818810804]
Goroutine 21 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 32701968645867]
Goroutine 22 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call]
Goroutine 23 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 32701968645867]
Goroutine 30 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32698667539124]
Goroutine 31 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32698667539124]
Goroutine 32 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait]
Goroutine 33 - User: /home/fho/go/pkg/mod/go.opentelemetry.io/otel/[email protected]/trace/batch_span_processor.go:278 go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).processQueue (0x1376865) [select]
Goroutine 34 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 32701968645867]
Goroutine 35 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call]
Goroutine 36 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call]
Goroutine 37 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 32701968645867]
Goroutine 38 - User: /usr/lib/go/src/runtime/sigqueue.go:152 os/signal.signal_recv (0x47336f) (thread 98567)
Goroutine 41 - User: ./pkg/compose/dependencies.go:145 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).visit.func1 (0x273ef07) [chan receive]
Goroutine 58 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32698667539124]
Goroutine 59 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32698667539124]
Goroutine 60 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait]
Goroutine 61 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 32698961561798]
Goroutine 75 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32698667539124]
Goroutine 76 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32698667539124]
Goroutine 77 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait]
Goroutine 78 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select]
Goroutine 79 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select]
Goroutine 85 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32698667539124]
Goroutine 86 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32698667539124]
Goroutine 87 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait]
Goroutine 88 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait]
Goroutine 89 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select]
Goroutine 90 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select]
Goroutine 91 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select]
Goroutine 92 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select]
Goroutine 98 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32698677784949]
Goroutine 99 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32698667539124]
Goroutine 100 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 32698961561798]
Goroutine 101 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 32698961561798]
Goroutine 104 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 32698677784949]
Goroutine 113 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select]
Goroutine 117 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32698667539124]
Goroutine 118 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32698667539124]
Goroutine 119 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait]
Goroutine 120 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 32698961561798]
Goroutine 121 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select]
Goroutine 122 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select]
Goroutine 123 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 32698961561798]
Goroutine 133 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32698667539124]
Goroutine 134 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32698667539124]
Goroutine 135 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait]
Goroutine 136 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select]
Goroutine 137 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select]
Goroutine 146 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 32698677784949]
Goroutine 147 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select]
Goroutine 148 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select]
Goroutine 162 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select]
Goroutine 625 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 32701968645867]
Goroutine 770 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 32701936210863]
Goroutine 5471 - User: /usr/lib/go/src/runtime/sema.go:62 sync.runtime_Semacquire (0x472ae7) [semacquire]
Goroutine 5506 - User: /usr/lib/go/src/runtime/sema.go:62 sync.runtime_Semacquire (0x472ae7) [semacquire 32700863284378]
Goroutine 5512 - User: /home/fho/go/pkg/mod/github.com/tonistiigi/[email protected]/send.go:128 github.com/tonistiigi/fsutil.(*sender).queue (0x13ceb7a) [chan send]
[69 goroutines]
Variant 2: stuck when sending to nodeCh
Multiple goroutines are executing graphTraversal.run, and all of them are stuck
trying to send to nodeCh.
No goroutine that executes graphTraversal.visit is running.
Stacktraces:
Details
(dlv) goroutines
Goroutine 1 - User: /usr/lib/go/src/runtime/sema.go:62 sync.runtime_Semacquire (0x472ae7) [semacquire 31142446935515]
Goroutine 2 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [force gc (idle) 31432637820791]
Goroutine 3 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [GC sweep wait]
Goroutine 4 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [GC scavenge wait]
Goroutine 5 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [finalizer wait 31432637820791]
Goroutine 8 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 31143389232291]
Goroutine 9 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 31143389232291]
Goroutine 10 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 31143201136170]
Goroutine 11 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [select 31142155736244]
Goroutine 12 - User: ./pkg/progress/tty.go:54 github.com/docker/compose/v2/pkg/progress.(*ttyWriter).Start (0x181282d) [select]
Goroutine 13 - User: /usr/lib/go/src/runtime/sema.go:62 sync.runtime_Semacquire (0x472ae7) [semacquire 31142500904315]
Goroutine 14 - User: /home/fho/go/pkg/mod/github.com/moby/[email protected]/util/progress/progressui/display.go:58 github.com/moby/buildkit/util/progress/progressui.DisplaySolveStatus (0x140464a) [select]
Goroutine 15 - User: /home/fho/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:68 golang.org/x/sync/errgroup.(*Group).Go (0xe02f05) [chan send 31142500904315]
Goroutine 16 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 18 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 31143389232291]
Goroutine 19 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call]
Goroutine 20 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 31143389232291]
Goroutine 21 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 22 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 26 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 31142500904315]
Goroutine 27 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 31142500904315]
Goroutine 28 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 31142617292951]
Goroutine 34 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call 31143389232291]
Goroutine 35 - User: /usr/lib/go/src/runtime/proc.go:382 runtime.gopark (0x44431d) [debug call]
Goroutine 36 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 31143389232291]
Goroutine 37 - User: /home/fho/go/pkg/mod/go.opentelemetry.io/otel/[email protected]/trace/batch_span_processor.go:278 go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).processQueue (0x1376865) [select]
Goroutine 38 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/balancer_conn_wrappers.go:112 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x10cf039) [select 31142500904315]
Goroutine 50 - User: /usr/lib/go/src/runtime/sigqueue.go:152 os/signal.signal_recv (0x47336f) (thread 94728)
Goroutine 68 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 31143389232291]
Goroutine 69 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select 31143389232291]
Goroutine 70 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select 31143389232291]
Goroutine 82 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 83 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31143389232291]
Goroutine 84 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 85 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 87 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31143389232291]
Goroutine 88 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 89 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 90 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 91 - User: ./pkg/compose/dependencies.go:187 github.com/docker/compose/v2/pkg/compose.(*graphTraversal).run.func1 (0x273f536) [chan send 31142500904315]
Goroutine 95 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 31142500904315]
Goroutine 96 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 31142500904315]
Goroutine 100 - User: /usr/lib/go/src/net/fd_posix.go:55 net.(*netFD).Read (0x7774d9) [IO wait 31142500904315]
Goroutine 101 - User: /usr/lib/go/src/net/http/transport.go:2410 net/http.(*persistConn).writeLoop (0x96c20e) [select 31142500904315]
Goroutine 102 - User: /usr/lib/go/src/io/pipe.go:57 io.(*pipe).read (0x4d978d) [select 31143389232291]
Goroutine 103 - User: /home/fho/go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:408 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x1051b86) [select 31143389232291]
[46 goroutines]
Steps To Reproduce
Update, added on 5.6.23:
This is actually quite easy to reproduce and also happens with small docker compose projects:
compose.yml:
services:
  s0:
    build:
      context: s0
    depends_on:
      - s1
  s1:
    build:
      context: s0
s0/Dockerfile:
FROM busybox AS build
RUN sleep 20
- Run docker compose build
- Press ctrl+c after ~2-3 sec
I expect docker compose to terminate after ~20 sec at the latest, once the sleep in the Dockerfile has finished.
But it never terminates.
Compose Version
Docker Compose version 2.18.1
Docker Environment
Client:
Version: 24.0.2
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: 0.10.5
Path: /usr/lib/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: 2.18.1
Path: /usr/lib/docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.1.0-280-gc7fa31d4c4
Path: /usr/lib/docker/cli-plugins/docker-scan
Server:
Containers: 24
Running: 1
Paused: 0
Stopped: 23
Images: 133
Server Version: 24.0.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 1677a17964311325ed1c31e2c0a3589ce6d5c30d.m
runc version:
init version: de40ad0
Security Options:
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.3.5-arch1-1
Operating System: Arch Linux
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.49GiB
Name: ltop
ID: KG5D:KMFV:DN27:F7AC:JLAN:USYW:MEB7:TXZQ:LASN:WAKE:OPJD:SULT
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: sisubot
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Anything else?
- I can not reproduce it with Docker Compose version v2.17.3
- DOCKER_BUILDKIT is enabled
- It also gets stuck with --parallel=1
My first idea for a fix was to check whether the context got cancelled when receiving from or sending to nodeCh.
This works; the issue does not happen anymore.
But this solution now seems to me like a workaround for a bug in another place.
A state in which only senders or only receivers for nodeCh are still running should not be possible. Maybe this situation can also occur when the ctx is not cancelled.
You can find that change here: fho@677c5fb