Skip to content

Image remove blocked by a container lock #37072

@dnephin

Description

@dnephin

This issue occurs when trying to deploy new hosts using 18.03.1-ce. Previously workloads would run correctly using ~17.06.

Trying to image rm any image will block and never complete. There were 5 containers running on the host. Trying to inspect or rm two of them had the same behaviour (blocked and never complete).

Full output from SIGUSR1:
goroutine-stacks-2018-05-15T151422Z.log

There are a bunch of goroutines stuck on image remove. There is also one stuck on container remove that seems to be holding the lock:

1: select [728 minutes] [Created by http.(*Server).Serve @ server.go:2720]
    transport     transport.go:256                (*Stream).Header(*Stream(#502), #659, #264, #689)
    grpc          call.go:64                      recvResponse(Context(#721), dialOptions(#189), ClientTransport(#685), *callInfo(0x0), *Stream(0x0), interface{}(0x0), ...)
    grpc          call.go:279                     invoke(Context(#721), string(#642, len=40), interface{}(#651), interface{}(#654), *ClientConn(#150), CallOption(0x0), ...)
    containerd    grpc.go:18                      namespaceInterceptor.unary(Context(#636), string(#720, len=842350551064), interface{}(#642), interface{}(#651), *ClientConn(#654), UnaryInvoker(#693), ...)
    containerd    grpc.go:34                      unary)-fm(string(#720, len=842350551064), #642, 0x28, #651, #517, #654, #693, #150, #658, ...)
    grpc          call.go:141                     Invoke(Context(#720), string(#642, len=40), interface{}(#651), interface{}(#654), *ClientConn(#150), CallOption(0x0), ...)
    v1            tasks.pb.go:468                 (*tasksClient).Kill(*tasksClient(#429), Context(#720), *KillRequest(#517), CallOption(0x0), CallOption(0x0), CallOption(#396))
    containerd    task.go:181                     (*task).Kill(*task(#533), Context(#687), Signal(0x9), KillOpts(0x0), KillOpts(0x0))
    libcontainerd client_daemon.go:389            (*client).SignalProcess(*client(#132), Context(#687), string(#316, len=64), string(#635, len=4), 9, #305, #434)
    daemon        kill.go:179                     (*Daemon).kill(*Daemon(#33), *Container(#548), 9, 0x2, 0x2)
    daemon        kill.go:99                      (*Daemon).killWithSignal(*Daemon(#33), *Container(#548), 9, 0x0, 0x0)
    daemon        kill.go:169                     (*Daemon).killPossiblyDeadProcess(*Daemon(#33), *Container(#548), 9, #265, #615)
    daemon        kill.go:129                     (*Daemon).Kill(*Daemon(#33), *Container(#548), 0x0, 0x0)
    daemon        delete.go:93                    (*Daemon).cleanupContainer(*Daemon(#33), *Container(#548), bool(0x101), bool(0x0))
    daemon        delete.go:48                    (*Daemon).ContainerRm(*Daemon(#33), string(#567, len=64), *ContainerRmConfig(#237), 0x0, 0x0)
    container     container_routes.go:494         (*containerRouter).deleteContainers(*containerRouter(#208), Context(#721), ResponseWriter(#684), *Request(#315), map[string]string(#514), 0x5)
    container     container.go:68                 deleteContainers)-fm(*containerRouter(#721), #516, #684, #274, #315, #514, #721, #516)
    middleware    experimental.go:27              ExperimentalMiddleware.WrapHandler.func1(func(#721), #684, #274, #315, #514, #721, #516)
    middleware    version.go:62                   VersionMiddleware.WrapHandler.func1(func(#721), #684, #274, #315, #514, #606, 0x40)
    authorization middleware.go:59                (*Middleware).WrapHandler.func1(*Middleware(#721), func(#515), #274, #315, #514, #721, #515)
    server        server.go:137                   (*Server).makeHTTPHandler.func1(*Server(#684), APIFunc(#274))
    http          server.go:1918                  HandlerFunc.ServeHTTP(ReadCloser(#203), func(#274))
    mux           mux.go:103                      (*Router).ServeHTTP(*Router(#42), ResponseWriter(#684), *Request(#315))
    server        router_swapper.go:29            (*routerSwapper).ServeHTTP(*routerSwapper(#238), ResponseWriter(#684), *Request(#315))
    http          server.go:2619                  serverHandler.ServeHTTP(*Server(#43), #684, #274, #315)
    http          server.go:1801                  (*conn).serve(#508, #686, #466)
1: semacquire [16 minutes] [Created by http.(*Server).Serve @ server.go:2720]
    sync          sema.go:71                      runtime_SemacquireMutex(*uint32(#491), bool(0x0))
    sync          mutex.go:134                    (*Mutex).Lock(*Mutex(#490))
    container     state.go:241                    (*State).IsRunning(*State(#490), 0x47)
    daemon        image_delete.go:378             (*Daemon).checkImageDeleteConflict.func1(*Daemon(#548), ID(#579))
    container     memory_store.go:62              (*memoryStore).First(*memoryStore(#188), StoreFilter(#547))
    daemon        image_delete.go:380             (*Daemon).checkImageDeleteConflict(*Daemon(#33), ID(#538), conflictType(0xf))
    daemon        image_delete.go:313             (*Daemon).imageDeleteHelper(*Daemon(#33), ID(#538), *<unknown>(#523), bool(0x10100), bool(0x1))
    daemon        image_delete.go:178             (*Daemon).ImageDelete(*Daemon(#33), string(#504, len=11), bool(0x100), bool(#616), #644, #554)
    image         image_routes.go:199             (*imageRouter).deleteImages(*imageRouter(#227), Context(#721), ResponseWriter(#684), *Request(#79), map[string]string(#476), 0x5)
    image         image.go:42                     deleteImages)-fm(*imageRouter(#721), #478, #684, #266, #79, #476, #721, #478)
    middleware    experimental.go:27              ExperimentalMiddleware.WrapHandler.func1(func(#721), #684, #266, #79, #476, #721, #478)
    middleware    version.go:62                   VersionMiddleware.WrapHandler.func1(func(#721), #684, #266, #79, #476, #700, 0x40)
    authorization middleware.go:59                (*Middleware).WrapHandler.func1(*Middleware(#721), func(#477), #266, #79, #476, #721, #477)
    server        server.go:137                   (*Server).makeHTTPHandler.func1(*Server(#684), APIFunc(#266))
    http          server.go:1918                  HandlerFunc.ServeHTTP(ReadCloser(#74), func(#266))
    mux           mux.go:103                      (*Router).ServeHTTP(*Router(#42), ResponseWriter(#684), *Request(#79))
    server        router_swapper.go:29            (*routerSwapper).ServeHTTP(*routerSwapper(#238), ResponseWriter(#684), *Request(#79))
    http          server.go:2619                  serverHandler.ServeHTTP(*Server(#43), #684, #266, #79)
    http          server.go:1801                  (*conn).serve(#503, #686, #568)

I looked around for related issues, but couldn't find anything recent.

$ docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:18:46 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:16:59 2018
  OS/Arch:      linux/amd64
  Experimental: false
$ docker info
Containers: 40
 Running: 34
 Paused: 0
 Stopped: 6
Images: 392
Server Version: 18.03.1-ce
Storage Driver: aufs
 Backing Filesystem: extfs
 Dirs: 4383
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Kernel Version: 4.4.0-124-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 58.97GiB

Metadata

Metadata

Labels

area/runtimeRuntimekind/bugBugs are bugs. The cause may or may not be known at triage time so debugging may be needed.status/needs-attentionCalls for a collective discussion during a review sessionversion/18.03

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions