tl;dr: `docker run --privileged` tries to assign all known capabilities, while `dockerd` itself might not hold all of them.
Background: starting with version 0.13, Talos drops two capabilities (kexec and module loading) from all processes except PID 1. Talos itself doesn't use `dockerd`, but if I launch a privileged pod on Kubernetes with the `docker:20.10-dind` image, I can't run any privileged container inside it:
    / # docker run -it --rm --privileged alpine
    docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: apply caps: operation not permitted: unknown.
The problem starts in moby/daemon/exec_linux.go (line 25 in 306fa44):

    p.Capabilities.Bounding = caps.GetAllCapabilities()

Which essentially uses the list of capabilities built in moby/oci/caps/utils.go (lines 23 to 37 in 306fa44):

    func init() {
        last := capability.CAP_LAST_CAP
        rawCaps := capability.List()
        allCaps = make([]string, min(int(last+1), len(rawCaps)))
        capabilityList = make(map[string]*capability.Cap, len(rawCaps))
        for i, c := range rawCaps {
            capName := "CAP_" + strings.ToUpper(c.String())
            if c > last {
                capabilityList[capName] = nil
                continue
            }
            allCaps[i] = capName
            capabilityList[capName] = &c
        }
    }

This list contains every capability the host kernel knows about, on the assumption that all of them are available to the daemon. That might not be true, as some capabilities might already have been dropped.
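
To illustrate the mismatch, here is a minimal sketch (not the moby code) that filters a capability list against a bounding-set bitmask. The `knownCap` type, `sampleCaps`, and `filterByMask` are hypothetical names made up for this example, and the mask assumes the two dropped capabilities are `CAP_SYS_MODULE` (bit 16) and `CAP_SYS_BOOT` (bit 22), which is my reading of "kexec + module loading":

```go
package main

import "fmt"

// knownCap pairs a capability name with its bit number from
// linux/capability.h. Only a few entries are listed for illustration.
type knownCap struct {
	name string
	bit  uint
}

// sampleCaps is a hypothetical, deliberately incomplete list.
var sampleCaps = []knownCap{
	{"CAP_CHOWN", 0},
	{"CAP_NET_ADMIN", 12},
	{"CAP_SYS_MODULE", 16},
	{"CAP_SYS_ADMIN", 21},
	{"CAP_SYS_BOOT", 22},
}

// filterByMask keeps only the capabilities whose bit is set in mask.
func filterByMask(caps []knownCap, mask uint64) []string {
	var out []string
	for _, c := range caps {
		if mask&(uint64(1)<<c.bit) != 0 {
			out = append(out, c.name)
		}
	}
	return out
}

func main() {
	// Assumed bounding set: bits 0..40 set, minus bits 16 and 22.
	mask := uint64(0x000001ffffbeffff)
	fmt.Println(filterByMask(sampleCaps, mask))
	// prints [CAP_CHOWN CAP_NET_ADMIN CAP_SYS_ADMIN]
}
```

Requesting the unfiltered list in such an environment asks for capabilities that are outside the bounding set, which would be consistent with the `apply caps: operation not permitted` error above.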
containerd's OCI code does the proper thing:
https://github.com/containerd/containerd/blob/d193dc2b8afb1467255cea5326e9807514f94c0f/pkg/cap/cap_linux.go#L123-L136
I'm happy to send a PR, but what is the best way to solve this in the docker codebase?