Description
When running Docker in Docker with the outer container using the sysbox container runtime, starting a container with --net=host from the inner Docker daemon fails like this:
$ docker run -it --rm --net=host alpine:latest whoami
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
8a49fdb3b6a5: Pull complete
Digest: sha256:02bb6f428431fbc2809c5d1b41eab5a68350194fb508869a33cb1af4444c9b11
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: failed to create default sandbox: permission denied.
By git-bisecting, I've determined that commit 0246332 introduced the bug.
I think the relevant code is:
func createNetworkNamespace(path string, osCreate bool) error {
	if err := createNamespaceFile(path); err != nil {
		return err
	}
	do := func() error {
		return mountNetworkNamespace(fmt.Sprintf("/proc/self/task/%d/ns/net", unix.Gettid()), path)
	}
	if osCreate {
		return unshare.Go(unix.CLONE_NEWNET, do, nil)
	}
	return do()
}
With --net=host, osCreate is false, so do() is called directly rather than through unshare.Go. I think there is a subtle bug here: unix.Gettid() is evaluated to fill in the /proc/ path, but without unshare the actual mount system call can happen on a different thread. On a "regular" Docker install, with the daemon running in the root namespace, Linux lets this mount system call succeed, but inside sysbox, mounting the network namespace of one thread from a different thread is forbidden.
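To illustrate the scheduling behaviour this hypothesis rests on: the Go runtime may move a goroutine between OS threads unless the goroutine is pinned with runtime.LockOSThread. The sketch below is not libnetwork code; the helper name tidStableWhileLocked is hypothetical, and it uses syscall.Gettid from the standard library in place of unix.Gettid so that it has no external dependencies. It only shows that a thread ID stays valid across scheduling points while the goroutine is locked to its thread, a guarantee that a bare call to do() does not have.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// tidStableWhileLocked (hypothetical helper) pins the calling goroutine to its
// current OS thread and reports whether the thread ID is unchanged after the
// goroutine has yielded to the scheduler the given number of times.
func tidStableWhileLocked(yields int) bool {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	before := syscall.Gettid()
	for i := 0; i < yields; i++ {
		// Yield the processor; a locked goroutine resumes on the same thread.
		runtime.Gosched()
	}
	return syscall.Gettid() == before
}

func main() {
	fmt.Println("tid stable while locked:", tidStableWhileLocked(1000))
}
```

Without the LockOSThread call there is no such guarantee, so a path built from an earlier Gettid() result may name a thread other than the one that later performs the system call.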
When I change the mount call to use /proc/thread-self/ns/net instead, things work fine. PR coming shortly.
Reproduce
- Install Docker and sysbox
- Run a generic Linux inner container, e.g. Ubuntu, with the sysbox container runtime
- In the inner container, install Docker v24
- In the inner container, run:
  docker run -it --rm --net=host alpine:latest whoami
Expected behavior
$ docker run -it --rm --net=host alpine:latest whoami
root
docker version
Client: Docker Engine - Community
 Version: 24.0.2
 API version: 1.43
 Go version: go1.20.4
 Git commit: cb74dfc
 Built: Thu May 25 21:52:22 2023
 OS/Arch: linux/amd64
 Context: default
Server: Docker Engine - Community
 Engine:
  Version: 24.0.2
  API version: 1.43 (minimum version 1.12)
  Go version: go1.20.4
  Git commit: 659604f
  Built: Thu May 25 21:52:22 2023
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.21
  GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version: 1.1.7
  GitCommit: v1.1.7-0-g860f061
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
docker info
Client: Docker Engine - Community
 Version: 24.0.2
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version: v0.10.5
    Path: /usr/libexec/docker/cli-plugins/docker-buildx
Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 31
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.4.0-1095-gke
 Operating System: Ubuntu 20.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.81GiB
 Name: dogfood
 ID: b35b4709-efd9-43f2-804f-5c95ddc10fb9
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://mirror.gcr.io/
 Live Restore Enabled: false
WARNING: No swap limit support
Additional Info
The docker version and docker info output above is from the inner Docker daemon.