nvidia-container-runtime doesn't work with rootless mode #38729

@AkihiroSuda

Description

nvidia-container-runtime doesn't work with rootless mode, because cgroups are not supported in rootless mode (yet).

$ docker -H unix:///run/user/1001/docker.sock run -it --rm --runtime=nvidia nvidia/cuda
...
docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"process_linux.go:407: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=10.0 brand=tesla,driver>=384,driver<385 --pid=548 /home/suda/.local/share/docker/vfs/dir/122c480857379482f5317caaca55bbf5e43f84991accfa9f3ea586ed63f3fabd]\\\\nnvidia-container-cli: mount error: open failed: /sys/fs/cgroup/devices/user.slice/devices.allow: permission denied\\\\n\\\"\"": unknown.
ERRO[0142] error waiting for container: context canceled

I'm wondering whether we can use a bind mount instead of the device cgroup, but I haven't looked into it in depth yet.
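
The permission failure can probably be checked without going through the NVIDIA hook at all. Below is a minimal sketch, assuming a cgroup v1 host where the devices controller is mounted at the path shown in the error message. The helper name `can_write_devices_allow` and the `DEVICES_ALLOW` variable are hypothetical and only for illustration. It only tests whether the current user may write to `devices.allow`; it does not change any cgroup settings.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given devices.allow file is
# writable by the current (unprivileged) user.
can_write_devices_allow() {
    # $1: path to a devices.allow file of the cgroup v1 devices controller
    [ -w "$1" ]
}

# Default path copied from the error message in this report; override it
# via the (assumed) DEVICES_ALLOW environment variable if your layout differs.
DEVICES_ALLOW=${DEVICES_ALLOW:-/sys/fs/cgroup/devices/user.slice/devices.allow}

if can_write_devices_allow "$DEVICES_ALLOW"; then
    echo "writable: $DEVICES_ALLOW"
else
    echo "not writable by uid $(id -u): $DEVICES_ALLOW"
fi
```

If this reports "not writable" when run as the same user that runs the rootless daemon, the prestart hook would be expected to hit the same `permission denied` error when it tries to open the file.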

Steps to reproduce the issue:
See above

Describe the results you received:
Failed as above

Describe the results you expected:
Should work

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client: Docker Engine - Community
 Version:           0.0.0-dev
 API version:       1.40
 Go version:        go1.11.5
 Git commit:
 Built:             Wed Feb  6 02:26:40 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          0.0.0-dev
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.11.5
  Git commit:       273aef0a90
  Built:            Wed Feb  6 02:26:57 2019
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.2
  GitCommit:        9754871865f7fe2f4e74d43e2fc7ccd237edcbce
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        96ec2177ae841256168fcf76954f7177af9446eb
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
$ dpkg-query -s nvidia-container-runtime
Package: nvidia-container-runtime
Status: install ok installed
Priority: optional
Section: utils
Installed-Size: 7461
Maintainer: NVIDIA CORPORATION <[email protected]>
Architecture: amd64
Version: 2.0.0+docker18.09.2-1
Depends: libc6 (>= 2.14), libseccomp2 (>= 2.3.0), nvidia-container-runtime-hook (<< 2.0.0)
Description: NVIDIA container runtime
 Provides a modified version of runc allowing users to run GPU enabled
 containers.
Homepage: https://github.com/NVIDIA/nvidia-container-runtime/wiki

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 0.0.0-dev
 Storage Driver: vfs
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
 runc version: 96ec2177ae841256168fcf76954f7177af9446eb
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
  rootless
 Kernel Version: 4.9.0-8-amd64
 Operating System: Debian GNU/Linux 9 (stretch)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 7.309GiB
 Name: suda-gpu
 ID: 7O5L:MUNF:22VV:X3BE:CBR2:3NGG:LEES:LS75:AVCF:WQLH:CPA2:YAK2
 Docker Root Dir: /home/suda/.local/share/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No swap limit support
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.):
