Support nerdctl run --gpus #251
👍 Can we support Compose as well? https://github.com/compose-spec/compose-spec/blob/master/deploy.md#devices Can be another PR.
- :whale: `--shm-size`: Size of `/dev/shm`

GPU flags:
- :whale: `--gpus`: GPU devices to add to the container ('all' to pass all GPUs). `nvidia-container-cli` is needed.
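As a rough illustration of the kinds of values the `--gpus` flag could accept, here is a hypothetical Go parser loosely modeled on Docker's syntax: `all`, a GPU count, or `device=<ids>`. The `gpuRequest` type and `parseGPUs` function are assumptions made for this sketch, not nerdctl's actual implementation, and Docker details such as CSV quoting and `capabilities=` are deliberately ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpuRequest is a hypothetical representation of a parsed --gpus value.
// Count == -1 means "all GPUs"; DeviceIDs holds explicit device IDs.
type gpuRequest struct {
	Count     int
	DeviceIDs []string
}

// parseGPUs accepts "all", a positive GPU count such as "2", or
// "device=0,1". Anything else is rejected with an error.
func parseGPUs(value string) (gpuRequest, error) {
	switch {
	case value == "all":
		return gpuRequest{Count: -1}, nil
	case strings.HasPrefix(value, "device="):
		ids := strings.Split(strings.TrimPrefix(value, "device="), ",")
		for _, id := range ids {
			if id == "" {
				return gpuRequest{}, fmt.Errorf("empty device ID in %q", value)
			}
		}
		return gpuRequest{DeviceIDs: ids}, nil
	default:
		n, err := strconv.Atoi(value)
		if err != nil || n <= 0 {
			return gpuRequest{}, fmt.Errorf("invalid --gpus value %q", value)
		}
		return gpuRequest{Count: n}, nil
	}
}

func main() {
	for _, v := range []string{"all", "2", "device=0,1", "foo"} {
		req, err := parseGPUs(v)
		fmt.Printf("%s -> %+v (err: %v)\n", v, req, err)
	}
}
```

Using `-1` as the "all" sentinel keeps the request to a single struct; a real implementation might prefer an explicit boolean.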
Can we have `./docs/gpu.md` to explain all the options?
We should also clarify how to set up GPUs for rootless. (Can be another PR)
Force-pushed from 631261e to 5e3075c
Added compose support and docs.
The following example exposes all available GPUs.

```
nerdctl run -it --rm --gpus all ubuntu:20.04 nvidia-smi
```
`ubuntu:20.04` -> `nvidia/cuda:9.0-base` might be more useful?
- NVIDIA Drivers
  - Same requirements as when using GPUs with Docker. For details, please refer to [the doc by NVIDIA](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#pre-requisites).
- `nvidia-container-cli`
  - containerd relies on this CLI to set up GPUs inside containers. You can install it via the [`libnvidia-container` package](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/arch-overview.html#libnvidia-container).
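A minimal sketch of a prerequisite check, assuming it is enough to confirm that `nvidia-container-cli` and `nvidia-smi` (typically installed alongside the NVIDIA driver) can be found on `PATH`. The `checkPrerequisite` helper is hypothetical and only checks presence; it does not validate driver versions or whether GPUs are actually visible.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// checkPrerequisite returns an error if the named executable is not on PATH.
// It only checks presence; versions and GPU visibility are not validated.
func checkPrerequisite(name string) error {
	if _, err := exec.LookPath(name); err != nil {
		return fmt.Errorf("%s not found in PATH: %w", name, err)
	}
	return nil
}

func main() {
	missing := false
	for _, name := range []string{"nvidia-smi", "nvidia-container-cli"} {
		if err := checkPrerequisite(name); err != nil {
			fmt.Fprintln(os.Stderr, err)
			missing = true
		}
	}
	if missing {
		os.Exit(1)
	}
	fmt.Println("GPU prerequisites found")
}
```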
Did you try rootless (on cgroup v1)?
I guess it needs `no-cgroups = true` to be set.
moby/moby#38729 (comment)
It doesn't work as of now.
We might need to patch `github.com/containerd/containerd/contrib/nvidia` to allow passing the `--no-cgroups` option to `nvidia-container-cli`.
containerd doesn't use `nvidia-container-runtime` (it executes `nvidia-container-cli` directly), so we cannot use `/etc/nvidia-container-runtime/config.toml` for nerdctl.
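A very hacky workaround for this is to wrap `nvidia-container-cli` so that `--no-cgroups` is always specified:

```bash
mkdir -p /opt/nvidia/bin
# Move the real binary out of the way.
mv /usr/bin/nvidia-container-cli /opt/nvidia/bin/
# Install a wrapper that inserts --no-cgroups before the last argument.
cat <<'EOF' > /usr/bin/nvidia-container-cli
#!/bin/bash
exec /opt/nvidia/bin/nvidia-container-cli "${@:1:$#-1}" --no-cgroups "${@:$#}"
EOF
chmod +x /usr/bin/nvidia-container-cli
```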
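The wrapper's argument rewriting can be expressed as a small pure function, which makes its edge cases easier to see. This Go sketch inserts `--no-cgroups` immediately before the last argument, on the assumption that the last argument (presumably the container rootfs) must stay in final position. The `withNoCgroups` helper and the sample arguments are hypothetical; unlike a plain bash slice expression, it also handles an empty argument list explicitly.

```go
package main

import "fmt"

// withNoCgroups returns a new slice with "--no-cgroups" inserted just before
// the final argument; the input slice itself is not modified.
func withNoCgroups(args []string) []string {
	if len(args) == 0 {
		return []string{"--no-cgroups"}
	}
	out := make([]string, 0, len(args)+1)
	out = append(out, args[:len(args)-1]...)
	return append(out, "--no-cgroups", args[len(args)-1])
}

func main() {
	// Hypothetical invocation; the real arguments are chosen by containerd.
	args := []string{"configure", "--pid=1234", "/path/to/rootfs"}
	fmt.Println(withNoCgroups(args))
	// prints [configure --pid=1234 --no-cgroups /path/to/rootfs]
}
```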
@AkihiroSuda containerd/containerd#5604 is merged.
Updated this PR to use `--no-cgroups`; it now works in rootless environments as well, without any additional configuration in `/etc/nvidia-container-runtime/config.toml`, etc.
A `replace` directive is needed in `go.mod` to force the containerd dependency to point to its latest commit.
I'll release nerdctl v0.9.0 after merging this.
Signed-off-by: Kohei Tokunaga <[email protected]>
```go
if dev.Count != 0 {
	e = append(e, fmt.Sprintf("count=%d", dev.Count))
}
```
`count` and `device_ids` are mutually exclusive; only one of them should be defined at a time. Is that enforced somewhere?
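One way to address the mutual-exclusivity question is to reject such reservations before building the option list. The sketch below assumes a simplified `composeDevice` struct (a hypothetical subset of a compose-spec device reservation) and returns `key=value` entries in the style of the `count=%d` snippet; the `gpuOptions` name and the `capabilities=` and `device=` keys are illustrative assumptions, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// composeDevice is a hypothetical, simplified subset of a compose-spec
// device reservation (deploy.resources.reservations.devices).
type composeDevice struct {
	Capabilities []string
	Count        int
	DeviceIDs    []string
}

// gpuOptions converts a device reservation into key=value entries,
// rejecting reservations that set both count and device_ids.
func gpuOptions(dev composeDevice) ([]string, error) {
	if dev.Count != 0 && len(dev.DeviceIDs) != 0 {
		return nil, errors.New("count and device_ids are mutually exclusive")
	}
	var e []string
	if len(dev.Capabilities) != 0 {
		e = append(e, "capabilities="+strings.Join(dev.Capabilities, ","))
	}
	if dev.Count != 0 {
		e = append(e, fmt.Sprintf("count=%d", dev.Count))
	}
	if len(dev.DeviceIDs) != 0 {
		e = append(e, "device="+strings.Join(dev.DeviceIDs, ","))
	}
	return e, nil
}

func main() {
	opts, err := gpuOptions(composeDevice{Capabilities: []string{"gpu"}, Count: 2})
	fmt.Println(opts, err) // [capabilities=gpu count=2] <nil>

	_, err = gpuOptions(composeDevice{Count: 1, DeviceIDs: []string{"0"}})
	fmt.Println(err) // count and device_ids are mutually exclusive
}
```

Returning a slice rather than one joined string avoids ambiguity when a value such as a device list itself contains commas.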
@AkihiroSuda I will clear ticket #239 tonight. It would be good to have it in 0.9 :)
```
	gotest.tools/v3 v3.0.3
)

replace github.com/containerd/containerd => github.com/containerd/containerd v1.5.1-0.20210614183500-0a3a77bc4453
```
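Without the `replace` directive, `go mod tidy` resolves containerd to v1.5.2 (the pseudo-version `v1.5.1-0.20210614183500-0a3a77bc4453` is a pre-release of v1.5.1, so it sorts below v1.5.2).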
Fixes: #248
This PR adds the `--gpus` option to `nerdctl run`, based on containerd's GPU support (`github.com/containerd/containerd/contrib/nvidia`, added by containerd/containerd#2330).
For compose (https://github.com/compose-spec/compose-spec/blob/master/deploy.md#devices):
`nvidia-container-cli` is needed.