Skip to content

Add support for CDI devices to ctr run command#8525

Merged
kzys merged 1 commit intocontainerd:mainfrom
elezar:ctr-add-cdi-devices
May 31, 2023
Merged

Add support for CDI devices to ctr run command#8525
kzys merged 1 commit intocontainerd:mainfrom
elezar:ctr-add-cdi-devices

Conversation

@elezar
Copy link
Copy Markdown
Contributor

@elezar elezar commented May 17, 2023

This change adds support for requesting CDI devices to ctr command line.

This means that, assuming the relevant CDI specification exists, the following command can be used to run a container with access to the required device:

crt run --rm -ti --device=nvidia.com/gpu=all {{CONTAINER_ID}}

The requested devices are injected directly into the generated OCI runtime specification as ctr does not use the CRI.

This mirrors how device nodes are handled in that they are resolved locally and the edits are applied to the container config before calling the Create API.

Some minor cleanup / refactoring that makes this addition simpler was also done.

This builds on the work to add CDI support to the containerd server:

And replaces previous attempts here:

@k8s-ci-robot
Copy link
Copy Markdown

Hi @elezar. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@elezar elezar force-pushed the ctr-add-cdi-devices branch from e3d1eef to 0de9fcc Compare May 17, 2023 14:29
@elezar
Copy link
Copy Markdown
Contributor Author

elezar commented May 17, 2023

/cc @bart0sh

@k8s-ci-robot
Copy link
Copy Markdown

@elezar: GitHub didn't allow me to request PR reviews from the following users: bart0sh.

Note that only containerd members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @bart0sh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@elezar elezar marked this pull request as draft May 17, 2023 14:32
@elezar elezar force-pushed the ctr-add-cdi-devices branch 2 times, most recently from cd1db00 to 9b039f2 Compare May 17, 2023 17:57
if limit != 0 {
opts = append(opts, oci.WithMemoryLimit(limit))
}
var cdiDeviceIDs []string
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question to the reviewers: Does CTR read the containerd config, and if so, do we want to use the same feature flag to control this behaviour here. We may also, for example, want to configure the CDI registry with the configured CDI paths -- although this is not configurable for existing CLI-based implementations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general, it may be a good idea to instantiate an instance of the CDI registry in the CLI code path explicily instead of relying on the instantiation in the (now) OCI package using the defaults.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does CTR read the containerd config

No

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. I will update the PR to instantiate the singleton CDI registry with auto-refresh disabled and also provide a mechanism to configure CDI spec folders if desired at a later stage.

@elezar elezar marked this pull request as ready for review May 22, 2023 13:44
@elezar
Copy link
Copy Markdown
Contributor Author

elezar commented May 30, 2023

@mikebrow @AkihiroSuda @dmcgowan could you please take a look.

@elezar
Copy link
Copy Markdown
Contributor Author

elezar commented May 31, 2023

/cc @mikebrow

@k8s-ci-robot k8s-ci-robot requested a review from mikebrow May 31, 2023 13:08
@elezar
Copy link
Copy Markdown
Contributor Author

elezar commented May 31, 2023

/cc @AkihiroSuda @dmcgowan

@AkihiroSuda
Copy link
Copy Markdown
Member

Please consider squashing commits

@elezar
Copy link
Copy Markdown
Contributor Author

elezar commented May 31, 2023

Please consider squashing commits

OK. Let me rebase and do so.

This change adds support for CDI devices to the ctr --device flag.
If a fully-qualified CDI device name is specified, this is injected
into the OCI specification before creating the container.

Note that the CDI specifications and the devices that they represent
are local and mirror the behaviour of linux devices in the ctr command.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the ctr-add-cdi-devices branch from 9b039f2 to d3887b2 Compare May 31, 2023 14:14
@elezar elezar requested a review from AkihiroSuda May 31, 2023 14:14
@elezar
Copy link
Copy Markdown
Contributor Author

elezar commented May 31, 2023

Please consider squashing commits

OK. Let me rebase and do so.

Commits have been squashed. Thanks for the review.

@kzys kzys merged commit a81f3fb into containerd:main May 31, 2023
@elezar elezar deleted the ctr-add-cdi-devices branch May 31, 2023 17:22
@deitch
Copy link
Copy Markdown
Contributor

deitch commented Sep 11, 2023

Did this make it into 1.7.2 (released Jun 3, whereas this was merged in May 31)? Or a later 1.7.x version (and if so, which one)?

@deitch
Copy link
Copy Markdown
Contributor

deitch commented Sep 11, 2023

Or maybe the flags changed? I tried following the example in the OP comment:

# ctr run --rm --device=nvidia.com/gpu=all docker.io/library/hello-world:latest helloworld /hello
ctr: error stating device path: stat nvidia.com/gpu=all: no such file or directory

Version:

# ctr version
Client:
  Version:  v1.7.5
  Revision: fe457eb99ac0e27b3ce638175ef8e68a7d2bc373
  Go version: go1.20.7

Server:
  Version:  v1.7.5
  Revision: fe457eb99ac0e27b3ce638175ef8e68a7d2bc373
  UUID: ee13c59e-d92f-45de-8169-24ae9e6865d5

I am moderately sure I configured it correctly in config.toml:

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    enable_cdi = true
    cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]

and dump shows it enabled:

# containerd config dump | grep cdi
    cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
    enable_cdi = true

Either way, can I request that this be part of the docs somewhere?

@elezar
Copy link
Copy Markdown
Contributor Author

elezar commented Sep 11, 2023

@deitch I'm happy to update the docs. Do you know which ones should be updated for this?

/cc @AkihiroSuda

@deitch
Copy link
Copy Markdown
Contributor

deitch commented Sep 11, 2023

@elezar I do not. I sometimes struggle with finding the right docs. Are there docs for the ctr CLI? Or maybe in the help itself?

I also imagine we should indicate minimum version(s) for using it.

@AkihiroSuda
Copy link
Copy Markdown
Member

Did this make it into 1.7.2 (released Jun 3, whereas this was merged in May 31)? Or a later 1.7.x version (and if so, which one)?

No, this is for v2.0.
Patch releases do not receive new features.

@deitch
Copy link
Copy Markdown
Contributor

deitch commented Sep 11, 2023

Ah that explains why I could not get it to work.

What are my options for now, then? Build off master branch?

When is 2.0 ETA?

@elezar
Copy link
Copy Markdown
Contributor Author

elezar commented May 14, 2024

@deitch have you been able to verify this functionality with a v2.0 release such as https://github.com/containerd/containerd/releases/tag/v2.0.0-rc.1

@deitch
Copy link
Copy Markdown
Contributor

deitch commented May 14, 2024

Nope, haven't been on my device with special devices (that sounds odd) in a while. But it should be possible to test it even without? Create any old CDI file, run it with the command, it should create the config. Will get some errors on starting, because the actual devices do not exist, but the process of creating the runc container config should work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants