CRI: add support for CDI device injection #6654
Conversation
Hi @bart0sh. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 8dd2e31 to 9b8ad75.
/ok-to-test

Build succeeded.
container_opts.go (outdated diff):

```go
import (
	"errors"
	"fmt"

	"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
)
```
It looks like the current spec version is 0.3.0. Does the git tag follow the spec version? How do you intend to handle upgrades and backward compatibility?
Yes, the current spec version follows the git tag; it is set to 0.3.0 and tagged: https://github.com/container-orchestrated-devices/container-device-interface/blob/9dc97167c538437ff6076f9d087516f8c8b3f5d4/specs-go/config.go#L6
When you make a bug fix, does that also become a new spec version? Also, since this is an issue with the current CNI setup, what are your thoughts on upgrades and backward compatibility?
@MikeZappa87 so far that has been the case. As the CDI API is in an alpha state, we have been updating the spec and the code for any bug fixes and major API updates. We're going to do that until we reach the first major release. After that we may consider using different API paths for incompatible API versions.
I think even if the code/git and the Spec versions are in sync now, going forward they will almost inevitably diverge. IMO the package version and the Spec version, while not completely orthogonal, are not supposed to be updated in tandem. Instead, both should follow semantic versioning on their own. I think this implies that bumping the Spec version will always result in bumping the git tag, but not the other way around. For instance, changing the package's externally visible programmatic interfaces/abstractions would result in a git tag bump, but would leave the Spec version alone.
As for handling the evolution of the Spec, I think our default goal should be that when we change the Spec version, we provide transparent conversion (while loading the Spec) from a (reasonable window of) previous version(s) to the latest one, whenever this is possible. And when we design/decide Spec changes, we should keep this in mind and aim for such a possibility to exist.
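To make the idea concrete, here is a minimal sketch of "convert on load", assuming a hypothetical loader: decode only the `cdiVersion` field first, upgrade older Spec revisions in place, and only then validate. The helper names (`loadSpec`, `upgradeTo030`) are illustrative and are not part of the CDI package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawSpec mirrors just enough of the CDI Spec to read its version.
type rawSpec struct {
	Version string `json:"cdiVersion"`
	// ... remaining Spec fields elided for brevity
}

// loadSpec parses a Spec and transparently converts supported older
// versions to the current one before returning it.
func loadSpec(data []byte) (*rawSpec, error) {
	var spec rawSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return nil, err
	}
	switch spec.Version {
	case "0.3.0":
		// Already the current version; nothing to do.
	case "0.2.0":
		upgradeTo030(&spec)
	default:
		return nil, fmt.Errorf("unsupported CDI spec version %q", spec.Version)
	}
	return &spec, nil
}

// upgradeTo030 is a placeholder for the per-version conversion step:
// rewrite whatever fields changed between releases, then bump the
// version string so downstream code only ever sees the latest format.
func upgradeTo030(spec *rawSpec) {
	spec.Version = "0.3.0"
}
```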
Hi @MikeZappa87 we discussed this in our WG meeting yesterday and just wanted to check in.
I think the short answer is that we don't have versioning quite figured out at this stage. For the time being we are using the following approach:
- API changes / bug fixes will result in a patch-version bump. This will only be applied to the git tag and will not affect the spec version.
- Spec changes will result in a minor-version bump. Here, `v0.3.0` is our first tagged release. There will be a git tag associated with each spec version bump.
I don't think that we have hard guarantees on backward compatibility at this stage which we feel is reflected by a non-v1 spec version. This is not to say that we will be making breaking changes to the spec at every version bump, but rather that we cannot commit to not having to make breaking changes -- especially now that we are seeing adoption of CDI in podman, cri-o, and containerd.
I see the following immediate action items from our side:
- Document the versioning strategy in our repository's README (Add note on versioning policy to README / SPEC cncf-tags/container-device-interface#57)
- Call out our position on spec / API stability and the possibility of breaking changes in our repository's README (Add note on spec / API evolution to our README cncf-tags/container-device-interface#58)
We will also consider whether separating the spec and API makes sense at this stage and reevaluate this periodically.
If you have any insights on how to handle spec versioning going forward or examples of projects that you feel could be used as good examples, please reach out to us.
With the above points considered, do you see this as a blocker for getting this PR merged (given that this functionality is gated by a feature flag)?
Could a GitHub issue be created for this? Also, could a TODO be added to the code with a link to that GitHub issue?
I have created cncf-tags/container-device-interface#60 to track this on our end. Please let me know if there is insufficient context there.
That works. We might need a longer-term tracking issue for changes expected in containerd over time, up to some desired golden state :-) and that could link back over to the cod/cod issues & PRs.
Build succeeded.
container_opts.go (outdated diff):

```go
registry := cdi.GetRegistry(cdi.WithSpecDirs(cdiSpecDirs...))
if err = registry.Refresh(); err != nil {
	// We don't consider registry refresh failure a fatal error.
	// For instance, a dynamically generated invalid CDI Spec file for
	// any particular vendor shouldn't prevent injection of devices of
	// different vendors. CDI itself knows better and it will fail the
	// injection if necessary.
	log.G(ctx).Warnf("CDI registry refresh failed: %v", err)
}
```
Correct me if I'm wrong, but it seems GetRegistry returns a singleton object and Refresh takes a lock on the object.
- https://github.com/container-orchestrated-devices/container-device-interface/blob/46367ec063fda9da931d050b308ccd768e824364/pkg/cdi/registry.go
- https://github.com/container-orchestrated-devices/container-device-interface/blob/ee12044b3895afdff3841fee894e3edf01d7fc26/pkg/cdi/cache.go
Wouldn't it introduce lock contention here? Do we need to refresh the registry every time?
> Correct me if I'm wrong, but it seems GetRegistry returns a singleton object and Refresh takes a lock on the object.
> ...
> Wouldn't it introduce lock contention here? Do we need to refresh the registry every time?
Yes, it is a singleton object which is locked during both Refresh() and CDI device resolution. We don't need to refresh the registry (at least not the brute force way we do now), as long as we make sure in some way that it's up to date. We haven't added any logic for that yet, but we'll get down to it now.
@kzys Once we have the implementation for automatic refresh of the CDI registry merged, we won't need to explicitly trigger a manual refresh here.
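For reference, one possible shape of such an automatic refresh is a filesystem watcher on the Spec directories. This is a hypothetical sketch using fsnotify, not the implementation referred to above; it assumes `Refresh()` is safe to call from another goroutine, which holds since the registry locks itself internally.

```go
package main

import (
	"log"

	"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
	"github.com/fsnotify/fsnotify"
)

// watchSpecDirs refreshes the CDI registry whenever a Spec directory
// changes, so callers no longer need to call Refresh() per injection.
func watchSpecDirs(dirs []string) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	for _, dir := range dirs {
		if err := watcher.Add(dir); err != nil {
			return err
		}
	}
	go func() {
		for range watcher.Events {
			// Any create/write/remove under a Spec dir may invalidate
			// the cached Specs; refresh once per event instead of on
			// every device injection.
			if err := cdi.GetRegistry().Refresh(); err != nil {
				log.Printf("CDI registry refresh failed: %v", err)
			}
		}
	}()
	return nil
}
```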
Is CDI Linux-only? It seems some of CDI's dependencies are not buildable on Windows.

Yes, it's Linux-only currently. I'll update the PR to fix build breaks on non-Linux platforms.

Done.
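For context, the usual Go pattern for this kind of platform gating is build constraints: put the real CDI wiring in a Linux-only file and a no-op stub everywhere else. A minimal sketch follows; the file and function names are illustrative, not the PR's actual layout.

```go
// cdi_linux.go
//go:build linux

package opts

import (
	"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
	oci "github.com/opencontainers/runtime-spec/specs-go"
)

// injectCDIDevices resolves CDI device names into the OCI spec on Linux.
func injectCDIDevices(spec *oci.Spec, names []string) error {
	_, err := cdi.GetRegistry().InjectDevices(spec, names...)
	return err
}
```

```go
// cdi_other.go
//go:build !linux

package opts

import oci "github.com/opencontainers/runtime-spec/specs-go"

// injectCDIDevices is a no-op on non-Linux platforms: CDI is
// currently Linux-only, so there is nothing to inject.
func injectCDIDevices(spec *oci.Spec, names []string) error {
	return nil
}
```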
Build succeeded.
@mikebrow can you re-trigger the Windows integration test? I'm not sure its failure is caused by the PR changes, as they only touch Linux code.

That's a known issue; you'll need to rebase.

@mikebrow thanks, rebased.
Force-pushed from e2a6153 to 66cc0d7.
Signed-off-by: Ed Bartosh <[email protected]>
Build succeeded.
mikebrow left a comment:
LGTM
Backport is for bug fixes and security fixes. This change is a new feature, and a big one. I don't think it should be backported.
This PR adds support for injecting CDI devices into containers.

CDI injection has been supported in Podman since v3.2.0 and was recently implemented in CRI-O.

This is also an essential part of the work to implement Dynamic Resource Allocation in Kubernetes.
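For readers unfamiliar with CDI, this is roughly what injection looks like through the CDI Go API. A minimal sketch, not the PR's exact code; the device name and Spec directory are placeholders.

```go
package main

import (
	"fmt"

	"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
	oci "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// The registry scans CDI Spec files (e.g. under /etc/cdi) on Refresh.
	registry := cdi.GetRegistry(cdi.WithSpecDirs("/etc/cdi"))
	if err := registry.Refresh(); err != nil {
		fmt.Printf("CDI registry refresh failed: %v\n", err)
	}

	// Fully qualified CDI device names use the form vendor.com/class=name.
	spec := &oci.Spec{}
	if unresolved, err := registry.InjectDevices(spec, "vendor.com/device=myDevice"); err != nil {
		fmt.Printf("injection failed (unresolved: %v): %v\n", unresolved, err)
	}
}
```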