Conversation
api/types/gpu/gpu.go
Outdated
There was a problem hiding this comment.
Yes, it is pretty confusing ;)
|
There's already some work (in swarmkit) around "GenericResources". It's a bit more abstract because it's designed to be pretty loose as resources are defined by admins for a given node and the scheduler matches the resources defined on the node with resources defined in the service.... so maybe not necessarily the best fit. |
|
Yes, the But those are not major problems, we could work with the existing |
|
So the fundamental question here is:
I'm fine either ways. |
|
I think a GPU type works fine and is much simpler to deal with. |
f45a887 to
ad193a6
Compare
|
Updated my PR, we now have the following: I don't think we need more than that today. If we agree on the API, I can start to flesh out the next step: integration in |
cpuguy83
left a comment
There was a problem hiding this comment.
I'm not really sure what makes sense from the GPU side of things, but I'd like to make sure the API does not limit how we can use this later on (or to what the proposed docker CLI solution can handle).
Would make make sense to have the API be GPUs []gpu.Resource where a gpu.Resource is:
package gpu
type Resource {
Type string // e.g. 'nvidia'
ID string // GPU UUID... whatever the vendor defines
Options map[string]string
}Also this may not require it's own package since there's not really much to it.
Anyway as I said I'm not sure what makes sense from the vendor perspective here but this seems like it makes the API a bit more extensible and the complexity of it can be handled by the client.
api/types/container/host_config.go
Outdated
There was a problem hiding this comment.
This limits the API quite a bit.
What do you think about using a slice here and then we can validate as needed in code.
|
I have a concern with this approach: Some options only make sense if they are applied to all GPUs from the same vendor. For instance, for us, the following would be non-sensical: The My API above assumes that other vendors will be in the same situation, so I wanted to be able to enforce that at the API level. With this knowledge, if you agree to put this burden on the vendor implementation, then your approach is fine. |
|
@flx42 Thanks for the feedback! So when you say "all" of the GPU's, does that mean literally all GPU's on the system or all the selected GPUs? Do we know anyone from AMD that would be able to provide input here as well? |
All the selected GPUs for the Maybe @kad can chime in on the Intel side for OpenCL, maybe they could have a similar option for selecting/modifying the OpenCL ICD. |
|
@flx42 It seems like your structs work well (but still would prefer a slice over a map in hostconfig, and leave it up to the implementation to validate). Perhaps we can change |
|
Just to confirm, to avoid re-triggering the CI with a push, are you OK with the following? |
|
Is |
|
Well the existing moby/api/types/container/host_config.go Lines 398 to 399 in 25ec60d I think it makes sense to consider a GPU as a |
|
The |
|
Ah I see, yeah that's perfect. |
Signed-off-by: Felix Abecassis <[email protected]>
ad193a6 to
b673450
Compare
|
Made the changes, also added Does anyone disagree with the current approach? |
|
Assuming we are OK with this design, I will go ahead with the corresponding implementation in the CLI (`--gpus) and in dockerd (for calling into the containerd features). |
Codecov Report
@@ Coverage Diff @@
## master #37434 +/- ##
==========================================
+ Coverage 35.31% 35.78% +0.46%
==========================================
Files 616 617 +1
Lines 48851 51403 +2552
==========================================
+ Hits 17254 18396 +1142
- Misses 29225 30422 +1197
- Partials 2372 2585 +213 |
|
@yongtang looks that this one passes tests. PTAL |
|
/cc @cpuguy83 |
|
Mainly concerned about other GPU vendors chiming in, but no one has in all this time... moving to code review. |
CLI discussion: docker/cli#1200
First draft of what the API might look like. I don't believe we need something more complex than that.
Discussion points
Resources(plural) is all the GPUs from all vendor.Resourceis all the GPUs from a single vendor. Thoughts?GPUstruct later without breaking existing code.--gpus nvidia. However at this point it's not possible to get the actual list of GPU ids. We have two options: a) ifgpu.Resource.GPUsis empty, then select all GPUs OR b) ifgpu.Resource.GPUs[0].ID == "*"then select all GPUs.