Platform Filter for Service Scheduling#1981
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1981 +/- ##
==========================================
- Coverage 60.03% 59.87% -0.16%
==========================================
Files 119 119
Lines 19670 19684 +14
==========================================
- Hits 11808 11786 -22
- Misses 6513 6550 +37
+ Partials 1349 1348 -1 |
manager/scheduler/filter.go
Outdated
| // copy the platform information | ||
| for _, p := range c.SupportedPlatforms { | ||
| f.supportedPlatforms = append(f.supportedPlatforms, p) | ||
| } |
There was a problem hiding this comment.
I don't see anywhere that f.supportedPlatforms is cleared, so I think this slice will just grow every time a task is scheduled, and the filter won't work as expected.
Also note that you can do append(f.supportedPlatforms, c.SupportedPlatforms...) instead of using a loop.
But wouldn't it be even better to just do f.supportedPlatforms = c.SupportedPlatforms?
There was a problem hiding this comment.
@aaronlehmann updated to remove the for loop. With using f.supportedPlatforms = c.SupportedPlatforms, we don't need to reset it anymore, right?
The general approach seems sound. I don't think One thing that needs to be thought through before we move forward with this is how updates are handled. Basically, we need to define when the manager will retrieve the manifest and update this platform information. I think doing this on every update is a non-starter, because often updates are changing something totally unrelated to the image (i.e. With digest pinning, the logic is relatively simple: if there's no digest, we add one, but if there's already a digest, nothing needs to be done. I think that for platform support, we need to define a mechanism that avoids unnecessary round trips to the registry when we update an image but keep the image/digest the same. My thought was that |
75abfc8 to
c6145e1
Compare
|
Isn't |
manager/scheduler/filter.go
Outdated
| // check if the platform for the node is supported | ||
| nodePlatform := n.Description.Platform | ||
| for _, p := range f.supportedPlatforms { | ||
| if p.Architecture == nodePlatform.Architecture && p.OS == nodePlatform.OS { |
There was a problem hiding this comment.
I'm not sure about the OS matching. From node, the OS is reported to daemon. The image spec on OS might be just "Linux". I don't know if we can compare them directly.
There was a problem hiding this comment.
@dongluochen in the examples I've seen, the common ones include Architecture = "amd64" and OS = "linux" or OS = "windows". This seemed like the common sense comparison, but perhaps @friism can add more insight here.
There was a problem hiding this comment.
The node info from docker info contains Operating System and OSType. I think currently we are collecting only one of them in NodeInfo. I think both are useful.
dchen@vm2:~$ docker info | egrep "Operating|OS"
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
There was a problem hiding this comment.
@dongluochen I can't remember why, but I think it was a conscious decision to not collect Operating System info reported by the node. Either way, I don't fully understand what constraints are sufficient to run multi-platform images, so I'm happy to make necessary changes.
There was a problem hiding this comment.
@stevvooe need your comments here. What is a sufficient comparison for Platforms?
I agree with this reasoning, but didn't know a better way. The reason I put it in
Good point. Can we agree that whenever we're contacting the registry to retrieve digest, we don't mind pulling platform information also? In cases where the The interesting case is when the digest is provided (say by the user). In this case, we do know at the manager level whether the digest of the new service spec is different from the existing one. If it isn't different, we don't need to do anything (this is probably the same case as above). If there is a change, then we want to contact the registry to get platform information. The only difference is that before, the user could update the service by providing a new digest, and skip the registry lookup, whereas now it would entail the registry lookup. To summarize, I don't see why we cannot avoid registry lookup when the service is updated but the image is not. |
I assume we want tasks to contain a list of supported platforms, so the scheduler can easily check against nodes. The
This seems reasonable. |
c6145e1 to
7d6722a
Compare
|
@aaronlehmann I've moved the In principle (assuming SwarmKit would support other workloads beyond containers in the future), do you think it's reasonable to attach platform restrictions to tasks? |
|
@aaronlehmann To have that in the Task contains more like runtime status data while the spec contains configuration and here I think this is configuration For instance, changing the supported list of platforms should trigger a rolling update IMO |
Hmm, I disagree.
Since |
|
Like - if we do put this into |
|
I guess I'm uncomfortable with the "engine manipulates the spec" model. It means the REST API is not really declarative, even if the gRPC API is. I think these differences will be confusing and become a maintenance burden. For example, what happens if the client decides to include a I don't have any great suggestions here. I think in my ideal world, the client would be doing these derivations, not the engine. I hoped to do it that way for digest pinning, but a different approach got chosen, and now I fear it's becoming precedent to continue adding similar hacks. |
|
This doesn't look right. We are coupling ourselves way too tightly to the image format (why are we even assuming that tasks have images or even a platform?). It seems like we should only be supporting a constraint that would dispatch a service to a particular platform. Making a platform determination when full context isn't available will make the system limiting when we try to schedule other kinds of workloads. Again, I hope we can step cautiously when adding multi-platform support. For this to work smoothly, sometimes it is more about what you don't do, than what you do do. In this case, this approach seems limiting. |
|
@stevvooe The engine has a convenience of filling out those fields based on the manifest |
That sounds more right. |
|
Personally, I agree with @aaronlehmann that the Given this, isn't There are also a couple of important questions to answer, based on moby/moby#31348 (comment):
|
7d6722a to
1339b2c
Compare
|
Changed to |
manager/scheduler/filter.go
Outdated
| if placement != nil { | ||
| if len(placement.Platforms) > 0 { | ||
| // copy the platform information | ||
| f.supportedPlatforms = placement.Platforms |
There was a problem hiding this comment.
Let's move this line outside above the if len(placement.Platforms) > 0. It shouldn't make a difference, because the filter should be disabled in this case, but it seems like a good idea to clear data that may be left over from a previous use of the filter.
There was a problem hiding this comment.
Ah, you're right. Fixed it.
|
Design LGTM |
1339b2c to
aa425f0
Compare
manager/scheduler/filter.go
Outdated
| if placement != nil { | ||
| f.supportedPlatforms = placement.Platforms | ||
| if len(placement.Platforms) > 0 { | ||
| // copy the platform information |
There was a problem hiding this comment.
comment needs to be moved too :)
3d244d0 to
8226e23
Compare
|
Let's add a test case for a missing architecture. Otherwise, LGTM |
8226e23 to
0cb059a
Compare
|
@aaronlehmann done, thanks! The tests seem flaky for some reason, do they need more time to run? |
|
No, it's a problem with your test. |
0cb059a to
6da8089
Compare
|
@aaronlehmann fixed it. |
| supportedPlatforms []*api.Platform | ||
| } | ||
|
|
||
| // SetTask returns true when the filter is enabled for a given task. |
There was a problem hiding this comment.
the name SetTask makes this seem like a setter, with side effects, but the doc comment makes it seem like a comparison, without side effects. but the code body clearly shows side effects. can you rewrite this comment? is the boolean supposed to indicate the success or failure of the operation? if it fails, would it be better to return an error?
There was a problem hiding this comment.
This name comes from the interface that PlatformFilter implements:
// Filter checks whether the given task can run on the given node.
// A filter may only operate
type Filter interface {
// SetTask returns true when the filter is enabled for a given task
// and assigns the task to the filter. It returns false if the filter
// isn't applicable to this task. For instance, a constraints filter
// would return `false` if the task doesn't contain any constraints.
SetTask(*api.Task) bool
// Check returns true if the task assigned by SetTask can be scheduled
// into the given node. This function should not be called if SetTask
// returned false.
Check(*NodeInfo) bool
// Explain what a failure of this filter means
Explain(nodes int) string
}
All filters are implemented this way. I agree that the name and the comment could be confusing. I'll open a separate PR to propose changing that, since all filters are implemented this way and this PR shouldn't touch others.
| if nodes == 1 { | ||
| return "unsupported platform on 1 node" | ||
| } | ||
| return fmt.Sprintf("unsupported platform on %d nodes", nodes) |
There was a problem hiding this comment.
would it not be better to have this return an error somewhere?
There was a problem hiding this comment.
Again, following the interface design here.
|
Requested changed, but you do not HAVE to make these changes, as long as you can give me a quick explanation as to why it's done the way it's done. |
|
@dperny: These methods satisfy an interface. |
|
@aaronlehmann thanks, sorry about that then. is the interface documented with that information? |
|
Thanks. LGTM 👍 👍 👍 |
|
It seems we are not very sure of the use cases, should we make this an experimental feature so we can change it when we get user feedbacks? |
| }, | ||
| } | ||
|
|
||
| // task3 - no platform constraints, should run on any node |
There was a problem hiding this comment.
I think you should add another test with task specifying multiple platforms explicitly. And verify it can be scheduled to nodes with different architecture.
There was a problem hiding this comment.
@dongluochen added a task that can run on multiple platforms, and an extra node.
|
@dongluochen the use case is straightforward (and uncommon IMO). When you have a mixed cluster, you don't want images that are not supported by a platform to run on a node with that platform. |
Signed-off-by: Nishant Totla <[email protected]>
6da8089 to
8edbb92
Compare
|
@nishanttotla My concern is mis-uses. For example, users may not realize an image can run on 2 platforms. It |
This is what happens right now. Images are scheduled to any node whether or not it can run there. This PR just provides an extra filter to allow for preventing that. |
|
please, see #2294 |
|
@erikbgithub regarding the problem you're facing with 3 raspberries, can you open a new issue providing more details about individual nodes as well as the service you're running? We can discuss possible solutions there, as well as ways to make the error report better. |
This PR adds the ability to automatically schedule service tasks based on platform constraints. It's part of a larger change, where platform constraints would be passed as part of the service spec, and used during scheduling to avoid running tasks on nodes that don't support them.
What this PR does
Platformsfield toTaskSpec.Placement. This is a list of all platforms that the container can be scheduled on.PlatformFilterstruct, that gets activated ifTaskSpec.Placement.Platformsis non-empty, and picks a node for a container ifnode.Description.Platformis inTaskSpec.Placement.Platforms.Where does the platform information come from?
Some images have fat manifests, that include references to platform-specific manifests (an example of such an image is
friism/golang). The Docker daemon already pins service images by digest, and can in addition retrieve platform information from the fat manifest (if it exists) and populate theTaskSpec.Placement.Platformsfield. If the field remains empty, then the platform filter is not enabled.See moby/moby#31348 for broader context.
cc @aaronlehmann @aluzzardi @dongluochen @friism