swarm: Add update/rollback order#30261
Conversation
|
Still depends on swarmkit PR |
bc1f8c2 to
71b5f61
Compare
|
Rebased. This is still waiting for the swarmkit PR, but I've updated it to match a change in the swarmkit PR. An additional option called |
71b5f61 to
1e4820a
Compare
|
Rebased and updated to support the rollback case. Still waiting for the swarmkit PR. |
1e4820a to
b748c60
Compare
|
Swarmkit PR was merged Updated this one to bring it in line, such as This should finally be ready to go now. |
|
Looks like CI is failing @aaronlehmann |
|
CI will be fixed when #31714 is merged. Until then, vendoring swarmkit is blocked. Let's try to make that happen. |
|
nit: Underscores vs dashes. I believe in the CLI we favor dashes (e.g. |
|
@aluzzardi: I think you're right about dashes vs. underscores. @thaJeztah can you please confirm? |
|
Oh, correct, yes. And, actually; the same is used for the API; https://github.com/docker/docker/blob/b36ce6f2f6cfdee4d376b58137dde2bc20fc4312/api/swagger.yaml#L304-L313 |
b748c60 to
72d1e5f
Compare
|
Updated to use |
cli/command/service/opts.go
Outdated
There was a problem hiding this comment.
Default should be stop-first. stop-first is usually safe as tasks would not compete for resources, like CPU/mem/TCP ports, while start-first is not.
There was a problem hiding this comment.
Thanks, that was a mistake.
72d1e5f to
b836795
Compare
There was a problem hiding this comment.
Stating default in service update command is a little misleading. If you don't specify the flag, it keeps existing value.
There was a problem hiding this comment.
Agreed, however the other --update-* flags have the same problem.
--update-delay duration Delay between updates (ns|us|ms|s|m|h) (default 0s)
--update-failure-action string Action on update failure ("pause"|"continue"|"rollback") (default "pause")
--update-monitor duration Duration after each task update to monitor for failure (ns|us|ms|s|m|h)
--update-parallelism uint Maximum number of tasks updated simultaneously (0 to update all at once) (default 1)
Let's deal with this in a separate PR. Would you mind filing an issue?
There was a problem hiding this comment.
nodeCount is not used here.
|
I've tried to rebuild rebuild/windowsRS1 several times and it always fails. Is this a known issue? |
|
ping @docker/core-swarmkit-maintainers Any chance of getting this in before the freeze? |
|
I am not a maintainer, but it LGTM. |
|
LGTM ping @aluzzardi |
daemon/cluster/convert/service.go
Outdated
There was a problem hiding this comment.
nit: typo unrecongized
This parameter controls the order of operations when rolling out an update task. Either the old task is stopped before starting the new one, or the new task is started first, and the running tasks will briefly overlap. This commit adds Rollout to the API, and --update-order / --rollback-order flags to the CLI. Signed-off-by: Aaron Lehmann <[email protected]>
Signed-off-by: Aaron Lehmann <[email protected]>
428fe0b to
6763641
Compare
Update order is a new parameter to control the order of operations when rolling out an updated task. Either the old task is stopped before starting the new one, or the new task is started first, and the running tasks will briefly overlap.
Previously,
stop_firstwas the only supported behavior. The behavior ofstart_firstcould be emulated by temporarily scaling up the service, so for awhile we hesitated to add it. However, there are some cases where it is awkward to change the number of replicas in a service just to support temporary overlap during updates. It doesn't work for some new features we have in mind like automatic preemption.This PR adds
Orderto the API, and--update-order/--rollback-orderflags to the CLI. It adds an integration test that performs a rolling update in the newstart_firstmode.cc @aluzzardi @dongluochen