Added support for maximum replicas per node#37940
Conversation
|
build fails:
|
|
Sure it will until swarmkit PR have been merged that why this is marked as WIP now. Test automation set on Moby is quite complex and I'm not able run all these tests on my dev env so that why I prefer to have PR open here which can run those tests for me. Next steps are:
After these build and all these tests (should) pass. |
68f5387 to
8b32d49
Compare
Codecov Report
@@ Coverage Diff @@
## master #37940 +/- ##
=========================================
Coverage ? 36.6%
=========================================
Files ? 608
Lines ? 44990
Branches ? 0
=========================================
Hits ? 16467
Misses ? 26245
Partials ? 2278 |
22d5ad2 to
4c6a698
Compare
4c6a698 to
7584627
Compare
|
Swarmkit PR was merged so I included swarmkit bump to this. Builds should pass as they was earlier when I tested to include swarmkit changes from my repo. @cpuguy83 can you review ? EDIT: Swarmkit bump looks to be also on #38056 so I will drop it from this when it is merged. EDIT 2: #38056 was merged. Failing test on experimental looks to be flaky ( #32673 ). |
de44ba4 to
f422e25
Compare
thaJeztah
left a comment
There was a problem hiding this comment.
just some minor comments, but I want to give this a spin (haven't had time for that yet).
One thing I'd like to test is the behavior if the --max-replicas-per-node value is updated for an existing service (e.g. change from max 3 replicas to max 2; are the existing tasks properly killed and/or re-scheduled in that case?)
Currently it looks working on way that if user only updates --max-replicas-per-node (or actually it is If we need change that logic then it would mean one more change to swarmkit. Anyway, I will wait that #38003 is merged and implement these API versions checks on meanwhile. |
f422e25 to
261b3d6
Compare
261b3d6 to
169e690
Compare
|
Temporarily included commit from #38003 will cleanup after it is merged. |
169e690 to
8ac1832
Compare
|
@thaJeztah rebased on master and included changes you requested. Can you review again? |
f24d569 to
298228b
Compare
7bca461 to
92a4d55
Compare
|
@cpuguy83 @thaJeztah PTAL, now this contains only simple API test without logic test as it is on swarmkit side. |
thaJeztah
left a comment
There was a problem hiding this comment.
Left some minor nits inline (no critical issues, but would be nice to have them in the PR)
I was finally able to give this a spin (see below for some playing with the new option); one thing I noticed (also, see below) that reducing the max-replicas value does not terminate existing tasks. I wondered if this was by design, or an oversight? Not a show-stopper; just curious 🤗
Here's some things I tried:
Create a service with 2 replicas, but maxreplicas 1
curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/create" \
-H "Content-Type: application/json" \
-d '{"EndpointSpec":{"Mode":"vip"},"Labels":{},"Mode":{"Replicated":{"Replicas":2}},"Name":"test","TaskTemplate":{"ContainerSpec":{"DNSConfig":{},"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da"},"ForceUpdate":0,"Placement":{"MaxReplicas":1,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"Resources":{"Limits":{},"Reservations":{}}}}'
{"ID":"eln7vce863tedwd5q72kjaxkg"}Verify the MaxReplicas configuration is stored:
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test?insertDefaults=true" | jq .Spec.TaskTemplate.Placement.MaxReplicas
1And check that no more than 1 replica is running (on a single-node swarm);
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
qy2qc9p1yder test replicated 1/2 nginx:alpine Check the task, and confirm it's unable to start due to the MaxReplicas restriction:
docker service ps test --no-trunc --format 'table {{.Name}}\t{{.Error}}'
NAME ERROR
test.1
test.2 "no suitable node (max replicas per node limit exceed)"Updated MaxReplicas to 2;
Get the service definition
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test"{"ID":"cqnbsmr9foblwucehzu9ekrpb","Version":{"Index":87},"CreatedAt":"2018-12-23T22:23:22.415002133Z","UpdatedAt":"2018-12-23T22:23:22.415002133Z","Spec":{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":1,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":0,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}},"Endpoint":{"Spec":{}}}Update the service to set MaxReplicas to 2
curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/test/update?version=87" \
-H "Content-Type: application/json" \
-d '{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":2,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":0,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}}'Verify the service was updated with the new value (2);
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test" | jq .Spec.TaskTemplate.Placement.MaxReplicas
2And confirm that the service was now able to deploy 2 replicas:
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
cqnbsmr9fobl test replicated 2/2 nginx:alpine Now update the service again, and set MaxReplicas back to 1
curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/test/update?version=172" \
-H "Content-Type: application/json" \
-d '{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":1,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":0,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}}'Verify the service was updated with the new value (1)
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test" | jq .Spec.TaskTemplate.Placement.MaxReplicas
1Note that when reducing the maximum number of replicas, existing tasks are not terminated;
docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
cqnbsmr9fobl test replicated 2/2 nginx:alpine Another thing that needs to be taken into account is that scaling/updating a service with an older version of the CLI will strip the MaxReplicas. That's the expected behavior, but perhaps something we should mention somewhere in the documentation :thinking_face:
docker service scale test=3
test scaled to 3After scaling, the MaxReplicas property is reset (because the CLI does not know the new property);
curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test?insertDefaults=true" | jq .Spec.TaskTemplate.Placement.MaxReplicas
null
By design as we don't want kill running containers when it is not absolutely necessary and user have option to use --force (not sure what it is on API side) and then all tasks will be re-scheduled and situation fixed. |
92a4d55 to
6e86627
Compare
Signed-off-by: Olli Janatuinen <[email protected]>
6e86627 to
153171e
Compare
|
@thaJeztah updated based on review. Now hopefully ready to be merged? |
Gotcha; makes sense. We'll have to document that, and explain the behavior (i.e., if they want to force a reschedule, use the Trying that scenario: Create a service with curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/create" \
-H "Content-Type: application/json" \
-d '{"EndpointSpec":{"Mode":"vip"},"Labels":{},"Mode":{"Replicated":{"Replicas":2}},"Name":"test","TaskTemplate":{"ContainerSpec":{"DNSConfig":{},"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da"},"ForceUpdate":0,"Placement":{"MaxReplicas":2,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"Resources":{"Limits":{},"Reservations":{}}}}'
{"ID":"kdu0l3uxhk6p52xa719581h0h"}Verify the curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test" | jq .Spec.TaskTemplate.Placement
2Confirm the service has two tasks running: docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
rmg5iwvxl158 test replicated 2/2 nginx:alpine Update Get the service definition curl -s \
--unix-socket /var/run/docker.sock \
"http://localhost/v1.40/services/test"{"ID":"kdu0l3uxhk6p52xa719581h0h","Version":{"Index":27},"CreatedAt":"2018-12-24T09:37:12.633108733Z","UpdatedAt":"2018-12-24T09:37:12.633108733Z","Spec":{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":2,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":0,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}},"Endpoint":{"Spec":{}}}Update the service, set curl -s \
--unix-socket /var/run/docker.sock \
-X POST \
"http://localhost/v1.40/services/test/update?version=27" \
-H "Content-Type: application/json" \
-d '{"Name":"test","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine@sha256:2e497c294e3ba84aaeab7a0fbb1027819cd1f5f5892ed3c4a82b8b05010090da","DNSConfig":{},"Isolation":"default"},"Resources":{"Limits":{},"Reservations":{}},"Placement":{"MaxReplicas":1,"Platforms":[{"Architecture":"amd64","OS":"linux"},{"Architecture":"arm","OS":"linux"},{"Architecture":"arm64","OS":"linux"},{"Architecture":"386","OS":"linux"},{"Architecture":"ppc64le","OS":"linux"},{"Architecture":"s390x","OS":"linux"}]},"ForceUpdate":1,"Runtime":"container"},"Mode":{"Replicated":{"Replicas":2}},"EndpointSpec":{"Mode":"vip"}}'
{"Warnings":null}Confirm that the service now has 1 task running: docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
kdu0l3uxhk6p test replicated 1/2 nginx:alpine And that this is due to the new docker service ps test --no-trunc --format 'table {{.Name}}\t{{.Error}}'
NAME ERROR
test.1
test.2 "no suitable node (max replicas per node limit exceed)"
\_ test.2 Looking good 👍 |
|
@olljanat was the failure only a flaky test? If so, I'm happy to ignore PowerPC and merge |
|
@thaJeztah yes. Those which I disabled on #38423 looks causing issues. |
|
Alright let's get this one merged 👍 |
|
Hi! I greatly love this commit. With what docker & swarm-kit version could it be actually used at production |
|
It's not released yet; will be in the next docker release (19.03) (but you should be able to give it a spin with the nightly builds https://download.docker.com/linux/ubuntu/dists/bionic/pool/nightly/amd64/ |
- What I did
Added support for maximum replicas per node
Second step to be able solve #26259
- How I did it
Addes support for Placement parameter MaxReplicas
TODO: