
Docker 1.12 swarm mode load balancing not consistently working #25325

@mschirrmeister

Hi,

I have a problem with Docker 1.12 swarm mode load balancing. The setup consists of 3 hosts running Docker 1.12 on CentOS 7 in Azure. There is nothing special about the hosts: a plain CentOS 7 install, Docker 1.12 from the Docker yum repo, and a btrfs data disk for /var/lib/docker.

If I create 2 services, scale each to 3 replicas, and then access them from a client, some requests fail.
Specifically, when a service is accessed via the Docker host IP addresses and published ports, some containers do not respond.

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.12.0
Storage Driver: btrfs
 Build Version: Btrfs v3.19.1
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge overlay null host
Swarm: active
 NodeID: d7oq3rjt5llc47hr9wt19tood
 Is Manager: true
 ClusterID: 51zzdq5p2xe8otuwmbalyfy2t
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot interval: 10000
  Heartbeat tick: 1
  Election tick: 3
 Dispatcher:
  Heartbeat period: 5 seconds
 CA configuration:
  Expiry duration: 3 months
 Node Address: 10.218.3.5
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.806 GiB
Name: azeausdockerapps301t.azr.omg.wpp
ID: LWMY:RHUH:JJ5O:OP6G:5LV5:7P7B:WI3W:2JMI:B7HY:EP6J:A7SW:DUX2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):
Current test environment is running on Microsoft Azure

Steps to reproduce the issue:
Create overlay network

docker network create --driver overlay whoami-net

docker network ls | grep whoami-net
7bmymhp028ov        whoami-net          overlay             swarm

docker network inspect whoami-net
[
    {
        "Name": "whoami-net",
        "Id": "7bmymhp028ov19ia47xpdao7r",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": []
        },
        "Internal": false,
        "Containers": null,
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "257"
        },
        "Labels": null
    }
]

Create services and scale them

docker service create --name service1 --network whoami-net -p 8000 jwilder/whoami
docker service scale service1=3

docker service create --name service2 --network whoami-net -p 8000 jwilder/whoami
docker service scale service2=3
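
The create-then-scale sequence above can also be expressed as a single call. This is a minimal sketch, assuming the `--replicas` flag of `docker service create` and an explicit `published:target` port mapping. The helper name `create_whoami` and the published ports in the usage line are hypothetical, chosen only for illustration; the original commands use `-p 8000`, which lets swarm pick the published port.

```shell
# Hypothetical helper: create a whoami service with 3 replicas in one step.
# $1 = service name, $2 = port to publish on every swarm node (assumption).
create_whoami() {
  docker service create \
    --name "$1" \
    --replicas 3 \
    --network whoami-net \
    --publish "$2:8000" \
    jwilder/whoami
}
```

For example, `create_whoami service1 8081` followed by `create_whoami service2 8082` would give both services fixed, known ports, which makes repeated tests easier to compare.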

docker service ls

ID            NAME      REPLICAS  IMAGE           COMMAND
0u2d76899t30  service2  3/3       jwilder/whoami
3ecardus67vd  service1  3/3       jwilder/whoami

docker service ps service1

ID                         NAME        IMAGE           NODE                              DESIRED STATE  CURRENT STATE          ERROR
48kab5vtpwbiimn1ilsbakh0j  service1.1  jwilder/whoami  azeausdockerapps303t.marco.lan  Running        Running 3 minutes ago
800eov5dgg4hf1rgjwn2vb17d  service1.2  jwilder/whoami  azeausdockerapps302t.marco.lan  Running        Running 2 minutes ago
2klc639jzqhgy1ejyvqard46t  service1.3  jwilder/whoami  azeausdockerapps301t.marco.lan  Running        Running 2 minutes ago

docker service ps service2

ID                         NAME        IMAGE           NODE                              DESIRED STATE  CURRENT STATE           ERROR
1iyvqd2eskzdr78k86i4bjxc7  service2.1  jwilder/whoami  azeausdockerapps302t.marco.lan  Running        Running 52 seconds ago
b4ntijm8lc99oqq2af5dyh6u9  service2.2  jwilder/whoami  azeausdockerapps303t.marco.lan  Running        Running 48 seconds ago
e3i956f4fxgq847jwsqsstcbq  service2.3  jwilder/whoami  azeausdockerapps301t.marco.lan  Running        Running 48 seconds ago

docker service inspect service1

[
    {
        "ID": "3ecardus67vdjb552xf01hn3f",
        "Version": {
            "Index": 275
        },
        "CreatedAt": "2016-08-02T09:55:30.35862447Z",
        "UpdatedAt": "2016-08-02T09:56:53.477137303Z",
        "Spec": {
            "Name": "service1",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "jwilder/whoami"
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 3
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "Networks": [
                {
                    "Target": "7bmymhp028ov19ia47xpdao7r"
                }
            ],
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8000
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8000
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 8000,
                    "PublishedPort": 30000
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "dpac4u1zv98g9eayoql72jvhq",
                    "Addr": "10.255.0.6/16"
                },
                {
                    "NetworkID": "7bmymhp028ov19ia47xpdao7r",
                    "Addr": "10.0.0.2/24"
                }
            ]
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]
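
The published port and the virtual IPs can also be read from the service definition without scanning the full JSON. This is a minimal sketch, assuming the `--format` Go-template option of `docker service inspect`. The helper names `published_ports` and `virtual_ips` are hypothetical, and the field paths follow the JSON output shown above.

```shell
# Print one published port per line for the given service.
published_ports() {
  docker service inspect \
    --format '{{range .Endpoint.Ports}}{{println .PublishedPort}}{{end}}' "$1"
}

# Print the service's virtual IPs, one per attached network.
virtual_ips() {
  docker service inspect \
    --format '{{range .Endpoint.VirtualIPs}}{{println .Addr}}{{end}}' "$1"
}
```

For the service shown above, `published_ports service1` should report the same port (30000) that appears under `Endpoint.Ports`.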

Access service1 from a client against docker host 1

➜  ~ time curl http://10.218.3.5:30000
I'm 272dd0310a95
curl http://10.218.3.5:30000  0.01s user 0.01s system 6% cpu 0.217 total
➜  ~ time curl http://10.218.3.5:30000
curl: (7) Failed to connect to 10.218.3.5 port 30000: Operation timed out
curl http://10.218.3.5:30000  0.01s user 0.01s system 0% cpu 1:15.71 total
➜  ~ time curl http://10.218.3.5:30000
curl: (7) Failed to connect to 10.218.3.5 port 30000: Operation timed out
curl http://10.218.3.5:30000  0.01s user 0.01s system 0% cpu 1:16.82 total
➜  ~

Access service2 from a client against docker host 1

➜  ~ time curl http://10.218.3.5:30001
curl: (7) Failed to connect to 10.218.3.5 port 30001: Operation timed out
curl http://10.218.3.5:30001  0.01s user 0.01s system 0% cpu 1:17.69 total
➜  ~ time curl http://10.218.3.5:30001
I'm 8519ed607de5
curl http://10.218.3.5:30001  0.01s user 0.01s system 6% cpu 0.227 total
➜  ~ time curl http://10.218.3.5:30001
curl: (7) Failed to connect to 10.218.3.5 port 30001: Operation timed out
curl http://10.218.3.5:30001  0.01s user 0.01s system 0% cpu 1:15.79 total
➜  ~

Access service1 from a client against docker host 2

➜  ~ time curl http://10.218.3.6:30000
I'm 272dd0310a95
curl http://10.218.3.6:30000  0.01s user 0.01s system 5% cpu 0.232 total
➜  ~ time curl http://10.218.3.6:30000
curl: (7) Failed to connect to 10.218.3.6 port 30000: Operation timed out
curl http://10.218.3.6:30000  0.01s user 0.01s system 0% cpu 1:12.34 total
➜  ~ time curl http://10.218.3.6:30000
I'm 71f6aa01fad4
curl http://10.218.3.6:30000  0.01s user 0.01s system 7% cpu 0.267 total
➜  ~

Access service2 from a client against docker host 2

➜  ~ time curl http://10.218.3.6:30001
I'm 8519ed607de5
curl http://10.218.3.6:30001  0.01s user 0.01s system 6% cpu 0.241 total
➜  ~ time curl http://10.218.3.6:30001
I'm 24dbf906923a
curl http://10.218.3.6:30001  0.01s user 0.01s system 7% cpu 0.246 total
➜  ~ time curl http://10.218.3.6:30001
curl: (7) Failed to connect to 10.218.3.6 port 30001: Operation timed out
curl http://10.218.3.6:30001  0.01s user 0.01s system 0% cpu 1:15.87 total
➜  ~

Access service1 from a client against docker host 3

➜  ~ time curl http://10.218.3.7:30000
I'm 272dd0310a95
curl http://10.218.3.7:30000  0.01s user 0.01s system 4% cpu 0.353 total
➜  ~ time curl http://10.218.3.7:30000
I'm e6289ebe82da
curl http://10.218.3.7:30000  0.01s user 0.01s system 2% cpu 0.513 total
➜  ~ time curl http://10.218.3.7:30000
curl: (7) Failed to connect to 10.218.3.7 port 30000: Operation timed out
curl http://10.218.3.7:30000  0.01s user 0.01s system 0% cpu 1:16.79 total
➜  ~

Access service2 from a client against docker host 3

➜  ~ time curl http://10.218.3.7:30001
I'm 24dbf906923a
curl http://10.218.3.7:30001  0.01s user 0.01s system 7% cpu 0.234 total
➜  ~ time curl http://10.218.3.7:30001
I'm 8519ed607de5
curl http://10.218.3.7:30001  0.01s user 0.01s system 6% cpu 0.216 total
➜  ~ time curl http://10.218.3.7:30001
I'm da18d8e4b307
curl http://10.218.3.7:30001  0.01s user 0.01s system 6% cpu 0.214 total
➜  ~
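
The manual curl runs above can be repeated more quickly with a small loop. This is a sketch only: the function name `probe` and the `ATTEMPTS` and `TIMEOUT` defaults are assumptions for illustration. It relies on curl's `--connect-timeout` option so that an unreachable backend fails after a few seconds instead of roughly 75 seconds, and on the `I'm <container id>` response format of the whoami image seen above.

```shell
# Sketch: probe one published port on a node several times, then report
# how many attempts succeeded, how many failed, and how many distinct
# containers answered. ATTEMPTS and TIMEOUT are illustrative defaults.
ATTEMPTS=${ATTEMPTS:-6}
TIMEOUT=${TIMEOUT:-3}

# The body runs in a subshell "( ... )" so its variables stay local.
probe() (
  host=$1 port=$2 ok=0 fail=0 ids="" i=0
  while [ "$i" -lt "$ATTEMPTS" ]; do
    # --connect-timeout caps the wait for an unreachable backend.
    if out=$(curl -s --connect-timeout "$TIMEOUT" "http://$host:$port"); then
      ok=$((ok + 1))
      id=${out#"I'm "}          # strip the "I'm " prefix, keep the ID
      ids="$ids $id"
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  # Container IDs contain no spaces, so word splitting is safe here.
  distinct=$(printf '%s\n' $ids | sort -u | awk 'NF { n++ } END { print n + 0 }')
  echo "$host:$port ok=$ok fail=$fail distinct_containers=$distinct"
)
```

A full sweep over the setup described here would be `for h in 10.218.3.5 10.218.3.6 10.218.3.7; do probe "$h" 30000; probe "$h" 30001; done`. With 3 replicas per service, a healthy node would be expected to report `fail=0` and `distinct_containers=3`.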

Describe the results you received:
Not all containers respond when the service is accessed via the Docker host IP addresses and published ports. Requests routed to a non-responding container time out after roughly 75 seconds.

Describe the results you expected:
All containers of a service should respond, regardless of which Docker host the service is accessed through.

Additional information you deem important (e.g. issue happens only occasionally):
The issue is intermittent across service creations: if you delete and re-create the service, all containers may respond, or containers on a different host may fail to respond.

It is at least consistent once a service is created. Let's say containers on host 2 and host 3 do not respond when accessed via Docker host 1; then it stays that way for the lifetime of that service.
