This repository was archived by the owner on Jan 30, 2020. It is now read-only.
Machines don't join the cluster or disappear #1044
Closed
Description
Hello, I am having a lot of issues with this. When I deploy a cluster of 3 machines on Amazon EC2, some machines either fail to connect to the cluster at startup or disappear from it after a while.
I have tried changing the fleet timeout and using bigger VMs.
This is the log of the failing machine.
-- Logs begin at Sun 2014-11-30 09:53:38 UTC, end at Sun 2014-11-30 10:49:14 UTC. --
Nov 30 09:54:01 ip-172-31-19-237.eu-west-1.compute.internal systemd[1]: Starting fleet daemon...
Nov 30 09:54:01 ip-172-31-19-237.eu-west-1.compute.internal systemd[1]: Started fleet daemon.
Nov 30 09:54:04 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO fleet.go:141: Using default config file /etc/fleet/fleet.conf
Nov 30 09:54:04 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO server.go:137: Establishing etcd connectivity
Nov 30 09:54:04 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: dial tcp 127.0.0.1:4001: connection refused
Nov 30 09:54:04 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR client.go:200: Unable to get result for {Update /_coreos.com/fleet/machines/194c77faf88142d9ac6101763c979f3b/object}, retrying in 100ms
Nov 30 09:54:04 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: dial tcp 127.0.0.1:4001: connection refused
Nov 30 09:54:04 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR client.go:200: Unable to get result for {Update /_coreos.com/fleet/machines/194c77faf88142d9ac6101763c979f3b/object}, retrying in 200ms
Nov 30 09:54:04 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: dial tcp 127.0.0.1:4001: connection refused
Nov 30 09:54:04 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR client.go:200: Unable to get result for {Update /_coreos.com/fleet/machines/194c77faf88142d9ac6101763c979f3b/object}, retrying in 400ms
Nov 30 09:54:05 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: dial tcp 127.0.0.1:4001: connection refused
Nov 30 09:54:05 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR client.go:200: Unable to get result for {Update /_coreos.com/fleet/machines/194c77faf88142d9ac6101763c979f3b/object}, retrying in 800ms
Nov 30 09:54:05 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: dial tcp 127.0.0.1:4001: connection refused
Nov 30 09:54:05 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR client.go:200: Unable to get result for {Create /_coreos.com/fleet/machines/194c77faf88142d9ac6101763c979f3b/object}, retrying in 100ms
Nov 30 09:54:05 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: dial tcp 127.0.0.1:4001: connection refused
Nov 30 09:54:05 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR client.go:200: Unable to get result for {Create /_coreos.com/fleet/machines/194c77faf88142d9ac6101763c979f3b/object}, retrying in 200ms
Nov 30 09:54:05 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO server.go:148: Starting server components
Nov 30 09:55:57 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR engine.go:113: Unable to determine cluster engine version
Nov 30 09:55:57 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: cancelled
Nov 30 09:56:00 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR engine.go:113: Unable to determine cluster engine version
Nov 30 09:56:00 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: cancelled
Nov 30 09:56:05 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO engine.go:149: Engine leadership acquired
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO fleet.go:71: Reloading configuration from
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO fleet.go:141: Using default config file /etc/fleet/fleet.conf
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO fleet.go:78: Restarting server components
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: cancelled
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR event.go:91: etcd watcher {Watch /_coreos.com/fleet/job} returned error: cancelled
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: cancelled
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR event.go:91: etcd watcher {Watch /_coreos.com/fleet/job} returned error: cancelled
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO server.go:137: Establishing etcd connectivity
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: ERROR server.go:37: Failed serving HTTP on listener: 0x532ed0
Nov 30 09:58:02 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO server.go:148: Starting server components
Nov 30 09:58:32 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO fleet.go:71: Reloading configuration from
Nov 30 09:58:32 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO fleet.go:141: Using default config file /etc/fleet/fleet.conf
Nov 30 09:58:32 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO fleet.go:78: Restarting server components
Nov 30 09:58:32 ip-172-31-19-237.eu-west-1.compute.internal fleetd[596]: INFO client.go:278: Failed getting response from http://localhost:4001/: cancelled