Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return an empty stats if "container not found" #34004

Merged
merged 1 commit into from
Jul 12, 2017

Conversation

yummypeng
Copy link
Contributor

@yummypeng yummypeng commented Jul 7, 2017

If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

Steps to reproduce:

# docker run -tid --name tmp --restart=always ubuntu true && docker stats --no-stream
48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c


(hangs...)



^C
#

It's a duplicate of #27772 which has already been closed by #28026, but #28026 is not a complete fix because if docker stats --no-stream comes faster after docker run, the container is still running when the code hits https://github.com/moby/moby/blob/master/daemon/stats.go#L35. Then I can see a lot of error logs from daemon:

time="2017-07-07T14:54:41.210577988+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:42.210686973+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:43.210721305+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:44.210637074+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:45.210742374+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:46.210928003+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:47.210899239+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:48.210914277+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:49.210748399+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:50.210818936+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"
time="2017-07-07T14:54:51.210596037+08:00" level=error msg="collecting stats for 48d3344f404a2b4a28538af857dedab5bca7f3939667f51509c61dd9f150bf5c: rpc error: code = Unknown desc = containerd: container not found"

This pr will ignore this error and just return an empty stats. So that client will not hang forever.

after this pr:

# docker run -tid --name tmp --restart=always ubuntu true && docker stats --no-stream
0c93066033730361fb296af0e3b07e4b264169535353aecd9121159b81dfa101
CONTAINER           CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS

Signed-off-by: Yuanhong Peng [email protected]

- A picture of a cute animal (not mandatory but encouraged)
images

@yummypeng
Copy link
Contributor Author

Oops, I close this pr by mistake and CI stopped @thaJeztah

@thaJeztah
Copy link
Member

Looks like CI started again, and is green now 👍

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some suggestions :)

// if we get "container not found" error from containerd, it's possibly
// because that this container has already been stopped. it will be ok to
// ignore this error and just return an empty stats.
if _, ok := err.(notRunningErr); !ok && !strings.Contains(err.Error(), "container not found") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make the container not found error a typed error (like we do for notRunningErr)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not introduced in this PR, but while looking at this code:
The double negative is a bit confusing here as well; would this be more readable (assuming a typed error for container not found was implemented;?

for _, pair := range pairs {
	stats, err := s.supervisor.GetContainerStats(pair.container)

	switch err.(type) {
	case nil:
		// FIXME: move to containerd on Linux (not Windows)
		stats.CPUStats.SystemUsage = systemUsage
		stats.CPUStats.OnlineCPUs = onlineCPUs

		pair.publisher.Publish(*stats)

	case notRunningErr, notFoundErr:
		// publish empty stats containing only name and ID if not running or not found
		pair.publisher.Publish(types.StatsJSON{
			Name: pair.container.Name,
			ID:   pair.container.ID,
		})

	default:
		logrus.Errorf("collecting stats for %s: %v", pair.container.ID, err)
	}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah Thanks for your suggestion ❤️ I have updated the codes.

@yummypeng yummypeng force-pushed the fix-docker-stats-hang branch from c33a3f4 to 911c002 Compare July 10, 2017 02:07
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one more comment

also /cc @cpuguy83 for review

@@ -1165,6 +1165,9 @@ func (daemon *Daemon) stats(c *container.Container) (*types.StatsJSON, error) {
}
stats, err := daemon.containerd.Stats(c.ID)
if err != nil {
if strings.Contains(err.Error(), "container not found") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this also needs to be done in the _windows version;

// Obtain the stats from HCS via libcontainerd
stats, err := daemon.containerd.Stats(c.ID)
if err != nil {
return nil, err
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 😉 @thaJeztah

@yummypeng yummypeng force-pushed the fix-docker-stats-hang branch from 911c002 to dc8b314 Compare July 10, 2017 03:04

type notFoundErr interface {
error
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at this again; this interface is not specific enough, as both daemon. errNotRunning, and daemon. errNotFound satisfy this interface; see https://play.golang.org/p/HtkBu1_QuM. If you make it more specific, it works though; https://play.golang.org/p/S7DzAuRITj

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah Tons of thanks for finding this ❤️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yummypeng you're welcome! I almost missed it as well ☺️

@yummypeng yummypeng force-pushed the fix-docker-stats-hang branch from dc8b314 to c567391 Compare July 10, 2017 06:18
If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

Signed-off-by: Yuanhong Peng <[email protected]>
@yummypeng yummypeng force-pushed the fix-docker-stats-hang branch from c567391 to 4a6cbf9 Compare July 10, 2017 08:31
@yummypeng
Copy link
Contributor Author

janky failed but it seems not related to my PR:

07:41:22 ----------------------------------------------------------------------
07:41:22 FAIL: docker_cli_swarm_test.go:1146: DockerSwarmSuite.TestSwarmLockUnlockCluster
07:41:22 
07:41:22 [d4ec576a08906] waiting for daemon to start
07:41:22 [d4ec576a08906] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 [d0e91bd982f1a] waiting for daemon to start
07:41:22 [d0e91bd982f1a] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [d4ec576a08906] exiting daemon
07:41:22 [d4ec576a08906] waiting for daemon to start
07:41:22 [d4ec576a08906] daemon started
07:41:22 
07:41:22 [d0e91bd982f1a] exiting daemon
07:41:22 [d0e91bd982f1a] waiting for daemon to start
07:41:22 [d0e91bd982f1a] daemon started
07:41:22 
07:41:22 [dacb4fc0da4ef] waiting for daemon to start
07:41:22 [dacb4fc0da4ef] daemon started
07:41:22 
07:41:22 docker_cli_swarm_test.go:1192:
07:41:22     // d2 is now set to lock
07:41:22     checkSwarmUnlockedToLocked(c, d2)
07:41:22 docker_utils_test.go:471:
07:41:22     c.Assert(v, checker, args...)
07:41:22 ... obtained bool = false
07:41:22 ... expected bool = true
07:41:22 
07:41:22 [d4ec576a08906] exiting daemon
07:41:22 [dacb4fc0da4ef] exiting daemon
07:41:22 [d0e91bd982f1a] exiting daemon
07:41:35 
07:41:35 ----------------------------------------------------------------------

@yummypeng
Copy link
Contributor Author

ping @thaJeztah @cpuguy83

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vieux
Copy link
Contributor

vieux commented Jul 11, 2017

LGTM ping @tiborvass

@vieux
Copy link
Contributor

vieux commented Jul 11, 2017

ping @kolyshkin

@tiborvass
Copy link
Contributor

LGTM

@tiborvass tiborvass merged commit c8a2596 into moby:master Jul 12, 2017
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

fix DTS2017062812251
related upstream PR: moby#34004

Signed-off-by: Yuanhong Peng <[email protected]>
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
Return an empty stats if "container not found"

If we get "container not found" error from containerd, it's possibly
because that this container has already been stopped. It will be ok to
ignore this error and just return an empty stats.

fix DTS2017062812251
related upstream PR: moby#34004

Signed-off-by: Yuanhong Peng <[email protected]>



See merge request docker/docker!666
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants