Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stats collector spinning CPU if no stats are collected #36609

Merged
merged 1 commit into from
Mar 16, 2018

Conversation

thaJeztah
Copy link
Member

@thaJeztah thaJeztah commented Mar 15, 2018

fixes #36608

Commit fd0e24b (#36519) changed the stats collection loop to use a sleep() instead of time.Tick() in the for-loop.

This change caused a regression in situations where no stats are being collected, or an error is hit in the loop (in which case the loop would continue, and the sleep() is not hit).

This patch puts the sleep at the start of the loop to guarantee it's always hit.

This will delay the sampling, which is similar to the behavior before fd0e24b.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Commit fd0e24b changed
the stats collection loop to use a `sleep()` instead
of `time.Tick()` in the for-loop.

This change caused a regression in situations where
no stats are being collected, or an error is hit
in the loop (in which case the loop would `continue`,
and the `sleep()` is not hit).

This patch puts the sleep at the start of the loop
to guarantee it's always hit.

This will delay the sampling, which is similar to the
behavior before fd0e24b.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah
Copy link
Member Author

ping @stevvooe @jhowardmsft PTAL

Copy link
Member

@lowenna lowenna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily horrible, but this delays the initial collection.
What about a goto Sleep?

@thaJeztah
Copy link
Member Author

@cpuguy83 I'm not a big fan of goto X, I think it's easier to forget adding a goto, and ending up in the same situation as we're now (because there's no error, and everything "works", you just can't do as much bitcoin mining on the machine)

The loop is started when the daemon starts, so it would be a 1 second delay once, after the daemon starts.

Situation before #36519 was applied (using time.Tick(s.interval))

diff --git a/daemon/stats/collector.go b/daemon/stats/collector.go
index 88e20984b..d8045710a 100644
--- a/daemon/stats/collector.go
+++ b/daemon/stats/collector.go
@@ -90,10 +90,12 @@ func (s *Collector) Run() {
        // it will grow enough in first iteration
        var pairs []publishersPair
 
-       for {
+       logrus.Debugf("starting to collect stats")
+       for range time.Tick(s.interval) {
                // Put sleep at the start so that it will always be hit,
                // preventing a tight loop if no stats are collected.
-               time.Sleep(s.interval)
+               //time.Sleep(s.interval)
+               logrus.Debugf("collecting stats")
 
                // it does not make sense in the first iteration,
                // but saves allocations in further iterations
INFO[2018-03-15T22:51:12.463038617Z] Loading containers: start.                   
DEBU[2018-03-15T22:51:12.463106540Z] Option Experimental: false                   
DEBU[2018-03-15T22:51:12.463185637Z] Option DefaultDriver: bridge                 
...
DEBU[2018-03-15T22:51:12.657985139Z] Registering DELETE, /networks/{id:.*}        
INFO[2018-03-15T22:51:12.658555097Z] API listen on /var/run/docker.sock           
DEBU[2018-03-15T22:51:13.463968808Z] collecting stats                             
DEBU[2018-03-15T22:51:14.463225328Z] collecting stats                           

Situation in this PR (time.Sleep(s.interval) at the start of the loop)

diff --git a/daemon/stats/collector.go b/daemon/stats/collector.go
index 88e20984b..7f9735293 100644
--- a/daemon/stats/collector.go
+++ b/daemon/stats/collector.go
@@ -90,10 +90,12 @@ func (s *Collector) Run() {
        // it will grow enough in first iteration
        var pairs []publishersPair
 
+       logrus.Debugf("starting to collect stats")
        for {
                // Put sleep at the start so that it will always be hit,
                // preventing a tight loop if no stats are collected.
                time.Sleep(s.interval)
+               logrus.Debugf("collecting stats")
 
                // it does not make sense in the first iteration,
                // but saves allocations in further iterations
DEBU[2018-03-15T22:57:01.096367667Z] starting to collect stats                    
INFO[2018-03-15T22:57:01.096571550Z] Loading containers: start.                   
DEBU[2018-03-15T22:57:01.096692096Z] Option Experimental: false                   
DEBU[2018-03-15T22:57:01.096746537Z] Option DefaultDriver: bridge                
...
DEBU[2018-03-15T22:57:01.287620703Z] Registering DELETE, /networks/{id:.*}        
INFO[2018-03-15T22:57:01.287997746Z] API listen on /var/run/docker.sock           
DEBU[2018-03-15T22:57:02.097046585Z] collecting stats                             
DEBU[2018-03-15T22:57:03.098938984Z] collecting stats                             

@thaJeztah
Copy link
Member Author

thaJeztah commented Mar 16, 2018

The DockerSuite.TestPostContainersAttach test looks to be broken(ish) or flaky(ish) recently (also see #36606 (comment)), and #36611:

https://jenkins.dockerproject.org/job/Docker-PRs-experimental/39825/console

17:47:26 ----------------------------------------------------------------------
17:47:26 FAIL: docker_api_attach_test.go:98: DockerSuite.TestPostContainersAttach
17:47:26 
17:47:26 docker_api_attach_test.go:211:
17:47:26     c.Assert(actualStdout.Bytes(), checker.DeepEquals, []byte("hello\nsuccess"), check.Commentf("Attach didn't return the expected data from stdout"))
17:47:26 ... obtained []uint8 = []byte{0x73, 0x75, 0x63, 0x63, 0x65, 0x73, 0x73}
17:47:26 ... expected []uint8 = []byte{0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xa, 0x73, 0x75, 0x63, 0x63, 0x65, 0x73, 0x73}
17:47:26 ... Attach didn't return the expected data from stdout
17:47:26 

@codecov
Copy link

codecov bot commented Mar 16, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@3d14173). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #36609   +/-   ##
=========================================
  Coverage          ?   34.69%           
=========================================
  Files             ?      612           
  Lines             ?    45424           
  Branches          ?        0           
=========================================
  Hits              ?    15759           
  Misses            ?    27603           
  Partials          ?     2062

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM since this is happening at daemon start it seems fine.

@lowenna lowenna merged commit 72ba7f5 into moby:master Mar 16, 2018
@thaJeztah thaJeztah deleted the fix-stats-loop branch March 16, 2018 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#36519 causes daemon to continually spin CPU
4 participants