
CI: Introduce flaky test finder#38523

Merged
thaJeztah merged 1 commit into moby:master from olljanat:flaky-test-finder
Jan 12, 2019

Conversation

@olljanat
Contributor

@olljanat olljanat commented Jan 9, 2019

- What I did
We all hate to see CI builds failing randomly due to flaky tests.

A lot of work is still needed before all builds run smoothly, but to make sure that new PRs do not make things worse, I implemented a "flaky test finder" which detects new integration tests in PR commits and runs stress tests for them.

- How I did it
Included a bash script which:

  • detects new tests in integration/*_test.go
  • runs hack/make.sh test-integration five times with -test.count 5 -test.run set
  • => new tests are run 25 times in total before continuing to the normal integration tests.
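
The extraction of test names into test flags works along these lines (a minimal, self-contained sketch: the diff output is hard-coded here as an example, while the real script reads it from git):

```shell
#!/usr/bin/env bash
# Sketch of the test-name extraction: turn added "+func TestX(..." diff
# lines into a pipe-separated pattern for -test.run. The diff output is
# hard-coded as an example; the real script derives it from the PR diff.
new_tests='+func TestCommitFlakyRandom1(t *testing.T) {
+func TestCommitFlakyRandom2(t *testing.T) {'

TESTARRAY=$(echo "$new_tests" | sed 's/+func //' | awk -F'\\(' '{print $1}' | tr '\n' '|')
export TESTFLAGS="-test.count 5 -test.run ${TESTARRAY%?}"
export TEST_REPEAT=5

echo "Using test flags: $TESTFLAGS"
```

`${TESTARRAY%?}` strips the trailing `|` left over from `tr`, which matches the "Using test flags" line in the Janky log below.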

- How to verify it
Included a second commit ( 4a5d478 ) which contains two example flaky tests. I will drop it after the design is approved.

@olljanat
Contributor Author

olljanat commented Jan 9, 2019

Janky worked correctly:

19:16:37 ---> Making bundle: test-integration-flaky (in bundles/test-integration-flaky)
19:16:38 
19:16:38 Found new integrations tests:
19:16:38 +func TestCommitFlakyRandom1(t *testing.T) {
19:16:38 +func TestCommitFlakyRandom2(t *testing.T) {
19:16:38 Running stress test for them.
19:16:38 Using test flags: -test.count 5 -test.run TestCommitFlakyRandom1|TestCommitFlakyRandom2
.
.
.
19:18:08 Running /go/src/github.com/docker/docker/integration/image
19:18:08 INFO: Testing against a local daemon
19:18:08 === RUN   TestCommitFlakyRandom1
19:18:08 --- PASS: TestCommitFlakyRandom1 (0.00s)
19:18:08 === RUN   TestCommitFlakyRandom2
19:18:08 --- PASS: TestCommitFlakyRandom2 (0.00s)
19:18:08 === RUN   TestCommitFlakyRandom1
19:18:08 --- PASS: TestCommitFlakyRandom1 (0.00s)
19:18:08 === RUN   TestCommitFlakyRandom2
19:18:08 --- PASS: TestCommitFlakyRandom2 (0.00s)
19:18:08 === RUN   TestCommitFlakyRandom1
19:18:08 --- FAIL: TestCommitFlakyRandom1 (0.00s)
19:18:08     commit_test.go:61: assertion failed: error is not nil: Game Over, Better Luck Next Time
19:18:08 === RUN   TestCommitFlakyRandom2
19:18:08 --- PASS: TestCommitFlakyRandom2 (0.00s)
19:18:08 === RUN   TestCommitFlakyRandom1
19:18:08 --- PASS: TestCommitFlakyRandom1 (0.00s)
19:18:08 === RUN   TestCommitFlakyRandom2
19:18:08 --- PASS: TestCommitFlakyRandom2 (0.00s)
19:18:08 === RUN   TestCommitFlakyRandom1
19:18:08 --- PASS: TestCommitFlakyRandom1 (0.00s)
19:18:08 === RUN   TestCommitFlakyRandom2
19:18:08 --- PASS: TestCommitFlakyRandom2 (0.00s)
19:18:08 FAIL

and it looks like the other CI jobs do not have the .git folder mounted, so they cannot use this.

And this is output from Janky after I dropped example flaky tests from this PR:

20:03:52  ---> Making bundle: test-integration-flaky (in bundles/test-integration-flaky)
20:03:52  No new tests added to integration.

@olljanat
Contributor Author

olljanat commented Jan 9, 2019

(reserved for my derek commands)

@derek derek bot added the area/testing label Jan 9, 2019
@codecov

codecov bot commented Jan 9, 2019

Codecov Report

Merging #38523 into master will increase coverage by 0.06%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #38523      +/-   ##
==========================================
+ Coverage   36.64%    36.7%   +0.06%     
==========================================
  Files         608      608              
  Lines       45173    45230      +57     
==========================================
+ Hits        16553    16603      +50     
- Misses      26339    26340       +1     
- Partials     2281     2287       +6

@olljanat
Contributor Author

olljanat commented Jan 9, 2019

@vdemeester @kolyshkin PTAL

@thaJeztah
Member

/cc @tonistiigi

@thaJeztah
Member

This is really cool!

I was also thinking about whether we could label flaky tests in the test itself and run them separately, e.g. with a special "skip";

skip.If(t, SkipFlakyTests)

Then either;

  • have a separate CI job for the flaky tests (retry those tests if they fail, and PASS if one of those runs passes)
  • same as above, but include them in the current CI jobs and run the integration tests twice; once with and once without the flaky tests
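
The retry idea in the first bullet could be sketched roughly like this (run_flaky_suite is a hypothetical stand-in for the real CI step, with the flakiness simulated):

```shell
#!/usr/bin/env bash
# Rough sketch of a "separate CI job for flaky tests": retry the flaky
# suite up to N times and PASS as soon as any single attempt passes.
# run_flaky_suite is a hypothetical stand-in that fails twice, then passes.
attempt_no=0
run_flaky_suite() {
    attempt_no=$((attempt_no + 1))
    [ "$attempt_no" -ge 3 ]
}

result=FAIL
for i in 1 2 3; do
    if run_flaky_suite; then
        result=PASS
        break
    fi
done
echo "flaky job: $result after $attempt_no attempt(s)"
```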

@tonistiigi
Member

btw, updated flaky stats https://gist.github.com/tonistiigi/9e91cbb968dea83113d09ce5f9dbff79

@olljanat
Contributor Author

@tonistiigi btw, how do you generate that one? I have been working on similar stats at https://github.com/olljanat/moby-jenkins-statistics but I can see that you have data from a longer period.

@olljanat
Contributor Author

olljanat commented Jan 10, 2019

I was also thinking if we could label flaky tests in the test itself, and either run them separate, e.g. have a special "skip";

skip.If(t, SkipFlakyTests)

@thaJeztah For existing known flaky tests that is an option, but I'm not sure how to actually run only the tests which have the SkipFlakyTests flag?

@thaJeztah
Member

I'm not sure how to actually run only the tests which have the SkipFlakyTests flag?

Might be hacky. Some quick thoughts;

  • Abuse testing.Short() to either skip or run them
  • Add a Flaky suffix to the test names, and use that to match flaky and non-flaky tests
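
A hypothetical sketch of the suffix idea (test names invented for illustration; grep stands in for Go's test-name regex matching):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "Flaky suffix" idea: a naming convention
# lets a single -test.run regex split flaky from non-flaky tests.
# Test names are invented; grep stands in for Go's test-name matching.
all_tests='TestCommitBasic
TestCommitFlaky
TestServiceUpdateLabelFlaky'
flaky_pattern='Flaky$'

echo "a flaky-only job could use: -test.run '.*${flaky_pattern}'"
echo "matching tests:"
echo "$all_tests" | grep "$flaky_pattern"
```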

@tonistiigi
Member

btw, how do you generate that one?

https://gist.github.com/tonistiigi/2e474449ac80e25753a84712e19dfd4d . I have all master logs/statuses since build 9564 if you are interested. I lost a couple of days in the last run because I forgot to run it in December. Only master, as PRs may be failing because of actual bugs.

olljanat added a commit to olljanat/moby that referenced this pull request Jan 10, 2019
… tests for all existing integration tests

Signed-off-by: Olli Janatuinen <[email protected]>
@olljanat
Contributor Author

  • Add a Flaky suffix to the test names, and use that to match flaky and non-flaky tests

That is probably the easiest, but as most of the flaky tests are actually in the legacy integration-cli tests, it is better to do that in another PR.

Btw. I also tested running this logic against all existing tests in integration, and this was the result:

--- FAIL: TestServiceUpdateLabel (2.88s)
--- FAIL: TestServiceUpdateLabel (2.83s)
--- FAIL: TestServiceUpdateLabel (2.85s)
--- FAIL: TestServiceUpdateLabel (2.90s)
--- FAIL: TestCreateServiceSecretFileMode (33.23s)
--- FAIL: TestServiceUpdateLabel (2.84s)

#38520 will fix TestServiceUpdateLabel, and TestCreateServiceSecretFileMode is already reported in #37132

Member

@thaJeztah thaJeztah left a comment


some nits

@tianon could you have a look as well?

@olljanat olljanat force-pushed the flaky-test-finder branch 2 times, most recently from 7d0ef03 to b3e49df Compare January 10, 2019 20:18
@olljanat
Contributor Author

olljanat commented Jan 10, 2019

Updated 😄

@kolyshkin
Contributor

LGTM, and thanks 👍

Compares PR commit(s) to the HEAD of the moby/moby master branch and, if it
finds new (or renamed) integration tests, runs stress tests for them.

Signed-off-by: Olli Janatuinen <[email protected]>
Member

@vdemeester vdemeester left a comment


LGTM 🐯

@olljanat
Contributor Author

Hmm. Weird error on Janky. Maybe some server it tries to connect to is failing?

08:06:14 FAIL: docker_cli_daemon_plugins_test.go:161: DockerDaemonSuite.TestDaemonKillWithPlugins
08:06:14 
08:06:14 [d2ddfbf1a9745] waiting for daemon to start
08:06:14 [d2ddfbf1a9745] daemon started
08:06:14 
08:06:14 docker_cli_daemon_plugins_test.go:166:
08:06:14     c.Fatalf("Could not install plugin: %v %s", err, out)
08:06:14 ... Error: Could not install plugin: exit status 1 Error response from daemon: error parsing HTTP 404 response body: invalid character '<' looking for beginning of value: "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The requested URL was not found on the server.  If you entered the URL manually please check your spelling and try again.</p>\n"

Let's try once more...

Member

@thaJeztah thaJeztah left a comment


LGTM

let's wait a bit to see if @tianon wants a chance to review, but otherwise we can merge 🤗

@tianon
Member

tianon commented Jan 12, 2019

Between @thaJeztah and @cpuguy83, my syntax comments are all taken! 🙌 ❤️

This looks sane to me. 👍

@thaJeztah thaJeztah merged commit ad2765b into moby:master Jan 12, 2019
@thaJeztah
Member

Thanks!!

(
TESTARRAY=$(echo "$new_tests" | sed 's/+func //' | awk -F'\\(' '{print $1}' | tr '\n' '|')
export TESTFLAGS="-test.count 5 -test.run ${TESTARRAY%?}"
export TEST_REPEAT=5
Contributor


@olljanat @thaJeztah why set both -test.count 5 and TEST_REPEAT=5? That will run each test 25 times. Also, if you do set -test.count you need to export TIMEOUT=... (which will set -test.timeout) to 5 times the existing TIMEOUT.

Contributor Author


5 runs is not enough to catch the known flaky test TestCreateServiceSecretFileMode, for example. See the comments above where I tested running this against the existing tests.

The difference between this and just running 25 times is that this will restart the daemon after every 5 test runs.
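
The run structure can be sketched as follows (a simulation only: echo stands in for the actual daemon restarts, and the counters mirror the TEST_REPEAT and -test.count settings):

```shell
#!/usr/bin/env bash
# Sketch of the run structure: TEST_REPEAT controls outer iterations with
# a daemon restart between them, while -test.count repeats each test
# within one daemon lifetime. Daemon restarts are simulated with echo.
TEST_REPEAT=5
TEST_COUNT=5
total=0
for repeat in $(seq 1 "$TEST_REPEAT"); do
    echo "(re)starting daemon, iteration $repeat"
    for run in $(seq 1 "$TEST_COUNT"); do
        total=$((total + 1))
    done
done
echo "total runs per new test: $total"
```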

Increasing the timeout can of course be done if needed.

Btw. Did you see this timing out on some PR now?

Contributor


@olljanat yes on our internal EE fork. I'm fixing the bug in #38693

@kolyshkin
Contributor

The difference between this and just running 25 times is that this will restart the daemon after every 5 test runs

Makes sense to add a comment explaining that

@tiborvass
Contributor

Thanks, I'll fix it in #38693



9 participants