ci: Add Travis check to make sure unit test coverage reports stay deterministic #16320

practicalswift · 2019-07-01T08:56:30Z

Add Travis check to make sure unit test coverage reports stay deterministic.

Rationale:

A necessary condition for meaningful line coverage measuring is that the test suite
is deterministic in the sense that the set of lines executed at least once is identical between test suite runs.

This PR addresses issue #14343 (MarcoFalke): "coverage reports non-deterministic":

Our unit tests and functional tests are non-deterministic in the overall execution, but the coverage should not be affected by that. I.e. some functions might be executed in a different order or sometimes skipped, but every line, function and branch should be executed at least once.

This is currently not true, even for serialization errors that should be hit exactly once.

Beside the obvious issue of missing test coverage on some runs, this also makes it impossible to see how test coverage changes between two commits.

Example output in case of line coverage deterministic unit tests:

[2019-06-30 08:32:59] Measuring coverage, run #1 of 3
[2019-06-30 08:36:38] Measuring coverage, run #2 of 3
[2019-06-30 08:40:15] Measuring coverage, run #3 of 3

Coverage test passed: Deterministic coverage across 3 runs.

Example output in case of line coverage non-deterministic unit tests:

[2019-06-30 08:32:59] Measuring coverage, run #1 of 3
[2019-06-30 08:36:38] Measuring coverage, run #2 of 3

The line coverage is non-deterministic between runs.

The test suite must be deterministic in the sense that the set of lines executed at least
once must be identical between runs. This is a necessary condition for meaningful coverage
measuring.

--- gcovr.run-1.txt   2019-01-30 23:14:07.419418694 +0100
+++ gcovr.run-2.txt   2019-01-30 23:15:57.998811282 +0100
@@ -471,7 +471,7 @@
 test/crypto_tests.cpp                        270     270   100%
 test/cuckoocache_tests.cpp                   142     142   100%
 test/dbwrapper_tests.cpp                     148     148   100%
-test/denialofservice_tests.cpp               225     225   100%
+test/denialofservice_tests.cpp               225     224    99%   363
 test/descriptor_tests.cpp                    116     116   100%
 test/fs_tests.cpp                             24       3    12%   14,16-17,19-20,23,25-26,29,31-32,35-36,39,41-42,45-46,49,51-52
 test/getarg_tests.cpp                        111     111   100%
@@ -585,5 +585,5 @@
 zmq/zmqpublishnotifier.h                       5       0     0%   12,31,37,43,49
 zmq/zmqrpc.cpp                                21       0     0%   16,18,20,22,33-35,38-45,49,52,56,60,62-63
 ------------------------------------------------------------------------------
-TOTAL                                      61561   27606    44%
+TOTAL                                      61561   27605    44%
 ------------------------------------------------------------------------------

This Travis check uses test_deterministic_coverage.sh which was introduced in #15296.

To make sure test_deterministic_coverage.sh won't trigger any line coverage non-determinism alarms with the current test suite I've performed 8 000 test runs (against 98958c8) which all gave identical line coverage results.

Note to reviewers: Which would be the most appropriate Travis job to put this on? I'm currently using x86_64 Linux [GOAL: install] [bionic] [uses qt5 dev package instead of depends Qt to speed up build and avoid timeout], but I'm afraid the total run-time of that job is a bit on the high end with this check included. Would it be preferable to add a new job instead of adding to an existing job? Please advice :-)

DrahtBot · 2019-07-01T09:17:36Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#15134 (tests: Switch one of the Travis jobs to an unsigned char environment (-funsigned-char) by practicalswift)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko · 2019-07-01T13:35:16Z

I think there is a value to non-deterministic tests, as long as their seed is known (e.g printed to stdout at the beginning of the test, or written to a file on failure, ...)

sdaftuar · 2019-07-01T18:42:22Z

I also think that non-deterministic tests can be valuable; as an example, a non-deterministic functional test (feature_dbcrash.py) found a bug in the non-atomic utxo writes PR (#10148). I doubt I would have found that bug without it.

So if this PR's intention is to move towards eliminating non-determinism as a tool in our testing, then I tend to concept NACK.

practicalswift · 2019-07-02T07:58:25Z

@MarcoFalke @sdaftuar

It seems like I didn't properly communicate the goal of this PR :-) Sorry about that.

The goal of this PR is to make sure line coverage-reports are deterministic.

This PR addresses #14343: "coverage reports non-deterministic" (opened by you @MarcoFalke :-))

The problem is described in MarcoFalke's issue:

Our unit tests and functional tests are non-deterministic in the overall execution, but the coverage should not be affected by that. I.e. some functions might be executed in a different order or sometimes skipped, but every line, function and branch should be executed at least once.

This is currently not true, even for serialization errors that should be hit exactly once.

Beside the obvious issue of missing test coverage on some runs, this also makes it impossible to see how test coverage changes between two commits.

More specifically this PR makes sure line coverage is stable between unit test runs - from the OP of this PR:

A necessary condition for meaningful line coverage measuring is that the test suite is deterministic in the sense that the set of lines executed at least once is identical between test suite runs.

I've now updated the PR title to better reflect the goal of this PR.

@MarcoFalke @sdaftuar Just to make sure we share the same goal: we all want line coverage unit tests reports to be deterministic, right? :-)

sdaftuar · 2019-07-03T18:34:21Z

I'm not familiar with the line coverage reports at all and don't really have a view there. If we merged this PR, then in the future if someone wanted to write a unit test that was non-deterministic, what should they do if this linter fails?

practicalswift · 2019-07-03T21:06:26Z

@sdaftuar If line coverage fluctuation in a test is intentional then one would simply add a suppression to test_deterministic_coverage.sh :-)

This PR is only meant to guard against accidental introduction of new tests which turns the line coverage number into a random process.

More concretely the situation I want to guard against is the following: line coverage is reported as 80,5% on first run, 80,6% on second run, 80,4% on third run, etc :-)

Some context:

tests: Add script checking for deterministic line coverage in unit tests #15296 ("tests: Add script checking for deterministic line coverage in unit tests")
test: Make bloom tests deterministic #15324 ("test: Make bloom tests deterministic")
tests: Make test updatecoins_simulation_test deterministic #15327 ("tests: Make test updatecoins_simulation_test deterministic")
coverage reports non-deterministic #14343 ("coverage reports non-deterministic")

sdaftuar · 2019-07-04T15:33:56Z

@practicalswift Thanks for clarifying that.

I tend to think that test writers face too many hurdles and knowledge requirements about the repo (and figuring out how to deal with linters that provide unclear benefits when applied as a hard-and-fast rule, like this one would). It's demoralizing to include a test for something and then have some unrelated script break for spurious reasons that requires learning and investigating something new in order to make travis pass.

So I think I'm still a Concept NACK. I think our concerns about line-coverage, which reflect a goal about the whole project's test suite, should not provide too much inconvenience for contributors working on a smaller piece.

practicalswift · 2019-07-04T15:49:34Z

@MarcoFalke As the person who filed issue #14343 ("coverage reports non-deterministic") which this PR tries to address: can you clarify your position?

Personally I kind of took it for granted that we want to know how suggested PR:s changes unit test line coverage. Assumptions are dangerous :-)

If the consensus opinion is that we are okay with the unit test line coverage sometimes being a random process (80,5% on first run, 80,6% on second run, 80,4% on third run, etc.) then this PR is clearly not needed and I'll close :-)

fanquake · 2019-07-05T02:30:09Z

Assumptions are dangerous :-)

I'm more concerned about the potential for reduced input from high-quality, long term contributors due to the overhead of new scripts, lints, tools etc than I am about occasionally seeing a small variation in coverage output. So I'm in agreement with @sdaftuar here.

maflcko · 2019-07-08T21:13:54Z

I think it is nice to know whether a new test has stable line coverage, but it non-stable coverage shouldn't be blocking a merge. Especially, since it might not be trivial to make them stable (c.f. #14343 (comment))

I guess I wouldn't mind if some of the linter were put behind the allow_failures travis flag.

practicalswift · 2019-07-09T16:50:06Z

@MarcoFalke Using allow_failures sounds like a very good idea. I'll investigate! :-)

ci: Test for deterministic coverage in Travis

d825f0c

fanquake added the Tests label Jul 1, 2019

practicalswift changed the title ~~ci: Add Travis check to catch accidental introduction of non-determinism in unit tests~~ ci: Add Travis check to catch accidental introduction of line coverage non-determinism in unit tests Jul 2, 2019

practicalswift changed the title ~~ci: Add Travis check to catch accidental introduction of line coverage non-determinism in unit tests~~ ci: Add Travis check to make sure unit test coverage reports stay deterministic Jul 2, 2019

practicalswift mentioned this pull request Jul 4, 2019

coverage reports non-deterministic #14343

Closed

practicalswift closed this Aug 10, 2019

davereikher mentioned this pull request Sep 18, 2019

Fix non-deterministic coverage of test DoS_mapOrphans #16878

Merged

practicalswift deleted the test-deterministic-coverage-in-travis branch April 10, 2021 19:38

bitcoin locked as resolved and limited conversation to collaborators Aug 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci: Add Travis check to make sure unit test coverage reports stay deterministic #16320

ci: Add Travis check to make sure unit test coverage reports stay deterministic #16320

Uh oh!

practicalswift commented Jul 1, 2019 •

edited

Loading

Uh oh!

DrahtBot commented Jul 1, 2019

Uh oh!

maflcko commented Jul 1, 2019

Uh oh!

sdaftuar commented Jul 1, 2019

Uh oh!

practicalswift commented Jul 2, 2019 •

edited

Loading

Uh oh!

sdaftuar commented Jul 3, 2019

Uh oh!

practicalswift commented Jul 3, 2019

Uh oh!

sdaftuar commented Jul 4, 2019

Uh oh!

practicalswift commented Jul 4, 2019 •

edited

Loading

Uh oh!

fanquake commented Jul 5, 2019

Uh oh!

maflcko commented Jul 8, 2019

Uh oh!

practicalswift commented Jul 9, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ci: Add Travis check to make sure unit test coverage reports stay deterministic #16320

ci: Add Travis check to make sure unit test coverage reports stay deterministic #16320

Uh oh!

Conversation

practicalswift commented Jul 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Jul 1, 2019

Conflicts

Uh oh!

maflcko commented Jul 1, 2019

Uh oh!

sdaftuar commented Jul 1, 2019

Uh oh!

practicalswift commented Jul 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sdaftuar commented Jul 3, 2019

Uh oh!

practicalswift commented Jul 3, 2019

Uh oh!

sdaftuar commented Jul 4, 2019

Uh oh!

practicalswift commented Jul 4, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fanquake commented Jul 5, 2019

Uh oh!

maflcko commented Jul 8, 2019

Uh oh!

practicalswift commented Jul 9, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

practicalswift commented Jul 1, 2019 •

edited

Loading

practicalswift commented Jul 2, 2019 •

edited

Loading

practicalswift commented Jul 4, 2019 •

edited

Loading