Skip to content

Conversation

@practicalswift
Copy link
Contributor

@practicalswift practicalswift commented Jul 1, 2019

Add Travis check to make sure unit test coverage reports stay deterministic.

Rationale:

A necessary condition for meaningful line coverage measuring is that the test suite
is deterministic in the sense that the set of lines executed at least once is identical between test suite runs.

This PR addresses issue #14343 (MarcoFalke): "coverage reports non-deterministic":

Our unit tests and functional tests are non-deterministic in the overall execution, but the coverage should not be affected by that. I.e. some functions might be executed in a different order or sometimes skipped, but every line, function and branch should be executed at least once.

This is currently not true, even for serialization errors that should be hit exactly once.

Beside the obvious issue of missing test coverage on some runs, this also makes it impossible to see how test coverage changes between two commits.

Example output in case of line coverage deterministic unit tests:

[2019-06-30 08:32:59] Measuring coverage, run #1 of 3
[2019-06-30 08:36:38] Measuring coverage, run #2 of 3
[2019-06-30 08:40:15] Measuring coverage, run #3 of 3

Coverage test passed: Deterministic coverage across 3 runs.

Example output in case of line coverage non-deterministic unit tests:

[2019-06-30 08:32:59] Measuring coverage, run #1 of 3
[2019-06-30 08:36:38] Measuring coverage, run #2 of 3

The line coverage is non-deterministic between runs.

The test suite must be deterministic in the sense that the set of lines executed at least
once must be identical between runs. This is a necessary condition for meaningful coverage
measuring.

--- gcovr.run-1.txt   2019-01-30 23:14:07.419418694 +0100
+++ gcovr.run-2.txt   2019-01-30 23:15:57.998811282 +0100
@@ -471,7 +471,7 @@
 test/crypto_tests.cpp                        270     270   100%
 test/cuckoocache_tests.cpp                   142     142   100%
 test/dbwrapper_tests.cpp                     148     148   100%
-test/denialofservice_tests.cpp               225     225   100%
+test/denialofservice_tests.cpp               225     224    99%   363
 test/descriptor_tests.cpp                    116     116   100%
 test/fs_tests.cpp                             24       3    12%   14,16-17,19-20,23,25-26,29,31-32,35-36,39,41-42,45-46,49,51-52
 test/getarg_tests.cpp                        111     111   100%
@@ -585,5 +585,5 @@
 zmq/zmqpublishnotifier.h                       5       0     0%   12,31,37,43,49
 zmq/zmqrpc.cpp                                21       0     0%   16,18,20,22,33-35,38-45,49,52,56,60,62-63
 ------------------------------------------------------------------------------
-TOTAL                                      61561   27606    44%
+TOTAL                                      61561   27605    44%
 ------------------------------------------------------------------------------

This Travis check uses test_deterministic_coverage.sh which was introduced in #15296.

To make sure test_deterministic_coverage.sh won't trigger any line coverage non-determinism alarms with the current test suite I've performed 8 000 test runs (against 98958c8) which all gave identical line coverage results.

Note to reviewers: Which would be the most appropriate Travis job to put this on? I'm currently using x86_64 Linux [GOAL: install] [bionic] [uses qt5 dev package instead of depends Qt to speed up build and avoid timeout], but I'm afraid the total run-time of that job is a bit on the high end with this check included. Would it be preferable to add a new job instead of adding to an existing job? Please advice :-)

@fanquake fanquake added the Tests label Jul 1, 2019
@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 1, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #15134 (tests: Switch one of the Travis jobs to an unsigned char environment (-funsigned-char) by practicalswift)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko
Copy link
Member

maflcko commented Jul 1, 2019

I think there is a value to non-deterministic tests, as long as their seed is known (e.g printed to stdout at the beginning of the test, or written to a file on failure, ...)

@sdaftuar
Copy link
Member

sdaftuar commented Jul 1, 2019

I also think that non-deterministic tests can be valuable; as an example, a non-deterministic functional test (feature_dbcrash.py) found a bug in the non-atomic utxo writes PR (#10148). I doubt I would have found that bug without it.

So if this PR's intention is to move towards eliminating non-determinism as a tool in our testing, then I tend to concept NACK.

@practicalswift practicalswift changed the title ci: Add Travis check to catch accidental introduction of non-determinism in unit tests ci: Add Travis check to catch accidental introduction of line coverage non-determinism in unit tests Jul 2, 2019
@practicalswift practicalswift changed the title ci: Add Travis check to catch accidental introduction of line coverage non-determinism in unit tests ci: Add Travis check to make sure unit test coverage reports stay deterministic Jul 2, 2019
@practicalswift
Copy link
Contributor Author

practicalswift commented Jul 2, 2019

@MarcoFalke @sdaftuar

It seems like I didn't properly communicate the goal of this PR :-) Sorry about that.

The goal of this PR is to make sure line coverage-reports are deterministic.

This PR addresses #14343: "coverage reports non-deterministic" (opened by you @MarcoFalke :-))

The problem is described in MarcoFalke's issue:

Our unit tests and functional tests are non-deterministic in the overall execution, but the coverage should not be affected by that. I.e. some functions might be executed in a different order or sometimes skipped, but every line, function and branch should be executed at least once.

This is currently not true, even for serialization errors that should be hit exactly once.

Beside the obvious issue of missing test coverage on some runs, this also makes it impossible to see how test coverage changes between two commits.

More specifically this PR makes sure line coverage is stable between unit test runs - from the OP of this PR:

A necessary condition for meaningful line coverage measuring is that the test suite is deterministic in the sense that the set of lines executed at least once is identical between test suite runs.

I've now updated the PR title to better reflect the goal of this PR.

@MarcoFalke @sdaftuar Just to make sure we share the same goal: we all want line coverage unit tests reports to be deterministic, right? :-)

@sdaftuar
Copy link
Member

sdaftuar commented Jul 3, 2019

I'm not familiar with the line coverage reports at all and don't really have a view there. If we merged this PR, then in the future if someone wanted to write a unit test that was non-deterministic, what should they do if this linter fails?

@practicalswift
Copy link
Contributor Author

@sdaftuar If line coverage fluctuation in a test is intentional then one would simply add a suppression to test_deterministic_coverage.sh :-)

This PR is only meant to guard against accidental introduction of new tests which turns the line coverage number into a random process.

More concretely the situation I want to guard against is the following: line coverage is reported as 80,5% on first run, 80,6% on second run, 80,4% on third run, etc :-)

Some context:

@sdaftuar
Copy link
Member

sdaftuar commented Jul 4, 2019

@practicalswift Thanks for clarifying that.

I tend to think that test writers face too many hurdles and knowledge requirements about the repo (and figuring out how to deal with linters that provide unclear benefits when applied as a hard-and-fast rule, like this one would). It's demoralizing to include a test for something and then have some unrelated script break for spurious reasons that requires learning and investigating something new in order to make travis pass.

So I think I'm still a Concept NACK. I think our concerns about line-coverage, which reflect a goal about the whole project's test suite, should not provide too much inconvenience for contributors working on a smaller piece.

@practicalswift
Copy link
Contributor Author

practicalswift commented Jul 4, 2019

@MarcoFalke As the person who filed issue #14343 ("coverage reports non-deterministic") which this PR tries to address: can you clarify your position?

Personally I kind of took it for granted that we want to know how suggested PR:s changes unit test line coverage. Assumptions are dangerous :-)

If the consensus opinion is that we are okay with the unit test line coverage sometimes being a random process (80,5% on first run, 80,6% on second run, 80,4% on third run, etc.) then this PR is clearly not needed and I'll close :-)

@fanquake
Copy link
Member

fanquake commented Jul 5, 2019

Assumptions are dangerous :-)

I'm more concerned about the potential for reduced input from high-quality, long term contributors due to the overhead of new scripts, lints, tools etc than I am about occasionally seeing a small variation in coverage output. So I'm in agreement with @sdaftuar here.

@maflcko
Copy link
Member

maflcko commented Jul 8, 2019

I think it is nice to know whether a new test has stable line coverage, but it non-stable coverage shouldn't be blocking a merge. Especially, since it might not be trivial to make them stable (c.f. #14343 (comment))

I guess I wouldn't mind if some of the linter were put behind the allow_failures travis flag.

@practicalswift
Copy link
Contributor Author

@MarcoFalke Using allow_failures sounds like a very good idea. I'll investigate! :-)

@practicalswift practicalswift deleted the test-deterministic-coverage-in-travis branch April 10, 2021 19:38
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants