-
Notifications
You must be signed in to change notification settings - Fork 38.7k
test: Fix RANDOM_CTX_SEED use with parallel tests #30737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Fix RANDOM_CTX_SEED use with parallel tests #30737
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Can be verified through running |
src/test/util/setup_common.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't set in the fuzz tests, IIRC. So, as explained in #30696 (comment) it would fail if there were two fuzz tests that set up a (Basic)TestingSetup each. No?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, obviously, the issue would remain if the same test is run in parallel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your detailed linked comment @maflcko. Apologies for not going back to re-read it before publishing this PR.
Before publishing, I did recall that you were worried about fuzz tests and ran:
RANDOM_CTX_SEED=21 ./test/fuzz/test_runner.py -j40 .
..with this added temporarily:
printf("%s\n", rand_str.c_str());The hashes outputted were unique. I admit it surprised me and I also printed test_path and saw they were empty. Should have dug deeper at that point to figure out the cause of randomness before publishing the PR.
After your latest comment I dug deeper, discovering:
- test/fuzz/test_runner.py does not automatically forward on environment variables, I guess it wants to keep the environment under tighter control. Making it explicitly forward
RANDOM_CTX_SEEDmakes the deterministic randomness of the fuzz tests start from the same seed.
Outputting getpid() in the BasicTestingSetup-ctor now shows unique PIDs per fuzz test. I hesitate to add the PID to the seed since the intent of 97e16f5 seems to be to decrease determinism). In the case of benchmarks, the PID would probably not help either, as the intent seems to be to run them single-threaded(?) - which would at the same time mean directories being created/destroyed should not be an issue for benchmarks.
I've opted in the latest push to implement forwarding of RANDOM_CTX_SEED and implemented G_TEST_GET_FULL_NAME for bench+fuzz. Please let me know what you think about that approach.
(Noticed in the debugger that the BasicTestingSetup-ctor, after creating the test directories, creates the kernel::Context which calls RandomInit() which calls ProcRand(..., /*always_use_real_rng=*/true) ...which should be harmless as long as RNGState::m_deterministic_prng has been set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To offer another alternative fix, beside the ones proposed in #30696 (comment):
Another possible fix would be to ensure the g_rng_temp_path is seeded before any call to SeedRandomForTest.
I think it would help why you didn't think that any of the 4 alternative fixes are worth a try and instead went for this, which is an incomplete fix, as explained in my previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would help why you didn't think that any of the 4 alternative fixes are worth a try and instead went for this, which is an incomplete fix, as explained in my previous comment.
In general I think determinism should be increased, not decreased. Increased determinism appears to be the original intent behind moving SeedRandomForTest earlier in 97e16f5, please let me know if you have a different interpretation. All your suggestions lead to decreased determinism, to avoid repetition, I only left additional comments:
-
"Use
GetStrongRandBytes, however, I am not sure how slow that'd be in the worst case." - You pointed out that it may be a bit slow. -
"Move the
SeedRandomForTestafter the fuzz target'sinitfunction, but beforeResetCoverageCounters. However that'd still break if the fuzz target spun up a TestingSetup." - Asinitoften createsBasicTestingSetup, that approach would also need to have the setup_common.cpp part of 97e16f5 reverted. As you said any later createdBasicTestingSetupafterinitwould become deterministic. -
"Use the hash of the current folder generator, as well as the hash of the pid. Which may obviously break once the kernel wraps the pids, but I am not sure how likely that is." - Addressed in my previous comment.
-
Ensure the
g_rng_temp_pathis seeded before any call toSeedRandomForTest. - Just the general determinism concern above.
Also, obviously, the issue would remain if the same test is run in parallel
The same test being run in parallel with itself could use -testdatadir. Is that a common case?
Edit: Could use different RANDOM_CTX_SEED if wanting to run the same test in parallel with itself as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. There may be a misunderstanding here. According to the comment you removed, the datadir should actually be "unique", not "deterministic". ("Random context to get unique temp data dirs.")
Aha, time to bring out the git log -S "Random context to get unique temp data dirs" command. ;)
I see it was your fae43a9 from 2019, so it's hard to argue that the author may have meant "unique" as in simply different from other tests. :) See you introduced the later seeding of the RNG in that commit too.
On the other hand, you (together with me) ACKed #29625 which contains 97e16f5, moving the seeding prior to directory naming randomization. Do you now disagree with it?
Once you make it deterministic, there will be a chance of collision, which may lead to a crash, which this pull request is trying to fix, no?
The PR does fix collisions between different tests that would otherwise happen when RANDOM_CTX_SEED was set to the same value, which was resulting in the same folder name being used. (RANDOM_CTX_SEED=21 make -j10 check).
I don't think anything here is a common case. So in theory an alternative "fix" would be to simply say in a comment that RANDOM_CTX_SEED can only be used when running a single test process at a time.
Combining tests with setting RANDOM_CTX_SEED is indeed an edge case. This PR arguably implements missing test names for bench+fuzz, missing support for forwarding RANDOM_CTX_SEED to fuzz tests, simplifies the code through removing g_rng_temp_path which had a no longer valid comment after 97e16f5, the main fix/complexity lies in adding entropy to the random folder path using the test name.
The extremely paranoid would say that the folder name generation could influence test success/failure, so having it deterministic is helpful. Unit tests are poorly written if that is ever the case, but for fuzzing it's still an open question to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, you (together with me) ACKed #29625 which contains 97e16f5, moving the seeding prior to directory naming randomization. Do you now disagree with it?
I think it was simply an oversight. At least for me.
Once you make it deterministic, there will be a chance of collision, which may lead to a crash, which this pull request is trying to fix, no?
The PR does fix collisions between different tests that would otherwise happen when
RANDOM_CTX_SEEDwas set to the same value, which was resulting in the same folder name being used. (RANDOM_CTX_SEED=21 make -j10 check).
Yes, I understand the code diff. I am just saying that it would be nicer if running the same test also didn't crash for that reason, when run in parallel. It seems plausible that someone would want to do that, given that it is possible and supported right now (when not setting RANDOM_CTX_SEED).
Otherwise, someone else will report that as an issue and the code has to be changed again.
The extremely paranoid would say that the folder name generation could influence test success/failure
Ok, that helps to understand your concern. However, I'd say that the folder name is printed. So if it lead to an issue, it would be trivial to reproduce.
(If it is not printed on failure, then that should be fixed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, you (together with me) ACKed #29625 which contains 97e16f5, moving the seeding prior to directory naming randomization. Do you now disagree with it?
I think it was simply an oversight. At least for me.
Alright, thanks for addressing this. I think it would be good to wait for sipa to have time as this PR continues in the same direction as his commit.
Yes, I understand the code diff. I am just saying that it would be nicer if running the same test also didn't crash for that reason, when run in parallel. It seems plausible that someone would want to do that, given that it is possible and supported right now (when not setting
RANDOM_CTX_SEED).Otherwise, someone else will report that as an issue and the code has to be changed again.
If they don't want to set different RANDOM_CTX_SEED they can set different -testdatadir (and probably should, for clarity in case there are failures).
The extremely paranoid would say that the folder name generation could influence test success/failure
Ok, that helps to understand your concern. However, I'd say that the folder name is printed. So if it lead to an issue, it would be trivial to reproduce.
It will be even more trivial to reproduce if one can just set the RANDOM_CTX_SEED env var and re-run the same binary without modifying and recompiling the code (and potentially having to set up a git & build environment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be even more trivial to reproduce if one can just set the
RANDOM_CTX_SEEDenv var and re-run the same binary without modifying and recompiling the code (and potentially having to set up a git & build environment).
I just think this optimized in the wrong direction. Has there ever been a known issue due to a fixed-size ascii-hex-encoded path element? Stuff like variable-size or non-hex-ascii are known to possibly find issues, but a fixed size [a-f0-9] element, I am not aware of.
Moreover, adding code like fuzz_env['RANDOM_CTX_SEED'] = random_seed to the fuzz runner also seems to go in the wrong direction a bit. The goal of the fuzz tests is to be fully deterministic and stable, within a single run and across different runs. If the fuzz randomness is seeded with anything other than stuff provided by the fuzz engine, it is easy to get non-determinism, or coverage instability.
An having different types of "seed" in the fuzz context is also confusing.
If they don't want to set different
RANDOM_CTX_SEEDthey can set different-testdatadir(and probably should, for clarity in case there are failures).
Well, the only point of setting RANDOM_CTX_SEED is to reproduce a past failure. However, if you want to go down the route of requiring to set a different testdatadir when wanting to run the same test in parallel, it may be best to fully go that route. Instead of a hash, just use the test name as-is? This would also be parallel to the functional tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, if you want to go down the route of requiring to set a different testdatadir when wanting to run the same test in parallel, it may be best to fully go that route. Instead of a hash, just use the test name as-is? This would also be parallel to the functional tests.
For reference, #31000 should be one step into that direction.
|
Will review after rebase. |
To fix the issue with test directories clashing when running in parallel with the env var RANDOM_CTX_SEED set, we now mix in the test name. Issue was introduced by 97e16f5, and comment for g_insecure_rand_ctx_temp_path became invalid with that commit. Make the determinism clear through removing the no longer useful g_insecure_rand_ctx_temp_path. Fixes bitcoin#30696.
eeeb56b to
065f0f6
Compare
|
🐙 This pull request conflicts with the target branch and needs rebase. |
|
Cross-ref to #30748 (comment), depending on how this evolves. |
|
Closing since the main issue was fixed in a different way by the now merged #31000. |
|
I think the issue in the fuzz tests still persists. |
That's correct. Fuzzing was not my main concern, but could maybe resurrect that part of this PR. I've rebased on current master and re-pushed the branch I had deleted, but am not allowed to reopen this one. Are you still against passing through RANDOM_CTX_SEED in the test_runner.py? In that case I can drop that commit. |
|
If you want you can just open a new pull, it will have to be reviewed from scratch anyway.
Yes, adding code like If you disagree, it would be good to explain why. |
|
For reference, the fuzz issue can be hit more frequent on machines that do not have nanosecond precision. GCE may not have it, according to the oss-fuzz crashes of the form: Edit: Fixed in #31317 |
My understanding is that with the current state of that pull, all outstanding observable issues mentioned in this thread are fixed? If not, please leave a comment. There may or may not be some small style-fixups/follow-ups such as #31317 (comment), but no real issue to be fixed. |
|
Agree that issues here are fixed. For orderliness, maybe the fuzz-part of dfef02f from this PR could be done as a new PR just to ensure named folders? |
|
Ah yes. I am happy to review a pull for named folders when fuzzing |
|
New PR #31333. |
To fix the issue with test directories clashing when running in parallel with the env var
RANDOM_CTX_SEEDset, we now mix in the test name.Issue was introduced by 97e16f5, and comment for
g_insecure_rand_ctx_temp_pathbecame invalid with that commit. Make the determinism clear through removing the no longer usefulg_insecure_rand_ctx_temp_path.Fixes #30696.