-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Description
Description of the problem / feature request:
When running a test twice using remote-cache or disk-cache, bazel fails to cache the action that generates the xml test report (external/bazel_tools/tools/test/generate-xml.sh) on the second invocation. This happens because the action use as arguments the walltime duration of the previous action (the action running the test, external/bazel_tools/tools/test/test-setup.sh), and that time is zero when the action gets a cache hit on the second invocation.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Using the following (repository)[https://github.com/segonpin/bazel_test_cache_miss] (or any simple test that takes longer than 1 second to run), just run the test three times.
$ bazelisk clean; bazelisk test //... --disk_cache=/tmp/bazel_disk_cache --execution_log_json_file=exec1.json
$ bazelisk clean; bazelisk test //... --disk_cache=/tmp/bazel_disk_cache --execution_log_json_file=exec2.json
$ bazelisk clean; bazelisk test //... --disk_cache=/tmp/bazel_disk_cache --execution_log_json_file=exec3.json
The first time there are 0 cache-hits, the third time everything gets a cache-hit, but the second time one action does not get a cache-hit, and that seems incorrect to me.
What operating system are you running Bazel on?
Ubuntu 20.04
What's the output of bazel info release?
release 6.0.0-pre.20211117.1
Any other information, logs, or outputs that you want to share?
When we run the test for the first time. It will run 7 actions, of which 4 are linux-sandbox. Notice that the test takes 5 seconds to run.
$ bazelisk clean; bazelisk test //... --disk_cache=/tmp/bazel_disk_cache --execution_log_json_file=exec1.json
...
INFO: 7 processes: 3 internal, 4 linux-sandbox.
INFO: Build completed successfully, 7 total actions
//:sleep_test PASSED in 5.0s
...
Then we run the test for the second time. It will run 7 actions, and now only 1 is linux-sandbox. None should be. The reason for the cache-miss is that the test duration is used as input for the missed action, and because the test was cached now, the duration has changed from 5 seconds to 0 seconds, so the commandArgs was different for the action.
$ bazelisk clean; bazelisk test //... --disk_cache=/tmp/bazel_disk_cache --execution_log_json_file=exec2.json
...
INFO: 7 processes: 3 disk cache hit, 3 internal, 1 linux-sandbox.
INFO: Build completed successfully, 7 total actions
//:sleep_test (cached) PASSED in 0.0s
...
Now we run the test again, and now the test run in 0 seconds, as the previous time, so that the action printing the xml report get a cache hit from before.
$ bazelisk clean; bazelisk test //... --disk_cache=/tmp/bazel_disk_cache --execution_log_json_file=exec3.json
...
INFO: 7 processes: 4 disk cache hit, 3 internal.
INFO: Build completed successfully, 7 total actions
//:sleep_test (cached) PASSED in 0.0s
...
The action getting the cache miss runs
# in the first invocation
"commandArgs": ["external/bazel_tools/tools/test/generate-xml.sh", "bazel-out/k8-fastbuild/testlogs/sleep_test/test.log", "bazel-out/k8-fastbuild/testlogs/sleep_test/test.xml", "5", "0"],
# second and third invocation
"commandArgs": ["external/bazel_tools/tools/test/generate-xml.sh", "bazel-out/k8-fastbuild/testlogs/sleep_test/test.log", "bazel-out/k8-fastbuild/testlogs/sleep_test/test.xml", "0", "0"],
And the generated test report looks like follows (using 5 the first invocation, and 0 the second and third invocations)
$ cat bazel-out/k8-fastbuild/testlogs/sleep_test/test.xml
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
<testsuite name="sleep_test" tests="1" failures="0" errors="0">
<testcase name="sleep_test" status="run" duration="5" time="5"></testcase>
<system-out>
Generated test.log (if the file is not UTF-8, then this may be unreadable):
<![CDATA[exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:sleep_test
-----------------------------------------------------------------------------]]>
</system-out>
</testsuite>
</testsuites>In my opinion the test report should be always the first one (with the duration="5"), or otherwise the last action should not be remoteCacheable.