Force synchronous upload and reuse of possibly modified spawn outputs#23382
Force synchronous upload and reuse of possibly modified spawn outputs#23382fmeum wants to merge 5 commits intobazelbuild:masterfrom
Conversation
826f1c9 to
5bf3c1a
Compare
|
FYI @aranguyen: We don't need to modify anything about how paths are rewritten in Java actions to benefit from execution deduplication with this change. We should still look into getting rid of the in-place modification for direct classpath header compilation to avoid the synchronous fallback introduced by this change. |
5bf3c1a to
f051c2f
Compare
|
@tjgq I have no idea how to test this short of running a large build in CI. I tested it manually on Bazel itself, with promising results: I no longer get any failures and build times improve noticeably. Representative build results for I expect the sandboxing overhead to be lower on Linux, but even with the overhead, path mapping is substantially faster. |
src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java
Show resolved
Hide resolved
tjgq
left a comment
There was a problem hiding this comment.
This is great! I think we should be in good shape to flip --experimental_remote_cache_async once it's in.
src/main/java/com/google/devtools/build/lib/actions/ActionExecutionMetadata.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java
Show resolved
Hide resolved
f051c2f to
2c82260
Compare
994ba7f to
967b286
Compare
I added a Java test that relies on mocking to make The problem is that we can't just make the other spawn slow (it doesn't run anyway), we need to make its output reuse slow. A more realistic test would be possible if we could somehow delay file system operations. Is that something that the test filesystem supports (say, stall updates to |
It might be possible with a custom FileSystem implementation; see https://cs.opensource.google/bazel/bazel/+/master:src/test/java/com/google/devtools/build/lib/remote/ActionInputPrefetcherTestBase.java;l=134;drc=3fe227ff418ef81eb5f221625bfd90f52ac70150 for inspiration. |
|
Thanks for the pointer, that looks pretty interesting. Since the main code path here goes through I added an assert that should make it less likely that the test fails open. |
| // Make it more likely to detect races by waiting for this thread to make as much progress as | ||
| // possible before letting the second spawn continue. | ||
| while (completeFirstSpawn.getState().compareTo(Thread.State.WAITING) < 0) { | ||
| Thread.sleep(10); | ||
| } | ||
| assertThat(completeFirstSpawn.getState()).isEqualTo(Thread.State.WAITING); |
There was a problem hiding this comment.
Can we instead use a latch to explicitly synchronize with the call to uploadOutputs (and avoid unnecessarily slowing down the test)?
There was a problem hiding this comment.
Done. Note that with the latch we can no longer assert that the thread would naturally wait (in the call to awaitAllOutputReuse), so I removed that assert.
src/test/java/com/google/devtools/build/lib/remote/RemoteSpawnCacheTest.java
Show resolved
Hide resolved
|
@bazel-io fork 7.4.0 |
When an action may modify a spawn's outputs after execution, the upload of outputs to the cache and reuse for deduplicated actions need to happen synchronously directly after spawn execution to avoid a race. This commit implements this for cache uploads by marking all actions with this property and simply disabling async upload for all spawns executed by such actions. For output reuse, all executions deduplicated against the first one register atomically upon deduplication and cause the cache upload to wait for all of them to complete reuse. Fixes bazelbuild#22501 Fixes bazelbuild#23288 Work towards bazelbuild#21578 Closes bazelbuild#23307 (no longer needed) Closes bazelbuild#23382. PiperOrigin-RevId: 668101364 Change-Id: Ice75dbe14a7dd46e02ecb096d2b2a30940216356
Cherry-picks the following changes to implement output reuse: * Deduplicate locally executed path mapped spawns (#22556) * Fix local execution deduplication to work with optional outputs (#23296) * Force synchronous upload and reuse of possibly modified spawn outputs (#23382) * Add support for in-memory outputs to output reuse (#23422) Fixes #23377 Fixes #23444 Fixes #23457
When an action may modify a spawn's outputs after execution, the upload of outputs to the cache and reuse for deduplicated actions need to happen synchronously directly after spawn execution to avoid a race.
This commit implements this for cache uploads by marking all actions with this property and simply disabling async upload for all spawns executed by such actions.
For output reuse, all executions deduplicated against the first one register atomically upon deduplication and cause the cache upload to wait for all of them to complete reuse.
Fixes #22501
Fixes #23288
Work towards #21578
Closes #23307 (no longer needed)