[7.2.0] Add option to track sandbox stashes in memory #22637

keertk · 2024-06-05T05:55:34Z

This change introduces the flag --experimental_inmemory_sandbox_stashes to track in memory the contents of the stashes stored with --reuse_sandbox_directories=true

With the old behavior Bazel has to perform a lot of I/O to read the contents of each stash before reusing it in a new action. Essentially, it checks every directory and subdirectory in the stashed sandbox to find out which files are different to the inputs of the new action about to be executed.

With in-memory stashes we associate each stash to the symlinks that needed to be created for that action. We also store the time at which these symlinks were created. In a background thread after the action has finished executing we stat every directory and for the ones that have changed (this should be rare) we update the contents. Because we only read the contents of the directories that have changed we do much less I/O than before.

If an action purposefully changes a sandbox symlink in-place, this won't be detected by statting the directory. I don't know any use case for this since the symlink itself is an implementation detail to achieve sandboxing. For this reason, manipulation of sandbox symlinks in-place is not supported.

Depending on the build this change might have a significant effect on memory. It should generally improve wall time further in builds where --reuse_sandbox_directories already improved wall time.

This change also introduces a minor optimization which is to associate each stash with the target that it was originally created for. When a new action wants to reuse a stash and there is more than one available, it will take the stash whose target is closest to its own. This is done with the assumption that targets that are closer together in the graph will have more inputs in common.

Fixes #22309 .

Closes #22559.

PiperOrigin-RevId: 640142481
Change-Id: Iece2d718df47f403e2fe91c1bd887604eceee8ee

Commit 1c0135c

This change introduces the flag `--experimental_inmemory_sandbox_stashes` to track in memory the contents of the stashes stored with `--reuse_sandbox_directories=true` With the old behavior Bazel has to perform a lot of I/O to read the contents of each stash before reusing it in a new action. Essentially, it checks every directory and subdirectory in the stashed sandbox to find out which files are different to the inputs of the new action about to be executed. With in-memory stashes we associate each stash to the symlinks that needed to be created for that action. We also store the time at which these symlinks were created. In a background thread after the action has finished executing we stat every directory and for the ones that have changed (this should be rare) we update the contents. Because we only read the contents of the directories that have changed we do much less I/O than before. If an action purposefully changes a sandbox symlink in-place, this won't be detected by statting the directory. I don't know any use case for this since the symlink itself is an implementation detail to achieve sandboxing. For this reason, manipulation of sandbox symlinks in-place is not supported. Depending on the build this change might have a significant effect on memory. It should generally improve wall time further in builds where `--reuse_sandbox_directories` already improved wall time. This change also introduces a minor optimization which is to associate each stash with the target that it was originally created for. When a new action wants to reuse a stash and there is more than one available, it will take the stash whose target is closest to its own. This is done with the assumption that targets that are closer together in the graph will have more inputs in common. Fixes #22309 . Closes #22559. PiperOrigin-RevId: 640142481 Change-Id: Iece2d718df47f403e2fe91c1bd887604eceee8ee

keertk added the team-Local-Exec Issues and PRs for the Execution (Local) team label Jun 5, 2024

keertk requested a review from a team as a code owner June 5, 2024 05:55

github-actions bot added the awaiting-review PR is awaiting review from an assigned reviewer label Jun 5, 2024

keertk requested a review from oquenchil June 5, 2024 05:56

keertk mentioned this pull request Jun 5, 2024

[7.2.0] Sandbox reuse for tests spends a while #22631

Closed

keertk enabled auto-merge June 5, 2024 05:56

keertk requested a review from meteorcloudy June 5, 2024 06:00

meteorcloudy approved these changes Jun 5, 2024

View reviewed changes

keertk added this pull request to the merge queue Jun 5, 2024

Merged via the queue into bazelbuild:release-7.2.0 with commit b2b78a2 Jun 5, 2024

github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Jun 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[7.2.0] Add option to track sandbox stashes in memory #22637

[7.2.0] Add option to track sandbox stashes in memory #22637

Uh oh!

keertk commented Jun 5, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[7.2.0] Add option to track sandbox stashes in memory #22637

[7.2.0] Add option to track sandbox stashes in memory #22637

Uh oh!

Conversation

keertk commented Jun 5, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants