Skip to content

Conversation

@keertk
Copy link
Member

@keertk keertk commented Jun 5, 2024

This change introduces the flag --experimental_inmemory_sandbox_stashes to track in memory the contents of the stashes stored with --reuse_sandbox_directories=true

With the old behavior Bazel has to perform a lot of I/O to read the contents of each stash before reusing it in a new action. Essentially, it checks every directory and subdirectory in the stashed sandbox to find out which files are different to the inputs of the new action about to be executed.

With in-memory stashes we associate each stash to the symlinks that needed to be created for that action. We also store the time at which these symlinks were created. In a background thread after the action has finished executing we stat every directory and for the ones that have changed (this should be rare) we update the contents. Because we only read the contents of the directories that have changed we do much less I/O than before.

If an action purposefully changes a sandbox symlink in-place, this won't be detected by statting the directory. I don't know any use case for this since the symlink itself is an implementation detail to achieve sandboxing. For this reason, manipulation of sandbox symlinks in-place is not supported.

Depending on the build this change might have a significant effect on memory. It should generally improve wall time further in builds where --reuse_sandbox_directories already improved wall time.

This change also introduces a minor optimization which is to associate each stash with the target that it was originally created for. When a new action wants to reuse a stash and there is more than one available, it will take the stash whose target is closest to its own. This is done with the assumption that targets that are closer together in the graph will have more inputs in common.

Fixes #22309 .

Closes #22559.

PiperOrigin-RevId: 640142481
Change-Id: Iece2d718df47f403e2fe91c1bd887604eceee8ee

Commit 1c0135c

This change introduces the flag `--experimental_inmemory_sandbox_stashes` to track in memory the contents of the stashes stored with `--reuse_sandbox_directories=true`

With the old behavior Bazel has to perform a lot of I/O to read the contents of each stash before reusing it in a new action. Essentially, it checks  every directory and subdirectory in the stashed sandbox to find out which files are different to the inputs of the new action about to be executed.

With in-memory stashes we associate each stash to the symlinks that needed to be created for that action. We also store the time at which these symlinks were created. In a background thread after the action has finished executing we stat every directory and for the ones that have changed (this should be rare) we update the contents. Because we only read the contents of the directories that have changed we do much less I/O than before.

If an action purposefully changes a sandbox symlink in-place, this won't be detected by statting the directory. I don't know any use case for this since the symlink itself is an implementation detail to achieve sandboxing. For this reason,  manipulation of sandbox symlinks in-place is not supported.

Depending on the build this change might have a significant effect on memory. It should generally improve wall time further in builds where `--reuse_sandbox_directories` already improved wall time.

This change also introduces a minor optimization which is to associate each stash with the target that it was originally created for. When a new action wants to reuse a stash and there is more than one available, it will take the stash whose target is closest to its own. This is done with the assumption that targets that are closer together in the graph will have more inputs in common.

Fixes #22309 .

Closes #22559.

PiperOrigin-RevId: 640142481
Change-Id: Iece2d718df47f403e2fe91c1bd887604eceee8ee
@keertk keertk added the team-Local-Exec Issues and PRs for the Execution (Local) team label Jun 5, 2024
@keertk keertk requested a review from a team as a code owner June 5, 2024 05:55
@github-actions github-actions bot added the awaiting-review PR is awaiting review from an assigned reviewer label Jun 5, 2024
@keertk keertk requested a review from oquenchil June 5, 2024 05:56
@keertk keertk enabled auto-merge June 5, 2024 05:56
@keertk keertk requested a review from meteorcloudy June 5, 2024 06:00
@keertk keertk added this pull request to the merge queue Jun 5, 2024
Merged via the queue into bazelbuild:release-7.2.0 with commit b2b78a2 Jun 5, 2024
@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team-Local-Exec Issues and PRs for the Execution (Local) team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants