Skip to content

Retry build when RemoteActionFileSystem encounters a missing digest#25358

Closed
fmeum wants to merge 1 commit intobazelbuild:masterfrom
fmeum:build-rewinding-jdeps
Closed

Retry build when RemoteActionFileSystem encounters a missing digest#25358
fmeum wants to merge 1 commit intobazelbuild:masterfrom
fmeum:build-rewinding-jdeps

Conversation

@fmeum
Copy link
Copy Markdown
Collaborator

@fmeum fmeum commented Feb 21, 2025

Cache evictions encountered during reads of remote files in RemoteActionFileSystem now result in the build being retried when --experimental_remote_cache_eviction_retries is set to a positive value (the default).

This is enabled by implementing the checkForLostInputs method on the RemoteOutputService, building on the refactoring performed in #25396.

@fmeum fmeum force-pushed the build-rewinding-jdeps branch 4 times, most recently from 7d8cbef to 9cb500c Compare February 24, 2025 12:28
@fmeum fmeum changed the title Build rewinding jdeps Rewind builds when the RemoteActionFileSystem encounters a missing digest Feb 25, 2025
@fmeum fmeum changed the title Rewind builds when the RemoteActionFileSystem encounters a missing digest Retry build when RemoteActionFileSystem encounters a missing digest Feb 25, 2025
@fmeum fmeum marked this pull request as ready for review February 25, 2025 11:56
@fmeum fmeum requested review from a team and lberki as code owners February 25, 2025 11:56
@github-actions github-actions Bot added team-Performance Issues for Performance teams team-Rules-Java Issues for Java rules team-Remote-Exec Issues and PRs for the Execution (Remote) team labels Feb 25, 2025
@fmeum fmeum removed the request for review from a team February 25, 2025 11:57
@github-actions github-actions Bot added the awaiting-review PR is awaiting review from an assigned reviewer label Feb 25, 2025
@fmeum fmeum removed the request for review from lberki February 25, 2025 11:57
@fmeum
Copy link
Copy Markdown
Collaborator Author

fmeum commented Feb 25, 2025

@justinhorvitz Could you review the interaction with the action rewinding machinery, including the changes I had to make to RewindingTest?
@coeuvre Could you review the rest, i.e., the integration with build rewinding and the remote module?

catastrophe = true;
} else if (remoteCacheFailed) {
} else if (BulkTransferException.allCausedByCacheNotFoundException(exception)) {
// At this point, cache evictions that affect uploaded inputs have already been handled.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coeuvre This is a change in behavior, but I think it's for the better as it avoids retries that are very unlikely to succeed.


# Incremental build in toplevel build triggers remote cache eviction error
# but Bazel doesn't automatically retry the build yet.
# TODO: This documents the current behavior, but it's not intended.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinhorvitz To fix this, I would need to thread information about lost inputs obtained in

ensureToplevelArtifacts(env, importantArtifacts, inputMap);
through to the ImportantOutputHandler. I think I roughly understand what that would require, but the two calls to informImportantOutputHandler further above and the mentioning of error bubbling tell me that this is probably pretty difficult to get right. Do you have any advice for me?

Copy link
Copy Markdown
Member

@coeuvre coeuvre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I like the direction that you make build rewinding more similar to action rewinding.

If it is not too difficult to do, I would like you to split this PR into 3 PRs for easier reviews:

  1. A PR that overhauls the build rewinding mechanism.
  2. A PR that fixes build rewinding for jdeps.
  3. A PR that contains remaining changes in this PR that don't belong to above 2.

Comment thread src/main/java/com/google/devtools/build/lib/remote/RemoteActionFileSystem.java Outdated
Comment thread src/main/java/com/google/devtools/build/lib/remote/RemoteExecutionCache.java Outdated
Comment thread src/main/java/com/google/devtools/build/lib/remote/util/DigestUtil.java Outdated
@fmeum fmeum force-pushed the build-rewinding-jdeps branch 4 times, most recently from 7f44f95 to d4de887 Compare February 26, 2025 19:50
@fmeum
Copy link
Copy Markdown
Collaborator Author

fmeum commented Feb 26, 2025

I split off #25396 with the refactoring, this PR is now stacked on it.

@fmeum fmeum requested a review from coeuvre February 26, 2025 19:52
@fmeum fmeum force-pushed the build-rewinding-jdeps branch 3 times, most recently from 12b24a3 to f3fbba5 Compare February 27, 2025 18:17
@fmeum fmeum force-pushed the build-rewinding-jdeps branch from f3fbba5 to 29c67f1 Compare February 27, 2025 21:31
@fmeum fmeum force-pushed the build-rewinding-jdeps branch from 29c67f1 to 2064a9c Compare March 7, 2025 18:11
@fmeum
Copy link
Copy Markdown
Collaborator Author

fmeum commented Mar 7, 2025

@coeuvre This is no longer stacked and ready for another review.

@coeuvre
Copy link
Copy Markdown
Member

coeuvre commented Mar 10, 2025

Importing.

@fmeum
Copy link
Copy Markdown
Collaborator Author

fmeum commented Mar 10, 2025

@bazel-io fork 8.2.0

@justinhorvitz justinhorvitz removed their request for review March 10, 2025 21:07
@github-actions github-actions Bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Mar 12, 2025
@fmeum fmeum deleted the build-rewinding-jdeps branch March 12, 2025 12:02
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Mar 12, 2025
Cache evictions encountered during reads of remote files in `RemoteActionFileSystem` now result in the build being retried when `--experimental_remote_cache_eviction_retries` is set to a positive value (the default).

This is enabled by implementing the `checkForLostInputs` method on the `RemoteOutputService`,  building on the refactoring performed in bazelbuild#25396.

Closes bazelbuild#25358.

PiperOrigin-RevId: 736064349
Change-Id: Idc70d55138c3366d22ece7f0579b242b3c217da8
iancha1992 pushed a commit to iancha1992/bazel that referenced this pull request Mar 13, 2025
Cache evictions encountered during reads of remote files in `RemoteActionFileSystem` now result in the build being retried when `--experimental_remote_cache_eviction_retries` is set to a positive value (the default).

This is enabled by implementing the `checkForLostInputs` method on the `RemoteOutputService`,  building on the refactoring performed in bazelbuild#25396.

Closes bazelbuild#25358.

PiperOrigin-RevId: 736064349
Change-Id: Idc70d55138c3366d22ece7f0579b242b3c217da8
github-merge-queue Bot pushed a commit that referenced this pull request Mar 26, 2025
…g digest (#25538)

Cache evictions encountered during reads of remote files in
`RemoteActionFileSystem` now result in the build being retried when
`--experimental_remote_cache_eviction_retries` is set to a positive
value (the default).

This is enabled by implementing the `checkForLostInputs` method on the
`RemoteOutputService`, building on the refactoring performed in
#25396.

Closes #25358.

PiperOrigin-RevId: 736064349
Change-Id: Idc70d55138c3366d22ece7f0579b242b3c217da8

Commit
85d9cc9

Co-authored-by: Fabian Meumertzheim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

team-Performance Issues for Performance teams team-Remote-Exec Issues and PRs for the Execution (Remote) team team-Rules-Java Issues for Java rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants