Skip to content

Fix noremote_upload_local_results behavior in combined blobstore#9338

Closed
borkaehw wants to merge 1 commit intobazelbuild:masterfrom
lucidsoftware:fix-noremote-upload-local-results
Closed

Fix noremote_upload_local_results behavior in combined blobstore#9338
borkaehw wants to merge 1 commit intobazelbuild:masterfrom
lucidsoftware:fix-noremote-upload-local-results

Conversation

@borkaehw
Copy link
Copy Markdown

@borkaehw borkaehw commented Sep 5, 2019

Disable the upload action to remote cache when --noremote_upload_local_results is set.

This problem would only happen in CombinedDiskHttpBlobStore at this moment when users want read/write permission to disk cache but only read permission to remote cache.

Once the GrpcCache refactoring (making GrpcCache a SimpleBlobStore implementation) is done, CombinedDiskHttpBlobStore should be renamed to CombinedDiskRemoteBlobStore. Then this fix will apply to both GrpcCache and HttpCache.

Fix #8216

@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch 2 times, most recently from c875fd7 to de90588 Compare September 5, 2019 23:12
@borkaehw borkaehw marked this pull request as ready for review September 5, 2019 23:27
@buchgr
Copy link
Copy Markdown
Contributor

buchgr commented Sep 6, 2019

@borkaehw Thanks for the patch! I agree that we should have this but it's incompatible behavior. You need to make it so that this new behavior is guarded behind an incompatible flag that is disabled by default.

@ishikhman
Copy link
Copy Markdown
Contributor

@borkaehw here is a link for the policy with some instructions: https://bazel.build/breaking-changes-guide.html

@buchgr
Copy link
Copy Markdown
Contributor

buchgr commented Sep 6, 2019

Hmm. I am wondering if it would make more sense if we gave the disk cache it's own knobs and introduced --disk_write_local_results and --disk_accept_cached instead?

@ishikhman
Copy link
Copy Markdown
Contributor

Hmm. I am wondering if it would make more sense if we gave the disk cache it's own knobs and introduced --disk_write_local_results and --disk_accept_cached instead?

Hm, do you know a use case for that?

@SrodriguezO
Copy link
Copy Markdown

I can't think of a strong use case for using the disk cache as read-only. To disable it altogether, you might as well not specify a --disk_cache in the config.

In any case, I think this would fall outside the scope of this PR, which is to resolve #8216 (and the changes to remote_upload_local_results's behavior would have to be done anyway).

@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch from de90588 to 8dfae94 Compare September 6, 2019 19:34
@borkaehw
Copy link
Copy Markdown
Author

borkaehw commented Sep 6, 2019

I have made the incompatible flag and written a better commit message. I suppose GitHub issue should be created after we merge. Let me know if I did anything wrong. Thanks.

Comment thread src/main/java/com/google/devtools/build/lib/remote/options/RemoteOptions.java Outdated
@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch from 8dfae94 to ab923b3 Compare September 6, 2019 20:48
Copy link
Copy Markdown
Contributor

@ishikhman ishikhman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Thanks a lot for the contribution!

throws IOException, InterruptedException {
diskCache.putActionResult(actionKey, actionResult);
remoteCache.putActionResult(actionKey, actionResult);
if (!options.incompatibleUploadLocalResultsDiskOnly || options.remoteUploadLocalResults) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the following version a bit more readable, but it's up to you: to change it or not :)
if (options.incompatibleUploadLocalResultsDiskOnly && !options.remoteUploadLocalResults)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd need to negate the entire clause for those to be equivalent:
!(options.incompatibleUploadLocalResultsDiskOnly && !options.remoteUploadLocalResults)

I think either way works, but I prefer the original : )

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @SrodriguezO, also prefer the original. More suggestions are welcome.

@buchgr
Copy link
Copy Markdown
Contributor

buchgr commented Sep 10, 2019

In any case, I think this would fall outside the scope of this PR, which is to resolve #8216 (and the changes to remote_upload_local_results's behavior would have to be done anyway).

I think it's important to be keep the behaviour consistent. This change makes it so that --remote_upload_local_results not longer applies to --disk_cache whether a --remote_cache is specified or not. So I think to stay sane it we should introduce the same behaviour for --remote_accept_cached?

@SrodriguezO
Copy link
Copy Markdown

I think it's important to be keep the behaviour consistent

That's a fair point.

Personally, I can't think of a use case for this, but I don't have a strong opinion either way. @borkaehw wdyt?

@buchgr
Copy link
Copy Markdown
Contributor

buchgr commented Sep 10, 2019

Personally, I can't think of a use case for this, but I don't have a strong opinion either way.

I don't think we necessarily need the flags that I proposed but I think that we'll need to also disable --remote_accept_cached for the --disk_cache along with --remote_upload_local_results.

Does that sound reasonable?

@SrodriguezO
Copy link
Copy Markdown

but I think that we'll need to also disable --remote_accept_cached for the --disk_cache along with --remote_upload_local_results

Oh I see. Yeah we should definitely do that.

@borkaehw
Copy link
Copy Markdown
Author

@buchgr thanks for suggestions, it sounds reasonable to me. I will take a closer look.

@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch 2 times, most recently from b2e3e95 to 8630d72 Compare September 10, 2019 23:32
@borkaehw
Copy link
Copy Markdown
Author

Just pushed changes to address comments, please take a look. Thanks.

@irengrig irengrig added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Sep 11, 2019
}
boolean checkCache = options.remoteAcceptCached;

if (checkCache) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come you removed this?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I see that you moved the progress status down to the other block, but will the logic between here and there cause a delay?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It passes the tests. I don't see a strong reason to keep it separate. I might be wrong. @ishikhman do you have more context on it?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't cause tests to fail, the message might just not show up right away

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a bit difference to be honest. Should be fine both ways.


if (options.remoteUploadLocalResults) {
if (options.remoteUploadLocalResults ||
(options.incompatibleRemoteResultsDiskOnly && options.diskCache != null && !options.diskCache.isEmpty())) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the options.diskCache != null && !options.diskCache.isEmpty()) part to a helper usesDiskCache method?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

public boolean remoteUploadLocalResults;

@Option(
name = "incompatible_remote_results_disk_only",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename to incompatible_remote_results_flags_ignore_disk? The current name makes me think this flag makes remote results flags apply to the disk cache only (which is the opposite of what they do)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I prefer just incompatible_remote_results_ignore_disk to make it a bit shorter.

help =
"If set to true, --noremote_upload_local_results and --noremote_accept_cached "
+ "will not apply to the disk cache. "
+ "If a combined cache is used, --noremote_upload_local_results will cause "
Copy link
Copy Markdown

@SrodriguezO SrodriguezO Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ "If a combined cache is used: "
+ "\n\t--noremote_upload_local_results will cause results to be written to the disk cache, but not uploaded to the remote cache"
+ "\n\t--noremote_accept_cached will result in Bazel checking for results in the disk cache, but not in the remote cache."
+ "See #8216 for details."

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch 2 times, most recently from eaefc90 to 4385531 Compare September 11, 2019 23:03
@ulfjack
Copy link
Copy Markdown
Contributor

ulfjack commented Oct 2, 2019

Ping @buchgr

I think there might be people who use a read-only disk cache on NFS, which is written by CI, and read by everyone else.

@borkaehw
Copy link
Copy Markdown
Author

borkaehw commented Oct 7, 2019

@ishikhman friendly ping here.

I think this is valuable for people. I am happy to address feedback and pushing it toward merged.

@ishikhman
Copy link
Copy Markdown
Contributor

@borkaehw still LGTM :)
Pinging @buchgr for merging part

@SrodriguezO
Copy link
Copy Markdown

@buchgr friendly ping : )

@SrodriguezO
Copy link
Copy Markdown

@buchgr can you please merge this when you have a minute? It was approved nearly a month ago and is just pending merge. Or maybe @ishikhman can merge it? We've been using this on our fork without issue, and we'd hate to see it rot.

@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch from 4385531 to c14deac Compare November 6, 2019 17:36
@borkaehw
Copy link
Copy Markdown
Author

borkaehw commented Nov 6, 2019

Since #9200 was merged. Force pushed to address conflicts.

@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch from c14deac to fa2fc22 Compare November 11, 2019 20:12
@borkaehw
Copy link
Copy Markdown
Author

Since #10200 was merged. Force pushed to address conflicts.

@kastiglione
Copy link
Copy Markdown
Contributor

Just wanted to chime in and say we've been waiting for many releases (since 0.17 or earlier), hoping for this fix soon. Another friendly ping @buchgr :)

@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch from fa2fc22 to 7fba47a Compare November 18, 2019 20:49
@borkaehw
Copy link
Copy Markdown
Author

Since #10233 was merged. Force pushed to address conflicts.

Also added tests for combined disk and grpc cache.

@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch from 7fba47a to 765e9ae Compare November 18, 2019 21:19
@borkaehw
Copy link
Copy Markdown
Author

Not sure why //src/test/shell/bazel:bazel_proto_library_test is failing. It doesn't seem to be related to my change.

… in combined blobstore

Fixes bazelbuild#8216

RELNOTES: If --noremote_upload_local_results and --disk_cache are
provided at the same time, Bazel will only upload local results to
disk cache but not remote cache. If --noremote_accept_cached and
--disk cache are provided at the same time, Bazel will only check
results from disk cache but not remote cache.
@borkaehw borkaehw force-pushed the fix-noremote-upload-local-results branch from 765e9ae to ef3bb98 Compare November 25, 2019 19:36
@borkaehw
Copy link
Copy Markdown
Author

Force push to fix //src/test/shell/bazel:bazel_proto_library_test.

@borkaehw
Copy link
Copy Markdown
Author

@buchgr friendly ping. Thanks for making grpc-disk combined cache #10233, it improves our developer workflow dramatically. This fix is the final piece blocking us from using official release of Bazel (we have carried this commit on custom Bazel for a long time). This simple PR will bring a huge value to the community. Would you be able to take a quick look?

@borkaehw
Copy link
Copy Markdown
Author

This PR breaks the CI on master, but not the CI on PRs. Here is the fix. #10462

bazel-io pushed a commit that referenced this pull request Dec 20, 2019
CI is broken because of #9338. It's called `processwrapper-sandbox` on MacOS. This should fix it.

Note: MacOS never failed on #9338 until the tests got merged. I wonder why we couldn't catch it earlier. How can we verify it won't fail again?

Closes #10462.

PiperOrigin-RevId: 286582888
@borkaehw borkaehw deleted the fix-noremote-upload-local-results branch December 20, 2019 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes team-Remote-Exec Issues and PRs for the Execution (Remote) team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--disk_cache overrides --noremote_upload_local_results

8 participants