Skip to content

Implement combined disk and HTTP cache#7512

Closed
aherrmann wants to merge 5 commits intobazelbuild:masterfrom
aherrmann:combined-cache-pr
Closed

Implement combined disk and HTTP cache#7512
aherrmann wants to merge 5 commits intobazelbuild:masterfrom
aherrmann:combined-cache-pr

Conversation

@aherrmann
Copy link
Copy Markdown
Contributor

This PR adds the ability to enable disk and HTTP cache simultaneously. Specifically, if you set

build --remote_http_cache=http://some.remote.cache
build --disk_cache=/some/disk/cache

Then Bazel will look for cached items in the disk cache first. If an item is not found in the disk cache, it will be looked up in the HTTP cache. If it is found there, it will be copied into the disk cache. On put, Bazel will store items in both the disk and the HTTP cache.

We tested this change on a relatively large internal project and found that a fully cached, clean build was 5-10 times faster with the mixed cache (i.e. reading from disk cache) than with a pure remote cache, depending on the remote cache location.

cc @dajmaki @francesco-da

@jin jin added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Feb 22, 2019
@jin
Copy link
Copy Markdown
Member

jin commented Feb 22, 2019

Additionally assigning @ola-rozenfeld because @buchgr is unavailable.

@aherrmann-da
Copy link
Copy Markdown

CI fails on Ubuntu 14.04 with

ERRO[0559] error waiting for container: unexpected EOF
🚨 Error: The command exited with status 125

All other steps pass. I don't know, but this looks unrelated to this PR to me.

@googlebot
Copy link
Copy Markdown

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

// Write a temporary file first, and then rename, to avoid data corruption in case of a crash.
Path temp = toPath(UUID.randomUUID().toString());

try (OutputStream tempOut = temp.getOutputStream()) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using try-with-resources here caused tempOut to be closed to early occasionally leading to StreamClosed errors. Thanks @moritzkiefer-da for catching that!

@aherrmann
Copy link
Copy Markdown
Contributor Author

@googlebot recheck CLA

@googlebot
Copy link
Copy Markdown

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

@moritzkiefer-da
Copy link
Copy Markdown

I agree to my commits being contributed 👍

Copy link
Copy Markdown
Contributor

@buchgr buchgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

thanks for the PR! I think the overall idea is sound but I think it's more future proof to implement it slightly differently. Instead of having CombinedDiskHttpBlobStore extend from DiskBlobStore you should create a generic blobstore implementation that uses composition and that can check a list of blob stores in sequence. that way the code can be used more widely, as for example we would also like to use this code with remote execution and the gRPC based --remote_cache flag.

Also please add some unit and integration tests.

@francesco-da
Copy link
Copy Markdown

@buchgr we can't treat the disk store too opaquely because we want to have access to the file itself to be able to stream to it and then from (or the reverse) it efficiently without intermediate buffers. so we extended it to reuse the functions getting the file names and similar.

i think if we were to treat it opaquely we'd have to tweak some of the types in SimpleBlobStore a bit.

Copy link
Copy Markdown
Contributor

@buchgr buchgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair points. overall looks good then. can add some tests please?

@@ -0,0 +1,148 @@
// Copyright 2017 The Bazel Authors. All rights reserved.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2019


@Override
public ListenableFuture<Boolean> get(String key, OutputStream out) {
boolean use_bsDisk = super.containsKey(key);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foundOnDisk?


public CombinedDiskHttpBlobStore(Path root, SimpleBlobStore bsHttp) {
super(root);
this.bsHttp = bsHttp;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preconditions.checkNotNull


import com.google.devtools.build.lib.vfs.Path;

/** A {@link SimpleBlobStore} implementation combining two blob stores.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line break after **

@googlebot
Copy link
Copy Markdown

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@aherrmann
Copy link
Copy Markdown
Contributor Author

@buchgr Thanks! I've addressed your review comments.

can add some tests please?

Yes, I'm happy to add some tests. Though, I'm afraid I'll need some pointers. I tried adding two simple test cases, that check whether put puts data into both caches, and whether get copies from http cache to disk cache. I've checked the existing test suite for examples for blob store tests. Unfortunately, I couldn't find a test case for the disk cache. However, I think I figured out how to use InMemoryFileSystem to setup a disk cache for testing. But, I also couldn't find an example of a test case that tests a working http cache. The existing test cases only seem to operate against a dummy http server that doesn't actually store or read cached items. I'm not familiar with Bazel's test suite. So, it's perfectly possible that I overlooked something obvious. Could you point me to an example of how to test the http cache?

@googlebot
Copy link
Copy Markdown

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

ℹ️ Googlers: Go here for more info.

@buchgr
Copy link
Copy Markdown
Contributor

buchgr commented Feb 28, 2019

Unfortunately, I couldn't find a test case for the disk cache.

ouch :-). it's time to add some then.

However, I think I figured out how to use InMemoryFileSystem to setup a disk cache for testing.

Great. Yep, that's what to use for testing.

But, I also couldn't find an example of a test case that tests a working http cache. The existing test cases only seem to operate against a dummy http server that doesn't actually store or read cached items.

Correct. It's a bit of a mess (it's an ongoing cleanup effort). The unit tests currently mostly test error scenarios. We do have integration tests that provide a working http cache here: https://source.bazel.build/bazel/+/master:src/test/shell/bazel/remote/remote_execution_http_test.sh?q=remote_execution_http_test.sh i.e. look at test_cc_binary_http_cache.

Hope that helps.

@aherrmann
Copy link
Copy Markdown
Contributor Author

@buchgr Thank you, yes, that helps a lot.

I had been looking in the wrong place. I found some integration tests for the disk cache in src/test/shell/bazel/disk_cache_test.sh. I used those and the HTTP cache tests that you pointed to as a guide and added tests for the combined cache.

mkdir -p a
cat > a/BUILD <<EOF
package(default_visibility = ["//visibility:public"])
cc_binary(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we are using cc_binary through the remote execution tests, but we are currently working towards removing it completely where not necessary and using a faster genrule (i.e one that writes hello world to some file) instead in order to speed up our tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I've changed the test case to use a genrule instead.

cp -f bazel-bin/a/test ${TEST_TMPDIR}/test_expected

# Fetch from disk cache
bazel clean --expunge
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--expunge shouldn't be necessary here. we are trying to avoid --expunge in tests in order to make them run faster.

|| fail "Disk cache generated different result"

# Fetch from http cache
bazel clean --expunge
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

mkdir $cache

# Copy from http cache to disk cache
bazel clean --expunge
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

|| fail "HTTP cache generated different result"

# Fetch from disk cache
bazel clean --expunge
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

bazel clean --expunge
bazel build $disk_flags //a:test &> $TEST_log \
|| fail "Failed to fetch //a:test from disk cache"
expect_log "remote cache hit"
Copy link
Copy Markdown
Contributor

@buchgr buchgr Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"1 remote cache hit"? same below

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is more concrete. I've changed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: no team-Remote-Exec Issues and PRs for the Execution (Remote) team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants