-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Add BLAKE3 hasher to vfs #18784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BLAKE3 hasher to vfs #18784
Conversation
|
@coeuvre -- Could use your advice on how to fix:
Thanks! |
|
For 1, yes the test is to verify Bazel can be built without network. You need to add blake3 to @meteorcloudy what's your opinion on 2? |
| this.messageDigestPrototypeSupportsClone = supportsClone(messageDigestPrototype); | ||
| } | ||
|
|
||
| public static DigestHashFunction register( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot change how the registration works here because we rely on this internally. You could, however, glue BLAKE3 into this with following steps:
- Write a class that implements
java.security.MessageDigestforBLAKE3. This is the class that interact with JNI and call into c code. - Register the
MessageDigestfrom step 1 by implementing ajava.security.Providerand add it withjava.security.Security.addProvider(...). So that when you callMessageDigest.getInstance("BLAKE3"), you can get back the instance from the provider. - Adapt
DigestMessageforBLAKE3intoHashFunctionby calling MessageDigestHashFunction. - Create a
DigestHashFunctionby callingDigestHashFunction.register(...)using theHashFunctionfrom step 3.
I know this is verbose and not fun but we have to write it in this way.
|
For the Lines 276 to 281 in 75b0221
|
|
For "Clang on :ubuntu: 18.04 LTS (OpenJDK 11, gcc 7.5.0)", we do plan to phase out ubuntu 1804 soon. But we need to schedule the work and ideally we do that for Bazel 7 not in the middle of a LTS release since we promises backwards compatibility. So if there is other ways to workaround it, that will be preferred. |
a904ba6 to
8e26aa5
Compare
|
PTAL @coeuvre -- I've made the changes you suggested and made the implementation a MessageDigest as well as reverting the changes to Fingerprint etc. Thank you for the detailed pointers there. Re failing tests, I have merged bazelbuild/bazel-central-registry#698 to fix the bzlmod tests but now they are failing in a new way with: Any ideas on how to fix this? |
| public final class Blake3Provider extends Provider { | ||
| public Blake3Provider() { | ||
| super("BLAKE3Provider", "1.0", "A BLAKE3 digest provider"); | ||
| put("MessageDigest.BLAKE3", "com.google.devtools.build.lib.vfs.Blake3MessageDigest"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Use Blake3MessageDigest.class.getName() to make refactor easier in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
|
||
| public static final DigestHashFunction SHA1 = register(Hashing.sha1(), "SHA-1", "SHA1"); | ||
| public static final DigestHashFunction SHA256 = register(Hashing.sha256(), "SHA-256", "SHA256"); | ||
| public static final DigestHashFunction BLAKE3 = register(new Blake3HashFunction(), "BLAKE3"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just use new MessageDigestHashFunction("BLAKE3") (doc) instead of implementing HashFunction by yourself in Blake3HashFunction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, we don't want put BLAKE3 code in the common code path. Neither the registration here nor the provider below.
I suggest we have a utility class BazelHashFunctions which installs the provider and register the hash function, e.g.:
class BazelHashFunctions {
static {
addProvider(...);
}
public static final DigestHashFunction BLAKE3 = DigestHashFunction.register(...);
}
In a following PR when we want to create a file system based on BLAKE3, we could do that in BazelFileSystemModule with code like:
new UnixFileSystem(BazelHashFunctions.BLAKE3, ...);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to read between the lines here and figure out where you want stuff / how you want it to be injected -- let me know what I got wrong.
I moved blake3 stuff into a bazel subdir of vfs. The provider is statically registered when BazelHashFunctions is loaded by the classloader in src/main/java/com/google/devtools/build/lib/bazel/Bazel.java. Re the MessageDigestHashFunction -- addressed above, but that's not public afaict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we want the provider to be statically registered by the classloader when it is loading class BazelHashFunctions. But we don't want to have it implementing BlazeModule and put it in src/main/java/com/google/devtools/build/lib/bazel/Bazel.java.
What I proposed above is we could add the code
new UnixFileSystem(BazelHashFunctions.BLAKE3, ...);
to somewhere like here so that when we want to use BLAKE3, it registers it.
We don't need to do it in this CL as we haven't focused on integrating BLAKE3 into Bazel's VFS. Let's limit the scope of this CL to only adapt BLAKE3 as a DigestHashFunction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give a little more context on what else is required to add the digest function to vfs? I'm a little unclear on that part and it will help me for the next cl. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think I overcomplicated that. The only necessary part is we want to register BLAKE3 for Bazel only. We can just force loading BazelHashFunctions by adding following code block in BazelFilesystemModule
class BazelFilesystemmodule {
static {
BazelHashFunctions.ensureRegistered();
}
}
where BazelHashFunctions.ensureRegistered is just an empty static function whose sole purpose is to let JVM load class BazelHashFunctions and run the static block there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is just what I needed. Done.
| import java.security.DigestException; | ||
| import java.security.MessageDigest; | ||
|
|
||
| public final class Blake3MessageDigest extends MessageDigest implements Hasher { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use MessageDigestHashFunction as suggested below, you can remove implements Hasher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MessageDigestHashFunction is not public, and copying it in is prohibitive because it brings like 5 other classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed that. Can we then create a separate class to implement Hasher? I found it is confusing to combine these two especially when we have update, engineUpdate and share the underlying buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/main/java/com/google/devtools/build/lib/vfs/Blake3MessageDigest.java
Outdated
Show resolved
Hide resolved
07fbd47 to
c92ed72
Compare
| } | ||
|
|
||
| public void engineUpdate(byte[] data, int offset, int length) { | ||
| while (length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One optimization here is we could avoid copying if the size of data is larger than ONESHOT_THRESHOLD. But this is totally optional and we could leave that for a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was trying to keep this simple for now -- enough complexity already :) Agree on deferring this for a followup (where we measure perf).
| } | ||
|
|
||
| public void engineUpdate(byte b) { | ||
| engineUpdate(new byte[] {b}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: avoid creating a new object, put the byte into buffer directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
|
||
| @CanIgnoreReturnValue | ||
| public Hasher putBytes(ByteBuffer b) { | ||
| buffer = b; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to copy/use the argument b now instead of saving it for use later. I believe the contract of the interface is the caller is free to use the passed buffer after the function is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right you are.
339d81d to
3d6b67e
Compare
coeuvre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this PR is pretty good now! Just a few more comments and we are ready to go!
src/main/native/BUILD
Outdated
| cc_binary( | ||
| name = "libunix_jni.so", | ||
| srcs = [ | ||
| "blake3_jni.cc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this line since blake3_jni is already in the deps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| java_test( | ||
| name = "BazelTests", | ||
| size = "small", | ||
| tags = ["no_windows"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see BALKE3 code can be compiled on Windows. Are there any blockers for running the tests on windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| "c/blake3_avx512_x86-64_unix.S", | ||
| # Disable to appease bazel-ci which uses ubuntu-18 (EOL) and GCC 7 | ||
| # lacking the headers to compile AVX512. | ||
| # "c/blake3_avx512_x86-64_unix.S", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@meteorcloudy FYI, this is an unfortunate consequence of the old versions running in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: #18784 (comment)
src/main/native/blake3_jni.cc
Outdated
| return env->ReleaseByteArrayElements(array, addr, 0); | ||
| } | ||
|
|
||
| extern "C" JNIEXPORT long JNICALL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns long while the corresponding Java method returns int. It should also the JNIEnv * and jobject parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| } | ||
|
|
||
| @Override | ||
| protected void finalize() throws Throwable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That usage is fine as it comes after the loadJni call. The failing test bootstraps Bazel and may not honor BUILD files, maybe the new JNI sources need to be listed somewhere else?
81ded64 to
890327f
Compare
|
@coeuvre I believe this is ready for another look / attempt at merging. Thanks |
|
Thanks, I am importing now! |
|
Everything went well. Currently waiting for internal review. |
@coeuvre sweet, thanks! Btw did you see my comment on the gerrit repo? https://bazel-review.googlesource.com/c/bazel/+/225070 Hoping we can make that last minute change before it goes in. Thanks again! |
|
Sorry, I didn't see that because there wasn't any notification :(. Np, I can make the change now. |
|
Thanks @coeuvre! |
|
@coeuvre thank you again for all your help shepherding this through the process |
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in bazelbuild#18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in bazelbuild#18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in #18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in bazelbuild#18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in #18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc)
Currently, it is included in runtime library with glob which is shared between bazel and blaze. Moving it to bazel package to prevent from packaging it when building blaze. An upcoming change (in #18784) will package BLAKE3 to this module and we don't want it in blaze. PiperOrigin-RevId: 547516042 Change-Id: I673fa3d6824d56c85e85b02d13b4eb56571edda9 (cherry picked from commit 6d6bbdc) Co-authored-by: Googler <[email protected]>
|
The changes in this PR have been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=last_rc. |
Add BLAKE3 hasher to vfs
This PR adds the Blake3Hasher and Blake3HashFunction classes to vfs and
makes them available under the flag --digest_function=BLAKE3.
I spent a little time trying to optimize performance more here, and am happy to
chat about different approaches (ByteBuffer vs byte[] vs unsafe vs mmap). In the
end this CL implements something pretty conservative, but there is room to get
more performance out of this with a different approach.
This is a partial commit for #18658.