Experimentally support remote persistent workers.#16362
Experimentally support remote persistent workers.#16362benjaminp wants to merge 1 commit intobazelbuild:masterfrom
Conversation
Add a new --experimental_remote_mark_tool_inputs flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value (this is just a boolean tag). Implements #10091. Change-Id: Iccb36081fee399855be7c487c2d4091cb36f8df3
| // directory, which differs from the behavior of both local and sandboxed execution. | ||
| SortedMap<PathFragment, ActionInput> outputDirMap = buildOutputDirMap(spawn); | ||
| if (remoteOptions.remoteMerkleTreeCache) { | ||
| if (remoteOptions.remoteMerkleTreeCache && toolSignature == null) { |
There was a problem hiding this comment.
So this means with workers we can't use the remote Merkle tree cache? Is that a problem performance-wise? Is that something that can/should be fixed?
There was a problem hiding this comment.
This change was developed before the merkle tree cache was merged. Performance benefits were realized for the intended usecases. So, when the merkle tree cache came along, it didn't seem necessary to integrate tightly with it. I think our benchmarks didn't benefit from the merkle tree cache anyway because, for example, Javac actions normally don't have massive input trees.
No doubt this could be improved but one has to do the thinking to make sure if inputs move in and out of the tool set, the merkle tree is invalidated properly.
There was a problem hiding this comment.
Ok, that can be a follow-up item. Just mark it as such here, otherwise this if is very confusing.
There was a problem hiding this comment.
I have now expanded the logic here and also added a comment.
c55a27b to
f58c5e7
Compare
| * A simple value class combining a hash of the tool inputs (and their digests) as well as a set | ||
| * of the relative paths of all tool inputs. | ||
| */ | ||
| private static final class ToolSignature { |
There was a problem hiding this comment.
Hi! I've been experimenting with remote persistent workers in a setting where the remote execution backend has zero knowledge of the tools. I ran into the issue of the backend not knowing how to actually run the tool. To overcome this, I added the actual command (with --persistent_worker) called by Bazel to ToolSignature. Implementation here: wiwa@07d40ce#diff-5556e28152f91eac5403663774979225b86e64133da164366b56f91d1daa641cR1548
I then pass it to the backend via another entry in platformAdditionalProperties called persistentWorkerCommand: wiwa@07d40ce#diff-5556e28152f91eac5403663774979225b86e64133da164366b56f91d1daa641cR487
Do you think this PR could also support this functionality? Or would it have to be a different change?
There was a problem hiding this comment.
I propose that we not risk delaying this change by complicating it further. Let's file additional issues or pull requests for enhancements.
There was a problem hiding this comment.
Yep, I agree. Was not sure if it would particularly delay it, but the answer seems like "yes it would".
There was a problem hiding this comment.
@wiwa That sounds dangerous. If your backend doesn't know if it's running workers or not, it could end up with a lot of sometimes large processes hanging around.
There was a problem hiding this comment.
It's not that the backend doesn't know there are workers. It's that it doesn't need to know how to start specific worker processes. The backend is agnostic to the action which spins up a worker. I don't believe that is possible without addtional modification beyond the change in this PR. This is because Bazel persistent workers have more info to spawn the process than what is provided by a persistentWorkerKey.
There was a problem hiding this comment.
Example: a Scalac action calling scalac.sh <various startup flags> --persistent_worker to initialize the worker process before passing the "actual" compile args <compilation flags> -cp <classpath> <files to compile>. The remote execution backend would need to know the inputs/hash of scalac.sh and also what those startup flags are.
Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag. Fixes bazelbuild#10091. Co-authored-by: Ulf Adams <[email protected]>
f58c5e7 to
bdac5e3
Compare
Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag. Fixes bazelbuild#10091. Co-authored-by: Ulf Adams <[email protected]> Closes bazelbuild#16362. PiperOrigin-RevId: 482433908 Change-Id: I2a80834731fd0169c08d4beea348f61a323ca028
Add a new `--experimental_remote_mark_tool_inputs` flag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request. This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x. We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag. Fixes bazelbuild#10091. Co-authored-by: Ulf Adams <[email protected]> Closes bazelbuild#16362. PiperOrigin-RevId: 482433908 Change-Id: I2a80834731fd0169c08d4beea348f61a323ca028
Add a new
--experimental_remote_mark_tool_inputsflag, which makes Bazel tag tool inputs when executing actions remotely, and also adds a tools input key to the platform proto sent as part of the remote execution request.This allows a remote execution system to implement persistent workers, i.e., to keep worker processes around and reuse them for subsequent actions. In a trivial example, this improves build performance by ~3x.
We use "persistentWorkerKey" for the platform property, with the value being a hash of the tool inputs, and "bazel_tool_input" as the node property name, with an empty string as value—this is just a boolean tag.
Fixes #10091.
Co-authored-by: Ulf Adams [email protected]