Description
Description of the feature request:
Create a separate input map and Merkle tree for each tool used in an action, then combine them to calculate the final action digest. Each tool's input map and Merkle tree should be reused across all actions that depend on the same tool.
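To make the idea concrete, here is a minimal sketch in Python rather than Bazel's actual Java code. Every name in it (`ToolDigestCache`, `tree_digest`, `action_digest`) is hypothetical, and the flat path-to-content hash only stands in for a real Merkle tree; it does not reproduce the Remote Execution API's directory format. It also assumes a tool's runfiles do not change during a build, so the tool name is a safe cache key.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def tree_digest(files: dict[str, bytes]) -> str:
    """Hash a {path: content} map in sorted path order.

    Stand-in for building a Merkle tree: the cost grows with the
    number of files, which is the part worth reusing.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(digest(files[path]).encode())
        h.update(b"\n")
    return h.hexdigest()


class ToolDigestCache:
    """Hypothetical per-build cache: one tree digest per tool."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}
        self.builds = 0  # how many times a tool tree was actually hashed

    def get(self, tool_name: str, runfiles: dict[str, bytes]) -> str:
        # Assumption: runfiles for a given tool are fixed within a build.
        if tool_name not in self._digests:
            self.builds += 1
            self._digests[tool_name] = tree_digest(runfiles)
        return self._digests[tool_name]


def action_digest(
    cache: ToolDigestCache,
    tool_name: str,
    tool_runfiles: dict[str, bytes],
    own_inputs: dict[str, bytes],
    command: list[str],
) -> str:
    """Combine the cached tool digest with the action's own inputs."""
    parts = [
        cache.get(tool_name, tool_runfiles),   # reused across actions
        tree_digest(own_inputs),               # small, unique per action
        digest("\0".join(command).encode()),   # command line
    ]
    return digest("\n".join(parts).encode())


# Usage: many actions share one tool with a large runfiles tree.
cache = ToolDigestCache()
runfiles = {f"node_modules/pkg{i}/index.js": b"x" for i in range(1000)}
d1 = action_digest(cache, "//tools:lint", runfiles, {"a.js": b"1"}, ["lint", "a.js"])
d2 = action_digest(cache, "//tools:lint", runfiles, {"b.js": b"2"}, ["lint", "b.js"])
```

With this shape, the expensive hash over the tool's runfiles happens once per tool instead of once per action, and each action only hashes its own few inputs.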
Feature requests: what underlying problem are you trying to solve with this feature?
These two function calls cause remote caching to be really slow for actions with lots of inputs. Having a way to re-use the parts that don't change between multiple actions that use the same tool could make these significantly faster.
Tools written in JavaScript will often have a high number of runfiles (>10,000) because they depend on the node_modules folder. This is also sometimes true of Python tools.
Have you found anything relevant by searching the web?
- remote/performance: don't construct a merkle tree for remote caching #4839 from 2018 describes a different option for speeding up remote caching, but it looks like it was decided that it wouldn't offer much benefit.
- This bazel-discuss thread is what prompted me to look into the code, and see if/how this could be improved.
Any other information, logs, or outputs that you want to share?
In our specific case, we have one action per source file, each using a nodejs_binary tool. So we have 15,000 actions with the same ~10,000 runfiles for the tool and 1 unique input file. Each of these takes ~200ms to calculate the cache key, so that's 15,000 × 200ms = 3,000 seconds, or 50 CPU-minutes of work, just to check whether the actions are cached.
