Description of the feature request:
tl;dr: Requesting an API that is similar to repository_ctx.download, but downloads multiple files simultaneously.
Which category does this issue belong to?
Core, External Dependency
What underlying problem are you trying to solve with this feature?
Example to reproduce the issue:
my_repo_rule.bzl
def _impl(repository_ctx):
    repository_ctx.download(
        url = "https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb",
        output = "chrome.deb",
    )
    repository_ctx.download(
        url = "https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm",
        output = "chrome.rpm",
    )
    repository_ctx.file("WORKSPACE", "")
    repository_ctx.file("BUILD", """exports_files(glob(["**"]))""")

my_repo_rule = repository_rule(
    implementation = _impl,
)
WORKSPACE
load(":my_repo_rule.bzl", "my_repo_rule")

my_repo_rule(
    name = "my_ext_repo",
)
BUILD
# empty file
Then run
bazel fetch @my_ext_repo//...
You can see that the two files are downloaded in serial, not in parallel.
To work around this issue, one could define two repositories, one per file. For example:
my_repo_rule.bzl
def _impl(repository_ctx):
    repository_ctx.download(
        url = repository_ctx.attr.url,
        output = "myfile",
    )
    repository_ctx.file("WORKSPACE", "")
    repository_ctx.file("BUILD", """exports_files(glob(["**"]))""")

my_repo_rule = repository_rule(
    implementation = _impl,
    attrs = {"url": attr.string()},
)
WORKSPACE
load(":my_repo_rule.bzl", "my_repo_rule")
my_repo_rule(
    name = "chrome_deb",
    url = "https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb",
)

my_repo_rule(
    name = "chrome_rpm",
    url = "https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm",
)
However, when there are a lot of files, this approach requires creating a lot of repositories.
The benefits of downloading in parallel are:
- If there are many small files to download, the time savings are large, because most of the time is spent setting up connections, and that work is parallelized. If there are only a few big files to download, the time savings are small.
- I could also do something like the following with good parallelization (pseudocode):
def _impl(repository_ctx):
    repository_ctx.download(<metadata file>)
    metadata = repository_ctx.read(<metadata file>)
    repository_ctx.download_multiple(metadata.urls)
Without an API to download multiple files, a parallel version of this flow can only be expressed in WORKSPACE:
# WORKSPACE
load(":download.bzl", "download_metadata", "download")
download_metadata(name = "metadata")
load("@metadata//:metadata.bzl", "metadata")
[
    download(name = elem.name, url = elem.url)
    for elem in metadata
]
I can't hide all of this in a single macro, because the load statements interleave with the repository definitions and a load statement cannot appear inside a function.
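To put rough numbers on the time-savings argument above, here is a back-of-the-envelope model. It is a simplification of my own, not a measurement of Bazel: it assumes every download pays a fixed connection setup cost, that in the parallel case all setups overlap completely, and that transfers still share one link, so the total transfer time does not shrink. All function and parameter names are made up for this illustration.

```python
def serial_seconds(num_files, setup_s, transfer_s):
    """Each download pays setup plus transfer, one after another."""
    return num_files * (setup_s + transfer_s)

def parallel_seconds(num_files, setup_s, transfer_s):
    """Assumption: setups fully overlap, transfers share one link."""
    return setup_s + num_files * transfer_s

def savings_fraction(num_files, setup_s, transfer_s):
    """Fraction of the serial time that the parallel model saves."""
    serial = serial_seconds(num_files, setup_s, transfer_s)
    parallel = parallel_seconds(num_files, setup_s, transfer_s)
    return (serial - parallel) / serial
```

With made-up inputs, 100 files at 0.2 s setup and 0.01 s transfer each give a saving above 90% under this model, while two files at 0.2 s setup and 60 s transfer each give a saving below 1%.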
Which operating system are you running Bazel on?
Linux
What is the output of bazel info release?
release 6.3.2
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?
No response
Have you found anything relevant by searching the web?
Any other information, logs, or outputs that you want to share?
No response