Description of the feature request:
tl;dr: Request an API similar to repository_ctx.download that downloads multiple files simultaneously.
Which category does this issue belong to?
Core, External Dependency
What underlying problem are you trying to solve with this feature?
Example to reproduce the issue:
my_repo_rule.bzl
def _impl(repository_ctx):
    repository_ctx.download(
        url = "https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb",
        output = "chrome.deb",
    )
    repository_ctx.download(
        url = "https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm",
        output = "chrome.rpm",
    )
    repository_ctx.file("WORKSPACE", "")
    repository_ctx.file("BUILD", """exports_files(glob(["**"]))""")

my_repo_rule = repository_rule(
    implementation = _impl,
)
WORKSPACE
load(":my_repo_rule.bzl", "my_repo_rule")
my_repo_rule(
    name = "my_ext_repo",
)
BUILD (empty file)
Then run
bazel fetch @my_ext_repo//...
You can see that the two files are downloaded in serial, not in parallel.
To work around this issue, one could define two repositories. For example:
my_repo_rule.bzl
def _impl(repository_ctx):
    repository_ctx.download(
        url = repository_ctx.attr.url,
        output = "myfile",
    )
    repository_ctx.file("WORKSPACE", "")
    repository_ctx.file("BUILD", """exports_files(glob(["**"]))""")

my_repo_rule = repository_rule(
    implementation = _impl,
    attrs = {"url": attr.string()},
)
WORKSPACE
load(":my_repo_rule.bzl", "my_repo_rule")
my_repo_rule(
    name = "chrome_deb",
    url = "https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb",
)
my_repo_rule(
    name = "chrome_rpm",
    url = "https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm",
)
However, when there are a lot of files, this approach requires creating a lot of repositories.
The benefits of downloading in parallel are:
- If there are a lot of small files to download, the time savings are big, because connection setup, which takes most of the time for a small file, happens in parallel. If there are only a few big files to download, the time savings are small.
- I could also do something like the following with good parallelization (pseudocode):
def _impl(repository_ctx):
    repository_ctx.download(<metadata file>)
    metadata = repository_ctx.read(<metadata file>)
    repository_ctx.download_multiple(metadata.urls)
Without an API to download multiple files, the above can only be done in WORKSPACE:
# WORKSPACE
load(":download.bzl", "download_metadata", "download")
download_metadata(name = "metadata")
load("@metadata//:metadata.bzl", "metadata")
[
    download(name = elem.name, url = elem.url)
    for elem in metadata
]
I can't hide all of this in a single macro, because load statements cannot appear inside a function and here they interleave with the repository definitions.
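To put rough numbers on the time-savings claim above, here is a back-of-the-envelope model in plain Python (not Starlark, and not how Bazel schedules downloads). It assumes a fixed connection setup cost per file, a single shared bandwidth limit, and unlimited concurrency. The function names and the default values of 0.2 s setup and 50 MB/s are made up for illustration.

```python
def serial_time(sizes_mb, setup_s, bandwidth_mb_s):
    """Each download pays connection setup, then transfers at full bandwidth."""
    return sum(setup_s + size / bandwidth_mb_s for size in sizes_mb)


def parallel_time(sizes_mb, setup_s, bandwidth_mb_s):
    """All connections are set up at once; transfers share the same bandwidth."""
    if not sizes_mb:
        return 0.0
    return setup_s + sum(sizes_mb) / bandwidth_mb_s


def savings(sizes_mb, setup_s=0.2, bandwidth_mb_s=50.0):
    """Fraction of serial wall time saved by downloading in parallel."""
    serial = serial_time(sizes_mb, setup_s, bandwidth_mb_s)
    parallel = parallel_time(sizes_mb, setup_s, bandwidth_mb_s)
    return (serial - parallel) / serial


many_small = [0.1] * 100  # hypothetical: 100 files of 0.1 MB each
few_big = [100.0] * 2     # hypothetical: 2 files of 100 MB each

print(f"many small files: {savings(many_small):.0%} saved")
print(f"few big files:    {savings(few_big):.0%} saved")
```

Under these assumptions, the many-small-files case saves about 98% of the wall time and the few-big-files case about 5%, which matches the intuition that the win comes from overlapping connection setup rather than from extra bandwidth.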
Which operating system are you running Bazel on?
Linux
What is the output of bazel info release?
release 6.3.2
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?
No response
Have you found anything relevant by searching the web?
https://bazel.build/rules/lib/builtins/repository_ctx#download
Any other information, logs, or outputs that you want to share?
No response