Request API in repository_ctx to download multiple files simultaneously #19674

@jacky8hyf

Description of the feature request:

tl;dr: Request an API similar to repository_ctx.download that downloads multiple files simultaneously.

Which category does this issue belong to?

Core, External Dependency

What underlying problem are you trying to solve with this feature?

Example to reproduce the issue:

my_repo_rule.bzl

def _impl(repository_ctx):
    repository_ctx.download(
        url = "https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb",
        output = "chrome.deb",
    )
    repository_ctx.download(
        url = "https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm",
        output = "chrome.rpm",
    )

    repository_ctx.file("WORKSPACE", "")
    repository_ctx.file("BUILD", """exports_files(glob(["**"]))""")

my_repo_rule = repository_rule(
    implementation = _impl,
)

WORKSPACE

load(":my_repo_rule.bzl", "my_repo_rule")

my_repo_rule(
    name = "my_ext_repo",
)

BUILD

# empty file

Then run

bazel fetch @my_ext_repo//...

You can see that the two files are downloaded one after the other, not in parallel.

To work around this issue, one could define two repositories, one per file. For example:

my_repo_rule.bzl

def _impl(repository_ctx):
    repository_ctx.download(
        url = repository_ctx.attr.url,
        output = "myfile",
    )

    repository_ctx.file("WORKSPACE", "")
    repository_ctx.file("BUILD", """exports_files(glob(["**"]))""")

my_repo_rule = repository_rule(
    implementation = _impl,
    attrs = {"url": attr.string()}
)

WORKSPACE

load(":my_repo_rule.bzl", "my_repo_rule")

my_repo_rule(
    name = "chrome_deb",
    url = "https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb"
)

my_repo_rule(
    name = "chrome_rpm",
    url = "https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm"
)

However, when there are many files, this approach requires creating one repository per file.

The benefits of downloading in parallel are:

  • With many small files, the time savings are large, because most of the time goes into setting up connections, and that work would be parallelized. With a few big files, the savings are small.
  • It would also enable something like the following with good parallelization (pseudocode; download_multiple is the API proposed here):
def _impl(repository_ctx):
    # Fetch a metadata file that lists the URLs to download.
    repository_ctx.download(<metadata file>)
    metadata = repository_ctx.read(<metadata file>)
    # Proposed API: fetch every listed URL simultaneously.
    repository_ctx.download_multiple(metadata.urls)

Without an API to download multiple files, the above can only be done in the WORKSPACE file:

# WORKSPACE
load(":download.bzl", "download_metadata", "download")
download_metadata(name = "metadata")

load("@metadata//:metadata.bzl", "metadata")
[
  download(name = elem.name, url = elem.url)
  for elem in metadata
]

I can't hide all of this in a single macro, because the load statements interleave with the repository definitions and a load statement cannot appear inside a macro.
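For context, the download_metadata half of that WORKSPACE workaround might look roughly like the sketch below. It is written in the subset of Starlark that is also valid Python. Several details are assumptions rather than part of the original report: the metadata file is taken to be plain text with one "name url" pair per line, METADATA_URL is a placeholder, and the helper names render_metadata_bzl and _download_metadata_impl are hypothetical.

```python
# download.bzl (sketch). Hypothetical: the "name url" line format,
# METADATA_URL, and both function names are assumptions for illustration.
METADATA_URL = "https://example.com/metadata.txt"  # placeholder URL

def render_metadata_bzl(content):
    """Turns 'name url' lines into the source text of metadata.bzl."""
    lines = ["metadata = ["]
    for raw in content.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, url = line.partition(" ")
        url = url.strip()
        if not url:
            continue  # skip lines that have no URL
        # Emit struct(...) so WORKSPACE can read elem.name and elem.url.
        lines.append('    struct(name = "%s", url = "%s"),' % (name, url))
    lines.append("]")
    return "\n".join(lines) + "\n"

def _download_metadata_impl(repository_ctx):
    # Fetch the metadata, then expose it as a loadable .bzl file.
    repository_ctx.download(url = METADATA_URL, output = "metadata.txt")
    content = repository_ctx.read("metadata.txt")
    repository_ctx.file("metadata.bzl", render_metadata_bzl(content))
    repository_ctx.file("BUILD", "")
```

In download.bzl this would be registered with download_metadata = repository_rule(implementation = _download_metadata_impl), and download could be the single-URL rule from the earlier workaround. Even with this in place, every file still needs its own repository, which is the limitation the proposed API would remove.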

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 6.3.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

https://bazel.build/rules/lib/builtins/repository_ctx#download

Any other information, logs, or outputs that you want to share?

No response

Metadata

Labels

  • P2: We'll consider working on this in future. (Assignee optional)
  • help wanted: Someone outside the Bazel team could own this
  • team-Core: Skyframe, bazel query, BEP, options parsing, bazelrc
  • team-ExternalDeps: External dependency handling, remote repositories, WORKSPACE file.
  • type: feature request
