Duplicate file uploads over gRPC #12113

@ulfjack

Description of the problem / feature request:

Bazel does not deduplicate file uploads across actions, so it may attempt multiple parallel uploads of the same content. For large files, this causes significant network overhead.

For remote builds of TensorFlow over the public internet, this makes the build take a very long time.
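A common mitigation for this class of problem (not what Bazel currently does; just a sketch of the idea) is single-flight deduplication keyed by content digest: concurrent requests to upload the same blob share one in-flight transfer, and completed digests are skipped entirely. A minimal Python sketch, where `upload_fn` is a hypothetical stand-in for the real gRPC ByteStream write:

```python
import hashlib
import threading


class UploadDeduplicator:
    """Ensure each unique content digest is uploaded at most once,
    even when many actions request the same upload concurrently.
    Error handling is omitted for brevity."""

    def __init__(self, upload_fn):
        self._upload_fn = upload_fn   # hypothetical transport call
        self._lock = threading.Lock()
        self._in_flight = {}          # digest -> threading.Event
        self._done = set()            # digests already uploaded

    def upload(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        with self._lock:
            if digest in self._done:
                return digest         # already on the remote; skip
            event = self._in_flight.get(digest)
            if event is None:
                # This caller becomes the single uploader for the digest.
                event = threading.Event()
                self._in_flight[digest] = event
                owner = True
            else:
                owner = False         # someone else is already uploading
        if owner:
            try:
                self._upload_fn(digest, content)
                with self._lock:
                    self._done.add(digest)
            finally:
                with self._lock:
                    del self._in_flight[digest]
                event.set()
        else:
            event.wait()              # piggyback on the in-flight upload
        return digest
```

With this in place, the 100 genrules in the repro below would trigger at most one network upload of `input.txt` instead of up to 100.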

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

[genrule(
  name = "bar-%s" % i,
  srcs = ["input.txt"],
  outs = ["bar-%s.txt" % i],
  cmd = "cp $< $@",
) for i in range(0, 100)]

Enable a remote gRPC cache or executor.

What operating system are you running Bazel on?

Linux.

Have you found anything relevant by searching the web?

This was previously also reported here, but I could not find a matching bug report:
bazelbuild/remote-apis#131 (comment)

Metadata

Assignees

No one assigned

    Labels

    P3 (We're not considering working on this, but happy to review a PR. No assignee.) · team-Remote-Exec (Issues and PRs for the Execution (Remote) team) · type: bug
