
Add strip_components to extract/download_and_extract `http_arch…#29281

Closed
willstranton wants to merge 3 commits into bazelbuild:master from willstranton:strip

Conversation

@willstranton
Contributor

Add strip_components to extract/download_and_extract http_archive

Description

The strip_components attribute functions similarly to tar --strip-components:

Strip NUMBER leading components from file names on extraction.

This is an alternative to the existing strip_prefix attribute, which required knowing the exact prefix to be stripped. Only one of the two attributes (strip_prefix, strip_components) can be set at one time.
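
A minimal sketch of the path transformation described above, written in Python purely for illustration. The helper name `strip_components` is hypothetical, and returning `None` when an entry has too few components (modelled on how GNU tar skips such members) is an assumption rather than something confirmed by this pull request.

```python
from typing import Optional


def strip_components(path: str, count: int) -> Optional[str]:
    """Drop `count` leading components from an archive entry path.

    Returns None when nothing is left after stripping, signalling that the
    entry should be skipped (an assumption modelled on tar's behaviour).
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Ignore empty segments so "a//b" and "a/b/" count components the same way.
    parts = [p for p in path.split("/") if p]
    remaining = parts[count:]
    return "/".join(remaining) if remaining else None


print(strip_components("project-1.2.3/src/main.c", 1))  # src/main.c
print(strip_components("project-1.2.3/src/main.c", 2))  # main.c
print(strip_components("project-1.2.3", 1))             # None
```

Unlike a prefix match, the caller only needs to know how many directory levels to drop, not what the top-level directory is called.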

Motivation

See #28879

Build API Changes

  1. Has this been discussed in a design doc or issue? (Please link it)

See #28879

  2. Is the change backward compatible?

Yes

  3. If it's a breaking change, what is the migration plan?

N/A - this is not a breaking change.

Checklist

  • I have added tests for the new use cases (if any).
  • I have updated the documentation (if applicable).

Release Notes

RELNOTES[NEW]: Adds the strip_components attribute to extract/download_and_extract/http_archive to allow stripping of path components when extracting files.

@willstranton willstranton force-pushed the strip branch 2 times, most recently from da36a7a to 613bd88 Compare April 13, 2026 22:11
@willstranton willstranton marked this pull request as ready for review April 13, 2026 22:12
@github-actions github-actions Bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. team-Core Skyframe, bazel query, BEP, options parsing, bazelrc awaiting-review PR is awaiting review from an assigned reviewer labels Apr 13, 2026
@meteorcloudy
Member

> which required knowing the exact prefix to be stripped.

If the source archive URL is deterministic, the exact prefix should be known?

@willstranton
Contributor Author

> If the source archive URL is deterministic, the exact prefix should be known?

Yes, that's true, but it's inconvenient to have to examine an archive to determine that exact prefix. This pull request is a "quality of life" improvement. As you point out, it's not a "must have".

Summarizing from the community:

  1. Quoting the inconvenience expressed by the original issue filer in "http_archive (also repository_ctx.extract) strip_components" (#28879), and why the feature is useful to have:

Archives often have a containing directory.

Sometimes, this is long or not easily memorable -- a version number, or a commit hash.
Sometimes, this is not readily known. E.g. npm packages usually use a package/ prefix, but not always.
Usually, users don't actually care what the leading component is, they just want to remove it.
...
This feature is in both BSD and GNU tar; it's very useful.
While not mentioned in my original comment, it would also be very useful for archive_override (bzlmod).

I remember having to update dependencies manually before BCR. You had to update the tar archive AND the prefix that was stripped.

  2. "Feature request: download_and_extract(strip_prefix="*")" (#13960) is an earlier request from 2021 that expresses similar friction.

When first adding a http_archive (or alternative) to your workspace, it's easy enough to find what the top level directory is called... but with many archives it requires a bit more effort...
...with dependencies that change... this can get very tiresome....
My particular use case is a custom build definition that provides a simpler interface to private repositories... I don't know of any justification for requiring strip_prefix to be specified manually.

  3. Issue #28879 has at least 2 members commenting on or agreeing with this proposal. Counting me as the author of this pull request, that makes 3. The second issue, #13960, has two members commenting as well. So 5(?) people want this solved somehow. I'll admit that counting users can be disingenuous, since they could all be from the same company or friends rallying each other on. I have no relation to any of the folks mentioned.

@meteorcloudy
Member

OK, thanks for the context! If we do this, we should also backport this to Bazel 8 & 9, so that modules can keep the compatibility with multiple LTS releases when using this feature.

Comment thread tools/build_defs/repo/http.bzl Outdated

Copilot AI left a comment


Pull request overview

Adds a new strip_components integer attribute/parameter (similar to tar --strip-components) to http_archive, repository_ctx.download_and_extract, and repository_ctx.extract, enabling prefix stripping without knowing an exact directory name.

Changes:

  • Introduces strip_components plumbing from Starlark (http_archive, download_and_extract, extract) down to the Java decompressor layer.
  • Implements component stripping during extraction for .zip, .7z, and tar-based archives.
  • Adds/updates integration + unit tests covering component stripping and rename-ordering behavior.
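
To illustrate what component stripping during extraction might look like for a tar-based archive, here is a small Python sketch using the standard `tarfile` module. It is not the Java decompressor code from this change: the helper names `build_sample_tar` and `extract_stripped` are hypothetical, entries are collected into a dict instead of being written to disk, and dropping entries that have nothing left after stripping is an assumption.

```python
import io
import tarfile


def build_sample_tar() -> bytes:
    """Create a small in-memory tar whose entries share a versioned top-level directory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in [
            ("pkg-abc123/README", b"docs\n"),
            ("pkg-abc123/src/lib.c", b"int x;\n"),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def extract_stripped(tar_bytes: bytes, count: int) -> dict:
    """Return {stripped_path: contents}, skipping entries with nothing left."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            parts = [p for p in member.name.split("/") if p][count:]
            if not parts:
                continue  # Assumed behaviour: fully stripped entries are dropped.
            result["/".join(parts)] = archive.extractfile(member).read()
    return result


files = extract_stripped(build_sample_tar(), 1)
print(sorted(files))  # ['README', 'src/lib.c']
```

The sketch never references the `pkg-abc123` directory name, which is the convenience the attribute is meant to provide.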

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tools/build_defs/repo/http.bzl | Adds strip_components attr, enforces mutual exclusivity with strip_prefix, passes through to download_and_extract. |
| src/main/java/com/google/devtools/build/lib/vfs/PathFragment.java | Adds PathFragment.stripComponents(int) utility used by decompressors. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/starlark/StarlarkBaseExternalContext.java | Adds strip_components params to download_and_extract/extract and wires into decompression. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/DecompressorDescriptor.java | Adds stripComponents field + builder validation for mutual exclusivity with prefix. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/ZipDecompressor.java | Applies component stripping to zip entry paths before extraction. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/SevenZDecompressor.java | Applies component stripping to 7z entry paths before extraction. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/CompressedTarFunction.java | Applies component stripping to tar entry paths before extraction. |
| src/main/java/com/google/devtools/build/lib/bazel/repository/decompressor/CompressedFunction.java | Updates docs to note stripComponents is ignored for single-file compressor formats. |
| src/test/shell/bazel/external_integration_test.sh | Adds http_archive integration coverage for strip_components (tar/zip + add_prefix). |
| src/test/java/com/google/devtools/build/lib/vfs/PathFragmentTest.java | Adds unit tests for PathFragment.stripComponents. |
| src/test/java/com/google/devtools/build/lib/bazel/repository/starlark/StarlarkBaseExternalContextTest.java | Updates test calls for new downloadAndExtract signature. |
| src/test/java/com/google/devtools/build/lib/bazel/repository/decompressor/ZipDecompressorTest.java | Adds zip decompression tests for strip_components (+ rename ordering). |
| src/test/java/com/google/devtools/build/lib/bazel/repository/decompressor/SevenZDecompressorTest.java | Adds 7z decompression tests for strip_components (+ rename ordering + strip-all). |
| src/test/java/com/google/devtools/build/lib/bazel/repository/decompressor/CompressedTarFunctionTest.java | Adds tar.gz decompression tests for strip_components (+ rename ordering). |
| src/test/tools/bzlmod/MODULE.bazel.lock | Updates lockfile digests due to test/module changes. |


Comment thread src/main/java/com/google/devtools/build/lib/vfs/PathFragment.java Outdated
willstranton added a commit to willstranton/bazel that referenced this pull request Apr 16, 2026
@meteorcloudy
Member

Thanks, please run "bazel run //src/test/tools/bzlmod:update_default_lock_file" to address the CI failure.

@meteorcloudy
Member

Let me know when this is fixed, and I will add the import label.

…ive`

The `strip_components` attribute functions similar to tar --strip-components:

> Strip NUMBER leading components from file names on extraction.

This is an alternative to the existing `strip_prefix` attribute, which required
knowing the exact prefix to be stripped. Only one of the two attributes
(`strip_prefix`, `strip_components`) can be set at one time.

Fixes bazelbuild#28879

RELNOTES[NEW]: Adds the `strip_components` attribute to `extract`/`download_and_extract`/`http_archive` to allow stripping of path components when extracting files.
@willstranton
Contributor Author

> Let me know when this is fixed, and I will add the import label.

CI now passing.

@meteorcloudy meteorcloudy added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Apr 17, 2026
@meteorcloudy
Member

@bazel-io fork 8.7.0

@meteorcloudy
Member

@bazel-io fork 9.1.0

deepalak56 added a commit to deepalak56/bazel_deepa that referenced this pull request Apr 17, 2026
This PR updates the documentation for `http_archive` to include the new `strip_components` attribute and clarifies the mutual exclusivity between `strip_prefix` and `strip_components`.

Fixes bazelbuild#29281
deepalak56 added a commit to deepalak56/bazel_deepa that referenced this pull request Apr 17, 2026
This PR updates the documentation for the `http_archive` rule to include the newly added `strip_components` attribute. This attribute allows users to strip a specified number of leading path components from extracted files, offering an alternative to `strip_prefix`.

Additionally, the documentation for `strip_prefix` has been updated to clarify that only one of `strip_prefix` or `strip_components` can be used.

Original PR: bazelbuild#29281
deepalak56 added a commit to deepalak56/bazel_deepa that referenced this pull request Apr 17, 2026
This PR documents the new `strip_components` attribute for `http_archive` and the `strip_components` parameter for `ctx.download_and_extract` and `ctx.extract`. It also clarifies that `strip_prefix` and `strip_components` are mutually exclusive.

This documentation is sourced from the code changes in bazelbuild#29281.
deepalak56 added a commit to deepalak56/bazel_deepa that referenced this pull request Apr 18, 2026
… .extract (placeholder)

This PR introduces `strip_components` to `repository_ctx.download_and_extract` and `repository_ctx.extract` functions, allowing users to specify the number of leading path components to strip during extraction. This new attribute is mutually exclusive with `strip_prefix`. 

Original PR: bazelbuild#29281

**Note**: Due to persistent issues in programmatically identifying the correct documentation files within the Bazel repository (404 errors on multiple plausible paths, search API rate limits, and incomplete `list_docs_in_repo` results), the changes have been applied to placeholder files. A manual review is required to integrate this content into the definitive documentation for `repository_ctx` methods.
deepalak56 added a commit to deepalak56/bazel_deepa that referenced this pull request Apr 18, 2026
This PR introduces the `strip_components` attribute for `http_archive` and similar repository rules, allowing users to strip a specified number of leading path segments from extracted archives. It also clarifies that `strip_components` and `strip_prefix` are mutually exclusive.

See original PR: bazelbuild#29281
deepalak56 added a commit to deepalak56/bazel_deepa that referenced this pull request Apr 18, 2026
This PR adds documentation for the new `strip_components` attribute for `repository_ctx.download_and_extract` and `repository_ctx.extract`, introduced in bazelbuild#29281.

Since the original documentation file for `repository_ctx` could not be located, this PR creates new, minimal documentation files at the location suggested by broken links in the existing documentation.
deepalak56 added a commit to deepalak56/bazel_deepa that referenced this pull request Apr 18, 2026
…ion methods

This PR updates the documentation for repository rules to reflect the addition of the `strip_components` attribute to `http_archive`, `repository_ctx.extract()`, and `repository_ctx.download_and_extract()`. The `http_archive` example now includes `strip_components`, and a note has been added to the `repository_ctx` section clarifying the use of `strip_components` and its mutual exclusivity with `strip_prefix` for extraction functions. This update corresponds to the changes in bazelbuild#29281.
deepalak56 added a commit to deepalak56/bazel_deepa that referenced this pull request Apr 19, 2026
This PR updates the documentation for `http_archive` to include the new `strip_components` attribute, as introduced in bazelbuild#29281. It also clarifies that `strip_components` and `strip_prefix` are mutually exclusive.
@github-actions github-actions Bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Apr 21, 2026
@meteorcloudy
Member

@willstranton Can you look into backporting this to 9.x and perhaps 8.x? The auto cherry-pick process failed; see #29323 (comment).

willstranton added a commit to willstranton/bazel that referenced this pull request Apr 22, 2026
bazelbuild#29281)

Add `strip_components` to `extract`/`download_and_extract` `http_archive`

### Description

The `strip_components` attribute functions similar to `tar --strip-components`:

> Strip NUMBER leading components from file names on extraction.

This is an alternative to the existing `strip_prefix` attribute, which required knowing the exact prefix to be stripped. Only one of the two attributes (`strip_prefix`, `strip_components`) can be set at one time.

### Motivation
See bazelbuild#28879

### Build API Changes

> 1. Has this been discussed in a design doc or issue? (Please link it)

See bazelbuild#28879

> 2. Is the change backward compatible?

Yes

> 3. If it's a breaking change, what is the migration plan?

N/A - this is not a breaking change.

### Checklist

- [X] I have added tests for the new use cases (if any).
- [X] I have updated the documentation (if applicable).

### Release Notes

RELNOTES[NEW]: Adds the `strip_components` attribute to `extract`/`download_and_extract`/`http_archive` to allow stripping of path components when extracting files.

Closes bazelbuild#29281

PiperOrigin-RevId: 902961227
Change-Id: I3fda77ec42c3d052f6655e42c8b57ec27667c758
@willstranton
Contributor Author

8.7.0: #29367
9.2.0: #29369

fmeum pushed a commit to fmeum/bazel that referenced this pull request Apr 24, 2026
…ttp_arch… (bazelbuild#29367)

bazel-io pushed a commit to bazel-io/bazel that referenced this pull request Apr 27, 2026
…ttp_arch… (bazelbuild#29369)


Labels

team-Core Skyframe, bazel query, BEP, options parsing, bazelrc team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.


3 participants