
[CLI] [API] Add HfApi.copy_files method to copy files remotely and update 'hf buckets cp' #3874

Merged: Wauplin merged 33 commits into main from feat/hfapi-copy-files on Apr 9, 2026


@Wauplin Wauplin commented Mar 2, 2026

Note: requires https://github.com/huggingface-internal/moon-landing/pull/17593 to be merged first. EDIT: merged!


This PR adds a new HfApi.copy_files API and extends hf buckets cp to support remote HF-handle copy workflows.

  • Copy from bucket to bucket (same bucket or different bucket)
  • Copy from repo (model/dataset/space) to bucket
  • Reject bucket->repo and repo->repo destinations (not supported yet)
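The supported and rejected directions listed above can be pictured as a small lookup. This is only a sketch: `check_copy_direction` and `SUPPORTED` are hypothetical names for illustration, not the code added by this PR.

```python
from typing import Literal

Kind = Literal["bucket", "repo"]

# (source kind, destination kind) pairs that the PR description lists as supported.
SUPPORTED = {("bucket", "bucket"), ("repo", "bucket")}


def check_copy_direction(source: Kind, destination: Kind) -> None:
    """Raise ValueError for any direction whose destination is a repo."""
    if (source, destination) not in SUPPORTED:
        raise ValueError(f"Copy from {source} to {destination} is not supported yet.")


check_copy_direction("repo", "bucket")  # accepted, returns None
```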

If the source is a file, it is copied. If it is a directory, all files under the source folder are copied recursively.
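One way to picture the file-versus-folder behaviour is as a mapping from source paths to destination paths. The helper below is a hypothetical sketch, not the PR's implementation; in particular, treating the destination as the full target path when the source is a single file is an assumption.

```python
import posixpath


def map_target_paths(source: str, files: list[str], dest: str) -> dict[str, str]:
    """Map each source file to the path it would get under `dest`."""
    source = source.strip("/")
    dest = dest.strip("/")
    mapping: dict[str, str] = {}
    for path in files:
        if path == source:
            # Single-file source. Assumption: `dest` is the full target path.
            mapping[path] = dest
        elif source == "" or path.startswith(source + "/"):
            # Folder source: keep the path relative to the source folder.
            relative = path[len(source):].lstrip("/")
            mapping[path] = posixpath.join(dest, relative)
    return mapping


print(map_target_paths("art", ["art/a.png", "art/sub/b.png", "other/c.png"], "buckets/art"))
```

Files outside the source folder (`other/c.png` here) are simply left out of the mapping.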

  • Repo source file with xet_hash: copied directly by hash
  • Repo source file without xet_hash (regular small file): download then re-upload
  • Bucket to bucket: always copied by hash
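The three rules above amount to a per-file decision. A minimal sketch, assuming a hypothetical `SourceFile` record with an optional `xet_hash` field (the real objects used in `hf_api.py` may look different):

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class SourceFile:
    path: str
    xet_hash: Optional[str] = None  # None for a regular (non-Xet) repo file


def choose_strategy(file: SourceFile, source_kind: Literal["bucket", "repo"]) -> str:
    """Return which copy path a file would take under the rules listed above."""
    if source_kind == "bucket":
        return "copy_by_hash"  # bucket to bucket: always by hash
    if file.xet_hash is not None:
        return "copy_by_hash"  # repo file already stored with a xet hash
    return "download_then_upload"  # regular small repo file
```

Only the last branch moves file content through the client. The other two can be expressed as hash references.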

See the PR description at https://github.com/huggingface-internal/moon-landing/pull/17593#issue-4201288199 for a working test.

Tested on https://huggingface.co/buckets/Wauplin/bucket-raw

hf buckets cp hf://models/openai-community/gpt2 hf://buckets/Wauplin/bucket-raw/models/gpt2
hf buckets cp hf://models/google/gemma-4-31B-it hf://buckets/Wauplin/bucket-raw/models/gemma4
hf buckets cp hf://models/zai-org/GLM-5.1 hf://buckets/Wauplin/bucket-raw/models/glm5.1

hf buckets cp hf://datasets/wikimedia/wikipedia hf://buckets/Wauplin/bucket-raw/datasets/wikipedia 
hf buckets cp hf://datasets/badlogicgames/pi-mono hf://buckets/Wauplin/bucket-raw/datasets/pi-mono-traces

hf buckets cp hf://buckets/julien-c/my-training-bucket/art hf://buckets/Wauplin/bucket-raw/buckets/art

Note

Medium Risk
Introduces new path parsing and copy semantics (including revision handling and mixed copy/download code paths) plus changes bucket batch operation payloads/chunking, which could impact correctness and performance of bucket file operations.

Overview
Adds HfApi.copy_files (exported as copy_files) to copy a file or folder from an hf:// bucket or repo (model/dataset/space, with optional @revision) into a bucket destination, using server-side hash copies when possible and falling back to download+reupload for non-Xet repo files.
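To make the handle shapes concrete, here is a rough parsing sketch. It is not the `_parse_hf_copy_handle` helper from this PR: the class names, the `<namespace>/<name>` layout, and the placement of `@revision` right after the repo name are assumptions for illustration, and revisions containing `/` are not handled.

```python
from dataclasses import dataclass
from typing import Optional, Union

PREFIX = "hf://"
REPO_TYPES = {"models": "model", "datasets": "dataset", "spaces": "space"}


@dataclass
class BucketHandle:
    bucket_id: str
    path: str


@dataclass
class RepoHandle:
    repo_type: str
    repo_id: str
    revision: Optional[str]
    path: str


def parse_handle(handle: str) -> Union[BucketHandle, RepoHandle]:
    """Split an hf:// handle into a bucket or repo handle (assumed layout)."""
    if not handle.startswith(PREFIX):
        raise ValueError(f"Expected an hf:// handle, got {handle!r}")
    parts = handle[len(PREFIX):].strip("/").split("/")
    if len(parts) < 3:
        raise ValueError(f"Expected hf://<kind>/<namespace>/<name>[/<path>], got {handle!r}")
    kind, namespace, name, *rest = parts
    path = "/".join(rest)
    if kind == "buckets":
        return BucketHandle(bucket_id=f"{namespace}/{name}", path=path)
    if kind in REPO_TYPES:
        # Assumption: an optional revision follows the repo name as "@<revision>".
        name, _, revision = name.partition("@")
        return RepoHandle(REPO_TYPES[kind], f"{namespace}/{name}", revision or None, path)
    raise ValueError(f"Unknown handle kind {kind!r} in {handle!r}")


print(parse_handle("hf://buckets/Wauplin/bucket-raw/models/gpt2"))
print(parse_handle("hf://models/openai-community/gpt2"))
```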

Extends batch_bucket_files with a new copy operation type (NDJSON copyFile) and updates internal batching/chunk sizing and upload logic so copy-by-hash operations can be sent without uploading data.
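As an illustration of what an NDJSON batch of copy operations might look like, the sketch below serializes one JSON object per line and splits operations into fixed-size chunks. Only the `copyFile` name comes from the description above. The `type`, `path` and `xetHash` keys, the chunk size, and both helper names are hypothetical, not the actual payload format.

```python
import json
from typing import Iterator


def to_ndjson(operations: list[dict]) -> str:
    """Serialize operations as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(op, separators=(",", ":")) + "\n" for op in operations)


def chunked(operations: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` operations."""
    for start in range(0, len(operations), size):
        yield operations[start:start + size]


# Hypothetical copy-by-hash operations: no file content is attached.
ops = [
    {"type": "copyFile", "path": f"models/gpt2/file{i}.bin", "xetHash": f"hash{i}"}
    for i in range(3)
]
for chunk in chunked(ops, 2):
    print(to_ndjson(chunk), end="")
```

Because a copy-by-hash entry carries only a reference, such a batch can be sent without uploading any file data first.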

Updates hf buckets cp to accept generic hf://... handles and enable remote-to-remote copies via api.copy_files, plus adds docs and tests covering bucket↔bucket, repo→bucket, and rejected bucket→repo cases.

Reviewed by Cursor Bugbot for commit 5508120. Bugbot is set up for automated code reviews on this repo.


bot-ci-comment Bot commented Mar 2, 2026

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


Wauplin commented Mar 3, 2026

Current status: cross-bucket copies do not work yet. An extra call to CAS is needed to register the xet hash in the destination bucket.

@Wauplin Wauplin changed the title from "[API] Add HfApi.copy_files method to copy files remotely" to "[CLI] [API] Add HfApi.copy_files method to copy files remotely and update 'hf buckets cp'" on Apr 7, 2026
@Wauplin Wauplin requested a review from hanouticelina April 7, 2026 14:44
@Wauplin Wauplin marked this pull request as ready for review April 7, 2026 14:44
Comment thread src/huggingface_hub/hf_api.py Outdated
Comment on lines +12604 to +12605
else:
    all_adds.append((_download_from_repo(file.path), target_path))
Contributor Author


Sub-optimal: this could be parallelized, but that's not something we want to optimize for now.



def _parse_hf_copy_handle(hf_handle: str) -> _BucketCopyHandle | _RepoCopyHandle:
    # TODO: Harmonize hf:// parsing. See https://github.com/huggingface/huggingface_hub/issues/3971
Contributor Author


yes, #3971 is getting high in my priorities 🙈

Contributor

@hanouticelina hanouticelina left a comment


Made a first pass!

@Wauplin Wauplin requested a review from hanouticelina April 8, 2026 13:08
Contributor

@hanouticelina hanouticelina left a comment


Thank you!


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 9db2cec.

@Wauplin Wauplin merged commit d82a7f7 into main Apr 9, 2026
13 of 21 checks passed
@Wauplin Wauplin deleted the feat/hfapi-copy-files branch April 9, 2026 09:52
@Wauplin Wauplin mentioned this pull request Apr 9, 2026

Notes:

- Bucket-to-repo copy is not supported.
Member


*yet


3 participants