Skip to content

[Fix] Match Unix cp -r nesting semantics in copy_files#4081

Merged
Wauplin merged 3 commits intomainfrom
fix/bucket-cp-folder-nesting-unix-semantics
Apr 10, 2026
Merged

[Fix] Match Unix cp -r nesting semantics in copy_files#4081
Wauplin merged 3 commits intomainfrom
fix/bucket-cp-folder-nesting-unix-semantics

Conversation

@Wauplin
Copy link
Copy Markdown
Contributor

@Wauplin Wauplin commented Apr 10, 2026

Why? Let's make hf buckets cp as consistent as possible with unix cp. It's already almost the case (see table), this PR solves the few remaining issues + adds tests.


Summary

When copying a folder to a destination that already exists as a directory, Unix cp -r nests the source folder inside it. Our copy_files implementation was instead always copying the contents directly (merge/rename semantics), which is closer to rsync src/ dst/ than cp -r.

This PR introduces a destination_exists_as_directory flag (separate from destination_is_directory) so the folder-copy path can distinguish "destination confirmed to exist via API" from "user hinted directory with trailing /". Nesting only applies when the destination actually exists as a prefix.

cp vs hf buckets cp — full comparison

Scenario Unix cp result hf buckets cp result Notes
File → explicit name dst.txt dst.txt
cp f.txt new.txt new.txt new.txt
File → existing file Overwrites Overwrites
cp f.txt existing.txt existing.txt (overwritten) existing.txt (overwritten)
File → nonexistent path Creates file Creates file
cp f.txt newname newname newname
File → existing dir (no /) Copies into dir Copies into dir Detected via API lookup
cp f.txt dir dir/f.txt dir/f.txt
File → path with trailing / Copies into dir Copies into dir
cp f.txt dir/ dir/f.txt dir/f.txt
File → bucket root Copies into root Copies into root
cp f.txt / /f.txt f.txt (at root)
Folder → nonexistent dest Rename semantics Rename semantics
cp -r src newdst newdst/contents... newdst/contents...
Folder → existing dir (no /) Nests source Nests source Fixed in this PR (was: merge)
cp -r src dst dst/src/contents... dst/src/contents...
Folder → existing dir (trailing /) Nests source Nests source Fixed in this PR (was: merge)
cp -r src dst/ dst/src/contents... dst/src/contents...
Folder → nonexistent with / ERROR Rename semantics Can't error in object storage — no real dirs
cp -r src newdst/ error: no such dir newdst/contents... Reasonable divergence for object storage
Folder → bucket root Nests source Nests source Fixed in this PR (was: merge)
cp -r src / /src/contents... src/contents... (at root)
Trailing / on source No effect No effect Both strip/ignore it for dirs
cp -r src/ dst same as cp -r src dst same
Recursive flag Required (-r) Automatic Not applicable to object storage

Test plan

  • Updated test_copy_files_folder_to_existing_folder_dest — now expects nested paths
  • Added test_copy_files_folder_to_existing_folder_with_trailing_slash — existing dir + trailing /
  • Added test_copy_files_folder_to_nonexistent_dest_with_trailing_slash — rename semantics preserved
  • Added test_copy_files_folder_to_bucket_root — nesting at bucket root
  • Existing single-file tests unaffected (file→file, file→dir, file→nonexistent all unchanged)

🤖 Generated with Claude Code


Note

Medium Risk
Changes destination path resolution for folder copies, which can alter where files land for existing bucket prefixes (including root) and may break callers relying on the previous merge behavior.

Overview
HfApi.copy_files now distinguishes between destination hinted as a directory (trailing /) and destination confirmed to exist as a directory prefix via API checks. When copying a folder into an existing destination directory (including bucket root), it now nests the source folder under the destination to match Unix cp -r semantics instead of merging contents directly.

Tests were updated/added to cover nesting for existing destinations (with and without trailing /), preserved rename semantics for non-existent destinations, and the bucket-root folder copy case.

Reviewed by Cursor Bugbot for commit c4f0413. Bugbot is set up for automated code reviews on this repo. Configure here.

When copying a folder to an existing directory, Unix `cp -r` nests the
source folder inside it (e.g. `cp -r src dst` → `dst/src/...`). The
previous implementation always copied contents directly (→ `dst/...`),
which is rsync-like rather than cp-like.

Track `destination_exists_as_directory` separately from the
`destination_is_directory` flag (which also covers trailing-slash hints)
so that folder copies can distinguish "destination exists" from
"destination was requested as a directory". Nesting only applies when
the destination actually exists as a prefix in the bucket.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@bot-ci-comment
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Copy Markdown
Contributor Author

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 2696ea3. Configure here.

Comment thread src/huggingface_hub/hf_api.py
@Wauplin Wauplin changed the title fix: match Unix cp -r nesting semantics in copy_files [Fix] Match Unix cp -r nesting semantics in copy_files Apr 10, 2026
Wauplin and others added 2 commits April 10, 2026 14:10
Copy link
Copy Markdown
Member

@julien-c julien-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about sending you into such a rabbit hole. =)

Copy link
Copy Markdown
Contributor

@hanouticelina hanouticelina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Wauplin Wauplin merged commit 086b8cf into main Apr 10, 2026
19 of 21 checks passed
@Wauplin Wauplin deleted the fix/bucket-cp-folder-nesting-unix-semantics branch April 10, 2026 12:54
@huggingface-hub-bot
Copy link
Copy Markdown
Contributor

This PR has been shipped as part of the v1.11.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants