Comparing changes
base repository: huggingface/xet-core
base: v1.3.2
head repository: huggingface/xet-core
compare: v1.4.1
- 18 commits
- 377 files changed
- 6 contributors
Commits on Feb 27, 2026
-
Scale download buffer memory limit by number of active downloads (#666)
Previously, the total download buffer size was fixed regardless of how many downloads were in flight. As the number of concurrent downloads grows, a fixed total can lead to waiting on individual segments that arrive out of order or don't have enough turnaround time to saturate the output. Writing to disk or the download itself often becomes the bottleneck before these effects appear, but planned features such as streaming files and caching could be affected by this limit. This PR alleviates it by allocating an additional 512 MB of buffer per active file, prioritized for that specific download, and releasing the capacity when the file finishes downloading. The default download buffer size is now 2 GB + 512 MB per concurrent download, up to a maximum of 8 GB (all adjustable). The mechanism is the AdjustableSemaphore class, first introduced for concurrency scaling, which allows the total number of permits in a semaphore to be incremented or decremented; on decrement, permits are discarded upon return until the total reaches the target number.
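The scaling rule described above can be sketched as a one-line formula. This is an illustrative sketch, not the actual xet-core code; the constant and function names are assumptions, only the numbers (2 GB base, 512 MB per download, 8 GB cap) come from the description.

```rust
// Hypothetical sketch of the download buffer scaling rule (names are
// illustrative): the limit grows with the number of in-flight downloads
// and is clamped to a ceiling. All three constants are adjustable in the
// real implementation.
const BASE_LIMIT: u64 = 2 * 1024 * 1024 * 1024; // 2 GB base allocation
const PER_DOWNLOAD: u64 = 512 * 1024 * 1024; // +512 MB per active download
const MAX_LIMIT: u64 = 8 * 1024 * 1024 * 1024; // 8 GB ceiling

fn download_buffer_limit(active_downloads: u64) -> u64 {
    (BASE_LIMIT + PER_DOWNLOAD * active_downloads).min(MAX_LIMIT)
}

fn main() {
    // One active download: 2 GB + 512 MB.
    assert_eq!(download_buffer_limit(1), BASE_LIMIT + PER_DOWNLOAD);
    // Twelve downloads reach the cap exactly; beyond that the limit is clamped.
    assert_eq!(download_buffer_limit(12), MAX_LIMIT);
    assert_eq!(download_buffer_limit(100), MAX_LIMIT);
}
```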
Commit: 543914d
-
Feature to monitor client process system usage (#617)
Introduces a client benchmark utility to track the system resource usage (CPU, memory, disk I/O, and network I/O) of a process, so we don't need to write per-OS scripts to capture usage stats. This is especially helpful when benchmarking on Python notebook instances, e.g. Google Colab, where a system monitor is not easily accessible or running a separate monitor script is not easy.
## Usage
Enable monitoring by setting `HF_XET_SYSTEM_MONITOR_ENABLED` to true and set the usage sample interval with `HF_XET_SYSTEM_MONITOR_SAMPLE_INTERVAL`. Metrics are output to the tracing stream at `INFO` level by default; they can be redirected to a separate file by setting the sample log path with `HF_XET_SYSTEM_MONITOR_LOG_PATH`.
## Output
The stats are output in JSON format, which can be queried using tools like `jq`, e.g.:
1. Trace of peak memory usage: `jq '.memory.peak_used_bytes' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
2. Trace of disk write speed: `jq '.disk.average_write_speed' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
3. Trace of network receive speed: `jq '.network.average_rx_speed' [HF_XET_SYSTEM_MONITOR_LOG_PATH]`
Commit: c4111eb
-
This PR adds an integrated API for streaming downloads, exposing a DownloadStream object that is integrated with the file reconstructor, and applies the same memory-management buffer limiting to the stream object. It also introduces cancellation support in the FileReconstructor so that tasks waiting on a long-running download or a semaphore don't hang when an error is reported or the user drops the stream.
Commit: 9b3278a
Commits on Mar 2, 2026
-
Fix command injection in release workflow (CVE) (#677)
## Summary
- Fix command injection vulnerability in `.github/workflows/release.yml` (HackerOne #3581567, severity High 8.8)
- `${{ github.event.inputs.tag }}` was interpolated directly in `run:` blocks, allowing arbitrary RCE via a crafted tag input (e.g. `v0.1.0; id; cat /etc/passwd;#`)
- Moved all 6 occurrences to `env:` variables so the value is passed as a shell environment variable instead of being interpolated into the script
## Jobs fixed
- `linux` — "Update version in toml" step
- `musllinux` — "Update version in toml" step
- `windows` — "Update version in toml" step
- `macos` — "Update version in toml" step
- `sdist` — "Update version in toml" step
- `github-release` — "Create GitHub Release" step (`gh release create`)
Commit: e66dcef
Commits on Mar 3, 2026
-
feat: accept pre-computed SHA-256 in upload_files() (#678)
## Summary
- Add an optional `sha256s` keyword parameter to the Python-exposed `upload_files()` function
- Forward it to `data_client::upload_async()`, which already supports it
## Context
### Double computation today
`huggingface_hub` computes SHA-256 on every file during `CommitOperationAdd.__post_init__()` for LFS batch negotiation, then `hf_xet` recomputes it internally because `upload_files()` doesn't accept pre-computed hashes.
### Performance impact
This change eliminates the redundant computation entirely.
### Backward compatibility
- `sha256s` is a keyword-only parameter with default `None` — no change for existing callers
- `data_client::upload_async()` has accepted `sha256s: Option<Vec<String>>` since day one
- When provided, `SingleFileCleaner` uses `ShaGenerator::ProvidedValue` and skips internal recomputation

Companion PR: huggingface/huggingface_hub#3876
Commit: 40b45fb
Commits on Mar 4, 2026
-
This PR introduces a new `xet_session` crate that provides a session-based hierarchical API: users create a XetSession to manage runtime and configuration, then batch uploads into UploadCommit objects and downloads into DownloadGroup objects — each of which runs transfers in the background on the inner XetRuntime. All public functions are exposed as synchronous functions, making them easy to use from other languages, e.g. Python, C, etc.
Commit: c4a56f8
Commits on Mar 5, 2026
-
Naming clarification: A Xorb is a data object, CAS is the remote server. (#680)
This PR makes the use of the `cas` and `xorb` terms consistent. Previously, "cas" (for content-addressed store) could refer to either the remote server or the data bytes stored as a collection of chunks. After the renames in this PR, we consistently use `xorb` to refer to the data object and `cas` to refer to the remote server. This renames quite a few places; to aid in rebasing current work or updating downstream dependencies, this PR includes a file `API_UPDATES.md` that can be fed into an AI agent to quickly and accurately perform the renaming on any downstream dependencies.
Commit: e6e0413
-
Fix for incorrect error propagation on truncated download stream. (#683)
Currently, the async stream logic silently swallows an UnexpectedEOF, treating it the same as a normal EOF. This is a bug; this PR fixes it to propagate UnexpectedEOF while still handling a correct EOF as the end of the stream.
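The distinction the fix makes can be illustrated with a minimal sketch. This is not the actual xet-core stream code (which is async); it is a std-only illustration of the same rule: `Ok(0)` from a reader is a clean EOF that ends the stream, while `io::ErrorKind::UnexpectedEof` signals truncation and must be propagated rather than swallowed.

```rust
use std::io::{self, Read};

// Drain a reader to completion, propagating all errors. Swallowing an
// UnexpectedEof here would make a truncated download look complete.
fn drain_stream<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = [0u8; 4096];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(out), // clean EOF: end of stream
            Ok(n) => out.extend_from_slice(&buf[..n]),
            Err(e) => return Err(e), // includes ErrorKind::UnexpectedEof
        }
    }
}

// A reader that yields some bytes, then fails as if the stream was truncated.
struct Truncated(Vec<u8>);
impl Read for Truncated {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.0.is_empty() {
            Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated"))
        } else {
            let n = self.0.len().min(buf.len());
            buf[..n].copy_from_slice(&self.0[..n]);
            self.0.drain(..n);
            Ok(n)
        }
    }
}

fn main() {
    // Complete stream: clean EOF, all bytes returned.
    assert_eq!(drain_stream(&b"abc"[..]).unwrap(), b"abc");
    // Truncated stream: the error is propagated, not swallowed.
    let err = drain_stream(Truncated(b"abc".to_vec())).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::UnexpectedEof);
}
```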
Commit: 70807bf
-
Simulation interface for LocalTestServer: supports deletion, direct access, data dumps, etc. (#681)
This PR adds interface functions to the LocalServer class that will allow it to become a full simulation environment for testing all the garbage collection stages.
Commit: ebd780d
Commits on Mar 9, 2026
-
Rework simulation pipeline for adaptive concurrency and connection resiliency. (#648)
This PR replaces the previous collection of scripts for setting up Docker containers with a much more nimble and lightweight set of Rust scripts built from reusable components, including a simple proxy that can limit bandwidth and simulate congestion. New tools:
- cas_client/src/simulation/network_simulation: a lightweight, in-process network congestion simulation proxy that sits between the LocalServer instance and the RemoteClient instance, allowing simulation tests to run on a network with realistic congestion conditions and gated bandwidth. It can be controlled dynamically through a LocalTestServer instance.
- simulation/: a new package for collecting simulation scripts and analyzing the results.
To run the new simulation scripts for adaptive concurrency on upload, compile in release mode and run one of the scripts in `simulation/src/adaptive_concurrency/scripts/`. Docker is no longer needed to run any of the simulations. The old `cas_client/tests/adaptive_concurrency/` paths were removed.
Commit: 6a5535b
Commits on Mar 10, 2026
-
feat: add skip_sha256 option to SingleFileCleaner (#679)
## Summary
- Add a `ShaGenerator::Skip` variant that skips SHA-256 computation entirely
- `ShaGenerator::finalize()` now returns `Option<Sha256>` (None when skipped)
- `SingleFileCleaner::new()` and `FileUploadSession::start_clean()` accept a `skip_sha256` boolean
- When skipped, no `FileMetadataExt` is included in the shard
## Context
Bucket uploads don't need SHA-256 in the shard metadata — the `sha_index` GSI is only used for LFS pointer resolution, which doesn't apply to buckets. Skipping SHA-256 for bucket uploads removes the main CPU bottleneck in the upload pipeline on non-SHA-NI instances.
## Alternative: dummy SHA-256
Instead of skipping entirely, the client could send a zeroed/dummy `FileMetadataExt`. The server would still store it, but queries would never match. This avoids the server-side schema change (xetcas PR) but pollutes the GSI with dummy entries.

Companion PRs:
- xetcas: huggingface-internal/xetcas#498 (make `FileIdItem.sha256` optional server-side)
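The per-file policy described above (compute, use a provided digest, or skip) can be sketched as a small enum. This is an illustrative sketch, not the actual xet-core types: the enum, variant, and function names are assumptions modeled on the `ShaGenerator` variants named in the summary, and the digest value is a placeholder string rather than a real hash.

```rust
// Hypothetical sketch of a per-file SHA-256 policy whose finalize step
// returns Option — None when hashing was skipped, so no file metadata
// extension is emitted for that file.
#[derive(Clone, Debug, PartialEq)]
enum Sha256Policy {
    Compute,          // hash the file contents while streaming chunks
    Provided(String), // caller already computed the digest (e.g. huggingface_hub)
    Skip,             // bucket uploads: no SHA-256 needed at all
}

// `computed` stands in for the streaming hasher; it only runs when the
// policy actually requires computing a digest.
fn finalize(policy: &Sha256Policy, computed: impl FnOnce() -> String) -> Option<String> {
    match policy {
        Sha256Policy::Compute => Some(computed()),       // pay the CPU cost
        Sha256Policy::Provided(hex) => Some(hex.clone()), // reuse the known digest
        Sha256Policy::Skip => None,                       // skip entirely
    }
}

fn main() {
    let hex = "deadbeef".to_string(); // placeholder, not a real digest
    assert_eq!(
        finalize(&Sha256Policy::Provided(hex.clone()), || unreachable!()),
        Some(hex)
    );
    // Skip: no digest is produced and the hasher closure never runs.
    assert_eq!(finalize(&Sha256Policy::Skip, || unreachable!()), None);
}
```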
Commit: a48f1f8
Commits on Mar 11, 2026
-
fix: prevent download stall on large file reconstruction (#698)
## Summary
Fixes download stalls/deadlocks on large file reconstruction (reported on 48.5 GB GGUF files). The root cause is a circular dependency: the main reconstruction loop holds a buffer semaphore permit while blocking on CAS connection permit acquisition, and xorb write locks held during HTTP downloads cause CAS permit starvation.
### Changes
1. **Single-flight xorb downloads via `OnceCell`** (`xorb_block.rs`): replaces `RwLock<Option<...>>` with `tokio::sync::OnceCell`. Only one task per xorb block acquires a CAS permit and downloads the data; concurrent callers wait on the same result without acquiring permits or duplicating work. This eliminates duplicate downloads, prevents double-counted transfer progress, and avoids a failing duplicate killing the reconstruction.
2. **Decouple CAS permit from buffer permit** (`file_term.rs`): the main loop no longer blocks on CAS permits while holding a buffer permit. The spawned download task delegates to `retrieve_data`, which handles permit acquisition internally via the OnceCell single-flight. This breaks the circular dependency that causes stalls.
3. **Improve error propagation** (`sequential_writer.rs`): when the background writer channel closes, check `RunState` for the original error before returning a generic "channel closed" message.
### Root cause
The reconstruction pipeline has three resource pools: buffer permits (bounded semaphore), CAS download permits (64 concurrent), and per-xorb write locks. Before this fix, the main loop would:
1. Acquire a **buffer permit** (blocking if the buffer was full)
2. Call `get_data_task()`, which acquires a **CAS permit** (blocking if the pool was exhausted)
3. Inside `retrieve_data()`, hold a **write lock** for the entire HTTP download
This creates two deadlock vectors:
- **Buffer vs CAS**: the buffer fills up with terms waiting for CAS permits, but CAS permits are held by tasks blocked behind xorb write locks, and the writer can't drain the buffer because it's waiting for those tasks
- **CAS vs write lock**: multiple tasks sharing the same xorb each hold a CAS permit while blocked on the write lock, starving other xorbs of permits
## Reproduction
Reliably reproducible with a small buffer:
```
HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_SIZE=64mb \
HF_XET_RECONSTRUCTION_DOWNLOAD_BUFFER_LIMIT=64mb \
python3 -c "from huggingface_hub import hf_hub_download; hf_hub_download('unsloth/Qwen3-Coder-Next-GGUF', 'Qwen3-Coder-Next-Q4_K_M.gguf', local_dir='/tmp/test')"
```
- **Before fix**: stalls at ~3.4 GB, no progression (deadlock)
- **After fix**: continuous progression, completes successfully
With the default buffer (2 GB), the stall is intermittent depending on network speed (consistently reproduced on slower connections).
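The single-flight idea from change 1 can be demonstrated without the async machinery. The actual fix uses `tokio::sync::OnceCell`; this std-only sketch uses `std::sync::OnceLock` to show the same semantics: many concurrent readers want the same xorb, exactly one performs the "download", and every other reader waits on and shares the cached result.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

fn main() {
    // Shared per-xorb cell: the first caller populates it, everyone else reuses it.
    let block: Arc<OnceLock<Vec<u8>>> = Arc::new(OnceLock::new());
    let downloads = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..8)
        .map(|_| {
            let block = Arc::clone(&block);
            let downloads = Arc::clone(&downloads);
            thread::spawn(move || {
                // get_or_init runs the closure at most once across all threads;
                // concurrent callers block until the value is available.
                let data = block.get_or_init(|| {
                    downloads.fetch_add(1, Ordering::SeqCst);
                    vec![0u8; 16] // stand-in for the HTTP xorb download
                });
                data.len()
            })
        })
        .collect();

    for h in handles {
        assert_eq!(h.join().unwrap(), 16);
    }
    // Exactly one "download" happened despite 8 concurrent readers.
    assert_eq!(downloads.load(Ordering::SeqCst), 1);
}
```

In the real fix, the initialization closure is also where the CAS permit is acquired, so waiters never hold permits of their own — which is what removes the permit starvation described above.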
Commit: 9ba5fb3
-
fix: no timeout for shard uploads (XET-885) (#685)
Fixes [XET-885](https://linear.app/xet/issue/XET-885/investigate-unsloth-upload-failure-shard-upload-timeout-on-cas)
## Summary
Shard uploads to CAS can take a long time due to server-side processing (DynamoDB writes scale with file entry count). The default `read_timeout(120s)` on the reqwest client kills these uploads.
**Key insight:** reqwest's per-request `RequestBuilder::timeout()` does NOT override the client-level `read_timeout()` — they are independent mechanisms polled as separate futures. So the original approach of using per-request timeouts was ineffective.
**Fix:** Create a dedicated `shard_upload_http_client` on `RemoteClient` with **no `read_timeout`**, built once at construction time and reused for all shard uploads. All other settings (connect timeout, pool config, auth middleware) are identical to the standard client.
## Changes
### `cas_client/src/http_client.rs`
- Added `reqwest_client_no_read_timeout()` — creates a reqwest client with no `read_timeout`
- Added `build_auth_http_client_no_read_timeout()` — public API wrapping it with middleware
- 4 unit tests for the new builder
### `cas_client/src/remote_client.rs`
- Added a `shard_upload_http_client` field to `RemoteClient` (cfg'd out on wasm)
- `upload_shard()` uses the pre-built no-timeout client instead of building one per request
### `cas_client/tests/test_shard_upload_timeout.rs`
- Updated: the slow-server test now asserts **success** (shard uploads should wait as long as needed)
### `xet_config/src/groups/client.rs`
- Removed the `shard_read_timeout` config field (no longer needed)

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Commit: 83a2827
-
Commit: 02da1d2
-
Code reorganization towards release of xet cargo package (#693)
This PR is a massive rearrangement of the code base into 5 packages intended for release on cargo. The directories and corresponding packages are:
1. xet_runtime/ — compiles into the xet-runtime package. Contains the runtime, config, and logging management.
2. xet_core_structures/ — compiles into the xet-core-structures package. Contains core data structures for hashing, shards, and xorbs, as well as internal data structures that depend on these.
3. xet_client/ — compiles into the xet-client package. Contains client code for remotely connecting to the Hugging Face servers.
4. xet_data/ — compiles into the xet-data package. Contains the data processing pipeline: chunking/deduplication, file reconstruction, clean/smudge operations, and progress tracking.
5. xet_pkg/ — compiles into the hf-xet package. Provides the top-level session-based API for file upload and download with user-facing error categorization; this is the primary package downstream dependencies would use. It also contains a single summary error type, XetError, that translates cleanly into Python error types.
In addition, the other tools are:
- git_xet/ — the git_xet CLI binary crate (location preserved).
- hf_xet/ — the hf_xet Python package (location preserved).
- simulation/ — the simulation crate for upload scenario benchmarking.
- wasm/ — the wasm objects.
The full description — and information for an AI agent to use to update downstream dependencies — is at api_changes/update_260309_package_restructure.md.
Summary of moves:
- xet_runtime: became xet_runtime::core inside xet_runtime/.
- utils: became xet_runtime::utils inside xet_runtime/.
- xet_config: became xet_runtime::config inside xet_runtime/.
- xet_logging: became xet_runtime::logging inside xet_runtime/.
- error_printer: became xet_runtime::error_printer inside xet_runtime/.
- file_utils: became xet_runtime::file_utils inside xet_runtime/.
- merklehash: became xet_core_structures::merklehash inside xet_core_structures/.
- mdb_shard: became xet_core_structures::metadata_shard inside xet_core_structures/.
- xorb_object: became xet_core_structures::xorb_object inside xet_core_structures/.
- cas_client: became xet_client::cas_client inside xet_client/.
- hub_client: became xet_client::hub_client inside xet_client/.
- cas_types: became xet_client::cas_types inside xet_client/.
- chunk_cache: became xet_client::chunk_cache inside xet_client/.
- data: became xet_data::processing inside xet_data/.
- deduplication: became xet_data::deduplication inside xet_data/.
- file_reconstruction: became xet_data::file_reconstruction inside xet_data/.
- progress_tracking: became xet_data::progress_tracking inside xet_data/.
- xet_session: became xet::xet_session inside xet_pkg/.
- Wasm packages (hf_xet_wasm, hf_xet_thin_wasm): moved from top level into wasm/; internal imports updated, public APIs unchanged.
Commit: 45d38a1
-
Record API changes in api_changes/updates_<date>_<description>.md (#689)
This PR creates a folder, api_changes, in which AI agents can record updates to the API surface that could affect downstream PRs and dependencies. This can be scanned by AI agents to reliably perform merges or to propagate changes. See api_changes/README.md for a description of how this should work.
Commit: 6061deb
-
Rework the interface for session task to get result from registered upload (#690)
This PR updates the interface for retrieving per-task results after UploadCommit::commit() or DownloadGroup::finish(). The problem with the previous interface is that commit() and finish() return a vector of FileMetadata or DownloadResult, making it difficult for users to associate each result with a specific task. The new interface uses `task_id` as a strong binding bridge.
## Upload per-task result access patterns
After commit() completes, there are two equivalent ways to retrieve a per-task FileMetadata result:
1. Lookup in the global result map:
```
let commit = session.new_upload_commit()?;
let handle = commit.upload_from_path(src)?;
let results = commit.commit()?;
let result = results.get(&handle.task_id);
```
2. Direct access from the handle:
```
let commit = session.new_upload_commit()?;
let handle = commit.upload_from_path(src)?;
commit.commit()?;
// handle.result() is populated by commit() via the shared Arc.
let result = handle.result();
```
## Download per-task result access patterns
The pattern is similar to the above.
## Why not put results in a vector in the same order as tasks are registered to the commit instance?
After a commit instance is created, it can be cloned (since it is itself an Arc wrapping an internal struct) and sent to different threads. When multiple threads are registering tasks, there is no static registration order that a program can observe upfront.
Commit: cacd713
Commits on Mar 12, 2026
-
feat: expose skip_sha256 parameter in Python upload API (#705)
## Summary
Add `skip_sha256` and `sha256s` parameters to the `upload_bytes()` Python binding for per-file SHA-256 policies:
- `skip_sha256: bool = False` — skip SHA-256 computation entirely (sets `Sha256Policy::Skip`)
- `sha256s: Optional[List[str]] = None` — provide pre-computed SHA-256 hashes (companion to the existing parameter on `upload_files()`)
- These parameters are mutually exclusive
## Changes
**Python binding changes:**
- Add `skip_sha256` + `sha256s` params to `upload_bytes()` / `upload_files()`
- All policy conversion happens at the Python boundary
**Internal refactoring:**
- Add `Clone`/`Copy` derives + `from_skip()`/`from_hex()` helpers to `Sha256Policy`
- Update `upload_bytes_async`, `upload_async`, `clean_file` to use `Vec<Sha256Policy>`
- Update all internal callers across `git_xet`, `xet_pkg`, the migration tool, and tests
## Motivation
`huggingface_hub` already knows whether SHA-256 is required. This change enables skipping the expensive computation when it is unnecessary, or passing pre-computed hashes for bulk operations. Companion to #678.

Co-authored-by: Wauplin <[email protected]>
Commit: 0fb930c