A Swift implementation of the Xet protocol for downloading files from Hugging Face's content-addressable storage (CAS).
Xet is a storage layer that provides efficient file transfer through content-defined chunking, deduplication, and compression. This package implements the download path, enabling Swift applications to fetch files from Hugging Face Hub repositories that use Xet storage.
- Swift 6.0+ / Xcode 16+
- macOS 13+ / iOS 15+ / tvOS 15+ / watchOS 8+ / visionOS / Linux
Add the following to your Package.swift file:
dependencies: [
    .package(url: "https://github.com/huggingface/swift-xet.git", from: "0.2.0")
]

Then add the dependency to your target:
.target(
    name: "YourTarget",
    dependencies: [
        .product(name: "Xet", package: "swift-xet")
    ]
)

To download a file, you need:
- A file ID: the 64-character hex hash from the X-Xet-Hash response header
- A refresh URL: the Hub endpoint for obtaining CAS access tokens
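Because the file ID is described as a 64-character hex hash taken from a response header, it can be worth validating the value before passing it to the downloader. The following is a minimal sketch, not part of the Xet package: `xetFileID(fromHeaders:)` is a hypothetical helper, and accepting both upper- and lowercase hex digits is an assumption.

```swift
/// Returns the X-Xet-Hash header value if it looks like a file ID
/// (exactly 64 hexadecimal characters), or nil otherwise.
/// Hypothetical helper for illustration; not part of the Xet package.
func xetFileID(fromHeaders headers: [String: String]) -> String? {
    // HTTP header names are case-insensitive, so compare lowercased names.
    guard let value = headers.first(where: { $0.key.lowercased() == "x-xet-hash" })?.value else {
        return nil
    }
    // Assumption: both lowercase and uppercase hex digits are acceptable.
    let isHex = value.utf8.allSatisfy { byte in
        (48...57).contains(byte)      // 0-9
            || (97...102).contains(byte)  // a-f
            || (65...70).contains(byte)   // A-F
    }
    return (value.utf8.count == 64 && isHex) ? value : nil
}
```

A caller would pass the headers of the non-redirected resolve response and treat a nil result as "this file is not stored with Xet, or the header is malformed".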
import Xet

try await Xet.withDownloader(
    refreshURL: refreshURL,
    hubToken: "hf_..."  // optional, required for private repos
) { downloader in
    // Download to memory
    let data = try await downloader.data(for: fileID)

    // Download to disk
    try await downloader.download(fileID, to: destinationURL)
}

Both methods support partial downloads via the byteRange parameter:
// Download the first 1 MiB only
let data = try await downloader.data(
    for: fileID,
    byteRange: 0..<(1024 * 1024)
)

The file ID comes from the X-Xet-Hash header when resolving a file URL without following redirects:
// Construct URLs for a Hugging Face repository
let repoType = "datasets" // or "models", "spaces"
let repoID = "username/repo-name"
let revision = "main"
let filePath = "path/to/file.bin"
let resolveURL = URL(string:
    "https://huggingface.co/\(repoType)/\(repoID)/resolve/\(revision)/\(filePath)"
)!
let refreshURL = URL(string:
    "https://huggingface.co/api/\(repoType)/\(repoID)/xet-read-token/\(revision)"
)!

// Get the file ID by making a request that doesn't follow redirects
// and reading the X-Xet-Hash header from the response

This package uses AsyncHTTPClient under the hood for CAS and xorb downloads.
The downloader manages a small pool of HTTP clients and shuts them down
automatically when you use Xet.withDownloader.
Tuning can help when you need to balance throughput, memory, and connection
limits for your network environment.
You can configure the client pool and timeouts through
XetDownloader.Configuration:
var configuration = XetDownloader.Configuration.default
configuration.connectionsPerHost = 8
configuration.poolSize = 2
configuration.readTimeout = 300

try await Xet.withDownloader(
    refreshURL: refreshURL,
    hubToken: "hf_...",
    configuration: configuration
) { downloader in
    try await downloader.download(fileID, to: destinationURL)
}

The Xet protocol reconstructs files from deduplicated, compressed chunks:
- Token Refresh: Obtain a short-lived CAS access token from the Hub
- Reconstruction Query: Fetch metadata describing which chunks comprise the file
- Chunk Download: Fetch compressed chunk data from xorb storage
- Decompression: Decompress chunks using LZ4 or BG4+LZ4
- Reassembly: Concatenate chunks in order to reconstruct the file
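The final reassembly step in the list above can be illustrated on its own. This is a simplified sketch under stated assumptions: `DecompressedChunk` and `reassemble(_:)` are hypothetical names, and each chunk is assumed to carry an explicit index giving its position in the file. The package's internal representation may differ.

```swift
/// Hypothetical stand-in for an already-decompressed chunk and its
/// position within the file being reconstructed.
struct DecompressedChunk {
    let index: Int
    let bytes: [UInt8]
}

/// Concatenates chunks in index order to rebuild the file contents.
func reassemble(_ chunks: [DecompressedChunk]) -> [UInt8] {
    chunks
        .sorted { $0.index < $1.index }  // chunks may arrive out of order
        .flatMap { $0.bytes }            // join the payloads back to back
}
```

Sorting before concatenation means chunks can be fetched and decompressed concurrently without affecting the reconstructed output.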
Files are stored as xorbs (Xet Orbs)—sequences of compressed chunks. Each chunk has an 8-byte header specifying:
- Version (1 byte)
- Compressed size (3 bytes, little-endian)
- Compression scheme (1 byte): none, LZ4, or BG4+LZ4
- Uncompressed size (3 bytes, little-endian)
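As an illustration of the header layout listed above, the sketch below decodes the eight bytes into their four fields. It assumes the fields appear in the order listed (version at byte 0, compressed size at bytes 1 to 3, scheme at byte 4, uncompressed size at bytes 5 to 7). `ChunkHeader` and `parseChunkHeader(_:)` are hypothetical names, and the scheme is kept as a raw byte because the numeric values for each scheme are not given here.

```swift
/// Hypothetical type for illustration; the package's own types may differ.
struct ChunkHeader: Equatable {
    let version: UInt8
    let compressedSize: Int
    let compressionScheme: UInt8  // raw byte; mapping to none/LZ4/BG4+LZ4 not shown
    let uncompressedSize: Int
}

/// Parses an 8-byte chunk header, assuming the fields appear in the
/// order listed in the text. Returns nil if fewer than 8 bytes are given.
func parseChunkHeader(_ bytes: [UInt8]) -> ChunkHeader? {
    guard bytes.count >= 8 else { return nil }

    // Reads a 3-byte little-endian unsigned integer starting at `offset`.
    func readUInt24LE(at offset: Int) -> Int {
        Int(bytes[offset])
            | (Int(bytes[offset + 1]) << 8)
            | (Int(bytes[offset + 2]) << 16)
    }

    return ChunkHeader(
        version: bytes[0],
        compressedSize: readUInt24LE(at: 1),
        compressionScheme: bytes[4],
        uncompressedSize: readUInt24LE(at: 5)
    )
}
```

Since each size field is three bytes wide, neither the compressed nor the uncompressed size of a single chunk can exceed 2^24 - 1 bytes.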
BG4 (Byte Grouping 4) is a preprocessing step that improves compression for floating-point and structured data by grouping bytes by position.
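To make the idea of grouping bytes by position concrete, here is a small sketch of a four-way byte grouping transform and its inverse. It is an illustration of the concept only: `bg4Group(_:)` and `bg4Ungroup(_:)` are hypothetical helpers, and the assumption that groups are stored consecutively, with earlier groups taking any remainder bytes, may not match the exact layout Xet uses.

```swift
/// Splits `input` into four groups by byte position modulo 4 and
/// concatenates the groups. Hypothetical helper for illustration.
func bg4Group(_ input: [UInt8]) -> [UInt8] {
    var groups = [[UInt8]](repeating: [], count: 4)
    for (index, byte) in input.enumerated() {
        groups[index % 4].append(byte)
    }
    return groups.flatMap { $0 }
}

/// Inverse of `bg4Group`: restores the original byte order.
func bg4Ungroup(_ grouped: [UInt8]) -> [UInt8] {
    let count = grouped.count
    var output = [UInt8](repeating: 0, count: count)
    var readIndex = 0
    for group in 0..<4 {
        // Positions group, group + 4, group + 8, ... belong to this group.
        for position in stride(from: group, to: count, by: 4) {
            output[position] = grouped[readIndex]
            readIndex += 1
        }
    }
    return output
}
```

For 32-bit floats, this places the corresponding byte of every value next to each other (for example, all exponent-carrying bytes together), which tends to produce longer repeated runs for LZ4 to exploit.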
This is a community project and we welcome contributions.
Please check out issues tagged "good first issue" if you are looking for a place to start!
Please ensure your code passes the build and test suite
before submitting a pull request.
You can run the tests with swift test.