DuckDB extension for block-level replication to S3-compatible storage. Intercepts DuckDB's file I/O, tracks dirty 256KB blocks, and ships them to S3 on checkpoint.
Part of the hadb ecosystem for making embedded databases highly available.
haduck registers a custom FileSystem with DuckDB that wraps the local filesystem:
- Write interception: every `Write()` call marks the affected 256KB blocks as dirty in a bitmap
- Sync/checkpoint: on `FileSync()` (triggered by DuckDB's CHECKPOINT), dirty blocks are read from disk and uploaded to S3
- S3 key layout: `{prefix}/{db_file_key}/block_{block_id}` per block
All reads and writes hit local disk at full speed. S3 uploads happen asynchronously during checkpoint, not on the write path.
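The write-path bookkeeping can be sketched in Rust. This is a hypothetical, minimal `DirtyTracker` (the real extension is C++; the names and the 256KB block size come from the description above, everything else is illustrative — a real implementation might use an actual bitmap rather than a set):

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

const BLOCK_SIZE: u64 = 256 * 1024; // 256KB blocks, as described above

/// Thread-safe set of dirty block IDs for one file handle.
struct DirtyTracker {
    dirty: Mutex<BTreeSet<u64>>,
}

impl DirtyTracker {
    fn new() -> Self {
        DirtyTracker { dirty: Mutex::new(BTreeSet::new()) }
    }

    /// Mark every 256KB block touched by a write at `offset` of `len` bytes.
    fn mark_write(&self, offset: u64, len: u64) {
        if len == 0 { return; }
        let first = offset / BLOCK_SIZE;
        let last = (offset + len - 1) / BLOCK_SIZE;
        let mut dirty = self.dirty.lock().unwrap();
        for block in first..=last {
            dirty.insert(block);
        }
    }

    /// Drain and return all dirty block IDs (the FileSync/CHECKPOINT path).
    fn drain(&self) -> Vec<u64> {
        let mut dirty = self.dirty.lock().unwrap();
        std::mem::take(&mut *dirty).into_iter().collect()
    }
}

fn main() {
    let t = DirtyTracker::new();
    // A write straddling the block 0 / block 1 boundary dirties both blocks.
    t.mark_write(BLOCK_SIZE - 10, 20);
    assert_eq!(t.drain(), vec![0, 1]);
    // After draining, the tracker is clean again.
    assert!(t.drain().is_empty());
    println!("ok");
}
```

Note that a write of any size, even one byte, dirties (and later re-uploads) the whole 256KB block it falls in — that is the granularity trade-off of block-level replication.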
```sql
LOAD haduck;
ATTACH 'haduck:///path/to/my.duckdb' AS mydb;

-- Use normally. Writes go to local disk.
CREATE TABLE mydb.t AS SELECT * FROM range(1000000);

-- CHECKPOINT ships dirty blocks to S3
CHECKPOINT mydb;
```

Set environment variables before loading the extension:
| Variable | Required | Description |
|---|---|---|
| `HADUCK_S3_BUCKET` | Yes | S3 bucket name |
| `HADUCK_S3_ENDPOINT` | Yes | S3 endpoint URL (e.g., https://fly.storage.tigris.dev) |
| `HADUCK_S3_ACCESS_KEY` | Yes | AWS access key ID |
| `HADUCK_S3_SECRET_KEY` | Yes | AWS secret access key |
| `HADUCK_S3_PREFIX` | No | Key prefix (default: `haduck`) |
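For example (bucket name and credential values below are placeholders, not real defaults):

```sh
export HADUCK_S3_BUCKET=my-bucket
export HADUCK_S3_ENDPOINT=https://fly.storage.tigris.dev
export HADUCK_S3_ACCESS_KEY=your-access-key-id
export HADUCK_S3_SECRET_KEY=your-secret-access-key
export HADUCK_S3_PREFIX=haduck   # optional; "haduck" is the default
```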
Without S3 credentials, haduck runs in local-only mode (dirty block tracking with no uploads).
```
DuckDB
  |
  v
HaduckFileSystem (C++ extension)
  |-- Write()    -> local disk + DirtyTracker bitmap
  |-- Read()     -> local disk (unchanged)
  |-- FileSync() -> drain bitmap, upload dirty blocks via FFI
  |
  v
haduck-s3 (Rust staticlib, linked via CMake)
  |-- haduck_s3_put_block()     -> S3 PutObject
  |-- haduck_s3_init/shutdown() -> lifecycle
```
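The FFI boundary can be sketched in Rust. Only the function names above appear in this README, so the signature here is a hypothetical shape; the key builder follows the `{prefix}/{db_file_key}/block_{block_id}` layout described earlier:

```rust
/// Build the S3 object key for one block, following the
/// {prefix}/{db_file_key}/block_{block_id} layout described above.
fn block_key(prefix: &str, db_file_key: &str, block_id: u64) -> String {
    format!("{prefix}/{db_file_key}/block_{block_id}")
}

/// Hypothetical shape of the exported entry point -- the real
/// haduck_s3_put_block signature is not shown in this README.
#[no_mangle]
pub extern "C" fn haduck_s3_put_block(
    db_file_key: *const std::os::raw::c_char,
    block_id: u64,
    data: *const u8,
    len: usize,
) -> i32 {
    // A real implementation would issue an S3 PutObject via aws-sdk-s3.
    let _ = (db_file_key, block_id, data, len);
    0 // 0 = success in this sketch
}

fn main() {
    assert_eq!(
        block_key("haduck", "my.duckdb", 42),
        "haduck/my.duckdb/block_42"
    );
    println!("ok");
}
```

Keeping the S3 client behind a C ABI lets the C++ extension stay free of any AWS SDK dependency; CMake just links the Rust staticlib.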
- DirtyTracker: thread-safe bitmap of modified 256KB blocks per file handle
- HaduckFileSystem: DuckDB `FileSystem` subclass, delegates to `LocalFileSystem` for I/O
- HaduckFileHandle: wraps inner file handle, owns a `DirtyTracker`
- haduck-s3 (`rust/`): Rust FFI crate providing S3 upload via `aws-sdk-s3`
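Putting the components together, the checkpoint path (FileSync draining the tracker and shipping each dirty block) might look like the following sketch. The helper names are hypothetical, `upload` stands in for the FFI call, and the file is simulated with an in-memory buffer so the example is self-contained:

```rust
const BLOCK_SIZE: usize = 256 * 1024;

/// Read one 256KB block out of the file contents
/// (the final block may be short).
fn read_block(file: &[u8], block_id: usize) -> &[u8] {
    let start = block_id * BLOCK_SIZE;
    let end = (start + BLOCK_SIZE).min(file.len());
    &file[start..end]
}

/// Sketch of the FileSync path: for each dirty block, read it from
/// local storage and ship it. `upload` stands in for haduck_s3_put_block.
fn sync_dirty_blocks(
    file: &[u8],
    dirty: &[usize],
    mut upload: impl FnMut(usize, &[u8]),
) {
    for &block_id in dirty {
        upload(block_id, read_block(file, block_id));
    }
}

fn main() {
    // Simulated 300KB file: block 0 is full (256KB), block 1 is a 44KB tail.
    let file = vec![0u8; 300 * 1024];
    let mut uploaded = Vec::new();
    sync_dirty_blocks(&file, &[0, 1], |id, data| uploaded.push((id, data.len())));
    assert_eq!(uploaded, vec![(0, 256 * 1024), (1, 44 * 1024)]);
    println!("ok");
}
```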
- duckblock: segment format, checksum chaining, and replication primitives for DuckDB blocks (uses hadb-changeset)
- hadb: HA building blocks for embedded databases (leases, coordination, replication traits)
Requires DuckDB source (submodule), CMake, and Rust toolchain.
```sh
# Clone with submodules
git clone --recurse-submodules https://github.com/russellromney/haduck.git
cd haduck

# Build
make release

# Test
make test
```

See ROADMAP.md for the full plan. Current status:
- Phase Anvil (dirty block tracking): done
- Phase Anvil-b (S3 block shipping): done
- Phase Kestrel-Osprey (segment format, storage, apply): done (in duckblock)
- Phase Harrier (replace raw S3 with duckblock segments): next
- Phase Peregrine (follower mode with hadb coordination): planned
Apache-2.0