duckblock

Block-level replication primitives for DuckDB. Encodes dirty 256KB blocks into checksummed segments, ships them to S3, and applies them on followers.

Part of the hadb ecosystem. Used by haduck (the DuckDB extension) for its replication layer.

What it does

duckblock is a Rust library that provides:

  • Segment format: wraps hadb-changeset physical changesets with DuckDB defaults (u64 page IDs, 256KB block size)
  • Storage: S3 key layout, upload/download, discovery of incrementals and snapshots
  • Apply: pwrite blocks at offsets with checksum chain verification (fail-fast)
  • Pull: discover new segments from S3, download, apply sequentially
  • Replicator: hadb::Replicator implementation for DuckDB (add/pull/remove/sync + push_checkpoint)
  • FollowerBehavior: hadb::FollowerBehavior for follower poll loops and catchup-on-promotion

Architecture

haduck (C++ DuckDB extension)
  |-- FileSync() triggers push_checkpoint()
  v
duckblock (this crate)
  |-- DuckblockReplicator: bundles dirty blocks into segments, uploads
  |-- DuckblockFollowerBehavior: polls S3, downloads + applies segments
  |-- pull_incremental: discover -> download -> apply with chain verification
  v
hadb-changeset
  |-- .hadbp binary format (header + sorted pages + SHA-256 checksum)
  |-- S3 key layout: {prefix}{db}/{gen:04x}/{seq:016x}.hadbp
  v
hadb-io (ObjectStore trait, S3 backend, retry)

Segment format

Each segment is a .hadbp file containing:

  • Sorted 256KB blocks with u64 block IDs
  • SHA-256 checksum chain (each segment chains from the previous)
  • Sequence number (monotonic, +1 per checkpoint)

Segments are thin wrappers over hadb-changeset::physical::PhysicalChangeset:

use duckblock::segment::{new_segment, block, encode, decode};

// Bundle two dirty blocks into one segment.
let seg = new_segment(1, 0, vec![
    block(0, dirty_block_0_bytes),
    block(5, dirty_block_5_bytes),
]);
let bytes = encode(&seg);              // serialize to .hadbp bytes
let decoded = decode(&bytes).unwrap(); // parse back, erroring on corrupt input

Replicator

Implements hadb::Replicator for the HA coordinator:

use std::path::Path;
use duckblock::replicator::DuckblockReplicator;

let replicator = DuckblockReplicator::new(s3_storage, "prefix/");

// Leader: register DB and take initial snapshot
replicator.add("mydb", Path::new("/data/my.duckdb")).await?;

// Leader: push dirty blocks after CHECKPOINT
replicator.push_checkpoint("mydb", dirty_blocks).await?;

// Follower: restore from snapshot + apply incrementals
replicator.pull("mydb", Path::new("/data/follower.duckdb")).await?;

Follower behavior

DuckblockFollowerBehavior implements hadb::FollowerBehavior for the coordinator's follower loop:

use duckblock::follower_behavior::DuckblockFollowerBehavior;

let behavior = DuckblockFollowerBehavior::new(s3_storage);
// Used by hadb::Coordinator internally

S3 key layout

{prefix}{db_name}/0001/{seq:016x}.hadbp   -- snapshots (generation 1)
{prefix}{db_name}/0000/{seq:016x}.hadbp   -- incrementals (generation 0)

Same layout as walrust (SQLite) and all hadb-changeset consumers.

Testing

cargo test

S3 integration tests require credentials:

# Set S3 credentials, then:
cargo test --test s3_integration

License

Apache-2.0
