DataHaven 🫎

AI-First Decentralized Storage secured by EigenLayer — a verifiable storage network for AI training data, machine learning models, and Web3 applications.

Overview

DataHaven is a decentralized storage and retrieval network designed for applications that need verifiable, production-scale data storage. Built on StorageHub and secured by EigenLayer's restaking protocol, DataHaven separates storage from verification: providers store data off-chain while cryptographic commitments are anchored on-chain for tamper-evident verification.

Core Capabilities:

  • Verifiable Storage: Files are chunked, hashed into Merkle trees, and committed on-chain — enabling cryptographic proof that data hasn't been tampered with
  • Provider Network: Main Storage Providers (MSPs) serve data with competitive offerings, while Backup Storage Providers (BSPs) ensure redundancy through decentralized replication with on-chain slashing for failed proof challenges
  • EigenLayer Security: Validator set secured by Ethereum restaking — DataHaven validators register as EigenLayer operators with slashing for misbehavior
  • EVM Compatibility: Full Ethereum support via Frontier pallets for smart contracts and familiar Web3 tooling
  • Cross-chain Bridge: Native, trustless bridging with Ethereum via Snowbridge for tokens and messages

Architecture

DataHaven combines EigenLayer's shared security with StorageHub's decentralized storage infrastructure:

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Ethereum (L1)                                  │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │  EigenLayer AVS Contracts                                             │  │
│  │  • DataHavenServiceManager (validator lifecycle & slashing)           │  │
│  │  • RewardsRegistry (validator performance & rewards)                  │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                    ↕                                        │
│                          Snowbridge Protocol                                │
│                    (trustless cross-chain messaging)                        │
└─────────────────────────────────────────────────────────────────────────────┘
                                     ↕
┌─────────────────────────────────────────────────────────────────────────────┐
│                          DataHaven (Substrate)                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │  StorageHub Pallets                     DataHaven Pallets             │  │
│  │  • file-system (file operations)        • External Validators         │  │
│  │  • providers (MSP/BSP registry)         • Native Transfer             │  │
│  │  • proofs-dealer (challenge/verify)     • Rewards                     │  │
│  │  • payment-streams (storage payments)   • Frontier (EVM)              │  │
│  │  • bucket-nfts (bucket ownership)                                     │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘
                                     ↕
┌─────────────────────────────────────────────────────────────────────────────┐
│                        Storage Provider Network                             │
│  ┌─────────────────────────────┐    ┌─────────────────────────────┐        │
│  │  Main Storage Providers     │    │  Backup Storage Providers   │        │
│  │  (MSP)                      │    │  (BSP)                      │        │
│  │  • User-selected            │    │  • Network-assigned         │        │
│  │  • Serve read requests      │    │  • Replicate data           │        │
│  │  • Anchor bucket roots      │    │  • Proof challenges         │        │
│  │  • MSP Backend service      │    │  • On-chain slashing        │        │
│  └─────────────────────────────┘    └─────────────────────────────┘        │
│  ┌─────────────────────────────┐    ┌─────────────────────────────┐        │
│  │  Indexer                    │    │  Fisherman                  │        │
│  │  • Index on-chain events    │    │  • Audit storage proofs     │        │
│  │  • Query storage metadata   │    │  • Trigger challenges       │        │
│  │  • PostgreSQL backend       │    │  • Detect misbehavior       │        │
│  └─────────────────────────────┘    └─────────────────────────────┘        │
└─────────────────────────────────────────────────────────────────────────────┘

How Storage Works

  1. Upload: User selects an MSP, creates a bucket, and uploads files. Files are chunked (8 KB default), hashed into Merkle trees, and the root is anchored on-chain (see the sketch below).
  2. Replication: The MSP coordinates with BSPs to replicate data across the network based on the bucket's replication policy.
  3. Retrieval: MSP returns files with Merkle proofs that users verify against on-chain commitments.
  4. Verification: BSPs face periodic proof challenges — failure to prove data custody results in on-chain slashing via StorageHub pallets.
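
To make the flow concrete, the following minimal TypeScript sketch shows how a client could chunk a file into 8 KB pieces and fold the chunk hashes into a Merkle root. The SHA-256 hash and the pairwise-concatenation rule are illustrative assumptions for the example, not DataHaven's exact on-chain encoding.

import { createHash } from "node:crypto";

// Illustrative chunk size matching the 8 KB default described above.
const CHUNK_SIZE = 8 * 1024;

const sha256 = (data: Uint8Array): Buffer => createHash("sha256").update(data).digest();

// Split a file into fixed-size chunks and hash each one (the Merkle leaves).
function chunkAndHash(file: Uint8Array): Buffer[] {
  const leaves: Buffer[] = [];
  for (let i = 0; i < file.length; i += CHUNK_SIZE) {
    leaves.push(sha256(file.subarray(i, i + CHUNK_SIZE)));
  }
  return leaves;
}

// Fold the leaves pairwise until a single root remains; this root is the
// fingerprint that would be anchored on-chain.
function merkleRoot(leaves: Buffer[]): Buffer {
  let level = leaves;
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate the last node on odd-sized levels
      next.push(sha256(Buffer.concat([level[i], right])));
    }
    level = next;
  }
  return level[0];
}

const root = merkleRoot(chunkAndHash(new Uint8Array(32 * 1024))); // a 32 KB file -> 4 chunks
console.log(root.toString("hex"));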

Repository Structure

datahaven/
├── contracts/      # EigenLayer AVS smart contracts
│   ├── src/       # Service Manager, Rewards Registry, Slasher
│   ├── script/    # Deployment scripts
│   └── test/      # Foundry test suites
├── operator/       # Substrate-based DataHaven node
│   ├── node/      # Node implementation & chain spec
│   ├── pallets/   # Custom pallets (validators, rewards, transfers)
│   └── runtime/   # Runtime configurations (mainnet/stagenet/testnet)
├── test/           # E2E testing framework
│   ├── suites/    # Integration test scenarios
│   ├── framework/ # Test utilities and helpers
│   └── launcher/  # Network deployment automation
├── deploy/         # Kubernetes deployment charts
│   ├── charts/    # Helm charts for nodes and relayers
│   └── environments/ # Environment-specific configurations
├── tools/          # GitHub automation and release scripts
└── .github/        # CI/CD workflows

Each directory contains its own README with more detailed information.

Quick Start

Prerequisites

  • Kurtosis - Network orchestration
  • Bun v1.3.2+ - TypeScript runtime
  • Docker - Container management
  • Foundry - Solidity toolkit
  • Rust - For building the operator
  • Helm - Kubernetes deployments (optional)
  • Zig - For macOS cross-compilation (macOS only)

Launch Local Network

The fastest way to get started is with the interactive CLI:

cd test
bun i                    # Install dependencies
bun cli launch           # Interactive launcher with prompts

This deploys a complete environment including:

  • Ethereum network: two execution-layer (EL) clients (reth) and two consensus-layer (CL) clients (lodestar)
  • Block explorers: Blockscout (optional), Dora consensus explorer
  • DataHaven node: Single validator with fast block times
  • Storage providers: MSP and BSP nodes for decentralized storage
  • AVS contracts: Deployed and configured on Ethereum
  • Snowbridge relayers: Bidirectional message passing

For more options and detailed instructions, see the test README.

Run Tests

cd test
bun test:e2e              # Run all integration tests
bun test:e2e:parallel     # Run with limited concurrency

Note: Set the environment variable INJECT_CONTRACTS=true to inject the contracts when starting the tests and speed up setup, e.g. INJECT_CONTRACTS=true bun test:e2e.

Development Workflows

Smart Contract Development:

cd contracts
forge build               # Compile contracts
forge test                # Run contract tests

Node Development:

cd operator
cargo build --release --features fast-runtime
cargo test
./scripts/run-benchmarks.sh

After Making Changes:

cd test
bun generate:wagmi        # Regenerate contract bindings
bun generate:types        # Regenerate runtime types

Key Features

Verifiable Decentralized Storage

Production-scale storage with cryptographic guarantees:

  • Buckets: User-created containers managed by an MSP, summarized by a Merkle-Patricia trie root on-chain
  • Files: Deterministically chunked, hashed into Merkle trees, with roots serving as immutable fingerprints
  • Proofs: Merkle proofs enable verification of data integrity without trusting intermediaries (see the verification sketch below)
  • Audits: BSPs prove ongoing data custody via randomized proof challenges
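
On retrieval, a client can check each chunk it receives against the root committed on-chain by replaying the Merkle inclusion proof returned alongside the data. The sketch below assumes the same illustrative SHA-256 pairing scheme as the upload example earlier; DataHaven's actual proof format may differ.

import { createHash } from "node:crypto";

const sha256 = (data: Uint8Array): Buffer => createHash("sha256").update(data).digest();

// One sibling hash per tree level, plus which side the sibling sits on.
interface ProofNode {
  sibling: Uint8Array;
  position: "left" | "right";
}

// Recompute the path from a chunk up to the root and compare the result with
// the root committed on-chain. The hashing scheme is an assumption for the
// example, not DataHaven's exact proof encoding.
function verifyChunk(chunk: Uint8Array, proof: ProofNode[], onChainRoot: Uint8Array): boolean {
  let node = sha256(chunk);
  for (const { sibling, position } of proof) {
    node = position === "left"
      ? sha256(Buffer.concat([sibling, node]))
      : sha256(Buffer.concat([node, sibling]));
  }
  return node.equals(onChainRoot);
}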

Storage Provider Network

Two-tier provider model balancing performance and reliability:

  • MSPs: User-selected providers offering data retrieval with competitive service offerings
  • BSPs: Network-assigned backup providers ensuring data redundancy and availability, with on-chain slashing for failed proof challenges
  • Fisherman: Auditing service that monitors proofs and triggers challenges for misbehavior
  • Indexer: Indexes on-chain storage events for efficient querying

EigenLayer Security

DataHaven validators secured through Ethereum restaking:

  • Validators register as operators via the DataHavenServiceManager contract (example below)
  • Economic security through ETH restaking
  • Slashing for validator misbehavior (separate from BSP slashing, which happens on-chain via StorageHub pallets)
  • Performance-based validator rewards through RewardsRegistry
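
As a rough illustration of how an off-chain tool could query registration status, the sketch below reads a hypothetical isOperatorRegistered view on the DataHavenServiceManager using viem. The function name, ABI fragment, address, and RPC URL are placeholders invented for the example; see contracts/src for the real interface.

import { createPublicClient, http, parseAbi } from "viem";

// Hypothetical placeholders: neither the address nor the ABI fragment below is
// taken from the real DataHavenServiceManager; consult contracts/src instead.
const SERVICE_MANAGER = "0x0000000000000000000000000000000000000000";
const serviceManagerAbi = parseAbi([
  "function isOperatorRegistered(address operator) view returns (bool)",
]);

// Point the client at whichever Ethereum RPC your deployment uses.
const client = createPublicClient({ transport: http("http://localhost:8545") });

async function isRegistered(operator: `0x${string}`): Promise<boolean> {
  return client.readContract({
    address: SERVICE_MANAGER,
    abi: serviceManagerAbi,
    functionName: "isOperatorRegistered",
    args: [operator],
  });
}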

EVM Compatibility

Full Ethereum Virtual Machine support via Frontier pallets:

  • Deploy Solidity smart contracts
  • Use existing Ethereum tooling such as MetaMask and Hardhat (see the RPC example below)
  • Compatible with ERC-20, ERC-721, and other standards
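
For instance, a standard Ethereum client library can talk to a DataHaven node over its JSON-RPC endpoint. The snippet below uses viem and assumes a local development node serving Ethereum JSON-RPC at http://localhost:9944; substitute the endpoint from your own launch configuration.

import { createPublicClient, http } from "viem";

// The RPC URL is an assumption for a local development node; replace it with
// the endpoint exposed by your own deployment.
const client = createPublicClient({ transport: http("http://localhost:9944") });

// Any standard eth_* call works against the Frontier-provided JSON-RPC API.
const chainId = await client.getChainId();
const blockNumber = await client.getBlockNumber();
console.log(`chainId=${chainId}, block=${blockNumber}`);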

Cross-chain Communication

Trustless bridging via Snowbridge:

  • Native token transfers between Ethereum ↔ DataHaven
  • Cross-chain message passing
  • Finality proofs via BEEFY consensus
  • Three specialized relayers (beacon, BEEFY, execution)

Use Cases

DataHaven is designed for applications requiring verifiable, tamper-proof data storage:

  • AI & Machine Learning: Store training datasets, model weights, and agent configurations with cryptographic proofs of integrity — enabling federated learning and verifiable AI pipelines
  • DePIN (Decentralized Physical Infrastructure): Persistent storage for IoT sensor data, device configurations, and operational logs with provable data lineage
  • Real World Assets (RWAs): Immutable storage for asset documentation, ownership records, and compliance data with on-chain verification

Docker Images

Production images are published to DockerHub.

Build locally:

cd test
bun build:docker:operator    # Creates datahavenxyz/datahaven:local

Development Environment

VS Code Configuration

IDE configurations are excluded from version control so each developer can personalize them, but the following settings are recommended for a good developer experience. Add them to your .vscode/settings.json:

Rust Analyzer:

{
  "rust-analyzer.linkedProjects": ["./operator/Cargo.toml"],
  "rust-analyzer.cargo.allTargets": true,
  "rust-analyzer.procMacro.enable": false,
  "rust-analyzer.server.extraEnv": {
    "CARGO_TARGET_DIR": "target/.rust-analyzer",
    "SKIP_WASM_BUILD": 1
  },
  "rust-analyzer.diagnostics.disabled": ["unresolved-macro-call"],
  "rust-analyzer.cargo.buildScripts.enable": false
}

Optimizations:

  • Links operator/ directory as the primary Rust project
  • Disables proc macros and build scripts for faster analysis (Substrate macros are slow)
  • Uses dedicated target directory to avoid conflicts
  • Skips WASM builds during development

Solidity (Juan Blanco's extension):

{
  "solidity.formatter": "forge",
  "solidity.compileUsingRemoteVersion": "v0.8.28+commit.7893614a",
  "[solidity]": {
    "editor.defaultFormatter": "JuanBlanco.solidity"
  }
}

Note: The Solidity compiler version must match the one set in foundry.toml.

TypeScript (Biome):

{
  "biome.lsp.bin": "test/node_modules/.bin/biome",
  "[typescript]": {
    "editor.defaultFormatter": "biomejs.biome",
    "editor.codeActionsOnSave": {
      "source.organizeImports.biome": "always"
    }
  }
}

CI/CD

Local CI Testing

Run GitHub Actions workflows locally using act:

# Run E2E workflow
act -W .github/workflows/e2e.yml -s GITHUB_TOKEN="$(gh auth token)"

# Run specific job
act -W .github/workflows/e2e.yml -j test-job-name

Automated Workflows

The repository includes GitHub Actions for:

  • E2E Testing: Full integration tests on PR and main branch
  • Contract Testing: Foundry test suites for smart contracts
  • Rust Testing: Unit and integration tests for operator
  • Docker Builds: Multi-platform image builds with caching
  • Release Automation: Version tagging and changelog generation

See .github/workflows/ for workflow definitions.

Contributing

Development Cycle

  1. Make Changes: Edit contracts, runtime, or tests
  2. Run Tests: Component-specific tests (forge test, cargo test)
  3. Regenerate Types: Update bindings if contracts/runtime changed
  4. Integration Test: Run E2E tests to verify cross-component behavior
  5. Code Quality: Format and lint (cargo fmt, forge fmt, bun fmt:fix)

Common Pitfalls

  • Type mismatches: Regenerate with bun generate:types after runtime changes
  • Contract changes not reflected: Run bun generate:wagmi after modifications
  • Kurtosis issues: Ensure Docker is running and Kurtosis engine is started
  • Slow development: Use --features fast-runtime for shorter epochs/eras (block time stays 6s)
  • Network launch hangs: Check Blockscout - forge output can appear frozen

See CLAUDE.md for detailed development guidance.

License

GPL-3.0 - See LICENSE file for details
