Incubating Project - Arco is under active development and not yet ready for production use. APIs may change without notice. We welcome early feedback and contributions.
Serverless lakehouse infrastructure - A file-native catalog and execution-first orchestration layer for modern data platforms.
Arco unifies a file-native catalog and an execution-first orchestration layer into one operational metadata system. It stores metadata as immutable, queryable files on object storage and treats deterministic planning, replayable history, and explainability as product requirements.
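As a rough illustration of the file-native model, a catalog snapshot can be written as an immutable, versioned file with a small pointer object naming the current version. The layout, file names, and JSON payload below are hypothetical, invented for this sketch; Arco stores Parquet on object storage, not JSON on a local disk.

```python
import json
import tempfile
from pathlib import Path

def write_snapshot(root: Path, tenant: str, payload: dict) -> int:
    """Write an immutable, versioned snapshot file and advance the
    'latest' pointer. Illustrative layout only, not Arco's actual one."""
    base = root / tenant / "catalog"
    base.mkdir(parents=True, exist_ok=True)
    pointer = base / "latest"
    version = int(pointer.read_text()) + 1 if pointer.exists() else 1
    snapshot = base / f"snapshot-{version:08d}.json"
    # Immutability: every write creates a new file; prior versions
    # are never rewritten, so history stays replayable.
    snapshot.write_text(json.dumps(payload, sort_keys=True))
    pointer.write_text(str(version))
    return version

root = Path(tempfile.mkdtemp())
v1 = write_snapshot(root, "tenant-a", {"tables": ["orders"]})
v2 = write_snapshot(root, "tenant-a", {"tables": ["orders", "users"]})
print(v1, v2)  # 1 2
```

Because each snapshot is a plain file, any reader that can list and fetch objects can query catalog history without talking to a running service.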
| Feature | Description |
|---|---|
| Metadata as files | Parquet-first storage for catalog and operational metadata, optimized for direct SQL access |
| Query-native reads | Browser and server query engines read metadata directly via signed URLs, eliminating always-on infrastructure |
| Lineage-by-execution | Lineage captured from real runs (inputs/outputs/partitions), not inferred from SQL parsing |
| Two-tier consistency | Strong consistency for DDL; eventual consistency for high-volume operational facts |
| Tenant isolation | Enforced at storage layout, service boundaries, and test gates |
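The two-tier consistency split above can be sketched with a toy in-memory object store. Everything here is invented for illustration (class and method names are not Arco APIs): Tier 1 DDL goes through a conditional write that rejects concurrent updates, while Tier 2 operational facts are blind appends that a compactor folds into projections later.

```python
class ToyObjectStore:
    """In-memory stand-in for object storage (illustrative only)."""

    def __init__(self):
        self.objects = {}    # key -> (generation, value)
        self.event_log = []  # append-only Tier 2 facts

    def put_if_generation(self, key, value, expected_gen):
        """Tier 1: strongly consistent conditional write for DDL.
        Fails if the object changed since the caller read it."""
        current_gen = self.objects.get(key, (0, None))[0]
        if current_gen != expected_gen:
            return False
        self.objects[key] = (current_gen + 1, value)
        return True

    def append_event(self, event):
        """Tier 2: high-volume facts are appended without coordination;
        a compactor later consolidates them into queryable files."""
        self.event_log.append(event)

store = ToyObjectStore()
ok = store.put_if_generation("catalog/tables/orders", {"schema": ["id"]}, expected_gen=0)
stale = store.put_if_generation("catalog/tables/orders", {"schema": []}, expected_gen=0)
store.append_event({"run": "r1", "rows_written": 42})
print(ok, stale, len(store.event_log))  # True False 1
```

The design trade-off: DDL is rare and must never fork, so it pays the coordination cost; run-level facts are frequent and commutative, so they can be eventually consistent.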
Documentation map:

- Canonical guide/reference docs (mdBook): docs/guide/src/
- Architecture decisions: docs/adr/README.md
- Operations runbooks: docs/runbooks/
- Release process: RELEASE.md
- Evidence policy: docs/guide/src/reference/evidence-policy.md
- Docs index: docs/README.md
```
arco/
├── crates/
│   ├── arco-core/       # Core abstractions: types, storage traits, tenant context
│   ├── arco-catalog/    # Catalog service: registry, lineage, search, Parquet storage
│   ├── arco-flow/       # Orchestration: planning, scheduling, state machine
│   ├── arco-api/        # HTTP/gRPC composition layer
│   ├── arco-proto/      # Protobuf definitions
│   └── arco-compactor/  # Compaction binary for Tier 2 events
├── proto/               # Canonical .proto files
├── python/              # Python SDK
└── docs/                # mdBook docs + ADRs + audits
```
Arco enforces explicit engine responsibilities in split-services deployments:
- `arco-api` and `arco-flow` are control-plane services.
- DataFusion endpoints (`/api/v1/query`, `/api/v1/query-data`) are read-only (SELECT/CTE).
- Compactors are the sole writers for state/snapshot Parquet projections.
- The browser query path uses DuckDB-WASM via signed URLs only.
- ETL compute runs in external workers via the canonical `WorkerDispatchEnvelope`.
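To make the worker hand-off concrete, here is a minimal sketch of what dispatching through an envelope might look like. The field names below are illustrative guesses, not Arco's actual schema; the real contract is the `WorkerDispatchEnvelope` defined in the `arco-proto` Protobuf definitions.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class WorkerDispatchEnvelope:
    """Hypothetical envelope shape; real fields live in arco-proto."""
    tenant_id: str
    run_id: str
    task_key: str
    inputs: list
    outputs: list

envelope = WorkerDispatchEnvelope(
    tenant_id="tenant-a",
    run_id="run-0001",
    task_key="build_orders",
    inputs=["raw/orders"],
    outputs=["marts/orders"],
)

# The control plane serializes the envelope; an external worker
# deserializes it and runs the actual ETL compute outside Arco.
wire = json.dumps(asdict(envelope))
decoded = WorkerDispatchEnvelope(**json.loads(wire))
print(decoded.task_key)  # build_orders
```

Keeping compute behind a serialized contract is what lets Arco stay a pure control plane: workers can be replaced or scaled without touching `arco-api` or `arco-flow`.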
Current cycle non-goals:
- No in-process ETL engine.
- No Spark/dbt/Flink adapter implementation.
- Rust 1.85+ (Edition 2024)
- Protocol Buffers compiler (`protoc`)
- mdBook (`cargo install mdbook --version 0.4.52 --locked`) for local docs builds
```bash
# Clone the repository
git clone https://github.com/daxis-io/arco.git
cd arco

# Build all crates
cargo build --workspace

# Run tests
cargo test --workspace

# Check formatting and lints
cargo fmt --all --check
cargo clippy --workspace --all-features -- -D warnings

# Repository health checks
cargo xtask doctor
cargo xtask adr-check
cargo xtask verify-integrity
cargo xtask parity-matrix-check
cargo xtask repo-hygiene-check
cargo xtask uc-openapi-inventory
git diff --exit-code -- docs/guide/src/reference/unity-catalog-openapi-inventory.md

# Build docs
cd docs/guide && mdbook build
```

| Crate | Description | Status |
|---|---|---|
| `arco-core` | Shared primitives: tenant context, IDs, errors, storage traits | Alpha |
| `arco-catalog` | Catalog domain: asset registry, lineage, Parquet snapshots | Alpha |
| `arco-flow` | Orchestration domain: planning, scheduling, run state | Alpha |
| `arco-api` | HTTP/gRPC composition layer | Alpha |
| `arco-proto` | Protobuf definitions for cross-language contracts | Alpha |
| `arco-compactor` | Compaction binary for Tier 2 event consolidation | Alpha |
- Contribution guidelines: CONTRIBUTING.md
- Community pathways: COMMUNITY.md, SUPPORT.md
- Code of conduct: CODE_OF_CONDUCT.md
```bash
cargo fmt --all --check
cargo clippy --workspace --all-features -- -D warnings
cargo test --workspace --all-features --exclude arco-flow --exclude arco-api
```

For security vulnerabilities, please see SECURITY.md.
Licensed under the Apache License, Version 2.0.