
Rocky


Rocky is a Rust-based control plane for warehouse SQL pipelines: branches, replay, column-level lineage, compile-time type safety, per-model cost attribution. Storage and compute stay with your warehouse — Databricks, Snowflake, BigQuery, or DuckDB. Apache 2.0.

Rocky quickstart — create a project, compile, and run 3 models in under 15s

Try it in 60 seconds

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.ps1 | iex
rocky playground my-first-project
cd my-first-project
rocky compile && rocky test && rocky run

No credentials needed — the playground runs end-to-end on local DuckDB.

See it in action

Each demo below is a self-contained POC in examples/playground/pocs/: cd in, run ./run.sh, and reproduce locally.

Detects schema drift the moment it happens

A source column type changes upstream. On the next run, Rocky diffs source vs. target, drops the target, and recreates it. No silent data corruption, no dbt-style quiet divergence.

rocky run detects source type change and recreates the target

POC — 02-performance/06-schema-drift-recover

Enforces data contracts at compile time

Missing required columns, protected columns being removed, or unsafe type changes surface as diagnostic codes (E010, E013) before a single row is written.

rocky compile flags E010 and E013 contract violations on broken_metrics

POC — 01-quality/01-data-contracts-strict
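
As a sketch of what such a contract might look like (the key names below are illustrative guesses, not Rocky's actual schema; only the diagnostic codes E010 and E013 come from the POC above), a model sidecar could declare required and protected columns:

```toml
# Hypothetical contract sidecar for broken_metrics.
# Key names are illustrative, not Rocky's real schema.
[contract]
required_columns = ["order_id", "amount"]   # a missing column surfaces as a diagnostic (E010/E013 family)
protected_columns = ["order_id"]            # removing or unsafely retyping one fails `rocky compile`
```

Either violation would be caught at compile time, before a single row is written.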

Named branches for risk-free experiments

Create a branch, run against it in an isolated schema, inspect, then drop or promote. Column-level lineage shows the downstream blast radius before you ship.

rocky branch create, run on branch, and trace column lineage downstream

POC — 00-foundations/06-branches-replay-lineage

Column-level lineage, not table-level

Trace a single column from a downstream fact back through its aggregations, all the way to the seed. Blast-radius analysis without reading every model.

rocky lineage --column traces fct_revenue.total back to seeds.orders.amount

POC — 06-developer-experience/01-lineage-column-level

AI model generation with a compile-validate loop

Describe what you want in plain English. Rocky generates a Rocky DSL model, compiles it, and retries on parse failure — the Attempts: 2 line shows the loop catching and recovering from a first-pass error behind the scenes.

rocky ai generates a .rocky model from natural language intent, Attempts: 2

POC — 03-ai/01-model-generation

PR-time blast-radius with rocky lineage-diff

Compare two git refs and get a per-changed-column readout of downstream consumers — pre-rendered Markdown drops straight into a GitHub PR comment. CODEOWNERS-style review tooling can't reach this granularity without a compiled engine.

rocky lineage-diff main lists added and removed columns across two models with downstream consumers per change

POC — 06-developer-experience/11-lineage-diff

Classify columns, mask by environment, gate CI

Tag PII columns in the model sidecar; bind tags to mask strategies in [mask] / [mask.<env>]. rocky compliance --env prod --fail-on exception exits 1 the moment a classified column has no resolved strategy — a one-line CI gate against accidentally-unmasked data.

rocky compliance rolls up classification tags to mask strategies; --fail-on exception exits 1, gating CI on unmasked PII

POC — 04-governance/05-classification-masking-compliance
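
A rough sketch of the shape this could take (the tag and strategy values are made up; only the sidecar tags and the [mask] / [mask.<env>] tables are named in the description above):

```toml
# Illustrative only: "pii", "hash", and "redact" are hypothetical values.
[columns.email]
tags = ["pii"]          # classification tag in the model sidecar

[mask]
pii = "hash"            # default strategy bound to pii-tagged columns

[mask.prod]
pii = "redact"          # prod override, resolved by `rocky compliance --env prod`
```

A tagged column with no resolved strategy in the target environment is exactly what `--fail-on exception` turns into a nonzero exit for CI.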

Incremental loads with persistent watermark state

strategy = "incremental" plus a timestamp_column is all it takes. Rocky writes the high-water mark to the embedded state store; subsequent runs only INSERT … WHERE timestamp > watermark. Append 25 rows after a 500-row load — run 2 still finishes in 0.2s.

rocky run with incremental strategy: run 1 copies 500 rows; appended 25 rows; run 2 only copies the delta in 0.2s

POC — 02-performance/01-incremental-watermark
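
In config terms, the two keys the description names might sit in a model block like this (the table and column names are illustrative; strategy and timestamp_column are the keys from the text above):

```toml
# Hypothetical model config; only `strategy` and `timestamp_column`
# are taken from the description, the rest is illustrative.
[model.fct_events]
strategy = "incremental"
timestamp_column = "updated_at"   # high-water mark persisted in the embedded state store
```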

Subprojects

| Path | Artifact | Language | Description |
|------|----------|----------|-------------|
| engine/ | rocky CLI binary | Rust | Core SQL transformation engine — 21-crate Cargo workspace |
| integrations/dagster/ | dagster-rocky PyPI wheel | Python | Dagster resource and component wrapping the Rocky CLI |
| editors/vscode/ | Rocky VSIX | TypeScript | VS Code extension — LSP client + commands for AI features |
| examples/playground/ | (config only) | TOML / SQL | Self-contained DuckDB sample pipeline used for smoke tests and benchmarks |

Each subproject has its own README with detailed usage. The engine/README.md is the canonical product reference for the Rocky CLI.

Adapters

| Role | Adapter | Status | Notes |
|------|---------|--------|-------|
| Warehouse | Databricks | Production | SQL Statement API · Unity Catalog · SHALLOW CLONE for branches |
| Warehouse | Snowflake | Beta | REST connector · zero-copy CLONE for branches · masking policies |
| Warehouse | BigQuery | Beta | REST connector · CREATE TABLE … COPY for branches |
| Warehouse | DuckDB | Local / Testing | Embedded · powers rocky playground (no credentials needed) |
| Source | Fivetran | Production | REST connector + table discovery |
| Source | Airbyte | Beta | Catalog discovery |
| Source | Iceberg | Beta | REST catalog discovery of namespaces and tables |
| Source | Manual | Production | Schema/table lists inline in rocky.toml |
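
For the Manual source, the inline schema/table lists might look roughly like this in rocky.toml (the key names are guesses for illustration; only "schema/table lists inline in rocky.toml" comes from the table above):

```toml
# Hypothetical Manual-source block; key names are illustrative.
[sources.manual]
schemas = ["raw"]
tables = ["raw.orders", "raw.customers"]
```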

Building a warehouse Rocky doesn't ship in-tree (ClickHouse, Trino, Redshift, …)? See the Adapter SDK guide and the Rust-native skeleton POC.

Building from source

git clone https://github.com/rocky-data/rocky.git
cd rocky
just build       # builds engine + dagster wheel + vscode extension
just test        # runs all test suites
just lint        # cargo clippy/fmt + ruff + eslint

just is optional — you can also build each subproject directly. See CONTRIBUTING.md for per-subproject build commands.

Releases

Each artifact is released independently using a tag-namespaced scheme:

  • engine-v* → Rocky CLI binary (cross-compiled, on GitHub Releases)
  • dagster-v* → dagster-rocky wheel
  • vscode-v* → Rocky VSIX

See CONTRIBUTING.md for the full release flow.

Documentation

Full documentation: rocky-data.dev — concepts, guides, CLI reference, Dagster integration, adapter SDK.

Contributing

See CONTRIBUTING.md. Before opening a PR, please read the cross-project change guidance — schema and DSL changes must update consumers atomically.

Sponsoring

Rocky is free and open source. If it saves your team time, consider sponsoring the project so development can continue.

License

Apache 2.0
