feat: Trust-system Arc 1 (first wave) — lineage --downstream, rocky branch, rocky replay#170
Merged
hugocorreia90 merged 7 commits into main on Apr 20, 2026
Mirrors SemanticGraph::trace_column with a forward walk backed by a new edges_by_source_model index. --downstream on rocky lineage --column traverses every consumer of the target column to any depth; ColumnLineageOutput gains a `direction` field so JSON consumers can distinguish the two shapes. Completes Trust-system Arc 1's column-level lineage surface — the IR already had column_consumers, only the transitive closure and the CLI flag were missing.
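To make the forward walk concrete, here is a minimal sketch of a transitive downstream traversal backed by a per-source-model edge index. It is not the code from this PR: `ColumnRef`, `Edge`, `Graph`, and `col` are hypothetical stand-ins for the real IR types, and only the names `edges_by_source_model` and `trace_column_downstream` come from the description above.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// (model, column) pair. Hypothetical stand-in for the real IR column type.
type ColumnRef = (String, String);

/// One column-level edge: `source` feeds `target`.
struct Edge {
    source: ColumnRef,
    target: ColumnRef,
}

struct Graph {
    edges: Vec<Edge>,
    /// Edge indices grouped by the model that owns the source column,
    /// so a lookup only touches edges leaving that model.
    edges_by_source_model: HashMap<String, Vec<usize>>,
}

/// Small helper to build a `ColumnRef` from string slices.
fn col(model: &str, column: &str) -> ColumnRef {
    (model.to_string(), column.to_string())
}

impl Graph {
    fn new(edges: Vec<Edge>) -> Self {
        let mut index: HashMap<String, Vec<usize>> = HashMap::new();
        for (i, edge) in edges.iter().enumerate() {
            index.entry(edge.source.0.clone()).or_default().push(i);
        }
        Graph { edges, edges_by_source_model: index }
    }

    /// Breadth-first walk over every transitive consumer of `start`.
    /// The `seen` set doubles as the result and as the cycle guard.
    fn trace_column_downstream(&self, start: &ColumnRef) -> BTreeSet<ColumnRef> {
        let mut seen = BTreeSet::new();
        let mut queue = VecDeque::from([start.clone()]);
        while let Some(current) = queue.pop_front() {
            let Some(candidates) = self.edges_by_source_model.get(&current.0) else {
                continue; // no edges leave this model
            };
            for &i in candidates {
                let edge = &self.edges[i];
                if edge.source == current && seen.insert(edge.target.clone()) {
                    queue.push_back(edge.target.clone());
                }
            }
        }
        seen
    }
}
```

With edges `raw_orders.amount -> stg_orders.amount -> fct_revenue.total`, calling `trace_column_downstream(&col("raw_orders", "amount"))` would return both downstream columns while ignoring unrelated columns of the same models.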
Adds `rocky branch create|delete|list|show` backed by a new BRANCHES
redb table. A branch records a schema_prefix ("branch__<name>") that is
intended to be appended to every model target when `rocky run --branch
<name>` is introduced in a follow-up — the persistent, named analogue
of the existing --shadow mode. Warehouse-native clones (Delta SHALLOW
CLONE, Snowflake zero-copy CLONE) are explicitly deferred; the
schema-prefix approach works uniformly across all four adapters.
BranchOutput / BranchListOutput / BranchDeleteOutput are registered
with the schema-export pipeline so the Dagster + VS Code bindings pick
them up on the next `just codegen`.
Part of Trust-system Arc 1 (branches + replay + column-level lineage).
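For illustration, a sketch of how a branch name could be validated and turned into its `schema_prefix`. The PR summary lists accepted names (`fix-price`, `feat_new_join`, `hotfix.2026-04-20`) and rejected ones (slashes, spaces, semicolons, more than 64 characters); the exact allowed character set below (ASCII alphanumerics plus `-`, `_`, `.`) and the empty-name check are assumptions, and both function names are hypothetical.

```rust
/// Upper bound on branch name length, per the PR summary.
const MAX_BRANCH_NAME_LEN: usize = 64;

/// Hypothetical validator. The allowed character set is an assumption
/// inferred from the accepted and rejected examples in the PR summary.
fn validate_branch_name(name: &str) -> Result<(), String> {
    if name.is_empty() {
        return Err("branch name must not be empty".to_string());
    }
    if name.chars().count() > MAX_BRANCH_NAME_LEN {
        return Err(format!(
            "branch name exceeds {MAX_BRANCH_NAME_LEN} characters"
        ));
    }
    let disallowed = name
        .chars()
        .find(|&c| !(c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.')));
    if let Some(bad) = disallowed {
        return Err(format!("branch name contains disallowed character {bad:?}"));
    }
    Ok(())
}

/// Builds the `branch__<name>` schema prefix for a valid branch name.
fn schema_prefix(name: &str) -> Result<String, String> {
    validate_branch_name(name)?;
    Ok(format!("branch__{name}"))
}
```

Rejecting slashes, spaces, and semicolons up front keeps the prefix safe to splice into a schema identifier without further quoting logic.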
Adds `rocky replay <run_id|latest>` surfacing the state store's RunRecord — per-model SQL hash, row counts, bytes, and timings as captured at execution time. Optional --model flag filters to a single model. Emits ReplayOutput for JSON consumers (Dagster, VS Code). Inspection-only by design: re-execution with pinned inputs is a follow-up once the Arc-1 content-addressed write path arrives. Today's value is "what exactly ran at 03:15 UTC with what SQL hash?" — the reproducibility artefact that the Trust-system pitch needs to point at. Part of Trust-system Arc 1 (branches + replay + column-level lineage).
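A rough sketch of the inspection path, assuming a simplified in-memory view of the state store. The struct layouts, field names, error strings, and the `resolve_run` / `replay` helpers are all hypothetical; only the concepts (a `RunRecord` with per-model SQL hash, row counts, bytes, and timings, a `latest` selector, and a `--model` filter) come from the commit message.

```rust
/// Per-model execution facts captured at run time (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct ModelRun {
    model: String,
    sql_hash: String,
    rows: u64,
    bytes: u64,
    duration_ms: u64,
}

/// Simplified stand-in for the state store's `RunRecord`.
#[derive(Debug, Clone)]
struct RunRecord {
    run_id: String,
    models: Vec<ModelRun>,
}

/// Resolves `latest` or an explicit run id.
/// Assumes `runs` is ordered oldest to newest.
fn resolve_run<'a>(runs: &'a [RunRecord], selector: &str) -> Result<&'a RunRecord, String> {
    if selector == "latest" {
        runs.last()
            .ok_or_else(|| "no runs recorded in the state store".to_string())
    } else {
        runs.iter()
            .find(|r| r.run_id == selector)
            .ok_or_else(|| format!("run '{selector}' not found"))
    }
}

/// Returns the per-model entries of a run, optionally narrowed to one model.
fn replay<'a>(
    runs: &'a [RunRecord],
    selector: &str,
    model: Option<&str>,
) -> Result<Vec<&'a ModelRun>, String> {
    let record = resolve_run(runs, selector)?;
    Ok(record
        .models
        .iter()
        .filter(|m| model.map_or(true, |name| m.model == name))
        .collect())
}
```

Nothing here executes SQL: the function only reads what was recorded, which matches the inspection-only scope described above.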
…m direction) Regenerates `schemas/`, dagster `types_generated/`, and vscode `src/types/generated/` for the four new Arc 1 schema entries (branch, branch_list, branch_delete, replay), plus the `direction` field added to `column_lineage` for the new `--downstream` flag. The dagster + vscode barrels pick up the new types through their respective re-export sections; `types.py` re-exports the new generated classes under their canonical Rust names. Produced by `just codegen`. Enforced by the codegen-drift CI workflow.
Closes the 'branch CRUD is hollow without a way to run against it' gap. --branch <name> resolves a branch from the state store, routes through the existing shadow-mode machinery as --shadow --shadow-schema <branch.schema_prefix>, and is mutually exclusive with --shadow / --shadow-schema (clap-enforced). Missing branches surface a crisp error pointing at `rocky branch create <name>`. Also adds Unreleased entries for the three Arc 1 CHANGELOGs (engine, dagster, vscode) documenting the new CLI surface and the deferred warehouse-native clone + replay re-execution work.
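The routing described above can be sketched as a plain function. In the PR the conflict with `--shadow` / `--shadow-schema` is enforced by clap at parse time; this sketch checks it by hand only to stay dependency-free. `RunArgs`, `ShadowConfig`, `resolve_branch`, the error wording, and the `HashMap` standing in for the state store are all hypothetical.

```rust
use std::collections::HashMap;

/// Effective shadow-mode settings handed to the existing run path.
#[derive(Debug, PartialEq)]
struct ShadowConfig {
    shadow: bool,
    shadow_schema: String,
}

/// Subset of the `rocky run` flags relevant to branch resolution.
struct RunArgs {
    branch: Option<String>,
    shadow: bool,
    shadow_schema: Option<String>,
}

/// Maps `--branch <name>` onto shadow-mode settings.
/// `branches` maps branch name to schema prefix and stands in for the state store.
/// Returns `Ok(None)` when no branch was requested.
fn resolve_branch(
    args: &RunArgs,
    branches: &HashMap<String, String>,
) -> Result<Option<ShadowConfig>, String> {
    let Some(name) = args.branch.as_deref() else {
        return Ok(None);
    };
    // Hand-rolled version of the mutual-exclusion rule.
    if args.shadow || args.shadow_schema.is_some() {
        return Err("--branch cannot be combined with --shadow / --shadow-schema".to_string());
    }
    let prefix = branches.get(name).ok_or_else(|| {
        format!("branch '{name}' not found; create it with `rocky branch create {name}`")
    })?;
    Ok(Some(ShadowConfig {
        shadow: true,
        shadow_schema: prefix.clone(),
    }))
}
```

Because the branch resolves to ordinary shadow settings, the rest of the run path does not need to know that a branch was involved.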
hugocorreia90 added a commit that referenced this pull request on Apr 20, 2026
Closes the first wave of every trust-system arc (Arcs 1-7) plus the two wave-2 follow-ups landed the same day. Nine feature PRs since v1.10.0.
- Arc 1 (#170): rocky lineage --downstream, rocky branch, rocky run --branch, rocky replay
- Arc 2 (#171): per-run cost attribution, [budget] block, budget_breach hook
- Arc 3 (#172): three-state CircuitBreaker, adapter consolidation
- Arc 4 (#173): rocky trace Gantt + feature-gated OTLP metrics export
- Arc 5 (#174): schema-grounded rocky ai prompt + project-aware validator
- Arc 6 wave 1 (#184): --target-dialect P001 portability lint (12 constructs)
- Arc 7 wave 1 (#185): blast-radius P002 SELECT * lint (semantic-graph aware)
- Arc 6 wave 2 (#186): [portability] config block + per-model rocky-allow pragma
- Arc 7 wave 2 wave-1 (#187): --with-seed source-schema inference
Plus #169 fix: install scripts pick latest engine version by semver.
Version bump: 20 Cargo.toml files (all workspace members except rocky-bigquery, which tracks its own version).
Wave 2/3 work for every arc remains in the deferred backlog; see the changelog Deferred section for the full carry-forward.
Summary
First implementation wave of Trust-system Arc 1 (plans/rocky-trust-system-direction.md). Four user-visible changes across four commits, plus codegen cascade and CHANGELOG entries.
- `rocky lineage --column <col> --downstream`: transitive forward walk through the column-level graph. Mirrors `trace_column` via a new `edges_by_source_model` index so the cost scales with fan-out, not total edges. `ColumnLineageOutput` gets a `direction` field so JSON consumers can distinguish the two shapes.
- `rocky branch create|delete|list|show`: named virtual branches persisted in a new `BRANCHES` redb table. Each branch carries a `schema_prefix` (`branch__<name>`). Accepts `fix-price`, `feat_new_join`, `hotfix.2026-04-20`; rejects names with slashes, spaces, semicolons, or more than 64 characters.
- `rocky run --branch <name>`: routes through the existing shadow-mode machinery (`--shadow --shadow-schema <record.schema_prefix>`). Mutually exclusive with `--shadow` / `--shadow-schema` (clap-enforced). Missing branch surfaces a crisp error pointing at `rocky branch create`.
- `rocky replay <run_id|latest>`: inspection-only surface over `RunRecord`: per-model SQL hash, row counts, bytes, timings. Optional `--model` filter. Re-execution with pinned inputs is deferred.
Design choices
- … `SHALLOW CLONE`, Snowflake zero-copy `CLONE`) is a follow-up.
- `SemanticGraph::trace_column` + `column_consumers` shipped in prior work. The gap was (a) exposing a CLI flag for downstream and (b) making the forward walk transitive and index-backed. Ten lines of index bookkeeping, a new `trace_column_downstream`, and a flag.
Codegen cascade
Four new schemas registered: `branch`, `branch_list`, `branch_delete`, `replay`. `just codegen` regenerated JSON schemas + Pydantic models + TypeScript interfaces. `types.py` (dagster) and `index.ts` (vscode) barrels updated to re-export. Dagster fixture `lineage_column.json` regenerated to pick up the new `direction` field.
What's deferred (will land as follow-ups as the arc progresses)
- … `rocky branch create`: Delta `SHALLOW CLONE`, Snowflake `CLONE`.
- `rocky replay` re-execution path: waits on Arc 1 storage path.
- `rocky branch compare`: diff branch targets against main targets; reuses existing `compare` machinery.
Test plan
- … (`cargo test -p rocky-core -p rocky-compiler -p rocky-cli`)
- … (`uv run pytest`)
- `npm run compile` clean
- `cargo clippy --workspace -- -D warnings` clean
- `cargo fmt --all --check` clean
- `just codegen` idempotent (codegen-drift test passes)
- `rocky branch create fix-price`, `rocky branch list`, `rocky branch show`, `rocky branch delete`, `rocky run --branch nonexistent` (error surface), `rocky run --branch foo --shadow` (clap conflict), `rocky replay latest` (empty store error), `rocky replay fake-id` (missing-run error)