
feat: Trust-system Arc 1 (first wave) — lineage --downstream, rocky branch, rocky replay #170

Merged
hugocorreia90 merged 7 commits into main from feat/arc-1-trust-system
Apr 20, 2026

@hugocorreia90
Contributor

Summary

First implementation wave of Trust-system Arc 1 (plans/rocky-trust-system-direction.md). Four user-visible changes across four commits, plus codegen cascade and CHANGELOG entries.

  • rocky lineage --column <col> --downstream — transitive forward walk through the column-level graph. Mirrors trace_column via a new edges_by_source_model index so the cost scales with fan-out, not total edges. ColumnLineageOutput gets a direction field so JSON consumers can distinguish the two shapes.
  • rocky branch create|delete|list|show — named virtual branches persisted in a new BRANCHES redb table. Each branch carries a schema_prefix (branch__<name>). Accepts fix-price, feat_new_join, hotfix.2026-04-20; rejects names with slashes, spaces, semicolons, or >64 chars.
  • rocky run --branch <name> — routes through the existing shadow-mode machinery (--shadow --shadow-schema <record.schema_prefix>). Mutually exclusive with --shadow / --shadow-schema (clap-enforced). Missing branch surfaces a crisp error pointing at rocky branch create.
  • rocky replay <run_id|latest> — inspection-only surface over RunRecord: per-model SQL hash, row counts, bytes, timings. Optional --model filter. Re-execution with pinned inputs is deferred.
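The branch-naming rule in the list above can be sketched as a small validator. This is a minimal illustration, not the code from the PR: the function names `validate_branch_name` and `schema_prefix` are hypothetical, and the allow-list (ASCII letters, digits, `-`, `_`, `.`) is an assumption chosen to match the accepted and rejected examples. Rejecting the empty name is also an assumption.

```rust
/// The PR rejects names longer than 64 characters.
const MAX_BRANCH_NAME_LEN: usize = 64;

/// Hypothetical validator. The allow-list below is an assumption that is
/// consistent with the examples in the PR description, not the real rule set.
pub fn validate_branch_name(name: &str) -> Result<(), String> {
    if name.is_empty() {
        // Assumption: the PR does not say how an empty name is handled.
        return Err("branch name must not be empty".to_string());
    }
    if name.chars().count() > MAX_BRANCH_NAME_LEN {
        return Err(format!(
            "branch name exceeds {MAX_BRANCH_NAME_LEN} characters"
        ));
    }
    // Slashes, spaces, and semicolons all fall outside the allow-list.
    let bad = name
        .chars()
        .find(|&c| !(c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.')));
    if let Some(bad) = bad {
        return Err(format!("branch name contains unsupported character {bad:?}"));
    }
    Ok(())
}

/// Derives the schema prefix described in the PR: `branch__<name>`.
pub fn schema_prefix(name: &str) -> Result<String, String> {
    validate_branch_name(name)?;
    Ok(format!("branch__{name}"))
}
```

Under these assumptions, `schema_prefix("fix-price")` yields `branch__fix-price`, while `feat/new`, `has space`, and `drop;table` are all rejected before a prefix is built.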

Design choices

  1. Schema-prefix branches, not warehouse-native clones. Option 1 from the plan doc's advisor review: works uniformly across DuckDB / Databricks / Snowflake / BigQuery with zero adapter changes. Warehouse-native clone (Delta SHALLOW CLONE, Snowflake zero-copy CLONE) is a follow-up.
  2. Replay is inspection-only. Re-execution needs content-addressed writes to guarantee reproduction; that piece waits on the Arc 1 storage spike, which Hugo has explicitly deferred from this loop.
  3. Column-level lineage already existed. SemanticGraph::trace_column + column_consumers shipped in prior work. The gap was (a) exposing a CLI flag for downstream and (b) making the forward walk transitive and index-backed. Ten lines of index bookkeeping, a new trace_column_downstream, and a flag.

Codegen cascade

Four new schemas registered: branch, branch_list, branch_delete, replay. just codegen regenerated JSON schemas + Pydantic models + TypeScript interfaces. types.py (dagster) and index.ts (vscode) barrels updated to re-export. Dagster fixture lineage_column.json regenerated to pick up the new direction field.

What's deferred (will land as follow-ups as the arc progresses)

  • Warehouse-native clones on rocky branch create — Delta SHALLOW CLONE, Snowflake CLONE.
  • rocky replay re-execution path — waits on Arc 1 storage path.
  • rocky branch compare — diff branch targets against main targets; reuses existing compare machinery.

Test plan

  • 948 engine unit + 30 e2e tests pass (cargo test -p rocky-core -p rocky-compiler -p rocky-cli)
  • 307 dagster tests pass (uv run pytest)
  • vscode npm run compile clean
  • cargo clippy --workspace -- -D warnings clean
  • cargo fmt --all --check clean
  • just codegen idempotent (codegen-drift test passes)
  • Smoke tested end-to-end: rocky branch create fix-price, rocky branch list, rocky branch show, rocky branch delete, rocky run --branch nonexistent (error surface), rocky run --branch foo --shadow (clap conflict), rocky replay latest (empty store error), rocky replay fake-id (missing-run error)

Mirrors SemanticGraph::trace_column with a forward walk backed by a new
edges_by_source_model index. --downstream on rocky lineage --column
traverses every consumer of the target column to any depth; ColumnLineageOutput
gains a `direction` field so JSON consumers can distinguish the two shapes.
Completes Trust-system Arc 1's column-level lineage surface — the IR
already had column_consumers, only the transitive closure and the CLI
flag were missing.
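
The forward walk described in this commit can be sketched roughly as follows. This is an illustrative breadth-first traversal under stated assumptions, not the actual `SemanticGraph` code: `ColumnRef`, `Edge`, `LineageIndex`, and `trace_downstream` are hypothetical names, and only the idea of an index keyed by source model comes from the PR.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// A (model, column) pair. Hypothetical type for this sketch.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ColumnRef {
    pub model: String,
    pub column: String,
}

impl ColumnRef {
    pub fn new(model: &str, column: &str) -> Self {
        Self { model: model.to_string(), column: column.to_string() }
    }
}

/// One column-level edge: `source` feeds `target`.
#[derive(Clone, Debug)]
pub struct Edge {
    pub source: ColumnRef,
    pub target: ColumnRef,
}

/// Edges grouped by source model, so a lookup only touches that model's
/// outgoing edges instead of scanning every edge in the graph.
#[derive(Default)]
pub struct LineageIndex {
    edges_by_source_model: HashMap<String, Vec<Edge>>,
}

impl LineageIndex {
    pub fn add_edge(&mut self, source: ColumnRef, target: ColumnRef) {
        self.edges_by_source_model
            .entry(source.model.clone())
            .or_default()
            .push(Edge { source, target });
    }

    /// Breadth-first forward walk: every column reachable from `start`,
    /// at any depth.
    pub fn trace_downstream(&self, start: &ColumnRef) -> BTreeSet<ColumnRef> {
        let mut seen = BTreeSet::new();
        let mut queue = VecDeque::from([start.clone()]);
        while let Some(current) = queue.pop_front() {
            let Some(edges) = self.edges_by_source_model.get(&current.model) else {
                continue; // no outgoing edges from this model
            };
            for edge in edges.iter().filter(|e| e.source.column == current.column) {
                // `insert` returns false for already-visited columns,
                // which also keeps the walk from looping on cycles.
                if seen.insert(edge.target.clone()) {
                    queue.push_back(edge.target.clone());
                }
            }
        }
        seen
    }
}
```

Each dequeued column costs one hash lookup plus a scan of its model's outgoing edges, which is the sense in which the work follows fan-out rather than the total edge count.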

Adds `rocky branch create|delete|list|show` backed by a new BRANCHES
redb table. A branch records a schema_prefix ("branch__<name>") that is
intended to be appended to every model target when `rocky run --branch
<name>` is introduced in a follow-up — the persistent, named analogue
of the existing --shadow mode. Warehouse-native clones (Delta SHALLOW
CLONE, Snowflake zero-copy CLONE) are explicitly deferred; the
schema-prefix approach works uniformly across all four adapters.

BranchOutput / BranchListOutput / BranchDeleteOutput are registered
with the schema-export pipeline so the Dagster + VS Code bindings pick
them up on the next `just codegen`.

Part of Trust-system Arc 1 (branches + replay + column-level lineage).
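
The four branch subcommands map onto a simple keyed store. The sketch below uses an in-memory `BTreeMap` as a stand-in for the redb-backed BRANCHES table, so it shows only the shape of the operations. `BranchRecord`, `BranchStore`, and their methods are hypothetical names, and treating a duplicate `create` as an error is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical record shape: the PR only states that a branch carries
/// a `schema_prefix` of the form `branch__<name>`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct BranchRecord {
    pub name: String,
    pub schema_prefix: String,
}

/// In-memory stand-in for the persisted BRANCHES table.
#[derive(Default)]
pub struct BranchStore {
    branches: BTreeMap<String, BranchRecord>,
}

impl BranchStore {
    /// `rocky branch create <name>`
    pub fn create(&mut self, name: &str) -> Result<BranchRecord, String> {
        if self.branches.contains_key(name) {
            // Assumption: duplicates are rejected rather than overwritten.
            return Err(format!("branch '{name}' already exists"));
        }
        let record = BranchRecord {
            name: name.to_string(),
            schema_prefix: format!("branch__{name}"),
        };
        self.branches.insert(name.to_string(), record.clone());
        Ok(record)
    }

    /// `rocky branch show <name>`
    pub fn show(&self, name: &str) -> Option<&BranchRecord> {
        self.branches.get(name)
    }

    /// `rocky branch list`, ordered by name because of the BTreeMap.
    pub fn list(&self) -> Vec<&BranchRecord> {
        self.branches.values().collect()
    }

    /// `rocky branch delete <name>`; returns whether a branch was removed.
    pub fn delete(&mut self, name: &str) -> bool {
        self.branches.remove(name).is_some()
    }
}
```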

Adds `rocky replay <run_id|latest>` surfacing the state store's
RunRecord — per-model SQL hash, row counts, bytes, and timings as
captured at execution time. Optional --model flag filters to a single
model. Emits ReplayOutput for JSON consumers (Dagster, VS Code).

Inspection-only by design: re-execution with pinned inputs is a
follow-up once the Arc-1 content-addressed write path arrives. Today's
value is "what exactly ran at 03:15 UTC with what SQL hash?" — the
reproducibility artefact that the Trust-system pitch needs to point at.

Part of Trust-system Arc 1 (branches + replay + column-level lineage).
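
As a rough illustration of the inspection-only surface, the sketch below resolves a `<run_id|latest>` selector and applies an optional model filter. The field names on `ModelRun` and `RunRecord`, the function names, and the assumption that runs are stored oldest-first are all hypothetical; only the recorded quantities (SQL hash, row counts, bytes, timings) and the two error cases come from the PR.

```rust
/// Per-model facts captured at execution time. Field names are assumptions.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ModelRun {
    pub model: String,
    pub sql_hash: String,
    pub rows: u64,
    pub bytes: u64,
    pub duration_ms: u64,
}

/// One recorded run. The real RunRecord likely carries more than this.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RunRecord {
    pub run_id: String,
    pub models: Vec<ModelRun>,
}

/// Resolves `latest` or an explicit run id.
/// Assumes `runs` is ordered oldest to newest.
pub fn resolve_run<'a>(runs: &'a [RunRecord], selector: &str) -> Result<&'a RunRecord, String> {
    if selector == "latest" {
        runs.last()
            .ok_or_else(|| "no runs recorded in the state store".to_string())
    } else {
        runs.iter()
            .find(|r| r.run_id == selector)
            .ok_or_else(|| format!("run '{selector}' not found"))
    }
}

/// Applies the optional `--model` filter; `None` keeps every model.
pub fn replay_view<'a>(run: &'a RunRecord, model: Option<&str>) -> Vec<&'a ModelRun> {
    run.models
        .iter()
        .filter(|m| model.map_or(true, |name| m.model == name))
        .collect()
}
```

Nothing here executes SQL: the functions only read what was recorded, which matches the inspection-only scope of this wave.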
…m direction)

Regenerates `schemas/`, dagster `types_generated/`, and vscode
`src/types/generated/` for the four new Arc 1 schema entries:

  branch, branch_list, branch_delete, replay

Plus the `direction` field added to `column_lineage` for the new
`--downstream` flag. The dagster + vscode barrels pick up the new
types through their respective re-export sections; `types.py` re-exports
the new generated classes under their canonical Rust names.

Produced by `just codegen`. Enforced by the codegen-drift CI workflow.

Closes the 'branch CRUD is hollow without a way to run against it' gap.
--branch <name> resolves a branch from the state store, routes through
the existing shadow-mode machinery as --shadow --shadow-schema
<branch.schema_prefix>, and is mutually exclusive with --shadow /
--shadow-schema (clap-enforced). Missing branches surface a
crisp error pointing at `rocky branch create <name>`.
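
The resolution rule in this paragraph can be sketched as a pure function. In the PR the flag conflict is enforced by clap; the sketch swaps that for a manual check so it stays dependency-free. `RunArgs`, `Target`, and `resolve_target` are hypothetical names, the branch store is reduced to a name-to-prefix map, and the error wording is illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the `rocky run` flags relevant to branch routing.
#[derive(Debug, Default)]
pub struct RunArgs {
    pub branch: Option<String>,
    pub shadow: bool,
    pub shadow_schema: Option<String>,
}

/// Where the run writes. `schema: None` means "shadow mode with its default
/// schema", which is an assumption about how plain `--shadow` behaves.
#[derive(Debug, PartialEq, Eq)]
pub enum Target {
    Production,
    Shadow { schema: Option<String> },
}

/// `branches` maps a branch name to its schema prefix (stand-in for the state store).
pub fn resolve_target(
    args: &RunArgs,
    branches: &HashMap<String, String>,
) -> Result<Target, String> {
    let shadow_requested = args.shadow || args.shadow_schema.is_some();
    match &args.branch {
        Some(name) => {
            // Manual stand-in for the clap-enforced mutual exclusion.
            if shadow_requested {
                return Err("--branch cannot be combined with --shadow or --shadow-schema".to_string());
            }
            let prefix = branches.get(name).ok_or_else(|| {
                format!("branch '{name}' not found; create it with `rocky branch create {name}`")
            })?;
            // Equivalent to `--shadow --shadow-schema <schema_prefix>`.
            Ok(Target::Shadow { schema: Some(prefix.clone()) })
        }
        None if shadow_requested => Ok(Target::Shadow { schema: args.shadow_schema.clone() }),
        None => Ok(Target::Production),
    }
}
```

Keeping the resolution separate from argument parsing makes the three outcomes (branch, plain shadow, production) easy to test without invoking the CLI.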

Also adds Unreleased entries for the three Arc 1 CHANGELOGs (engine,
dagster, vscode) documenting the new CLI surface and the deferred
warehouse-native clone + replay re-execution work.
@hugocorreia90 hugocorreia90 merged commit 706b03c into main Apr 20, 2026
14 of 15 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/arc-1-trust-system branch April 20, 2026 08:59
@hugocorreia90 hugocorreia90 mentioned this pull request Apr 20, 2026
3 tasks
hugocorreia90 added a commit that referenced this pull request Apr 20, 2026
Closes the first wave of every trust-system arc (Arcs 1-7) plus the two
wave-2 follow-ups landed the same day. Nine feature PRs since v1.10.0.

- Arc 1 (#170): rocky lineage --downstream, rocky branch, rocky run --branch, rocky replay
- Arc 2 (#171): per-run cost attribution, [budget] block, budget_breach hook
- Arc 3 (#172): three-state CircuitBreaker, adapter consolidation
- Arc 4 (#173): rocky trace Gantt + feature-gated OTLP metrics export
- Arc 5 (#174): schema-grounded rocky ai prompt + project-aware validator
- Arc 6 wave 1 (#184): --target-dialect P001 portability lint (12 constructs)
- Arc 7 wave 1 (#185): blast-radius P002 SELECT * lint (semantic-graph aware)
- Arc 6 wave 2 (#186): [portability] config block + per-model rocky-allow pragma
- Arc 7 wave 2 wave-1 (#187): --with-seed source-schema inference

Plus #169 fix: install scripts pick latest engine version by semver.

Version bump: 20 Cargo.toml files (all workspace members except
rocky-bigquery, which tracks its own version).

Wave 2/3 work for every arc remains in the deferred backlog — see
the changelog Deferred section for the full carry-forward.