Skip to content

feat(engine): Trust-system Arc 7 (first wave) — blast-radius SELECT * linter (P002)#185

Merged
hugocorreia90 merged 3 commits intomainfrom
feat/arc-7-sql-typing
Apr 20, 2026
Merged

feat(engine): Trust-system Arc 7 (first wave) — blast-radius SELECT * linter (P002)#185
hugocorreia90 merged 3 commits intomainfrom
feat/arc-7-sql-typing

Conversation

@hugocorreia90
Copy link
Copy Markdown
Contributor

Summary

  • Adds always-on, warning-severity P002 diagnostic emitted when a model uses SELECT * AND at least one downstream model references specific columns of its output. Leaf SELECT * is intentionally not flagged (the user inspected the result themselves).
  • Reuses Arc 6's lint shape (P-prefixed code, crate::diagnostic::P002 const) and the existing SemanticGraph::column_consumers index — no new parser or type-inference machinery for this wave.
  • Wires into rocky-compiler::compile_project (and the incremental path) so both rocky compile and the LSP per-keystroke loop surface P002 without any opt-in flag.

Trust outcome

Trust-system Arc 7 is "SQL as first-class with types." Wave 1 ships the blast-radius-aware linter the arc description promises: when a model M does SELECT * FROM upstream and a downstream model D references specific columns of M, an upstream schema change at upstream silently propagates through M's star into D's reference scope. The diagnostic names exactly which downstream consumers reference which of M's columns (capped at 3 per consumer for legibility), with an actionable suggestion to switch to an explicit column list.

What's deferred to wave 2

Per the trust-direction plan, Arc 7 wave 2 unlocks the heavier work:

  • Type inference over raw .sql models so they hold typed DAG membership equal to .rocky models. Sharpens P002's notion of "typed upstream" beyond the semantic-graph has_star heuristic.
  • DAG-aware SQL refactoring (rename column across dependents).
  • [lints] block in rocky.toml + per-model -- rocky-allow: select-star pragma, mirroring Arc 6's deferred portability config.
  • Sharper source spans (per-construct byte offsets) — same wave-2 line item Arc 6 deferred.

Test plan

  • 8 unit tests in blast_radius::tests cover the matrix: explicit downstream consumer, leaf, no star, typed external source, untyped external source, multi-downstream aggregation, column-list truncation, suggestion shape.
  • cargo test --workspace — full engine suite passes (no regressions from making P002 always-on).
  • pytest in integrations/dagster/ — all 307 tests pass; fixtures unaffected (playground default POC has no SELECT *).
  • cargo fmt --check + cargo clippy -p rocky-compiler -p rocky-cli --all-targets -- -D warnings.
  • Eyeballed CLI text mode (rocky compile --output table) against a 3-model temp project — miette renders the P002 warning with source-span underline and help: line; final summary correctly reads 0 errors, 1 warnings.
  • Eyeballed CLI JSON mode — P002 surfaces in the existing diagnostics[] array, no schema change.
  • Verified just codegen produces no schema or generated-binding drift (P002 is just another diagnostic code string in the existing wire shape).

… linter (P002)

Adds always-on, warning-severity P002 diagnostic emitted when a model uses
`SELECT *` AND at least one downstream model references specific columns
of its output. The lint is *blast-radius-aware*: a leaf SELECT * with no
downstream consumers is intentionally not flagged — the user inspected
the result themselves. The diagnostic only fires when an upstream schema
change at the star's source would silently propagate into a downstream
that names columns of this model.

Detection runs against the existing semantic graph rather than re-parsing
SQL: `ModelSchema::has_star` is already set during graph construction by
`rocky_sql::lineage`, and `column_consumers(model, column)` enumerates
the exact downstream edges that make the radius concrete. No new parser
or type-inference machinery is required for this wave; the next wave
(Arc 7 type inference over raw `.sql` models) will sharpen which upstream
is treated as "typed."

Wiring lives in `rocky-compiler::compile_project` (and the incremental
path) so both the CLI's `rocky compile` and the LSP's per-keystroke
publish loop surface P002 to users without any opt-in flag. Severity
is warning — non-blocking on `has_errors`, so existing CI green stays
green; the diagnostic is informational pressure to make schema deps
explicit. Wave 2 may add a `[lints]` block in rocky.toml + per-model
`-- rocky-allow: select-star` pragma, mirroring Arc 6's deferred
portability config.

Diagnostics include the offending model, list of affected downstream
consumers, and the specific columns each downstream references (capped
at 3 per consumer for legibility on wide schemas), with an actionable
suggestion to switch to an explicit column list. Source spans point at
the model file so miette renders the warning in-place in both terminal
and LSP clients.
- rocky-sql/portability.rs: collapse nested `if !supports_ilike(self.target)`
  into a match-arm guard. Rust 1.95 clippy promotes `collapsible_match` so
  the previous shape (landed in Arc 6 PR #184, pre-toolchain bump) now
  fires under `-D warnings`.

- integrations/dagster: regen `lineage/compile.json` to capture the new
  P002 warning emitted by the lineage POC's stg_orders model
  (`SELECT * FROM source.raw.orders` with a downstream `fct_revenue`
  that references `customer_id, amount`). The fixture proves P002 fires
  end-to-end through the dagster CLI subprocess path with the expected
  shape.
- portability.rs: wrap the long ILIKE suggestion string the way CI's
  rustfmt expects. Local rustfmt accepted the inline form but the
  CI toolchain enforces the multiline break.

- lineage/compile.json: regen against a freshly-built release binary
  so the fixture reflects the current P002 message ("silently
  propagates" — tightened from "may silently propagate" mid-PR).
  Previous regen used a stale release binary cached before the
  message edit.
@hugocorreia90 hugocorreia90 merged commit dfcdc39 into main Apr 20, 2026
17 of 18 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/arc-7-sql-typing branch April 20, 2026 15:52
@hugocorreia90 hugocorreia90 mentioned this pull request Apr 20, 2026
3 tasks
hugocorreia90 added a commit that referenced this pull request Apr 20, 2026
Closes the first wave of every trust-system arc (Arcs 1-7) plus the two
wave-2 follow-ups landed the same day. Nine feature PRs since v1.10.0.

- Arc 1 (#170): rocky lineage --downstream, rocky branch, rocky run --branch, rocky replay
- Arc 2 (#171): per-run cost attribution, [budget] block, budget_breach hook
- Arc 3 (#172): three-state CircuitBreaker, adapter consolidation
- Arc 4 (#173): rocky trace Gantt + feature-gated OTLP metrics export
- Arc 5 (#174): schema-grounded rocky ai prompt + project-aware validator
- Arc 6 wave 1 (#184): --target-dialect P001 portability lint (12 constructs)
- Arc 7 wave 1 (#185): blast-radius P002 SELECT * lint (semantic-graph aware)
- Arc 6 wave 2 (#186): [portability] config block + per-model rocky-allow pragma
- Arc 7 wave 2 wave-1 (#187): --with-seed source-schema inference

Plus #169 fix: install scripts pick latest engine version by semver.

Version bump: 20 Cargo.toml files (all workspace members except
rocky-bigquery, which tracks its own version).

Wave 2/3 work for every arc remains in the deferred backlog — see
the changelog Deferred section for the full carry-forward.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant