feat: Trust-system Arc 5 (first wave) — schema-grounded rocky ai prompt + project-aware validator#174
Merged
hugocorreia90 merged 1 commit intomainfrom Apr 20, 2026
Merged
Conversation
…pt + project-aware validator Grounds the `rocky ai` prompt in the project's typed schemas so the LLM sees real column names with their RockyTypes instead of names only. Compiles the generated model inside the full project graph rather than in isolation; upstream typed schemas now propagate into the generated model's output type. - Add compact `Display` for `RockyType` — `INT64`, `DECIMAL(10,2)`, `ARRAY<STRING>`, `STRUCT<id:INT64,name:STRING>`, etc. - `rocky_ai::prompt::build_system_prompt` takes typed columns for both existing models and source tables; renders them as `name: TYPE` pairs under `# Existing Models` and `# Available Source Tables`. - `rocky_ai::generate::generate_model` takes an optional `ValidationContext` that carries the upstream project models + source schemas. When present, the compile-verify loop typechecks the generated model alongside the existing project so upstream column types propagate into its output schema. Rocky's typechecker is lenient on unresolved column references, so schema grounding in the prompt is the primary mechanism preventing hallucinated columns; the validator change is plumbing that later waves (auto-fix, contract-aware generation) will build on. - `rocky ai` CLI gains a `--models <path>` flag (default `models`). Best-effort compiles the project, buckets `type_check.typed_models` into (models, sources), and threads both the typed schemas and the project models into the generator. Compile failure degrades silently to unschema'd generation — `rocky ai` still works outside a project. No JSON output schema changes (no codegen drift).
3 tasks
hugocorreia90
added a commit
that referenced
this pull request
Apr 20, 2026
Closes the first wave of every trust-system arc (Arcs 1-7) plus the two wave-2 follow-ups landed the same day. Nine feature PRs since v1.10.0. - Arc 1 (#170): rocky lineage --downstream, rocky branch, rocky run --branch, rocky replay - Arc 2 (#171): per-run cost attribution, [budget] block, budget_breach hook - Arc 3 (#172): three-state CircuitBreaker, adapter consolidation - Arc 4 (#173): rocky trace Gantt + feature-gated OTLP metrics export - Arc 5 (#174): schema-grounded rocky ai prompt + project-aware validator - Arc 6 wave 1 (#184): --target-dialect P001 portability lint (12 constructs) - Arc 7 wave 1 (#185): blast-radius P002 SELECT * lint (semantic-graph aware) - Arc 6 wave 2 (#186): [portability] config block + per-model rocky-allow pragma - Arc 7 wave 2 wave-1 (#187): --with-seed source-schema inference Plus #169 fix: install scripts pick latest engine version by semver. Version bump: 20 Cargo.toml files (all workspace members except rocky-bigquery, which tracks its own version). Wave 2/3 work for every arc remains in the deferred backlog — see the changelog Deferred section for the full carry-forward.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Grounds
rocky aiin real project typed schemas. The LLM now seescolumn names with types (e.g.
orders: id: INT64, amount: DECIMAL(10,2))instead of names only, and the compile-verify loop runs inside the full
project graph so upstream types propagate correctly.
The primary win is first-attempt generation accuracy from richer
prompt context. The validator change is honest plumbing — Rocky's
typechecker is lenient on unresolved column references today, so it
doesn't act as a hard gate against every hallucinated column; later
waves (auto-fix on failed runs, contract-aware generation) build on the
new
ValidationContexthook.Primitives
RockyType: Display— compact one-liner rendering (INT64,DECIMAL(10,2),ARRAY<STRING>,STRUCT<id:INT64,name:STRING>,?forUnknown) used by the prompt so schemas don't bloat thecontext window.
rocky_ai::prompt::build_system_prompt— replaces name-onlysignatures with
&[(String, Vec<TypedColumn>)]for both modeldependencies and source tables. Renders under
# Existing Modelsand# Available Source Tablessections.rocky_ai::generate::ValidationContext— optional hook thatcarries the upstream project models + source schemas. When passed,
the compile-verify loop typechecks the generated model alongside
the existing project rather than in isolation. Upstream diagnostics
surface (not swallowed); callers are responsible for passing a
clean-compiling project.
rocky ai --models <path>— new flag (defaultmodels). Best-effort compiles the project, buckets
type_check.typed_modelsinto(models, sources) by dotted-name convention, threads both the typed
schemas and the project models into generation. Compile failure
degrades silently to unschema'd generation so
rocky aistill worksoutside a project.
Framing (honest)
generation. The LLM now refuses to invent
orders.bogus_columnbecause
ordersis listed with its real columns and types.every hallucinated column. Rocky's typechecker defaults unresolved
columns to
RockyType::Unknownrather than erroring, so thevalidator's value here is type-propagation scaffolding for later
waves, not strict column-existence enforcement.
discovery is plumbed through to the CLI — it's wired so a later wave
drops in
source_column_infowithout another CLI change.Test plan
cargo test -p rocky-ai -p rocky-compiler -p rocky-cli -p rocky-core— all greencargo clippy --workspace -- -D warnings— cleancargo fmt -- --check— cleancargo run --bin rocky -- export-schemas schemas/— no codegen drift (no JSON output struct changes)test_display_compact— everyRockyTypevariant renders compactlytest_build_prompt_rocky_with_types— typed schemas render under both headerstest_build_prompt_empty_schemas_no_header— empty input → no empty section headerstest_project_aware_compiles_alongside_upstream— positive: generated model typechecks inside the project graph with real upstream types