Skip to content

feat: Trust-system Arc 5 (first wave) — schema-grounded rocky ai prompt + project-aware validator#174

Merged
hugocorreia90 merged 1 commit intomainfrom
feat/arc-5-schema-ai
Apr 20, 2026
Merged

feat: Trust-system Arc 5 (first wave) — schema-grounded rocky ai prompt + project-aware validator#174
hugocorreia90 merged 1 commit intomainfrom
feat/arc-5-schema-ai

Conversation

@hugocorreia90
Copy link
Copy Markdown
Contributor

Summary

Grounds rocky ai in real project typed schemas. The LLM now sees
column names with types (e.g. orders: id: INT64, amount: DECIMAL(10,2))
instead of names only, and the compile-verify loop runs inside the full
project graph so upstream types propagate correctly.

The primary win is first-attempt generation accuracy from richer
prompt context. The validator change is honest plumbing — Rocky's
typechecker is lenient on unresolved column references today, so it
doesn't act as a hard gate against every hallucinated column; later
waves (auto-fix on failed runs, contract-aware generation) build on the
new ValidationContext hook.

Primitives

  • RockyType: Display — compact one-liner rendering (INT64,
    DECIMAL(10,2), ARRAY<STRING>, STRUCT<id:INT64,name:STRING>,
    ? for Unknown) used by the prompt so schemas don't bloat the
    context window.
  • rocky_ai::prompt::build_system_prompt — replaces name-only
    signatures with &[(String, Vec<TypedColumn>)] for both model
    dependencies and source tables. Renders under # Existing Models and
    # Available Source Tables sections.
  • rocky_ai::generate::ValidationContext — optional hook that
    carries the upstream project models + source schemas. When passed,
    the compile-verify loop typechecks the generated model alongside
    the existing project rather than in isolation. Upstream diagnostics
    surface (not swallowed); callers are responsible for passing a
    clean-compiling project.
  • rocky ai --models <path> — new flag (default models). Best-
    effort compiles the project, buckets type_check.typed_models into
    (models, sources) by dotted-name convention, threads both the typed
    schemas and the project models into generation. Compile failure
    degrades silently to unschema'd generation so rocky ai still works
    outside a project.

Framing (honest)

  • What this actually delivers: richer prompt → better first-attempt
    generation. The LLM now refuses to invent orders.bogus_column
    because orders is listed with its real columns and types.
  • What it does not deliver yet: a hard compile-time gate against
    every hallucinated column. Rocky's typechecker defaults unresolved
    columns to RockyType::Unknown rather than erroring, so the
    validator's value here is type-propagation scaffolding for later
    waves, not strict column-existence enforcement.
  • Source-tables section will be empty in practice until warehouse
    discovery is plumbed through to the CLI — it's wired so a later wave
    drops in source_column_info without another CLI change.

Test plan

  • cargo test -p rocky-ai -p rocky-compiler -p rocky-cli -p rocky-core — all green
  • cargo clippy --workspace -- -D warnings — clean
  • cargo fmt -- --check — clean
  • cargo run --bin rocky -- export-schemas schemas/ — no codegen drift (no JSON output struct changes)
  • New unit tests:
    • test_display_compact — every RockyType variant renders compactly
    • test_build_prompt_rocky_with_types — typed schemas render under both headers
    • test_build_prompt_empty_schemas_no_header — empty input → no empty section headers
    • test_project_aware_compiles_alongside_upstream — positive: generated model typechecks inside the project graph with real upstream types

…pt + project-aware validator

Grounds the `rocky ai` prompt in the project's typed schemas so the LLM
sees real column names with their RockyTypes instead of names only.
Compiles the generated model inside the full project graph rather than in
isolation; upstream typed schemas now propagate into the generated
model's output type.

- Add compact `Display` for `RockyType` — `INT64`, `DECIMAL(10,2)`,
  `ARRAY<STRING>`, `STRUCT<id:INT64,name:STRING>`, etc.
- `rocky_ai::prompt::build_system_prompt` takes typed columns for both
  existing models and source tables; renders them as `name: TYPE` pairs
  under `# Existing Models` and `# Available Source Tables`.
- `rocky_ai::generate::generate_model` takes an optional
  `ValidationContext` that carries the upstream project models + source
  schemas. When present, the compile-verify loop typechecks the
  generated model alongside the existing project so upstream column
  types propagate into its output schema. Rocky's typechecker is lenient
  on unresolved column references, so schema grounding in the prompt is
  the primary mechanism preventing hallucinated columns; the validator
  change is plumbing that later waves (auto-fix, contract-aware
  generation) will build on.
- `rocky ai` CLI gains a `--models <path>` flag (default `models`).
  Best-effort compiles the project, buckets `type_check.typed_models`
  into (models, sources), and threads both the typed schemas and the
  project models into the generator. Compile failure degrades silently
  to unschema'd generation — `rocky ai` still works outside a project.

No JSON output schema changes (no codegen drift).
@hugocorreia90 hugocorreia90 merged commit 3adbeb4 into main Apr 20, 2026
5 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/arc-5-schema-ai branch April 20, 2026 11:21
@hugocorreia90 hugocorreia90 mentioned this pull request Apr 20, 2026
3 tasks
hugocorreia90 added a commit that referenced this pull request Apr 20, 2026
Closes the first wave of every trust-system arc (Arcs 1-7) plus the two
wave-2 follow-ups landed the same day. Nine feature PRs since v1.10.0.

- Arc 1 (#170): rocky lineage --downstream, rocky branch, rocky run --branch, rocky replay
- Arc 2 (#171): per-run cost attribution, [budget] block, budget_breach hook
- Arc 3 (#172): three-state CircuitBreaker, adapter consolidation
- Arc 4 (#173): rocky trace Gantt + feature-gated OTLP metrics export
- Arc 5 (#174): schema-grounded rocky ai prompt + project-aware validator
- Arc 6 wave 1 (#184): --target-dialect P001 portability lint (12 constructs)
- Arc 7 wave 1 (#185): blast-radius P002 SELECT * lint (semantic-graph aware)
- Arc 6 wave 2 (#186): [portability] config block + per-model rocky-allow pragma
- Arc 7 wave 2 wave-1 (#187): --with-seed source-schema inference

Plus #169 fix: install scripts pick latest engine version by semver.

Version bump: 20 Cargo.toml files (all workspace members except
rocky-bigquery, which tracks its own version).

Wave 2/3 work for every arc remains in the deferred backlog — see
the changelog Deferred section for the full carry-forward.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant