feat(engine): wire Arc 7 wave 2 wave-2 schema cache into typecheck callsites (PR 1b) #228
Merged
hugocorreia90 merged 1 commit into main on Apr 22, 2026
…llsites (PR 1b)

Wires `CompilerConfig.source_schemas` against the persisted schema cache shipped in #223 at 9 of the 10 previously `HashMap::new()` callsites in `rocky-cli` and `rocky-server`. Read-only: no new features, no write tap (PR 2), no new CLI flags (PRs 3-4), no output-struct changes.

Wired callsites
- rocky-cli/src/commands/compile.rs (preserving `--with-seed` precedence)
- rocky-cli/src/commands/dag.rs (column-lineage compile)
- rocky-cli/src/commands/lineage.rs
- rocky-cli/src/commands/ai.rs::compile_project (grounds the AI prompt)
- rocky-cli/src/commands/ci_diff.rs (both HEAD and base-ref compiles)
- rocky-cli/src/commands/run.rs::execute_models
- rocky-server/src/state.rs::ServerState::recompile
- rocky-server/src/lsp.rs::RockyLsp::recompile (initial + did_save)
- rocky-server/src/lsp.rs did_change debounced recompile

Deliberate non-wires (commented in place)
- rocky-cli/src/commands/ai.rs:112: `ValidationContext.source_schemas` is a distinct surface from `CompilerConfig.source_schemas`. Promotion needs a `rocky-ai::generate::ValidationContext` audit that's out of scope for PR 1b. Design doc §4.4 calls this out as an intentional stub.
- rocky-cli/src/commands/bench.rs:268: synthetic tempdir projects have no `.rocky-state.redb`. Wiring would either no-op or read a surrounding CWD's cache, which would make benchmarks non-reproducible across machines.

Shared helpers
- `rocky-cli::source_schemas::load_cached_source_schemas`: opens `StateStore` read-only, so it doesn't block a concurrent `rocky run`. It gates on `[cache.schemas] enabled`, filters by TTL, and emits a once-per-CLI-process info log on a hit. It does not create `state.redb` as a side effect.
- `rocky-server::schema_cache_throttle::SchemaCacheThrottle`: a `Mutex<HashSet<String>>`-backed per-session throttle for the info log, so the LSP doesn't spam it on every keystroke. It is keyed on `models_dir` for PR 1b. PR 2's write tap will extend the key with a cache-version suffix so the log re-fires after cache updates.
Precedence in `rocky compile`
1. `--with-seed` wins (explicit user intent, wave-1).
2. Otherwise `[cache.schemas]` from `rocky.toml` (wave-2).
3. Cold cache / no config -> empty map (matches pre-wave-2 behaviour).

Scope discipline
- All `source_schemas` loads go through `StateStore::open_read_only`, so a concurrent `rocky run` never causes `LockHeldByOther`.
- A cold cache or a missing `state.redb` degrades to an empty map. The loader never creates `state.redb` as a side effect of `rocky compile` on a fresh checkout.
- The LSP honours `[cache.schemas]` in `<root>/rocky.toml`, where `<root>` is the parent of `models_dir` (matching the `initialize` convention). Setting `enabled = false` disables the path in the IDE the same way it does at the CLI.
- No `[cache.schemas]` default changes; all defaults are locked per design doc §8.

Tests
- `rocky-cli::source_schemas`: 4 unit tests (disabled config, missing state, cached entries, TTL expiry).
- `rocky-cli::commands::compile`: 3 integration tests (cache-seeded compile flows through typecheck, loader round-trips columns via `default_type_mapper`, cold cache doesn't create state.redb).
- `rocky-server::schema_cache_throttle`: 4 unit tests (first call, repeat key, distinct keys, version-bump re-fire shape for PR 2).
- `rocky-server::lsp`: 3 LSP-specific tests (config disabled, zero-config defaults, cold cache has no side effect).

Verification
- `cargo test --workspace`: full suite green.
- `cargo clippy --workspace --all-targets -- -D warnings`: clean.
- `cargo fmt --all --check`: clean.
- `just codegen`: no schema/binding diff (no output-struct changes).
- `just regen-fixtures`: byte-stable (no run/compile output changes).
- `uv run pytest` in `integrations/dagster/`: 312 green.
- `npm run compile` in `editors/vscode/`: green.

Follow-up PRs
- PR 2: `rocky run` write tap on `batch_describe_schema`. This is the cache-fill path; this PR's read path is a no-op until it lands.
- PR 3: `rocky discover --with-schemas` (CI warm-up flag).
- PR 4: `rocky state clear-schema-cache` + CLI TTL override + `[cache.schemas] enabled = false` surfacing in `rocky doctor`.

Known follow-up (to fix in PR 2)
- The CLI default state path is `.rocky-state.redb` in the CWD (main.rs:71). The LSP convention is `models_dir.join(".rocky-state.redb")`. No writes land today, so the divergence is invisible. PR 2's write tap should make the CLI write to the LSP's path, so the claimed "inlay-hint improvement" is observable end-to-end.

Design doc: ~/Developer/rocky-plans/plans/rocky-arc7-wave2-wave2-design.md
Infra dependency: #223 (merged 2026-04-22).
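The `rocky compile` precedence rule above can be sketched as a small selection function. This is a minimal illustration under assumptions, not the `rocky-cli` code. `SourceSchemas` and `pick_source_schemas` are hypothetical names, and the real map's value type is not shown in this PR. Passing the cache loader as a closure means the state store is never read when `--with-seed` is supplied.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the compiler's source-schema map:
/// "<schema>.<table>" -> column names.
pub type SourceSchemas = HashMap<String, Vec<String>>;

/// Picks the source schemas for one compile:
/// 1. an explicit `--with-seed` map wins;
/// 2. otherwise the `[cache.schemas]` loader runs;
/// 3. a cold cache or missing config makes the loader return an empty map.
pub fn pick_source_schemas<F>(seed: Option<SourceSchemas>, load_cache: F) -> SourceSchemas
where
    F: FnOnce() -> SourceSchemas,
{
    // `unwrap_or_else` only calls the loader when no seed was given,
    // so the cache is not touched at all on the `--with-seed` path.
    seed.unwrap_or_else(load_cache)
}
```

Because the empty map is the loader's own fallback, the "cold cache" case needs no special branch in the caller.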
This was referenced Apr 22, 2026
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
…ave 2 wave-2 PR 2) (#230)

Tap every successful `BatchCheckAdapter::batch_describe_schema` result in `rocky run` into the persisted schema cache. The cache was shipped in #223 and wired for reads in #228. Downstream compiles, including the LSP's per-keystroke typecheck, can now resolve leaf `FROM <schema>.<table>` references against real warehouse types without a round-trip on every call.

Scope:
- New `rocky-cli::schema_cache_writer` module with `persist_batch_describe(store, config, tap, catalog, schema, cols_by_table)`. It writes one entry per returned table. The DESCRIBE cost is already paid, so sibling tables in the same source schema join the cache too.
- Gate on `[cache.schemas] enabled` (default true, per design doc §4.3). Cache-write failures log `warn!` and never fail the run. The helper returns `()`, so the best-effort contract is enforced at the type level.
- Dedup within one run via `SchemaCacheWriteTap::seen`, a `HashSet` over `schema_cache_key`. Databricks already deduplicates at the `(catalog, schema)` pair level, but the tighter guarantee keeps the invariant local for PR 3's `rocky discover --with-schemas`.
- Write the map returned by `batch_describe_schema` for both the source and target schema directions. The keys are distinct, and this gives free signal for models that read from a sibling's target.

Deliberate non-scope:
- The per-table `warehouse.describe_table(...)` fallback inside `process_table` stays untapped for now. That path only fires in two cases:
  (a) The warehouse has no `BatchCheckAdapter`. This is DuckDB, which is not a wave-2 cache target and has no warehouse schemas to cache.
  (b) The batch call failed. This is rare, and adding a lock-held write inside a concurrent task spawn would contend with the rest of the run for dubious cache benefit.
  It can be a follow-up if demand appears.
- `rocky discover --with-schemas` (PR 3, parallel fan-out).
- `rocky state clear-schema-cache` and `--cache-ttl` override (PR 4).
- The `state_path` CLI/LSP divergence. The CLI default is `.rocky-state.redb` (CWD); the LSP default is `models_dir.join(...)`. Fixing this requires a migration story for existing users' CWD state files, which hold watermarks and run history. It is scoped out of PR 2 and tracked as a follow-up; PR 1b's commit message already flagged it.

Tests (7 new):
- `writes_entry_when_enabled`: happy-path write + readback.
- `writes_nothing_when_disabled`: the config gate short-circuits before redb.
- `dedups_repeated_key_within_one_run`: a second call with the same key is suppressed (evidence: a differing column list has no effect).
- `writes_all_tables_in_batch_not_just_selected`: full-schema write.
- `distinct_catalogs_do_not_collide`: key composition includes the catalog.
- `round_trip_through_reader`: the writer and the PR 1b reader contract stay consistent; key shape and column conversion survive the round-trip.
- `signature_does_not_propagate_errors`: compile-time pin on the `()` return type, so the best-effort contract can't be changed accidentally without a test failure.

Verification:
- `cargo test -p rocky-cli`: 220 prior + 7 new = 227 tests green.
- `cargo test --workspace`: full suite green.
- `cargo clippy -p rocky-cli --all-targets -- -D warnings`: clean.
- `cargo fmt --all --check`: clean.
- Playground smoke: `rocky run` against the default POC stays green. DuckDB has no `BatchCheckAdapter`, so the tap branch is never entered.

Design doc: `~/Developer/rocky-plans/plans/rocky-arc7-wave2-wave2-design.md` (§4.2 write path, §4.5 JSON serialization, §4.6 ColumnInfo conversion).
Infra dependencies: #223 (PR 1a, schema cache types + state CRUD) and #228 (PR 1b, read-path wiring). PR 2 closes the write end.
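The write-tap contract above can be sketched as follows: a config gate, per-run dedup, warn-and-continue on failure, and a `()` return. This is an illustrative sketch under assumptions, not the `schema_cache_writer` module. `SchemaCacheSink`, `WriteTap`, `cache_key` and `persist_batch_describe_sketch` are hypothetical. The lowercased `catalog.schema.table` key shape is assumed, and `eprintln!` stands in for the `warn!` log.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical abstraction over the state store's cache-write call.
pub trait SchemaCacheSink {
    fn write_entry(&mut self, key: &str, columns: &[String]) -> Result<(), String>;
}

/// Per-run dedup state (this sketch's analogue of `SchemaCacheWriteTap::seen`).
#[derive(Default)]
pub struct WriteTap {
    seen: HashSet<String>,
}

/// Assumed key shape: lowercased `catalog.schema.table`.
pub fn cache_key(catalog: &str, schema: &str, table: &str) -> String {
    format!("{catalog}.{schema}.{table}").to_lowercase()
}

/// Best-effort write of one `batch_describe_schema` result. Returns `()`,
/// so callers cannot propagate a cache failure into the run.
pub fn persist_batch_describe_sketch<S: SchemaCacheSink>(
    store: &mut S,
    enabled: bool,
    tap: &mut WriteTap,
    catalog: &str,
    schema: &str,
    cols_by_table: &HashMap<String, Vec<String>>,
) {
    // Config gate: short-circuit before touching the store.
    if !enabled {
        return;
    }
    // Every table in the batch is written, not only the ones a model selected.
    for (table, columns) in cols_by_table {
        let key = cache_key(catalog, schema, table);
        // `insert` returns false when this key was already handled in this run.
        if !tap.seen.insert(key.clone()) {
            continue;
        }
        if let Err(err) = store.write_entry(&key, columns) {
            // Warn and keep going; a cache write never fails the run.
            eprintln!("warn: schema cache write failed for {key}: {err}");
        }
    }
}
```

In this sketch, a key is marked as seen even if its write fails, so a failing key is not retried within the same run. That is a design choice of the sketch, not something the commit message specifies.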
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
… 2 wave-2 PR 3) (#231)

This adds an explicit warm-up path for the Arc 7 wave 2 wave-2 schema cache (design doc §4.2 route B).

When `--with-schemas` is set, `rocky discover` walks each unique `(catalog, schema)` pair reachable via the source's `BatchCheckAdapter`. For each pair it issues one `batch_describe_schema` round-trip. It then persists every returned table as a `SchemaCacheEntry` via `StateStore::write_schema_cache_entry`, the infra that PR 1a shipped in #223.

Downstream `rocky compile` / `rocky lsp` invocations pick up those entries via the read path wired in #228 (PR 1b). As a result, leaf models that reference the cached source stop typechecking as `Unknown`.

What the flag does NOT do:
- It does not touch the `rocky run` write tap (PR 2, parallel agent).
- It does not add `clear-schema-cache` or a CLI TTL override (PR 4).
- It does not alter the read path, the cache-entry format, or the `state_path` resolution.

Error handling (design doc §4.2 + trust positioning):
- `--with-schemas` + `[cache.schemas] enabled = false` in rocky.toml → hard error with an actionable message. The two signals are contradictory, and silently skipping would leave the user guessing why `schemas_cached=0`. Erroring keeps the user's mental model aligned with what the cache actually does.
- Missing `source.catalog` → warn once, skip writes (entries cannot be keyed without a catalog).
- No `BatchCheckAdapter` registered for the source adapter (DuckDB today) → warn once, skip writes.
- Per-schema `batch_describe_schema` failure → warn and continue.
- Per-entry `write_schema_cache_entry` failure → warn and continue.

DiscoverOutput schema change:
- New `schemas_cached: usize` field. It is skipped from the wire format when zero, so fixtures captured without the flag stay byte-stable.

Full codegen cascade run: `schemas/discover.schema.json`, `integrations/dagster/.../types_generated/discover_schema.py`, and `editors/vscode/src/types/generated/discover.ts` were all regenerated.
Tests:
- 5 new unit tests (`discover::tests`) cover the dedup helper and the inner warm-up loop against a stub `BatchCheckAdapter`. They check that the loop writes one entry per table, continues past describe failures, handles the empty schema list, and lowercases key components.
- The DuckDB adapter has no `BatchCheckAdapter`, so playground integration tests only hit the warn-and-skip path. The stub gives meaningful assertions for the happy path, which would otherwise require a live warehouse.

Test plan:
- `cargo test -p rocky-cli -p rocky-core`: 1236 tests green.
- `cargo clippy --all-targets -- -D warnings`: clean on the full workspace.
- `cargo fmt --all --check`: clean.
- `just codegen`: schemas regenerated; only the expected `schemas_cached` node added to `discover`.
- `uv run pytest` in `integrations/dagster/`: 370 tests green after Pydantic regeneration.
- `npm run compile` in `editors/vscode/`: clean.
- `scripts/regen_fixtures.sh`: fixtures byte-stable (field skipped when zero).
- Smoke-tested against the 00-playground-default POC:
  - Discover without the flag returns the same JSON as before (no `schemas_cached` field).
  - Discover with `--with-schemas` warns about the missing DuckDB `BatchCheckAdapter` and returns the same JSON (no entries written, `schemas_cached=0` elided).
  - Discover with `enabled = false` + `--with-schemas` errors cleanly.
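The dedup-then-warm-up loop described above can be sketched against a stub adapter. This is a sketch under stated assumptions, not the `discover` module. `BatchDescribe`, `unique_pairs` and `warm_up` are hypothetical names, the key format is assumed, and the write closure stands in for `StateStore::write_schema_cache_entry`. The returned count plays the role of `schemas_cached`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a `BatchCheckAdapter`: one round-trip per
/// (catalog, schema) pair, returning the table names it contains.
pub trait BatchDescribe {
    fn describe_schema(&self, catalog: &str, schema: &str) -> Result<Vec<String>, String>;
}

/// Deduplicates (catalog, schema) pairs case-insensitively by lowercasing
/// both components; the BTreeSet also gives a stable iteration order.
pub fn unique_pairs<'a, I>(pairs: I) -> Vec<(String, String)>
where
    I: IntoIterator<Item = (&'a str, &'a str)>,
{
    let set: BTreeSet<(String, String)> = pairs
        .into_iter()
        .map(|(catalog, schema)| (catalog.to_lowercase(), schema.to_lowercase()))
        .collect();
    set.into_iter().collect()
}

/// Warms the cache and returns how many entries were written.
pub fn warm_up<A, W>(adapter: &A, pairs: &[(String, String)], mut write: W) -> usize
where
    A: BatchDescribe,
    W: FnMut(String) -> Result<(), String>,
{
    let mut cached = 0;
    for (catalog, schema) in pairs {
        let tables = match adapter.describe_schema(catalog, schema) {
            Ok(tables) => tables,
            Err(err) => {
                // Per-schema failure: warn and move on to the next pair.
                eprintln!("warn: describe {catalog}.{schema} failed: {err}");
                continue;
            }
        };
        for table in tables {
            let key = format!("{catalog}.{schema}.{}", table.to_lowercase());
            match write(key) {
                Ok(()) => cached += 1,
                // Per-entry failure: warn and continue with the remaining tables.
                Err(err) => eprintln!("warn: cache write failed: {err}"),
            }
        }
    }
    cached
}
```

Deduplicating before the loop guarantees one DESCRIBE per pair even when many models point at the same source schema.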
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
…(Arc 7 PR 4) (#232)

* feat(engine/rocky-cli): add rocky state clear-schema-cache + --cache-ttl override (PR 4)

Arc 7 wave 2 wave-2 PR 4 adds the user-facing control surface for the schema cache:
- `rocky state clear-schema-cache [--dry-run]`: explicit flush of the SCHEMA_CACHE redb table. A missing state store is treated as a no-op, which makes it CI-friendly: it is safe to run on an ephemeral runner before a build.
- `--cache-ttl <seconds>` global CLI flag: overrides `[cache.schemas] ttl_seconds` for this invocation.
  - Precedence: `--cache-ttl` > `rocky.toml` > built-in default (86400s / 24h).
  - It applies to CLI read paths; the `rocky lsp` / `rocky serve` daemons keep the config-derived TTL.
- `rocky state` becomes a subcommand group. Bare `rocky state` is preserved via `Option<StateAction>` defaulting to `Show`.

Completes the Arc 7 wave 2 wave-2 sequence (PR 1a #223 infra, PR 1b #228 reads, PR 2 #230 write tap, PR 3 #231 discover warm-up, PR 4 user controls).

* docs(engine/rocky-cli): strip task references from ClearSchemaCacheOutput doc

The doc comment on the output struct flows into `schemas/*.schema.json`, dagster Pydantic docstrings, and vscode TypeScript jsdoc. Keep the behavioral description and drop the 'Arc 7 wave 2 wave-2 PR 4 / PR 2 / PR 1b' references, per monorepo CLAUDE.md (task refs in code rot over time).

* docs(engine): add CHANGELOG entries for rocky state clear-schema-cache + --cache-ttl

* fix(integrations/dagster): sort ClearSchemaCacheOutput in types.py import block

Ruff I001 was tripping on the import block order in types.py. The original PR 4 agent inserted ClearSchemaCacheOutput between SourceOutput and StateOutput instead of between CiOutput and ColumnLineageOutput.
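The TTL precedence and the "bare `rocky state` defaults to `Show`" behaviour can both be expressed with plain `Option` combinators. This is a minimal sketch, not the CLI's argument parsing: the clap wiring is omitted, and `effective_ttl_secs`, `resolve_state_action` and the `ClearSchemaCache { dry_run }` variant are hypothetical. Only `StateAction`, `Show` and the 86400 s default come from the commit message.

```rust
/// Built-in default TTL named in the commit message: 86400 s (24 h).
pub const DEFAULT_SCHEMA_CACHE_TTL_SECS: u64 = 86_400;

/// Precedence: `--cache-ttl` > `[cache.schemas] ttl_seconds` > built-in default.
pub fn effective_ttl_secs(cli_flag: Option<u64>, toml_ttl: Option<u64>) -> u64 {
    cli_flag.or(toml_ttl).unwrap_or(DEFAULT_SCHEMA_CACHE_TTL_SECS)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateAction {
    Show,
    ClearSchemaCache { dry_run: bool },
}

/// Bare `rocky state` parses to `None` and keeps the old `Show` behaviour.
pub fn resolve_state_action(action: Option<StateAction>) -> StateAction {
    action.unwrap_or(StateAction::Show)
}
```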
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0 / dagster-v1.9.0 / vscode-v1.6.3. Details are in each CHANGELOG.

Engine headlines (12 PRs):
- Arc 7 wave 2 wave-2 complete: cached DESCRIBE end-to-end (#223 infra, #228 reads, #230 write tap, #231 discover warm-up, #232 state controls + --cache-ttl override).
- Arc 2 wave 3 complete: bytes_scanned / bytes_written on MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake deferred doc, #222 docstring cascade). `rocky cost` now reports real dollar figures for BQ + Databricks.
- FR-005 Unity Catalog workspace-binding reconcile (#226).
- FR-002 Fivetran connector metadata via SourceOutput.metadata (#225).
- Housekeeping: compute_backoff dedup into rocky_core::retry (#217).

Dagster headlines (4 PRs):
- FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor on RockyResource startup (#224).
- FR-003 RockyResource.state_health() (#227), plus an FR follow-up that threads doctor(check=state_rw) through for sub-second probes (#229).
- RockyResource.cost() wiring + fixture (#218).

VS Code: regenerated TS bindings for engine 1.14.0 type additions. No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

36 fixtures picked up the new engine version string in their top-level "version" field. No schema changes, just the version bump.
hugocorreia90 added a commit that referenced this pull request on Apr 23, 2026
* feat(engine): unify CLI + LSP state_path resolution

The CLI default state_path (`.rocky-state.redb` in the CWD) and the LSP / server default (`<models>/.rocky-state.redb`) diverged after Arc 7 wave 2 wave-2. The schema-cache write tap (PR #230) persisted entries to the CWD, while the LSP read path (PR #228) looked next to the models directory. As a result, inlay-hint cache hits never fired end-to-end for any project where the two locations weren't already co-located.

Add `rocky_core::state::resolve_state_path(explicit, models_dir)` as the single resolution point for both halves of the binary:
1. An explicit `--state-path` wins verbatim.
2. Only `<models>/.rocky-state.redb` exists: use it (canonical default).
3. Only the CWD `.rocky-state.redb` exists: use it with a one-time deprecation warning on stderr. This legacy fallback protects existing watermarks, branches, partitions, and run history.
4. Both exist: the CWD file wins (preserving legacy state), with a louder warning asking the user to reconcile. Merging is lossy.
5. Neither exists (fresh project): default to `<models>/.rocky-state.redb`, with no warning.

Wire the helper through main.rs (single top-level resolution), commands/watch.rs, and all four rocky-server callsites (state.rs, lsp.rs, api.rs, dashboard.rs). Passing `--state-path` explicitly remains a hard override. The Dagster integration always passes an explicit path, so it is unchanged.

Five resolver unit tests cover every branch (explicit / models-dir / CWD-fallback / both-exist / fresh). Smoke-tested end-to-end against the release binary:
- the warning lands on stderr;
- the models-dir case is silent;
- the both-exist case emits the louder warning.

* fix(engine/rocky-core): fall back to CWD when models dir is missing

Case 5 of `resolve_state_path` returned `<models>/.rocky-state.redb` unconditionally on fresh projects. Replication-only and quality-only pipelines have no `models/` directory at all, and neither do several POCs, e.g. `02-performance/06-schema-drift-recover`. In those projects the next `rocky run` failed trying to open the state lock file at a path whose parent doesn't exist:

    Error: failed to open state store at models/.rocky-state.redb
    i/o error opening state lock file at models/.rocky-state.redb.lock: No such file or directory (os error 2)

That crash surfaced as a `just regen-fixtures` normalizer failure in the codegen-drift CI workflow for PR #238. The `drift/run_clean` capture emitted empty stdout, and the JSON normalizer then errored on `Expecting value: line 1 column 1 (char 0)`.

Refine the resolver to check `models_dir.is_dir()`:
- Case 5 (fresh project): default to `<models>/.rocky-state.redb` only when `models_dir` exists; otherwise fall back to the CWD.
- Case 3 (legacy CWD state): emit the migration-nudge warning only when `models_dir` is a real directory. Without one there's nowhere to move the file to, so the warning would be noise.

The LSP only attaches to projects with `.rocky` files, which by definition have a models dir. The no-models fallback path therefore has no CLI-vs-LSP divergence to unify.

Two new unit tests pin the behaviour: fresh-project-without-models and CWD-state-without-models. Verified locally: the drift POC run now emits valid JSON, and `just regen-fixtures` completes with zero drift against the committed `fixtures_generated/`.
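The five-case order, including the `is_dir()` refinement, can be sketched as a single `match` over which files exist. This is a hypothetical mirror of `resolve_state_path`, not its source. This sketch takes `cwd` as an explicit third argument so it can be tested without changing the process working directory. The `StateWarning` enum is also an assumption: it stands in for whatever mechanism emits the stderr warnings. The "both exist" arm is checked before the canonical arm so that legacy CWD state wins, as case 4 requires.

```rust
use std::path::{Path, PathBuf};

/// File name used by both the CLI and the LSP.
pub const STATE_FILE: &str = ".rocky-state.redb";

/// Which stderr warning (if any) the caller should emit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateWarning {
    Silent,
    /// Legacy CWD state in use; nudge the user to move it next to the models.
    LegacyCwd,
    /// Both files exist; the CWD file wins and the user should reconcile manually.
    BothExist,
}

/// Sketch of the five-case resolution order with the `is_dir()` refinement.
pub fn resolve_state_path_sketch(
    explicit: Option<&Path>,
    models_dir: &Path,
    cwd: &Path,
) -> (PathBuf, StateWarning) {
    // Case 1: an explicit --state-path is used verbatim.
    if let Some(path) = explicit {
        return (path.to_path_buf(), StateWarning::Silent);
    }
    let models_state = models_dir.join(STATE_FILE);
    let cwd_state = cwd.join(STATE_FILE);
    match (models_state.exists(), cwd_state.exists()) {
        // Case 4: both exist; keep the legacy state and warn loudly.
        (true, true) => (cwd_state, StateWarning::BothExist),
        // Case 2: canonical location only.
        (true, false) => (models_state, StateWarning::Silent),
        // Case 3: legacy CWD state; only nudge when there is a models dir to move it to.
        (false, true) => {
            let warning = if models_dir.is_dir() {
                StateWarning::LegacyCwd
            } else {
                StateWarning::Silent
            };
            (cwd_state, warning)
        }
        // Case 5: fresh project; prefer the models dir only if it exists.
        (false, false) if models_dir.is_dir() => (models_state, StateWarning::Silent),
        (false, false) => (cwd_state, StateWarning::Silent),
    }
}
```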
Arc 7 wave 2 wave-2, PR 1b of 4. This is a mechanical follow-up to #223. It wires 9 of the 10 previously `source_schemas: HashMap::new()` callsites in `rocky-cli` and `rocky-server` against the persisted schema cache shipped in PR 1a. Read-only: no new features, no write tap (PR 2), no new CLI flags (PRs 3-4), no output-struct changes.
Design doc: ~/Developer/rocky-plans/plans/rocky-arc7-wave2-wave2-design.md.
Infra dependency: #223 (merged 2026-04-22).
What this PR does
Every CLI and server flow that calls `rocky_compiler::compile::compile`, except the deliberate non-wires below, now loads `CompilerConfig.source_schemas` from `rocky_compiler::schema_cache::load_source_schemas_from_cache` instead of passing an empty map. A cold cache yields an empty map, which is the pre-wave-2 behaviour. This PR is therefore a user-visible no-op until PR 2's write tap fills the cache.
Wired callsites (9)
- `rocky-cli/src/commands/compile.rs`: `--with-seed` precedence; cache fallback when the seed isn't used
- `rocky-cli/src/commands/dag.rs`
- `rocky-cli/src/commands/lineage.rs`
- `rocky-cli/src/commands/ai.rs::compile_project`
- `rocky-cli/src/commands/ci_diff.rs`
- `rocky-cli/src/commands/run.rs::execute_models`
- `rocky-server/src/state.rs::ServerState::recompile`
- `rocky-server/src/lsp.rs::RockyLsp::recompile`
- `rocky-server/src/lsp.rs` did_change debounce

Deliberate non-wires (2)
- `rocky-cli/src/commands/ai.rs:112`: `ValidationContext.source_schemas` is a distinct surface from `CompilerConfig.source_schemas`. Promotion needs a dedicated `rocky-ai::generate::ValidationContext` audit. Design §4.4.
- `rocky-cli/src/commands/bench.rs:268`: synthetic tempdir projects have no `.rocky-state.redb`. Wiring would either no-op or read a surrounding CWD's cache, which would make benchmarks non-reproducible across machines.

Both sites carry inline `// TODO(arc7-wave2): ...` comments explaining the decision. Each is a one-line follow-up if future work calls for it.
Precedence rule in `rocky compile`:
1. `--with-seed` wins -> wave-1 seed loader (explicit user intent).
2. Otherwise `[cache.schemas]` from `rocky.toml` (wave-2).

Shared helpers
- `rocky-cli::source_schemas::load_cached_source_schemas`:
  - Opens `StateStore` via `open_read_only`, so a concurrent `rocky run` never fails with `LockHeldByOther`.
  - Gates on `[cache.schemas] enabled` and filters TTL at read time.
  - Emits a once-per-CLI-process info log when the scan returns at least one entry.
  - Does not create `state.redb` as a side effect when the file is missing (fresh clone).
- `rocky-server::schema_cache_throttle::SchemaCacheThrottle`: a `Mutex<HashSet<String>>`-backed per-session throttle for the LSP info log, so it doesn't spam once per keystroke.
  - Keyed on `models_dir` for PR 1b.
  - PR 2's write tap will extend the key with a cache-version suffix. This is a simple string concat, and the `version_bump_pattern_re_fires` test in the throttle module locks in that shape ahead of time.
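The read-side behaviour of `load_cached_source_schemas` can be sketched over entries that have already been scanned from the store. That behaviour is the config gate, the read-time TTL filter, and a once-per-process log via `OnceLock<()>`. Opening the store read-only is out of scope here. `CachedEntry`, `SchemaCacheConfig` and `filter_cached_schemas` are hypothetical. Whether an entry exactly at the TTL boundary counts as fresh is an assumption of this sketch.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;
use std::time::{Duration, SystemTime};

/// Hypothetical cached entry: column names plus the time they were written.
pub struct CachedEntry {
    pub columns: Vec<String>,
    pub cached_at: SystemTime,
}

/// Hypothetical mirror of the `[cache.schemas]` settings the loader reads.
pub struct SchemaCacheConfig {
    pub enabled: bool,
    pub ttl: Duration,
}

/// Set once per process so the "cache hit" info line prints at most once.
static HIT_LOGGED: OnceLock<()> = OnceLock::new();

/// Applies the config gate and the read-time TTL filter.
pub fn filter_cached_schemas(
    config: &SchemaCacheConfig,
    entries: HashMap<String, CachedEntry>,
    now: SystemTime,
) -> HashMap<String, Vec<String>> {
    if !config.enabled {
        return HashMap::new();
    }
    let fresh: HashMap<String, Vec<String>> = entries
        .into_iter()
        .filter(|(_, entry)| match now.duration_since(entry.cached_at) {
            Ok(age) => age <= config.ttl,
            // A timestamp in the future (clock skew) is treated as fresh.
            Err(_) => true,
        })
        .map(|(key, entry)| (key, entry.columns))
        .collect();
    // `OnceLock::set` succeeds only for the first caller in this process.
    if !fresh.is_empty() && HIT_LOGGED.set(()).is_ok() {
        eprintln!("info: loaded {} cached source schema(s)", fresh.len());
    }
    fresh
}
```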
LSP throttle explanation
CLI processes are short-lived, so a `OnceLock<()>` gives the "once per invocation" behaviour the spec asks for. The LSP lives for the whole IDE session and recompiles on a 300 ms debounced `did_change`, so a per-invocation log would fire per keystroke. The throttle module instead maintains a `HashSet<String>` of keys the server has already logged for. Today's key is the project's `models_dir`; PR 2 appends a cache-version counter so the log re-fires when the cache actually changes.
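The throttle described above reduces to a `Mutex<HashSet<String>>` where `HashSet::insert` doubles as the "first time?" check. This is a sketch, not the `SchemaCacheThrottle` source. `LogThrottle`, `should_log` and `throttle_key` are hypothetical names, and the `@v<N>` suffix format for the future cache-version key is an assumption.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Per-session log throttle (hypothetical mirror of `SchemaCacheThrottle`).
#[derive(Default)]
pub struct LogThrottle {
    seen: Mutex<HashSet<String>>,
}

impl LogThrottle {
    /// Returns true the first time a key is seen in this session, false after.
    pub fn should_log(&self, key: &str) -> bool {
        // Recover from a poisoned lock rather than panicking the server.
        let mut seen = self.seen.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        seen.insert(key.to_owned())
    }
}

/// Builds the throttle key: `models_dir` alone today, with an assumed
/// cache-version suffix appended once the cache can change.
pub fn throttle_key(models_dir: &str, cache_version: Option<u64>) -> String {
    match cache_version {
        Some(version) => format!("{models_dir}@v{version}"),
        None => models_dir.to_owned(),
    }
}
```

A bumped version produces a new key, so the log fires again exactly once per cache change rather than once per recompile.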
Known follow-up (to fix in PR 2)
CLI default `state_path` = `.rocky-state.redb` in the CWD (from `main.rs:71`). LSP/server convention = `models_dir.join(".rocky-state.redb")`, matching the existing `api.rs` / `dashboard.rs`. No writes land today, so the divergence is invisible at the user level. PR 2's write tap should unify them so the "inlay-hint improvement" is observable end-to-end.
Test plan
- `cargo test --workspace`: full suite green; 14 new tests pass.
- `cargo clippy --workspace --all-targets -- -D warnings`: clean.
- `cargo fmt --all --check`: clean.
- `just codegen`: no schema/binding diff (no `*Output` struct changes, as scoped).
- `just regen-fixtures`: byte-stable (no run/compile output changes).
- `uv run pytest` in `integrations/dagster/`: 312 green.
- `npm run compile` in `editors/vscode/`: green.

New tests:
- `rocky-cli::source_schemas`:
  - a disabled config short-circuits;
  - a missing state file returns empty without side-effect file creation;
  - cached entries round-trip;
  - TTL expiry.
- `rocky-cli::commands::compile`:
  - a seeded `SCHEMA_CACHE` entry flows through `rocky compile` without `--with-seed`;
  - the `default_type_mapper` round-trip locks `StoredColumn -> TypedColumn` parity;
  - a cold cache with `rocky.toml` present doesn't create `state.redb`.
- `rocky-server::schema_cache_throttle`:
  - the first call returns `true` and a repeat key returns `false`;
  - distinct keys each log once;
  - the version-bump key shape re-fires (shape-lock for PR 2).
- `rocky-server::lsp`:
  - `[cache.schemas] enabled = false` fully disables the LSP path;
  - a zero-config project falls back to defaults;
  - a cold cache doesn't create a state file.
Follow-up PRs
- PR 2: `rocky run` write tap on `batch_describe_schema`. It fills the cache on every successful DESCRIBE; this PR's read path is a no-op until PR 2 lands.
- PR 3: `rocky discover --with-schemas` flag (CI warm-up path).
- PR 4: `rocky state clear-schema-cache` subcommand + CLI TTL override + `[cache.schemas] enabled = false` surfacing in `rocky doctor`.