
feat(engine): rocky run schema-cache write tap (Arc 7 wave 2 wave-2 PR 2)#230

Merged
hugocorreia90 merged 1 commit into main from feat/arc7-wave2-wave2-pr2-write-tap
Apr 22, 2026


@hugocorreia90

Summary

  • Write tap: every successful BatchCheckAdapter::batch_describe_schema result in rocky run is persisted to the SCHEMA_CACHE redb table shipped in #223 (PR 1a, schema cache infra) and read by the compiler wiring from #228 (PR 1b, typecheck callsite wiring). Downstream compiles (and the LSP's per-keystroke typecheck) resolve leaf FROM <schema>.<table> refs against real warehouse types without a warehouse round-trip.
  • Config gate: [cache.schemas] enabled (default true). The helper returns (), so cache-write failures log warn! and never fail the run; the best-effort contract is enforced at the type level.
  • Dedup: SchemaCacheWriteTap::seen (HashSet over schema_cache_key) suppresses repeat writes for the same (catalog, schema, table) triple within one run. Databricks already dedupes at the (catalog, schema) pair level; the tighter local guarantee keeps the invariant available to PR 3's rocky discover --with-schemas.

Writes the full schema returned by batch_describe_schema — not just tables in the current --filter — because the DESCRIBE cost is already paid. Writes both source and target schemas (distinct keys, no collision).
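The sketch below shows the per-run dedup and why source and target writes cannot collide. It is a minimal illustration that assumes the cache key is a plain (catalog, schema, table) tuple. Only SchemaCacheWriteTap and its seen field come from this PR. The tuple key and the first_sighting method are hypothetical, and this PR does not show the real schema_cache_key format.

```rust
use std::collections::HashSet;

// Hypothetical key shape standing in for whatever schema_cache_key produces.
type CacheKey = (String, String, String);

// Per-run tap: only the type name and the `seen` field come from the PR.
#[derive(Default)]
pub struct SchemaCacheWriteTap {
    seen: HashSet<CacheKey>,
}

impl SchemaCacheWriteTap {
    /// Hypothetical helper: true the first time a (catalog, schema, table)
    /// triple is offered in this run, false for every repeat.
    /// `HashSet::insert` already reports whether the value was new,
    /// so no separate `contains` lookup is needed.
    pub fn first_sighting(&mut self, catalog: &str, schema: &str, table: &str) -> bool {
        self.seen
            .insert((catalog.to_owned(), schema.to_owned(), table.to_owned()))
    }
}
```

Catalog and schema are both part of the key. A source read from one schema and a target write to another schema (or another catalog) for the same table name therefore land in separate entries.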

state_path decision: deferred

CLI default state path is .rocky-state.redb in CWD (engine/rocky/src/main.rs:72). LSP default is models_dir.join(".rocky-state.redb") (engine/crates/rocky-server/src/lsp.rs:445, state.rs:116). Without unification, a rocky run from the project root and a rocky lsp for the same project write to different state files, and inlay-hint cache hits never materialise end-to-end.

Not fixed in this PR. Any existing user has a CWD-relative .rocky-state.redb with watermarks, run history, branch records, and partition state. Silently moving writes to models/.rocky-state.redb orphans that file. A clean fix needs a migration story (detect + move, or read-the-old-path-fallback), which is out of scope for the cache-write tap. PR 1b's commit message already flagged this as a PR-2 follow-up — raising a follow-up issue to land it on its own merits.

Out of scope (follow-up PRs)

  • rocky discover --with-schemas (PR 3 — separate agent in parallel).
  • rocky state clear-schema-cache + --cache-ttl CLI override (PR 4).
  • Per-table warehouse.describe_table(...) fallback inside process_table stays untapped. That path fires only when (a) the warehouse has no BatchCheckAdapter (DuckDB — not a wave-2 cache target; no warehouse schemas to cache) or (b) the batch call failed (rare; adding a lock-held write inside a concurrent task-spawn contends with the rest of the run). Open as a follow-up if demand appears.

Test plan

  • cargo test -p rocky-cli: 220 prior tests + 7 new tests green. PR 1b already covers disabled config, missing state, cached entries, and TTL expiry; this PR adds writes-when-enabled, writes-nothing-when-disabled, dedup-within-run, writes-all-tables, distinct-catalogs, round-trip-through-reader, and signature-does-not-propagate-errors.
  • cargo test --workspace — full suite green
  • cargo clippy -p rocky-cli --all-targets -- -D warnings clean
  • cargo fmt --all --check clean
  • Playground smoke (rocky run against examples/playground/pocs/00-foundations/00-playground-default/) — stays green; DuckDB has no BatchCheckAdapter so the tap branch is never entered
  • No CLI output-struct changes, so running just codegen is a no-op and just regen-fixtures output is byte-stable

Design doc

~/Developer/rocky-plans/plans/rocky-arc7-wave2-wave2-design.md (§4.2 write path, §4.5 JSON serialization, §4.6 ColumnInfo conversion).

Infra dependencies: #223 (PR 1a — schema cache types + state CRUD), #228 (PR 1b — read-path wiring). This PR closes the write end of the Arc 7 wave 2 wave-2 cache loop.

…ave 2 wave-2 PR 2)

Tap every successful `BatchCheckAdapter::batch_describe_schema` result in
`rocky run` into the persisted schema cache shipped in #223 and wired for
reads in #228. Downstream compiles (and the LSP's per-keystroke typecheck)
can now resolve leaf `FROM <schema>.<table>` references against real
warehouse types without a round-trip on every call.

Scope:
- New `rocky-cli::schema_cache_writer` module with
  `persist_batch_describe(store, config, tap, catalog, schema, cols_by_table)`.
  One entry per returned table — the DESCRIBE cost is already paid, so
  sibling tables in the same source schema join the cache too.
- Gate on `[cache.schemas] enabled` (default true, per design doc §4.3).
  Cache-write failures log `warn!` and never fail the run; the helper
  returns `()` so the best-effort contract is enforced at the type level.
- Dedup within one run via `SchemaCacheWriteTap::seen` (a `HashSet` over
  `schema_cache_key`). Databricks already deduplicates at the
  `(catalog, schema)` pair level, but the tighter guarantee keeps the
  invariant local for PR 3's `rocky discover --with-schemas`.
- Writes the map returned by `batch_describe_schema` for both source and
  target schema directions — distinct keys, free signal for models that
  read from a sibling's target.
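Here is a minimal, dependency-free sketch of the helper's control flow as the scope list describes it: config gate first, every returned table written, repeat keys skipped, and write errors logged and swallowed behind a `()` return. The argument order follows `persist_batch_describe(store, config, tap, catalog, schema, cols_by_table)`. All types are hypothetical stand-ins: `SchemaStore` for the redb-backed state store, `SchemaCacheConfig` for `[cache.schemas]`, `Column` for the `ColumnInfo`-derived entry, and a dotted string for `schema_cache_key`. The sketch uses `eprintln!` where the real code uses `warn!`.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for one DESCRIBE column.
#[derive(Clone, Debug, PartialEq)]
pub struct Column {
    pub name: String,
    pub data_type: String,
}

/// Hypothetical mirror of `[cache.schemas]`; only `enabled` is modelled.
pub struct SchemaCacheConfig {
    pub enabled: bool,
}

/// Hypothetical storage seam in place of the SCHEMA_CACHE redb table.
pub trait SchemaStore {
    fn put(&mut self, key: &str, columns: &[Column]) -> Result<(), String>;
}

#[derive(Default)]
pub struct SchemaCacheWriteTap {
    seen: HashSet<String>,
}

/// Best-effort write of one batch_describe_schema result. Returns `()`,
/// so a failed write can only be logged, never propagated to the run.
pub fn persist_batch_describe(
    store: &mut dyn SchemaStore,
    config: &SchemaCacheConfig,
    tap: &mut SchemaCacheWriteTap,
    catalog: &str,
    schema: &str,
    cols_by_table: &BTreeMap<String, Vec<Column>>,
) {
    // Config gate: short-circuit before touching the store at all.
    if !config.enabled {
        return;
    }
    // Every returned table is written, not only the ones the run selected.
    for (table, columns) in cols_by_table {
        // Hypothetical key format; the real schema_cache_key may differ.
        let key = format!("{catalog}.{schema}.{table}");
        // Dedup within the run: skip keys this tap has already seen.
        if !tap.seen.insert(key.clone()) {
            continue;
        }
        if let Err(err) = store.put(&key, columns) {
            eprintln!("warning: schema cache write failed for {key}: {err}");
        }
    }
}
```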

Deliberate non-scope:
- Per-table `warehouse.describe_table(...)` fallback inside `process_table`
  stays untapped for now. That path only fires when (a) the warehouse has
  no `BatchCheckAdapter` (DuckDB — not a wave-2 cache target, no warehouse
  schemas to cache) or (b) the batch call failed (rare; adding a lock-held
  write inside a concurrent task-spawn contends with the rest of the run
  for dubious cache benefit). Can be a follow-up if demand appears.
- `rocky discover --with-schemas` (PR 3, parallel fan-out).
- `rocky state clear-schema-cache` and `--cache-ttl` override (PR 4).
- The `state_path` CLI/LSP divergence. CLI default is
  `.rocky-state.redb` (CWD); LSP default is `models_dir.join(...)`. Fixing
  this requires a migration story for existing users' CWD state files
  with watermarks and run history; scoped out of PR 2 and tracked as a
  follow-up. PR 1b's commit message already flagged it.

Tests (7 new):
- `writes_entry_when_enabled` — happy-path write + readback.
- `writes_nothing_when_disabled` — config gate short-circuits before redb.
- `dedups_repeated_key_within_one_run` — second call with same key is
  suppressed (evidence: differing column list has no effect).
- `writes_all_tables_in_batch_not_just_selected` — full-schema write.
- `distinct_catalogs_do_not_collide` — key composition includes catalog.
- `round_trip_through_reader` — writer + PR 1b reader contract stays
  consistent; key shape and column conversion survive the round-trip.
- `signature_does_not_propagate_errors` — compile-time pin on the `()`
  return type; the best-effort contract can't be accidentally changed
  without a test failure.
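One way to write a compile-time pin like `signature_does_not_propagate_errors` is to coerce the function item to a `fn` pointer type whose return type is `()`. This is a sketch under assumptions. The two-argument stand-in signature is hypothetical (the real helper takes six arguments), and the PR does not show how its own test is written. Inside the crate the pin function would normally carry `#[test]`. Here it is a plain function so the sketch stays self-contained.

```rust
/// Hypothetical two-argument stand-in for persist_batch_describe.
/// What matters is the return type: `()` rather than `Result<(), E>`.
pub fn persist_batch_describe(_catalog: &str, _schema: &str) {}

/// Compile-time pin: `fn(&str, &str)` has a unit return type, so this
/// coercion stops compiling if the helper ever starts returning a
/// `Result`, and the best-effort contract cannot drift silently.
#[allow(dead_code)]
fn signature_does_not_propagate_errors() {
    let _pinned: fn(&str, &str) = persist_batch_describe;
}
```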

Verification:
- `cargo test -p rocky-cli` — 220 prior + 7 new = 227 tests green.
- `cargo test --workspace` — full suite green.
- `cargo clippy -p rocky-cli --all-targets -- -D warnings` clean.
- `cargo fmt --all --check` clean.
- Playground smoke: `rocky run` against the default POC stays green
  (DuckDB has no `BatchCheckAdapter`, so the tap branch is never entered).

Design doc: `~/Developer/rocky-plans/plans/rocky-arc7-wave2-wave2-design.md`
(§4.2 write path, §4.5 JSON serialization, §4.6 ColumnInfo conversion).
Infra dependencies: #223 (PR 1a — schema cache types + state CRUD) and
#228 (PR 1b — read-path wiring). PR 2 closes the write end.
hugocorreia90 merged commit 9545e4f into main Apr 22, 2026
12 checks passed
hugocorreia90 deleted the feat/arc7-wave2-wave2-pr2-write-tap branch April 22, 2026 17:51
hugocorreia90 added a commit that referenced this pull request Apr 22, 2026
…(Arc 7 PR 4) (#232)

* feat(engine/rocky-cli): add rocky state clear-schema-cache + --cache-ttl override (PR 4)

Arc 7 wave 2 wave-2 PR 4 — user-facing control surface for the schema cache:

- `rocky state clear-schema-cache [--dry-run]` — explicit flush of the
  SCHEMA_CACHE redb table. Missing state store treated as no-op (CI-friendly:
  safe to run on an ephemeral runner before a build).
- `--cache-ttl <seconds>` global CLI flag — overrides `[cache.schemas]
  ttl_seconds` for this invocation. Precedence: `--cache-ttl` > `rocky.toml`
  > built-in default (86400s / 24h). Applies to CLI read paths; the
  `rocky lsp` / `rocky serve` daemons keep the config-derived TTL.
- `rocky state` becomes a subcommand group; bare `rocky state` preserved
  via `Option<StateAction>` defaulting to `Show`.
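The TTL precedence and the bare-`rocky state` default can each be expressed as one `Option` chain. This is a minimal sketch. `StateAction` and its `Show` variant come from the commit message. The `ClearSchemaCache { dry_run }` variant shape, the constant name, and both function names are hypothetical.

```rust
/// Built-in default from the commit message: 86400 s (24 h).
const DEFAULT_SCHEMA_CACHE_TTL_SECS: u64 = 86_400;

/// Precedence: `--cache-ttl` flag, then `[cache.schemas] ttl_seconds`,
/// then the built-in default. Daemons would pass `None` for the flag.
pub fn effective_ttl(cli_flag: Option<u64>, config_ttl: Option<u64>) -> u64 {
    cli_flag.or(config_ttl).unwrap_or(DEFAULT_SCHEMA_CACHE_TTL_SECS)
}

#[derive(Debug, PartialEq)]
pub enum StateAction {
    Show,
    ClearSchemaCache { dry_run: bool },
}

/// A bare `rocky state` parses to `None` and falls back to `Show`.
pub fn resolve_state_action(parsed: Option<StateAction>) -> StateAction {
    parsed.unwrap_or(StateAction::Show)
}
```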

Completes the Arc 7 wave 2 wave-2 sequence (PR 1a #223 infra, PR 1b #228
reads, PR 2 #230 write tap, PR 3 #231 discover warm-up, PR 4 user controls).

* docs(engine/rocky-cli): strip task references from ClearSchemaCacheOutput doc

The doc comment on the output struct flows into schemas/*.schema.json,
dagster Pydantic docstrings, and vscode TypeScript jsdoc. Keep the
behavioral description; drop the 'Arc 7 wave 2 wave-2 PR 4 / PR 2 / PR 1b'
references per monorepo CLAUDE.md (task refs in code rot over time).

* docs(engine): add CHANGELOG entries for rocky state clear-schema-cache + --cache-ttl

* fix(integrations/dagster): sort ClearSchemaCacheOutput in types.py import block

Ruff I001 was tripping on the import block order in types.py; the
original PR 4 agent inserted ClearSchemaCacheOutput between SourceOutput
and StateOutput instead of between CiOutput and ColumnLineageOutput.
hugocorreia90 added a commit that referenced this pull request Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0
/ dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

Engine headlines (12 PRs):
- Arc 7 wave 2 wave-2 complete — cached DESCRIBE end-to-end
  (#223 infra, #228 reads, #230 write tap, #231 discover warm-up,
  #232 state controls + --cache-ttl override)
- Arc 2 wave 3 complete — bytes_scanned / bytes_written on
  MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake
  deferred doc, #222 docstring cascade). Real $ on rocky cost for
  BQ + Databricks
- FR-005 Unity Catalog workspace-binding reconcile (#226)
- FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
- Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

Dagster headlines (4 PRs):
- FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor
  on RockyResource startup (#224)
- FR-003 RockyResource.state_health() (#227) + FR follow-up threading
  doctor(check=state_rw) for sub-second probes (#229)
- RockyResource.cost() wiring + fixture (#218)

VS Code: regenerated TS bindings for engine 1.14.0 type additions.
No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

36 fixtures picked up the new engine version string in their top-level
"version" field. No schema changes — just the version bump.
hugocorreia90 added a commit that referenced this pull request Apr 23, 2026
* feat(engine): unify CLI + LSP state_path resolution

The CLI default state_path (`.rocky-state.redb` in CWD) and the LSP /
server default (`<models>/.rocky-state.redb`) diverged after Arc 7 wave
2 wave-2. The schema-cache write tap (PR #230) persisted entries to
CWD, while the LSP read-path (PR #228) looked next to the models
directory — so inlay-hint cache-hits never fired end-to-end for any
project where the two locations weren't already co-located.

Add `rocky_core::state::resolve_state_path(explicit, models_dir)` as
the single resolution point for both halves of the binary:

1. Explicit `--state-path` wins verbatim.
2. `<models>/.rocky-state.redb` exists — use it (canonical default).
3. CWD `.rocky-state.redb` exists — use it with a one-time
   deprecation warning on stderr (legacy fallback; protects existing
   watermarks, branches, partitions, run history).
4. Both exist — CWD wins (preserves legacy state) with a louder
   warning asking the user to reconcile. Merge is lossy.
5. Neither exists (fresh project) — default to
   `<models>/.rocky-state.redb`, no warning.
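The five cases can be written as a decision table. The real `resolve_state_path(explicit, models_dir)` probes the filesystem itself. This hypothetical `resolve_state_path_sketch` takes existence as booleans so that every branch is testable without I/O, and the `StateWarning` enum is also illustrative. The list checks case 2 before case 4, but case 4 says CWD wins when both files exist. The sketch assumes case 4 takes precedence.

```rust
use std::path::{Path, PathBuf};

const STATE_FILE: &str = ".rocky-state.redb";

/// Which warning, if any, the resolver would print on stderr.
#[derive(Debug, PartialEq)]
pub enum StateWarning {
    Silent,
    Deprecation, // case 3: legacy CWD fallback
    Reconcile,   // case 4: both files exist
}

pub fn resolve_state_path_sketch(
    explicit: Option<&Path>,
    models_dir: &Path,
    models_state_exists: bool,
    cwd_state_exists: bool,
) -> (PathBuf, StateWarning) {
    // 1. An explicit --state-path wins verbatim.
    if let Some(path) = explicit {
        return (path.to_path_buf(), StateWarning::Silent);
    }
    let models_state = models_dir.join(STATE_FILE);
    let cwd_state = PathBuf::from(STATE_FILE);
    match (models_state_exists, cwd_state_exists) {
        // 4. Both exist: CWD wins so legacy state is never shadowed.
        (true, true) => (cwd_state, StateWarning::Reconcile),
        // 2. Canonical default.
        (true, false) => (models_state, StateWarning::Silent),
        // 3. Legacy fallback, with a one-time deprecation warning.
        (false, true) => (cwd_state, StateWarning::Deprecation),
        // 5. Fresh project.
        (false, false) => (models_state, StateWarning::Silent),
    }
}
```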

Wire the helper through main.rs (single top-level resolution),
commands/watch.rs, and all four rocky-server callsites (state.rs,
lsp.rs, api.rs, dashboard.rs). Passing `--state-path` explicitly
remains a hard override, so the Dagster integration — which always
passes an explicit path — is unchanged.

Five resolver unit tests cover every branch (explicit / models-dir /
CWD-fallback / both-exist / fresh). Smoke-tested end-to-end against
the release binary: the warning lands on stderr; the models-dir case
is silent; the both-exist case emits the louder warning.

* fix(engine/rocky-core): fall back to CWD when models dir is missing

Case 5 of `resolve_state_path` returned `<models>/.rocky-state.redb`
unconditionally on fresh projects. For replication-only and
quality-only pipelines (and several POCs, e.g.
`02-performance/06-schema-drift-recover`) there is no `models/`
directory at all, so the next `rocky run` failed trying to open
the state lock file at a path whose parent doesn't exist:

    Error: failed to open state store at models/.rocky-state.redb
      i/o error opening state lock file at models/.rocky-state.redb.lock:
      No such file or directory (os error 2)

That crash surfaced as a `just regen-fixtures` normalizer failure
in the codegen-drift CI workflow for PR #238 — the `drift/run_clean`
capture emitted empty stdout and the JSON normalizer then errored on
`Expecting value: line 1 column 1 (char 0)`.

Refine the resolver to check `models_dir.is_dir()`:

- Case 5 (fresh project): default to `<models>/.rocky-state.redb`
  only when `models_dir` exists; otherwise fall back to CWD.
- Case 3 (legacy CWD state): emit the migration-nudge warning only
  when `models_dir` is a real directory. Without one there's
  nowhere to move the file to, so the warning would be noise.
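The `is_dir()` refinement only changes cases 3 and 5. This minimal sketch isolates those two checks. Both function names are hypothetical, and the real code folds this logic into `resolve_state_path`.

```rust
use std::path::{Path, PathBuf};

const STATE_FILE: &str = ".rocky-state.redb";

/// Case 5 refined: default into the models dir only when it is a real
/// directory; otherwise fall back to CWD so that opening the state lock
/// file cannot fail on a missing parent directory.
pub fn fresh_project_default(models_dir: &Path) -> PathBuf {
    if models_dir.is_dir() {
        models_dir.join(STATE_FILE)
    } else {
        PathBuf::from(STATE_FILE)
    }
}

/// Case 3 refined: nudge users to migrate CWD state only when there is
/// a models dir to move it into.
pub fn should_emit_migration_nudge(models_dir: &Path) -> bool {
    models_dir.is_dir()
}
```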

The LSP only attaches to projects with `.rocky` files (i.e.
projects that have a models dir by definition), so the no-models
fallback path has no CLI-vs-LSP divergence to unify.

Two new unit tests pin the behaviour — fresh-project-without-models
and CWD-state-without-models. Verified locally: the drift POC run
now emits valid JSON, and `just regen-fixtures` completes with
zero drift against the committed `fixtures_generated/`.
