feat(engine/rocky-cli): add rocky discover --with-schemas (PR 3) #231
Merged
hugocorreia90 merged 1 commit into main on Apr 22, 2026
… 2 wave-2 PR 3)

Explicit warm-up path for the Arc 7 wave 2 wave-2 schema cache (design doc §4.2 route B). When `--with-schemas` is set, `rocky discover` walks each unique `(catalog, schema)` pair reachable via the source's `BatchCheckAdapter`, issues one `batch_describe_schema` round-trip, and persists every returned table as a `SchemaCacheEntry` via `StateStore::write_schema_cache_entry` (the infra PR 1a shipped in #223). Downstream `rocky compile` / `rocky lsp` invocations pick up those entries via the read path wired in #228 (PR 1b), so leaf models that reference the cached source stop typechecking as `Unknown`.

What the flag does NOT do:
- Does not touch the `rocky run` write tap (PR 2, parallel agent).
- Does not add `clear-schema-cache` or a CLI TTL override (PR 4).
- Does not alter the read path, the cache-entry format, or the `state_path` resolution.

Error handling (design doc §4.2 + trust positioning):
- `--with-schemas` + `[cache.schemas] enabled = false` in rocky.toml → hard error with an actionable message. The two signals are contradictory; silently skipping would leave the user guessing why `schemas_cached=0`. Erroring keeps the user's mental model aligned with what the cache actually does.
- Missing `source.catalog` → warn once, skip writes (cannot key entries without a catalog).
- `BatchCheckAdapter` not registered for the source adapter (DuckDB today) → warn once, skip writes.
- Per-schema `batch_describe_schema` failure → warn and continue.
- Per-entry `write_schema_cache_entry` failure → warn and continue.

DiscoverOutput schema change:
- New `schemas_cached: usize` field (skipped from the wire format when zero, so fixtures captured without the flag stay byte-stable). Full codegen cascade run: `schemas/discover.schema.json`, `integrations/dagster/.../types_generated/discover_schema.py`, and `editors/vscode/src/types/generated/discover.ts` all regenerated.

Tests:
- 5 new unit tests (`discover::tests`) covering the dedup helper and the inner warm-up loop against a stub `BatchCheckAdapter`: writes one entry per table, continues past describe failures, handles the empty schema list, and lowercases key components. The DuckDB adapter has no `BatchCheckAdapter`, so playground integration tests hit the warn-and-skip path; the stub gives meaningful assertions for the happy path that would otherwise require a live warehouse.

Test plan:
- `cargo test -p rocky-cli -p rocky-core`: 1236 tests green.
- `cargo clippy --all-targets -- -D warnings`: clean on full workspace.
- `cargo fmt --all --check`: clean.
- `just codegen`: schemas regenerated, only the expected `schemas_cached` node added to `discover`.
- `uv run pytest` in `integrations/dagster/`: 370 tests green after Pydantic regeneration.
- `npm run compile` in `editors/vscode/`: clean.
- `scripts/regen_fixtures.sh`: fixtures byte-stable (field skipped when zero).
- Smoke-tested against the 00-playground-default POC:
  - discover without the flag returns the same JSON as before (no `schemas_cached` field);
  - discover with `--with-schemas` warns about the missing DuckDB `BatchCheckAdapter` and returns the same JSON (no entries written, `schemas_cached=0` elided);
  - discover with `enabled = false` + `--with-schemas` errors cleanly.
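The warm-up loop described in this commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the `DescribeSchema` trait is a hypothetical stand-in for the describe half of `BatchCheckAdapter`, the `HashMap` stands in for the redb-backed state store, and the `catalog.schema.table` key format and the choice to count one `schemas_cached` per written entry are assumptions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the describe half of `BatchCheckAdapter`.
trait DescribeSchema {
    /// Returns (table name, column names) for every table in the schema.
    fn batch_describe_schema(
        &self,
        catalog: &str,
        schema: &str,
    ) -> Result<Vec<(String, Vec<String>)>, String>;
}

/// Dedup helper: collapse (catalog, schema, table) rows into unique,
/// lowercased (catalog, schema) pairs.
fn unique_schema_pairs(tables: &[(&str, &str, &str)]) -> BTreeSet<(String, String)> {
    tables
        .iter()
        .map(|(catalog, schema, _table)| (catalog.to_lowercase(), schema.to_lowercase()))
        .collect()
}

/// One describe round-trip per pair. A failed describe is warned about and
/// skipped so one bad schema does not abort the whole warm-up.
fn warm_up(
    adapter: &dyn DescribeSchema,
    pairs: &BTreeSet<(String, String)>,
    cache: &mut HashMap<String, Vec<String>>,
) -> usize {
    let mut schemas_cached = 0;
    for (catalog, schema) in pairs {
        match adapter.batch_describe_schema(catalog, schema) {
            Ok(tables) => {
                for (table, columns) in tables {
                    // Assumed key format: lowercased `catalog.schema.table`.
                    let table = table.to_lowercase();
                    cache.insert(format!("{catalog}.{schema}.{table}"), columns);
                    schemas_cached += 1;
                }
            }
            Err(err) => eprintln!("warn: describe of {catalog}.{schema} failed: {err}"),
        }
    }
    schemas_cached
}

/// Stub adapter in the spirit of the unit tests: one schema always fails.
struct StubAdapter;

impl DescribeSchema for StubAdapter {
    fn batch_describe_schema(
        &self,
        _catalog: &str,
        schema: &str,
    ) -> Result<Vec<(String, Vec<String>)>, String> {
        if schema == "broken" {
            return Err("permission denied".to_string());
        }
        Ok(vec![
            ("Orders".to_string(), vec!["id".to_string()]),
            ("users".to_string(), vec!["id".to_string()]),
        ])
    }
}
```

With the stub, three source rows spread over two distinct schemas produce two describe calls; the failing schema contributes no entries and the other contributes one entry per table.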
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
…(Arc 7 PR 4) (#232)

* feat(engine/rocky-cli): add rocky state clear-schema-cache + --cache-ttl override (PR 4)

  Arc 7 wave 2 wave-2 PR 4: user-facing control surface for the schema cache.

  - `rocky state clear-schema-cache [--dry-run]`: explicit flush of the SCHEMA_CACHE redb table. A missing state store is treated as a no-op (CI-friendly: safe to run on an ephemeral runner before a build).
  - `--cache-ttl <seconds>` global CLI flag: overrides `[cache.schemas] ttl_seconds` for this invocation. Precedence: `--cache-ttl` > `rocky.toml` > built-in default (86400s / 24h). Applies to CLI read paths; the `rocky lsp` / `rocky serve` daemons keep the config-derived TTL.
  - `rocky state` becomes a subcommand group; bare `rocky state` is preserved via `Option<StateAction>` defaulting to `Show`.

  Completes the Arc 7 wave 2 wave-2 sequence (PR 1a #223 infra, PR 1b #228 reads, PR 2 #230 write tap, PR 3 #231 discover warm-up, PR 4 user controls).

* docs(engine/rocky-cli): strip task references from ClearSchemaCacheOutput doc

  The doc comment on the output struct flows into schemas/*.schema.json, dagster Pydantic docstrings, and vscode TypeScript jsdoc. Keep the behavioral description; drop the 'Arc 7 wave 2 wave-2 PR 4 / PR 2 / PR 1b' references per monorepo CLAUDE.md (task refs in code rot over time).

* docs(engine): add CHANGELOG entries for rocky state clear-schema-cache + --cache-ttl

* fix(integrations/dagster): sort ClearSchemaCacheOutput in types.py import block

  Ruff I001 was tripping on the import block order in types.py; the original PR 4 agent inserted ClearSchemaCacheOutput between SourceOutput and StateOutput instead of between CiOutput and ColumnLineageOutput.
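The TTL precedence and the bare-`rocky state` fallback described in this commit can be sketched as below. The function names `resolve_cache_ttl` and `resolve_state_action` and the shape of the `StateAction` enum are hypothetical; only the precedence order, the 86400-second default, and the `Option<StateAction>` defaulting to `Show` come from the commit message.

```rust
/// Built-in default TTL: 86400 s (24 h), as stated in the commit message.
const DEFAULT_TTL_SECONDS: u64 = 86_400;

/// Precedence: `--cache-ttl` > `rocky.toml` > built-in default.
/// Both inputs are `None` when the corresponding source did not set a value.
fn resolve_cache_ttl(cli_override: Option<u64>, config_ttl: Option<u64>) -> u64 {
    cli_override.or(config_ttl).unwrap_or(DEFAULT_TTL_SECONDS)
}

/// Hypothetical shape of the `rocky state` subcommand group.
#[derive(Debug, PartialEq)]
enum StateAction {
    Show,
    ClearSchemaCache { dry_run: bool },
}

/// A bare `rocky state` parses to `None` and keeps the old "show" behaviour.
fn resolve_state_action(parsed: Option<StateAction>) -> StateAction {
    parsed.unwrap_or(StateAction::Show)
}
```

Keeping the subcommand optional means existing scripts that call bare `rocky state` continue to work after it becomes a group.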
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

  Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0 / dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

  Engine headlines (12 PRs):
  - Arc 7 wave 2 wave-2 complete: cached DESCRIBE end-to-end (#223 infra, #228 reads, #230 write tap, #231 discover warm-up, #232 state controls + --cache-ttl override)
  - Arc 2 wave 3 complete: bytes_scanned / bytes_written on MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake deferred doc, #222 docstring cascade). Real $ on `rocky cost` for BQ + Databricks
  - FR-005 Unity Catalog workspace-binding reconcile (#226)
  - FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
  - Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

  Dagster headlines (4 PRs):
  - FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor on RockyResource startup (#224)
  - FR-003 RockyResource.state_health() (#227) + FR follow-up threading doctor(check=state_rw) for sub-second probes (#229)
  - RockyResource.cost() wiring + fixture (#218)

  VS Code: regenerated TS bindings for engine 1.14.0 type additions. No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

  36 fixtures picked up the new engine version string in their top-level "version" field. No schema changes, just the version bump.
Summary
- Adds `--with-schemas` to `rocky discover`. When set, for every unique `(catalog, schema)` pair reachable via the source's `BatchCheckAdapter`, the command issues one `batch_describe_schema` round-trip and persists every returned table as a `SchemaCacheEntry` in `state.redb::schema_cache`. Downstream `rocky compile` / `rocky lsp` then typecheck leaf models against real warehouse types (via the read path wired in #228 / PR 1b) instead of falling back to `Unknown`.
- `rocky run` write tap (PR 2) and `rocky state clear-schema-cache` (PR 4) ship separately. Plan at `~/Developer/rocky-plans/plans/rocky-arc7-wave2-wave2-design.md` §4.2 (route B) and §6 (PR breakdown).
- `DiscoverOutput` gains `schemas_cached: usize` (skipped from the wire format when zero, so existing fixtures stay byte-stable). Full codegen cascade ran: Pydantic + TypeScript bindings regenerated alongside the schema.

Config-disabled semantics (decision point from the task brief)

Hard error, not warn+skip. When `--with-schemas` is passed alongside `[cache.schemas] enabled = false` in `rocky.toml`, Rocky exits with an actionable error message.

The two signals are contradictory: the user explicitly asked for cache warm-up and explicitly disabled the cache. Silently no-op'ing would leave them guessing why `schemas_cached=0`. Erroring keeps the user's mental model aligned with what the cache does and is the trust-aligned call for Rocky's positioning.

Other error paths (all warn + continue)

- Missing `source.catalog` → warn once, skip (cannot key entries without a catalog).
- No `BatchCheckAdapter` registered for the source adapter (DuckDB today) → warn once, skip.
- Per-schema `batch_describe_schema` failure → warn and continue (one bad source doesn't abort warm-up).
- Per-entry `write_schema_cache_entry` failure → warn and continue.

Out of scope (do not touch)

- `rocky run` write tap (PR 2).
- `rocky state clear-schema-cache` and CLI TTL overrides (PR 4).

Test plan

- `cargo test -p rocky-cli -p rocky-core`: 1236 tests green (5 new `discover::tests` covering dedup + happy / failure / empty / lowercase paths via a stub `BatchCheckAdapter`).
- `cargo clippy --all-targets -- -D warnings`: clean on full workspace.
- `cargo fmt --all --check`: clean.
- `just codegen`: only the expected `schemas_cached` node added to `schemas/discover.schema.json`, Pydantic + TypeScript bindings regenerated.
- `uv run pytest` in `integrations/dagster/`: 370 tests green after Pydantic regeneration.
- `npm run compile` in `editors/vscode/`: clean.
- `scripts/regen_fixtures.sh`: fixtures byte-stable (field skipped when zero).
- `rocky -c rocky.toml discover` → same JSON as before, no `schemas_cached` field.
- `rocky -c rocky.toml discover --with-schemas` → warns about missing DuckDB `BatchCheckAdapter`, returns same JSON (no entries written, `schemas_cached=0` elided).
- `rocky -c <disabled>.toml discover --with-schemas` → errors cleanly with the config-conflict message.
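The config-disabled rule and the two "warn once, skip" paths above amount to a pre-flight decision taken before any describe call. A minimal sketch, assuming a hypothetical `plan_warm_up` function and `WarmUpPlan` enum; the error string is illustrative and is not the message Rocky actually prints, and the order of the checks is an assumption.

```rust
/// Hypothetical outcome of the pre-flight check for `discover --with-schemas`.
#[derive(Debug, PartialEq)]
enum WarmUpPlan {
    /// Flag not passed: discover behaves exactly as before.
    NotRequested,
    /// Warn once and skip all writes; the reason goes into the warning.
    Skip(&'static str),
    /// Safe to run the warm-up loop against this catalog.
    Proceed { catalog: String },
}

/// Only the contradictory flag + disabled-config combination is a hard error.
fn plan_warm_up(
    with_schemas: bool,
    cache_enabled: bool,
    catalog: Option<&str>,
    has_batch_adapter: bool,
) -> Result<WarmUpPlan, String> {
    if !with_schemas {
        return Ok(WarmUpPlan::NotRequested);
    }
    if !cache_enabled {
        // Illustrative wording only.
        return Err(
            "--with-schemas conflicts with [cache.schemas] enabled = false".to_string(),
        );
    }
    let catalog = match catalog {
        Some(name) => name,
        // Entries cannot be keyed without a catalog.
        None => return Ok(WarmUpPlan::Skip("source.catalog is not set")),
    };
    if !has_batch_adapter {
        return Ok(WarmUpPlan::Skip("source adapter has no BatchCheckAdapter"));
    }
    Ok(WarmUpPlan::Proceed {
        catalog: catalog.to_lowercase(),
    })
}
```

Returning `Skip` as a successful outcome rather than an error mirrors the intent that these cases leave the discover output unchanged, while the config conflict alone stops the command.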