feat(engine/rocky-cli): add rocky discover --with-schemas (PR 3) #231

Merged
hugocorreia90 merged 1 commit into main from feat/arc7-wave2-wave2-pr3-discover-with-schemas
Apr 22, 2026

Summary

Config-disabled semantics (decision point from the task brief)

Hard error, not warn+skip. When --with-schemas is passed alongside [cache.schemas] enabled = false in rocky.toml, Rocky exits with:

`--with-schemas` conflicts with `[cache.schemas] enabled = false` in ; remove the flag or set `enabled = true` to warm the cache

The two signals are contradictory: the user explicitly asked for cache warm-up and explicitly disabled the cache. Silently no-op'ing would leave them guessing why schemas_cached=0. Erroring keeps the user's mental model aligned with what the cache actually does, which is the trust-aligned call for Rocky's positioning.
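
A minimal sketch of the conflict check described above, for illustration only. `SchemaCacheConfig`, `check_with_schemas`, and the `config_path` parameter are hypothetical names, not the actual rocky-cli types; only the message wording is taken from this PR.

```rust
/// Hypothetical stand-in for the `[cache.schemas]` table in rocky.toml.
#[derive(Debug, Clone, Copy)]
pub struct SchemaCacheConfig {
    pub enabled: bool,
}

/// Returns an error when the flag and the config contradict each other.
/// Every other combination is allowed through unchanged.
pub fn check_with_schemas(
    with_schemas: bool,
    cfg: SchemaCacheConfig,
    config_path: &str, // assumption: the caller knows which config file was loaded
) -> Result<(), String> {
    if with_schemas && !cfg.enabled {
        return Err(format!(
            "`--with-schemas` conflicts with `[cache.schemas] enabled = false` in {config_path}; \
             remove the flag or set `enabled = true` to warm the cache"
        ));
    }
    Ok(())
}
```

Returning a `Result` rather than logging keeps the decision to exit in the caller, which fits the hard-error semantics: the command can stop before any discovery work starts.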

Other error paths (all warn + continue)

  • Missing source.catalog → warn once, skip (cannot key entries without a catalog).
  • Source adapter has no BatchCheckAdapter (DuckDB today) → warn once, skip.
  • Per-schema batch_describe_schema failure → warn and continue (one bad source doesn't abort warm-up).
  • Per-entry write_schema_cache_entry failure → warn and continue.
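
The warn-and-continue policy in the list above can be sketched as follows. This is an illustrative model under stated assumptions, not the code in this PR: the traits `DescribeSchema` and `CacheWriter`, the `WarmUpReport` struct, the stub types, and the `catalog.schema.table` key format are all hypothetical simplifications of `BatchCheckAdapter` and `StateStore`.

```rust
use std::collections::BTreeMap;

/// One described table: (table name, column names). Simplified, hypothetical shape.
pub type DescribedTable = (String, Vec<String>);

/// Hypothetical, simplified stand-in for the adapter capability.
pub trait DescribeSchema {
    fn batch_describe_schema(&self, catalog: &str, schema: &str)
        -> Result<Vec<DescribedTable>, String>;
}

/// Hypothetical stand-in for the write side of the state store.
pub trait CacheWriter {
    fn write_schema_cache_entry(&mut self, key: &str, columns: &[String]) -> Result<(), String>;
}

#[derive(Debug, Default, PartialEq)]
pub struct WarmUpReport {
    pub schemas_cached: usize,
    pub warnings: Vec<String>,
}

pub fn warm_schema_cache(
    adapter: Option<&dyn DescribeSchema>,
    store: &mut dyn CacheWriter,
    catalog: Option<&str>,
    schemas: &[String],
) -> WarmUpReport {
    let mut report = WarmUpReport::default();
    // Missing catalog: warn once and skip, entries cannot be keyed.
    let Some(catalog) = catalog else {
        report.warnings.push("source.catalog is not set; skipping warm-up".to_string());
        return report;
    };
    // No batch-describe capability: warn once and skip.
    let Some(adapter) = adapter else {
        report.warnings.push("adapter cannot batch-describe; skipping warm-up".to_string());
        return report;
    };
    for schema in schemas {
        // A failing schema produces a warning; the loop moves on to the next one.
        let tables = match adapter.batch_describe_schema(catalog, schema) {
            Ok(tables) => tables,
            Err(e) => {
                report.warnings.push(format!("describe {catalog}.{schema} failed: {e}"));
                continue;
            }
        };
        for (table, columns) in tables {
            // Assumption: keys are lowercased `catalog.schema.table` strings.
            let key = format!("{catalog}.{schema}.{table}").to_lowercase();
            match store.write_schema_cache_entry(&key, &columns) {
                Ok(()) => report.schemas_cached += 1,
                Err(e) => report.warnings.push(format!("write {key} failed: {e}")),
            }
        }
    }
    report
}

/// Stub adapter: the schema named "broken" fails, every other schema has one table.
pub struct StubAdapter;

impl DescribeSchema for StubAdapter {
    fn batch_describe_schema(&self, _catalog: &str, schema: &str)
        -> Result<Vec<DescribedTable>, String> {
        if schema == "broken" {
            return Err("permission denied".to_string());
        }
        Ok(vec![("Orders".to_string(), vec!["id".to_string(), "total".to_string()])])
    }
}

/// In-memory store used only for illustration.
#[derive(Default)]
pub struct MemoryStore {
    pub entries: BTreeMap<String, Vec<String>>,
}

impl CacheWriter for MemoryStore {
    fn write_schema_cache_entry(&mut self, key: &str, columns: &[String]) -> Result<(), String> {
        self.entries.insert(key.to_string(), columns.to_vec());
        Ok(())
    }
}
```

Calling `warm_schema_cache` with the stub and the schemas `Sales` and `broken` writes one entry and records one warning, which is the shape of behaviour the bullets describe: one bad source does not abort the warm-up.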

Out of scope (do not touch)

  • rocky run write tap (PR 2).
  • rocky state clear-schema-cache and CLI TTL overrides (PR 4).
  • Any changes to the read path or cache-entry format.

Test plan

  • cargo test -p rocky-cli -p rocky-core — 1236 tests green (5 new discover::tests covering dedup + happy / failure / empty / lowercase paths via a stub BatchCheckAdapter).
  • cargo clippy --all-targets -- -D warnings — clean on full workspace.
  • cargo fmt --all --check — clean.
  • just codegen — only the expected schemas_cached node added to schemas/discover.schema.json, Pydantic + TypeScript bindings regenerated.
  • uv run pytest in integrations/dagster/ — 370 tests green after Pydantic regeneration.
  • npm run compile in editors/vscode/ — clean.
  • scripts/regen_fixtures.sh — fixtures byte-stable (field skipped when zero).
  • Smoke-tested against the 00-playground-default POC:
    • rocky -c rocky.toml discover → same JSON as before, no schemas_cached field.
    • rocky -c rocky.toml discover --with-schemas → warns about missing DuckDB BatchCheckAdapter, returns same JSON (no entries written, schemas_cached=0 elided).
    • rocky -c <disabled>.toml discover --with-schemas → errors cleanly with the config-conflict message.

… 2 wave-2 PR 3)

Explicit warm-up path for the Arc 7 wave 2 wave-2 schema cache (design
doc §4.2 route B). When `--with-schemas` is set, `rocky discover` walks
each unique `(catalog, schema)` pair reachable via the source's
`BatchCheckAdapter`, issues one `batch_describe_schema` round-trip, and
persists every returned table as a `SchemaCacheEntry` via
`StateStore::write_schema_cache_entry` (the infra PR 1a shipped in
#223). Downstream `rocky compile` / `rocky lsp` invocations pick up
those entries via the read path wired in #228 (PR 1b), so leaf models
that reference the cached source stop typechecking as `Unknown`.
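
As an illustration of the "unique `(catalog, schema)` pair" step, a dedup helper might look like the sketch below. `DiscoveredTable` and `unique_schema_pairs` are hypothetical names, and comparing pairs case-insensitively (by lowercasing) is an assumption made here, not something this message states about the dedup helper itself.

```rust
use std::collections::BTreeSet;

/// Hypothetical, minimal view of a table returned by discovery.
pub struct DiscoveredTable {
    pub catalog: String,
    pub schema: String,
    pub name: String,
}

/// Collapses the discovered tables into the distinct (catalog, schema)
/// pairs, so that each schema needs only one describe round-trip.
/// Assumption: pairs are compared case-insensitively by lowercasing.
pub fn unique_schema_pairs(tables: &[DiscoveredTable]) -> Vec<(String, String)> {
    let pairs: BTreeSet<(String, String)> = tables
        .iter()
        .map(|t| (t.catalog.to_lowercase(), t.schema.to_lowercase()))
        .collect();
    // BTreeSet iteration is sorted, which keeps the output deterministic.
    pairs.into_iter().collect()
}
```

A sorted set is used so the order of round-trips does not depend on discovery order; a `HashSet` would dedup equally well if ordering is irrelevant.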

What the flag does NOT do:

- Does not touch the `rocky run` write tap (PR 2, parallel agent).
- Does not add `clear-schema-cache` or a CLI TTL override (PR 4).
- Does not alter the read path, the cache-entry format, or the
  `state_path` resolution.

Error handling (design doc §4.2 + trust positioning):

- `--with-schemas` + `[cache.schemas] enabled = false` in rocky.toml
  → hard error with an actionable message. The two signals are
  contradictory; silently skipping would leave the user guessing why
  `schemas_cached=0`. Erroring keeps the user's mental model aligned
  with what the cache actually does.
- Missing `source.catalog` → warn once, skip writes (cannot key
  entries without a catalog).
- `BatchCheckAdapter` not registered for the source adapter (DuckDB
  today) → warn once, skip writes.
- Per-schema `batch_describe_schema` failure → warn and continue.
- Per-entry `write_schema_cache_entry` failure → warn and continue.

DiscoverOutput schema change:

- New `schemas_cached: usize` field (skipped from the wire format
  when zero — fixtures captured without the flag stay byte-stable).
  Full codegen cascade run: `schemas/discover.schema.json`,
  `integrations/dagster/.../types_generated/discover_schema.py`, and
  `editors/vscode/src/types/generated/discover.ts` all regenerated.
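
To show why eliding the field keeps older fixtures byte-stable, here is a dependency-free sketch. The struct below is a hypothetical two-field reduction of `DiscoverOutput` (`sources` is invented for the example), and the hand-written `to_json` only models the behaviour; it is an assumption that the real struct gets the same effect through a serde attribute such as `skip_serializing_if`.

```rust
/// Hypothetical reduction of the output struct; `sources` is a made-up field.
pub struct DiscoverOutput {
    pub sources: usize,
    pub schemas_cached: usize,
}

impl DiscoverOutput {
    /// Serializes to compact JSON, omitting `schemas_cached` when it is zero.
    pub fn to_json(&self) -> String {
        let mut fields = vec![format!("\"sources\":{}", self.sources)];
        if self.schemas_cached != 0 {
            fields.push(format!("\"schemas_cached\":{}", self.schemas_cached));
        }
        // `{{` and `}}` are escaped braces in a format string.
        format!("{{{}}}", fields.join(","))
    }
}
```

With `schemas_cached` at zero the output is identical to what the struct produced before the field existed, so fixtures captured without the flag need no regeneration.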

Tests:

- 5 new unit tests (`discover::tests`) covering the dedup helper and
  the inner warm-up loop against a stub `BatchCheckAdapter`: writes
  one entry per table, continues past describe failures, handles the
  empty schema list, and lowercases key components. DuckDB adapter
  has no `BatchCheckAdapter` so playground integration tests hit the
  warn-and-skip path; the stub gives meaningful assertions for the
  happy path that would otherwise require a live warehouse.

Test plan:

- `cargo test -p rocky-cli -p rocky-core` — 1236 tests green.
- `cargo clippy --all-targets -- -D warnings` — clean on full
  workspace.
- `cargo fmt --all --check` — clean.
- `just codegen` — schemas regenerated, only the expected
  `schemas_cached` node added to `discover`.
- `uv run pytest` in `integrations/dagster/` — 370 tests green after
  Pydantic regeneration.
- `npm run compile` in `editors/vscode/` — clean.
- `scripts/regen_fixtures.sh` — fixtures byte-stable (field skipped
  when zero).
- Smoke-tested against the 00-playground-default POC: discover
  without the flag returns the same JSON as before (no
  `schemas_cached` field); discover with `--with-schemas` warns
  about the missing DuckDB `BatchCheckAdapter` and returns the same
  JSON (no entries written, `schemas_cached=0` elided); discover
  with `enabled = false` + `--with-schemas` errors cleanly.
@hugocorreia90 hugocorreia90 merged commit 97df649 into main Apr 22, 2026
15 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/arc7-wave2-wave2-pr3-discover-with-schemas branch April 22, 2026 17:51
hugocorreia90 added a commit that referenced this pull request Apr 22, 2026
…(Arc 7 PR 4) (#232)

* feat(engine/rocky-cli): add rocky state clear-schema-cache + --cache-ttl override (PR 4)

Arc 7 wave 2 wave-2 PR 4 — user-facing control surface for the schema cache:

- `rocky state clear-schema-cache [--dry-run]` — explicit flush of the
  SCHEMA_CACHE redb table. Missing state store treated as no-op (CI-friendly:
  safe to run on an ephemeral runner before a build).
- `--cache-ttl <seconds>` global CLI flag — overrides `[cache.schemas]
  ttl_seconds` for this invocation. Precedence: `--cache-ttl` > `rocky.toml`
  > built-in default (86400s / 24h). Applies to CLI read paths; the
  `rocky lsp` / `rocky serve` daemons keep the config-derived TTL.
- `rocky state` becomes a subcommand group; bare `rocky state` preserved
  via `Option<StateAction>` defaulting to `Show`.
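
The TTL precedence and the bare-`rocky state` default can be sketched in a few lines. The function names and the shape of `StateAction` are hypothetical; only the precedence order, the 86400s default, and the `Option<StateAction>` defaulting to `Show` come from the text above.

```rust
use std::time::Duration;

/// Built-in default named in the commit message: 86400s (24h).
pub const DEFAULT_TTL_SECS: u64 = 86_400;

/// Precedence: CLI flag, then rocky.toml, then the built-in default.
pub fn effective_ttl(cli_ttl: Option<u64>, config_ttl: Option<u64>) -> Duration {
    Duration::from_secs(cli_ttl.or(config_ttl).unwrap_or(DEFAULT_TTL_SECS))
}

/// Hypothetical shape of the subcommand enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateAction {
    Show,
    ClearSchemaCache { dry_run: bool },
}

/// Bare `rocky state` carries no subcommand, so it falls back to `Show`.
pub fn resolve_state_action(action: Option<StateAction>) -> StateAction {
    action.unwrap_or(StateAction::Show)
}
```

Modelling both inputs as `Option` makes "not provided" distinct from any numeric value, so `--cache-ttl 0` (if accepted) would still override the config rather than being mistaken for an absent flag.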

Completes the Arc 7 wave 2 wave-2 sequence (PR 1a #223 infra, PR 1b #228
reads, PR 2 #230 write tap, PR 3 #231 discover warm-up, PR 4 user controls).

* docs(engine/rocky-cli): strip task references from ClearSchemaCacheOutput doc

The doc comment on the output struct flows into schemas/*.schema.json,
dagster Pydantic docstrings, and vscode TypeScript jsdoc. Keep the
behavioral description; drop the 'Arc 7 wave 2 wave-2 PR 4 / PR 2 / PR 1b'
references per monorepo CLAUDE.md (task refs in code rot over time).

* docs(engine): add CHANGELOG entries for rocky state clear-schema-cache + --cache-ttl

* fix(integrations/dagster): sort ClearSchemaCacheOutput in types.py import block

Ruff I001 was tripping on the import block order in types.py; the
original PR 4 agent inserted ClearSchemaCacheOutput between SourceOutput
and StateOutput instead of between CiOutput and ColumnLineageOutput.
hugocorreia90 added a commit that referenced this pull request Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0
/ dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

Engine headlines (12 PRs):
- Arc 7 wave 2 wave-2 complete — cached DESCRIBE end-to-end
  (#223 infra, #228 reads, #230 write tap, #231 discover warm-up,
  #232 state controls + --cache-ttl override)
- Arc 2 wave 3 complete — bytes_scanned / bytes_written on
  MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake
  deferred doc, #222 docstring cascade). Real $ on rocky cost for
  BQ + Databricks
- FR-005 Unity Catalog workspace-binding reconcile (#226)
- FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
- Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

Dagster headlines (4 PRs):
- FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor
  on RockyResource startup (#224)
- FR-003 RockyResource.state_health() (#227) + FR follow-up threading
  doctor(check=state_rw) for sub-second probes (#229)
- RockyResource.cost() wiring + fixture (#218)

VS Code: regenerated TS bindings for engine 1.14.0 type additions.
No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

36 fixtures picked up the new engine version string in their top-level
"version" field. No schema changes — just the version bump.