feat(integrations/dagster): add RockyResource.state_health() accessor #227
Merged
hugocorreia90 merged 1 commit into `main` on Apr 22, 2026
Aggregates the already-shipped state-backend observability signals (the configured `[state]` backend plus the most recent run recorded in the state store) into a single typed `StateHealthResult`. When `probe_write=True`, it additionally runs `rocky doctor` and extracts the `state_rw` check, which performs a live put/get/delete round-trip against the backend. It delegates to a new `health.state_health()` module function to keep the resource thin and match the existing `rocky_healthcheck` facade.

No engine changes and no codegen cascade: the accessor is a pure Python facade over `rocky doctor` + `rocky history`.

Shape matches the FR's Python sketch:

- `backend: Literal["local", "tiered", "valkey", "s3", "gcs"]`
- `last_run_status: Literal["success", "partial_failure", "failure"] | None`
- `last_run_at: datetime | None`
- `probe_outcome: Literal["ok", "timeout", "error"] | None`
- `probe_duration_ms: int | None`
- `probe_error: str | None`

The FR's stretch-goal "recent outcomes" rollup (a persisted ring buffer of state-upload / state-download outcomes) stays deferred: those signals are tracing-only today and would need new engine-side persistence.

Sensor-tick safety: both the history call and the probe swallow `dagster.Failure` rather than propagating it, so a missing binary or unreadable state store degrades to `None` / `probe_error="..."` instead of crashing the tick.
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026:
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

  Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0 / dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

  Engine headlines (12 PRs):
  - Arc 7 wave 2 complete: cached DESCRIBE end-to-end (#223 infra, #228 reads, #230 write tap, #231 discover warm-up, #232 state controls + `--cache-ttl` override)
  - Arc 2 wave 3 complete: `bytes_scanned` / `bytes_written` on `MaterializationOutput` (#219 BQ, #221 Databricks, #220 Snowflake deferred doc, #222 docstring cascade). Real $ on `rocky cost` for BQ + Databricks
  - FR-005 Unity Catalog workspace-binding reconcile (#226)
  - FR-002 Fivetran connector metadata via `SourceOutput.metadata` (#225)
  - Housekeeping: `compute_backoff` dedup into `rocky_core::retry` (#217)

  Dagster headlines (4 PRs):
  - FR-001 `RockyComponent` Pipes execution mode + FR-006 strict doctor on `RockyResource` startup (#224)
  - FR-003 `RockyResource.state_health()` (#227) + FR follow-up threading `doctor(check=state_rw)` for sub-second probes (#229)
  - `RockyResource.cost()` wiring + fixture (#218)

  VS Code: regenerated TS bindings for engine 1.14.0 type additions. No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

  36 fixtures picked up the new engine version string in their top-level "version" field. No schema changes, just the version bump.
Summary
Adds a focused programmatic accessor for Rocky state-backend health to the Dagster integration. Dagster users can now call `RockyResource.state_health()` to get a live snapshot of the configured backend plus the most recent run's status; with `probe_write=True` it additionally exercises the engine's `state_rw` put/get/delete probe and surfaces the outcome as typed fields. Closes the reprioritization-doc wave 2 follow-up (FR-003) now that wave 1 (#224) has merged.

Design
- `rocky history --output json` for `last_run_status` / `last_run_at`, normalised from `RunStatus`'s CamelCase wire values into snake_case literals so callers don't need to know the serialisation.
- `rocky doctor` for the optional `state_rw` probe, translated into a tri-state `ok` / `timeout` / `error` outcome.

Plus a cheap `tomllib.load` of `config_path` for the `backend` field.

- A dedicated `rocky state-health` CLI verb stays out of scope. All the substrate (the `probe_state_backend` helper, the `record_run` wiring) already ships, so a new verb would just cascade through codegen without adding signal the existing surfaces don't already carry.
- `history()` raising `dg.Failure` (missing binary, unreadable state store) degrades to `last_run_status=None` rather than crashing the sensor tick. A failing probe populates `probe_outcome="error"` + `probe_error` rather than raising. This mirrors the existing `rocky_healthcheck` tolerance pattern.
- `RockyResource` delegates to `health.state_health()`, the same split as the existing `rocky_healthcheck` function, so the resource stays easy to diff against wave 1's additions.

Shape
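Matches the FR's explicit Python sketch (the six fields listed above: `backend`, `last_run_status`, `last_run_at`, `probe_outcome`, `probe_duration_ms`, `probe_error`).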
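One plausible rendering of that shape, assuming a frozen dataclass (the PR may use a different base class), together with a hypothetical `translate_probe` helper showing how a doctor `state_rw` check could collapse into the tri-state outcome. The severity strings (`"healthy"`, `"critical"`) and detecting a timeout by the word "timeout" in the check message are assumptions drawn from the test-matrix wording; this sketch routes every other non-healthy severity, including `warning`, through the same timeout/error split, which may differ from what the PR does.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal, Optional

Backend = Literal["local", "tiered", "valkey", "s3", "gcs"]
RunStatus = Literal["success", "partial_failure", "failure"]
ProbeOutcome = Literal["ok", "timeout", "error"]


@dataclass(frozen=True)
class StateHealthResult:
    backend: Backend
    last_run_status: Optional[RunStatus]  # None for a fresh or unreadable state store
    last_run_at: Optional[datetime]
    # The probe_* fields stay None unless the caller opts in with probe_write=True.
    probe_outcome: Optional[ProbeOutcome] = None
    probe_duration_ms: Optional[int] = None
    probe_error: Optional[str] = None


def translate_probe(severity: str, message: str) -> tuple[ProbeOutcome, Optional[str]]:
    """Collapse a doctor state_rw check into (outcome, error message).

    ASSUMPTION: "healthy" means the round-trip succeeded; any other severity is a
    failure, classified as a timeout when the message mentions one.
    """
    if severity == "healthy":
        return "ok", None
    if "timeout" in message.lower():
        return "timeout", message
    return "error", message
```

Defaulting the `probe_*` fields to `None` keeps the default, probe-skipped case representable without sentinel values, and freezing the dataclass makes each snapshot safe to pass around a sensor tick.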
The FR's stretch-goal "recent outcomes" rollup (a persisted ring buffer of state-upload / state-download outcomes) stays deferred: those signals are only `tracing::info!` events today and would require new engine-side persistence. Flagging this explicitly so the decision is visible in history rather than buried.

Known follow-up optimisation
`RockyResource.doctor()` doesn't currently accept a `--check` filter, so the probe path invokes the full doctor (~1-2 s cold). The engine CLI itself supports `--check state_rw`; wiring a `check: str | None = None` kwarg onto `RockyResource.doctor()` would cut the probe cost to the engine's sub-second `probe_state_backend` helper and is a natural follow-up. Not landed here to keep the scope tight.

Test plan
- `uv run pytest`: 366 tests pass (24 new in `test_state_health.py`, covering the full matrix: all 5 backends, fresh state store, each `RunStatus` variant, unknown status → failure fallback, binary failure tolerance, probe skipped by default, healthy probe → ok, critical + timeout message → timeout, critical generic → error, probe check missing, probe binary failure, probe warning severity, resource-method facade).
- `uv run ruff check src/ tests/`: clean.
- `uv run ruff format --check src/ tests/`: clean.
- Wave 1 surfaces (`setup_for_execution`, `run_pipes`, `RockyPipesMessageReader`, `strict_doctor*`) untouched: the new method sits next to `doctor()` and does not intersect.

Wave 2 follow-up
This closes the wave 2 item in the 2026-04-22 reprio — the Python-only state-health accessor. Wave 3 (workspace-binding reconciliation, engine-side) and wave 4 (adapter-namespaced source metadata, codegen cascade) remain.