feat(dagster): auto-surface compliance + retention-status on RockyComponent #249
Merged
hugocorreia90 merged 1 commit into main on Apr 24, 2026
Adds two opt-in YAML attributes (`surface_compliance`, `surface_retention_status`, both default `False`) that auto-emit governance-waveplan observability on the Dagster asset graph, closing the glue-code gap every `RockyComponent` adopter hits after dagster-v1.12.0 shipped the types but not the wiring.

- `RockyResource.compliance()` / `retention_status()`: first-class methods mirroring `state_health()`. Both accept an optional `env` kwarg.
- `compliance_check_results()` + `retention_observations()` helpers in `observability.py`: pure-function builders matching the `drift_observations()` / `anomaly_check_results()` shape. Compliance aggregates per asset (Dagster rejects duplicate `(key, check_name)` pairs per materialization); retention yields one observation per model row.
- `RockyComponent` pre-declares `compliance_exception` check specs when the opt-in is on, invokes the new resource methods once per materialization batch, and folds results through the helpers. Binary failures are logged and swallowed (same tolerance as drift/anomaly). Sentinel-keyed compliance results (for models outside the component's selection) are filtered with a warning to preserve Dagster's declared-spec invariant.
- New scenarios + tests cover parse-guard, per-asset aggregation, sentinel fallback, undeclared-model filtering, failure tolerance, opt-in gating.

Full test suite: 393 passed (was 369, +24 new).
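The per-asset aggregation, sentinel fallback, and undeclared-key filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code in `observability.py`: `ExceptionRow`, `CheckResult`, `Key`, `aggregate`, and `drop_undeclared` are hypothetical stand-ins built on plain dataclasses so the example runs without Dagster installed, and only the check name and the `_compliance` fallback key are taken from the PR.

```python
from __future__ import annotations

import logging
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, Optional

logger = logging.getLogger(__name__)

# Hypothetical stand-in for dagster.AssetKey: a tuple of path segments.
Key = tuple[str, ...]
FALLBACK_KEY: Key = ("_compliance",)  # mirrors COMPLIANCE_FALLBACK_ASSET_KEY
CHECK_NAME = "compliance_exception"   # mirrors COMPLIANCE_CHECK_NAME


@dataclass(frozen=True)
class ExceptionRow:
    """Stand-in for ComplianceException (fields named in the PR)."""
    model: str
    column: str
    env: str
    reason: str


@dataclass
class CheckResult:
    """Stand-in for AssetCheckResult, reduced to what this sketch needs."""
    key: Key
    check_name: str
    passed: bool
    metadata: dict = field(default_factory=dict)


def aggregate(
    rows: Iterable[ExceptionRow],
    key_resolver: Callable[[str], Optional[Key]],
) -> Iterator[CheckResult]:
    """Fold all exceptions for one asset into a single failing result."""
    grouped: dict[Key, list[ExceptionRow]] = defaultdict(list)
    for row in rows:
        key = key_resolver(row.model)
        # Models the resolver does not know land on the sentinel key.
        grouped[FALLBACK_KEY if key is None else key].append(row)
    for key, group in grouped.items():
        yield CheckResult(
            key=key,
            check_name=CHECK_NAME,
            passed=False,
            metadata={"exceptions": [(r.model, r.column, r.env) for r in group]},
        )


def drop_undeclared(
    results: Iterable[CheckResult], declared: set[Key]
) -> Iterator[CheckResult]:
    """Keep only results whose key has a pre-declared check spec."""
    for result in results:
        if result.key in declared:
            yield result
        else:
            logger.warning("dropping compliance result for undeclared key %s", result.key)
```

Grouping before yielding is what keeps each `(key, check_name)` pair unique within a batch; the filter step then discards anything, including sentinel-keyed results, that has no declared spec.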
hugocorreia90 added a commit that referenced this pull request on Apr 24, 2026

Governance-waveplan polish wave on top of v1.16.0/v1.12.0/v1.8.0.

Engine 1.17.0:
- FR-009 BREAKING: reject empty workspace_ids without opt-in (#250)
- --env flag on rocky run / rocky plan + plan preview of classification / mask / retention actions (#251)
- Wiremock coverage for apply_column_tags + apply_masking_policy (#252)
- W004 warning for unresolved classification tags (#253)
- SCIM client + per-catalog GRANT for reconcile_role_graph (#254)
- rocky retention-status --drift warehouse probe (#255)

Dagster 1.13.0 (tracks engine 1.17.0):
- Pluggable per-call kwarg resolvers on RockyResource (#248)
- Auto-surface compliance + retention-status on RockyComponent (#249)
- Pre-flight governance_override validator (#250)
- Regenerated PlanResult with env + action preview fields (#251)

VS Code 1.9.0 (tracks engine 1.17.0):
- Regenerated plan.ts with env + 3 governance-action interfaces (#251)
hugocorreia90 added a commit that referenced this pull request on Apr 24, 2026

Governance-waveplan polish wave on top of v1.16.0/v1.12.0/v1.8.0.

Engine 1.17.0:
- FR-009 BREAKING: reject empty workspace_ids without opt-in (#250)
- --env flag on rocky run / rocky plan + plan preview of classification / mask / retention actions (#251)
- Wiremock coverage for apply_column_tags + apply_masking_policy (#252)
- W004 warning for unresolved classification tags (#253)
- SCIM client + per-catalog GRANT for reconcile_role_graph (#254)
- rocky retention-status --drift warehouse probe (#255)

Dagster 1.13.0 (tracks engine 1.17.0):
- Pluggable per-call kwarg resolvers on RockyResource (#248)
- Auto-surface compliance + retention-status on RockyComponent (#249)
- Pre-flight governance_override validator (#250)
- Regenerated PlanResult with env + action preview fields (#251)

VS Code 1.9.0 (tracks engine 1.17.0):
- Regenerated plan.ts with env + 3 governance-action interfaces (#251)

Also refreshes transitive dependencies across all three artifacts:
- Cargo.lock: 14 transitive bumps (rustls v0.23.39, tokio v1.52.1, uuid v1.23.1, webpki-roots v1.0.7, compression-codecs v0.4.38, and 9 others)
- uv.lock: 10 transitive bumps (pydantic v2.13.3, dagster-pipes/shared v1.13.2, datamodel-code-generator v0.56.1, ruff v0.15.11, and 5 others)
- package-lock.json: transitive-only via npm update; direct deps unchanged so the engines.vscode / @types/vscode / test-electron triangle stays in lockstep
Summary

Closes the last governance-waveplan observability gap on `RockyComponent`. After engine-v1.16.0 shipped the Wave B (`rocky compliance`) and Wave C-2 (`rocky retention-status`) surfaces and dagster-v1.12.0 shipped the matching Pydantic types, every `RockyComponent` adopter who wanted compliance/retention events on the asset graph had to hand-roll a wrapper asset that shelled out, parsed JSON, and yielded `AssetCheckResult` / `AssetObservation`, duplicating work the component already does for drift + anomalies.

This PR adds two opt-in YAML attributes (`surface_compliance`, `surface_retention_status`). Both default to `False`, so there is zero behaviour change for deployments that don't flip them. When on, the component auto-wires both surfaces through the existing bridge pattern.

Adopters flip both on with two lines in `defs.yaml`:

What got added
`RockyResource` methods (mirror `state_health()` shape):
- `compliance(env=None) -> ComplianceOutput`
- `retention_status(env=None) -> RetentionStatusOutput`

`observability.py` helpers + constants (mirror `drift_observations()` / `anomaly_check_results()`):
- `compliance_check_results(output, *, key_resolver) -> Iterator[AssetCheckResult]`
- `retention_observations(output, *, key_resolver) -> Iterator[AssetObservation]`
- `COMPLIANCE_CHECK_NAME = "compliance_exception"`
- `COMPLIANCE_FALLBACK_ASSET_KEY = AssetKey(["_compliance"])`
- `RETENTION_OBSERVATION_NAME = "retention_drift"`

`RockyComponent` wiring:
- `surface_compliance: bool = False`, `surface_retention_status: bool = False`.
- Pre-declares a `compliance_exception` check spec per asset when opt-in is on (matches the anomaly pattern).
- `_emit_governance_events` helper invoked in both execution modes (streaming + pipes).

Public API re-exports via `dagster_rocky/__init__.py` for `compliance_check_results`, `retention_observations`, the three new constants, `ComplianceOutput`, `RetentionStatusOutput`, `ComplianceException`, `ComplianceSummary`, `ModelRetentionStatus`.

Tests (24 new, 417 total after rebase on #248, all green):
- `warehouse_days`.
- `--env`.

Test plan

- `cd integrations/dagster && uv run pytest`: 417 passed
- `uv run ruff check src/ tests/`: clean
- `uv run ruff format --check src/ tests/`: clean
- `surface_compliance: true` on their `RockyComponent` YAML surfaces compliance exceptions on the asset graph without any wrapper code

Deviations from the original design sketch
Several field names and call shapes in the original design sketch diverged from what the shipped schemas actually expose. Deliberate deviations:

- `ComplianceException` has no `severity` field: the real fields are `column / env / model / reason`. The sketch's "passed=True when severity=info/acknowledged, passed=False when severity=warn/error" is unimplementable. Every exception yields `passed=False`, `severity=WARN`.
- `ModelRetentionStatus` fields differ from the sketch: the real fields are `configured_days / in_sync / model / warehouse_days`. Metadata keys are named accordingly: `rocky/retention_configured_days`, `rocky/retention_warehouse_days`, `rocky/retention_in_sync`. `warehouse_days` is always `None` in v1 (schema doc), so its metadata key is omitted whenever the value is `None`.
- `key_resolver: Callable[[str], AssetKey | None]`, not `translator: RockyDagsterTranslator`: this matches the existing `drift_observations()` / `anomaly_check_results()` contract. The signature is simpler, and `translator.get_asset_key(source, table)` doesn't fit a compliance exception (which has `model` + `column`, no source/`TableInfo`).
- Compliance aggregates per asset, because Dagster rejects duplicate `(asset_key, check_name)` pairs per materialization. Multiple exceptions for the same model (typically one per env) fold into one aggregated WARN result; metadata lists every `(model, column, env)` triple. This is the only deviation from a "one event per input row" helper shape; retention still yields one observation per row.
- The `conftest.py` comment spells out that the legacy `tests/fixtures/*.json` was removed; scenarios now live as Python dicts in `scenarios.py` and flow through `conftest.py` via `json.dumps`. The playground POC also doesn't have governance config, so `regen-fixtures` couldn't record a live capture without also changing the POC. Added `COMPLIANCE` + `RETENTION_STATUS` dicts to `scenarios.py` and `compliance_json` / `retention_status_json` fixtures to `conftest.py`; the parse-guard still fires via the new tests.
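The retention side of these deviations can be illustrated with a small sketch of the metadata shape. It is not the shipped `retention_observations()` helper: `RetentionRow`, `Key`, `retention_metadata`, and `rows_to_observations` are hypothetical names, a plain tuple stands in for `AssetKey`, the `int` type for `configured_days` is a guess, and skipping rows the resolver cannot map is an assumption the PR does not state. Only the three metadata key names and the `key_resolver` call shape come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical stand-in for dagster.AssetKey: a tuple of path segments.
Key = tuple[str, ...]


@dataclass(frozen=True)
class RetentionRow:
    """Stand-in for ModelRetentionStatus (field names from the PR, types assumed)."""
    model: str
    configured_days: int
    in_sync: bool
    warehouse_days: Optional[int] = None  # always None in v1 per the PR


def retention_metadata(row: RetentionRow) -> dict[str, object]:
    """Build observation metadata, leaving out warehouse_days when it is None."""
    metadata: dict[str, object] = {
        "rocky/retention_configured_days": row.configured_days,
        "rocky/retention_in_sync": row.in_sync,
    }
    if row.warehouse_days is not None:
        metadata["rocky/retention_warehouse_days"] = row.warehouse_days
    return metadata


def rows_to_observations(
    rows: Iterable[RetentionRow],
    key_resolver: Callable[[str], Optional[Key]],
) -> Iterator[tuple[Key, dict[str, object]]]:
    """Yield one (key, metadata) pair per input row, with no aggregation."""
    for row in rows:
        key = key_resolver(row.model)
        if key is None:
            continue  # assumption: unresolved models are skipped
        yield key, retention_metadata(row)
```

Because the resolver takes only the model name, the same callable shape serves compliance exceptions, which carry no source or table information.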