Skip to content

feat(engine): fail-fast on empty governance_override.workspace_ids#250

Merged
hugocorreia90 merged 1 commit intomainfrom
feat/governance-override-allow-empty-workspace-ids
Apr 24, 2026
Merged

feat(engine): fail-fast on empty governance_override.workspace_ids#250
hugocorreia90 merged 1 commit intomainfrom
feat/governance-override-allow-empty-workspace-ids

Conversation

@hugocorreia90
Copy link
Copy Markdown
Contributor

@hugocorreia90 hugocorreia90 commented Apr 24, 2026

Summary

Guards every workspace-binding reconciler user against a silent full-revoke footgun. The reconciler shipped in engine-v1.14.0 treats workspace_ids as declarative state, which means an empty list = revoke every binding on the target catalog. A misconfigured permission store, an off-by-one serializer, or a dict-get returning an empty default then silently strips every workspace binding from the catalog.

Two-layer defence: the engine rejects the footgun at the CLI boundary, and dagster-rocky mirrors the check in-process so it surfaces as a dg.Failure before the subprocess is spawned. The Python check runs after _apply_resolvers (from #248), so it also catches payloads produced by governance_override_fn.

Payload -> behaviour

Payload Before After
key omitted Skip reconciliation Skip reconciliation (unchanged)
{"workspace_ids": ["ws-a", "ws-b"]} Reconcile to that set Reconcile to that set (unchanged)
{"workspace_ids": []} Silently revoke all Error — refuses the silent full revoke
{"workspace_ids": [], "allow_empty_workspace_ids": true} n/a Explicit full revoke

Migration (breaking)

Any caller that was relying on workspace_ids = [] meaning "revoke everything" must set "allow_empty_workspace_ids": true on the override payload. Callers that pass None / omit the key / send a non-empty list are unaffected. The reconciler shipped two days ago in engine-v1.14.0 so the blast radius is near-zero; tightening the contract now before it has users.

Shape change (rocky-core)

GovernanceOverride::workspace_ids is now Option<Vec<WorkspaceBindingConfig>> (was Vec<WorkspaceBindingConfig> with #[serde(default)]). Serde cannot distinguish {} from {"workspace_ids": []} with the old shape — Option<Vec<_>> is what lets the engine tell "no override" (None) apart from "reconcile to empty" (Some(vec![])). A new allow_empty_workspace_ids: bool field (default false) carries the opt-in so the intent is auditable per-run via the RunRecord audit trail introduced in engine 1.16.0.

No schema-regen cascade

GovernanceOverride is a per-run CLI JSON payload parsed directly by main.rs, not part of RockyConfig — it doesn't derive JsonSchema and isn't emitted in schemas/rocky_project.schema.json or the generated Pydantic / VS Code bindings. Confirmed clean via just codegen (no diff after the Rust change).

Tests added

  • engine/crates/rocky-core/src/config.rs — 5 new tests covering the payload -> validator outcome table. The existing test_governance_override_json_structured was updated for the Option<Vec> shape.
  • integrations/dagster/tests/test_resource.py — 8 validator unit tests under a TestValidateGovernanceOverride class + 1 end-to-end test asserting the subprocess is never spawned on the footgun path.

Follow-ups

  • Empty grants / schema_grants has the same footgun shape but is intentionally out of scope — a follow-up can extend the guard if the pattern is endorsed.
  • Surfacing an intentional full revoke (allow_empty_workspace_ids=true) as a dg.AssetCheckResult warning so it's visible in the run viewer.

Test plan

  • cd engine && cargo test -p rocky-core (109 passed including 5 new)
  • cd engine && cargo test -p rocky-cli (257 passed)
  • cd engine && cargo clippy -p rocky-core -p rocky-cli -- -D warnings (clean)
  • cd engine && cargo fmt --check (clean)
  • cd integrations/dagster && uv run pytest (407 passed after rebase on feat(dagster): pluggable per-call kwarg resolvers on RockyResource #248, includes new validator + end-to-end tests)
  • cd integrations/dagster && uv run ruff check src/ tests/ (clean)
  • just codegen leaves a clean git status — codegen-drift CI should pass

@hugocorreia90 hugocorreia90 force-pushed the feat/governance-override-allow-empty-workspace-ids branch 2 times, most recently from ea0813c to 085d102 Compare April 24, 2026 11:41
Guards every workspace-binding reconciler user against a silent
full-revoke footgun.

The reconciler shipped in engine-v1.14.0 treats `workspace_ids` as
declarative state: given N ids, Rocky reconciles bindings on the target
catalog to exactly that set. Empty list = revoke every existing binding.
That's the right contract when the list is intentional; it also means a
misconfigured permission store, an off-by-one serializer, or a dict-get
returning an empty default silently strips every workspace binding from
the catalog.

Two-layer defence:

* Engine (`rocky run`). `GovernanceOverride::workspace_ids` becomes
  `Option<Vec<_>>` so the engine can distinguish "key absent" (skip
  reconciliation) from "reconcile to empty" (the footgun). A new
  `allow_empty_workspace_ids: bool` field (default false) carries the
  opt-in. A new `validate_workspace_ids` method rejects the
  `Some(empty)` + no-flag shape with a clear error naming the target
  catalog and pointing at the escape hatch; called before any warehouse
  mutation on the catalog so an accidental payload can't leave partial
  state behind.
* Python (`dagster-rocky`). `_validate_governance_override` mirrors the
  check in-process, so `RockyResource.run()` / `run_streaming()` /
  `run_pipes()` / `resume_run()` fail as `dg.Failure` before the
  subprocess runs — faster feedback + cleaner stack trace. Runs after
  `_apply_resolvers` so it also catches payloads produced by
  `governance_override_fn`.

Payload semantics after this change:

| Payload | Behaviour |
|---|---|
| key omitted | Skip binding reconciliation (unchanged) |
| `{workspace_ids: [ids...]}` | Reconcile to that set (unchanged) |
| `{workspace_ids: []}` | Error — refuses the silent full revoke |
| `{workspace_ids: [], allow_empty_workspace_ids: true}` | Explicit full revoke |

BREAKING for callers that relied on `workspace_ids = []` meaning
"revoke everything". Migration: set `allow_empty_workspace_ids: true`
in the override payload. The reconciler shipped two days ago in
engine-v1.14.0 so the blast radius is near-zero; tightening the
contract now before it has users.

No schema-regen cascade: `GovernanceOverride` is a per-run CLI JSON
payload, not part of the project schema (it's parsed directly by the
CLI in `main.rs`, never reaches `RockyConfig`). Confirmed clean via
`just codegen`.

Tests:
* rocky-core: 5 new tests covering the payload -> validator outcome
  table (key absent, non-empty, empty without flag, empty with flag,
  empty with explicit false flag).
* dagster-rocky: 8 validator unit tests + 1 end-to-end test asserting
  the subprocess is never spawned on the footgun path.

Keeps `grants` / `schema_grants` as `Vec` unchanged — scoped to
`workspace_ids`. A follow-up can extend the guard to those fields
if the pattern is endorsed.
@hugocorreia90 hugocorreia90 force-pushed the feat/governance-override-allow-empty-workspace-ids branch from 085d102 to 0496249 Compare April 24, 2026 11:44
@hugocorreia90 hugocorreia90 merged commit 8c3b126 into main Apr 24, 2026
14 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/governance-override-allow-empty-workspace-ids branch April 24, 2026 11:53
hugocorreia90 added a commit that referenced this pull request Apr 24, 2026
Governance-waveplan polish wave on top of v1.16.0/v1.12.0/v1.8.0.

Engine 1.17.0:
- FR-009 BREAKING: reject empty workspace_ids without opt-in (#250)
- --env flag on rocky run / rocky plan + plan preview of classification /
  mask / retention actions (#251)
- Wiremock coverage for apply_column_tags + apply_masking_policy (#252)
- W004 warning for unresolved classification tags (#253)
- SCIM client + per-catalog GRANT for reconcile_role_graph (#254)
- rocky retention-status --drift warehouse probe (#255)

Dagster 1.13.0 (tracks engine 1.17.0):
- Pluggable per-call kwarg resolvers on RockyResource (#248)
- Auto-surface compliance + retention-status on RockyComponent (#249)
- Pre-flight governance_override validator (#250)
- Regenerated PlanResult with env + action preview fields (#251)

VS Code 1.9.0 (tracks engine 1.17.0):
- Regenerated plan.ts with env + 3 governance-action interfaces (#251)
hugocorreia90 added a commit that referenced this pull request Apr 24, 2026
Governance-waveplan polish wave on top of v1.16.0/v1.12.0/v1.8.0.

Engine 1.17.0:
- FR-009 BREAKING: reject empty workspace_ids without opt-in (#250)
- --env flag on rocky run / rocky plan + plan preview of classification /
  mask / retention actions (#251)
- Wiremock coverage for apply_column_tags + apply_masking_policy (#252)
- W004 warning for unresolved classification tags (#253)
- SCIM client + per-catalog GRANT for reconcile_role_graph (#254)
- rocky retention-status --drift warehouse probe (#255)

Dagster 1.13.0 (tracks engine 1.17.0):
- Pluggable per-call kwarg resolvers on RockyResource (#248)
- Auto-surface compliance + retention-status on RockyComponent (#249)
- Pre-flight governance_override validator (#250)
- Regenerated PlanResult with env + action preview fields (#251)

VS Code 1.9.0 (tracks engine 1.17.0):
- Regenerated plan.ts with env + 3 governance-action interfaces (#251)

Also refreshes transitive dependencies across all three artifacts:
- Cargo.lock: 14 transitive bumps (rustls v0.23.39, tokio v1.52.1, uuid v1.23.1,
  webpki-roots v1.0.7, compression-codecs v0.4.38, and 9 others)
- uv.lock: 10 transitive bumps (pydantic v2.13.3, dagster-pipes/shared v1.13.2,
  datamodel-code-generator v0.56.1, ruff v0.15.11, and 5 others)
- package-lock.json: transitive-only via npm update; direct deps unchanged so
  the engines.vscode / @types/vscode / test-electron triangle stays in lockstep
hugocorreia90 added a commit that referenced this pull request Apr 24, 2026
Governance-waveplan polish wave on top of v1.16.0/v1.12.0/v1.8.0.

Engine 1.17.0:
- FR-009 BREAKING: reject empty workspace_ids without opt-in (#250)
- --env flag on rocky run / rocky plan + plan preview of classification /
  mask / retention actions (#251)
- Wiremock coverage for apply_column_tags + apply_masking_policy (#252)
- W004 warning for unresolved classification tags (#253)
- SCIM client + per-catalog GRANT for reconcile_role_graph (#254)
- rocky retention-status --drift warehouse probe (#255)

Dagster 1.13.0 (tracks engine 1.17.0):
- Pluggable per-call kwarg resolvers on RockyResource (#248)
- Auto-surface compliance + retention-status on RockyComponent (#249)
- Pre-flight governance_override validator (#250)
- Regenerated PlanResult with env + action preview fields (#251)

VS Code 1.9.0 (tracks engine 1.17.0):
- Regenerated plan.ts with env + 3 governance-action interfaces (#251)

Also refreshes transitive dependencies across all three artifacts:
- Cargo.lock: 14 transitive bumps (rustls v0.23.39, tokio v1.52.1, uuid v1.23.1,
  webpki-roots v1.0.7, compression-codecs v0.4.38, and 9 others)
- uv.lock: 10 transitive bumps (pydantic v2.13.3, dagster-pipes/shared v1.13.2,
  datamodel-code-generator v0.56.1, ruff v0.15.11, and 5 others)
- package-lock.json: transitive-only via npm update; direct deps unchanged so
  the engines.vscode / @types/vscode / test-electron triangle stays in lockstep
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant