feat(engine/rocky-databricks): reconcile Unity Catalog workspace bindings alongside grants #226
Merged
hugocorreia90 merged 1 commit into main on Apr 22, 2026
…ings alongside grants

Unity Catalog users isolating catalogs per workspace can now declaratively reconcile workspace bindings alongside grants in one governance pass. Previously, `[governance.isolation.workspace_ids]` was interpreted imperatively (bindings only ever added; drift never removed); this change makes the desired set declarative, so undeclared bindings are removed on next run and access-level changes (READ_WRITE -> READ_ONLY) land as a single remove + add.

Trait surface: `GovernanceAdapter` gains `list_workspace_bindings` and `remove_workspace_binding`, defaulting to "not supported" error so adapters that don't support the concept must explicitly override. Databricks implements both via the existing Unity Catalog workspace-bindings REST API; Snowflake, BigQuery, and `NoopGovernanceAdapter` override to return `Ok(vec![])` / `Ok(())`, matching the existing "not applicable" semantics of `bind_workspace` / `set_isolation`.

Databricks-side reconcile: `permissions.rs` gains a combined `reconcile_access` entry point that diffs grants + bindings in one pass and produces an `AccessDiff` grouping both deltas. The grants-only `reconcile()` is preserved unchanged so existing callers are unaffected. `rocky run` drives the reconcile through the trait primitives (list -> diff -> apply) so non-Databricks adapters gracefully see an empty diff.

No config surface change: existing `[governance.isolation.workspace_ids]` TOML is reused as the desired-state source, so no migration is needed.

Behavior note for upgraders: any workspace binding hand-added outside `rocky.toml` will be removed on next run; declare it in config first.

Tests: unit tests cover the binding-diff matrix (add, remove, unchanged, access-level change, mixed); wiremock integration tests cover list / add / remove via the UC REST API and the combined `reconcile_access` pass against a wiremock'd grants + bindings stack.
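For readers skimming the thread, the declarative binding diff can be sketched roughly as below. This is a minimal illustration, not the code from `permissions.rs`: `BindingDiff`, `diff_bindings`, the `AccessLevel` enum, and the use of `u64` workspace IDs are all assumed names and shapes, and the real `AccessDiff` also carries grant deltas.

```rust
use std::collections::BTreeMap;

/// Access level of a workspace binding (hypothetical enum mirroring
/// the READ_WRITE / READ_ONLY levels mentioned in the PR text).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessLevel {
    ReadWrite,
    ReadOnly,
}

/// Hypothetical delta shape for bindings only.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct BindingDiff {
    pub to_add: Vec<(u64, AccessLevel)>,
    pub to_remove: Vec<u64>,
}

/// Diff the currently bound workspaces against the declared set.
pub fn diff_bindings(
    current: &BTreeMap<u64, AccessLevel>,
    desired: &BTreeMap<u64, AccessLevel>,
) -> BindingDiff {
    let mut diff = BindingDiff::default();
    for (id, level) in desired {
        match current.get(id) {
            // Already bound at the declared level: nothing to do.
            Some(existing) if existing == level => {}
            // Access-level change lands as a remove followed by an add.
            Some(_) => {
                diff.to_remove.push(*id);
                diff.to_add.push((*id, *level));
            }
            // Declared but not yet bound.
            None => diff.to_add.push((*id, *level)),
        }
    }
    // Bound but no longer declared: drift that gets removed.
    for id in current.keys() {
        if !desired.contains_key(id) {
            diff.to_remove.push(*id);
        }
    }
    diff
}
```

An adapter that lists no current bindings and is given no desired bindings produces an empty diff, which is how the non-Databricks adapters end up as a no-op in this sketch.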
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0 / dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

Engine headlines (12 PRs):
- Arc 7 wave 2 complete: cached DESCRIBE end-to-end (#223 infra, #228 reads, #230 write tap, #231 discover warm-up, #232 state controls + --cache-ttl override)
- Arc 2 wave 3 complete: bytes_scanned / bytes_written on MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake deferred doc, #222 docstring cascade). Real $ on rocky cost for BQ + Databricks
- FR-005 Unity Catalog workspace-binding reconcile (#226)
- FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
- Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

Dagster headlines (4 PRs):
- FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor on RockyResource startup (#224)
- FR-003 RockyResource.state_health() (#227) + FR follow-up threading doctor(check=state_rw) for sub-second probes (#229)
- RockyResource.cost() wiring + fixture (#218)

VS Code: regenerated TS bindings for engine 1.14.0 type additions. No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

36 fixtures picked up the new engine version string in their top-level "version" field. No schema changes, just the version bump.
hugocorreia90 added a commit that referenced this pull request on Apr 23, 2026
* feat(engine/rocky-core): classification + masking trait + config plumbing
Wave A Agent 1 foundation for column classification + masking policies.
GovernanceAdapter trait gains two methods:
- apply_column_tags(table, column_tags) — per-column tagging; default
errors so adapters declare support explicitly (Databricks YES, others
surface the gap). NoopGovernanceAdapter overrides to Ok(()) so
pipelines that declare classifications against no-governance
warehouses degrade gracefully.
- apply_masking_policy(table, policy, env) — env-aware masking policy
application. Same default-errors-must-override contract.
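The default-errors-must-override contract can be illustrated with a cut-down, synchronous stand-in for the trait. Everything here is a sketch under assumptions: `GovernanceError`, the `ColumnTags` alias, the method signature, and `UndeclaredAdapter` are hypothetical and do not reflect the actual definitions in rocky-core.

```rust
use std::collections::BTreeMap;

/// Hypothetical alias: column name -> (tag key -> tag value).
pub type ColumnTags = BTreeMap<String, BTreeMap<String, String>>;

/// Hypothetical error type standing in for the engine's own.
#[derive(Debug, PartialEq, Eq)]
pub enum GovernanceError {
    NotSupported(&'static str),
}

/// Simplified stand-in for the governance trait.
pub trait GovernanceAdapter {
    /// The default body errors, so an adapter has to opt in explicitly.
    fn apply_column_tags(
        &self,
        _table: &str,
        _column_tags: &ColumnTags,
    ) -> Result<(), GovernanceError> {
        Err(GovernanceError::NotSupported("apply_column_tags"))
    }
}

/// No-governance adapter: overrides to succeed without doing any work.
pub struct NoopGovernanceAdapter;

impl GovernanceAdapter for NoopGovernanceAdapter {
    fn apply_column_tags(
        &self,
        _table: &str,
        _column_tags: &ColumnTags,
    ) -> Result<(), GovernanceError> {
        Ok(())
    }
}

/// An adapter that has not declared its semantics inherits the error.
pub struct UndeclaredAdapter;

impl GovernanceAdapter for UndeclaredAdapter {}
```

The design choice this shows is that silence is never interpreted as support: an adapter that does nothing about column tags surfaces an error until someone decides whether "not applicable" or a real implementation is the right answer.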
Types added:
- MaskStrategy (Hash | Redact | Partial | None) — wire shape matching
the rocky.toml TOML (rename_all = "lowercase"). Derives JsonSchema.
- MaskingPolicy { column_strategies } — per-column resolved strategy
map. The config→adapter bridge resolves classification tags against
[mask] / [mask.<env>] and emits this.
Config surface (rocky.toml):
- [mask] holds workspace-default strategies keyed by classification
tag (pii = "hash"). [mask.<env>] overrides per environment. Parsed
via an untagged MaskEntry enum so serde tries scalar first, falls
through to nested-table shape. Unknown strategies hard-fail at load.
- [classifications].allow_unmasked — advisory list for suppressing
the upcoming W004 warning when a classification has no matching
strategy (e.g., internal-only discovery tags).
- RockyConfig::resolve_mask_for_env(env) — single entry point the
run/plan layers will call to produce the flat tag→strategy map.
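A rough sketch of the env-override resolution follows. It assumes that `[mask.<env>]` entries replace the workspace default for the same tag and that unlisted tags fall through to `[mask]`; the `MaskConfig` struct and its field names are hypothetical in-memory shapes, not the parsed config types from `config.rs`.

```rust
use std::collections::BTreeMap;

/// Strategy variants as listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum MaskStrategy {
    Hash,
    Redact,
    Partial,
    None,
}

/// Hypothetical in-memory shape of the parsed [mask] tables.
#[derive(Debug, Default)]
pub struct MaskConfig {
    /// [mask]: classification tag -> default strategy.
    pub defaults: BTreeMap<String, MaskStrategy>,
    /// [mask.<env>]: env name -> (tag -> overriding strategy).
    pub per_env: BTreeMap<String, BTreeMap<String, MaskStrategy>>,
}

impl MaskConfig {
    /// Produce the flat tag -> strategy map for an optional environment.
    pub fn resolve_for_env(&self, env: Option<&str>) -> BTreeMap<String, MaskStrategy> {
        // Start from the workspace defaults.
        let mut resolved = self.defaults.clone();
        // Layer the env-specific entries on top, if that env is declared.
        if let Some(overrides) = env.and_then(|e| self.per_env.get(e)) {
            for (tag, strategy) in overrides {
                resolved.insert(tag.clone(), *strategy);
            }
        }
        resolved
    }
}
```

With `env = None`, or an environment that has no table of its own, the result in this sketch is just the `[mask]` defaults.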
Model sidecar ([classification] block):
- ModelConfig / RawModelConfig gain classification: BTreeMap<String,
String>. Keys are column names, values are free-form classification
tags so teams can coin new ones without touching the engine.
SQL generation scaffolding (Databricks-flavored, rocky-core):
- catalog::generate_set_column_tags_sql — ALTER TABLE ... ALTER COLUMN
... SET TAGS for per-column Unity Catalog tagging.
- new masking module — generate_create_mask_sql (CREATE OR REPLACE
FUNCTION with sha2/redact/partial bodies), generate_set_mask_sql
(ALTER TABLE ... SET MASK), generate_drop_mask_sql. Function names
namespaced by env: rocky_mask_<strategy>_<env>.
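To make the statement shapes concrete, here is a small sketch of the column-tag DDL and the env-namespaced function name. The helper names and signatures are illustrative assumptions rather than the functions in `catalog.rs` / `masking.rs`, and the sketch deliberately performs no identifier quoting or literal escaping, which real code would need.

```rust
use std::collections::BTreeMap;

/// Build the per-column tag statement for one column.
/// Assumption: `table`, `column`, and the tag keys/values are already
/// validated and contain no quote characters; nothing is escaped here.
/// Returns `None` for an empty tag map so no `SET TAGS ()` is emitted.
pub fn set_column_tags_sql(
    table: &str,
    column: &str,
    tags: &BTreeMap<String, String>,
) -> Option<String> {
    if tags.is_empty() {
        return None;
    }
    let pairs: Vec<String> = tags
        .iter()
        .map(|(key, value)| format!("'{key}' = '{value}'"))
        .collect();
    Some(format!(
        "ALTER TABLE {table} ALTER COLUMN {column} SET TAGS ({})",
        pairs.join(", ")
    ))
}

/// Env-namespaced mask function name: rocky_mask_<strategy>_<env>.
pub fn mask_function_name(strategy: &str, env: &str) -> String {
    format!("rocky_mask_{strategy}_{env}")
}
```

Because each call covers a single column, a table with several classified columns yields one statement per column.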
Deferrals noted for follow-up commits:
- The SDK-trait (rocky-adapter-sdk) copy of GovernanceAdapter has
long lagged rocky-core's (it's missing the 4 workspace methods from
#226). Not backported here — that drift predates this PR and is out
of scope.
- CLI --env flag threading into run.rs: the resolver already takes
Option<&str>, but no callsite surfaces env yet. Lands in a follow-up
once the full run/plan pass is wired.
Tests: trait defaults + Noop overrides (rocky-core/src/traits.rs), SQL
generation (catalog.rs + masking.rs), config parsing + env-override
resolution (config.rs), sidecar classification parsing (models.rs).
* feat(engine/rocky-databricks): implement apply_column_tags + apply_masking_policy
Completes the Databricks half of the Wave A Agent 1 foundation. Unity
Catalog column tags are applied one statement per column (UC rejects
multi-column ALTER COLUMN in one DDL). Masking policies are applied in
two passes: CREATE OR REPLACE the backing functions per distinct
strategy/env, then ALTER TABLE ... ALTER COLUMN SET MASK (or DROP MASK
when the resolved strategy is None).
rocky-core::traits: MaskStrategy gains PartialOrd + Ord so BTreeSet can
dedupe strategy applications in apply_masking_policy.
rocky-databricks::catalog: new CatalogManager::set_column_tags helper
skipping empty tag maps (UC rejects SET TAGS ()).
rocky-databricks::governance: GovernanceAdapter impl for
DatabricksGovernanceAdapter gains both new methods. Pass 1 uses the
generate_create_mask_sql helper from rocky-core::masking with env-
namespaced function names (rocky_mask_<strategy>_<env>) for
idempotency. Pass 2 threads column→strategy through
generate_set_mask_sql / generate_drop_mask_sql. DROP is only emitted
when an explicit None overrides a prior masked tag; this keeps us
clear of Databricks' missing DROP MASK IF EXISTS form.
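The two-pass ordering can be sketched as a pure planning function. This is an assumption-laden simplification: `MaskStep` and `plan_masking` are hypothetical names, the plan is returned instead of executed, and a drop step is emitted for every `None` strategy, whereas the commit above restricts drops to cases where `None` overrides a previously masked tag.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum MaskStrategy {
    Hash,
    Redact,
    Partial,
    None,
}

/// Hypothetical description of one DDL statement to issue.
#[derive(Debug, PartialEq, Eq)]
pub enum MaskStep {
    CreateFunction(MaskStrategy),
    SetMask { column: String, strategy: MaskStrategy },
    DropMask { column: String },
}

/// Plan the statements for one table from its column -> strategy map.
pub fn plan_masking(column_strategies: &BTreeMap<String, MaskStrategy>) -> Vec<MaskStep> {
    // Pass 1: one backing function per distinct strategy. The BTreeSet
    // dedupes, which is why the strategy type needs Ord.
    let distinct: BTreeSet<MaskStrategy> = column_strategies
        .values()
        .copied()
        .filter(|s| *s != MaskStrategy::None)
        .collect();
    let mut steps: Vec<MaskStep> = distinct
        .into_iter()
        .map(MaskStep::CreateFunction)
        .collect();

    // Pass 2: one statement per column, in column-name order.
    for (column, strategy) in column_strategies {
        steps.push(match strategy {
            MaskStrategy::None => MaskStep::DropMask { column: column.clone() },
            s => MaskStep::SetMask { column: column.clone(), strategy: *s },
        });
    }
    steps
}
```

Creating the functions first means every `SET MASK` in the second pass refers to a function that already exists.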
* feat(engine/rocky-cli): wire column classification + masking reconcile in rocky run
Hooks the two new GovernanceAdapter methods from the classification +
masking foundation into the happy path of `rocky run`. After the model
DAG executes successfully, the main pipeline path now:
1. Reloads the project's `rocky_compiler::Project` (cheap re-walk of
`models_dir/`) to access each model's `[classification]` sidecar.
2. For every model with a non-empty classification map, builds a
column → {"classification": tag} map and calls
`GovernanceAdapter::apply_column_tags`.
3. Resolves the project-level `[mask]` / `[mask.<env>]` config via
`RockyConfig::resolve_mask_for_env(None)` into a tag → strategy
map, filters the model's classifications that resolve, and calls
`apply_masking_policy` with a populated `MaskingPolicy`.
Failures on either call emit a `warn!` and continue — mirroring the
`apply_grants` best-effort semantics earlier in the same function.
Models without a `[classification]` block short-circuit at the first
check with no adapter work.
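Steps 2 and 3 above amount to two small map transformations, sketched below. The function names are hypothetical and strategies are kept as plain strings to stay self-contained; the actual wiring in `rocky run` goes through `MaskingPolicy` and the adapter trait.

```rust
use std::collections::BTreeMap;

/// Hypothetical alias: column name -> (tag key -> tag value).
pub type ColumnTags = BTreeMap<String, BTreeMap<String, String>>;

/// Step 2: turn a model's column -> tag sidecar into the
/// column -> {"classification": tag} map handed to the adapter.
pub fn build_column_tags(classification: &BTreeMap<String, String>) -> ColumnTags {
    classification
        .iter()
        .map(|(column, tag)| {
            let tags = BTreeMap::from([("classification".to_string(), tag.clone())]);
            (column.clone(), tags)
        })
        .collect()
}

/// Step 3: keep only the columns whose tag resolves to a strategy
/// in the flat tag -> strategy map produced by the resolver.
pub fn resolve_column_strategies(
    classification: &BTreeMap<String, String>,
    tag_strategies: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    classification
        .iter()
        .filter_map(|(column, tag)| {
            tag_strategies
                .get(tag)
                .map(|strategy| (column.clone(), strategy.clone()))
        })
        .collect()
}
```

A column whose tag has no matching strategy is still tagged in this sketch but simply drops out of the masking map.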
Deliberate v1 scope:
- `env = None` is passed to the resolver; the `--env` CLI flag is a
follow-up. The resolver already accepts `Option<&str>`, so wiring
a choice is non-breaking once the flag lands.
- The `rocky plan` preview of these actions (the PlanOutput
tag/mask rows from waveplan §2 item 6) is deferred. `plan` would
need to walk the same resolver without a connected adapter — a
small shape-only follow-up.
- The `rocky-compiler` W004 warning for unresolved classification
tags (waveplan §2 item 5) is deferred — the `RockyConfig`
already retains `[classifications.allow_unmasked]` to suppress
the warning once it lands.
Codegen cascade: `MaskStrategy` / `MaskingPolicy` / the new `[mask]`
+ `[classification]` config shapes deriving `JsonSchema` surface
through the project-level `rocky-project.schema.json`. Regenerated:
- schemas/rocky_project.schema.json
- integrations/dagster/.../rocky_project_schema.py
- editors/vscode/schemas/rocky-project.schema.json
- editors/vscode/src/types/generated/rocky_project.ts
* chore(engine): cargo fmt + clippy fixes for CI
- Run `cargo fmt` to absorb the formatting drift flagged by the CI
rustfmt --check step across config.rs, masking.rs, models.rs,
traits.rs, and rocky-databricks/governance.rs.
- Replace `.get("confidential").is_none()` with
`!contains_key("confidential")` in the mask-resolver test per the
clippy `unnecessary_get_then_check` lint.
No behavior change; same test assertions, same SQL output.
* chore(engine): cargo fmt with rustfmt 1.95.0 (CI fix)
Summary
Unity Catalog users who isolate catalogs per workspace can now declaratively reconcile workspace bindings alongside grants in one governance pass. Previously, `[governance.isolation.workspace_ids]` was interpreted imperatively (bindings only ever added; drift never removed); this change makes the desired set declarative, so undeclared bindings are removed on next run and access-level changes (`READ_WRITE` -> `READ_ONLY`) land as a single remove + add.

Design decision
Per the reprio doc's Q4, the reconcile extends the existing `rocky-databricks/src/permissions.rs::reconcile()` entry point rather than spawning a parallel `governance.rs` pass:

- `PermissionManager::reconcile_access` is the in-adapter combined pass: takes desired grants + desired workspace bindings, produces an `AccessDiff` grouping both deltas, applies in one flow. The grants-only `reconcile()` stays intact so existing callers are unaffected.
- `GovernanceAdapter` trait gains `list_workspace_bindings` + `remove_workspace_binding` primitives (defaulting to "not supported" error so new adapters must declare their semantics).
- `rocky run` drives the combined reconcile through these trait primitives: list current, diff against desired, apply via `bind_workspace` / `remove_workspace_binding`. Keeping `rocky run` at the trait layer means non-Databricks adapters (Snowflake, BigQuery, DuckDB, Noop) see an empty-diff no-op; only the Databricks implementation actually hits the UC REST API.

Config surface
No new fields: the existing `[governance.isolation.workspace_ids]` TOML block is reused as the desired-state source. This is an interpretation flip (imperative-add -> declarative-reconcile), not a schema change. No `just codegen` output drift; no migration required for downstream config.

Behavior change for upgraders
Pre-PR: bindings in `workspace_ids` were added, never removed. Post-PR: bindings not declared in `workspace_ids` are removed on next run. Anyone who hand-added a workspace binding outside `rocky.toml` should declare it in config before upgrading.

Test plan
- `cargo test -p rocky-core -p rocky-databricks -p rocky-cli`: 1016 + 100 + 207 pass, plus 20 wiremock integration tests (13 new across list / add-remove PATCH shape / combined `reconcile_access` pass).
- `cargo clippy --workspace --all-targets -- -D warnings`: clean.
- `cargo fmt --all --check`: clean.
- `just codegen`: no drift; no schema-affecting changes.
- `just regen-fixtures`: byte-stable.
- `uv run pytest` in `integrations/dagster/`: 312 pass.
- `npm run compile` in `editors/vscode/`: clean.

Non-Databricks adapters: `Snowflake`, `BigQuery`, and `NoopGovernanceAdapter` override the new trait methods to return `Ok(vec![])` / `Ok(())`, matching the existing "not applicable" semantics of `bind_workspace` / `set_isolation`. Future adapters that grow an analogous concept can override; the error-returning trait defaults force an explicit choice.