feat(engine): role-graph reconciliation (Wave C-1)#243

Merged
hugocorreia90 merged 1 commit into main from feat/governance-role-graph
Apr 23, 2026

@hugocorreia90

Summary

Wave C-1 of the governance waveplan — hierarchical roles declared in rocky.toml, flattened via a DAG walk, and reconciled against the warehouse via a new GovernanceAdapter::reconcile_role_graph trait method.

Config surface

[role.reader]
permissions = ["SELECT", "USE CATALOG", "USE SCHEMA"]

[role.analytics_engineer]
inherits = ["reader"]
permissions = ["MODIFY"]        # added to reader's, flattened at reconcile

[role.admin]
inherits = ["analytics_engineer"]
permissions = ["MANAGE"]

admin resolves to SELECT + USE CATALOG + USE SCHEMA + MODIFY + MANAGE via the inheritance chain.

Core types

  • RoleConfig { inherits, permissions } on RockyConfig.roles — parsed from [role.<name>] blocks.
  • ResolvedRole in rocky-core::ir — flattened per-role result (full permission set + retained inherits_from for audit/debug).
  • RoleGraphError in rocky-core::role_graph: Cycle { role, path }, UnknownParent { role, parent }, UnknownPermission { role, permission }.

Algorithm

  • DFS traversal with in-stack tracking for cycle detection (produces the exact cycle path).
  • BTreeSet<Permission> dedup; BTreeMap output for deterministic iteration.
  • 12 unit tests: linear inheritance, diamond dedup, self-/2-/3-node cycles, unknown parent, unknown permission, empty graph, grouping-only roles, deterministic sort.
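
The flattening pass described above can be sketched roughly as follows. This is a minimal illustration, not the code in `rocky-core/src/role_graph.rs`: permissions are plain strings here (so `UnknownPermission` is left out), `RoleConfig` is a simplified stand-in, and the `visit` helper is a hypothetical name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-in for the PR's `RoleConfig`; permissions are plain strings.
#[derive(Debug, Clone, Default)]
pub struct RoleConfig {
    pub inherits: Vec<String>,
    pub permissions: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RoleGraphError {
    Cycle { role: String, path: Vec<String> },
    UnknownParent { role: String, parent: String },
}

/// Flatten every role into its full, deduplicated permission set.
pub fn flatten_role_graph(
    roles: &BTreeMap<String, RoleConfig>,
) -> Result<BTreeMap<String, BTreeSet<String>>, RoleGraphError> {
    let mut done = BTreeMap::new();
    for name in roles.keys() {
        let mut stack = Vec::new();
        visit(name, roles, &mut stack, &mut done)?;
    }
    Ok(done)
}

// Hypothetical helper: depth-first walk with an explicit in-stack list.
fn visit(
    name: &str,
    roles: &BTreeMap<String, RoleConfig>,
    stack: &mut Vec<String>,
    done: &mut BTreeMap<String, BTreeSet<String>>,
) -> Result<BTreeSet<String>, RoleGraphError> {
    if let Some(perms) = done.get(name) {
        return Ok(perms.clone());
    }
    // A role already on the DFS stack means we looped back: report the path.
    if let Some(pos) = stack.iter().position(|r| r == name) {
        let mut path = stack[pos..].to_vec();
        path.push(name.to_string());
        return Err(RoleGraphError::Cycle { role: name.to_string(), path });
    }
    let cfg = &roles[name]; // only called with names known to exist in `roles`
    stack.push(name.to_string());
    let mut perms: BTreeSet<String> = cfg.permissions.iter().cloned().collect();
    for parent in &cfg.inherits {
        if !roles.contains_key(parent) {
            return Err(RoleGraphError::UnknownParent {
                role: name.to_string(),
                parent: parent.clone(),
            });
        }
        // The BTreeSet union dedups permissions reached via several parents.
        perms.extend(visit(parent, roles, stack, done)?);
    }
    stack.pop();
    done.insert(name.to_string(), perms.clone());
    Ok(perms)
}
```

Because both the output map and the permission sets are B-tree based, iteration order is sorted and stable across runs, which is what makes the result deterministic.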

Trait extension

async fn reconcile_role_graph(
    &self,
    roles: &BTreeMap<String, ResolvedRole>,
) -> AdapterResult<()>;

Default-unsupported on all adapters; NoopGovernanceAdapter overrides to Ok(()).
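
A rough sketch of the default/noop shape, for readers unfamiliar with the Wave A pattern. It is deliberately synchronous to stay self-contained (the real method is `async`), and `AdapterError`, the cut-down `ResolvedRole`, and `WarehouseWithoutRoles` are illustrative stand-ins rather than the actual `rocky-core` types.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Illustrative error type; the real `AdapterResult` lives in rocky-core.
#[derive(Debug, PartialEq, Eq)]
pub enum AdapterError {
    Unsupported(&'static str),
}
pub type AdapterResult<T> = Result<T, AdapterError>;

// Cut-down stand-in for the flattened per-role result.
#[derive(Debug, Default)]
pub struct ResolvedRole {
    pub permissions: BTreeSet<String>,
    pub inherits_from: Vec<String>,
}

pub trait GovernanceAdapter {
    // Trait default: adapters that do not override this report "not supported".
    fn reconcile_role_graph(
        &self,
        _roles: &BTreeMap<String, ResolvedRole>,
    ) -> AdapterResult<()> {
        Err(AdapterError::Unsupported("reconcile_role_graph"))
    }
}

/// Hypothetical adapter that relies on the trait default.
pub struct WarehouseWithoutRoles;
impl GovernanceAdapter for WarehouseWithoutRoles {}

/// The noop adapter overrides the default and succeeds without doing anything.
pub struct NoopGovernanceAdapter;
impl GovernanceAdapter for NoopGovernanceAdapter {
    fn reconcile_role_graph(
        &self,
        _roles: &BTreeMap<String, ResolvedRole>,
    ) -> AdapterResult<()> {
        Ok(())
    }
}
```

With this split, a new adapter compiles without touching role reconciliation, and callers see an explicit error instead of a silent skip unless the adapter opts in.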

Databricks impl — v1 log-only

Validates each rocky_role_<name> against Databricks principal syntax (fails fast on bad names) and emits a structured debug! record per role. Does not call SCIM or emit GRANT DDL in v1 — see follow-ups below.
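
As an illustration of the fail-fast name check, here is a minimal sketch of a `role_group_name()`-style helper. The validation rule (non-empty, ASCII letters, digits and underscores only) is an assumption made for this example; the actual Databricks principal syntax accepted by the PR may be broader or narrower.

```rust
/// Maps a role name to its group name, rejecting names that could break
/// a later GRANT statement.
/// Assumed rule (not taken from the PR): non-empty, ASCII alphanumerics
/// and underscores only.
pub fn role_group_name(role: &str) -> Result<String, String> {
    let valid = !role.is_empty()
        && role.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    if valid {
        Ok(format!("rocky_role_{role}"))
    } else {
        Err(format!("invalid role name for a group principal: {role:?}"))
    }
}
```

Rejecting a name such as ``bad`name`` at this stage means the deferred GRANT path never has to quote or escape it.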

Runtime wiring

rocky run now calls rocky_cfg.role_graph() (flatten) + governance_adapter.reconcile_role_graph(&resolved) alongside the Wave A classification / masking applies. Best-effort semantics: failures warn!, pipeline continues.

Explicit v1 deferrals (tracked follow-ups, not in this PR)

  1. SCIM client in rocky-databricks — create / reconcile Unity Catalog groups programmatically. The rocky_role_<name> naming is pinned; SCIM work is isolated to group lifecycle.
  2. Per-catalog GRANT emission. The trait sig has no catalog parameter, so the Databricks impl can't emit GRANT ... ON CATALOG today. Two paths forward — (a) extend the sig with &[catalog], or (b) have run.rs iterate managed catalogs and call per catalog. Either is a small, isolated change.
  3. --env threading into reconcile_role_graph for parity with masking's env override.

Files touched (16 files, +1040 / -2 LOC)

| Area | Change |
| --- | --- |
| rocky-core/src/role_graph.rs (new) | Flattening + error types + 12 unit tests |
| rocky-core/src/ir.rs | ResolvedRole, Permission Ord/PartialOrd/FromStr, UnknownPermission |
| rocky-core/src/config.rs | RoleConfig, RockyConfig.roles, role_graph() helper |
| rocky-core/src/traits.rs | New trait method + default + Noop override + 2 tests |
| rocky-core/src/{bridge,cross_engine,docs,unified_dag}.rs | Add roles: Default::default() to test literals |
| rocky-databricks/src/governance.rs | v1 log-only impl + role_group_name() + 4 unit tests |
| rocky-databricks/tests/wiremock_tests.rs | 2 trait-dispatch tests pinning v1 as no-network |
| rocky-cli/src/commands/run.rs | Best-effort wiring after Wave A masking reconcile |
| schemas/rocky_project.schema.json + dagster Pydantic + vscode TS | Codegen cascade for RoleConfig |

Test plan

  • cargo test --workspace --features rocky-databricks/test-support — 1097+22 pass
  • cargo clippy --workspace --all-targets -- -D warnings — green
  • cargo fmt --check — green (rustfmt 1.95.0)
  • just codegen idempotent (re-run produces no drift)
  • Declare a 3-level inheritance chain in a playground POC rocky.toml, run rocky run against a Databricks workspace with pre-created rocky_role_* groups, confirm debug logs list the right flattened permissions (real GRANT emission is Wave C-1-followup)
  • Cycle + unknown-parent error surfacing in rocky compile / rocky run

Adds hierarchical `[role.*]` declarations to `rocky.toml` and reconciles
them against the warehouse's native role/group system via a new
`GovernanceAdapter::reconcile_role_graph` trait method.

## Config (rocky-core)

```toml
[role.reader]
permissions = ["SELECT", "USE CATALOG", "USE SCHEMA"]

[role.analytics_engineer]
inherits = ["reader"]
permissions = ["MODIFY"]

[role.admin]
inherits = ["analytics_engineer"]
permissions = ["MANAGE"]
```

`admin` resolves to the union of SELECT + USE CATALOG + USE SCHEMA +
MODIFY + MANAGE. Flattening lives in `rocky-core/src/role_graph.rs`:
DFS cycle detection + unknown-parent validation + deterministic
dedup via `BTreeSet<Permission>` (Permission now derives Ord).

## Trait surface

`GovernanceAdapter::reconcile_role_graph(roles: &BTreeMap<String,
ResolvedRole>) -> AdapterResult<()>` with the same default/noop shape
Wave A used for `apply_column_tags` / `apply_masking_policy`. Trait
default errors "not supported"; NoopGovernanceAdapter returns Ok so
pipelines that declare roles against non-governance warehouses
degrade gracefully.

## Databricks impl (v1: log-only)

UC has groups rather than roles; Rocky maps each role to a UC group
named `rocky_role_<name>`. A complete impl would (1) create the group
via the SCIM API, then (2) emit `GRANT <permission> ON CATALOG ... TO
<group>` for every catalog Rocky manages. Neither piece exists in
`rocky-databricks` today:

- **No SCIM client.** Wiring one up is out of scope for this PR — the
  task spec explicitly calls this out as a follow-up to avoid a "huge
  new auth surface." Tracked for a subsequent wave.
- **No catalog context in the trait sig.** `reconcile_role_graph(&self,
  roles: &...)` has no catalog parameter, so per-catalog GRANT
  emission can't happen here — it needs either a sig tweak or a
  coordinator in `run.rs` that iterates managed catalogs.

In v1 the impl validates each group name against Databricks'
principal syntax (so the deferred GRANT path can't generate invalid
SQL later) and emits a structured `debug` log of the reconciled role
graph.

## Runtime wiring (rocky-cli)

`rocky run` calls `rocky_cfg.role_graph()` after the DAG completes
successfully and dispatches the result to
`governance_adapter.reconcile_role_graph(...)`.
Best-effort: flatten errors and adapter errors both `warn!` without
aborting, mirroring the Wave A classification+masking wiring.
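
The best-effort contract can be sketched as below. This is an illustration only: the function name is hypothetical, closures stand in for `role_graph()` and the adapter call, and `eprintln!` stands in for `warn!`.

```rust
/// Runs flatten, then reconcile. Any failure is turned into a warning and
/// returned to the caller; nothing here aborts the surrounding pipeline.
pub fn reconcile_roles_best_effort(
    flatten: impl FnOnce() -> Result<Vec<String>, String>,
    reconcile: impl FnOnce(&[String]) -> Result<(), String>,
) -> Vec<String> {
    let mut warnings = Vec::new();
    match flatten() {
        // A flatten error (cycle, unknown parent) skips the adapter call.
        Err(e) => warnings.push(format!("role graph flatten failed: {e}")),
        Ok(roles) => {
            if let Err(e) = reconcile(&roles) {
                warnings.push(format!("role graph reconcile failed: {e}"));
            }
        }
    }
    for w in &warnings {
        eprintln!("warn: {w}"); // stand-in for the `warn!` macro
    }
    warnings
}
```

Returning the warnings rather than a `Result` makes it hard for a caller to accidentally propagate a governance failure with `?` and abort the run.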

## Tests

- 12 unit tests for `flatten_role_graph` covering linear inheritance,
  diamond dedup, self/2-node/3-node cycles, unknown parent, unknown
  permission, empty graph, grouping-only (no-permission) roles, and
  deterministic sort order.
- 2 trait-default + noop tests for `reconcile_role_graph`.
- 4 Databricks unit tests (group-name convention, empty map,
  multi-role, invalid group-name rejection).
- 2 wiremock tests pin the trait-dispatch contract and verify no HTTP
  call is made in v1.

## Codegen

`roles: BTreeMap<String, RoleConfig>` added to `RockyConfig` triggers
the schemas cascade: `schemas/rocky_project.schema.json` +
`integrations/dagster/.../rocky_project_schema.py` +
`editors/vscode/src/types/generated/rocky_project.ts` all regenerated
via `just codegen`.

## Follow-ups

1. SCIM client in `rocky-databricks` for UC group creation.
2. Extend `reconcile_role_graph` (or add a coordinator in `run.rs`) so
   per-catalog GRANT emission lands.
3. Thread `--env` into the `reconcile_role_graph` call site for parity
   with the masking resolver, if per-env role sets become a demand.
@hugocorreia90 hugocorreia90 merged commit 75e7ce8 into main Apr 23, 2026
15 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/governance-role-graph branch April 23, 2026 23:02
hugocorreia90 added a commit that referenced this pull request Apr 23, 2026
* chore: release engine-v1.16.0 + dagster-v1.12.0 + vscode-v1.8.0

Bundles the governance waveplan — five merged PRs (#240 audit trail,
#241 classification + masking, #242 rocky compliance, #243 role-graph,
#244 retention) on top of three FR-004 / state-path follow-ups
(#237 error-path idempotency, #238 state-path unification,
#239 success-path idempotency finalize).

Version bumps: engine 1.15.0 → 1.16.0, dagster-rocky 1.11.0 → 1.12.0,
vscode extension 1.7.0 → 1.8.0.

CHANGELOGs updated for all three artifacts.

* chore(dagster): regen test fixtures for 1.16.0

Fixture drift flagged by CI (`codegen-drift.yml`). Fixtures are captured
from the live engine binary — the version-string bump to 1.16.0 ripples
through every `version` field, and the Wave A audit-trail work (#240)
adds the 8 `RunRecord` fields to `rocky history` output, which the
playground POC now emits.

Regenerated via `just regen-fixtures` against
`examples/playground/pocs/00-foundations/00-playground-default`.

* chore(scripts): sentinel top-level version field in fixture normaliser

Every CLI output's top-level `version` is `env!("CARGO_PKG_VERSION")`
at emit time, so every engine version bump rippled through all 38
captured fixtures — every release PR fought `codegen-drift.yml` until
`just regen-fixtures` was re-run.

Extend the existing `AUDIT_FIELD_SENTINELS` set (Wave A already
sentineled the audit-trail `rocky_version` field + hostname / git
commit / etc.) with the top-level `version` key → `"0.0.0-SENTINEL"`.
After this, version bumps only touch Cargo.toml / pyproject.toml /
package.json / CHANGELOGs — never fixtures.

Regen captured all 38 fixtures; top-level `version` now uniformly
renders as `"0.0.0-SENTINEL"`.
hugocorreia90 added a commit that referenced this pull request Apr 24, 2026
…ion actions (#251)

Closes the `--env <name>` plumbing gap left over from the 1.16.0 governance
waveplan: `RockyConfig::resolve_mask_for_env(Option<&str>)` already accepted
an env, but `rocky run` / `rocky plan` hard-coded `None`. This wires the flag
through on both commands so `[mask.<env>]` overrides resolve over the
workspace `[mask]` defaults, matching the `--env` shape `rocky compliance`
already uses.
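
To make the override order concrete, here is a minimal sketch of env resolution. `MaskConfig` and its two maps are hypothetical stand-ins for however `RockyConfig` stores `[mask]` and `[mask.<env>]`; only the method name and its `Option<&str>` argument come from the commit message, and the tag and strategy names in the usage note are made up.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage: tag -> strategy, plus per-env overrides.
#[derive(Default)]
pub struct MaskConfig {
    /// Workspace `[mask]` defaults.
    pub defaults: BTreeMap<String, String>,
    /// `[mask.<env>]` overrides, keyed by env name.
    pub per_env: BTreeMap<String, BTreeMap<String, String>>,
}

impl MaskConfig {
    /// Start from the defaults, then let the active env's entries win.
    /// `None` or an env without overrides yields the defaults unchanged.
    pub fn resolve_mask_for_env(&self, env: Option<&str>) -> BTreeMap<String, String> {
        let mut resolved = self.defaults.clone();
        if let Some(overrides) = env.and_then(|e| self.per_env.get(e)) {
            resolved.extend(overrides.iter().map(|(k, v)| (k.clone(), v.clone())));
        }
        resolved
    }
}
```

For example, with a default of `pii = "hash"` and a `dev` override of `pii = "none"`, resolving with `Some("dev")` yields `none` for `pii`, while `None` (the previously hard-coded value) always yields `hash`.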

`PlanOutput` gains three additive action-row collections — a dry-run view of
the control-plane governance work the post-DAG reconcile pass in `rocky run`
would do:

- `classification_actions`: `(model, column, tag)` triples from
  `[classification]` sidecars.
- `mask_actions`: `(model, column, tag, resolved_strategy)` where the tag
  resolves under the active env; unresolved tags are a `rocky compliance`
  diagnostic, not a preview row.
- `retention_actions`: models with `retention = "<N>[dy]"` sidecar, carrying
  the parsed `duration_days` + a warehouse-native `warehouse_preview`
  (Databricks renders the Delta TBLPROPERTIES pair; Snowflake renders
  `DATA_RETENTION_TIME_IN_DAYS`; other adapters emit `null`).
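
As a sketch of how a `"<N>[dy]"` retention value might be turned into `duration_days`: the function name is hypothetical, and the conversion of one year to 365 days is an assumption, since the commit message does not state it.

```rust
/// Parses "<N>d" or "<N>y" into a number of days.
/// Assumption: a year counts as 365 days.
/// Returns None for anything that does not match the shape or overflows.
pub fn parse_retention_days(spec: &str) -> Option<u32> {
    let unit = spec.chars().last()?;
    let digits = &spec[..spec.len() - unit.len_utf8()];
    // Require plain ASCII digits so inputs like "+5d" are rejected.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let n: u32 = digits.parse().ok()?;
    match unit {
        'd' => Some(n),
        'y' => n.checked_mul(365),
        _ => None,
    }
}
```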

All three fields use `skip_serializing_if = "Vec::is_empty"` so existing JSON
consumers on projects without governance config are byte-stable. `PlanOutput.env`
carries the active `--env` under the same treatment.

Role-graph reconcile stays env-invariant. `rocky.toml` has no `[role.<env>]`
override shape (contrast `[mask.<env>]`); roles represent deployment-wide
permission groups while masks vary per env. `--env` therefore does NOT flow
into `reconcile_role_graph`. Classification tagging and retention policies
are also env-invariant by the same reasoning.

Regenerated bindings via `just codegen`:
- `schemas/plan.schema.json`
- `integrations/dagster/src/dagster_rocky/types_generated/plan_schema.py`
- `editors/vscode/src/types/generated/plan.ts`

Dagster `PlanResult` hand-written model picks up the four new fields
(`env`, `classification_actions`, `mask_actions`, `retention_actions`) and
re-exports `ClassificationAction` / `MaskAction` / `RetentionAction` from
the package barrel. New `PLAN_WITH_GOVERNANCE` scenario + `plan_with_governance_json`
fixture + `test_parse_plan_with_governance` parse-guard.

Follow-up of the governance waveplan shipped in engine-v1.16.0 (#241, #243, #244).
hugocorreia90 added a commit that referenced this pull request Apr 24, 2026
…le-graph reconciliation (#254)

Completes the Wave C-1 role-graph reconciler that shipped log-only in
#243. The v1 impl validated role names and emitted `debug!` but never
created UC groups or emitted grants — this wires both pieces up so
`reconcile_role_graph` becomes real.

New `rocky_databricks::scim` module wraps the workspace-level
`/api/2.0/preview/scim/v2/Groups` surface:

- `create_group(display_name)` — POSTs with SCIM 2.0 `schemas` envelope;
  idempotent via POST-first / 409-fallback-to-GET-by-displayName.
- `get_group_by_name(display_name)` — filter lookup.
- `delete_group` deferred (ADD-ONLY v1 scope).

Reuses the existing `Auth` stack (PAT + OAuth M2M) — no new auth path.
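
The POST-first / 409-fallback flow can be sketched against an in-memory fake. `ScimTransport` and `FakeScim` are hypothetical names introduced for this example and do not reflect the real `rocky_databricks::scim` types or any HTTP client.

```rust
/// Minimal stand-in for the HTTP layer; not the PR's real client.
pub trait ScimTransport {
    /// POST to the Groups endpoint; returns (status, group id if created).
    fn post_group(&mut self, display_name: &str) -> (u16, Option<String>);
    /// GET on the Groups endpoint filtered by displayName.
    fn get_group_by_name(&mut self, display_name: &str) -> Option<String>;
}

/// Idempotent create: try POST first, fall back to a lookup on 409 Conflict.
pub fn create_group(t: &mut dyn ScimTransport, display_name: &str) -> Result<String, String> {
    match t.post_group(display_name) {
        (201, Some(id)) => Ok(id),
        (409, _) => t
            .get_group_by_name(display_name)
            .ok_or_else(|| format!("409 for {display_name} but lookup found nothing")),
        (status, _) => Err(format!("unexpected SCIM status {status}")),
    }
}

/// In-memory fake used to exercise both paths.
#[derive(Default)]
pub struct FakeScim {
    groups: Vec<String>,
    pub posts: usize,
    pub gets: usize,
}

impl ScimTransport for FakeScim {
    fn post_group(&mut self, name: &str) -> (u16, Option<String>) {
        self.posts += 1;
        if self.groups.iter().any(|g| g == name) {
            return (409, None); // group already exists
        }
        self.groups.push(name.to_string());
        (201, Some(format!("id-{}", self.groups.len())))
    }

    fn get_group_by_name(&mut self, name: &str) -> Option<String> {
        self.gets += 1;
        self.groups
            .iter()
            .position(|g| g == name)
            .map(|i| format!("id-{}", i + 1))
    }
}
```

POST-first keeps the common first-run case to a single round-trip; the extra GET is only paid when the group already exists.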

`DatabricksGovernanceAdapter::reconcile_role_graph` now:

1. Creates a `rocky_role_<name>` group per role via SCIM (catalog-
   independent, once per role).
2. Emits ``GRANT <permission> ON CATALOG <catalog> TO `rocky_role_<name>` ``
   per `(role, catalog, permission)` triple against every catalog the
   current `rocky run` touched.
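
The per-triple expansion in step 2 can be sketched as follows. `grant_statements` is a hypothetical helper that only builds the SQL strings; the real adapter also has to execute them, and the catalog names in the usage note are made up.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Builds one GRANT per (role, catalog, permission) triple.
/// Role names are assumed to be validated already, so they are safe to
/// place inside backticks.
pub fn grant_statements(
    roles: &BTreeMap<String, BTreeSet<String>>,
    catalogs: &[&str],
) -> Vec<String> {
    let mut out = Vec::new();
    for (role, perms) in roles {
        for catalog in catalogs {
            for perm in perms {
                out.push(format!(
                    "GRANT {perm} ON CATALOG {catalog} TO `rocky_role_{role}`"
                ));
            }
        }
    }
    out
}
```

With a `reader` role holding one permission, an `admin` role holding two, and catalogs `bronze` and `silver`, this produces six statements; with an empty catalog list it produces none, which mirrors the "SCIM fires, zero GRANTs" case in the tests.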

Trait signature change: `GovernanceAdapter::reconcile_role_graph` gains
a `catalogs: &[&str]` parameter (Option A). Keeps SCIM creates outside
the catalog loop — avoids N× redundant round-trips that Option B
(per-catalog calls from the caller) would force. In-process trait only;
no JSON / FFI / bindings surface to regenerate. `run.rs` collects
managed catalogs unconditionally — not gated on `auto_create_catalogs`,
so pre-provisioned catalogs are covered.

ADD-ONLY semantics: groups are never deleted and grants are never
revoked by this path. A role removed from rocky.toml leaves its group
and grants in place until a future reconcile mode adds delete. Adapters
constructed via `without_workspace()` (no SCIM client) fall back to
the v1 log-only behaviour.

Tests: SCIM unit tests (URL / body serialization / response parsing)
plus wiremock end-to-end covering happy-path create (201), idempotent
409 → GET fallback, 2 roles × 2 catalogs with varied permissions (2
SCIM creates + 6 GRANTs), empty-catalogs (SCIM fires, zero GRANTs),
empty-roles no-op, and the log-only fallback when SCIM isn't
configured.