fix(engine/rocky-cli): finalize idempotency on success for non-replication dispatches (FR-004 F2) by hugocorreia90 · Pull Request #239 · rocky-data/rocky

hugocorreia90 · 2026-04-23T18:29:20Z

Summary

Closes the success-path leak that the #237 (FR-004 F1) agent flagged as a follow-up. rocky run --idempotency-key K against a Transformation / Quality / Snapshot / Load pipeline successfully returned Ok(()) but left its InFlight claim in place — the four non-replication dispatch arms delegated straight to run_local::run_* / load::run_load and returned without calling finalize_idempotency, and the error-path wrapper shipped in #237 only fires on is_err(). A retry with the same key inside in_flight_ttl_hours (default 24h) then returned skipped_in_flight for up to 24h instead of skipped_idempotent.

Each non-replication arm now stamps Succeeded on its happy-path exit via a new finalize_idempotency_on_success helper. Replication is unchanged — it already finalized correctly at its main exit in #235.

Before / After per arm

All four arms previously had the shape return super::run_local::run_X(...).await; — the delegated helper returned Ok(()), the arm returned directly from run(), and the IdempotencyCtx stayed Some(...) (i.e. InFlight) until the outer wrapper's is_err() check skipped the error-path finalize.

Arm	Before (happy path)	After (happy path)
`PipelineConfig::Transformation`	`return super::run_local::run_transformation(...).await;` — `InFlight` lingers	`.await?` then `finalize_idempotency_on_success(...)` then `return Ok(())` — stamped `Succeeded`
`PipelineConfig::Quality`	`return super::run_local::run_quality(...).await;` — `InFlight` lingers	same pattern — stamped `Succeeded`
`PipelineConfig::Snapshot`	`return super::run_local::run_snapshot(...).await;` — `InFlight` lingers	same pattern — stamped `Succeeded`
`PipelineConfig::Load`	`return super::load::run_load(...).await;` — `InFlight` lingers	same pattern — stamped `Succeeded`

On the error path, each arm's .await? still propagates out of the async block and the run_result.is_err() wrapper in #237 releases the claim. The one-shot .take() on IdempotencyCtx makes the success-path finalize and the error-path wrapper mutually exclusive — whichever fires first drains the ctx.

New helper

finalize_idempotency_on_success mirrors finalize_idempotency_on_error:

Takes &mut Option<IdempotencyCtx> and .take()s the ctx (one-shot).
Opens its own StateStore from the canonical state_path owned by run() — the four delegated helpers in run_local.rs use config_dir.join(".rocky_state") which may differ from the claim's location, so finalizing at state_path matches the claim exactly.
Always stamps FinalOutcome::Succeeded.
Best-effort throughout — any error swallowed with warn! so bookkeeping flake never flips a successful run to non-zero.

Why a dedicated helper instead of calling finalize_idempotency with a synthesized RunOutput? Semantically explicit. Constructing a default RunOutput just to feed derive_run_status() depends on an implicit "default = Success" assumption — if anyone changes that default, the success-path finalize silently drops to the skip-variant branch and the leak returns.

New test

transformation_success_stamps_idempotency_succeeded_not_inflight drives run() end-to-end against a minimal transformation pipeline (DuckDB :memory: adapter, non-existent models dir so run_transformation warns-and-returns-Ok without needing live DDL) with --idempotency-key set. After the run returns Ok(()), it re-opens the redb state store and asserts the entry is IdempotencyState::Succeeded, not InFlight. Transformation is the cleanest arm to exercise — no warehouse credentials, no live DDL, pure happy-path dispatch.

Validated by commenting out the new finalize_idempotency_on_success call on the Transformation arm — the test fails with InFlight != Succeeded. Restored, it passes.

Why #237's tests didn't catch this

#237's tests (finalize_on_error_releases_inflight_claim_allowing_immediate_retry, finalize_on_error_is_one_shot_and_idempotent_when_ctx_already_taken, finalize_idempotency_is_one_shot_take) exercise the helpers directly with hand-built IdempotencyCtx values. None of them drive run() end-to-end through a non-replication dispatch arm with --idempotency-key set. The happy-path gap was invisible to the test suite — the wrapper's is_err() check made the error path airtight, but nothing exercised the Ok case through a non-replication arm.

The new F2 test closes that gap by driving run() directly, which is why it catches the issue.

Test plan

cargo build --release
cargo test -p rocky-cli — 236 passed (including 4 idempotency tests: F1's 3 + F2's new 1)
cargo test -p rocky-cli idempot — 3 passed (the original F1 test name contains "releases_inflight" not "idempot" so shows up under a broader filter)
cargo test -p rocky-cli run::tests — 22 passed
cargo test -p rocky-core --lib — 1053 passed
cargo clippy --all-targets --all-features -- -D warnings — clean
cargo fmt --all -- --check — clean
just codegen — zero schema drift
just regen-fixtures — zero fixture drift

Follow-ups not in scope

No codegen cascade (no *Output struct or RunStatus variant changes).
No Dagster / VS Code changes — the fix is engine-internal.
No CHANGELOG.md entry for 1.15.0 (the known-follow-up list at line 65 already called F1 out; F2 was implicit). Added an [Unreleased] / Fixed entry for the next engine release.

…ation dispatches (FR-004 F2) Before this change, a `rocky run --idempotency-key K` against a Transformation / Quality / Snapshot / Load pipeline that returned `Ok(())` left its `InFlight` claim in place. The four non-replication dispatch arms delegated straight to `run_local::run_*` / `load::run_load` and returned without calling `finalize_idempotency`; the error-path wrapper shipped in #237 only fires on `is_err()`. A retry with the same key inside `in_flight_ttl_hours` (default 24h) then returned `skipped_in_flight` for up to 24h instead of `skipped_idempotent`. Each arm now stamps `Succeeded` on its happy-path exit via a new `finalize_idempotency_on_success` helper that mirrors `finalize_idempotency_on_error`: one-shot `.take()` on the shared `IdempotencyCtx`, opens its own `StateStore` from the canonical `state_path` owned by `run()`, best-effort on error. The one-shot semantics keep the outer error-path wrapper a no-op once the success path has drained the ctx. Replication was never affected — it already finalized correctly at its main exit in #235. Tested by driving `run()` end-to-end against a transformation pipeline with `--idempotency-key` set and asserting the redb entry is `Succeeded`, not `InFlight`. Pre-fix, the assertion fails with `InFlight != Succeeded`. Existing F1 coverage (`finalize_on_error_releases_inflight_claim_allowing_immediate_retry`, `finalize_on_error_is_one_shot_and_idempotent_when_ctx_already_taken`, `finalize_idempotency_is_one_shot_take`) continues to pass — the F1 wrapper remains safely idempotent against the new success-path finalize.

* chore: release engine-v1.16.0 + dagster-v1.12.0 + vscode-v1.8.0 Bundles the governance waveplan — five merged PRs (#240 audit trail, #241 classification + masking, #242 rocky compliance, #243 role-graph, #244 retention) on top of three FR-004 / state-path follow-ups (#237 error-path idempotency, #238 state-path unification, #239 success-path idempotency finalize). Version bumps: engine 1.15.0 → 1.16.0, dagster-rocky 1.11.0 → 1.12.0, vscode extension 1.7.0 → 1.8.0. CHANGELOGs updated for all three artifacts. * chore(dagster): regen test fixtures for 1.16.0 Fixture drift flagged by CI (`codegen-drift.yml`). Fixtures are captured from the live engine binary — the version-string bump to 1.16.0 ripples through every `version` field, and the Wave A audit-trail work (#240) adds the 8 `RunRecord` fields to `rocky history` output, which the playground POC now emits. Regenerated via `just regen-fixtures` against `examples/playground/pocs/00-foundations/00-playground-default`. * chore(scripts): sentinel top-level version field in fixture normaliser Every CLI output's top-level `version` is `env!("CARGO_PKG_VERSION")` at emit time, so every engine version bump rippled through all 38 captured fixtures — every release PR fought `codegen-drift.yml` until `just regen-fixtures` was re-run. Extend the existing `AUDIT_FIELD_SENTINELS` set (Wave A already sentineled the audit-trail `rocky_version` field + hostname / git commit / etc.) with the top-level `version` key → `"0.0.0-SENTINEL"`. After this, version bumps only touch Cargo.toml / pyproject.toml / package.json / CHANGELOGs — never fixtures. Regen captured all 38 fixtures; top-level `version` now uniformly renders as `"0.0.0-SENTINEL"`.

hugocorreia90 merged commit 74bcfed into main Apr 23, 2026
12 checks passed

hugocorreia90 deleted the fix/fr-004-f2-success-path-finalize branch April 23, 2026 18:39

hugocorreia90 mentioned this pull request Apr 23, 2026

chore: release engine-v1.16.0 + dagster-v1.12.0 + vscode-v1.8.0 #245

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(engine/rocky-cli): finalize idempotency on success for non-replication dispatches (FR-004 F2)#239

fix(engine/rocky-cli): finalize idempotency on success for non-replication dispatches (FR-004 F2)#239
hugocorreia90 merged 1 commit intomainfrom
fix/fr-004-f2-success-path-finalize

hugocorreia90 commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

hugocorreia90 commented Apr 23, 2026

Summary

Before / After per arm

New helper

New test

Why #237's tests didn't catch this

Test plan

Follow-ups not in scope

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant