fix(engine/rocky-cli): finalize idempotency on success for non-replication dispatches (FR-004 F2)#239
Merged
hugocorreia90 merged 1 commit intomainfrom Apr 23, 2026
Conversation
…ation dispatches (FR-004 F2) Before this change, a `rocky run --idempotency-key K` against a Transformation / Quality / Snapshot / Load pipeline that returned `Ok(())` left its `InFlight` claim in place. The four non-replication dispatch arms delegated straight to `run_local::run_*` / `load::run_load` and returned without calling `finalize_idempotency`; the error-path wrapper shipped in #237 only fires on `is_err()`. A retry with the same key inside `in_flight_ttl_hours` (default 24h) then returned `skipped_in_flight` for up to 24h instead of `skipped_idempotent`. Each arm now stamps `Succeeded` on its happy-path exit via a new `finalize_idempotency_on_success` helper that mirrors `finalize_idempotency_on_error`: one-shot `.take()` on the shared `IdempotencyCtx`, opens its own `StateStore` from the canonical `state_path` owned by `run()`, best-effort on error. The one-shot semantics keep the outer error-path wrapper a no-op once the success path has drained the ctx. Replication was never affected — it already finalized correctly at its main exit in #235. Tested by driving `run()` end-to-end against a transformation pipeline with `--idempotency-key` set and asserting the redb entry is `Succeeded`, not `InFlight`. Pre-fix, the assertion fails with `InFlight != Succeeded`. Existing F1 coverage (`finalize_on_error_releases_inflight_claim_allowing_immediate_retry`, `finalize_on_error_is_one_shot_and_idempotent_when_ctx_already_taken`, `finalize_idempotency_is_one_shot_take`) continues to pass — the F1 wrapper remains safely idempotent against the new success-path finalize.
7 tasks
hugocorreia90
added a commit
that referenced
this pull request
Apr 23, 2026
* chore: release engine-v1.16.0 + dagster-v1.12.0 + vscode-v1.8.0 Bundles the governance waveplan — five merged PRs (#240 audit trail, #241 classification + masking, #242 rocky compliance, #243 role-graph, #244 retention) on top of three FR-004 / state-path follow-ups (#237 error-path idempotency, #238 state-path unification, #239 success-path idempotency finalize). Version bumps: engine 1.15.0 → 1.16.0, dagster-rocky 1.11.0 → 1.12.0, vscode extension 1.7.0 → 1.8.0. CHANGELOGs updated for all three artifacts. * chore(dagster): regen test fixtures for 1.16.0 Fixture drift flagged by CI (`codegen-drift.yml`). Fixtures are captured from the live engine binary — the version-string bump to 1.16.0 ripples through every `version` field, and the Wave A audit-trail work (#240) adds the 8 `RunRecord` fields to `rocky history` output, which the playground POC now emits. Regenerated via `just regen-fixtures` against `examples/playground/pocs/00-foundations/00-playground-default`. * chore(scripts): sentinel top-level version field in fixture normaliser Every CLI output's top-level `version` is `env!("CARGO_PKG_VERSION")` at emit time, so every engine version bump rippled through all 38 captured fixtures — every release PR fought `codegen-drift.yml` until `just regen-fixtures` was re-run. Extend the existing `AUDIT_FIELD_SENTINELS` set (Wave A already sentineled the audit-trail `rocky_version` field + hostname / git commit / etc.) with the top-level `version` key → `"0.0.0-SENTINEL"`. After this, version bumps only touch Cargo.toml / pyproject.toml / package.json / CHANGELOGs — never fixtures. Regen captured all 38 fixtures; top-level `version` now uniformly renders as `"0.0.0-SENTINEL"`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the success-path leak that the #237 (FR-004 F1) agent flagged as a follow-up.
rocky run --idempotency-key Kagainst a Transformation / Quality / Snapshot / Load pipeline successfully returnedOk(())but left itsInFlightclaim in place — the four non-replication dispatch arms delegated straight torun_local::run_*/load::run_loadand returned without callingfinalize_idempotency, and the error-path wrapper shipped in #237 only fires onis_err(). A retry with the same key insidein_flight_ttl_hours(default 24h) then returnedskipped_in_flightfor up to 24h instead ofskipped_idempotent.Each non-replication arm now stamps
Succeededon its happy-path exit via a newfinalize_idempotency_on_successhelper. Replication is unchanged — it already finalized correctly at its main exit in #235.Before / After per arm
All four arms previously had the shape
return super::run_local::run_X(...).await;— the delegated helper returnedOk(()), the arm returned directly fromrun(), and theIdempotencyCtxstayedSome(...)(i.e.InFlight) until the outer wrapper'sis_err()check skipped the error-path finalize.PipelineConfig::Transformationreturn super::run_local::run_transformation(...).await;—InFlightlingers.await?thenfinalize_idempotency_on_success(...)thenreturn Ok(())— stampedSucceededPipelineConfig::Qualityreturn super::run_local::run_quality(...).await;—InFlightlingersSucceededPipelineConfig::Snapshotreturn super::run_local::run_snapshot(...).await;—InFlightlingersSucceededPipelineConfig::Loadreturn super::load::run_load(...).await;—InFlightlingersSucceededOn the error path, each arm's
.await?still propagates out of the async block and therun_result.is_err()wrapper in #237 releases the claim. The one-shot.take()onIdempotencyCtxmakes the success-path finalize and the error-path wrapper mutually exclusive — whichever fires first drains the ctx.New helper
finalize_idempotency_on_successmirrorsfinalize_idempotency_on_error:&mut Option<IdempotencyCtx>and.take()s the ctx (one-shot).StateStorefrom the canonicalstate_pathowned byrun()— the four delegated helpers inrun_local.rsuseconfig_dir.join(".rocky_state")which may differ from the claim's location, so finalizing atstate_pathmatches the claim exactly.FinalOutcome::Succeeded.warn!so bookkeeping flake never flips a successful run to non-zero.Why a dedicated helper instead of calling
finalize_idempotencywith a synthesizedRunOutput? Semantically explicit. Constructing a defaultRunOutputjust to feedderive_run_status()depends on an implicit "default = Success" assumption — if anyone changes that default, the success-path finalize silently drops to the skip-variant branch and the leak returns.New test
transformation_success_stamps_idempotency_succeeded_not_inflightdrivesrun()end-to-end against a minimal transformation pipeline (DuckDB:memory:adapter, non-existent models dir sorun_transformationwarns-and-returns-Ok without needing live DDL) with--idempotency-keyset. After the run returnsOk(()), it re-opens the redb state store and asserts the entry isIdempotencyState::Succeeded, notInFlight. Transformation is the cleanest arm to exercise — no warehouse credentials, no live DDL, pure happy-path dispatch.Validated by commenting out the new
finalize_idempotency_on_successcall on the Transformation arm — the test fails withInFlight != Succeeded. Restored, it passes.Why #237's tests didn't catch this
#237's tests (
finalize_on_error_releases_inflight_claim_allowing_immediate_retry,finalize_on_error_is_one_shot_and_idempotent_when_ctx_already_taken,finalize_idempotency_is_one_shot_take) exercise the helpers directly with hand-builtIdempotencyCtxvalues. None of them driverun()end-to-end through a non-replication dispatch arm with--idempotency-keyset. The happy-path gap was invisible to the test suite — the wrapper'sis_err()check made the error path airtight, but nothing exercised theOkcase through a non-replication arm.The new F2 test closes that gap by driving
run()directly, which is why it catches the issue.Test plan
cargo build --releasecargo test -p rocky-cli— 236 passed (including 4 idempotency tests: F1's 3 + F2's new 1)cargo test -p rocky-cli idempot— 3 passed (the original F1 test name contains "releases_inflight" not "idempot" so shows up under a broader filter)cargo test -p rocky-cli run::tests— 22 passedcargo test -p rocky-core --lib— 1053 passedcargo clippy --all-targets --all-features -- -D warnings— cleancargo fmt --all -- --check— cleanjust codegen— zero schema driftjust regen-fixtures— zero fixture driftFollow-ups not in scope
*Outputstruct orRunStatusvariant changes).CHANGELOG.mdentry for 1.15.0 (the known-follow-up list at line 65 already called F1 out; F2 was implicit). Added an[Unreleased] / Fixedentry for the next engine release.