Skip to content

fix(engine/rocky-cli): finalize idempotency on success for non-replication dispatches (FR-004 F2)#239

Merged
hugocorreia90 merged 1 commit intomainfrom
fix/fr-004-f2-success-path-finalize
Apr 23, 2026
Merged

fix(engine/rocky-cli): finalize idempotency on success for non-replication dispatches (FR-004 F2)#239
hugocorreia90 merged 1 commit intomainfrom
fix/fr-004-f2-success-path-finalize

Conversation

@hugocorreia90
Copy link
Copy Markdown
Contributor

Summary

Closes the success-path leak that the #237 (FR-004 F1) agent flagged as a follow-up. rocky run --idempotency-key K against a Transformation / Quality / Snapshot / Load pipeline successfully returned Ok(()) but left its InFlight claim in place — the four non-replication dispatch arms delegated straight to run_local::run_* / load::run_load and returned without calling finalize_idempotency, and the error-path wrapper shipped in #237 only fires on is_err(). A retry with the same key inside in_flight_ttl_hours (default 24h) then returned skipped_in_flight for up to 24h instead of skipped_idempotent.

Each non-replication arm now stamps Succeeded on its happy-path exit via a new finalize_idempotency_on_success helper. Replication is unchanged — it already finalized correctly at its main exit in #235.

Before / After per arm

All four arms previously had the shape return super::run_local::run_X(...).await; — the delegated helper returned Ok(()), the arm returned directly from run(), and the IdempotencyCtx stayed Some(...) (i.e. InFlight) until the outer wrapper's is_err() check skipped the error-path finalize.

Arm Before (happy path) After (happy path)
PipelineConfig::Transformation return super::run_local::run_transformation(...).await;InFlight lingers .await? then finalize_idempotency_on_success(...) then return Ok(()) — stamped Succeeded
PipelineConfig::Quality return super::run_local::run_quality(...).await;InFlight lingers same pattern — stamped Succeeded
PipelineConfig::Snapshot return super::run_local::run_snapshot(...).await;InFlight lingers same pattern — stamped Succeeded
PipelineConfig::Load return super::load::run_load(...).await;InFlight lingers same pattern — stamped Succeeded

On the error path, each arm's .await? still propagates out of the async block and the run_result.is_err() wrapper in #237 releases the claim. The one-shot .take() on IdempotencyCtx makes the success-path finalize and the error-path wrapper mutually exclusive — whichever fires first drains the ctx.

New helper

finalize_idempotency_on_success mirrors finalize_idempotency_on_error:

  • Takes &mut Option<IdempotencyCtx> and .take()s the ctx (one-shot).
  • Opens its own StateStore from the canonical state_path owned by run() — the four delegated helpers in run_local.rs use config_dir.join(".rocky_state") which may differ from the claim's location, so finalizing at state_path matches the claim exactly.
  • Always stamps FinalOutcome::Succeeded.
  • Best-effort throughout — any error swallowed with warn! so bookkeeping flake never flips a successful run to non-zero.

Why a dedicated helper instead of calling finalize_idempotency with a synthesized RunOutput? Semantically explicit. Constructing a default RunOutput just to feed derive_run_status() depends on an implicit "default = Success" assumption — if anyone changes that default, the success-path finalize silently drops to the skip-variant branch and the leak returns.

New test

transformation_success_stamps_idempotency_succeeded_not_inflight drives run() end-to-end against a minimal transformation pipeline (DuckDB :memory: adapter, non-existent models dir so run_transformation warns-and-returns-Ok without needing live DDL) with --idempotency-key set. After the run returns Ok(()), it re-opens the redb state store and asserts the entry is IdempotencyState::Succeeded, not InFlight. Transformation is the cleanest arm to exercise — no warehouse credentials, no live DDL, pure happy-path dispatch.

Validated by commenting out the new finalize_idempotency_on_success call on the Transformation arm — the test fails with InFlight != Succeeded. Restored, it passes.

Why #237's tests didn't catch this

#237's tests (finalize_on_error_releases_inflight_claim_allowing_immediate_retry, finalize_on_error_is_one_shot_and_idempotent_when_ctx_already_taken, finalize_idempotency_is_one_shot_take) exercise the helpers directly with hand-built IdempotencyCtx values. None of them drive run() end-to-end through a non-replication dispatch arm with --idempotency-key set. The happy-path gap was invisible to the test suite — the wrapper's is_err() check made the error path airtight, but nothing exercised the Ok case through a non-replication arm.

The new F2 test closes that gap by driving run() directly, which is why it catches the issue.

Test plan

  • cargo build --release
  • cargo test -p rocky-cli — 236 passed (including 4 idempotency tests: F1's 3 + F2's new 1)
  • cargo test -p rocky-cli idempot — 3 passed (the original F1 test name contains "releases_inflight" not "idempot" so shows up under a broader filter)
  • cargo test -p rocky-cli run::tests — 22 passed
  • cargo test -p rocky-core --lib — 1053 passed
  • cargo clippy --all-targets --all-features -- -D warnings — clean
  • cargo fmt --all -- --check — clean
  • just codegen — zero schema drift
  • just regen-fixtures — zero fixture drift

Follow-ups not in scope

  • No codegen cascade (no *Output struct or RunStatus variant changes).
  • No Dagster / VS Code changes — the fix is engine-internal.
  • No CHANGELOG.md entry for 1.15.0 (the known-follow-up list at line 65 already called F1 out; F2 was implicit). Added an [Unreleased] / Fixed entry for the next engine release.

…ation dispatches (FR-004 F2)

Before this change, a `rocky run --idempotency-key K` against a
Transformation / Quality / Snapshot / Load pipeline that returned
`Ok(())` left its `InFlight` claim in place. The four non-replication
dispatch arms delegated straight to `run_local::run_*` / `load::run_load`
and returned without calling `finalize_idempotency`; the error-path
wrapper shipped in #237 only fires on `is_err()`. A retry with the same
key inside `in_flight_ttl_hours` (default 24h) then returned
`skipped_in_flight` for up to 24h instead of `skipped_idempotent`.

Each arm now stamps `Succeeded` on its happy-path exit via a new
`finalize_idempotency_on_success` helper that mirrors
`finalize_idempotency_on_error`: one-shot `.take()` on the shared
`IdempotencyCtx`, opens its own `StateStore` from the canonical
`state_path` owned by `run()`, best-effort on error. The one-shot
semantics keep the outer error-path wrapper a no-op once the success
path has drained the ctx.

Replication was never affected — it already finalized correctly at its
main exit in #235.

Tested by driving `run()` end-to-end against a transformation pipeline
with `--idempotency-key` set and asserting the redb entry is
`Succeeded`, not `InFlight`. Pre-fix, the assertion fails with
`InFlight != Succeeded`. Existing F1 coverage
(`finalize_on_error_releases_inflight_claim_allowing_immediate_retry`,
`finalize_on_error_is_one_shot_and_idempotent_when_ctx_already_taken`,
`finalize_idempotency_is_one_shot_take`) continues to pass — the F1
wrapper remains safely idempotent against the new success-path
finalize.
@hugocorreia90 hugocorreia90 merged commit 74bcfed into main Apr 23, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/fr-004-f2-success-path-finalize branch April 23, 2026 18:39
hugocorreia90 added a commit that referenced this pull request Apr 23, 2026
* chore: release engine-v1.16.0 + dagster-v1.12.0 + vscode-v1.8.0

Bundles the governance waveplan — five merged PRs (#240 audit trail,
#241 classification + masking, #242 rocky compliance, #243 role-graph,
#244 retention) on top of three FR-004 / state-path follow-ups
(#237 error-path idempotency, #238 state-path unification,
#239 success-path idempotency finalize).

Version bumps: engine 1.15.0 → 1.16.0, dagster-rocky 1.11.0 → 1.12.0,
vscode extension 1.7.0 → 1.8.0.

CHANGELOGs updated for all three artifacts.

* chore(dagster): regen test fixtures for 1.16.0

Fixture drift flagged by CI (`codegen-drift.yml`). Fixtures are captured
from the live engine binary — the version-string bump to 1.16.0 ripples
through every `version` field, and the Wave A audit-trail work (#240)
adds the 8 `RunRecord` fields to `rocky history` output, which the
playground POC now emits.

Regenerated via `just regen-fixtures` against
`examples/playground/pocs/00-foundations/00-playground-default`.

* chore(scripts): sentinel top-level version field in fixture normaliser

Every CLI output's top-level `version` is `env!("CARGO_PKG_VERSION")`
at emit time, so every engine version bump rippled through all 38
captured fixtures — every release PR fought `codegen-drift.yml` until
`just regen-fixtures` was re-run.

Extend the existing `AUDIT_FIELD_SENTINELS` set (Wave A already
sentineled the audit-trail `rocky_version` field + hostname / git
commit / etc.) with the top-level `version` key → `"0.0.0-SENTINEL"`.
After this, version bumps only touch Cargo.toml / pyproject.toml /
package.json / CHANGELOGs — never fixtures.

Regen captured all 38 fixtures; top-level `version` now uniformly
renders as `"0.0.0-SENTINEL"`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant