fix(engine): handle SIGPIPE gracefully on CLI stdout #199
Merged
hugocorreia90 merged 1 commit into `main` on Apr 21, 2026
Rust's standard library installs SIG_IGN for SIGPIPE at startup, so `rocky ... | head -N` (and any similar pipe that closes early) got EPIPE on the next write. `println!` turns that error into a panic, and `panic = "abort"` surfaces the panic as SIGABRT (exit 134). Restore SIG_DFL as the first action in main(), before `Cli::parse()`, so clap's own `--help` / `--version` output is covered too. Windows has no SIGPIPE; the call is a no-op stub there. Ref: rocky backlog TODO.md §3 (2026-04-20).
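To make the failure mode concrete, here is a minimal std-only sketch (Unix only, assuming a `true` binary on `PATH`) of what a Rust process sees when it writes to a pipe whose reader has gone away. Because the runtime has set SIGPIPE to SIG_IGN, the kernel does not kill the writer; `write(2)` fails with EPIPE instead, which std surfaces as `io::ErrorKind::BrokenPipe`. `println!` panics on that same error. The function name `write_to_closed_pipe` is illustrative and not part of rocky.

```rust
use std::io::{self, ErrorKind, Write};
use std::process::{Command, Stdio};

/// Spawn a reader that exits without reading, then write into its stdin.
/// Returns the error kind of the failed write, or None if the write succeeded.
fn write_to_closed_pipe() -> io::Result<Option<ErrorKind>> {
    // `true` never reads stdin; once it exits, nobody holds the read end.
    let mut child = Command::new("true").stdin(Stdio::piped()).spawn()?;
    // Take the write end first: `Child::wait` closes any stdin it still owns.
    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    child.wait()?;
    // With SIGPIPE ignored (Rust's default), this write fails with EPIPE
    // instead of the kernel terminating the process.
    match stdin.write_all(b"one line of output\n") {
        Ok(()) => Ok(None),
        Err(e) => Ok(Some(e.kind())),
    }
}
```

Under `panic = "abort"`, the panic `println!` raises on this error is what became SIGABRT. With the disposition reset to SIG_DFL, the kernel terminates the process at the failed write itself, before any error value reaches Rust code.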
hugocorreia90 added a commit that referenced this pull request on Apr 21, 2026
Adds reference + features-page coverage for the commands that landed in PRs #199-#203, and populates the CHANGELOG's [Unreleased] section for the upcoming coordinated release cut.

- `docs/reference/commands/administration.md`: new `rocky cost <run_id|latest>` section with JSON + table examples and adapter coverage notes (Databricks/Snowflake duration-based; BigQuery bytes; DuckDB zero; discovery adapters skipped). Added to the related-commands list on `rocky trace` so the three sibling readers (replay/trace/cost) cross-link.
- `docs/reference/commands/core-pipeline.md`: added `rocky branch compare <name>` to the branch subcommand header and a short usage example explaining the `ShadowConfig.schema_override` mechanism it shares with `rocky run --branch`.
- `docs/features/all-features.md`: added `cost` to the administration command roster and a bullet under Observability for the historical-rollup surface.
- `engine/CHANGELOG.md`: populated [Unreleased] with the five PRs shipped 2026-04-21 (SIGPIPE, branch compare, POC portability cleanup, rocky cost, record_run wiring) plus the Pydantic soft-swap + fixture normalizer internals. Ready for the next coordinated release cut.

Docs build (`cd docs && npm run build`): 69 pages built, no warnings.
hugocorreia90 added a commit that referenced this pull request on Apr 21, 2026
* chore(engine): release 1.12.0: Arc 1 wave 2 + cleanup cascade

  Eight PRs since v1.11.0:
  - #199 SIGPIPE handler
  - #200 rocky branch compare
  - #201 POC target_dialect cleanup
  - #202 rocky cost <run_id|latest> (Arc 2 wave 2 first PR)
  - #203 rocky run persists RunRecord (Arc 1 wave 2 load-bearing fix)
  - #204 docs + CHANGELOG [Unreleased] cascade
  - #205 demo-branches-replay.gif refresh
  - #206 real per-model started_at on MaterializationOutput

  `rocky history` / `replay` / `trace` / `cost` now return real data end-to-end for the first time. Full notes in CHANGELOG.

* feat(state): configurable transfer timeout + tracing span + Valkey wrap

  - `StateConfig.transfer_timeout_seconds` (default 300s) replaces the hard-coded `STATE_TRANSFER_TIMEOUT`. Operators can now tune the wall-clock budget in `rocky.toml` for very large state or slow networks without recompiling. `StateConfig` gains a manual `Default` impl so `StateConfig::default()` yields 300s (not u64's zero).
  - `state.upload` / `state.download` tracing spans wrap every transfer, carrying `backend`, `bucket`, and `size_bytes`. The warn event emitted when the timeout elapses inherits those fields automatically, so hung transfers are diagnosable from stderr logs alone (which dagster-rocky streams into the Dagster run viewer).
  - Structured `warn!` on timeout elapse ("state transfer exceeded timeout budget") with a `duration_ms` field, replacing the silent `Timeout(_)`.
  - Valkey read/write paths audited and closed: `redis::Client::get_connection` + `redis::cmd(...).query()` are sync and blocked the tokio runtime thread, so no outer `tokio::time::timeout` could rescue them. Both `upload_to_valkey` and `download_from_valkey` now run under `tokio::task::spawn_blocking` inside `with_transfer_timeout`, closing the same class of hang the object-store paths were already protected against.
  - `default_client_options()` in `object_store.rs` honours the standard `object_store`-crate env vars `AWS_ALLOW_HTTP` / `AZURE_ALLOW_HTTP` / `GOOGLE_STORAGE_ALLOW_HTTP`. They are always off in production; the new integration test uses them to front-end the S3 SDK with a plain-HTTP wiremock server without bypassing the credential chain.
  - New `tests/state_sync_timeout_test.rs` integration test: a wiremock S3 endpoint that holds PutObject for 1h proves `upload_state` returns `StateSyncError::Timeout` within the configured 2s budget (+grace). A prompt-endpoint negative control guards against regressions.
  - CHANGELOG entries added under [1.12.0]. The example config in `engine/examples/dagster-integration/rocky.toml` surfaces the new key.

  `cargo fmt` clean; `clippy --workspace --all-targets -- -D warnings` clean; all 977 rocky-core unit tests + 30 e2e + 20 integration + the 2 new timeout tests pass.

* chore(codegen): regenerate schemas + pydantic types for StateConfig.transfer_timeout_seconds

* chore(fixtures): regenerate dagster test fixtures for 1.12.0

  `just regen-fixtures`: version string bump only (1.11.0 → 1.12.0) across 35 captured fixtures under integrations/dagster/tests/fixtures_generated/.
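The `StateConfig` default and the blocking-transfer timeout described in the feat(state) commit can be sketched in std-only Rust. This is an illustration under stated assumptions, not rocky's code: `StateConfig` here models only `transfer_timeout_seconds`, and `TransferError` / `run_blocking_with_timeout` are hypothetical names. The commit itself uses `tokio::task::spawn_blocking` inside `with_transfer_timeout`; the sketch swaps in a plain thread plus `mpsc::Receiver::recv_timeout` so it runs without tokio.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Trimmed-down, hypothetical stand-in for rocky's `StateConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct StateConfig {
    /// Wall-clock budget for a single state upload or download.
    pub transfer_timeout_seconds: u64,
}

// Manual impl: `#[derive(Default)]` would produce 0 seconds (u64's default).
impl Default for StateConfig {
    fn default() -> Self {
        StateConfig { transfer_timeout_seconds: 300 }
    }
}

impl StateConfig {
    pub fn transfer_timeout(&self) -> Duration {
        Duration::from_secs(self.transfer_timeout_seconds)
    }
}

#[derive(Debug, PartialEq)]
pub enum TransferError {
    /// The wall-clock budget elapsed before the transfer finished.
    Timeout,
    /// The transfer returned an error, or its worker died.
    Failed(String),
}

/// Run blocking `work` (e.g. a sync Redis call) on its own thread and wait
/// at most `budget` for it. A timed-out worker is NOT cancelled; the caller
/// just stops waiting, and the thread finishes (or hangs) in the background.
pub fn run_blocking_with_timeout<T, F>(budget: Duration, work: F) -> Result<T, TransferError>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; ignore the failed send.
        let _ = tx.send(work());
    });
    match rx.recv_timeout(budget) {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(msg)) => Err(TransferError::Failed(msg)),
        Err(RecvTimeoutError::Timeout) => Err(TransferError::Timeout),
        // The sender was dropped without sending: the worker panicked.
        Err(RecvTimeoutError::Disconnected) => {
            Err(TransferError::Failed("transfer worker panicked".to_string()))
        }
    }
}
```

The same limitation applies to `spawn_blocking`: a blocking closure cannot be aborted once it is running. What the wrapper buys is that the caller's budget is enforced even when the blocking call never yields to the runtime.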
Summary
`rocky compile --output json | head -N` (and every similar pipe that closes stdout early: `jq | head`, `| less`, `| head -c 1`, etc.) used to abort the rocky process with SIGABRT (exit 134). Root cause: Rust's stdlib installs `SIG_IGN` for SIGPIPE at startup, so after the downstream pipe closes, the next `write(2)` returns `EPIPE`, `println!` panics, and under `panic = "abort"` that panic becomes SIGABRT.

Fix: restore the kernel-default SIGPIPE disposition (`SIG_DFL`) as the very first statement of `main()`: before `Cli::parse()`, before the tokio runtime is built, before any threads exist. The process now terminates cleanly with SIGPIPE (exit 141) instead, which is the POSIX convention for broken-pipe exits. Windows gets a no-op stub via `#[cfg(unix)]` since it has no SIGPIPE. `libc` is added as a target-scoped dep under `[target.'cfg(unix)'.dependencies]` so the Windows build remains untouched.

Ref: rocky backlog TODO.md §3 (2026-04-20).
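A minimal sketch of the reset, assuming a Unix target where SIGPIPE is 13 and SIG_DFL is 0 (true on Linux and macOS). The PR calls `libc::signal` from the `libc` crate; this sketch hand-declares `signal` instead so it builds without dependencies. The module name `sys` and the function name `reset_sigpipe` are illustrative, not taken from rocky's source.

```rust
// Hand-written binding so the sketch builds without crates; the PR itself
// calls `libc::signal` from the `libc` crate.
#[cfg(unix)]
mod sys {
    unsafe extern "C" {
        pub fn signal(signum: i32, handler: usize) -> usize;
    }
    /// Signal number for SIGPIPE on Linux and macOS.
    pub const SIGPIPE: i32 = 13;
    /// The "default action" handler value on Linux and macOS.
    pub const SIG_DFL: usize = 0;
}

/// Restore the kernel-default SIGPIPE action (terminate the process).
/// Intended as the first statement of `main()`, before any threads exist.
#[cfg(unix)]
fn reset_sigpipe() {
    // SAFETY: SIG_DFL installs no Rust handler, so nothing runs in signal
    // context, and calling this before any thread is spawned means no other
    // thread can race the disposition change.
    unsafe {
        sys::signal(sys::SIGPIPE, sys::SIG_DFL);
    }
}

/// Windows has no SIGPIPE, so the call is a no-op there.
#[cfg(not(unix))]
fn reset_sigpipe() {}
```

In a `main()` shaped like rocky's, `reset_sigpipe()` would come ahead of `Cli::parse()` and the tokio runtime builder, so clap's `--help` / `--version` output and everything after it are covered. Restoring SIG_DFL rather than installing a custom handler leaves termination entirely to the kernel.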
Manual smoke test
All exit codes are `0` (or, had the pipe closed mid-write rather than between writes, would be `141`). Never `134` (SIGABRT).

Test plan
- `cargo build -p rocky --release`: clean
- `cargo fmt --all --check`: clean
- `cargo clippy --workspace --all-targets -- -D warnings`: clean
- `cargo test -p rocky`: passes (0 tests; rocky is a thin binary crate)
- `libc::signal` unsafe block follows the `rust-unsafe` skill convention (async-signal-safety, single-threaded pre-runtime invariant, fallback worst case)
- `schemas/`, `types_generated/`, or any `*Output` struct
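The exit codes in the smoke-test notes follow the shell convention of reporting 128 + N when a process is killed by signal N: SIGABRT is 6 (so 134) and SIGPIPE is 13 (so 141) on Linux and macOS. A small std-only sketch, Unix only and assuming `sh` is available, that maps a child's `ExitStatus` to the number a shell would print; `shell_exit_code` and `run_sh` are hypothetical helpers, not part of rocky.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

/// Return the value a POSIX shell would report in `$?` for this status:
/// the exit code on a normal exit, or 128 + N if signal N killed the process.
fn shell_exit_code(status: ExitStatus) -> Option<i32> {
    status.code().or_else(|| status.signal().map(|sig| 128 + sig))
}

/// Run `script` under `sh -c` and report its shell-style exit code.
fn run_sh(script: &str) -> Option<i32> {
    let status = Command::new("sh").arg("-c").arg(script).status().ok()?;
    shell_exit_code(status)
}
```

A process killed by SIGPIPE has no exit code of its own (`code()` is `None`); the 141 a shell shows is derived from the signal number.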