fix(dagster): FR-014 failed_sources plumbing + argv/stderr redaction#284
Merged
hugocorreia90 merged 1 commit intomainfrom Apr 29, 2026
Merged
fix(dagster): FR-014 failed_sources plumbing + argv/stderr redaction#284hugocorreia90 merged 1 commit intomainfrom
hugocorreia90 merged 1 commit intomainfrom
Conversation
Two HIGH-priority post-launch maintenance fixes for the Dagster
integration:
* FR-014 plumbing: ``rocky_source_sensor`` was guarded by
``getattr(result, "failed_sources", None) or []`` but the hand-written
``DiscoverResult`` never declared the field, so the engine fix shipped
in engine-v1.17.4 was silently dropped on the Python side. Add
``failed_sources`` to ``DiscoverResult`` (forward-referencing the
codegen-generated ``FailedSourceOutput`` via the existing soft-swap
bridge), re-export ``FailedSourceOutput`` from the package and
``types_generated`` for downstream consumers, and pin the sensor's
warn-and-keep-cursor behaviour with a focused test.
* Credential leak in logs/UI: ``_run_rocky_streaming`` logged the full
argv (including ``--governance-override <json>`` and
``--idempotency-key <opaque>``) and ``_run_rocky``'s failure path
embedded untruncated stderr into ``dg.Failure.metadata``, both of
which surface to operator terminals and the Dagster run viewer. Add
a ``_redact_argv`` helper that masks values of credential-bearing
flags (``--governance-override``, ``--idempotency-key``) before they
reach the logger, and a ``_truncate_stderr_for_metadata`` helper that
caps stderr at 8KB with an explicit ``... [truncated, original {N}
bytes]`` marker. Apply both at the leaking sites and to the streaming
failure paths defensively. The subprocess still receives the real
argv — only what gets logged / surfaced is sanitised.
hugocorreia90
added a commit
that referenced
this pull request
Apr 29, 2026
…s_generated/__init__.py just codegen overwrote the curated __init__.py that PR #284 had added the FailedSourceOutput re-export to. Restored main's version which already integrates FR-014 alongside the rest of the codegen output.
hugocorreia90
added a commit
that referenced
this pull request
Apr 29, 2026
* feat(engine): max_bytes_scanned threshold in [budget] block Adds an optional `max_bytes_scanned: u64` field on `BudgetConfig`, alongside the existing `max_usd` and `max_duration_ms`. The new threshold gates a run on the aggregate `bytes_scanned` summed across every materialization. Composes with the other limits as all-OR — any single dimension breach trips the `budget_breach` event and (with `on_breach = "error"`) fails the run. Carries through: - `BudgetLimitType::MaxBytesScanned` variant + `"max_bytes_scanned"` tag on `BudgetBreach` / `BudgetBreachOutput`. - `RunCostSummary.total_bytes_scanned: Option<u64>` so consumers can read the aggregate without re-walking `materializations`. - Skipped (rather than treated as zero) when no adapter reports a byte count, matching the `max_usd` behaviour. Driven by post-launch user feedback that "fail CI when this run scanned more than N TB" wasn't expressible — `max_usd` correlates but a regression that stops pruning partitions on a flat-rate warehouse can blow scan volume up without changing the dollar figure. * fix: cargo fmt + restore FailedSourceOutput re-export in dagster types_generated/__init__.py just codegen overwrote the curated __init__.py that PR #284 had added the FailedSourceOutput re-export to. Restored main's version which already integrates FR-014 alongside the rest of the codegen output.
5 tasks
hugocorreia90
added a commit
that referenced
this pull request
Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280, #281, #282), the [budget].max_bytes_scanned threshold (#288), the audit-sweep closeout (#283, #285–#287, #290–#293), and the rocky-server auth + CORS gate (#291). Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky preview surface and ships the P1 cluster (#289) + FR-014 follow-on (#284). VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and RunCostSummary.total_bytes_scanned. See per-artifact CHANGELOG entries for the full breakdown.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Part of the post-launch maintenance cluster — two HIGH-priority fixes for the Dagster integration.
Bug #8 (HIGH/BUG) — FR-014 Python plumbing.
rocky_source_sensorguards withgetattr(result, "failed_sources", None) or [](sensor.py:104), but the hand-writtenDiscoverResultnever declared the field, so the engine fix shipped in engine-v1.17.4 was silently dropped on the Python side. Soft-swap pattern: addfailed_sources: list[FailedSourceOutput] = []toDiscoverResult, forward-referencing the codegen-generatedFailedSourceOutputvia the existing types-generated bridge. Re-exportFailedSourceOutputfrom the package and fromtypes_generated/__init__.pyfor downstream consumers and tests. Pin the sensor's warn-and-keep-cursor behaviour with a focused test.Bug #9 (HIGH/SEC) — credential leak via argv + stderr.
_run_rocky_streaming(resource.py:1019) logged the full constructed argv via_log.info(... " ".join(cmd)), including--governance-override <json>and--idempotency-key <opaque>— both surface to operator terminals and Dagster compute logs._run_rocky's failure path (resource.py:902) embedded the entire stderr intodg.Failure.metadata["stderr"], which the Dagster UI renders inline. A multi-MB stderr blob is both a UX problem and a footgun if rocky's tracing layer ever logs sensitive context.Fix introduces two small helpers:
_redact_argv(argv)— replaces the value of any token in_REDACTED_ARGV_FLAGS = {"--governance-override", "--idempotency-key"}with"***". Returns a copy so the subprocess still receives the unredacted argv._truncate_stderr_for_metadata(stderr)— caps at_STDERR_METADATA_CAP_BYTES = 8192and appends\n... [truncated, original {N} bytes]so operators know how much was clipped.Applied at the two leaking sites and defensively at the streaming-failure path's
stderr_tailmetadata. New tests cover both the helpers and an end-to-enddg.Failuretruncation case.Test plan
uv run pytestinintegrations/dagster/— 470 passed (10 new sensor + resource tests)uv run ruff check src/ tests/— cleanuv run ruff format --check src/ tests/— clean