Skip to content

fix(dagster): FR-014 failed_sources plumbing + argv/stderr redaction#284

Merged
hugocorreia90 merged 1 commit intomainfrom
fix/dagster-fr014-log-redaction
Apr 29, 2026
Merged

fix(dagster): FR-014 failed_sources plumbing + argv/stderr redaction#284
hugocorreia90 merged 1 commit intomainfrom
fix/dagster-fr014-log-redaction

Conversation

@hugocorreia90
Copy link
Copy Markdown
Contributor

Summary

Part of the post-launch maintenance cluster — two HIGH-priority fixes for the Dagster integration.

Bug #8 (HIGH/BUG) — FR-014 Python plumbing. rocky_source_sensor guards with getattr(result, "failed_sources", None) or [] (sensor.py:104), but the hand-written DiscoverResult never declared the field, so the engine fix shipped in engine-v1.17.4 was silently dropped on the Python side. Soft-swap pattern: add failed_sources: list[FailedSourceOutput] = [] to DiscoverResult, forward-referencing the codegen-generated FailedSourceOutput via the existing types-generated bridge. Re-export FailedSourceOutput from the package and from types_generated/__init__.py for downstream consumers and tests. Pin the sensor's warn-and-keep-cursor behaviour with a focused test.

Bug #9 (HIGH/SEC) — credential leak via argv + stderr.

  • _run_rocky_streaming (resource.py:1019) logged the full constructed argv via _log.info(... " ".join(cmd)), including --governance-override <json> and --idempotency-key <opaque> — both surface to operator terminals and Dagster compute logs.
  • _run_rocky's failure path (resource.py:902) embedded the entire stderr into dg.Failure.metadata["stderr"], which the Dagster UI renders inline. A multi-MB stderr blob is both a UX problem and a footgun if rocky's tracing layer ever logs sensitive context.

Fix introduces two small helpers:

  • _redact_argv(argv) — replaces the value of any token in _REDACTED_ARGV_FLAGS = {"--governance-override", "--idempotency-key"} with "***". Returns a copy so the subprocess still receives the unredacted argv.
  • _truncate_stderr_for_metadata(stderr) — caps at _STDERR_METADATA_CAP_BYTES = 8192 and appends \n... [truncated, original {N} bytes] so operators know how much was clipped.

Applied at the two leaking sites and defensively at the streaming-failure path's stderr_tail metadata. New tests cover both the helpers and an end-to-end dg.Failure truncation case.

Test plan

  • uv run pytest in integrations/dagster/ — 470 passed (10 new sensor + resource tests)
  • uv run ruff check src/ tests/ — clean
  • uv run ruff format --check src/ tests/ — clean
  • CI green on the dagster + codegen-drift workflows

Two HIGH-priority post-launch maintenance fixes for the Dagster
integration:

* FR-014 plumbing: ``rocky_source_sensor`` was guarded by
  ``getattr(result, "failed_sources", None) or []`` but the hand-written
  ``DiscoverResult`` never declared the field, so the engine fix shipped
  in engine-v1.17.4 was silently dropped on the Python side. Add
  ``failed_sources`` to ``DiscoverResult`` (forward-referencing the
  codegen-generated ``FailedSourceOutput`` via the existing soft-swap
  bridge), re-export ``FailedSourceOutput`` from the package and
  ``types_generated`` for downstream consumers, and pin the sensor's
  warn-and-keep-cursor behaviour with a focused test.

* Credential leak in logs/UI: ``_run_rocky_streaming`` logged the full
  argv (including ``--governance-override <json>`` and
  ``--idempotency-key <opaque>``) and ``_run_rocky``'s failure path
  embedded untruncated stderr into ``dg.Failure.metadata``, both of
  which surface to operator terminals and the Dagster run viewer. Add
  a ``_redact_argv`` helper that masks values of credential-bearing
  flags (``--governance-override``, ``--idempotency-key``) before they
  reach the logger, and a ``_truncate_stderr_for_metadata`` helper that
  caps stderr at 8KB with an explicit ``... [truncated, original {N}
  bytes]`` marker. Apply both at the leaking sites and to the streaming
  failure paths defensively. The subprocess still receives the real
  argv — only what gets logged / surfaced is sanitised.
@hugocorreia90 hugocorreia90 merged commit 8b0b50a into main Apr 29, 2026
9 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/dagster-fr014-log-redaction branch April 29, 2026 16:44
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
…s_generated/__init__.py

just codegen overwrote the curated __init__.py that PR #284 had added
the FailedSourceOutput re-export to. Restored main's version which
already integrates FR-014 alongside the rest of the codegen output.
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
* feat(engine): max_bytes_scanned threshold in [budget] block

Adds an optional `max_bytes_scanned: u64` field on `BudgetConfig`,
alongside the existing `max_usd` and `max_duration_ms`. The new
threshold gates a run on the aggregate `bytes_scanned` summed
across every materialization. Composes with the other limits as
all-OR — any single dimension breach trips the `budget_breach`
event and (with `on_breach = "error"`) fails the run.

Carries through:
- `BudgetLimitType::MaxBytesScanned` variant + `"max_bytes_scanned"`
  tag on `BudgetBreach` / `BudgetBreachOutput`.
- `RunCostSummary.total_bytes_scanned: Option<u64>` so consumers
  can read the aggregate without re-walking `materializations`.
- Skipped (rather than treated as zero) when no adapter reports a
  byte count, matching the `max_usd` behaviour.

Driven by post-launch user feedback that "fail CI when this run
scanned more than N TB" wasn't expressible — `max_usd` correlates
but a regression that stops pruning partitions on a flat-rate
warehouse can blow scan volume up without changing the dollar
figure.

* fix: cargo fmt + restore FailedSourceOutput re-export in dagster types_generated/__init__.py

just codegen overwrote the curated __init__.py that PR #284 had added
the FailedSourceOutput re-export to. Restored main's version which
already integrates FR-014 alongside the rest of the codegen output.
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280,
#281, #282), the [budget].max_bytes_scanned threshold (#288), the
audit-sweep closeout (#283, #285#287, #290#293), and the rocky-server
auth + CORS gate (#291).

Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky
preview surface and ships the P1 cluster (#289) + FR-014 follow-on
(#284).

VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and
RunCostSummary.total_bytes_scanned.

See per-artifact CHANGELOG entries for the full breakdown.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant