docs(engine): clarify bytes_scanned holds billing-relevant bytes, not scan volume #222
Merged
hugocorreia90 merged 1 commit into main on Apr 22, 2026
… scan volume

`bytes_scanned` carries the adapter's *billed* bytes figure, not literal scan volume. For BigQuery that's `totalBytesBilled` (with the 10 MB per-query minimum floor), which matches the BigQuery console's "Bytes billed" field, not "Bytes processed". The previous field-level docs either omitted that distinction or buried it; anyone comparing Rocky's output to the warehouse console would have reached for the wrong column.

Added adapter-state-neutral docstrings covering all four adapters (BigQuery, Databricks, Snowflake, DuckDB) to every `bytes_scanned` and `bytes_written` field declaration: 6 sites across `ExecutionStats`, `ModelExecution`, `MaterializationOutput`, `ReplayModelOutput`, `TraceModelEntry`, and `PerModelCostHistorical`.

The rustdoc cascades via `just codegen` into `description` fields on `run`/`replay`/`trace`/`cost` JSON schemas, Pydantic v2 `Field(description=...)`, and TypeScript JSDoc, so the documented semantic is now visible in VS Code hover, Dagster's Pydantic IDE integrations, and any downstream schema consumer.

Zero behavior change (`just regen-fixtures` confirmed byte-stable).
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

  Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0 / dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

  Engine headlines (12 PRs):
  - Arc 7 wave 2 wave-2 complete: cached DESCRIBE end-to-end (#223 infra, #228 reads, #230 write tap, #231 discover warm-up, #232 state controls + `--cache-ttl` override)
  - Arc 2 wave 3 complete: `bytes_scanned` / `bytes_written` on `MaterializationOutput` (#219 BQ, #221 Databricks, #220 Snowflake deferred doc, #222 docstring cascade). Real $ on `rocky cost` for BQ + Databricks
  - FR-005 Unity Catalog workspace-binding reconcile (#226)
  - FR-002 Fivetran connector metadata via `SourceOutput.metadata` (#225)
  - Housekeeping: `compute_backoff` dedup into `rocky_core::retry` (#217)

  Dagster headlines (4 PRs):
  - FR-001 `RockyComponent` Pipes execution mode + FR-006 strict doctor on `RockyResource` startup (#224)
  - FR-003 `RockyResource.state_health()` (#227) + FR follow-up threading `doctor(check=state_rw)` for sub-second probes (#229)
  - `RockyResource.cost()` wiring + fixture (#218)

  VS Code: regenerated TS bindings for engine 1.14.0 type additions. No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

  36 fixtures picked up the new engine version string in their top-level "version" field. No schema changes, just the version bump.
Summary
Follow-up clarification to #219.
`bytes_scanned` carries the adapter's billed bytes figure, not literal scan volume. For BigQuery that's `totalBytesBilled` (with the 10 MB per-query minimum floor), which matches the BigQuery console's "Bytes billed" field, not "Bytes processed". The previous field-level docs either omitted that distinction or buried it; anyone comparing Rocky's output to the warehouse console would have reached for the wrong column.

This PR adds adapter-state-neutral docstrings covering all four adapters (BigQuery, Databricks, Snowflake, DuckDB) to every `bytes_scanned` and `bytes_written` field declaration: 6 sites across `ExecutionStats`, `ModelExecution`, `MaterializationOutput`, `ReplayModelOutput`, `TraceModelEntry`, and `PerModelCostHistorical`.

The cascade is the whole point
`schemars` emits `///` rustdoc as the `description` field in generated JSON schemas, so the documented semantic cascades through `just codegen` into:
- `schemas/{run,replay,trace,cost}.schema.json`: `description` field
- `integrations/dagster/src/dagster_rocky/types_generated/{run,replay,trace,cost}_schema.py`: Pydantic v2 `Field(description=...)`
- `editors/vscode/src/types/generated/{run,replay,trace,cost}.ts`: TypeScript JSDoc

Net effect: the BQ-console-comparison nugget is now visible in VS Code hover (rustdoc), Dagster's Pydantic IDE integrations (Python docstrings), and any downstream JSON schema consumer.
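To make the "Bytes billed" versus "Bytes processed" distinction concrete, here is a minimal sketch of how a per-query minimum floor makes the two figures diverge for small queries. It is not Rocky's code: `approx_billed_bytes` is a hypothetical helper, the binary megabyte (1024 × 1024 bytes) is an assumption, and the real `totalBytesBilled` is computed by BigQuery and may include adjustments this sketch does not model.

```rust
/// Assumption for this sketch: one "MB" is 1024 * 1024 bytes.
const BYTES_PER_MB: u64 = 1024 * 1024;

/// The 10 MB per-query minimum floor mentioned in the summary.
const MIN_BILLED_BYTES: u64 = 10 * BYTES_PER_MB;

/// Hypothetical helper: approximates a "Bytes billed" figure from a
/// "Bytes processed" figure by applying only the per-query minimum floor.
fn approx_billed_bytes(bytes_processed: u64) -> u64 {
    if bytes_processed == 0 {
        // Assumption: a query that processes nothing is not billed.
        return 0;
    }
    // Small queries are lifted to the floor; larger ones pass through unchanged.
    bytes_processed.max(MIN_BILLED_BYTES)
}

fn main() {
    // A 3 MB query: the console's "Bytes processed" and "Bytes billed" differ.
    let processed = 3 * BYTES_PER_MB;
    let billed = approx_billed_bytes(processed);
    println!("processed = {processed} bytes, billed = {billed} bytes");
}
```

Under these assumptions a 3 MB query reports 10 MB billed, which is why a `bytes_scanned` value should be compared against the console's "Bytes billed" column rather than "Bytes processed".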
Files touched
Rust source (6 field sites, 3 files):
- `engine/crates/rocky-core/src/traits.rs`: `ExecutionStats.{bytes_scanned, bytes_written}` (internal adapter-trait type; rustdoc only)
- `engine/crates/rocky-core/src/state.rs`: `ModelExecution.{bytes_scanned, bytes_written}` (persisted mirror; rustdoc only)
- `engine/crates/rocky-cli/src/output.rs`:
  - `MaterializationOutput.{bytes_scanned, bytes_written}`: cascades to `run.*`
  - `ReplayModelOutput.{bytes_scanned, bytes_written}`: cascades to `replay.*`
  - `TraceModelEntry.{bytes_scanned, bytes_written}`: cascades to `trace.*`
  - `PerModelCostHistorical.{bytes_scanned, bytes_written}`: cascades to `cost.*`

Generated (4 commands × 3 surfaces = 12 files):
- `schemas/{run,replay,trace,cost}.schema.json`
- `integrations/dagster/src/dagster_rocky/types_generated/{run,replay,trace,cost}_schema.py`
- `editors/vscode/src/types/generated/{run,replay,trace,cost}.ts`

Test plan
- `cargo test --workspace`: green (doc-only change)
- `cargo clippy --workspace --all-targets -- -D warnings`: clean
- `cargo fmt --all --check`: clean
- `uv run pytest` in `integrations/dagster/`: 312 passed
- `npm run compile` in `editors/vscode/`: clean
- `just regen-fixtures`: no fixture diff (byte-stable; descriptions live in the schema, not the emitted payload)

Notes
… `None` today until the manifest plumbing lands"), so it stays correct whether the in-flight Databricks `bytes_scanned` override lands before or after this PR.
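Since the note above says the figure can be `None` for now, a downstream consumer may want to keep "not reported" separate from "zero bytes". The following is a rough sketch under stated assumptions: `ModelBytes` is a hypothetical, trimmed-down stand-in for a per-model output record, and typing the field as `Option<u64>` is an assumption rather than something this PR confirms.

```rust
/// Hypothetical stand-in for a per-model output record (not Rocky's type).
#[derive(Debug, Clone)]
struct ModelBytes {
    model: String,
    /// Adapter-reported billed bytes; `None` when the adapter reports no figure.
    /// Assumption: the real field has this shape.
    bytes_scanned: Option<u64>,
}

/// Sums the reported figures and returns the names of models that reported
/// nothing, so missing data is surfaced instead of being counted as zero.
fn summarize_bytes_scanned(models: &[ModelBytes]) -> (u64, Vec<&str>) {
    let mut total: u64 = 0;
    let mut missing: Vec<&str> = Vec::new();
    for m in models {
        match m.bytes_scanned {
            // saturating_add avoids a panic on overflow in debug builds.
            Some(bytes) => total = total.saturating_add(bytes),
            None => missing.push(m.model.as_str()),
        }
    }
    (total, missing)
}

fn main() {
    let models = vec![
        ModelBytes { model: "orders".to_string(), bytes_scanned: Some(10_485_760) },
        ModelBytes { model: "customers".to_string(), bytes_scanned: None },
    ];
    let (total, missing) = summarize_bytes_scanned(&models);
    println!("total billed bytes = {total}, no figure for: {missing:?}");
}
```

Returning the list of models without a figure lets a caller decide whether a partial total is acceptable for its cost report.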