feat(engine/rocky-databricks): override execute_statement_with_stats with total_byte_count #221
Merged
hugocorreia90 merged 1 commit into main on Apr 22, 2026
…with total_byte_count

Override the default WarehouseAdapter::execute_statement_with_stats (added in #219) on DatabricksWarehouseAdapter so Databricks materializations surface real byte accounting in MaterializationOutput.bytes_scanned instead of inheriting the all-None stub.

total_byte_count is the byte count Databricks natively reports on the Statement Execution response manifest; mapping it into ExecutionStats.bytes_scanned matches the #219 convention (billing-relevant bytes slot, even though Databricks is DBU-priced rather than bytes-priced).

execute_statement's signature is unchanged; the default trait impl continues to delegate to it for callers that don't need stats.

Snowflake slice to follow in a sibling PR.
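A minimal sketch of the default-plus-override pattern this commit describes, under stated assumptions: signatures are simplified to synchronous methods returning Result<_, String> (the real rocky trait likely differs, e.g. async with its own error type), and StubAdapter, run and canned_total_byte_count are hypothetical helpers that only fake a Statement Execution response.

```rust
// Hypothetical, simplified model: a trait default returning all-None stats,
// overridden by a Databricks-style adapter. Not the actual rocky code.

/// Byte/row accounting for one statement; `None` means "not reported".
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ExecutionStats {
    pub bytes_scanned: Option<u64>,
    pub bytes_written: Option<u64>,
    pub rows_affected: Option<u64>,
}

pub trait WarehouseAdapter {
    /// Runs a statement without reporting stats (signature left unchanged).
    fn execute_statement(&self, sql: &str) -> Result<(), String>;

    /// Default impl: delegate to `execute_statement`, return the all-None stub.
    fn execute_statement_with_stats(&self, sql: &str) -> Result<ExecutionStats, String> {
        self.execute_statement(sql)?;
        Ok(ExecutionStats::default())
    }
}

/// Hypothetical adapter that keeps the default behaviour.
pub struct StubAdapter;

impl WarehouseAdapter for StubAdapter {
    fn execute_statement(&self, _sql: &str) -> Result<(), String> {
        Ok(())
    }
}

/// Stand-in for DatabricksWarehouseAdapter; `canned_total_byte_count` fakes
/// the value the response manifest would carry.
pub struct DatabricksWarehouseAdapter {
    pub canned_total_byte_count: Option<u64>,
}

impl DatabricksWarehouseAdapter {
    // Hypothetical connector call returning the manifest's total_byte_count.
    fn run(&self, _sql: &str) -> Result<Option<u64>, String> {
        Ok(self.canned_total_byte_count)
    }
}

impl WarehouseAdapter for DatabricksWarehouseAdapter {
    fn execute_statement(&self, sql: &str) -> Result<(), String> {
        self.run(sql).map(|_| ())
    }

    /// Override: surface total_byte_count in the bytes_scanned slot.
    fn execute_statement_with_stats(&self, sql: &str) -> Result<ExecutionStats, String> {
        let total_byte_count = self.run(sql)?;
        Ok(ExecutionStats {
            bytes_scanned: total_byte_count,
            bytes_written: None,
            rows_affected: None,
        })
    }
}
```

Because the default lives on the trait, adapters without byte accounting compile unchanged and report all-None stats, and callers that only use execute_statement are unaffected.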
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026:
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0 / dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

Engine headlines (12 PRs):
- Arc 7 wave 2 complete: cached DESCRIBE end-to-end (#223 infra, #228 reads, #230 write tap, #231 discover warm-up, #232 state controls + --cache-ttl override)
- Arc 2 wave 3 complete: bytes_scanned / bytes_written on MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake deferred doc, #222 docstring cascade). Real $ on rocky cost for BQ + Databricks
- FR-005 Unity Catalog workspace-binding reconcile (#226)
- FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
- Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

Dagster headlines (4 PRs):
- FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor on RockyResource startup (#224)
- FR-003 RockyResource.state_health() (#227) + FR follow-up threading doctor(check=state_rw) for sub-second probes (#229)
- RockyResource.cost() wiring + fixture (#218)

VS Code: regenerated TS bindings for engine 1.14.0 type additions. No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

36 fixtures picked up the new engine version string in their top-level "version" field. No schema changes, just the version bump.
Summary
- Overrides WarehouseAdapter::execute_statement_with_stats on DatabricksWarehouseAdapter so Databricks materializations now surface real byte accounting in MaterializationOutput.bytes_scanned instead of inheriting the all-None stub.
- Extends Manifest to deserialize total_byte_count from the Databricks SQL Statement Execution response, and adds DatabricksConnector::execute_statement_with_stats (a stats-aware counterpart to execute_statement) plus a free stats_from_response helper.
- execute_statement's signature and behaviour are unchanged: the default trait impl keeps delegating to it for callers that don't need stats.
- Databricks slice of Trust-system Arc 2 wave 3.

Why bytes_scanned holds total_byte_count

Matches the #219 naming convention:
ExecutionStats.bytes_scanned is the billing-relevant bytes figure for the adapter. Databricks is DBU-priced (not bytes-priced), so total_byte_count isn't a cost driver the way BigQuery's totalBytesBilled is. It is the byte count Databricks natively reports for a statement, surfaced in the bytes_scanned slot so the cost pipeline stays free of adapter-specific branching. Documented inline on stats_from_response.

Scope notes
- No WarehouseAdapter trait changes; only the override.
- bytes_written stays None: Databricks doesn't expose a bytes-written figure on the Statement Execution response.
- rows_affected stays None: total_row_count is the result-row count, not the DML-affected-row count.

Test plan
- cargo test -p rocky-databricks -p rocky-core -p rocky-cli: all green. 4 new unit tests on stats_from_response + Manifest deserialization (happy path with total_byte_count, missing manifest, manifest without total_byte_count).
- cargo clippy --workspace --all-targets -- -D warnings: clean.
- cargo fmt --all --check: clean.
- just codegen: no output-struct change, byte-stable (verified no-op).
- just regen-fixtures: the DuckDB playground doesn't exercise Databricks, byte-stable (verified no-op).
- uv run pytest in integrations/dagster/: 312 passed, unaffected.

Files touched
- engine/crates/rocky-databricks/src/connector.rs: Manifest parses total_byte_count; new stats_from_response helper + DatabricksConnector::execute_statement_with_stats method; 4 new unit tests.
- engine/crates/rocky-databricks/src/adapter.rs: override WarehouseAdapter::execute_statement_with_stats on DatabricksWarehouseAdapter.
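The connector.rs mapping can be pictured with the std-only sketch below. It assumes simplified, hypothetical types: the real Manifest is presumably deserialized (e.g. via serde) from the response JSON, whereas StatementResponse here is an illustrative stand-in for the already-parsed Statement Execution response. The test module mirrors the three cases named in the test plan; it is not the PR's actual test code.

```rust
// Hypothetical, std-only sketch of stats_from_response. JSON parsing is
// skipped: these structs stand in for the deserialized response.

/// Subset of the result manifest; the field (or the whole manifest) may be absent.
#[derive(Debug, Default)]
pub struct Manifest {
    pub total_byte_count: Option<u64>,
    pub total_row_count: Option<u64>,
}

/// Illustrative stand-in for the parsed Statement Execution response.
#[derive(Debug, Default)]
pub struct StatementResponse {
    pub manifest: Option<Manifest>,
}

#[derive(Debug, Default, PartialEq)]
pub struct ExecutionStats {
    pub bytes_scanned: Option<u64>,
    pub bytes_written: Option<u64>,
    pub rows_affected: Option<u64>,
}

/// Puts the natively reported total_byte_count in the billing-relevant
/// bytes_scanned slot (the #219 convention), even though Databricks bills in DBUs.
pub fn stats_from_response(resp: &StatementResponse) -> ExecutionStats {
    ExecutionStats {
        bytes_scanned: resp.manifest.as_ref().and_then(|m| m.total_byte_count),
        // No bytes-written figure on the Statement Execution response.
        bytes_written: None,
        // total_row_count counts result rows, not DML-affected rows: ignored.
        rows_affected: None,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn maps_total_byte_count_to_bytes_scanned() {
        let resp = StatementResponse {
            manifest: Some(Manifest { total_byte_count: Some(2048), total_row_count: Some(5) }),
        };
        let stats = stats_from_response(&resp);
        assert_eq!(stats.bytes_scanned, Some(2048));
        assert_eq!(stats.rows_affected, None);
    }

    #[test]
    fn missing_manifest_yields_all_none() {
        assert_eq!(stats_from_response(&StatementResponse::default()), ExecutionStats::default());
    }

    #[test]
    fn manifest_without_total_byte_count_yields_all_none() {
        let resp = StatementResponse { manifest: Some(Manifest::default()) };
        assert_eq!(stats_from_response(&resp), ExecutionStats::default());
    }
}
```

Keeping the helper a free function over the parsed response makes it testable without a live warehouse, which is what lets the missing-manifest and missing-field cases be covered by plain unit tests.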