
feat(engine/rocky-databricks): override execute_statement_with_stats with total_byte_count #221

Merged

hugocorreia90 merged 1 commit into main from feat/bytes-scanned-databricks, Apr 22, 2026

Conversation

@hugocorreia90
Contributor

Summary

  • Overrides the default WarehouseAdapter::execute_statement_with_stats on DatabricksWarehouseAdapter so Databricks materializations now surface real byte accounting in MaterializationOutput.bytes_scanned instead of inheriting the all-None stub.
  • Extends Manifest to deserialize total_byte_count from the Databricks SQL Statement Execution response and adds DatabricksConnector::execute_statement_with_stats (stats-aware counterpart to execute_statement) plus a free stats_from_response helper.
  • execute_statement's signature and behaviour are unchanged — the default trait impl keeps delegating to it for callers that don't need stats. Databricks slice of Trust-system Arc 2 wave 3.
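The override shape described above can be sketched as follows. This is a hypothetical illustration, not the crate's real code: the actual `WarehouseAdapter` trait presumably has richer (likely async) signatures, and the hard-coded byte count stands in for reading the Statement Execution response.

```rust
// Hypothetical sketch of the override described in the summary. All
// signatures are simplified assumptions; only the names mirror the PR.

#[derive(Debug, Default, PartialEq)]
struct ExecutionStats {
    bytes_scanned: Option<u64>,
    bytes_written: Option<u64>,
    rows_affected: Option<u64>,
}

trait WarehouseAdapter {
    fn execute_statement(&self, sql: &str) -> Result<(), String>;

    // Default impl: delegate to execute_statement and return the
    // all-None stub. Callers that don't need stats are unaffected.
    fn execute_statement_with_stats(&self, sql: &str) -> Result<ExecutionStats, String> {
        self.execute_statement(sql)?;
        Ok(ExecutionStats::default())
    }
}

struct DatabricksWarehouseAdapter;

impl WarehouseAdapter for DatabricksWarehouseAdapter {
    fn execute_statement(&self, _sql: &str) -> Result<(), String> {
        Ok(())
    }

    // Override: surface a real byte count instead of inheriting the
    // stub. Hard-coded here; the real adapter would read it off the
    // Statement Execution response manifest.
    fn execute_statement_with_stats(&self, sql: &str) -> Result<ExecutionStats, String> {
        self.execute_statement(sql)?;
        Ok(ExecutionStats {
            bytes_scanned: Some(4096),
            ..ExecutionStats::default()
        })
    }
}

fn main() {
    let adapter = DatabricksWarehouseAdapter;
    let stats = adapter.execute_statement_with_stats("SELECT 1").unwrap();
    assert_eq!(stats.bytes_scanned, Some(4096));
    assert_eq!(stats.bytes_written, None);
    println!("bytes_scanned = {:?}", stats.bytes_scanned);
}
```

Because only the default method is overridden, existing callers of `execute_statement` compile and behave exactly as before.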

Why bytes_scanned holds total_byte_count

Matches the #219 naming convention: ExecutionStats.bytes_scanned is the billing-relevant bytes figure for the adapter. Databricks is DBU-priced (not bytes-priced), so total_byte_count isn't a cost driver the way BigQuery's totalBytesBilled is — it's the byte count Databricks natively reports for a statement, surfaced in the bytes_scanned slot so the cost pipeline stays free of adapter-specific branching. Documented inline on stats_from_response.

Scope notes

  • Databricks only. Snowflake slice to follow in a sibling PR. DuckDB / rocky-cli / run.rs unchanged — those were already wired in feat(engine/rocky-bigquery): plumb totalBytesBilled into bytes_scanned #219.
  • No WarehouseAdapter trait changes — only the override.
  • bytes_written stays None; Databricks doesn't expose a bytes-written figure on the Statement Execution response. rows_affected stays None — total_row_count is the result-row count, not the DML-affected-row count.
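The Some/None mapping in the notes above can be sketched as a free helper. `StatementResponse` and `Manifest` here are illustrative stand-ins for the Databricks response types, with only the field names taken from the PR description:

```rust
// Illustrative stand-ins for the Databricks SQL Statement Execution
// response; field names follow the PR description, the rest is assumed.
struct Manifest {
    total_byte_count: Option<u64>,
}

struct StatementResponse {
    manifest: Option<Manifest>,
}

#[derive(Debug, Default, PartialEq)]
struct ExecutionStats {
    bytes_scanned: Option<u64>,
    bytes_written: Option<u64>,
    rows_affected: Option<u64>,
}

// total_byte_count fills the bytes_scanned slot (the #219 convention).
// bytes_written and rows_affected stay None: the response exposes no
// bytes-written figure, and total_row_count counts result rows, not
// DML-affected rows.
fn stats_from_response(resp: &StatementResponse) -> ExecutionStats {
    ExecutionStats {
        bytes_scanned: resp.manifest.as_ref().and_then(|m| m.total_byte_count),
        bytes_written: None,
        rows_affected: None,
    }
}

fn main() {
    // Happy path: manifest carries total_byte_count.
    let resp = StatementResponse {
        manifest: Some(Manifest { total_byte_count: Some(1_048_576) }),
    };
    assert_eq!(stats_from_response(&resp).bytes_scanned, Some(1_048_576));

    // Missing manifest: everything stays None.
    let bare = StatementResponse { manifest: None };
    assert_eq!(stats_from_response(&bare), ExecutionStats::default());
    println!("ok");
}
```

The `and_then` chain keeps the helper total: a missing manifest or a manifest without the field both degrade to the all-None stub rather than erroring.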

Test plan

  • cargo test -p rocky-databricks -p rocky-core -p rocky-cli — all green. 4 new unit tests on stats_from_response + Manifest deserialization (happy path with total_byte_count, missing manifest, manifest-without-total_byte_count).
  • cargo clippy --workspace --all-targets -- -D warnings — clean.
  • cargo fmt --all --check — clean.
  • just codegen — no output-struct change, byte-stable (verified no-op).
  • just regen-fixtures — DuckDB playground doesn't exercise Databricks, byte-stable (verified no-op).
  • uv run pytest in integrations/dagster/ — 312 passed, unaffected.

Files touched

  • engine/crates/rocky-databricks/src/connector.rs — Manifest parses total_byte_count; new stats_from_response helper + DatabricksConnector::execute_statement_with_stats method; 4 new unit tests.
  • engine/crates/rocky-databricks/src/adapter.rs — override WarehouseAdapter::execute_statement_with_stats on DatabricksWarehouseAdapter.

feat(engine/rocky-databricks): override execute_statement_with_stats with total_byte_count

Override the default WarehouseAdapter::execute_statement_with_stats
(added in #219) on DatabricksWarehouseAdapter so Databricks
materializations surface real byte accounting in
MaterializationOutput.bytes_scanned instead of inheriting the all-None
stub.

total_byte_count is the byte count Databricks natively reports on the
Statement Execution response manifest; mapping it into ExecutionStats.
bytes_scanned matches the #219 convention (billing-relevant bytes slot,
even though Databricks is DBU-priced rather than bytes-priced).
execute_statement's signature is unchanged; the default trait impl
continues to delegate to it for callers that don't need stats.

Snowflake slice to follow in a sibling PR.
@hugocorreia90 hugocorreia90 merged commit 8837a27 into main Apr 22, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/bytes-scanned-databricks branch April 22, 2026 14:51
hugocorreia90 added a commit that referenced this pull request Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0
/ dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

Engine headlines (12 PRs):
- Arc 7 wave 2 complete — cached DESCRIBE end-to-end
  (#223 infra, #228 reads, #230 write tap, #231 discover warm-up,
  #232 state controls + --cache-ttl override)
- Arc 2 wave 3 complete — bytes_scanned / bytes_written on
  MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake
  deferred doc, #222 docstring cascade). Real $ on rocky cost for
  BQ + Databricks
- FR-005 Unity Catalog workspace-binding reconcile (#226)
- FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
- Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

Dagster headlines (4 PRs):
- FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor
  on RockyResource startup (#224)
- FR-003 RockyResource.state_health() (#227) + FR follow-up threading
  doctor(check=state_rw) for sub-second probes (#229)
- RockyResource.cost() wiring + fixture (#218)

VS Code: regenerated TS bindings for engine 1.14.0 type additions.
No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

36 fixtures picked up the new engine version string in their top-level
"version" field. No schema changes — just the version bump.
