docs(engine/rocky-snowflake): document why bytes_scanned is deferred#220
Merged
hugocorreia90 merged 1 commit into main on Apr 22, 2026
Conversation
Explains the intentional gap for future readers: Snowflake's SQL API doesn't surface per-statement bytes_scanned in the immediate response (it lives in QUERY_HISTORY, keyed by query_id), so overriding execute_statement_with_stats would add a second round-trip per statement — violating the implicit "cheap, piggybacked" contract that BigQuery and Databricks satisfy. Snowflake cost is also duration-based, not bytes-based: rocky_core::cost::compute_observed_cost_usd routes Snowflake through duration_hours × dbu_per_hour × cost_per_dbu and never reads bytes_scanned. A QUERY_HISTORY lookup would only affect display in rocky trace / rocky history, not rocky cost correctness. Sibling to #219 (BigQuery bytes slice) and the pending Databricks slice. No behavior change — comment only.
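The duration-based routing described above can be sketched in a few lines. This is a hypothetical illustration, not the actual `rocky_core::cost` source: the struct shape and function name are assumptions; only the formula (`duration_hours × dbu_per_hour × cost_per_dbu`, never reading `bytes_scanned`) comes from the PR.

```rust
/// Hypothetical sketch of the Snowflake arm of
/// rocky_core::cost::compute_observed_cost_usd as described in this PR.
/// The struct and function names are invented for illustration.
struct ObservedRun {
    duration_hours: f64,
    /// Present for BigQuery/Databricks; deliberately unused on this arm.
    bytes_scanned: Option<u64>,
}

fn snowflake_observed_cost_usd(run: &ObservedRun, dbu_per_hour: f64, cost_per_dbu: f64) -> f64 {
    // duration_hours × dbu_per_hour × cost_per_dbu; bytes_scanned is
    // never consulted, which is the invariant the new comment documents.
    let _ = run.bytes_scanned; // ignored by design
    run.duration_hours * dbu_per_hour * cost_per_dbu
}

fn main() {
    let run = ObservedRun { duration_hours: 0.5, bytes_scanned: None };
    // 0.5 h × 4 DBU/h × $2.00/DBU = $4.00
    println!("{:.2}", snowflake_observed_cost_usd(&run, 4.0, 2.0));
}
```

Because the calculator only reads `duration_hours`, wiring up `bytes_scanned` for Snowflake could never change a `rocky cost` figure — which is the second half of the deferral argument.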
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

  Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0 / dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

  Engine headlines (12 PRs):
  - Arc 7 wave 2 complete — cached DESCRIBE end-to-end (#223 infra, #228 reads, #230 write tap, #231 discover warm-up, #232 state controls + --cache-ttl override)
  - Arc 2 wave 3 complete — bytes_scanned / bytes_written on MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake deferred doc, #222 docstring cascade). Real $ on rocky cost for BQ + Databricks
  - FR-005 Unity Catalog workspace-binding reconcile (#226)
  - FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
  - Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

  Dagster headlines (4 PRs):
  - FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor on RockyResource startup (#224)
  - FR-003 RockyResource.state_health() (#227) + FR follow-up threading doctor(check=state_rw) for sub-second probes (#229)
  - RockyResource.cost() wiring + fixture (#218)

  VS Code: regenerated TS bindings for engine 1.14.0 type additions. No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

  36 fixtures picked up the new engine version string in their top-level "version" field. No schema changes — just the version bump.
Summary
Adds a comment to `SnowflakeWarehouseAdapter`'s `impl WarehouseAdapter` block explaining why `execute_statement_with_stats` is intentionally not overridden.

Why bytes_scanned is deferred for Snowflake
Two independent reasons:
1. No free piggyback path. Snowflake does not return `bytes_scanned` in the immediate SQL API statement response. The per-statement figure only surfaces in `QUERY_HISTORY`, keyed by `query_id`. Populating `ExecutionStats::bytes_scanned` from Snowflake would require a second round-trip per materialize statement — violating the implicit "cheap, piggybacked on the existing response" contract that BigQuery (`statistics.query.totalBytesBilled`) and Databricks (`result.manifest.total_byte_count`) satisfy.

2. Cost calculator doesn't consume it. `rocky_core::cost::compute_observed_cost_usd` routes the Snowflake branch through `duration_hours × dbu_per_hour × cost_per_dbu` and never reads `bytes_scanned` — the test `observed_cost_databricks_ignores_bytes` (cost.rs:962) documents the same invariant for the Databricks/Snowflake arm. A QUERY_HISTORY round-trip would only affect display in `rocky trace` / `rocky history` output, not `rocky cost` correctness.

If Snowflake's pricing ever shifts to bytes-based (serverless compute, query acceleration), revisit. If `bytes_scanned` is wanted for display, do it as a batched QUERY_HISTORY lookup at run-finalise time (one query for all statement IDs), not a per-statement round-trip.

Scope
`engine/crates/rocky-snowflake/src/adapter.rs` only — 21 lines of comment added to the `WarehouseAdapter` impl, right after `execute_statement`, so future readers asking "where's `execute_statement_with_stats`?" land on the rationale. `rocky-core::traits` (trait definition) and `rocky-databricks` / `rocky-bigquery` untouched.

Test plan
- `cargo check -p rocky-snowflake` — clean
- `cargo clippy --workspace --all-targets -- -D warnings` — clean
- `cargo fmt --all --check` — clean

Notes
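For future reference, the batched run-finalise lookup suggested above could build one `QUERY_HISTORY` query covering every statement ID collected during the run. This is a sketch under assumptions: the helper name is invented, and the exact SQL shape (view name, quoting) would need checking against Snowflake's `ACCOUNT_USAGE.QUERY_HISTORY` docs.

```rust
/// Hypothetical helper: build a single QUERY_HISTORY query for all
/// statement IDs gathered during a run, instead of one round-trip per
/// statement. Name and SQL shape are illustrative assumptions.
fn query_history_batch_sql(query_ids: &[&str]) -> String {
    let id_list = query_ids
        .iter()
        .map(|id| format!("'{}'", id))
        .collect::<Vec<_>>()
        .join(", ");
    format!(
        "SELECT query_id, bytes_scanned \
         FROM snowflake.account_usage.query_history \
         WHERE query_id IN ({})",
        id_list
    )
}

fn main() {
    // One query for the whole run, keyed by the query_ids the SQL API
    // already returned with each statement response.
    let sql = query_history_batch_sql(&["01aa-1", "01aa-2"]);
    println!("{}", sql);
}
```

Real query IDs would come from the SQL API responses already in hand, so this costs one extra round-trip per run rather than one per statement — and it would feed display in `rocky trace` / `rocky history` only, per the rationale above.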