
docs(engine/rocky-snowflake): document why bytes_scanned is deferred #220

Merged
hugocorreia90 merged 1 commit into main from docs/bytes-scanned-snowflake-deferred
Apr 22, 2026

Conversation

@hugocorreia90
Contributor

Summary

  • Adds a doc comment on SnowflakeWarehouseAdapter's impl WarehouseAdapter block explaining why execute_statement_with_stats is intentionally not overridden.
  • Sibling to #219 ("feat(engine/rocky-bigquery): plumb totalBytesBilled into bytes_scanned", the BigQuery bytes slice) and the pending Databricks slice — makes the gap self-documenting instead of implicit.
  • Comment only. No behavior change. No new functionality.

Why bytes_scanned is deferred for Snowflake

Two independent reasons:

  1. No free piggyback path. Snowflake does not return bytes_scanned in the immediate SQL API statement response. The per-statement figure only surfaces in QUERY_HISTORY, keyed by query_id. Populating ExecutionStats::bytes_scanned from Snowflake would require a second round-trip per materialize statement — violating the implicit "cheap, piggybacked on the existing response" contract that BigQuery (statistics.query.totalBytesBilled) and Databricks (result.manifest.total_byte_count) satisfy.

  2. Cost calculator doesn't consume it. rocky_core::cost::compute_observed_cost_usd routes the Snowflake branch through duration_hours × dbu_per_hour × cost_per_dbu and never reads bytes_scanned — the test observed_cost_databricks_ignores_bytes (cost.rs:962) documents the same invariant for the Databricks/Snowflake arm. A QUERY_HISTORY round-trip would only affect display in rocky trace / rocky history output, not rocky cost correctness.
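The duration-based arm can be illustrated with a small standalone sketch (hypothetical function name and unit handling — not the actual rocky_core::cost implementation):

```rust
// Hedged sketch of the Snowflake/Databricks cost arm described above:
// cost is purely duration-based, so bytes_scanned never enters the formula.
fn observed_cost_usd_duration_based(
    duration_secs: f64,
    dbu_per_hour: f64,
    cost_per_dbu: f64,
) -> f64 {
    let duration_hours = duration_secs / 3600.0;
    duration_hours * dbu_per_hour * cost_per_dbu
}

fn main() {
    // A 90-second statement on a 4 DBU/hour warehouse at $0.50/DBU
    // costs 0.025 h × 4 × 0.50 = $0.05, regardless of bytes scanned.
    let cost = observed_cost_usd_duration_based(90.0, 4.0, 0.50);
    assert!((cost - 0.05).abs() < 1e-9);
}
```

Plugging a bytes_scanned figure in from QUERY_HISTORY would change none of these terms, which is the invariant the observed_cost_databricks_ignores_bytes test pins down.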

If Snowflake's pricing ever shifts to bytes-based (serverless compute, query acceleration), revisit. If bytes_scanned is wanted for display, do it as a batched QUERY_HISTORY lookup at run-finalise time (one query for all statement IDs), not a per-statement round-trip.
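The batched run-finalise lookup could take roughly this shape (hypothetical helper, not part of the adapter today — only the QUERY_HISTORY table function and its query_id/bytes_scanned columns come from Snowflake itself):

```rust
// Hedged sketch: build ONE QUERY_HISTORY lookup covering every
// statement's query_id, instead of a per-statement round-trip.
fn batched_query_history_sql(query_ids: &[&str]) -> String {
    let id_list = query_ids
        .iter()
        // Naive single-quote escaping for illustration; real code
        // would bind parameters instead of splicing strings.
        .map(|id| format!("'{}'", id.replace('\'', "''")))
        .collect::<Vec<_>>()
        .join(", ");
    format!(
        "SELECT query_id, bytes_scanned \
         FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY()) \
         WHERE query_id IN ({id_list})"
    )
}

fn main() {
    let sql = batched_query_history_sql(&["01a-abc", "01a-def"]);
    assert!(sql.contains("'01a-abc'") && sql.contains("'01a-def'"));
    assert!(sql.contains("bytes_scanned"));
}
```

One such query at run-finalise amortizes the round-trip across all statements, which is what keeps the display-only feature from violating the "cheap, piggybacked" contract.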

Scope

  • engine/crates/rocky-snowflake/src/adapter.rs only — 21 lines of comment added to the WarehouseAdapter impl, right after execute_statement, so future readers asking "where's execute_statement_with_stats?" land on the rationale.
  • No other crate touched. rocky-core::traits (trait definition) and rocky-databricks / rocky-bigquery untouched.
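For readers who haven't opened adapter.rs, the placement described above looks roughly like this (the trait and method bodies here are simplified assumptions, not the real rocky-core / rocky-snowflake source — only the type and method names come from the PR):

```rust
// Hedged sketch of the "intentionally not overridden" pattern.
trait WarehouseAdapter {
    fn execute_statement(&self, sql: &str);

    // Default impl returns no stats; adapters override it only when
    // stats piggyback for free on the existing response.
    fn execute_statement_with_stats(&self, sql: &str) {
        self.execute_statement(sql);
    }
}

struct SnowflakeWarehouseAdapter;

impl WarehouseAdapter for SnowflakeWarehouseAdapter {
    fn execute_statement(&self, _sql: &str) {
        /* submit via the Snowflake SQL API */
    }

    // `execute_statement_with_stats` intentionally NOT overridden:
    // Snowflake only surfaces bytes_scanned in QUERY_HISTORY (keyed by
    // query_id), so an override would cost a second round-trip per
    // statement. See the doc comment this PR adds for the full rationale.
}

fn main() {
    let adapter = SnowflakeWarehouseAdapter;
    // Falls through to the default: no stats, no extra round-trip.
    adapter.execute_statement_with_stats("SELECT 1");
}
```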

Test plan

  • cargo check -p rocky-snowflake — clean
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --all --check — clean

Notes

Explains the intentional gap for future readers: Snowflake's SQL API
doesn't surface per-statement bytes_scanned in the immediate response
(it lives in QUERY_HISTORY, keyed by query_id), so overriding
execute_statement_with_stats would add a second round-trip per
statement — violating the implicit "cheap, piggybacked" contract that
BigQuery and Databricks satisfy.

Snowflake cost is also duration-based, not bytes-based:
rocky_core::cost::compute_observed_cost_usd routes Snowflake through
duration_hours × dbu_per_hour × cost_per_dbu and never reads
bytes_scanned. A QUERY_HISTORY lookup would only affect display in
rocky trace / rocky history, not rocky cost correctness.

Sibling to #219 (BigQuery bytes slice) and the pending Databricks
slice. No behavior change — comment only.
hugocorreia90 merged commit 9389dbf into main Apr 22, 2026
12 checks passed
hugocorreia90 deleted the docs/bytes-scanned-snowflake-deferred branch April 22, 2026 14:27
hugocorreia90 added a commit that referenced this pull request Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0
/ dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

Engine headlines (12 PRs):
- Arc 7 wave 2 complete — cached DESCRIBE end-to-end
  (#223 infra, #228 reads, #230 write tap, #231 discover warm-up,
  #232 state controls + --cache-ttl override)
- Arc 2 wave 3 complete — bytes_scanned / bytes_written on
  MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake
  deferred doc, #222 docstring cascade). Real $ on rocky cost for
  BQ + Databricks
- FR-005 Unity Catalog workspace-binding reconcile (#226)
- FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
- Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

Dagster headlines (4 PRs):
- FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor
  on RockyResource startup (#224)
- FR-003 RockyResource.state_health() (#227) + FR follow-up threading
  doctor(check=state_rw) for sub-second probes (#229)
- RockyResource.cost() wiring + fixture (#218)

VS Code: regenerated TS bindings for engine 1.14.0 type additions.
No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

36 fixtures picked up the new engine version string in their top-level
"version" field. No schema changes — just the version bump.
