Skip to content

fix(engine): paper-cut cluster — BigQuery JWT cache, sql transpile, wasm UTF-8, explain TOML, partial-success exit code#292

Merged
hugocorreia90 merged 1 commit intomainfrom
fix/engine-paper-cuts
Apr 29, 2026
Merged

fix(engine): paper-cut cluster — BigQuery JWT cache, sql transpile, wasm UTF-8, explain TOML, partial-success exit code#292
hugocorreia90 merged 1 commit intomainfrom
fix/engine-paper-cuts

Conversation

@hugocorreia90
Copy link
Copy Markdown
Contributor

Summary

Five small, isolated bug fixes from the maintenance sweep — each fixes a subtle bug in already-shipped code. Bundled because they all share that shape and don't interact.

  • P1.1 BigQuery JWT not cached (rocky-bigquery/src/auth.rs)
    Service Account auth was minting + exchanging a fresh JWT on every API call (~100ms/call). Added an Arc<RwLock<Option<CachedToken>>> keyed by the BigQueryAuth instance, mirroring the Databricks pattern (60s REFRESH_SLACK). The cache uses Google's expires_in from the token-exchange response. New invalidate_cache() for post-401 retries. The ServiceAccount enum variant is now struct-form { key, cached_token }.

  • P1.2 rocky-sql transpile corruption (rocky-sql/src/transpile.rs)
    String::replace walked the whole SQL body, corrupting string literals, line/block comments, and identifier substrings (e.g. mapping INT64 → BIGINT incorrectly hit inside INT-prefixed words). Added a single-pass walker that tracks LineComment / BlockComment / '…' / "…" / `…` state (with doubled-quote escape handling), and requires ASCII word boundaries on type tokens. Function tokens (e.g. NVL() are still literal but skip strings/comments.

  • P1.3 WASM byte_offset_to_line_col panic (rocky-wasm/src/lib.rs)
    An offset past source.len() or mid-char into a multi-byte UTF-8 character would panic the slice. Clamp to text.len() and walk back to the nearest char boundary via is_char_boundary.

  • P1.4 explain.save_intent_to_config clobber (rocky-ai/src/explain.rs)
    When the existing TOML sidecar failed to parse, the prior unwrap_or(empty) silently overwrote the file with a fresh intent = "..." document, deleting the user's in-flight edits. Now returns Err(io::Error::other("refusing to overwrite unparseable TOML at <path>: …")).

  • P1.5 Partial-success exit code 2 (rocky-cli/src/commands/run.rs, rocky/src/main.rs)
    Documented contract said partial failures exit 2, but the post-JSON bail! always returned exit 1, so dagster allow_partial=True couldn't distinguish partial from total failure. Defined a PartialFailure sentinel next to Interrupted, branched the bail! site on derive_run_status(), and added the downcast in main.rs (PartialFailure → exit 2, mirroring Interrupted → exit 130). Total failure (no tables copied) still exits 1.

Test plan

  • cd engine && cargo test -p rocky-bigquery -p rocky-sql -p rocky-wasm -p rocky-ai -p rocky-cli — all green (incl. new tests for each fix).
  • cd engine && cargo clippy --all-targets -- -D warnings clean.
  • cd engine && cargo fmt --check clean.
  • Full cargo test workspace-wide — no regressions.

New regression tests:

  • BigQuery: cache-hit returns cached token, expired token rejected by read_fresh_token, invalidate_cache clears the entry, invalidate_cache is a no-op on Bearer.
  • rocky-sql: word boundaries (INT vs BIGINT), function/type replacement skips string literals + line + block comments, doubled-quote escape inside literals round-trips.
  • rocky-wasm: offset past end, multi-byte char at EOF (é), mid-char offset (日本語).
  • rocky-ai: refuses to clobber unparseable TOML and asserts the file is byte-identical after the failed call; merges into valid TOML; creates fresh sidecar when missing.

No *Output structs touched, so no just codegen cascade. No DSL change.

…asm UTF-8, explain TOML, partial-success exit code

Five small, isolated bug fixes bundled into one PR.

- rocky-bigquery (auth): cache the exchanged Google access token on
  ServiceAccount auth (mirrors the Databricks RwLock + 60s slack
  pattern), so each API call no longer mints + exchanges a fresh JWT.
  Adds invalidate_cache() for the post-401 case.
- rocky-sql (transpile): replace the naive String::replace pipeline
  with a string/comment-aware walk plus ASCII word boundaries on type
  tokens. Fixes corruption inside string literals, line/block
  comments, and identifier substrings (e.g. INT no longer matches
  inside BIGINT).
- rocky-wasm (byte_offset_to_line_col): clamp the offset to text.len()
  and walk back to the nearest char boundary so a multi-byte UTF-8
  character at end of buffer (or a deliberately mid-char offset from
  adversarial LSP/WASM input) cannot panic.
- rocky-ai (explain.save_intent_to_config): on TOML parse failure,
  return Err rather than silently overwriting the file with a fresh
  intent-only document. Preserves whatever the user was editing.
- rocky-cli (run) + rocky (main): wire a PartialFailure sentinel so
  partial-success runs (status = partial_failure) exit with code 2
  while total failures keep exit code 1. Mirrors the existing
  Interrupted -> 130 downcast. Restores the documented dagster
  allow_partial=True contract.

Tests:
- rocky-bigquery: cache-hit, expired-token, invalidate clears cache.
- rocky-sql: word-boundary INT vs BIGINT, NVL/LEN/INT64 inside
  string literals, line comments, block comments, doubled-quote
  escapes.
- rocky-wasm: offset past end, multi-byte char at EOF, mid-char
  offset.
- rocky-ai: refuses to clobber unparseable TOML; merges into valid
  TOML; creates fresh sidecar when missing.

cargo test -p rocky-bigquery -p rocky-sql -p rocky-wasm -p rocky-ai
-p rocky-cli passes; cargo clippy --all-targets -- -D warnings clean;
cargo fmt --check clean.
@hugocorreia90 hugocorreia90 merged commit f780272 into main Apr 29, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/engine-paper-cuts branch April 29, 2026 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant