fix(engine): validate identifiers + reject injection-bearing SQL string-literal payloads #293

Merged
hugocorreia90 merged 1 commit into main from
fix/sql-injection-identifier-validation
Apr 29, 2026
@hugocorreia90
Contributor

Summary

Closes the SQL-injection audit sweep across the warehouse adapters and execution engine. Every site that builds SQL with format! from untrusted input now refuses unsafe input rather than escaping it.

  • rocky-bigquery/connector.rs: validate_identifier on catalog/schema/table in describe_table (the WHERE table_name = '...' literal is safe because the validator rejects single quotes), and on catalog/schema in list_tables (same class, found while there).
  • rocky-databricks/loader.rs: format_target validates each segment and returns AdapterResult; the cloud URI passed to COPY INTO is checked against validate_cloud_uri (rejects ', \, \n, \r); the csv_delimiter: char is validated to keep the '<delim>' literal safe.
  • rocky-snowflake/stage.rs + loader.rs: create_external_stage_sql, put_file_sql, and file_format_clause refuse string-literal payloads with embedded quotes or backslashes; format_target validates target identifiers.
  • rocky-engine/executor.rs: validate_identifier on model_name before the CREATE OR REPLACE TABLE format!; failed validation surfaces in the result's failed list.
  • rocky-engine/profile.rs: generate_profile_sql returns Result and validates the table name and every column name.
  • rocky-lang/lower.rs: unchanged. Token::Ident's lexer regex [a-zA-Z_][a-zA-Z0-9_]* is strictly tighter than validate_identifier, so every format! site there is safe by construction. Documented in the module header so a future reader doesn't have to re-derive it.
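For readers unfamiliar with the refuse-rather-than-escape pattern, the identifier checks listed above might look roughly like the sketch below. This is an illustration, not the code in this PR: the bodies and signatures of validate_identifier and format_target are assumptions, the error type is a plain String instead of AdapterResult, and the accepted pattern is the lexer's [a-zA-Z_][a-zA-Z0-9_]*, which the PR describes as stricter than the real validator.

```rust
/// Hypothetical stand-in for `validate_identifier`. It accepts only
/// `[a-zA-Z_][a-zA-Z0-9_]*`; the real validator is described as looser.
fn validate_identifier(ident: &str) -> Result<&str, String> {
    let mut chars = ident.chars();
    // The first character must be an ASCII letter or underscore.
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    // Every remaining character must be an ASCII letter, digit, or underscore.
    if first_ok && chars.all(|c| c.is_ascii_alphanumeric() || c == '_') {
        Ok(ident)
    } else {
        // Refuse the input outright; no attempt is made to escape it.
        Err(format!("unsafe SQL identifier: {ident:?}"))
    }
}

/// Hypothetical `format_target`: validates each segment before it is
/// interpolated, so the caller never sees a partially built name.
fn format_target(catalog: &str, schema: &str, table: &str) -> Result<String, String> {
    Ok(format!(
        "{}.{}.{}",
        validate_identifier(catalog)?,
        validate_identifier(schema)?,
        validate_identifier(table)?
    ))
}
```

Because single quotes never pass the validator, a literal such as WHERE table_name = '...' built from a validated name cannot be closed early by the input.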

Threads validation at the SQL boundary the same way crates/rocky-databricks/src/{batch,dialect,permissions}.rs already do — no TableRef shape change, no newtype.
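As a rough illustration of validating at the SQL boundary, the sketch below checks a cloud URI immediately before it is interpolated into a string literal. Only the rejected character set (', \, \n, \r) comes from this PR; the function bodies, the String error type, the copy_into_sql helper, and the simplified COPY INTO statement are assumptions made for the example.

```rust
/// Hypothetical stand-in for `validate_cloud_uri`: rejects the four
/// characters the PR lists as unsafe inside a single-quoted literal.
fn validate_cloud_uri(uri: &str) -> Result<&str, String> {
    const FORBIDDEN: [char; 4] = ['\'', '\\', '\n', '\r'];
    match uri.chars().find(|c| FORBIDDEN.contains(c)) {
        Some(bad) => Err(format!("cloud URI contains forbidden character {bad:?}")),
        None => Ok(uri),
    }
}

/// Hypothetical helper: validation happens right where the SQL text is
/// assembled. `target` is assumed to be validated already by the caller.
fn copy_into_sql(target: &str, uri: &str) -> Result<String, String> {
    let uri = validate_cloud_uri(uri)?;
    // Simplified statement; a real COPY INTO carries more clauses.
    Ok(format!("COPY INTO {target} FROM '{uri}'"))
}
```

Keeping the check next to the format! call means the existing types stay as they are, which matches the "no newtype" choice described above.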

Test plan

  • cargo test --workspace — all 30 test groups green
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --check — clean
  • Per-site rejection tests added (injection-bearing identifier, quote-in-URI, quote-in-CSV-delimiter, etc.)
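A per-site rejection test could take roughly the following shape, shown here for the CSV delimiter case. The validate_csv_delimiter name, its rule set (single quote, backslash, newline, carriage return), and the test names are all hypothetical; the actual tests in this PR are not reproduced here.

```rust
/// Hypothetical delimiter check: refuses characters that would break
/// out of, or corrupt, the `'<delim>'` literal.
fn validate_csv_delimiter(delim: char) -> Result<char, String> {
    match delim {
        '\'' | '\\' | '\n' | '\r' => Err(format!("unsafe CSV delimiter: {delim:?}")),
        other => Ok(other),
    }
}

#[cfg(test)]
mod rejection_tests {
    use super::validate_csv_delimiter;

    #[test]
    fn rejects_quote_in_csv_delimiter() {
        // A single quote would terminate the SQL string literal early.
        assert!(validate_csv_delimiter('\'').is_err());
    }

    #[test]
    fn accepts_common_delimiters() {
        assert_eq!(validate_csv_delimiter(','), Ok(','));
        assert_eq!(validate_csv_delimiter('|'), Ok('|'));
    }
}
```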

…ng-literal payloads

Closes the SQL-injection audit sweep. Every untrusted-input format!-with-SQL
site in the warehouse adapters and execution engine now refuses unsafe input
rather than trying to escape it.

Sites:
- rocky-bigquery/connector.rs: validate_identifier on catalog/schema/table in
  describe_table (the WHERE table_name = '...' literal is safe because the
  validator rejects single quotes), and on catalog/schema in list_tables.
- rocky-databricks/loader.rs: format_target validates each segment and
  returns AdapterResult; the cloud URI passed to COPY INTO is checked
  (rejects `'`, `\\`, `\n`, `\r`); the CSV delimiter is validated to keep
  the `'<delim>'` literal safe.
- rocky-snowflake/stage.rs + loader.rs: create_external_stage_sql,
  put_file_sql, file_format_clause refuse string-literal payloads with
  embedded quotes or backslashes; format_target validates target identifiers.
- rocky-engine/executor.rs: validate_identifier on model_name before the
  CREATE OR REPLACE TABLE format!; failed validation surfaces in the result's
  failed list.
- rocky-engine/profile.rs: generate_profile_sql returns Result and validates
  the table name and every column name.
- rocky-lang/lower.rs: unchanged. Token::Ident's lexer regex
  [a-zA-Z_][a-zA-Z0-9_]* is strictly tighter than validate_identifier, so
  every format! site there is safe by construction. Documented in the module
  header.

Tests added per site for the rejected-injection paths.
@hugocorreia90 hugocorreia90 merged commit c42d974 into main Apr 29, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/sql-injection-identifier-validation branch April 29, 2026 18:29
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280,
#281, #282), the [budget].max_bytes_scanned threshold (#288), the
audit-sweep closeout (#283, #285-#287, #290-#293), and the rocky-server
auth + CORS gate (#291).

Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky
preview surface and ships the P1 cluster (#289) + FR-014 follow-on
(#284).

VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and
RunCostSummary.total_bytes_scanned.

See per-artifact CHANGELOG entries for the full breakdown.