fix(engine): validate identifiers + reject injection-bearing SQL string-literal payloads #293
Merged
hugocorreia90 merged 1 commit into main on Apr 29, 2026
…ng-literal payloads

Closes the SQL-injection audit sweep. Every untrusted-input format!-with-SQL site in the warehouse adapters and execution engine now refuses unsafe input rather than trying to escape it.

Sites:
- rocky-bigquery/connector.rs: validate_identifier on catalog/schema/table in describe_table (the WHERE table_name = '...' literal is safe because the validator rejects single quotes), and on catalog/schema in list_tables.
- rocky-databricks/loader.rs: format_target validates each segment and returns AdapterResult; the cloud URI passed to COPY INTO is checked (rejects `'`, `\\`, `\n`, `\r`); the CSV delimiter is validated to keep the `'<delim>'` literal safe.
- rocky-snowflake/stage.rs + loader.rs: create_external_stage_sql, put_file_sql, file_format_clause refuse string-literal payloads with embedded quotes or backslashes; format_target validates target identifiers.
- rocky-engine/executor.rs: validate_identifier on model_name before the CREATE OR REPLACE TABLE format!; failed validation surfaces in the result's failed list.
- rocky-engine/profile.rs: generate_profile_sql returns Result and validates the table name and every column name.
- rocky-lang/lower.rs: unchanged. Token::Ident's lexer regex [a-zA-Z_][a-zA-Z0-9_]* is strictly tighter than validate_identifier, so every format! site there is safe by construction. Documented in the module header.

Tests added per site for the rejected-injection paths.
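The "validate, then format" pattern the commit message describes can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the allowlist rule (ASCII letters, digits, `_`, `-`), the `InvalidIdentifier` error type, and the three-argument `format_target` signature are all assumptions made for the example. The real adapters return `AdapterResult`, and the real `validate_identifier` rules are not shown in this PR.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; the PR's adapters return `AdapterResult`.
#[derive(Debug, PartialEq)]
pub struct InvalidIdentifier(pub String);

impl fmt::Display for InvalidIdentifier {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid SQL identifier: {:?}", self.0)
    }
}

impl std::error::Error for InvalidIdentifier {}

/// Assumed rule: non-empty, and only ASCII letters, digits, `_` or `-`.
/// Quotes, whitespace, semicolons and backslashes are refused, not escaped.
pub fn validate_identifier(s: &str) -> Result<&str, InvalidIdentifier> {
    let ok = !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    if ok {
        Ok(s)
    } else {
        Err(InvalidIdentifier(s.to_string()))
    }
}

/// Validates every segment before any of them reaches `format!`,
/// so a bad segment yields an error instead of a SQL fragment.
pub fn format_target(
    catalog: &str,
    schema: &str,
    table: &str,
) -> Result<String, InvalidIdentifier> {
    Ok(format!(
        "{}.{}.{}",
        validate_identifier(catalog)?,
        validate_identifier(schema)?,
        validate_identifier(table)?
    ))
}
```

Because the validator refuses single quotes outright, a validated name can also be interpolated into a `WHERE table_name = '...'` literal without escaping, which is the argument the commit message makes for `describe_table`.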
hugocorreia90 added a commit that referenced this pull request on Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280, #281, #282), the [budget].max_bytes_scanned threshold (#288), the audit-sweep closeout (#283, #285–#287, #290–#293), and the rocky-server auth + CORS gate (#291). Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky preview surface and ships the P1 cluster (#289) + FR-014 follow-on (#284). VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and RunCostSummary.total_bytes_scanned. See per-artifact CHANGELOG entries for the full breakdown.
Summary
Closes the SQL-injection audit sweep across the warehouse adapters and execution engine. Every untrusted-input `format!`-with-SQL site now refuses unsafe input rather than escaping it.

- `rocky-bigquery/connector.rs`: `validate_identifier` on catalog/schema/table in `describe_table` (the `WHERE table_name = '...'` literal is safe because the validator rejects single quotes), and on catalog/schema in `list_tables` (same class, found while there).
- `rocky-databricks/loader.rs`: `format_target` validates each segment and returns `AdapterResult`; the cloud URI passed to `COPY INTO` is checked by `validate_cloud_uri` (rejects `'`, `\`, `\n`, `\r`); the `csv_delimiter: char` is validated to keep the `'<delim>'` literal safe.
- `rocky-snowflake/stage.rs` + `loader.rs`: `create_external_stage_sql`, `put_file_sql`, `file_format_clause` refuse string-literal payloads with embedded quotes or backslashes; `format_target` validates target identifiers.
- `rocky-engine/executor.rs`: `validate_identifier` on `model_name` before the `CREATE OR REPLACE TABLE` `format!`; failed validation surfaces in the result's `failed` list.
- `rocky-engine/profile.rs`: `generate_profile_sql` returns `Result` and validates the table name and every column name.
- `rocky-lang/lower.rs`: unchanged. `Token::Ident`'s lexer regex `[a-zA-Z_][a-zA-Z0-9_]*` is strictly tighter than `validate_identifier`, so every `format!` site there is safe by construction. Documented in the module header so a future reader doesn't have to re-derive it.

Threads validation at the SQL boundary the same way `crates/rocky-databricks/src/{batch,dialect,permissions}.rs` already do: no `TableRef` shape change, no newtype.

Test plan
- `cargo test --workspace`: all 30 test groups green
- `cargo clippy --workspace --all-targets -- -D warnings`: clean
- `cargo fmt --check`: clean
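For the string-literal payloads mentioned in the summary (the cloud URI and CSV delimiter in the Databricks loader), a rough sketch of the "reject, don't escape" check might look like the following. The set of rejected characters for the URI comes from the PR description. Everything else is an assumption for illustration: the `String` error type, applying the same character set to the delimiter, the `validate_csv_delimiter` and `copy_into_sql` helper names, and the simplified `COPY INTO` statement shape.

```rust
/// Characters the PR description says `validate_cloud_uri` rejects.
const FORBIDDEN_IN_LITERAL: [char; 4] = ['\'', '\\', '\n', '\r'];

/// Refuses any URI that could terminate or break out of a '...' SQL literal.
/// The `String` error is a stand-in for the adapter's real error type.
pub fn validate_cloud_uri(uri: &str) -> Result<&str, String> {
    match uri.chars().find(|c| FORBIDDEN_IN_LITERAL.contains(c)) {
        Some(bad) => Err(format!("cloud URI contains forbidden character {:?}", bad)),
        None => Ok(uri),
    }
}

/// Assumed rule for the delimiter: same forbidden set as the URI,
/// so the `'<delim>'` literal cannot be closed early.
pub fn validate_csv_delimiter(delim: char) -> Result<char, String> {
    if FORBIDDEN_IN_LITERAL.contains(&delim) {
        Err(format!("unsafe CSV delimiter {:?}", delim))
    } else {
        Ok(delim)
    }
}

/// Hypothetical helper: builds a simplified COPY INTO statement.
/// `target` is assumed to have passed identifier validation already.
pub fn copy_into_sql(target: &str, uri: &str, delim: char) -> Result<String, String> {
    let uri = validate_cloud_uri(uri)?;
    let delim = validate_csv_delimiter(delim)?;
    Ok(format!(
        "COPY INTO {target} FROM '{uri}' FILEFORMAT = CSV FORMAT_OPTIONS ('delimiter' = '{delim}')"
    ))
}
```

Rejecting rather than escaping keeps the check independent of each warehouse's quoting rules: a payload either contains none of the dangerous characters or never reaches `format!`.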