Implement a dialect-specific rule for unparsing an identifier with or without quotes#10573
Conversation
datafusion/sql/Cargo.toml
Outdated
| datafusion-common = { workspace = true, default-features = true } | ||
| datafusion-expr = { workspace = true } | ||
| log = { workspace = true } | ||
| regex = { version = "1.8" } |
There was a problem hiding this comment.
I think we need to move regex to top level, as it is used in much of packages. It can be done as followup
| .collect::<Result<Vec<_>>>() | ||
| } | ||
|
|
||
| pub(super) fn new_ident_quoted_if_needs(&self, ident: String) -> ast::Ident { |
There was a problem hiding this comment.
Please add a method comments for a pub method
comphead
left a comment
There was a problem hiding this comment.
Thanks @goldmedal
I'm thinking how this will work with whitespaces columns like
select 1 as "a a";
4acde31 to
44e9baa
Compare
Thanks @comphead :) |
alamb
left a comment
There was a problem hiding this comment.
Thank you @goldmedal -- I think this looks really nice
Thank you for the reviews @comphead
I left some suggestions for improvement but I think they could be done as follow on PRs as well.
cc @phillipleblanc and @devinjdangelo and @backkem
| let ast = expr_to_sql(&expr)?; | ||
| let sql = format!("{}", ast); | ||
| assert_eq!(sql, r#"(("a" < 5) OR ("a" = 8))"#); | ||
| assert_eq!(sql, r#"((a < 5) OR (a = 8))"#); |
There was a problem hiding this comment.
Given this change, perhaps we can remove the next example in the file simple_expr_to_sql_demo_no_escape as I don't think it serves any purpose
| rand = { workspace = true, features = ["small_rng"] } | ||
| rand_distr = "0.4.3" | ||
| regex = "1.5.4" | ||
| regex = { workspace = true } |
There was a problem hiding this comment.
that is certainly nice to use the same version of regex everywhere 👍
| use regex::Regex; | ||
| use sqlparser::keywords::ALL_KEYWORDS; | ||
|
|
||
| /// Dialect is used to capture dialect specific syntax. |
There was a problem hiding this comment.
| /// Dialect is used to capture dialect specific syntax. | |
| /// `Dialect` to usse for Unparsing | |
| /// | |
| /// The default dialect tries to avoid quoting identifiers unless necessary (e.g. `a` instead of `"a"`) | |
| /// but this behavior can be overridden as needed |
There was a problem hiding this comment.
Thanks. Look nice.
Thanks @alamb ! |
phillipleblanc
left a comment
There was a problem hiding this comment.
Awesome! Thanks @goldmedal 🥇
| use regex::Regex; | ||
| use sqlparser::keywords::ALL_KEYWORDS; | ||
|
|
||
| /// `Dialect` to usse for Unparsing |
There was a problem hiding this comment.
| /// `Dialect` to usse for Unparsing | |
| /// `Dialect` to use for Unparsing |
| /// See <https://github.com/sqlparser-rs/sqlparser-rs/pull/1170> | ||
| pub trait Dialect { | ||
| fn identifier_quote_style(&self) -> Option<char>; | ||
| fn identifier_needs_quote(&self, _: &str) -> bool { |
There was a problem hiding this comment.
Above note said this trait will eventually be replaced by the Dialect in the SQLparser package. Seems this pr make this harder. Should we extend sqlparser Dialect using something like DialectExt trait?
There was a problem hiding this comment.
I wanted to note that this functionality could also be covered within the existing SQLparser::Dialect::identifier_quote_style. It's signature looks as follows:
identifier_quote_style(&self, _identifier: &str) -> Option<char>
It is passed the identifier and can optionally return a quote character if needed. This way the trait doesn't need extending at all. See also apache/datafusion-sqlparser-rs#1170.
There was a problem hiding this comment.
@goldmedal let me know what you want to do here -- I can merge this PR and we can update this per @backkem 's suggestion in a follow on PR, or would you like to update this PR?
There was a problem hiding this comment.
@goldmedal let me know what you want to do here -- I can merge this PR and we can update this per
@backkem 's suggestion in a follow on PR, or would you like to update this PR?
Thanks @lewiszlw @backkem @alamb
I think I have time to fix it now. I can fix it in this PR.
backkem
left a comment
There was a problem hiding this comment.
LGTM with one small nit.
| impl Dialect for DefaultDialect { | ||
| fn identifier_quote_style(&self) -> Option<char> { | ||
| Some('"') | ||
| fn identifier_quote_style(&self, _identifier: &str) -> Option<char> { |
There was a problem hiding this comment.
| fn identifier_quote_style(&self, _identifier: &str) -> Option<char> { | |
| fn identifier_quote_style(&self, identifier: &str) -> Option<char> { |
There was a problem hiding this comment.
@backkem I want to check if I should also change the signature (L29) in the Dialect trait. I'm not familiar with naming conventions in Rust. I guess _identifier means this parameter is an identifier, but we ignore it in this method, right?
There was a problem hiding this comment.
Indeed, prefixing the identifier with an _ is a convention for silencing a linter warning that the variable is unused. Since it is being used now, the _ prefix is no longer needed.
654c836 to
8ed1525
Compare
There was a problem hiding this comment.
Thank you again @goldmedal and @backkem and @phillipleblanc and @lewiszlw and @comphead -- I think this PR looks really nice now and this makes unparsing much nicer looking for humans 🏆
|
Thanks again @alamb @backkem @phillipleblanc @lewiszlw @comphead :) |
|
This shouldn't have passed checks. functions/Cargo.toml |
Yeah, I don't know why that is a warning and not an error -- here is a PR to fix it: #10662 |
… without quotes (apache#10573) * add ident needs quote check * implement the check for default dialect and fix tests * add test for need-quoted cases * update cargo lock * fomrat cargo toml * fix the example test * move regex to top level * add comments for new_ident_quoted_if_needs func * fix typo and add test for space * fix example test * fix example test * fix the test fail * remove unused example and modified comments * fix typo * follow the latest Dialect trait in sqlparser * fix the parameter name
Which issue does this PR close?
Closes #10557
Rationale for this change
What changes are included in this PR?
Only implement the default dialect in this PR. We need other follow-up PR for other dialects.
Are these changes tested?
Yes
Are there any user-facing changes?
No