Implement a dialect-specific rule for unparsing an identifier with or without quotes by goldmedal · Pull Request #10573 · apache/datafusion

goldmedal · 2024-05-19T02:10:49Z

Which issue does this PR close?

Closes #10557

Rationale for this change

What changes are included in this PR?

This PR implements only the default dialect. Other dialects will need follow-up PRs.

Are these changes tested?

Yes

Are there any user-facing changes?

No

comphead · 2024-05-20T17:28:46Z

datafusion/sql/Cargo.toml

 datafusion-common = { workspace = true, default-features = true }
 datafusion-expr = { workspace = true }
 log = { workspace = true }
+regex = { version = "1.8" }


I think we need to move regex to the top level, as it is used in many of the packages. It can be done as a follow-up.

I moved it in 32aa0e9.

comphead · 2024-05-20T17:40:23Z

datafusion/sql/src/unparser/expr.rs

            .collect::<Result<Vec<_>>>()
    }

+    pub(super) fn new_ident_quoted_if_needs(&self, ident: String) -> ast::Ident {


Please add a doc comment for this pub method.

I added some comments in 7a534fb.

comphead

Thanks @goldmedal
I'm wondering how this will work with column names that contain whitespace, like

select 1 as "a a";

goldmedal · 2024-05-21T11:54:48Z

Thanks @comphead :)
I'm not sure what you mean, but I think it works the same way as any other character that is illegal in a SQL identifier. I added a test case for it in 44e9baa.
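For illustration, the rule discussed in this thread can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the PR: the PR imports the `regex` crate and sqlparser's `ALL_KEYWORDS`, whereas this version uses only the standard library. `SAMPLE_KEYWORDS`, `needs_quote`, and `unparse_ident` are hypothetical names, and the keyword list is deliberately tiny.

```rust
/// Hypothetical, deliberately tiny keyword list for illustration only.
/// The PR imports `sqlparser::keywords::ALL_KEYWORDS` for this purpose.
const SAMPLE_KEYWORDS: &[&str] = &["SELECT", "FROM", "WHERE", "ORDER", "GROUP"];

/// Returns true when `ident` cannot be written bare: it is empty, starts
/// with something other than an ASCII letter or `_`, contains a character
/// outside `[A-Za-z0-9_]` (a space, for example), or matches a keyword.
fn needs_quote(ident: &str) -> bool {
    let mut chars = ident.chars();
    let starts_ok = match chars.next() {
        Some(c) => c.is_ascii_alphabetic() || c == '_',
        None => false,
    };
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    let is_keyword = SAMPLE_KEYWORDS.contains(&ident.to_uppercase().as_str());
    !(starts_ok && rest_ok) || is_keyword
}

/// Wraps `ident` in double quotes only when `needs_quote` says so,
/// doubling any embedded double quote.
fn unparse_ident(ident: &str) -> String {
    if needs_quote(ident) {
        format!("\"{}\"", ident.replace('"', "\"\""))
    } else {
        ident.to_string()
    }
}
```

Under this sketch, an alias such as `a a` is treated like any other name containing a character outside the plain set, so it is emitted with quotes, while `a` is emitted bare.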

alamb

Thank you @goldmedal -- I think this looks really nice

Thank you for the reviews @comphead

I left some suggestions for improvement but I think they could be done as follow on PRs as well.

cc @phillipleblanc and @devinjdangelo and @backkem

alamb · 2024-05-21T20:41:50Z

datafusion-examples/examples/plan_to_sql.rs

    let ast = expr_to_sql(&expr)?;
    let sql = format!("{}", ast);
-    assert_eq!(sql, r#"(("a" < 5) OR ("a" = 8))"#);
+    assert_eq!(sql, r#"((a < 5) OR (a = 8))"#);


Given this change, perhaps we can remove the next example in the file simple_expr_to_sql_demo_no_escape as I don't think it serves any purpose

alamb · 2024-05-21T20:42:24Z

datafusion/core/Cargo.toml

 rand = { workspace = true, features = ["small_rng"] }
 rand_distr = "0.4.3"
-regex = "1.5.4"
+regex = { workspace = true }


that is certainly nice to use the same version of regex everywhere 👍

alamb · 2024-05-21T20:45:36Z

datafusion/sql/src/unparser/dialect.rs

+use regex::Regex;
+use sqlparser::keywords::ALL_KEYWORDS;
+
 /// Dialect is used to capture dialect specific syntax.


Suggested change

-/// Dialect is used to capture dialect specific syntax.
+/// `Dialect` to usse for Unparsing
+///
+/// The default dialect tries to avoid quoting identifiers unless necessary (e.g. `a` instead of `"a"`)
+/// but this behavior can be overridden as needed

Thanks. Looks nice.

goldmedal · 2024-05-22T05:16:22Z

Thanks @alamb !
I think the suggestions are simple and reasonable, so I just fixed them in this PR.

phillipleblanc

Awesome! Thanks @goldmedal 🥇

phillipleblanc · 2024-05-22T05:24:10Z

datafusion/sql/src/unparser/dialect.rs

+use regex::Regex;
+use sqlparser::keywords::ALL_KEYWORDS;
+
+/// `Dialect` to usse for Unparsing


Suggested change

-/// `Dialect` to usse for Unparsing
+/// `Dialect` to use for Unparsing

Thanks @phillipleblanc

lewiszlw · 2024-05-22T08:41:55Z

datafusion/sql/src/unparser/dialect.rs

 /// See <https://github.com/sqlparser-rs/sqlparser-rs/pull/1170>
 pub trait Dialect {
    fn identifier_quote_style(&self) -> Option<char>;
+    fn identifier_needs_quote(&self, _: &str) -> bool {


The note above says this trait will eventually be replaced by the Dialect in the sqlparser package. It seems this PR makes that harder. Should we extend the sqlparser Dialect with something like a DialectExt trait?
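One way to read the `DialectExt` idea is as an extension trait layered on top of an existing dialect trait. The sketch below is hypothetical throughout: `Dialect` here is a local stand-in rather than sqlparser's trait, and `DialectExt`, the default body of `identifier_needs_quote`, and `DoubleQuoteDialect` are illustrative names only.

```rust
/// Local stand-in for an upstream dialect trait (NOT sqlparser's real trait).
trait Dialect {
    fn identifier_quote_style(&self) -> Option<char>;
}

/// Hypothetical extension trait: adds a method without touching `Dialect`.
trait DialectExt: Dialect {
    /// Default rule assumed for this sketch: quote anything that is not a
    /// plain `[A-Za-z_][A-Za-z0-9_]*` name.
    fn identifier_needs_quote(&self, ident: &str) -> bool {
        let mut chars = ident.chars();
        let first_ok = chars
            .next()
            .map_or(false, |c| c.is_ascii_alphabetic() || c == '_');
        !(first_ok && chars.all(|c| c.is_ascii_alphanumeric() || c == '_'))
    }
}

/// Blanket impl: every `Dialect` gets the extension method automatically.
impl<T: Dialect + ?Sized> DialectExt for T {}

/// Hypothetical dialect used only to exercise the traits above.
struct DoubleQuoteDialect;

impl Dialect for DoubleQuoteDialect {
    fn identifier_quote_style(&self) -> Option<char> {
        Some('"')
    }
}
```

A design note on this sketch: with the blanket impl, individual dialects cannot override the default rule. Dropping the blanket impl and implementing `DialectExt` per dialect would allow overrides, at the cost of one more impl for each dialect.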

I wanted to note that this functionality could also be covered by the existing SQLparser::Dialect::identifier_quote_style. Its signature looks as follows:

identifier_quote_style(&self, _identifier: &str) -> Option<char>

It is passed the identifier and can optionally return a quote character if needed. This way the trait doesn't need extending at all. See also apache/datafusion-sqlparser-rs#1170.
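To make the point concrete, here is a minimal sketch of how a single method with that signature can carry both decisions: whether to quote, and which character to quote with. `Dialect`, `PlainDialect`, and `quote_ident` are local, hypothetical stand-ins written for this illustration. Only the method signature is taken from the comment above, and the quoting rule is a simplified assumption.

```rust
/// Local stand-in trait using the signature quoted above.
trait Dialect {
    /// `None` means "write the identifier bare";
    /// `Some(c)` means "wrap the identifier in `c`".
    fn identifier_quote_style(&self, identifier: &str) -> Option<char>;
}

/// Hypothetical dialect for this sketch.
struct PlainDialect;

impl Dialect for PlainDialect {
    fn identifier_quote_style(&self, identifier: &str) -> Option<char> {
        // Assumed, simplified rule: quote only if the name is empty or has
        // a character that is not an ASCII letter, digit, or underscore.
        let plain = !identifier.is_empty()
            && identifier
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || c == '_');
        if plain {
            None
        } else {
            Some('"')
        }
    }
}

/// Renders an identifier according to whatever the dialect returns.
fn quote_ident(dialect: &dyn Dialect, identifier: &str) -> String {
    match dialect.identifier_quote_style(identifier) {
        Some(q) => format!("{}{}{}", q, identifier, q),
        None => identifier.to_string(),
    }
}
```

Because the identifier is an argument, the dialect can return `None` for names that are safe to write bare, so no separate "needs quote" method is required.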

@goldmedal let me know what you want to do here -- I can merge this PR and we can update this per @backkem 's suggestion in a follow on PR, or would you like to update this PR?

Thanks @lewiszlw @backkem @alamb
I think I have time to fix it now. I can fix it in this PR.

backkem

LGTM with one small nit.

backkem · 2024-05-22T12:15:35Z

datafusion/sql/src/unparser/dialect.rs

 impl Dialect for DefaultDialect {
-    fn identifier_quote_style(&self) -> Option<char> {
-        Some('"')
+    fn identifier_quote_style(&self, _identifier: &str) -> Option<char> {


Suggested change

-    fn identifier_quote_style(&self, _identifier: &str) -> Option<char> {
+    fn identifier_quote_style(&self, identifier: &str) -> Option<char> {

@backkem I want to check if I should also change the signature (L29) in the Dialect trait. I'm not familiar with naming conventions in Rust. I guess _identifier means this parameter is an identifier, but we ignore it in this method, right?

Indeed, prefixing the identifier with an _ is a convention for silencing a linter warning that the variable is unused. Since it is being used now, the _ prefix is no longer needed.
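A small sketch of the convention described above, using two hypothetical free functions (`always_quote` and `quote_if_has_space` are illustrative names, not part of the PR):

```rust
/// The parameter is ignored here, so the `_` prefix keeps the
/// `unused_variables` lint quiet.
fn always_quote(_identifier: &str) -> Option<char> {
    Some('"')
}

/// The parameter is read here, so it carries no `_` prefix.
fn quote_if_has_space(identifier: &str) -> Option<char> {
    if identifier.contains(' ') {
        Some('"')
    } else {
        None
    }
}
```

Both functions compile with or without the prefix. The prefix only signals intent and controls whether the unused-variable warning fires.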

alamb

Thank you again @goldmedal and @backkem and @phillipleblanc and @lewiszlw and @comphead -- I think this PR looks really nice now and this makes unparsing much nicer looking for humans 🏆

goldmedal · 2024-05-23T00:13:53Z

Thanks again @alamb @backkem @phillipleblanc @lewiszlw @comphead :)

Omega359 · 2024-05-24T13:27:01Z

This shouldn't have passed checks.

+ cargo fmt --all -- --check
`cargo metadata` exited with an error: error: failed to load manifest for workspace member `/opt/dev/datafusion/datafusion/core`
referenced by workspace at `/opt/dev/datafusion/Cargo.toml`

Caused by:
  failed to load manifest for dependency `datafusion-functions`

Caused by:
  failed to parse manifest at `/opt/dev/datafusion/datafusion/functions/Cargo.toml`

Caused by:
  dependency (regex) specified without providing a local path, Git repository, version, or workspace dependency to use

functions/Cargo.toml

regex = { worksapce = true, optional = true }

alamb · 2024-05-25T12:13:48Z

Yeah, I don't know why that is a warning and not an error -- here is a PR to fix it: #10662

github-actions bot added the sql SQL Planner label May 19, 2024

goldmedal mentioned this pull request May 19, 2024

Make SQL strings generated from Exprs "prettier" #10557

Closed

comphead reviewed May 20, 2024

goldmedal force-pushed the feature/10557-dialect-need-qutoed branch from 4acde31 to 44e9baa Compare May 21, 2024 11:49

github-actions bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate labels May 21, 2024

alamb approved these changes May 21, 2024

phillipleblanc approved these changes May 22, 2024

lewiszlw reviewed May 22, 2024

backkem approved these changes May 22, 2024

goldmedal added 16 commits May 22, 2024 23:31

add ident needs quote check

e13d7dc

implement the check for default dialect and fix tests

bce7e41

add test for need-quoted cases

a616719

update cargo lock

b8e7dbe

fomrat cargo toml

0293ca7

fix the example test

4063d5d

move regex to top level

c0e03d1

add comments for new_ident_quoted_if_needs func

7430634

fix typo and add test for space

a881a65

fix example test

c9eb4a4

fix example test

2eba717

fix the test fail

dc75c2d

remove unused example and modified comments

603d0b4

fix typo

3a9125c

follow the latest Dialect trait in sqlparser

9a1d05c

fix the parameter name

8ed1525

goldmedal force-pushed the feature/10557-dialect-need-qutoed branch from 654c836 to 8ed1525 Compare May 22, 2024 15:38

alamb approved these changes May 22, 2024

alamb merged commit 7bd4b53 into apache:main May 22, 2024

goldmedal deleted the feature/10557-dialect-need-qutoed branch May 23, 2024 00:13

alamb mentioned this pull request May 23, 2024

Make SQL strings generated from Exprs even "prettier" #10633

Closed

alamb mentioned this pull request May 25, 2024

Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) #10662

Merged

alamb mentioned this pull request May 28, 2024

DataFusion weekly project plan (Andrew Lamb) - May 27, 2024 #10699

Closed

9 tasks

goldmedal mentioned this pull request Jun 24, 2024

Introduce the calculation for TO_MANY relationship Canner/wren-engine#626

Merged
