Skip to content

Enhancement: Improved Identifiers - casefolding, quoted values, and basic escaping#5726

Merged
alanmcruickshank merged 23 commits intosqlfluff:mainfrom
keraion:AL05_quotes_multiples
Apr 13, 2024
Merged

Enhancement: Improved Identifiers - casefolding, quoted values, and basic escaping#5726
alanmcruickshank merged 23 commits intosqlfluff:mainfrom
keraion:AL05_quotes_multiples

Conversation

@keraion
Copy link
Copy Markdown
Contributor

@keraion keraion commented Mar 28, 2024

Brief summary of the change made

First and foremost, apologies for the larger PR here.

This adds more functionality to identifiers.

  • Per-dialect casefolding comparisons
  • Normalizing reference values; the value inside of quotes and handling basic escapes sequences.
    Initial AL05 fixes with enhanced identifiers:
  • Handling case sensitive or quoted values cases, plus a configuration to override default casefolding behavior (e.g. MySQL's casefolding depends on the host system)
  • Add BigQuery to unnesting alias logic
  • Identify tables that are used multiple times and require aliasing even when not referenced
  • Require an alias for tables that are immediately followed by a QUALIFY in Redshift dialect

Fixes #3966
Fixes #4150
Makes progress on #4623
Fixes #4801
Fixes #4802
Makes progress on #5356
Makes progress on #5471
Fixes #5625

Are there any other side effects of this change that we should be aware of?

Some locations where prior a str was return, it now returns a BaseSegment to retain casefolding information. Locations using the string are evaluating the raw version as they were before.

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 28, 2024

Coverage Results ✅

Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL   17539      0   100%

225 files skipped due to complete coverage.

@coveralls
Copy link
Copy Markdown

coveralls commented Mar 28, 2024

Coverage Status

coverage: 99.985% (+0.001%) from 99.984%
when pulling bedcb4a on keraion:AL05_quotes_multiples
into 422a92b on sqlfluff:main.

Copy link
Copy Markdown
Member

@alanmcruickshank alanmcruickshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First: thanks for the contribution! It's great to see a contribution where someone is really getting stuck in like this 🚀

I haven't reviewed everything yet, but I've reviewed all the dialect parts, which look sensible (except where I've added some comments).

The parsing parts also look fairly sensible, but I'd love a different solution in the MatchResult change - it makes MatchResult dependant on the structure of dialects which I think makes the code fairly fragile - I'd love something more generic if possible.

I think there's also a possible unnecessary raw_value property on the segment class.

What I haven't done yet is pick through all the places that we're swapping from segments to strings or referring to the different raw properties yet. I'll do that in a second pass.

Comment thread src/sqlfluff/utils/analysis/query.py Outdated
Comment thread test/fixtures/dialects/sqlite/quoted_identifiers.sql
Comment thread src/sqlfluff/core/parser/match_result.py Outdated
Comment on lines +119 to +122
@property
def raw_value(self) -> str:
"""Returns the raw segment's value."""
return self._raw_value
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find an instance of this actually being used. Did you mean to use it in one of the rules, or is this a remnant of previous testing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a remnant from before. I decided that the difference between what the raw_normalized and raw_value would return could be handled with the casefold boolean.

Copy link
Copy Markdown
Member

@alanmcruickshank alanmcruickshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Couple of tweeks from my side, but otherwise good to go. @WittierDinosaur do you want to take another look to see if you can see anything?

Otherwise, thanks so much for this one @keraion. Monster PR, but all checks out 👍

Comment thread src/sqlfluff/core/parser/segments/base.py Outdated
Comment on lines -290 to +291
new_seg: "BaseSegment"
if self.matched_class.class_is_type("raw"):
_raw_type = cast(Type["RawSegment"], self.matched_class)
assert len(result_segments) == 1
# TODO: Should this be a generic method on BaseSegment and RawSegment?
# It feels a little strange to be this specific here.
new_seg = _raw_type(
raw=result_segments[0].raw,
pos_marker=result_segments[0].pos_marker,
**self.segment_kwargs,
)
else:
new_seg = self.matched_class(
segments=result_segments, **self.segment_kwargs
)
new_seg: "BaseSegment" = self.matched_class.from_result_segments(
result_segments, self.segment_kwargs
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NICE. Yeah this is more what I had in mind 👍

@alanmcruickshank alanmcruickshank added this pull request to the merge queue Apr 13, 2024
Merged via the queue into sqlfluff:main with commit 4722f99 Apr 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants