Enhancement: Improved Identifiers - casefolding, quoted values, and basic escaping#5726
Conversation
Coverage Results ✅ |
alanmcruickshank
left a comment
There was a problem hiding this comment.
First: thanks for the contribution! It's great to see a contribution where someone is really getting stuck in like this 🚀
I haven't reviewed everything yet, but I've reviewed all the dialect parts, which look sensible (except where I've added some comments).
The parsing parts also look fairly sensible, but I'd love a different solution in the MatchResult change - it makes MatchResult dependant on the structure of dialects which I think makes the code fairly fragile - I'd love something more generic if possible.
I think there's also a possible unnecessary raw_value property on the segment class.
What I haven't done yet is pick through all the places that we're swapping from segments to strings or referring to the different raw properties yet. I'll do that in a second pass.
| @property | ||
| def raw_value(self) -> str: | ||
| """Returns the raw segment's value.""" | ||
| return self._raw_value |
There was a problem hiding this comment.
I can't find an instance of this actually being used. Did you mean to use it in one of the rules, or is this a remnant of previous testing?
There was a problem hiding this comment.
This was a remnant from before. I decided that the difference between what the raw_normalized and raw_value would return could be handled with the casefold boolean.
alanmcruickshank
left a comment
There was a problem hiding this comment.
Great work. Couple of tweeks from my side, but otherwise good to go. @WittierDinosaur do you want to take another look to see if you can see anything?
Otherwise, thanks so much for this one @keraion. Monster PR, but all checks out 👍
| new_seg: "BaseSegment" | ||
| if self.matched_class.class_is_type("raw"): | ||
| _raw_type = cast(Type["RawSegment"], self.matched_class) | ||
| assert len(result_segments) == 1 | ||
| # TODO: Should this be a generic method on BaseSegment and RawSegment? | ||
| # It feels a little strange to be this specific here. | ||
| new_seg = _raw_type( | ||
| raw=result_segments[0].raw, | ||
| pos_marker=result_segments[0].pos_marker, | ||
| **self.segment_kwargs, | ||
| ) | ||
| else: | ||
| new_seg = self.matched_class( | ||
| segments=result_segments, **self.segment_kwargs | ||
| ) | ||
| new_seg: "BaseSegment" = self.matched_class.from_result_segments( | ||
| result_segments, self.segment_kwargs | ||
| ) |
There was a problem hiding this comment.
NICE. Yeah this is more what I had in mind 👍
Brief summary of the change made
First and foremost, apologies for the larger PR here.
This adds more functionality to identifiers.
Initial AL05 fixes with enhanced identifiers:
QUALIFYin Redshift dialectFixes #3966
Fixes #4150
Makes progress on #4623
Fixes #4801
Fixes #4802
Makes progress on #5356
Makes progress on #5471
Fixes #5625
Are there any other side effects of this change that we should be aware of?
Some locations where prior a
strwas return, it now returns aBaseSegmentto retain casefolding information. Locations using the string are evaluating the raw version as they were before.Pull Request checklist
Please confirm you have completed any of the necessary steps below.
Included test cases to demonstrate any code changes, which may be one or more of the following:
.ymlrule test cases intest/fixtures/rules/std_rule_cases..sql/.ymlparser test cases intest/fixtures/dialects(note YML files can be auto generated withtox -e generate-fixture-yml).test/fixtures/linter/autofix.Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.