[ruff] Implement unnecessary-regular-expression (RUF055)#14659
[ruff] Implement unnecessary-regular-expression (RUF055)#14659AlexWaygood merged 35 commits intoastral-sh:mainfrom
ruff] Implement unnecessary-regular-expression (RUF055)#14659Conversation
|
@dosisod This seems like it would be a good 'refurb' rule for your linter |
|
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| RUF055 | 50 | 50 | 0 | 0 | 0 |
MichaReiser
left a comment
There was a problem hiding this comment.
This overall looks great. You made this look simple.
The only thing that I notice we miss is raw-string support (or, at least, tests for it). Raw strings are the recommended way to write regex patterns in python because it avoids the need for double escaping.
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
| // For now, reject any regex metacharacters. Compare to the complete list | ||
| // from https://docs.python.org/3/howto/regex.html#matching-characters | ||
| let has_metacharacters = string_lit.value.chars().any(|c| { | ||
| matches!( | ||
| c, | ||
| '.' | '^' | '$' | '*' | '+' | '?' | '{' | '}' | '[' | ']' | '\\' | '|' | '(' | ')' | ||
| ) | ||
| }); |
There was a problem hiding this comment.
I like how you intentionally excluded meta-characters. So consider this an extension, and I think it's totally fine to do this as a follow-up pr (or not at all).
It would be nice if the rule only skips replacement for characters that are different between regex expressions and regular strings. For example, \n matches \n in a regex and a string.
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Thanks for the excellent PR writeup -- it made reviewing this really easy! This looks great overall.
The limitations around Match seem necessary, but some of the other restrictions can probably be loosened. For example, the sub replacement doesn't have to be a string literal, but it does need to be a string or at the very least not a function. Similarly, the patterns themselves could be plain str variables, but we need to inspect them for regex metacharacters. I didn't find a way to do that for non-literal strings, but if I missed it, that would be an easy improvement.
We don't have an out-of-the-box way of doing this for strings right now, so I wouldn't try to tackle it in this PR. But if you're interested, a followup might be to add an is_str() function to ruff_python_semantic::analyze::typing that looks similar to this is_list function:
ruff/crates/ruff_python_semantic/src/analyze/typing.rs
Lines 739 to 745 in d9cbf2f
And then you could use that in this rule for stronger type inference
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
Co-authored-by: Micha Reiser <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
|
Thank you both for the great reviews! I think I've incorporated all of the suggestions, with the exception of handling simple escapes like Similarly, I'm quite interested in the |
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/unnecessary_regular_expression.rs
Outdated
Show resolved
Hide resolved
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Simon Brugman <[email protected]>
Co-authored-by: Simon Brugman <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
| 32 32 | | ||
| 33 33 | # this should be replaced with "abc" == s | ||
| 34 |-if re.fullmatch("abc", s): | ||
| 34 |+if "abc" == s: |
There was a problem hiding this comment.
As a minor note, I think this should be s == "abc". It's a minor stylistic difference (in which I think x == VALUE is more idiomatic), but it can cause actual differences due to type checking implementations, e.g. microsoft/pyright#9093
Summary
This is a limited implementation of the rule idea from #12283 to replace some uses of the
remodule withstrmethod calls. A few of the examples given there:For this initial implementation, I've restricted the rule to string literals in the
patternargument to each of therefunctions and further restricted these string literals to exclude anyremetacharacters. Each of therefunctions takes additional kwargs that change their behavior, so the rule doesn't apply when these are present either. re.sub can also take a function as the replacement argument (unlikestr.replace, which expects anotherstr), so the rule is also restricted to cases where that argument is also a string literal. Finally,match,search, andfullmatchreturnMatchobjects unlike the proposed fixes, so the rule only applies when these are used in a boolean test for their truth values. For example,would trigger the rule, but the plain
re.match("abc", s)call above would not because the returnedMatchcould be used. I think this is probably a fairly common use case, so the rule can still be useful even with these restrictions.The limitations around
Matchseem necessary, but some of the other restrictions can probably be loosened. For example, thesubreplacement doesn't have to be a string literal, but it does need to be a string or at the very least not a function. Similarly, the patterns themselves could be plainstrvariables, but we need to inspect them for regex metacharacters. I didn't find a way to do that for non-literal strings, but if I missed it, that would be an easy improvement.I think these checks can also be directly extended to the
regexpackage. I sawunraw-re-pattern(RUF039), for example, handles bothreandregex, but I only handledrefor now.Test Plan
cargo testwith newRUF055.pysnapshot test.Possible related rule
Right before submitting this, I tried running
RUF055.pywith python, and it crashed with aValueError: cannot use LOCALE flag with a str pattern. That would be an easy thing to check with very similar code to what I have here.