Preserve triple quotes and prefixes for strings#15818
Conversation
|
| self.p(flags.prefix().as_str()); | ||
| self.p(flags.quote_str()); | ||
| self.p(body); | ||
| self.p(flags.quote_str()); |
There was a problem hiding this comment.
| self.p(flags.prefix().as_str()); | |
| self.p(flags.quote_str()); | |
| self.p(body); | |
| self.p(flags.quote_str()); | |
| self.p(&flags.format_string_contents(body)); |
There was a problem hiding this comment.
What did you think about the str::from_utf8 call above this? As far as I know this should be a safe unwrap or unreachable!, but I went with this approach just in case. I figured it was better to fall back on the normal escaping code than to panic if I were wrong, but it does complicate the flow a bit.
There was a problem hiding this comment.
Right now it doesn't seem like it's possible to construct an AST node for a literal-bytes expression that includes a non-ASCII character in it: https://play.ruff.rs/0b3b0efb-74cd-4e26-a885-9180e23e4506
However, our parser supports error recovery and the specifics of exactly what nodes representing invalid syntax will look like are an implementation detail. Similarly, we don't currently run any of our AST-based rules on files that contain invalid syntax (let alone try to autofix them), but that might well change in the future.
All told, I think I prefer your current approach here, though I'd also be interested in @dhruvmanila's thoughts. It might be good to add a comment, whatever we end up doing here!
There was a problem hiding this comment.
That makes sense, thanks! I expanded my comment in p_bytes_repr a bit more, hopefully capturing some of this information. Otherwise I'm interested in Dhruv's thoughts too!
This seems a bit separate, but I guess we could even make a stronger assertion that s.iter().all(u8::is_ascii) if we wanted. Although we'd still need to call from_utf8 to get something to write in our buffer anyway.
There was a problem hiding this comment.
Sorry for the late reply. The main goal of error recovery would be to preserve all the information of the original source i.e., the parser shouldn't drop any of the tokens. This would happen on a case by case basis but, for example, in this scenario the parser could just store the raw bytes and set a flag that it contains non-ascii bytes. All downstream tools would then need to account for this additional information.
Another thing to point out is that the generator is mainly used to provide auto-fix and we might decide that even though we would provide diagnostics for source code with syntax errors, we won't be providing any auto-fix for it as it might prove even more difficult to consider all the cases.
I think the current logic is fine for now and we can think about this more when we start working on this.
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
AlexWaygood
left a comment
There was a problem hiding this comment.
Great work! The only other thing I'd do is get rid of the Default implementation for AnyStringFlags, for a consistent interface to the other *Flags structs
|
Thank you! Sorry I neglected the |
That works for me! Or we could just inline it into |
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
MichaReiser
left a comment
There was a problem hiding this comment.
I've mainly an understand question:
Who's responsibility is it to correctly handle nested quotes? Is it the generator or the code building the AST? If it is the generator, what happens in cases where it's impossible to pick the right quotes?
crates/ruff_linter/src/rules/flake8_simplify/rules/split_static_string.rs
Show resolved
Hide resolved
| if flags.prefix().is_raw() && self.p_raw_bytes(s, flags).is_ok() { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Does this lead to a syntax error if flags uses a quote that is contained in the string? For example, what if I have b" and 'tert" and flags specify single quotes?
There was a problem hiding this comment.
I think that is true, but the only way to get into that state is if the user of the Generator hard-coded a request for single quotes. The intention was to pass along flags from input string literals (so the input would have to be br' and 'tert', which is already a syntax error, or to use Checker::default_bytes_flags if there's not an existing literal to preserve, which I think should also handle this case.
Basically you'd have to write code like
BytesLiteral {
value: Box::from(*b" and 'tert"),
range: TextRange::default(),
flags: BytesLiteralFlags::empty()
.with_quote_style(Quote::Single)
.with_prefix(ByteStringPrefix::Raw { uppercase_r: false }),
}to cause the issue, at least as far as I can think of.
There was a problem hiding this comment.
The exception to this would be if a refactor e.g moves a byte string into an f-string expression and the target python version is 3.12. But arguably, there isn't much the generator can do about this.
There was a problem hiding this comment.
Oh good point, I forgot about rules moving these into f-strings. I think UnicodeEscape will still get a chance to run in that case, as long as the outer f-string isn't raw too, but this is good to keep in mind.
It's still the generator's responsibility to handle nested quotes. All of the code paths except raw strings end up going through either I tested your |
crates/ruff_linter/resources/test/fixtures/flake8_simplify/SIM905.py
Outdated
Show resolved
Hide resolved
| fn display_contents(self, contents: &str) -> DisplayFlags { | ||
| DisplayFlags { | ||
| prefix: self.prefix(), | ||
| quote_str: self.quote_str(), | ||
| contents, | ||
| } | ||
| } | ||
| } | ||
|
|
||
| pub struct DisplayFlags<'a> { | ||
| prefix: AnyStringPrefix, | ||
| quote_str: &'a str, | ||
| contents: &'a str, | ||
| } | ||
|
|
||
| impl std::fmt::Display for DisplayFlags<'_> { | ||
| fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
| write!( | ||
| f, | ||
| "{prefix}{quote}{contents}{quote}", | ||
| prefix = self.prefix, | ||
| quote = self.quote_str, | ||
| contents = self.contents | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
An alternative here might be this (the diff is relative to your branch), which would mean the DisplayFlags struct would take up slightly less memory. Curious what @MichaReiser thinks of the pros and cons. I think there's almost certainly not much practical difference to what you have now:
diff --git a/crates/ruff_python_ast/src/nodes.rs b/crates/ruff_python_ast/src/nodes.rs
index 3471776f2..70b21e7db 100644
--- a/crates/ruff_python_ast/src/nodes.rs
+++ b/crates/ruff_python_ast/src/nodes.rs
@@ -977,7 +977,7 @@ impl Ranged for FStringPart {
}
}
-pub trait StringFlags: Copy {
+pub trait StringFlags: Copy + Into<AnyStringFlags> {
/// Does the string use single or double quotes in its opener and closer?
fn quote_style(self) -> Quote;
@@ -1029,16 +1029,14 @@ pub trait StringFlags: Copy {
fn display_contents(self, contents: &str) -> DisplayFlags {
DisplayFlags {
- prefix: self.prefix(),
- quote_str: self.quote_str(),
+ flags: self.into(),
contents,
}
}
}
pub struct DisplayFlags<'a> {
- prefix: AnyStringPrefix,
- quote_str: &'a str,
+ flags: AnyStringFlags,
contents: &'a str,
}
@@ -1047,8 +1045,8 @@ impl std::fmt::Display for DisplayFlags<'_> {
write!(
f,
"{prefix}{quote}{contents}{quote}",
- prefix = self.prefix,
- quote = self.quote_str,
+ prefix = self.flags.prefix(),
+ quote = self.flags.quote_str(),
contents = self.contents
)
}
diff --git a/crates/ruff_python_parser/src/token.rs b/crates/ruff_python_parser/src/token.rs
index ecda19730..d741b15a4 100644
--- a/crates/ruff_python_parser/src/token.rs
+++ b/crates/ruff_python_parser/src/token.rs
@@ -777,6 +777,12 @@ impl TokenFlags {
}
}
+impl From<TokenFlags> for AnyStringFlags {
+ fn from(flags: TokenFlags) -> Self {
+ flags.as_any_string_flags()
+ }
+}
+There was a problem hiding this comment.
I'm not too fond of trait chains. I'd rather have a into_any_string_flags method but I don't think it matters much. Display is rarely used.
| } | ||
| } | ||
|
|
||
| pub struct DisplayFlags<'a> { |
There was a problem hiding this comment.
I don't love the name of this struct because it doesn't just display the flags -- it displays the string contents embedded in the prefix and quotes, which are supplied by the flags. DisplayStringLiteralWithQuotesAndPrefix is kinda verbose, though, so I'm not sure what a better name might be. (AKA: I'm fine with it being merged as-is!)
There was a problem hiding this comment.
Oh good point. I also wanted to make the struct private, but it's in a public trait. One thing we could do is just return impl Display from display_contents? Then we could use as verbose an internal name as we want and arguably better capture the intention here without having to name a type.
There was a problem hiding this comment.
The downside of that is that the trait is no longer object safe, but it might not matter here.
There was a problem hiding this comment.
Oh good point. I'm somewhat leaning toward leaving the code as it is here, for both the struct name and its contents (re the conversation above) if that's okay with you both.
I don't really love the new trait bound or From impl either, but it does make the struct smaller, so I'm happy either way, though.
There was a problem hiding this comment.
I'm fine to leave everything as is!
* main: (66 commits) [red-knot] Use ternary decision diagrams (TDDs) for visibility constraints (#15861) [`pyupgrade`] Rename private type parameters in PEP 695 generics (`UP049`) (#15862) Simplify the `StringFlags` trait (#15944) [`flake8-pyi`] Make `PYI019` autofixable for `.py` files in preview mode as well as stubs (#15889) Docs (`linter.md`): clarify that Python files are always searched for in subdirectories (#15882) [`flake8-pyi`] Make PEP-695 functions with multiple type parameters fixable by PYI019 again (#15938) [red-knot] Use unambiguous invalid-syntax-construct for suppression comment test (#15933) Make `Binding::range()` point to the range of a type parameter's name, not the full type parameter (#15935) Update black deviations (#15928) [red-knot] MDTest: Fix line numbers in error messages (#15932) Preserve triple quotes and prefixes for strings (#15818) [red-knot] Hand-written MDTest parser (#15926) [`pylint`] Fix missing parens in unsafe fix for `unnecessary-dunder-call` (`PLC2801`) (#15762) nit: docs for ignore & select (#15883) [airflow] `BashOperator` has been moved to `airflow.providers.standard.operators.bash.BashOperator` (AIR302) (#15922) [`flake8-logging`] `.exception()` and `exc_info=` outside exception handlers (`LOG004`, `LOG014`) (#15799) [red-knot] Enforce specifying paths for mdtest code blocks in a separate preceding line (#15890) [red-knot] Internal refactoring of visibility constraints API (#15913) [red-knot] Implicit instance attributes (#15811) [`flake8-comprehensions`] Handle extraneous parentheses around list comprehension (`C403`) (#15877) ...
Summary
This is a follow-up to #15726, #15778, and #15794 to preserve the triple quote and prefix flags in plain strings, bytestrings, and f-strings.
I also added a
StringLiteralFlags::without_triple_quotesmethod to avoid passing along triple quotes in rules like SIM905 where it might not make sense, as discussed here.Test Plan
Existing tests, plus many new cases in the
generator::tests::quotetest that should cover all combinations of quotes and prefixes, at least for simple string bodies.Closes #7799 when combined with #15694, #15726, #15778, and #15794.