Preserve triple quotes and prefixes for strings by ntBre · Pull Request #15818 · astral-sh/ruff

ntBre · 2025-01-29T21:57:29Z

Summary

This is a follow-up to #15726, #15778, and #15794 to preserve the triple quote and prefix flags in plain strings, bytestrings, and f-strings.

I also added a StringLiteralFlags::without_triple_quotes method to avoid passing along triple quotes in rules like SIM905 where it might not make sense, as discussed here.

Test Plan

Existing tests, plus many new cases in the generator::tests::quote test that should cover all combinations of quotes and prefixes, at least for simple string bodies.

Closes #7799 when combined with #15694, #15726, #15778, and #15794.

crates/ruff_python_codegen/src/generator.rs

crates/ruff_python_literal/src/escape.rs

github-actions · 2025-01-29T22:07:00Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

AlexWaygood

Nice!

crates/ruff_python_ast/src/nodes.rs

AlexWaygood · 2025-01-29T22:07:52Z

crates/ruff_python_codegen/src/generator.rs

+        self.p(flags.prefix().as_str());
+        self.p(flags.quote_str());
+        self.p(body);
+        self.p(flags.quote_str());


Suggested change

self.p(flags.prefix().as_str());

self.p(flags.quote_str());

self.p(body);

self.p(flags.quote_str());

self.p(&flags.format_string_contents(body));

What did you think about the str::from_utf8 call above this? As far as I know this should be a safe unwrap or unreachable!, but I went with this approach just in case. I figured it was better to fall back on the normal escaping code than to panic if I were wrong, but it does complicate the flow a bit.

Right now it doesn't seem like it's possible to construct an AST node for a literal-bytes expression that includes a non-ASCII character in it: https://play.ruff.rs/0b3b0efb-74cd-4e26-a885-9180e23e4506

However, our parser supports error recovery and the specifics of exactly what nodes representing invalid syntax will look like are an implementation detail. Similarly, we don't currently run any of our AST-based rules on files that contain invalid syntax (let alone try to autofix them), but that might well change in the future.

All told, I think I prefer your current approach here, though I'd also be interested in @dhruvmanila's thoughts. It might be good to add a comment, whatever we end up doing here!

That makes sense, thanks! I expanded my comment in p_bytes_repr a bit more, hopefully capturing some of this information. Otherwise I'm interested in Dhruv's thoughts too!

This seems a bit separate, but I guess we could even make a stronger assertion that s.iter().all(u8::is_ascii) if we wanted. Although we'd still need to call from_utf8 to get something to write in our buffer anyway.

Sorry for the late reply. The main goal of error recovery would be to preserve all the information of the original source i.e., the parser shouldn't drop any of the tokens. This would happen on a case by case basis but, for example, in this scenario the parser could just store the raw bytes and set a flag that it contains non-ascii bytes. All downstream tools would then need to account for this additional information.

Another thing to point out is that the generator is mainly used to provide auto-fix and we might decide that even though we would provide diagnostics for source code with syntax errors, we won't be providing any auto-fix for it as it might prove even more difficult to consider all the cases.

I think the current logic is fine for now and we can think about this more when we start working on this.

crates/ruff_python_codegen/src/generator.rs

crates/red_knot_python_semantic/src/types/display.rs

Co-authored-by: Alex Waygood <[email protected]>

crates/ruff_python_codegen/src/generator.rs

crates/ruff_python_literal/src/escape.rs

Co-authored-by: Alex Waygood <[email protected]>

crates/ruff_python_literal/src/escape.rs

crates/ruff_python_codegen/src/generator.rs

crates/ruff_python_literal/src/escape.rs

Co-authored-by: Alex Waygood <[email protected]>

AlexWaygood

Great work! The only other thing I'd do is get rid of the Default implementation for AnyStringFlags, for a consistent interface to the other *Flags structs

crates/ruff_python_ast/src/nodes.rs

ntBre · 2025-01-30T19:36:32Z

Thank you! Sorry I neglected the Default change from our conversation earlier. It looks like AnyStringFlags::default is only used in AnyStringFlags::new. I added a private empty method instead for now, but I could also make empty public, and add all of the docs from the other Flags. What do you think?

AlexWaygood · 2025-01-30T19:40:51Z

I added a private empty method instead for now

That works for me! Or we could just inline it into AnyStringFlags::new()

crates/ruff_python_ast/src/nodes.rs

Co-authored-by: Alex Waygood <[email protected]>

crates/ruff_python_ast/src/nodes.rs

MichaReiser

I've mainly an understand question:

Who's responsibility is it to correctly handle nested quotes? Is it the generator or the code building the AST? If it is the generator, what happens in cases where it's impossible to pick the right quotes?

crates/ruff_python_ast/src/nodes.rs

crates/ruff_linter/src/rules/flake8_simplify/rules/split_static_string.rs

crates/ruff_python_codegen/src/generator.rs

MichaReiser · 2025-02-03T09:20:45Z

crates/ruff_python_codegen/src/generator.rs

+        if flags.prefix().is_raw() && self.p_raw_bytes(s, flags).is_ok() {
+            return;
+        }


Does this lead to a syntax error if flags uses a quote that is contained in the string? For example, what if I have b" and 'tert" and flags specify single quotes?

I think that is true, but the only way to get into that state is if the user of the Generator hard-coded a request for single quotes. The intention was to pass along flags from input string literals (so the input would have to be br' and 'tert', which is already a syntax error, or to use Checker::default_bytes_flags if there's not an existing literal to preserve, which I think should also handle this case.

Basically you'd have to write code like

BytesLiteral { value: Box::from(*b" and 'tert"), range: TextRange::default(), flags: BytesLiteralFlags::empty() .with_quote_style(Quote::Single) .with_prefix(ByteStringPrefix::Raw { uppercase_r: false }), }

to cause the issue, at least as far as I can think of.

The exception to this would be if a refactor e.g moves a byte string into an f-string expression and the target python version is 3.12. But arguably, there isn't much the generator can do about this.

Oh good point, I forgot about rules moving these into f-strings. I think UnicodeEscape will still get a chance to run in that case, as long as the outer f-string isn't raw too, but this is good to keep in mind.

ntBre · 2025-02-03T14:29:31Z

Who's responsibility is it to correctly handle nested quotes? Is it the generator or the code building the AST? If it is the generator, what happens in cases where it's impossible to pick the right quotes?

It's still the generator's responsibility to handle nested quotes. All of the code paths except raw strings end up going through either UnicodeEscape::with_preferred_quote (in the case of normal strings and f-strings) or AsciiEscape::with_preferred_quote (for bytestrings). These try to use the quote you pass in but will change the outer quotes if needed or even start adding escaped quotes in the worst case.

I tested your split case above with SIM905 (after temporarily modifying the rule to preserve triple quotes) and the output was ['''"itemA"''', """'itemB'""", """'''itemC'''"""]. itemA is probably the most interesting because the outer triple double quotes from the input got converted to single quotes to preserve the inner double quotes.

crates/ruff_linter/resources/test/fixtures/flake8_simplify/SIM905.py

AlexWaygood

LGTM!

AlexWaygood · 2025-02-03T17:52:38Z

crates/ruff_python_ast/src/nodes.rs

+    fn display_contents(self, contents: &str) -> DisplayFlags {
+        DisplayFlags {
+            prefix: self.prefix(),
+            quote_str: self.quote_str(),
+            contents,
+        }
+    }
+}
+
+pub struct DisplayFlags<'a> {
+    prefix: AnyStringPrefix,
+    quote_str: &'a str,
+    contents: &'a str,
+}
+
+impl std::fmt::Display for DisplayFlags<'_> {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        write!(
+            f,
+            "{prefix}{quote}{contents}{quote}",
+            prefix = self.prefix,
+            quote = self.quote_str,
+            contents = self.contents
+        )
    }
 }


An alternative here might be this (the diff is relative to your branch), which would mean the DisplayFlags struct would take up slightly less memory. Curious what @MichaReiser thinks of the pros and cons. I think there's almost certainly not much practical difference to what you have now:

diff --git a/crates/ruff_python_ast/src/nodes.rs b/crates/ruff_python_ast/src/nodes.rs index 3471776f2..70b21e7db 100644 --- a/crates/ruff_python_ast/src/nodes.rs +++ b/crates/ruff_python_ast/src/nodes.rs @@ -977,7 +977,7 @@ impl Ranged for FStringPart { } } -pub trait StringFlags: Copy { +pub trait StringFlags: Copy + Into<AnyStringFlags> { /// Does the string use single or double quotes in its opener and closer? fn quote_style(self) -> Quote; @@ -1029,16 +1029,14 @@ pub trait StringFlags: Copy { fn display_contents(self, contents: &str) -> DisplayFlags { DisplayFlags { - prefix: self.prefix(), - quote_str: self.quote_str(), + flags: self.into(), contents, } } } pub struct DisplayFlags<'a> { - prefix: AnyStringPrefix, - quote_str: &'a str, + flags: AnyStringFlags, contents: &'a str, } @@ -1047,8 +1045,8 @@ impl std::fmt::Display for DisplayFlags<'_> { write!( f, "{prefix}{quote}{contents}{quote}", - prefix = self.prefix, - quote = self.quote_str, + prefix = self.flags.prefix(), + quote = self.flags.quote_str(), contents = self.contents ) } diff --git a/crates/ruff_python_parser/src/token.rs b/crates/ruff_python_parser/src/token.rs index ecda19730..d741b15a4 100644 --- a/crates/ruff_python_parser/src/token.rs +++ b/crates/ruff_python_parser/src/token.rs @@ -777,6 +777,12 @@ impl TokenFlags { } } +impl From<TokenFlags> for AnyStringFlags { + fn from(flags: TokenFlags) -> Self { + flags.as_any_string_flags() + } +} +

I'm not too fond of trait chains. I'd rather have a into_any_string_flags method but I don't think it matters much. Display is rarely used.

AlexWaygood · 2025-02-03T17:59:22Z

crates/ruff_python_ast/src/nodes.rs

+    }
+}
+
+pub struct DisplayFlags<'a> {


I don't love the name of this struct because it doesn't just display the flags -- it displays the string contents embedded in the prefix and quotes, which are supplied by the flags. DisplayStringLiteralWithQuotesAndPrefix is kinda verbose, though, so I'm not sure what a better name might be. (AKA: I'm fine with it being merged as-is!)

Oh good point. I also wanted to make the struct private, but it's in a public trait. One thing we could do is just return impl Display from display_contents? Then we could use as verbose an internal name as we want and arguably better capture the intention here without having to name a type.

Oh, I like that idea!

The downside of that is that the trait is no longer object safe, but it might not matter here.

Oh good point. I'm somewhat leaning toward leaving the code as it is here, for both the struct name and its contents (re the conversation above) if that's okay with you both.

I don't really love the new trait bound or From impl either, but it does make the struct smaller, so I'm happy either way, though.

I'm fine to leave everything as is!

* main: (66 commits) [red-knot] Use ternary decision diagrams (TDDs) for visibility constraints (#15861) [`pyupgrade`] Rename private type parameters in PEP 695 generics (`UP049`) (#15862) Simplify the `StringFlags` trait (#15944) [`flake8-pyi`] Make `PYI019` autofixable for `.py` files in preview mode as well as stubs (#15889) Docs (`linter.md`): clarify that Python files are always searched for in subdirectories (#15882) [`flake8-pyi`] Make PEP-695 functions with multiple type parameters fixable by PYI019 again (#15938) [red-knot] Use unambiguous invalid-syntax-construct for suppression comment test (#15933) Make `Binding::range()` point to the range of a type parameter's name, not the full type parameter (#15935) Update black deviations (#15928) [red-knot] MDTest: Fix line numbers in error messages (#15932) Preserve triple quotes and prefixes for strings (#15818) [red-knot] Hand-written MDTest parser (#15926) [`pylint`] Fix missing parens in unsafe fix for `unnecessary-dunder-call` (`PLC2801`) (#15762) nit: docs for ignore & select (#15883) [airflow] `BashOperator` has been moved to `airflow.providers.standard.operators.bash.BashOperator` (AIR302) (#15922) [`flake8-logging`] `.exception()` and `exc_info=` outside exception handlers (`LOG004`, `LOG014`) (#15799) [red-knot] Enforce specifying paths for mdtest code blocks in a separate preceding line (#15890) [red-knot] Internal refactoring of visibility constraints API (#15913) [red-knot] Implicit instance attributes (#15811) [`flake8-comprehensions`] Handle extraneous parentheses around list comprehension (`C403`) (#15877) ...

ntBre added 6 commits January 29, 2025 13:33

preserve triple quoting

4559f26

unset triple quotes for SIM905 fix, add without_triple_quotes method

de72a61

pass flags instead of separate quote and triple_quote values

59067bd

move prefix logic into p_str_repr

5354c0c

handle prefixes for bytestrings

7d93868

test all prefix and quote combinations

4c85aad

ntBre requested review from AlexWaygood, MichaReiser, carljm and sharkdp as code owners January 29, 2025 21:57

ntBre commented Jan 29, 2025

View reviewed changes

crates/ruff_python_codegen/src/generator.rs Outdated Show resolved Hide resolved

ntBre commented Jan 29, 2025

View reviewed changes

crates/ruff_python_literal/src/escape.rs Outdated Show resolved Hide resolved

AlexWaygood reviewed Jan 29, 2025

View reviewed changes

AlexWaygood added bug Something isn't working fixes Related to suggested fixes for violations labels Jan 29, 2025

ntBre and others added 5 commits January 29, 2025 17:33

add *Flags::without_triple_quotes

13e6e7a

use format_string_contents

6b0efc8

unwrap -> expect

0483893

Co-authored-by: Alex Waygood <[email protected]>

use a matrix for prefix-quote tests

7dcc98e

use TripleQuoted enum instead of bool

759cb64

Co-authored-by: Alex Waygood <[email protected]>

AlexWaygood reviewed Jan 30, 2025

View reviewed changes

crates/ruff_python_codegen/src/generator.rs Outdated Show resolved Hide resolved

crates/ruff_python_codegen/src/generator.rs Show resolved Hide resolved

crates/ruff_python_literal/src/escape.rs Outdated Show resolved Hide resolved

ntBre and others added 4 commits January 30, 2025 08:48

remove redundant quote tests

5ae157c

add nested quote test for bytestring

55379b7

Co-authored-by: Alex Waygood <[email protected]>

mark is_yes const

0550ac1

Co-authored-by: Alex Waygood <[email protected]>

prototype making the TripleQuoted enum more central to the API

6183c74

dhruvmanila reviewed Jan 30, 2025

View reviewed changes

crates/ruff_python_literal/src/escape.rs Outdated Show resolved Hide resolved

ntBre added 2 commits January 30, 2025 10:19

Merge branch 'alex/prototype' of github.com:AlexWaygood/ruff

82f341c

update doc links

718fb82

AlexWaygood reviewed Jan 30, 2025

View reviewed changes

ntBre and others added 3 commits January 30, 2025 11:46

rename field to triple_quotes

073f1e9

Co-authored-by: Alex Waygood <[email protected]>

add StringFlags::write_string_contents

f95a28d

Co-authored-by: Alex Waygood <[email protected]>

expand ascii bytes comment

0c55361

AlexWaygood approved these changes Jan 30, 2025

View reviewed changes

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved

ntBre added 2 commits January 30, 2025 14:47

remove AnyStringFlags::default

dab84a3

use write_string_contents in format_string_contents

7f62864

AlexWaygood approved these changes Jan 30, 2025

View reviewed changes

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved

ntBre and others added 2 commits January 30, 2025 18:11

tidy imports

7eadc7c

Co-authored-by: Alex Waygood <[email protected]>

move buf_size optimization into write_string_contents

845fe6a

Co-authored-by: Alex Waygood <[email protected]>

AlexWaygood approved these changes Jan 30, 2025

View reviewed changes

dhruvmanila reviewed Jan 31, 2025

View reviewed changes

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved

rename with_triple_quotes_set_to

b799946

MichaReiser reviewed Feb 3, 2025

View reviewed changes

MichaReiser approved these changes Feb 3, 2025

View reviewed changes

ntBre added 3 commits February 3, 2025 10:00

inline p_raw_bytes

eef013d

switch to display impl for StringFlags

970565f

add nested quote test case

0f83712

AlexWaygood reviewed Feb 3, 2025

View reviewed changes

crates/ruff_linter/resources/test/fixtures/flake8_simplify/SIM905.py Outdated Show resolved Hide resolved

move test case to the bottom

41f5f5e

AlexWaygood approved these changes Feb 3, 2025

View reviewed changes

AlexWaygood reviewed Feb 3, 2025

View reviewed changes

ntBre merged commit b5e5271 into main Feb 4, 2025
21 checks passed

ntBre deleted the brent/preserve-triple branch February 4, 2025 13:41

AlexWaygood mentioned this pull request Feb 4, 2025

Simplify the StringFlags trait #15944

Merged

BrewTestBot mentioned this pull request Feb 6, 2025

ruff 0.9.5 Homebrew/homebrew-core#206789

Merged

Comments

Conversation

ntBre commented Jan 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Jan 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

Uh oh!

AlexWaygood left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhruvmanila Feb 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexWaygood left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ntBre commented Jan 30, 2025

Uh oh!

AlexWaygood commented Jan 30, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MichaReiser left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ntBre commented Feb 3, 2025

Uh oh!

Uh oh!

AlexWaygood left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

ntBre commented Jan 29, 2025 •

edited

Loading

github-actions bot commented Jan 29, 2025 •

edited

Loading

`ruff-ecosystem` results

dhruvmanila Feb 1, 2025 •

edited

Loading