add DebugText for self-documenting f-strings#6167
add DebugText for self-documenting f-strings#6167MichaReiser merged 5 commits intoastral-sh:mainfrom
DebugText for self-documenting f-strings#6167Conversation
PR Check ResultsEcosystem✅ ecosystem check detected no changes. BenchmarkLinuxWindows |
| fn self_documenting_f_string() { | ||
| assert_round_trip!(r#"f"{ chr(65) = }""#); | ||
| assert_round_trip!(r#"f"{ chr(65) = !s}""#); | ||
| assert_round_trip!(r#"f"{ chr(65) = !r}""#); |
There was a problem hiding this comment.
Would you mind adding a test case with a format specification?
There was a problem hiding this comment.
If possible, an additional test case to mix both conversion and format specification would be nice as well: f"{a=!r:0.05f}"
.../src/snapshots/ruff_python_parser__string__tests__parse_fstring_self_doc_trailing_space.snap
Show resolved
Hide resolved
| start_location, | ||
| ) | ||
| })?; | ||
| let leading = |
There was a problem hiding this comment.
this feels a little hacky but was my best idea. suggestions welcome
There was a problem hiding this comment.
Which part do you find hacky?
There was a problem hiding this comment.
pulling slices out of expression
There was a problem hiding this comment.
I think it's fine. I ultimately want to re-write this entire file and use Cursor and use string slices from the source text directly everywhere. But @dhruvmanila may make this whole code redundant before when implementing the 3.12 F-string changes.
135c98b to
757dc4f
Compare
MichaReiser
left a comment
There was a problem hiding this comment.
This is great!
I would like to hear your opinion on keeping Conversion::Repr for debug strings to ease downstream use and we may be able to speed this up a bit (the whole module requires a rework but we can start somewhere).
| #[derive(Debug, PartialEq, Eq, Hash)] | ||
| pub struct ExprFormattedValue<'a> { | ||
| value: Box<ComparableExpr<'a>>, | ||
| debug_text: Option<DebugText<'a>>, |
There was a problem hiding this comment.
Would it be possible to store a reference to a debug text instead of defining an extra DebugText node?
| debug_text: Option<DebugText<'a>>, | |
| debug_text: Option<&'a DebugText>, |
| #[derive(Debug, PartialEq, Eq, Hash)] | ||
| pub struct DebugText<'a> { | ||
| leading: &'a str, | ||
| trailing: &'a str, | ||
| } | ||
|
|
||
| impl<'a> From<&'a ast::DebugText> for DebugText<'a> { | ||
| fn from(debug_text: &'a ast::DebugText) -> Self { | ||
| Self { | ||
| leading: debug_text.leading.as_str(), | ||
| trailing: debug_text.trailing.as_str(), | ||
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
See comment above
| #[derive(Debug, PartialEq, Eq, Hash)] | |
| pub struct DebugText<'a> { | |
| leading: &'a str, | |
| trailing: &'a str, | |
| } | |
| impl<'a> From<&'a ast::DebugText> for DebugText<'a> { | |
| fn from(debug_text: &'a ast::DebugText) -> Self { | |
| Self { | |
| leading: debug_text.leading.as_str(), | |
| trailing: debug_text.trailing.as_str(), | |
| } | |
| } | |
| } |
crates/ruff_python_ast/src/nodes.rs
Outdated
| } | ||
| } | ||
|
|
||
| #[derive(Clone, Debug, PartialEq)] |
There was a problem hiding this comment.
| #[derive(Clone, Debug, PartialEq)] | |
| #[derive(Clone, Debug, PartialEq, Eq, Hash)] |
| fn unparse_formatted( | ||
| &mut self, | ||
| val: &Expr, | ||
| debug_text: &Option<DebugText>, |
There was a problem hiding this comment.
Nit: I find Option<&ref> more natural than passing a reference to an option (except if the option is mutable).
| debug_text: &Option<DebugText>, | |
| debug_text: Option<&DebugText>, |
| start_location, | ||
| ) | ||
| })?; | ||
| let leading = |
There was a problem hiding this comment.
Which part do you find hacky?
| // match a python 3.8 self documenting expression | ||
| // format '{' PYTHON_EXPRESSION '=' FORMAT_SPECIFIER? '}' | ||
| '=' if self.peek() != Some('=') && delimiters.is_empty() => { | ||
| trailing_seq.push('='); |
There was a problem hiding this comment.
I like it that we now use the text ranges to get leading and before_eq. Would it be possible also to use text ranges to extract trailing? It would reduce 2 string allocations for each self-documenting string (one for trailing_seq, and one for formatting trailing in debug text.
There was a problem hiding this comment.
i think so. we'd need to store the = and any trailing spaces inside expression which is a bit odd but doable if we just track where the actual expression ends. will push something with what i mean (and maybe you have ideas for how to improve that)
dhruvmanila
left a comment
There was a problem hiding this comment.
This is good! Thanks for working on this :)
| fn self_documenting_f_string() { | ||
| assert_round_trip!(r#"f"{ chr(65) = }""#); | ||
| assert_round_trip!(r#"f"{ chr(65) = !s}""#); | ||
| assert_round_trip!(r#"f"{ chr(65) = !r}""#); |
There was a problem hiding this comment.
If possible, an additional test case to mix both conversion and format specification would be nice as well: f"{a=!r:0.05f}"
instead of modelling self-documenting f-strings (`f"{ foo= }"`) as a
(simplified)
`Constant("foo=")` followed by a `FormattedValue(Expr("x"))`, instead
model this case with a `DebugText(leading, trailing)` attribute on the
`FormattedValue` so that we don't have to synthesize nodes (which
results in siblings with overlapping ranges). We need to be able to
preserve the whitespace for self-documenting f-strings, as well as
reproduce the source (eg unparse, format).
| vec![Expr::from(ast::ExprFormattedValue { | ||
| value: Box::new(value), | ||
| debug_text: Some(ast::DebugText { | ||
| leading: leading.to_string(), |
There was a problem hiding this comment.
Could we store references into the source here rather than allocating new strings? i guess many more things would need to change
There was a problem hiding this comment.
I played around with an ast that uses Cow<'source, str> for all string fields. I spent about 2h fixing lifetime issues and run out of patience. The problem is that every node must be parametrized by the source lifetime.
We may still want to do this because using a bump allocator for the AST nodes would also require a new lifetime. Something to explore in the future.
757dc4f to
ff200c6
Compare
instead of modelling self-documenting f-strings (
f"{ foo= }") as a (simplified)Constant("foo=")followed by aFormattedValue(Expr("x")), instead model this case with aDebugText(leading, trailing)attribute on theFormattedValueso that we don't have to synthesize nodes (which results in siblings with overlapping ranges). We need to be able to preserve the whitespace for self-documenting f-strings, as well as reproduce the source (eg unparse, format).Fixes #5970
Test Plan
Existing snapshots, a few more tests. More needed?