Skip to content

Comments

add DebugText for self-documenting f-strings#6167

Merged
MichaReiser merged 5 commits intoastral-sh:mainfrom
davidszotten:self-documenting-f-strings
Aug 1, 2023
Merged

add DebugText for self-documenting f-strings#6167
MichaReiser merged 5 commits intoastral-sh:mainfrom
davidszotten:self-documenting-f-strings

Conversation

@davidszotten
Copy link
Contributor

instead of modelling self-documenting f-strings (f"{ foo= }") as a (simplified)
Constant("foo=") followed by a FormattedValue(Expr("x")), instead model this case with a DebugText(leading, trailing) attribute on the FormattedValue so that we don't have to synthesize nodes (which results in siblings with overlapping ranges). We need to be able to preserve the whitespace for self-documenting f-strings, as well as reproduce the source (eg unparse, format).

Fixes #5970

Test Plan

Existing snapshots, a few more tests. More needed?

@github-actions
Copy link
Contributor

github-actions bot commented Jul 29, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     10.9±0.29ms     3.7 MB/sec    1.12     12.3±0.44ms     3.3 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.1±0.06ms     7.9 MB/sec    1.08      2.3±0.09ms     7.3 MB/sec
formatter/numpy/globals.py                 1.00    232.3±7.64µs    12.7 MB/sec    1.05   244.1±15.98µs    12.1 MB/sec
formatter/pydantic/types.py                1.00      4.6±0.16ms     5.5 MB/sec    1.09      5.0±0.16ms     5.1 MB/sec
linter/all-rules/large/dataset.py          1.05     15.0±0.49ms     2.7 MB/sec    1.00     14.4±1.06ms     2.8 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.03      3.7±0.11ms     4.4 MB/sec    1.00      3.6±0.08ms     4.6 MB/sec
linter/all-rules/numpy/globals.py          1.00   497.5±21.62µs     5.9 MB/sec    1.02   506.9±12.18µs     5.8 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.6±0.28ms     3.9 MB/sec    1.00      6.6±0.27ms     3.9 MB/sec
linter/default-rules/large/dataset.py      1.17      8.8±0.22ms     4.6 MB/sec    1.00      7.5±0.21ms     5.4 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.17  1797.2±47.89µs     9.3 MB/sec    1.00  1538.0±44.55µs    10.8 MB/sec
linter/default-rules/numpy/globals.py      1.05    198.3±4.90µs    14.9 MB/sec    1.00    188.8±6.29µs    15.6 MB/sec
linter/default-rules/pydantic/types.py     1.14      3.9±0.07ms     6.6 MB/sec    1.00      3.4±0.08ms     7.5 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.07     13.5±0.58ms     3.0 MB/sec    1.00     12.6±0.72ms     3.2 MB/sec
formatter/numpy/ctypeslib.py               1.10      2.6±0.11ms     6.3 MB/sec    1.00      2.4±0.11ms     7.0 MB/sec
formatter/numpy/globals.py                 1.10   303.7±35.09µs     9.7 MB/sec    1.00   276.5±29.94µs    10.7 MB/sec
formatter/pydantic/types.py                1.16      6.1±0.39ms     4.2 MB/sec    1.00      5.2±0.22ms     4.9 MB/sec
linter/all-rules/large/dataset.py          1.14     20.5±1.08ms  2028.8 KB/sec    1.00     18.0±0.83ms     2.3 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.13      5.3±0.31ms     3.2 MB/sec    1.00      4.7±0.23ms     3.6 MB/sec
linter/all-rules/numpy/globals.py          1.07   621.3±29.97µs     4.7 MB/sec    1.00   578.5±30.42µs     5.1 MB/sec
linter/all-rules/pydantic/types.py         1.13      9.0±0.39ms     2.8 MB/sec    1.00      8.0±0.38ms     3.2 MB/sec
linter/default-rules/large/dataset.py      1.13     10.9±0.67ms     3.7 MB/sec    1.00      9.7±0.49ms     4.2 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.07      2.1±0.10ms     7.8 MB/sec    1.00  1986.2±168.05µs     8.4 MB/sec
linter/default-rules/numpy/globals.py      1.07   248.3±13.84µs    11.9 MB/sec    1.00   232.9±17.70µs    12.7 MB/sec
linter/default-rules/pydantic/types.py     1.11      4.8±0.38ms     5.3 MB/sec    1.00      4.3±0.27ms     5.9 MB/sec

@MichaReiser MichaReiser requested a review from dhruvmanila July 29, 2023 08:53
@MichaReiser MichaReiser added the parser Related to the parser label Jul 29, 2023
fn self_documenting_f_string() {
assert_round_trip!(r#"f"{ chr(65) = }""#);
assert_round_trip!(r#"f"{ chr(65) = !s}""#);
assert_round_trip!(r#"f"{ chr(65) = !r}""#);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a test case with a format specification?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, an additional test case to mix both conversion and format specification would be nice as well: f"{a=!r:0.05f}"

start_location,
)
})?;
let leading =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels a little hacky but was my best idea. suggestions welcome

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part do you find hacky?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pulling slices out of expression

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. I ultimately want to re-write this entire file and use Cursor and use string slices from the source text directly everywhere. But @dhruvmanila may make this whole code redundant before when implementing the 3.12 F-string changes.

@davidszotten davidszotten force-pushed the self-documenting-f-strings branch from 135c98b to 757dc4f Compare July 29, 2023 11:23
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

I would like to hear your opinion on keeping Conversion::Repr for debug strings to ease downstream use and we may be able to speed this up a bit (the whole module requires a rework but we can start somewhere).

#[derive(Debug, PartialEq, Eq, Hash)]
pub struct ExprFormattedValue<'a> {
value: Box<ComparableExpr<'a>>,
debug_text: Option<DebugText<'a>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to store a reference to a debug text instead of defining an extra DebugText node?

Suggested change
debug_text: Option<DebugText<'a>>,
debug_text: Option<&'a DebugText>,

Comment on lines 600 to 614
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct DebugText<'a> {
leading: &'a str,
trailing: &'a str,
}

impl<'a> From<&'a ast::DebugText> for DebugText<'a> {
fn from(debug_text: &'a ast::DebugText) -> Self {
Self {
leading: debug_text.leading.as_str(),
trailing: debug_text.trailing.as_str(),
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above

Suggested change
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct DebugText<'a> {
leading: &'a str,
trailing: &'a str,
}
impl<'a> From<&'a ast::DebugText> for DebugText<'a> {
fn from(debug_text: &'a ast::DebugText) -> Self {
Self {
leading: debug_text.leading.as_str(),
trailing: debug_text.trailing.as_str(),
}
}
}

}
}

#[derive(Clone, Debug, PartialEq)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[derive(Clone, Debug, PartialEq)]
#[derive(Clone, Debug, PartialEq, Eq, Hash)]

fn unparse_formatted(
&mut self,
val: &Expr,
debug_text: &Option<DebugText>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I find Option<&ref> more natural than passing a reference to an option (except if the option is mutable).

Suggested change
debug_text: &Option<DebugText>,
debug_text: Option<&DebugText>,

start_location,
)
})?;
let leading =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part do you find hacky?

// match a python 3.8 self documenting expression
// format '{' PYTHON_EXPRESSION '=' FORMAT_SPECIFIER? '}'
'=' if self.peek() != Some('=') && delimiters.is_empty() => {
trailing_seq.push('=');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it that we now use the text ranges to get leading and before_eq. Would it be possible also to use text ranges to extract trailing? It would reduce 2 string allocations for each self-documenting string (one for trailing_seq, and one for formatting trailing in debug text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think so. we'd need to store the = and any trailing spaces inside expression which is a bit odd but doable if we just track where the actual expression ends. will push something with what i mean (and maybe you have ideas for how to improve that)

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good! Thanks for working on this :)

fn self_documenting_f_string() {
assert_round_trip!(r#"f"{ chr(65) = }""#);
assert_round_trip!(r#"f"{ chr(65) = !s}""#);
assert_round_trip!(r#"f"{ chr(65) = !r}""#);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, an additional test case to mix both conversion and format specification would be nice as well: f"{a=!r:0.05f}"

instead of modelling self-documenting f-strings (`f"{ foo= }"`) as a
(simplified)
`Constant("foo=")` followed by a `FormattedValue(Expr("x"))`, instead
model this case with a `DebugText(leading, trailing)` attribute on the
`FormattedValue` so that we don't have to synthesize nodes (which
results in siblings with overlapping ranges). We need to be able to
preserve the whitespace for self-documenting f-strings, as well as
reproduce the source (eg unparse, format).
vec![Expr::from(ast::ExprFormattedValue {
value: Box::new(value),
debug_text: Some(ast::DebugText {
leading: leading.to_string(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we store references into the source here rather than allocating new strings? i guess many more things would need to change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with an ast that uses Cow<'source, str> for all string fields. I spent about 2h fixing lifetime issues and run out of patience. The problem is that every node must be parametrized by the source lifetime.

We may still want to do this because using a bump allocator for the AST nodes would also require a new lifetime. Something to explore in the future.

@davidszotten davidszotten force-pushed the self-documenting-f-strings branch from 757dc4f to ff200c6 Compare July 31, 2023 20:51
@MichaReiser MichaReiser merged commit ba990b6 into astral-sh:main Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parser Related to the parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parser needs special handling of self-documenting f-strings

3 participants