Skip to content

Comments

Maybe parenthesize long constants and names#6816

Merged
MichaReiser merged 1 commit intomainfrom
maybe-parenthesize-constants-and-names
Aug 24, 2023
Merged

Maybe parenthesize long constants and names#6816
MichaReiser merged 1 commit intomainfrom
maybe-parenthesize-constants-and-names

Conversation

@MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Aug 23, 2023

Summary

This PR implements #6271 for string, fstring, and number literals as well as for names. The stacked PR implements the behavior for non-fluent call expressions (Doing the changes separately helps to understand the compatibility and performance implications).

This PR is about parenthesizing unsplittable expressions if it makes them fit on a line, but the formatter should omit the parentheses if they still don't.

x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
x_aaaa = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

# Exceeds the line width even when parenthesizing
x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
# Fits after parenthesizing
x_aaaa = (
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
)

Test Plan

I added a few new test cases.

The similarity index improves for all projects except typeshed.

Main

project similarity index
cpython 0.76038
django 0.99833
transformers 0.99804
twine 0.99876
typeshed 0.99953
warehouse 0.99612
zulip 0.99732

This PR

project similarity index
cpython 0.76058
django 0.99894
transformers 0.99842
twine 0.99929
typeshed 0.99953
warehouse 0.99632
zulip 0.99909

Performance

The performance implication of this should be limited because constants and single names are rarely used in clause headers (if "string":). However, constant are used frequently in assignments where this assignment applies as well and using BestFitting has a considerable cost:

  • BestFitting allocates a Vec for each variant + one vec for itself
  • It requires using memoized, which allocates too
  • propagate_expands requires special handling for interned (used by memoized) to avoid visiting the document multiple times.
  • Handling Interned (memoized) FormatElements means the printer can no longer iterate over a flat Vec.

@MichaReiser
Copy link
Member Author

MichaReiser commented Aug 23, 2023

@github-actions
Copy link
Contributor

github-actions bot commented Aug 23, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01      3.6±0.04ms    11.4 MB/sec    1.00      3.5±0.02ms    11.5 MB/sec
formatter/numpy/ctypeslib.py               1.00    748.1±9.00µs    22.3 MB/sec    1.00    747.4±8.37µs    22.3 MB/sec
formatter/numpy/globals.py                 1.00     78.4±0.30µs    37.7 MB/sec    1.00     78.6±1.49µs    37.6 MB/sec
formatter/pydantic/types.py                1.00  1543.3±10.99µs    16.5 MB/sec    1.00  1543.5±12.35µs    16.5 MB/sec
linter/all-rules/large/dataset.py          1.02     10.6±0.07ms     3.8 MB/sec    1.00     10.4±0.08ms     3.9 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      2.9±0.01ms     5.8 MB/sec    1.00      2.8±0.01ms     5.8 MB/sec
linter/all-rules/numpy/globals.py          1.00    322.2±0.80µs     9.2 MB/sec    1.00    321.2±1.79µs     9.2 MB/sec
linter/all-rules/pydantic/types.py         1.01      5.4±0.02ms     4.7 MB/sec    1.00      5.4±0.04ms     4.7 MB/sec
linter/default-rules/large/dataset.py      1.04      5.7±0.01ms     7.2 MB/sec    1.00      5.5±0.02ms     7.4 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.02   1195.8±7.83µs    13.9 MB/sec    1.00   1170.4±4.35µs    14.2 MB/sec
linter/default-rules/numpy/globals.py      1.02    125.4±0.33µs    23.5 MB/sec    1.00    122.8±0.30µs    24.0 MB/sec
linter/default-rules/pydantic/types.py     1.03      2.6±0.02ms    10.0 MB/sec    1.00      2.5±0.01ms    10.3 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      4.3±0.06ms     9.5 MB/sec    1.01      4.4±0.06ms     9.3 MB/sec
formatter/numpy/ctypeslib.py               1.00   886.2±14.52µs    18.8 MB/sec    1.01   896.5±12.95µs    18.6 MB/sec
formatter/numpy/globals.py                 1.00     90.3±1.98µs    32.7 MB/sec    1.03     93.1±4.25µs    31.7 MB/sec
formatter/pydantic/types.py                1.00  1842.8±23.84µs    13.8 MB/sec    1.01  1861.2±21.54µs    13.7 MB/sec
linter/all-rules/large/dataset.py          1.00     12.7±0.18ms     3.2 MB/sec    1.03     13.0±0.20ms     3.1 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.5±0.05ms     4.8 MB/sec    1.01      3.5±0.04ms     4.8 MB/sec
linter/all-rules/numpy/globals.py          1.00    434.6±7.22µs     6.8 MB/sec    1.01    439.9±6.52µs     6.7 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.5±0.10ms     3.9 MB/sec    1.03      6.8±0.11ms     3.8 MB/sec
linter/default-rules/large/dataset.py      1.00      7.0±0.06ms     5.8 MB/sec    1.02      7.2±0.09ms     5.7 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1510.7±36.21µs    11.0 MB/sec    1.01  1523.8±18.49µs    10.9 MB/sec
linter/default-rules/numpy/globals.py      1.00    174.5±2.64µs    16.9 MB/sec    1.02    178.8±5.87µs    16.5 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.2±0.03ms     8.1 MB/sec    1.02      3.2±0.04ms     7.9 MB/sec

@MichaReiser MichaReiser force-pushed the fix-printer-group-modes branch from 5051db4 to c80afd8 Compare August 23, 2023 13:33
@MichaReiser MichaReiser force-pushed the maybe-parenthesize-constants-and-names branch from aa7e59b to 9ab2cb5 Compare August 23, 2023 13:33
let mut formatted_variants = Vec::with_capacity(variants.len());

for variant in variants {
let mut buffer = VecBuffer::with_capacity(8, f.state_mut());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's always initialize the vector with a non-zero capacity because it always writes at least 3 entries, and lets double it to avoid resizing.

let mut enclosing = Vec::with_capacity(if self.is_empty() {
0
} else {
self.len().ilog2() as usize
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helped to improve performance quiet a bit. I haven't done any science but our groups very likely form a tree, the depth of a balanced binary tree is probably not the worst approximation.

@MichaReiser MichaReiser force-pushed the maybe-parenthesize-constants-and-names branch from 9ab2cb5 to 1c1420c Compare August 23, 2023 15:17
)

@@ -221,8 +221,8 @@
@@ -221,8 +217,8 @@
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert above will be tricky. Because both the left and the right shouldn't have parentheses by default if it doesn't make them fit. Which is given, because the string is simply way too long, making it never fit. But we should then still break the expression.. hum. A Problem for another day

@MichaReiser MichaReiser force-pushed the fix-printer-group-modes branch from c80afd8 to 3fa14d5 Compare August 23, 2023 16:06
@MichaReiser MichaReiser force-pushed the maybe-parenthesize-constants-and-names branch from 1c1420c to 701c645 Compare August 23, 2023 16:06
@MichaReiser MichaReiser added the formatter Related to the formatter label Aug 23, 2023
@MichaReiser MichaReiser force-pushed the maybe-parenthesize-constants-and-names branch from 701c645 to d50042b Compare August 23, 2023 16:21
@MichaReiser MichaReiser marked this pull request as ready for review August 23, 2023 16:23
@MichaReiser MichaReiser force-pushed the maybe-parenthesize-constants-and-names branch 3 times, most recently from 3f95414 to 56f175a Compare August 23, 2023 17:07
@MichaReiser MichaReiser force-pushed the maybe-parenthesize-constants-and-names branch from 56f175a to 697d820 Compare August 23, 2023 17:27
) -> OptionalParentheses {
OptionalParentheses::Never
let text_len = context.source()[self.range].len();
if text_len > 4 && text_len < context.options().line_width().value() as usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 4 the length of if (?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather arbitrary number and a heuristic to avoid the cost of using best fitting for most literals and common names like self. It can result in the formatted text exceeding the line width even if parenthesizing would make it fit but IMO, I probably rather have a 4-character name exceed the line width by 2 than add parentheses and expand it vertically.

How did I come up with the number. I exclude the constants True, False and None and... I assumed that they all have a length of 4. So you see, I'm not good at math nor counting 🤣

I'll factor this method out and add a comment of why 4... uhm 5.


if memchr2(b'\n', b'\r', text.as_bytes()).is_none()
&& text.len() > 4
&& text.len() < context.options().line_width().value() as usize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always going to indented if it's parenthesized, right? So could we subtract the indent size from context.options().line_width().value()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but we don't have an indent size option yet (only available in the printer). Let me add it in a follow up PR.

.memoized();

// Don't use best fitting if it is known that the expression can never fit
if format_expression.inspect(f)?.will_break() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, so this looks for newlines in the IR, interesting. How does will_break deal with things that might break conditionally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks for things that make group expand for certain: Like hard line breaks or expand_parent. The logic ignores any conditions (it returns true if content in if_group_breaks or if_group_fits contains a hard line break). It showed that conditions can mostly be ignored, especially because it is uncommon or even unneeded to have hard line breaks or expand parent inside of conditional content.

Base automatically changed from fix-printer-group-modes to main August 24, 2023 06:43
@MichaReiser MichaReiser force-pushed the maybe-parenthesize-constants-and-names branch from 697d820 to 33a6c16 Compare August 24, 2023 09:38
@MichaReiser MichaReiser enabled auto-merge (squash) August 24, 2023 09:40
@MichaReiser MichaReiser merged commit 04a9a8d into main Aug 24, 2023
@MichaReiser MichaReiser deleted the maybe-parenthesize-constants-and-names branch August 24, 2023 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

formatter Related to the formatter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants