Avoid omitting parentheses for trailing attributes on call expressions by charliermarsh · Pull Request #6322 · astral-sh/ruff

charliermarsh · 2023-08-03T23:48:26Z

Summary

This PR modifies our can_omit_optional_parentheses rules to ensure that if we see a call followed by an attribute, we treat that as an attribute access rather than a splittable call expression.

This in turn ensures that we wrap like:

ct_match = aaaaaaaaaaact_id == self.get_content_type(
    obj=rel_obj, using=instance._state.db
)

For calls, but:

ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

For calls with trailing attribute accesses.

Closes #6065.

Test Plan

Similarity index before:

zulip: 0.99436
django: 0.99779
warehouse: 0.99504
transformers: 0.99403
cpython: 0.75912
typeshed: 0.72293

And after:

zulip: 0.99436
django: 0.99780
warehouse: 0.99504
transformers: 0.99404
cpython: 0.75913
typeshed: 0.72293

charliermarsh · 2023-08-03T23:49:00Z

crates/ruff_python_formatter/src/expression/mod.rs

+                    return false;
+                }
+            }
+        }


I'm guessing this is not the right solution, but it passes the new fixtures. Should this instead be in expr_compare.rs?

The method here mirrors https://github.com/psf/black/blob/2fd9d8b339e1e2e1b93956c6d68b2b358b3fc29d/src/black/lines.py#L884-L966 I had a quick glance and don't seem to find this logic. But your change seems correct because the method returns false for the provided example. I recommend adding print statements to black's code base and then running the formatting on the code snipped to better understand what the underlying logic is.

@konstin any chance this already gets fixed by your Call chain PR?

i don't think so, they should be independent

github-actions · 2023-08-04T00:17:54Z

PR Check Results

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01     10.4±0.42ms     3.9 MB/sec    1.00     10.3±0.49ms     4.0 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.0±0.08ms     8.2 MB/sec    1.01      2.1±0.11ms     8.1 MB/sec
formatter/numpy/globals.py                 1.01   235.8±13.71µs    12.5 MB/sec    1.00   234.5±11.59µs    12.6 MB/sec
formatter/pydantic/types.py                1.03      4.4±0.22ms     5.8 MB/sec    1.00      4.3±0.20ms     5.9 MB/sec
linter/all-rules/large/dataset.py          1.00     14.2±0.33ms     2.9 MB/sec    1.05     14.9±0.43ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.6±0.13ms     4.6 MB/sec    1.02      3.7±0.13ms     4.5 MB/sec
linter/all-rules/numpy/globals.py          1.00   506.6±21.17µs     5.8 MB/sec    1.04   526.4±25.39µs     5.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.4±0.66ms     4.0 MB/sec    1.05      6.7±0.16ms     3.8 MB/sec
linter/default-rules/large/dataset.py      1.00      7.0±0.17ms     5.8 MB/sec    1.03      7.2±0.22ms     5.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.01  1500.9±40.38µs    11.1 MB/sec    1.00  1488.2±50.69µs    11.2 MB/sec
linter/default-rules/numpy/globals.py      1.00   183.3±10.29µs    16.1 MB/sec    1.00    182.6±8.51µs    16.2 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.1±0.10ms     8.1 MB/sec    1.02      3.2±0.20ms     8.0 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     10.2±0.17ms     4.0 MB/sec    1.00     10.2±0.83ms     4.0 MB/sec
formatter/numpy/ctypeslib.py               1.00  1928.0±33.78µs     8.6 MB/sec    1.00  1930.2±39.16µs     8.6 MB/sec
formatter/numpy/globals.py                 1.00    218.2±4.82µs    13.5 MB/sec    1.01    221.4±9.21µs    13.3 MB/sec
formatter/pydantic/types.py                1.02      4.3±0.11ms     6.0 MB/sec    1.00      4.2±0.07ms     6.1 MB/sec
linter/all-rules/large/dataset.py          1.00     12.9±0.21ms     3.2 MB/sec    1.05     13.6±0.26ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.04      3.6±0.08ms     4.7 MB/sec    1.00      3.4±0.05ms     4.9 MB/sec
linter/all-rules/numpy/globals.py          1.05    437.0±7.94µs     6.8 MB/sec    1.00    417.7±7.62µs     7.1 MB/sec
linter/all-rules/pydantic/types.py         1.00      5.9±0.15ms     4.3 MB/sec    1.02      6.1±0.11ms     4.2 MB/sec
linter/default-rules/large/dataset.py      1.01      6.9±0.11ms     5.9 MB/sec    1.00      6.8±0.09ms     6.0 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.02  1411.1±28.46µs    11.8 MB/sec    1.00  1378.0±19.18µs    12.1 MB/sec
linter/default-rules/numpy/globals.py      1.03    159.8±2.34µs    18.5 MB/sec    1.00    154.9±3.11µs    19.0 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.0±0.05ms     8.4 MB/sec    1.00      3.0±0.04ms     8.4 MB/sec

MichaReiser

This logic doesn't seem to be specific to compare operators. It also applies to boolean operators and, I assume for binary expressions too.

def f():
    return (
        unicodedata.normalize("NFKC", s1).casefold()
        == unicodedata.normalize("NFKC", s2).casefold()
    )
    return (
        unicodedata.normalize("NFKC", s1).casefold()
        and unicodedata.normalize("NFKC", s2).casefold()
    )


ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

ct_match = (
    aaaaaaaaaaact_id and self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

ct_match = aaaaaaaaaaact_id == xyzaaaaaa

ct_match = aaaaaaaaaaact_id and xyzaaaaaa


ct_match = {aaaaaaaaaaaaaaaa} == self.get_content_type(
    obj=rel_obj, using=instance._state.db
).id

ct_match = {aaaaaaaaaaaaaaaa} and self.get_content_type(
    obj=rel_obj, using=instance._state.db
).id

ct_match = (aaaaaaaaaaaaaaaa) == self.get_content_type(
    obj=rel_obj, using=instance._state.db
).id

ct_match = (aaaaaaaaaaaaaaaa) and self.get_content_type(
    obj=rel_obj, using=instance._state.db
).id

We need to spend more time understanding what black's doing to ensure we move closer. One way to assert this is to run the ecosystem checks on our selected projects and verify that either the similarity index increases or that diffing the Black/Ruff differences between the old and new Ruff versions introduces no regressions.

I recommend you to debug Black's can_omit_invisible_parens, delimiter_split, and rhs to better understand what happening. It can also be useful to debug the selected selected transformers.

rhs -> in_parentheses_only_* helpers
delimiter` -> Normal groups

We can also hop on a call and do this together. It ca

MichaReiser · 2023-08-04T06:18:39Z

crates/ruff_python_formatter/src/expression/mod.rs

+                    return false;
+                }
+            }
+        }


The method here mirrors https://github.com/psf/black/blob/2fd9d8b339e1e2e1b93956c6d68b2b358b3fc29d/src/black/lines.py#L884-L966 I had a quick glance and don't seem to find this logic. But your change seems correct because the method returns false for the provided example. I recommend adding print statements to black's code base and then running the formatting on the code snipped to better understand what the underlying logic is.

MichaReiser · 2023-08-04T06:19:27Z

crates/ruff_python_formatter/src/expression/mod.rs

+                    return false;
+                }
+            }
+        }


@konstin any chance this already gets fixed by your Call chain PR?

charliermarsh · 2023-08-04T13:46:22Z

Thank you! Will take a look at this again today and review the Black source.

charliermarsh · 2023-08-05T00:38:47Z

I have deduced at least that our max_priority_count and max_priority are correct and consistent with Black. The issue seems to be this branch, which is returning true but doesn't do so in Black:

// Only use the layout if the first or last expression has parentheses of some sort.
let first_parenthesized = visitor
    .first
    .is_some_and(|first| has_parentheses(first, visitor.source));
let last_parenthesized = visitor
    .last
    .is_some_and(|last| has_parentheses(last, visitor.source));
first_parenthesized || last_parenthesized

(Although in Black, this logic is split between a few branches, I think.)

Continuing to dig in...

MichaReiser · 2023-08-05T07:51:28Z

I have deduced at least that our max_priority_count and max_priority are correct and consistent with Black. The issue seems to be this branch, which is returning true but doesn't do so in Black:
// Only use the layout if the first or last expression has parentheses of some sort.
let first_parenthesized = visitor
    .first
    .is_some_and(|first| has_parentheses(first, visitor.source));
let last_parenthesized = visitor
    .last
    .is_some_and(|last| has_parentheses(last, visitor.source));
first_parenthesized || last_parenthesized
(Although in Black, this logic is split between a few branches, I think.)

Continuing to dig in...

I think last incorrectly points to the call expression for

ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

rather than the attribute. This makes the above-mentioned path to return true because a call is parenthesized.

Doing something like this seems to help (do we need to do the same for Subscript, similar to call?

Expr::Attribute(ast::ExprAttribute {
                range: _,
                value,
                attr: _,
                ctx: _,
            }) => {
                self.visit_expr(&value);
                if has_parentheses(value, self.source) {
                    self.update_max_priority(OperatorPriority::Attribute);
                }
                self.last = Some(expr);
                return;
            }

charliermarsh · 2023-08-05T15:52:38Z

Ohhh smart. I didn't realize that Black formatted these differently:

ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

ct_match = aaaaaaaaaaact_id == self.get_content_type(
    obj=rel_obj, using=instance._state.db
)

charliermarsh · 2023-08-05T16:01:48Z

It looks like on main, we already break this differently than Black (Ruff, Black):

ct_match = (
    {aaaaaaaaaaaaaaaa} == self.get_content_type[obj, rel_obj, using, instance._state.db].id
)

Is that covered in the same place, or elsewhere? I feel like it's elsewhere, since it's not about removing the optional parentheses, but about which expression we break (left-vs-right).

charliermarsh · 2023-08-05T16:15:26Z

It seems like Black prefers splitting the expression on the RHS of a binary operator, but will split the left if it can't (playground), but hesitant to keep investigating this here because it seems separate and I know you've spent more time on binary operators.

MichaReiser · 2023-08-05T16:42:45Z

Is it specific to subscript operations? It would be good to understand in depth which expressions we break differently.

We implement a similar "Prefer splitting parenthesized expressions before binary expression" logic internal discord. Or it is because our groups have the wrong hierarchy:

group sequence: group1, group2: The right group expands before the left
Group hierarchy: group1(group2): The outer group expands before the inner.

charliermarsh · 2023-08-05T16:44:16Z

Not specific to subscripts, I'll look into the group hierarchy, thanks!

charliermarsh · 2023-08-05T16:46:57Z

(We do seem to have a "group sequence" as described above. Gonna keep looking into it, I need to understand this better -- but I do think it can be done separately from this PR.)

MichaReiser · 2023-08-05T17:54:54Z

Separate PR makes sense. Can you include the similarity index changes in your test plan

charliermarsh · 2023-08-05T19:13:04Z

Yes sir!

charliermarsh · 2023-08-05T19:23:24Z

Done, very small improvements in three projects.

MichaReiser · 2023-08-07T09:13:02Z

crates/ruff_python_formatter/src/expression/mod.rs

                attr: _,
                ctx: _,
            }) => {
+                self.visit_expr(value);


Can you test if we need the same for Subscript? Because it has the same pattern or is it not required because we never want to break subscripts?

I did look into this but perhaps I didn't understand what you're referencing. Are you referring to a subscript with a trailing attribute, like foo[bar].baz? Or a subscript following a function call, like foo(bar)[baz]? I believe I added test coverage for both, and both are consistent with Black.

This PR fixes

ct_match = aaaaaaaaaaact_id == self.get_content_type( obj=rel_obj, using=instance._state.db )

Where the last item is a call chain. The fix was to correctly set last to the call because the callee is to the left of the call. This fixed has_parentheses and has_own_parentheses to return true.

Now, we have a similar situation with subscript where we visit the value to the left of the subscript. Meaning a[b] ends with the subscript and not the node a (the subscript should be last).

Now, whether it should apply that layout is a different question, but it currently seems inconsistent because has_parentheses returns true, but we don't set the subscript to the last node.

Black output:

ct_match = ( self.get_content_type["obj=rel_obj, using=instance._state.db"] == aaaaaaaaaaact_id ) ct_match = ( aaaaaaaaaaact_id == self.get_content_type["obj=rel_obj, using=instance._state.db"] )

So black prefers not to break subscripts. Can we remove the Subscript case from has_own_parentheses?

Will investigate, thank you for clarifying.

We can merge this in the meantime. Maybe update the title?

charliermarsh · 2023-08-10T15:13:22Z

I'm realizing that this didn't actually fix the whole issue. Given

def f():
    return (
        unicodedata.normalize("NFKC", s1).casefold()
        == unicodedata.normalize("NFKC", s2).casefold()
    )

That's stable formatting for Black, but we do:

def f():
    return unicodedata.normalize("NFKC", s1).casefold() == unicodedata.normalize(
        "NFKC", s2
    ).casefold()

charliermarsh · 2023-08-10T15:20:51Z

It looks like Black treats these differently depending on whether the function call contains any arguments.

https://black.vercel.app/?version=main&state=_Td6WFoAAATm1rRGAgAhARYAAAB0L-Wj4AItAJ1dAD2IimZxl1N_WmXWfJWiTYLz4-Xr1ocavDZNy3yj_TrA725dHDb4iaUJISuadfhoT0sKHZLLk1QxHmGNY5xrrWDt1EKt5lq1gq5h7gVq9AKcRwEZ1BmvCk8Yz_Mr7D5lfjC0kkK4Ijl0mefy8pKhR554SYnrYdRZHw5TzgSkSgI8eJOYrTthc6A1exOcCkfWBuyxxInO8Zc12LdP7gAAAAAAIqEgVX6QfxMAAbkBrgQAADp5TWmxxGf7AgAAAAAEWVo=

astral-sh#6322) ## Summary This PR modifies our `can_omit_optional_parentheses` rules to ensure that if we see a call followed by an attribute, we treat that as an attribute access rather than a splittable call expression. This in turn ensures that we wrap like: ```python ct_match = aaaaaaaaaaact_id == self.get_content_type( obj=rel_obj, using=instance._state.db ) ``` For calls, but: ```python ct_match = ( aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id ) ``` For calls with trailing attribute accesses. Closes astral-sh#6065. ## Test Plan Similarity index before: - `zulip`: 0.99436 - `django`: 0.99779 - `warehouse`: 0.99504 - `transformers`: 0.99403 - `cpython`: 0.75912 - `typeshed`: 0.72293 And after: - `zulip`: 0.99436 - `django`: 0.99780 - `warehouse`: 0.99504 - `transformers`: 0.99404 - `cpython`: 0.75913 - `typeshed`: 0.72293

charliermarsh commented Aug 3, 2023

View reviewed changes

charliermarsh requested a review from MichaReiser August 3, 2023 23:49

MichaReiser requested changes Aug 4, 2023

View reviewed changes

charliermarsh force-pushed the charlie/compare-call branch from 0bd76d0 to 4705850 Compare August 4, 2023 13:49

charliermarsh marked this pull request as draft August 4, 2023 13:51

konstin added the formatter Related to the formatter label Aug 4, 2023

charliermarsh force-pushed the charlie/compare-call branch from 4705850 to c6866c2 Compare August 5, 2023 16:05

charliermarsh marked this pull request as ready for review August 5, 2023 16:06

charliermarsh requested a review from MichaReiser August 5, 2023 16:07

Avoid omitting parentheses for compare expressions with calls

7d8262d

charliermarsh force-pushed the charlie/compare-call branch from c6866c2 to 7d8262d Compare August 5, 2023 16:09

MichaReiser approved these changes Aug 7, 2023

View reviewed changes

charliermarsh changed the title ~~Avoid omitting parentheses for compare expressions with calls~~ Avoid omitting parentheses for trailing attributes on call expressions Aug 7, 2023

charliermarsh merged commit 63ffadf into main Aug 7, 2023

charliermarsh deleted the charlie/compare-call branch August 7, 2023 17:18

Comments

Conversation

charliermarsh commented Aug 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser Aug 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Aug 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Check Results

Benchmark

Linux

Windows

Uh oh!

MichaReiser left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichaReiser Aug 4, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

charliermarsh commented Aug 4, 2023

Uh oh!

charliermarsh commented Aug 5, 2023

Uh oh!

MichaReiser commented Aug 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charliermarsh commented Aug 5, 2023

Uh oh!

charliermarsh commented Aug 5, 2023

Uh oh!

charliermarsh commented Aug 5, 2023

Uh oh!

MichaReiser commented Aug 5, 2023

Uh oh!

charliermarsh commented Aug 5, 2023

Uh oh!

charliermarsh commented Aug 5, 2023

Uh oh!

MichaReiser commented Aug 5, 2023

Uh oh!

charliermarsh commented Aug 5, 2023

Uh oh!

charliermarsh commented Aug 5, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

charliermarsh commented Aug 10, 2023

Uh oh!

charliermarsh commented Aug 10, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

charliermarsh commented Aug 3, 2023 •

edited

Loading

MichaReiser Aug 4, 2023 •

edited

Loading

github-actions bot commented Aug 4, 2023 •

edited

Loading

MichaReiser left a comment •

edited

Loading

MichaReiser Aug 4, 2023 •

edited

Loading

MichaReiser commented Aug 5, 2023 •

edited

Loading