Skip to content

Comments

Avoid omitting parentheses for trailing attributes on call expressions#6322

Merged
charliermarsh merged 1 commit intomainfrom
charlie/compare-call
Aug 7, 2023
Merged

Avoid omitting parentheses for trailing attributes on call expressions#6322
charliermarsh merged 1 commit intomainfrom
charlie/compare-call

Conversation

@charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Aug 3, 2023

Summary

This PR modifies our can_omit_optional_parentheses rules to ensure that if we see a call followed by an attribute, we treat that as an attribute access rather than a splittable call expression.

This in turn ensures that we wrap like:

ct_match = aaaaaaaaaaact_id == self.get_content_type(
    obj=rel_obj, using=instance._state.db
)

For calls, but:

ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

For calls with trailing attribute accesses.

Closes #6065.

Test Plan

Similarity index before:

  • zulip: 0.99436
  • django: 0.99779
  • warehouse: 0.99504
  • transformers: 0.99403
  • cpython: 0.75912
  • typeshed: 0.72293

And after:

  • zulip: 0.99436
  • django: 0.99780
  • warehouse: 0.99504
  • transformers: 0.99404
  • cpython: 0.75913
  • typeshed: 0.72293

return false;
}
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is not the right solution, but it passes the new fixtures. Should this instead be in expr_compare.rs?

Copy link
Member

@MichaReiser MichaReiser Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method here mirrors https://github.com/psf/black/blob/2fd9d8b339e1e2e1b93956c6d68b2b358b3fc29d/src/black/lines.py#L884-L966 I had a quick glance and don't seem to find this logic. But your change seems correct because the method returns false for the provided example. I recommend adding print statements to black's code base and then running the formatting on the code snipped to better understand what the underlying logic is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@konstin any chance this already gets fixed by your Call chain PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think so, they should be independent

@github-actions
Copy link
Contributor

github-actions bot commented Aug 4, 2023

PR Check Results

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01     10.4±0.42ms     3.9 MB/sec    1.00     10.3±0.49ms     4.0 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.0±0.08ms     8.2 MB/sec    1.01      2.1±0.11ms     8.1 MB/sec
formatter/numpy/globals.py                 1.01   235.8±13.71µs    12.5 MB/sec    1.00   234.5±11.59µs    12.6 MB/sec
formatter/pydantic/types.py                1.03      4.4±0.22ms     5.8 MB/sec    1.00      4.3±0.20ms     5.9 MB/sec
linter/all-rules/large/dataset.py          1.00     14.2±0.33ms     2.9 MB/sec    1.05     14.9±0.43ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.6±0.13ms     4.6 MB/sec    1.02      3.7±0.13ms     4.5 MB/sec
linter/all-rules/numpy/globals.py          1.00   506.6±21.17µs     5.8 MB/sec    1.04   526.4±25.39µs     5.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.4±0.66ms     4.0 MB/sec    1.05      6.7±0.16ms     3.8 MB/sec
linter/default-rules/large/dataset.py      1.00      7.0±0.17ms     5.8 MB/sec    1.03      7.2±0.22ms     5.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.01  1500.9±40.38µs    11.1 MB/sec    1.00  1488.2±50.69µs    11.2 MB/sec
linter/default-rules/numpy/globals.py      1.00   183.3±10.29µs    16.1 MB/sec    1.00    182.6±8.51µs    16.2 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.1±0.10ms     8.1 MB/sec    1.02      3.2±0.20ms     8.0 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     10.2±0.17ms     4.0 MB/sec    1.00     10.2±0.83ms     4.0 MB/sec
formatter/numpy/ctypeslib.py               1.00  1928.0±33.78µs     8.6 MB/sec    1.00  1930.2±39.16µs     8.6 MB/sec
formatter/numpy/globals.py                 1.00    218.2±4.82µs    13.5 MB/sec    1.01    221.4±9.21µs    13.3 MB/sec
formatter/pydantic/types.py                1.02      4.3±0.11ms     6.0 MB/sec    1.00      4.2±0.07ms     6.1 MB/sec
linter/all-rules/large/dataset.py          1.00     12.9±0.21ms     3.2 MB/sec    1.05     13.6±0.26ms     3.0 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.04      3.6±0.08ms     4.7 MB/sec    1.00      3.4±0.05ms     4.9 MB/sec
linter/all-rules/numpy/globals.py          1.05    437.0±7.94µs     6.8 MB/sec    1.00    417.7±7.62µs     7.1 MB/sec
linter/all-rules/pydantic/types.py         1.00      5.9±0.15ms     4.3 MB/sec    1.02      6.1±0.11ms     4.2 MB/sec
linter/default-rules/large/dataset.py      1.01      6.9±0.11ms     5.9 MB/sec    1.00      6.8±0.09ms     6.0 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.02  1411.1±28.46µs    11.8 MB/sec    1.00  1378.0±19.18µs    12.1 MB/sec
linter/default-rules/numpy/globals.py      1.03    159.8±2.34µs    18.5 MB/sec    1.00    154.9±3.11µs    19.0 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.0±0.05ms     8.4 MB/sec    1.00      3.0±0.04ms     8.4 MB/sec

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic doesn't seem to be specific to compare operators. It also applies to boolean operators and, I assume for binary expressions too.

def f():
    return (
        unicodedata.normalize("NFKC", s1).casefold()
        == unicodedata.normalize("NFKC", s2).casefold()
    )
    return (
        unicodedata.normalize("NFKC", s1).casefold()
        and unicodedata.normalize("NFKC", s2).casefold()
    )


ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

ct_match = (
    aaaaaaaaaaact_id and self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

ct_match = aaaaaaaaaaact_id == xyzaaaaaa

ct_match = aaaaaaaaaaact_id and xyzaaaaaa


ct_match = {aaaaaaaaaaaaaaaa} == self.get_content_type(
    obj=rel_obj, using=instance._state.db
).id

ct_match = {aaaaaaaaaaaaaaaa} and self.get_content_type(
    obj=rel_obj, using=instance._state.db
).id

ct_match = (aaaaaaaaaaaaaaaa) == self.get_content_type(
    obj=rel_obj, using=instance._state.db
).id

ct_match = (aaaaaaaaaaaaaaaa) and self.get_content_type(
    obj=rel_obj, using=instance._state.db
).id

We need to spend more time understanding what black's doing to ensure we move closer. One way to assert this is to run the ecosystem checks on our selected projects and verify that either the similarity index increases or that diffing the Black/Ruff differences between the old and new Ruff versions introduces no regressions.

I recommend you to debug Black's can_omit_invisible_parens, delimiter_split, and rhs to better understand what happening. It can also be useful to debug the selected selected transformers.

  • rhs -> in_parentheses_only_* helpers
  • delimiter` -> Normal groups

We can also hop on a call and do this together. It ca

return false;
}
}
}
Copy link
Member

@MichaReiser MichaReiser Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method here mirrors https://github.com/psf/black/blob/2fd9d8b339e1e2e1b93956c6d68b2b358b3fc29d/src/black/lines.py#L884-L966 I had a quick glance and don't seem to find this logic. But your change seems correct because the method returns false for the provided example. I recommend adding print statements to black's code base and then running the formatting on the code snipped to better understand what the underlying logic is.

return false;
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@konstin any chance this already gets fixed by your Call chain PR?

@charliermarsh
Copy link
Member Author

Thank you! Will take a look at this again today and review the Black source.

@charliermarsh charliermarsh marked this pull request as draft August 4, 2023 13:51
@konstin konstin added the formatter Related to the formatter label Aug 4, 2023
@charliermarsh
Copy link
Member Author

I have deduced at least that our max_priority_count and max_priority are correct and consistent with Black. The issue seems to be this branch, which is returning true but doesn't do so in Black:

// Only use the layout if the first or last expression has parentheses of some sort.
let first_parenthesized = visitor
    .first
    .is_some_and(|first| has_parentheses(first, visitor.source));
let last_parenthesized = visitor
    .last
    .is_some_and(|last| has_parentheses(last, visitor.source));
first_parenthesized || last_parenthesized

(Although in Black, this logic is split between a few branches, I think.)

Continuing to dig in...

@MichaReiser
Copy link
Member

MichaReiser commented Aug 5, 2023

I have deduced at least that our max_priority_count and max_priority are correct and consistent with Black. The issue seems to be this branch, which is returning true but doesn't do so in Black:

// Only use the layout if the first or last expression has parentheses of some sort.
let first_parenthesized = visitor
    .first
    .is_some_and(|first| has_parentheses(first, visitor.source));
let last_parenthesized = visitor
    .last
    .is_some_and(|last| has_parentheses(last, visitor.source));
first_parenthesized || last_parenthesized

(Although in Black, this logic is split between a few branches, I think.)

Continuing to dig in...

I think last incorrectly points to the call expression for

ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

rather than the attribute. This makes the above-mentioned path to return true because a call is parenthesized.

Doing something like this seems to help (do we need to do the same for Subscript, similar to call?

Expr::Attribute(ast::ExprAttribute {
                range: _,
                value,
                attr: _,
                ctx: _,
            }) => {
                self.visit_expr(&value);
                if has_parentheses(value, self.source) {
                    self.update_max_priority(OperatorPriority::Attribute);
                }
                self.last = Some(expr);
                return;
            }

@charliermarsh
Copy link
Member Author

Ohhh smart. I didn't realize that Black formatted these differently:

ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)

ct_match = aaaaaaaaaaact_id == self.get_content_type(
    obj=rel_obj, using=instance._state.db
)

@charliermarsh
Copy link
Member Author

It looks like on main, we already break this differently than Black (Ruff, Black):

ct_match = (
    {aaaaaaaaaaaaaaaa} == self.get_content_type[obj, rel_obj, using, instance._state.db].id
)

Is that covered in the same place, or elsewhere? I feel like it's elsewhere, since it's not about removing the optional parentheses, but about which expression we break (left-vs-right).

@charliermarsh charliermarsh marked this pull request as ready for review August 5, 2023 16:06
@charliermarsh
Copy link
Member Author

It seems like Black prefers splitting the expression on the RHS of a binary operator, but will split the left if it can't (playground), but hesitant to keep investigating this here because it seems separate and I know you've spent more time on binary operators.

@MichaReiser
Copy link
Member

Is it specific to subscript operations? It would be good to understand in depth which expressions we break differently.

We implement a similar "Prefer splitting parenthesized expressions before binary expression" logic internal discord. Or it is because our groups have the wrong hierarchy:

  • group sequence: group1, group2: The right group expands before the left
  • Group hierarchy: group1(group2): The outer group expands before the inner.

@charliermarsh
Copy link
Member Author

Not specific to subscripts, I'll look into the group hierarchy, thanks!

@charliermarsh
Copy link
Member Author

(We do seem to have a "group sequence" as described above. Gonna keep looking into it, I need to understand this better -- but I do think it can be done separately from this PR.)

@MichaReiser
Copy link
Member

Separate PR makes sense. Can you include the similarity index changes in your test plan

@charliermarsh
Copy link
Member Author

Yes sir!

@charliermarsh
Copy link
Member Author

Done, very small improvements in three projects.

attr: _,
ctx: _,
}) => {
self.visit_expr(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test if we need the same for Subscript? Because it has the same pattern or is it not required because we never want to break subscripts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did look into this but perhaps I didn't understand what you're referencing. Are you referring to a subscript with a trailing attribute, like foo[bar].baz? Or a subscript following a function call, like foo(bar)[baz]? I believe I added test coverage for both, and both are consistent with Black.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR fixes

ct_match = aaaaaaaaaaact_id == self.get_content_type(
    obj=rel_obj, using=instance._state.db
)

Where the last item is a call chain. The fix was to correctly set last to the call because the callee is to the left of the call. This fixed has_parentheses and has_own_parentheses to return true.

Now, we have a similar situation with subscript where we visit the value to the left of the subscript. Meaning a[b] ends with the subscript and not the node a (the subscript should be last).

Now, whether it should apply that layout is a different question, but it currently seems inconsistent because has_parentheses returns true, but we don't set the subscript to the last node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Black output:

ct_match = (
    self.get_content_type["obj=rel_obj, using=instance._state.db"] == aaaaaaaaaaact_id
)

ct_match = (
    aaaaaaaaaaact_id == self.get_content_type["obj=rel_obj, using=instance._state.db"]
)

So black prefers not to break subscripts. Can we remove the Subscript case from has_own_parentheses?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will investigate, thank you for clarifying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge this in the meantime. Maybe update the title?

@charliermarsh charliermarsh changed the title Avoid omitting parentheses for compare expressions with calls Avoid omitting parentheses for trailing attributes on call expressions Aug 7, 2023
@charliermarsh charliermarsh merged commit 63ffadf into main Aug 7, 2023
@charliermarsh charliermarsh deleted the charlie/compare-call branch August 7, 2023 17:18
@charliermarsh
Copy link
Member Author

I'm realizing that this didn't actually fix the whole issue. Given

def f():
    return (
        unicodedata.normalize("NFKC", s1).casefold()
        == unicodedata.normalize("NFKC", s2).casefold()
    )

That's stable formatting for Black, but we do:

def f():
    return unicodedata.normalize("NFKC", s1).casefold() == unicodedata.normalize(
        "NFKC", s2
    ).casefold()

durumu pushed a commit to durumu/ruff that referenced this pull request Aug 12, 2023
astral-sh#6322)

## Summary

This PR modifies our `can_omit_optional_parentheses` rules to ensure
that if we see a call followed by an attribute, we treat that as an
attribute access rather than a splittable call expression.

This in turn ensures that we wrap like:

```python
ct_match = aaaaaaaaaaact_id == self.get_content_type(
    obj=rel_obj, using=instance._state.db
)
```

For calls, but:

```python
ct_match = (
    aaaaaaaaaaact_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)
```

For calls with trailing attribute accesses.

Closes astral-sh#6065.

## Test Plan

Similarity index before:

- `zulip`: 0.99436
- `django`: 0.99779
- `warehouse`: 0.99504
- `transformers`: 0.99403
- `cpython`: 0.75912
- `typeshed`: 0.72293

And after:

- `zulip`: 0.99436
- `django`: 0.99780
- `warehouse`: 0.99504
- `transformers`: 0.99404
- `cpython`: 0.75913
- `typeshed`: 0.72293
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

formatter Related to the formatter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Formatter: Break quality check before breaking parentheses

3 participants