Skip to content

Comments

Format docstrings#6452

Merged
konstin merged 16 commits intomainfrom
docstring_formatting
Aug 14, 2023
Merged

Format docstrings#6452
konstin merged 16 commits intomainfrom
docstring_formatting

Conversation

@konstin
Copy link
Member

@konstin konstin commented Aug 9, 2023

Summary Implement docstring formatting

Test Plan Matches black's docstring.py fixture exactly, added some new cases for what is hard to debug with black and with what black doesn't cover.

similarity index:

main:
zulip: 0.99702
django: 0.99784
warehouse: 0.99585
build: 0.75623
transformers: 0.99469
cpython: 0.75989
typeshed: 0.74853

this branch:

zulip: 0.99702
django: 0.99784
warehouse: 0.99585
build: 0.75623
transformers: 0.99464
cpython: 0.75517
typeshed: 0.74853

The regression in transformers is actually an improvement in a file they don't format with black (they run black examples tests src utils setup.py conftest.py, the difference is in hubconf.py). cpython doesn't use black.

Closes #6196

@konstin konstin changed the base branch from main to impl_deref_mut_for_WithNodeLevel August 9, 2023 17:31
@konstin
Copy link
Member Author

konstin commented Aug 9, 2023

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

PR Check Results

Benchmark

Linux

group                                      main                                    pr
-----                                      ----                                    --
formatter/large/dataset.py                 1.07      4.5±0.24ms     9.1 MB/sec     1.00      4.2±0.28ms     9.7 MB/sec
formatter/numpy/ctypeslib.py               1.00  917.4±111.86µs    18.2 MB/sec     1.02   934.8±50.51µs    17.8 MB/sec
formatter/numpy/globals.py                 1.07     94.3±8.19µs    31.3 MB/sec     1.00     87.8±6.88µs    33.6 MB/sec
formatter/pydantic/types.py                1.00  1840.5±101.76µs    13.9 MB/sec    1.01  1858.7±122.12µs    13.7 MB/sec
linter/all-rules/large/dataset.py          1.15     14.4±0.36ms     2.8 MB/sec     1.00     12.6±0.59ms     3.2 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.04      3.7±0.09ms     4.6 MB/sec     1.00      3.5±0.21ms     4.7 MB/sec
linter/all-rules/numpy/globals.py          1.00   503.5±25.79µs     5.9 MB/sec     1.02   513.2±25.14µs     5.7 MB/sec
linter/all-rules/pydantic/types.py         1.10      7.3±0.26ms     3.5 MB/sec     1.00      6.6±0.46ms     3.9 MB/sec
linter/default-rules/large/dataset.py      1.22      7.9±0.64ms     5.2 MB/sec     1.00      6.5±0.37ms     6.3 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.11  1560.3±91.43µs    10.7 MB/sec     1.00  1402.0±51.38µs    11.9 MB/sec
linter/default-rules/numpy/globals.py      1.06   192.9±22.40µs    15.3 MB/sec     1.00   182.7±11.65µs    16.1 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.3±0.15ms     7.7 MB/sec     1.01      3.3±0.34ms     7.7 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      4.3±0.01ms     9.5 MB/sec    1.01      4.3±0.03ms     9.4 MB/sec
formatter/numpy/ctypeslib.py               1.00    822.8±8.78µs    20.2 MB/sec    1.01   831.1±10.91µs    20.0 MB/sec
formatter/numpy/globals.py                 1.00     85.4±0.59µs    34.6 MB/sec    1.04     88.9±8.72µs    33.2 MB/sec
formatter/pydantic/types.py                1.00  1767.3±22.52µs    14.4 MB/sec    1.01  1787.5±22.35µs    14.3 MB/sec
linter/all-rules/large/dataset.py          1.00     12.5±0.09ms     3.2 MB/sec    1.01     12.7±0.07ms     3.2 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.5±0.01ms     4.7 MB/sec    1.01      3.6±0.03ms     4.7 MB/sec
linter/all-rules/numpy/globals.py          1.00    369.7±4.87µs     8.0 MB/sec    1.01    373.5±5.28µs     7.9 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.5±0.03ms     3.9 MB/sec    1.01      6.6±0.05ms     3.9 MB/sec
linter/default-rules/large/dataset.py      1.00      6.9±0.02ms     5.9 MB/sec    1.01      6.9±0.05ms     5.9 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1433.9±5.69µs    11.6 MB/sec    1.00   1438.8±7.70µs    11.6 MB/sec
linter/default-rules/numpy/globals.py      1.00    148.4±0.83µs    19.9 MB/sec    1.01    149.2±0.93µs    19.8 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.1±0.03ms     8.3 MB/sec    1.00      3.1±0.02ms     8.3 MB/sec

@konstin konstin marked this pull request as ready for review August 10, 2023 09:44
@konstin konstin added the formatter Related to the formatter label Aug 10, 2023
@MichaReiser
Copy link
Member

It would be helpful for me if you could summarise the key points what "format docstring" includes. Having a basic understanding of the rules and not having to infer them from the code, would help me review the changes.

@konstin
Copy link
Member Author

konstin commented Aug 10, 2023

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very impressive! I love it how you managed to implement this without allocating in the common case where docstrings are already formatted.

The main feedback theme is around using is_python_whitespace, trim_whitespace_start, trim_whitespace, and trim_whitespace_end instead of the rust provided trim methods to only trim valid Python whitespace.

The other part is that I don't know how to integrate this into my suppression PR xD But you can let me figure this out

.fmt(f)
}
}
StringLayout::DocString => format_docstring(self.string.range(), f),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The format_docstring function is the odd one in this file. Most other code uses Format implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought about this too but i didn't find a design that makes this better; i'd rather turn FormatStringContinuation into a function call.

In general, what would you think about going more into the direction of trying to make function calls (or some indirection of them) format-able?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can return a format_with or format_once.

@MichaReiser
Copy link
Member

Can you extend your test plan with the similarity index? The perfect implementation wouldn't change the similarity index, any bugs would show up in a regression.

@konstin konstin force-pushed the docstring_formatting branch from f8a11fc to 972139e Compare August 10, 2023 16:24
@konstin konstin requested a review from dhruvmanila as a code owner August 10, 2023 16:24
@konstin konstin closed this Aug 10, 2023
@konstin konstin reopened this Aug 10, 2023
@konstin konstin force-pushed the impl_deref_mut_for_WithNodeLevel branch from 9da5459 to 23af0a6 Compare August 10, 2023 16:32
@konstin
Copy link
Member Author

konstin commented Aug 10, 2023

Can you extend your test plan with the similarity index? The perfect implementation wouldn't change the similarity index, any bugs would show up in a regression.

good point:

main:
zulip: 0.99702
django: 0.99784
warehouse: 0.99585
build: 0.75623
transformers: 0.99469
cpython: 0.75989
typeshed: 0.74853

this branch:
zulip: 0.99702
django: 0.99783
warehouse: 0.99582
build: 0.75623
transformers: 0.99463
cpython: 0.75516
typeshed: 0.74853

this is a regression, i'll investigate

@charliermarsh
Copy link
Member

It could be related to the fact that we format single-quote docstrings, while Black does not. (You're welcome to change that.)

@konstin
Copy link
Member Author

konstin commented Aug 10, 2023

black does format single quote docstrings for me.

The problem was a chaperone space where there shouldn't be one, it's fixed now and the similarity scores match again; i updated the PR description.

@charliermarsh
Copy link
Member

Can you document whatever you see around single-quote docstrings in the Black Deviations document, where we claim Black doesn't format them?

@konstin
Copy link
Member Author

konstin commented Aug 10, 2023

oh that's a slightly different area, it's about formatting the surroundings of the docstring that happens in FormatSuite rather than the docstring formatting itself:

class OracleSpatialRefSys(models.Model, SpatialRefSysMixin):
    "Maps to the Oracle MDSYS.CS_SRS table."
    cs_name = models.CharField(max_length=68)

unlike black, we format this to

class OracleSpatialRefSys(models.Model, SpatialRefSysMixin):
    "Maps to the Oracle MDSYS.CS_SRS table."

    cs_name = models.CharField(max_length=68)

It's code that i only moved around in this PR. I've added a comment but i think it's reasonable for black compatibility to fix that like a compatibility bug later. (From a style perspective, i'd turn single quoted docstrings into triple quoted docstrings)

@konstin konstin force-pushed the impl_deref_mut_for_WithNodeLevel branch from 23af0a6 to 39611cc Compare August 11, 2023 10:32
@konstin konstin force-pushed the docstring_formatting branch from ce6a1bf to 8b7ed41 Compare August 11, 2023 10:32
Base automatically changed from impl_deref_mut_for_WithNodeLevel to main August 11, 2023 10:41
Replace tabs with spaces

Refactor

Don't format escaped newline docstrings

Save
@konstin konstin force-pushed the docstring_formatting branch from 8b7ed41 to a7dd46a Compare August 14, 2023 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

formatter Related to the formatter

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Formatter: Docstrings

3 participants