[ty] Provide docstrings for stdlib APIs when hovering over them in an IDE#19311

Merged
AlexWaygood merged 4 commits into main from alex/typeshed-docstrings on Jul 14, 2025
@AlexWaygood AlexWaygood commented Jul 13, 2025

Summary

I made a codemod that will auto-add docstrings to stub files by dynamically inspecting the value of the docstrings at runtime. This PR adds a step to our typeshed-sync workflow that applies the codemod, so that we always have docstrings for the stdlib checked into our vendored stubs for the standard library. This will allow us to display the docstrings when users hover over stdlib symbols in their IDE.

The source code for the codemod is here. The changes the codemod makes can be viewed here.

The only issue I know of is that if you have version-dependent method definitions, e.g.

import sys

class Foo:
    if sys.version_info >= (3, 13):
        def method(self): ...
    else:
        def method(self, arg): ...

then the codemod will only add a docstring to the definition in the first sys.version_info branch, not the second. This issue only exists for nested scopes, however; version-dependent class or function definitions in the global scope should have docstrings added to all definitions without issue.

^EDIT: I fixed this issue.

Codemodding docstrings into the stubs increases the size of the vendored-typeshed zipfile that we include as part of the ty binary. Locally, a release build of the ty binary increases in size from 38.7MB to 39.9MB. I think that's probably worth it, given that there's no other way to provide docstrings for C-extension modules in the stdlib. Even modules that are nominally written in Python, such as the typing module, often have several classes in them that are actually written in C (typing.TypeVar, for example); it would be impossible for ty to obtain docstrings for these classes by inspecting the runtime source code of the stdlib, so codemodding the docstrings into the stub seems to be a more resilient strategy here.

Codemodding docstrings into the stubs at typeshed-sync time is preferable to attempting to maintain these docstrings upstream in typeshed, because docstrings are constantly changing upstream in CPython, and it would be extremely difficult to keep the copies of these docstrings in typeshed up to date. An automated codemod solves this issue.
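
As a rough illustration of the approach described above, here is a minimal sketch that uses only the standard library (`ast`, `importlib`, `inspect`). It is not the actual codemod, which is the one linked above; the names `add_docstrings`, `_visit`, and `_is_ellipsis` are hypothetical. The sketch imports the runtime module, walks the stub, and copies each runtime docstring into the matching definition. It descends into class bodies and into every branch of an `if` statement, so version-dependent definitions are all covered.

```python
import ast
import importlib
import inspect

# Hypothetical helper names; this is not the docstring-adder implementation.
DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _is_ellipsis(stmt: ast.stmt) -> bool:
    """Return True for a bare `...` statement, the usual stub body."""
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def _visit(body: list, runtime_parent: object) -> None:
    """Add docstrings to the definitions in `body`, looked up on `runtime_parent`."""
    for node in body:
        if isinstance(node, ast.If):
            # Descend into both branches so that every version-dependent
            # definition receives a docstring, not just the first one.
            _visit(node.body, runtime_parent)
            _visit(node.orelse, runtime_parent)
            continue
        if not isinstance(node, DEFINITIONS):
            continue
        runtime_obj = getattr(runtime_parent, node.name, None)
        if runtime_obj is None:
            continue  # the name does not exist on the running Python version
        if isinstance(node, ast.ClassDef):
            _visit(node.body, runtime_obj)
        doc = inspect.getdoc(runtime_obj)
        if not doc or ast.get_docstring(node) is not None:
            continue  # nothing to add, or the stub already has a docstring
        docstring = ast.Expr(value=ast.Constant(value=doc))
        if len(node.body) == 1 and _is_ellipsis(node.body[0]):
            node.body = [docstring]  # replace the `...` placeholder
        else:
            node.body.insert(0, docstring)


def add_docstrings(stub_source: str, module_name: str) -> str:
    """Return `stub_source` with runtime docstrings from `module_name` added."""
    tree = ast.parse(stub_source)
    _visit(tree.body, importlib.import_module(module_name))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(add_docstrings("def sqrt(x: float) -> float: ...", "math"))
```

Note that `ast.unparse` discards comments and the original formatting, so a real tool operating on typeshed would need a syntax-tree library that preserves them. The sketch also only sees docstrings for the Python version it runs under.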

Test Plan

@AlexWaygood added the internal (An internal refactor or improvement) and ty (Multi-file analysis & type inference) labels Jul 13, 2025
Member

@dhruvmanila dhruvmanila left a comment

Thanks! This is great

Is there a reason that this is in a separate repository? As it's a script, can it be added / moved to the Ruff repository mainly so that it's easier to maintain?

@dhruvmanila
Member

Do we know if the script (or something else) that Pylance is using to fetch these docstrings is open-sourced and available? If so, can it be used by us?

@MichaReiser
Member

> Locally, a release build of the ty binary increases in size from 38.7MB to 39.9MB.

The published ty artifact (built with uv build in the ty repository) increases from 3187088 bytes (3.18MB) to 3218280 bytes (3.21MB). I think that's negligible.

uvx --python=3.12 --from=git+https://github.com/AlexWaygood/docstring-adder.git add-docstrings --stdlib-path ./typeshed/stdlib
uvx --python=3.11 --from=git+https://github.com/AlexWaygood/docstring-adder.git add-docstrings --stdlib-path ./typeshed/stdlib
uvx --python=3.10 --from=git+https://github.com/AlexWaygood/docstring-adder.git add-docstrings --stdlib-path ./typeshed/stdlib
uvx --python=3.9 --from=git+https://github.com/AlexWaygood/docstring-adder.git add-docstrings --stdlib-path ./typeshed/stdlib
Member

@MichaReiser MichaReiser Jul 14, 2025

Does typeshed not support Python 3.8? If that's the case, it probably doesn't make sense for ty to still support Python 3.8 😆

Member Author

That's correct, it only supports 3.9+ these days

@MichaReiser
Member

MichaReiser commented Jul 14, 2025

It's good to know that our parser is fast enough that the size increase in code to parse doesn't impact runtime performance in a meaningful way

Do you know why this isn't something that has been done before?

@AlexWaygood
Member Author

> Do we know if the script (or something else) that Pylance is using to fetch these docstrings is open-sourced and available? If so, can it be used by us?

I believe it's closed-source. An early open-source prototype is available at https://github.com/gramster/stubsplit, but note that the README for that project says:

> It would be better to rewrite this at some point using libcst

which is basically what I've done ;)

@AlexWaygood
Member Author

> Is there a reason that this is in a separate repository? As it's a script, can it be added / moved to the Ruff repository mainly so that it's easier to maintain?

One reason why I'd be interested in keeping it separate is that I'd like to explore running this script at build time in https://github.com/typeshed-internal/stub_uploader when uploading typeshed's third-party stubs packages to PyPI. This would also be really beneficial to ty users, as well as users of other type checkers.

I'd be interested in moving the repo to the astral organisation, though, and giving other people commit access?

@MichaReiser
Member

> I'd be interested in moving the repo to the astral organisation, though, and giving other people commit access?

I think that would be great. You should have the necessary permissions to create a new repository

@AlexWaygood
Member Author

> It's good to know that our parser is fast enough that the size increase in code to parse doesn't impact runtime performance in a meaningful way

I'm not sure we actually know that from this PR, because this PR itself doesn't actually add docstrings (it just adds the workflow that means they'll be auto-added in the next typeshed sync). I was going to trigger the workflow manually immediately after landing this, but I can open a draft PR now with all the docstrings added to check performance doesn't degrade on Codspeed.

@MichaReiser
Member

> I'm not sure we actually know that from this PR, because this PR itself doesn't actually add docstrings (it just adds the workflow that means they'll be auto-added in the next typeshed sync). I was going to trigger the workflow manually immediately after landing this, but I can open a draft PR now with all the docstrings added to check performance doesn't degrade on Codspeed.

Whoops, I didn't realize this. That also means that my binary size measurement is off, because all I did was check out this PR. Can you measure the binary size increase of the released ty artifact (use uv build in the ty repository)?

@AlexWaygood
Member Author

Screenshot of signature help in the playground for a builtin function when the stubs have docstrings:

image

@AlexWaygood
Member Author

> Can you measure the binary size increase of the released ty artifact (use uv build in the ty repository).

For the latest release of ty, I have these numbers:

  • Wheel: 6,700,111 bytes (6.7 MB on disk)
  • Sdist: 3,186,425 bytes (3.2 MB on disk)

Updating the submodule to #19327, I have:

  • Wheel: 7,453,238 bytes (7.5 MB on disk)
  • Sdist: 3,848,622 bytes (3.9 MB on disk)
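
For scale, the relative growth can be worked out directly from the byte counts above. This is just arithmetic on the reported figures; the helper name `growth` is made up for the example.

```python
def growth(before: int, after: int) -> tuple[int, float]:
    """Return the absolute increase in bytes and the relative increase in percent."""
    delta = after - before
    return delta, 100 * delta / before


# Byte counts as reported above: (latest release, with docstrings added).
sizes = {
    "wheel": (6_700_111, 7_453_238),
    "sdist": (3_186_425, 3_848_622),
}

for name, (before, after) in sizes.items():
    delta, percent = growth(before, after)
    print(f"{name}: +{delta:,} bytes ({percent:.1f}%)")
# wheel: +753,127 bytes (11.2%)
# sdist: +662,197 bytes (20.8%)
```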

@AlexWaygood
Member Author

The Codspeed report on #19327 reports some regressions, but these appear most pronounced on the microbenchmarks (which makes sense, as parsing the vendored typeshed stubs takes up a much higher percentage of the total execution time for smaller projects). There are regressions of up to 4% on the microbenchmarks, a 2% regression on the cold tomllib benchmark, and regressions of 1% or lower for all other benchmarks.

@MichaReiser
Member

MichaReiser commented Jul 14, 2025

Thanks @AlexWaygood for getting all those numbers. The binary size increase makes way more sense than the numbers I shared.

I think those regressions are fine, considering the value the docstrings provide in an IDE context. In-stub documentation also has much better ergonomics than an external JSON file when reading a builtin stub file in the IDE. I also don't think that it justifies shipping typeshed twice.

The long-term solution here is to pre-process typeshed so that we don't need to parse the files in the first place.

@AlexWaygood
Member Author

> > I'd be interested in moving the repo to the astral organisation, though, and giving other people commit access?
>
> I think that would be great. You should have the necessary permissions to create a new repository

Okay, the repo is now at https://github.com/astral-sh/docstring-adder !

@AlexWaygood AlexWaygood merged commit fddf2f3 into main Jul 14, 2025
34 checks passed
@AlexWaygood AlexWaygood deleted the alex/typeshed-docstrings branch July 14, 2025 16:00
@AlexWaygood
Member Author

I manually triggered the workflow, and it created #19334 -- everything seems to be working as expected 🥳

dcreager added a commit that referenced this pull request Jul 14, 2025
* dcreager/merge-arguments: (223 commits)
  fix docs
  Combine CallArguments and CallArgumentTypes
  [ty] Sync vendored typeshed stubs (#19334)
  [`refurb`] Make example error out-of-the-box (`FURB122`) (#19297)
  [refurb] Make example error out-of-the-box (FURB177) (#19309)
  [ty] ignore errors when reformatting codemodded typeshed (#19332)
  [ty] Provide docstrings for stdlib APIs when hovering over them in an IDE (#19311)
  [ty] Add virtual files to the only project database (#19322)
  Add t-string fixtures for rules that do not need to be modified (#19146)
  [ty] Remove `FileLookupError` (#19323)
  [ty] Fix handling of metaclasses in `object.<CURSOR>` completions
  [ty] Use an interval map for scopes by expression (#19025)
  [ty] List all `enum` members (#19283)
  [ty] Handle configuration errors in LSP more gracefully (#19262)
  [ty] Use python version and path from Python extension (#19012)
  [`pep8_naming`] Avoid false positives on standard library functions with uppercase names (`N802`) (#18907)
  Update Rust crate toml to 0.9.0 (#19320)
  [ty] Fix server version (#19284)
  Update NPM Development dependencies (#19319)
  Update taiki-e/install-action action to v2.56.13 (#19317)
  ...