Skip to content

Comments

Handle multiple unused submodule imports#666

Closed
squiddy wants to merge 1 commit intoastral-sh:mainfrom
squiddy:handle-multiple-unused-submodule-imports
Closed

Handle multiple unused submodule imports#666
squiddy wants to merge 1 commit intoastral-sh:mainfrom
squiddy:handle-multiple-unused-submodule-imports

Conversation

@squiddy
Copy link
Contributor

@squiddy squiddy commented Nov 10, 2022

I've got the tests and the right behaviour in place, but the code is not pretty right now.

  • Cleanup code

This resolves the issue where ruff doesn't see any unused imports in code like this:

import multiprocessing.pool
import multiprocessing.process
z = multiprocessing.pool.ThreadPool()

The underlying issue was twofold:

  1. When tracking submodule imports, we used the first part of the module (multiprocessing in this example) as the key in the bindings map. Therefore both submodule imports would end up as one entry, with no way to differentiate between them.

  2. When tracking references to imports, we only looked at Name expressions. A compound (attribute) expression such as multiprocessing.process resulted in two attempts to mark names/imports used:

    One for multiprocessing and one for process. Marking multiprocessing used together with the issue in 1) meant that all imports of multiprocessing were marked as used.

The solution:

  1. For submodule imports, use the full name as key in the bindings map.

  2. Detect Attribute access (such as foo.bar.baz, but not foo.bar["baz"]) and use that to check if it matches an import. If the visitor is not in such an Attribute in the tree right now, keep looking at Name expressions.

    With an attribute access like multiprocessing.pool.ThreadPool, iterate through all parts of it and try to look them up in bindings. This is necessary because certain modules can have different access patterns, e.g.

    import os
    os.path.exists("foo")
    

instead of

   import os.path
   os.path.exists("foo")

Closes #60

@charliermarsh
Copy link
Member

Awesome! Will review this soon!

@charliermarsh
Copy link
Member

@squiddy - Re-reading the summary -- do you want a code review on this, or did you wanna do any cleanup first?

@squiddy
Copy link
Contributor Author

squiddy commented Nov 12, 2022

I should have specified that more. I'd really appreciate a code review on the approach itself, the cleanup part is primary about the changes to handle_node_load.

@charliermarsh
Copy link
Member

You got it :)

@charliermarsh charliermarsh marked this pull request as ready for review November 12, 2022 17:15
@charliermarsh
Copy link
Member

(Overall, what you've written in the summary makes sense. Haven't read the code yet.)

@charliermarsh
Copy link
Member

@squiddy - Would you like me to clean up based on the comments and get this merged? Totally up to you.

@squiddy
Copy link
Contributor Author

squiddy commented Nov 21, 2022

@squiddy - Would you like me to clean up based on the comments and get this merged? Totally up to you.

I've give it a try, thanks for the review.

@squiddy squiddy force-pushed the handle-multiple-unused-submodule-imports branch 3 times, most recently from 21f5f1e to d61fcbf Compare November 26, 2022 08:44
@charliermarsh
Copy link
Member

Cool, gonna profile this tonight to make sure it's not a significant regression, then we can merge.

@charliermarsh
Copy link
Member

Introduces a small slowdown so wanna see if I can optimize this a bit, then will merge.

@squiddy
Copy link
Contributor Author

squiddy commented Dec 9, 2022

@charliermarsh I know you're very busy - good evidence to the success of ruff - so I'm not being pushy here.

Can I help you here move this forward? If it's the slowdown that is blocking this, how would I go about that measuring that? Is it just hyperfine + running ruff on the cmdline, or did you do some in-depth profiling?

I'm happy for anything you can share that helps me help you. :)

@charliermarsh
Copy link
Member

Thanks for this really kind and understanding message (and not pushy at all). I've been a bad maintainer on this one, asking you to make changes and then failing to follow-up, so I apologize.

I did try to get this across the finish line once or twice but was struggling to eke out any more performance gains and felt mixed about introducing a performance penalty. But the blocker now is that I need to do a fairly substantial rebase due to #1147 and #1137. If you're up for trying to handle that, it'd be appreciated. But otherwise, I'll continue to try to get to this when I can.

@squiddy
Copy link
Contributor Author

squiddy commented Dec 10, 2022

I'm doing the rebase now and see what else I can do.

@squiddy squiddy force-pushed the handle-multiple-unused-submodule-imports branch from 598b3b4 to fb93856 Compare December 10, 2022 04:24
This resolves the issue where ruff doesn't see any unused imports in
code like this:

    import multiprocessing.pool
    import multiprocessing.process
    z = multiprocessing.pool.ThreadPool()

The underlying issue was twofold:

1. When tracking submodule imports, we used the first part of the module
   (multiprocessing in this example) as the key in the bindings map.
   Therefore both submodule imports would end up as one entry, with no
   way to differentiate between them.

2. When tracking references to imports, we only looked at Name
   expressions. A compound (attribute) expression such as
   multiprocessing.process resulted in two attempts to mark
   names/imports used:

   One for multiprocessing and one for process. Marking multiprocessing
   used together with the issue in 1) meant that all imports of
   multiprocessing were marked as used.

The solution:

1. For submodule imports, use the full name as key in the bindings map.

2. Detect Attribute access (such as foo.bar.baz, but not foo.bar["baz"])
   and use that to check if it matches an import. If the visitor is not
   in such an Attribute in the tree right now, keep looking at Name
   expressions.

   With an attribute access like `multiprocessing.pool.ThreadPool`,
   iterate through all parts of it and try to look them up in bindings.
   This is necessary because certain modules can have different access
   patterns, e.g.

       import os
       os.path.exists("foo")

  instead of

       import os.path
       os.path.exists("foo")
@squiddy squiddy force-pushed the handle-multiple-unused-submodule-imports branch from fb93856 to 4fdf7bd Compare December 10, 2022 04:46
@squiddy
Copy link
Contributor Author

squiddy commented Dec 10, 2022

Currently experimenting with putting this attribute specific logic after the "simple" name-based lookup, under the assumption that Name are way more common generally than just inside Attribute, e.g. in function calls or variable lookups.

Benchmarks against CPython look promising, but I need to fix one more test. 🤞

@squiddy
Copy link
Contributor Author

squiddy commented Dec 26, 2022

Closing for now. I haven't made any good progress, might look into that again at some later point.

@squiddy squiddy closed this Dec 26, 2022
dylwil3 added a commit that referenced this pull request Sep 26, 2025
…ed-import` (`F401`) (#20200)

# Summary

The PR under review attempts to make progress towards the age-old
problem of submodule imports, specifically with regards to their
treatment by the rule [`unused-import`
(`F401`)](https://docs.astral.sh/ruff/rules/unused-import/).

Some related issues:
- #60
- #4656

Prior art:
- #13965
- #5010
- #5011
- #666

See the PR summary for a detailed description.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle multiple unused submodule imports

2 participants