Handle multiple unused submodule imports#666
Conversation
|
Awesome! Will review this soon! |
|
@squiddy - Re-reading the summary -- do you want a code review on this, or did you wanna do any cleanup first? |
|
I should have specified that more. I'd really appreciate a code review on the approach itself, the cleanup part is primary about the changes to |
|
You got it :) |
|
(Overall, what you've written in the summary makes sense. Haven't read the code yet.) |
|
@squiddy - Would you like me to clean up based on the comments and get this merged? Totally up to you. |
I've give it a try, thanks for the review. |
21f5f1e to
d61fcbf
Compare
|
Cool, gonna profile this tonight to make sure it's not a significant regression, then we can merge. |
|
Introduces a small slowdown so wanna see if I can optimize this a bit, then will merge. |
|
@charliermarsh I know you're very busy - good evidence to the success of ruff - so I'm not being pushy here. Can I help you here move this forward? If it's the slowdown that is blocking this, how would I go about that measuring that? Is it just hyperfine + running ruff on the cmdline, or did you do some in-depth profiling? I'm happy for anything you can share that helps me help you. :) |
|
Thanks for this really kind and understanding message (and not pushy at all). I've been a bad maintainer on this one, asking you to make changes and then failing to follow-up, so I apologize. I did try to get this across the finish line once or twice but was struggling to eke out any more performance gains and felt mixed about introducing a performance penalty. But the blocker now is that I need to do a fairly substantial rebase due to #1147 and #1137. If you're up for trying to handle that, it'd be appreciated. But otherwise, I'll continue to try to get to this when I can. |
|
I'm doing the rebase now and see what else I can do. |
598b3b4 to
fb93856
Compare
This resolves the issue where ruff doesn't see any unused imports in
code like this:
import multiprocessing.pool
import multiprocessing.process
z = multiprocessing.pool.ThreadPool()
The underlying issue was twofold:
1. When tracking submodule imports, we used the first part of the module
(multiprocessing in this example) as the key in the bindings map.
Therefore both submodule imports would end up as one entry, with no
way to differentiate between them.
2. When tracking references to imports, we only looked at Name
expressions. A compound (attribute) expression such as
multiprocessing.process resulted in two attempts to mark
names/imports used:
One for multiprocessing and one for process. Marking multiprocessing
used together with the issue in 1) meant that all imports of
multiprocessing were marked as used.
The solution:
1. For submodule imports, use the full name as key in the bindings map.
2. Detect Attribute access (such as foo.bar.baz, but not foo.bar["baz"])
and use that to check if it matches an import. If the visitor is not
in such an Attribute in the tree right now, keep looking at Name
expressions.
With an attribute access like `multiprocessing.pool.ThreadPool`,
iterate through all parts of it and try to look them up in bindings.
This is necessary because certain modules can have different access
patterns, e.g.
import os
os.path.exists("foo")
instead of
import os.path
os.path.exists("foo")
fb93856 to
4fdf7bd
Compare
|
Currently experimenting with putting this attribute specific logic after the "simple" name-based lookup, under the assumption that Benchmarks against CPython look promising, but I need to fix one more test. 🤞 |
|
Closing for now. I haven't made any good progress, might look into that again at some later point. |
…ed-import` (`F401`) (#20200) # Summary The PR under review attempts to make progress towards the age-old problem of submodule imports, specifically with regards to their treatment by the rule [`unused-import` (`F401`)](https://docs.astral.sh/ruff/rules/unused-import/). Some related issues: - #60 - #4656 Prior art: - #13965 - #5010 - #5011 - #666 See the PR summary for a detailed description.
I've got the tests and the right behaviour in place, but the code is not pretty right now.
This resolves the issue where ruff doesn't see any unused imports in code like this:
The underlying issue was twofold:
When tracking submodule imports, we used the first part of the module (multiprocessing in this example) as the key in the bindings map. Therefore both submodule imports would end up as one entry, with no way to differentiate between them.
When tracking references to imports, we only looked at Name expressions. A compound (attribute) expression such as multiprocessing.process resulted in two attempts to mark names/imports used:
One for multiprocessing and one for process. Marking multiprocessing used together with the issue in 1) meant that all imports of multiprocessing were marked as used.
The solution:
For submodule imports, use the full name as key in the bindings map.
Detect Attribute access (such as foo.bar.baz, but not foo.bar["baz"]) and use that to check if it matches an import. If the visitor is not in such an Attribute in the tree right now, keep looking at Name expressions.
With an attribute access like
multiprocessing.pool.ThreadPool, iterate through all parts of it and try to look them up in bindings. This is necessary because certain modules can have different access patterns, e.g.instead of
Closes #60