[ty] Remove redundant re-exports that share the same top-most module#22581
[ty] Remove redundant re-exports that share the same top-most module#22581BurntSushi merged 5 commits intomainfrom
Conversation
|
Demo: 2026-01-14T14.17.47-05.00.mp4Here's what the status quo looks like: 2026-01-14T14.19.40-05.00.mp4 |
Typing conformance resultsNo changes detected ✅ |
|
d259075 to
987d25d
Compare
MichaReiser
left a comment
There was a problem hiding this comment.
This looks good. There's certainly a fair amount of complexity in merge (functional but also to avoid allocations) that requires a fair amount of concentration to understand what's going on.
I have one design question on whether filtering out re-export must be sensitive to the current module.
| /// assert_eq!(ModuleName::new_static("foo.bar.baz").unwrap().top(), "foo"); | ||
| /// ``` | ||
| #[must_use] | ||
| pub fn top(&self) -> &str { |
There was a problem hiding this comment.
Nit: I'm leaning towards using root here, which IMO goes better with ancestors and parent than top, but I can't specifically say why :)
There was a problem hiding this comment.
I was unsure whether "root" was really appropriate since I'm not sure, e.g., a.b.c has a "root" at a per se.
I could just make this more verbose and call it first_component. Which I think I kind of like better since it's less ambiguous and it isn't frequently used either, so the verbosity doesn't bother me.
crates/ty_ide/src/all_symbols.rs
Outdated
| /// * Another thing we might consider doing differently here is to not | ||
| /// completely remove symbols deeper in the package hierarchy, and | ||
| /// instead just expose the fact that they are redundant somehow on | ||
| /// `AllSymbolInfo`. Then callers (like for completions) might just | ||
| /// choose to rank the redundant symbols much lower. But for now, we | ||
| /// just remove them entirely. |
There was a problem hiding this comment.
One concern that I have with removing it without considering the outer context is if you're working on said library. Let's say I work on pandas and are writing a module in pandas.io.parsers.readers or a sibling package of it. It's probably desired to then import the symbol from the closest module rather than always preferring the outer most module?
There was a problem hiding this comment.
Maaaaybe. I think that's very plausible, but I'm not sure.
The other similar thing in this space is that we might only want to do this merging in non-first party code, under the reasoning that in first party code, you probably want to see all your re-exports. That would also I think address this concern without coming up with a notion of what "closest module" means.
(We talked about this in our 1-on-1. The reason I didn't go down this route is that it's pretty annoying to set up the test infrastructure for it.)
If you're okay with it, I'd like to move forward with this as-is and prioritize future work here based on user feedback. I filed astral-sh/ty#2519 to track this.
crates/ty_ide/src/all_symbols.rs
Outdated
| merged.extend(group.merge(db)); | ||
| group.clear(); |
There was a problem hiding this comment.
Is this semantically the same as "finishing" a group? If so, should we add a finish_into(&mut self, target: &mut Vec) method (or finish(&mut self) -> impl Iterator<...> ?
There was a problem hiding this comment.
Yeah that kind of API is what I started with. But the iterator returned borrows from the Group and that makes it difficult to clear the other state.
This could be fixed with a dedicated iterator type and a Drop impl that does the clearing, but it just seemed like too much ceremony for a one-time-use API. If this were a "real" API with broader scope, yeah absolutely, it'd be worth some strong encapsulation and coupling reduction.
| // Merge the kind from the original symbol into the | ||
| // reexported symbol. We do this unconditionally since | ||
| // it is applicable in all cases, even when the symbols | ||
| // are in different package hierarchies. |
There was a problem hiding this comment.
What are cases where the kind between two symbols with the same fully qualified name is different?
Would two different kinds suggest that these are different symbols?
There was a problem hiding this comment.
Ah good question! I added this comment to the source clarifying why:
// The reason why we do this unconditionally is that
// reexport symbols usually do not have a very specific
// kind, and are typically just assigned a kind of
// "variable." This is because the symbol extraction
// reports these symbols as-is by inspecting import
// statements and doesn't follow them to their original
// definition to discover their "true" kind. But that's
// exactly what we're doing here! We have the original
// definition. So assign the kind from the original,
// even when it crosses package hierarchies.Does that help?
Previously, we were constructing this at a higher level layer. But this commit pushes it down a layer of abstraction. This shouldn't result in constructing the fully qualified name any more frequently than we previously did. Namely, we're still only doing it for symbols that match the caller provided search query. The motivation here is that we want to do some de-duplication based on module name, and having the fully qualified name of a symbol seems like the most straight-forward way to go about this. (We'll need one more piece of information that we add in a subsequent commit.)
This information should let us filter out (or rather, merge) re-exported symbols across a package hierarchy for the purposes of auto-completions.
This roughly mimics how pandas defines and re-exports `read_csv` at the top-level module.
987d25d to
bfc1573
Compare
The invariants established by the constructors for `ModuleName` guarantee that this is always available. It's useful for determining the "top level" module for where a symbol lives.
The implementation here is (to me) surprisingly complicated. The main complications are: 1. Trying to limit the redundant detection to as few of the symbols we extract as possible. In particular, while I haven't done benchmarking on this, I perceive the redundancy detection to be somewhat expensive and auto-import can return lots of symbols. So we're careful to only do this extra checking on (typically) small groups of symbols that could possibly be merged. 2. Even by restricting our work, this merging process could still be called quite a bit. (Thousands of times in my "standard data scienc-y test environment.") So I went out of my way to amortize allocs. 3. Re-exports can form a chain and we want to find all of them. 4. We (probably) don't want to remove redundant re-exports unless they share the same top-level module. Otherwise, e.g., a library that re-exports another library's symbols could have all of its re-exports dropped. 5. We want to only keep the top-most re-exports, and there may be multiple such re-exports. So we keep all of them. 6. We can't assume anything about the relationship of re-exports and the original definition. A re-export could be deeper in the module hierarchy than the original definition or above it.
bfc1573 to
e092bd5
Compare
The idea here is to avoid flooding completions with a bunch of
redundant symbols when those symbols are just re-exports of one
another. And in particular, to avoid suggesting symbols "deeper"
in the module graph.
The implementation here is (to me) surprisingly complicated. The main
constraints leading to some of that complexity are:
Trying to limit the redundant detection to as few of the symbols
we extract as possible. In particular, while I haven't done benchmarking
on this, I perceive the redundancy detection to be somewhat expensive
and auto-import can return lots of symbols. So we're careful to only do
this extra checking on (typically) small groups of symbols that could
possibly be merged.
Even by restricting our work, this merging process could still be
called quite a bit. (Thousands of times in my "standard data scienc-y
test environment.") So I went out of my way to amortize allocs.
Re-exports can form a chain and we want to find all of them.
We (probably) don't want to remove redundant re-exports unless
they share the same top-level module. Otherwise, e.g., a library
that re-exports another library's symbols could have all of its
re-exports dropped.
We want to only keep the top-most re-exports, and there may be
multiple such re-exports. So we keep all of them.
We can't assume anything about the relationship of re-exports
and the original definition. A re-export could be deeper in the
module hierarchy than the original definition or above it.
Reviewers are encouraged to read this PR commit by commit.
Fixes astral-sh/ty#1857