Skip to content

Comments

Refactor symbol lookup APIs to hide re-export implementation details#16133

Merged
dhruvmanila merged 2 commits intomainfrom
dhruv/re-export-3
Feb 14, 2025
Merged

Refactor symbol lookup APIs to hide re-export implementation details#16133
dhruvmanila merged 2 commits intomainfrom
dhruv/re-export-3

Conversation

@dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Feb 13, 2025

Summary

This PR refactors the symbol lookup APIs to better facilitate the re-export implementation. Specifically,

  • Add module_type_symbol which returns the Symbol that's a member of types.ModuleType
  • Rename symbol -> symbol_impl; add symbol which delegates to symbol_impl with RequireExplicitReExport::No
  • Update global_symbol to do symbol_impl -> fall back to module_type_symbol and default to RequireExplicitReExport::No
  • Add imported_symbol to do symbol_impl with RequireExplicitReExport as Yes if the module is in a stub file else No
  • Update known_module_symbol to use imported_symbol with a fallback to module_type_symbol
  • Update ModuleLiteralType::member to use imported_symbol with a custom fallback

We could potentially also update symbol_from_declarations and symbol_from_bindings to avoid passing in the RequireExplicitReExport as it would be always No if called directly. We could add symbol_from_declarations_impl and symbol_from_bindings_impl.

Looking at the _impl functions, I think we should move all of these symbol related logic into symbol.rs where Symbol is defined and the _impl could be private while we expose the public APIs at the crate level. This would also make the RequireExplicitReExport an implementation detail and the caller doesn't need to worry about it.

Happy to hear others thoughts on this.

@dhruvmanila dhruvmanila added the ty Multi-file analysis & type inference label Feb 13, 2025
@dhruvmanila dhruvmanila changed the title WIP: Follow-up from re-export implementation Refactor symbol lookup APIs to hide re-export implementation details Feb 13, 2025
@dhruvmanila dhruvmanila marked this pull request as ready for review February 13, 2025 13:08
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

ty.inner_type()
});
let declared_ty =
symbol_from_declarations(self.db(), declarations, RequiresExplicitReExport::No)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should rename the current symbol_from_declarations and symbol_from_bindings to private functions symbol_from_declarations_impl and symbol_from_bindings_impl and then have public versions that assume RequiresExplicitReExport::No. IMO it would be ideal if RequiresExplicitReExport enum could be a private implementation detail of the lookup functions, and not part of the public lookup API at all. (Also maybe at some point we should move the lookup functions to a submodule so they actually can have implementation details that are private from inference.)

The reasoning is that type inference should only ever infer direct from some set of bindings or declarations when it's doing within-file inference. All cross-file stuff should go through APIs like imported_symbol or builtin_symbol; it's never correct for a different file to peek into another file's individual bindings and declarations. So it doesn't make sense to expose from-bindings and from-declarations APIs that let you specify RequiresExplicitReExport (and also it's annoying to have to specify it everywhere.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is basically my plan as a follow-up (mentioned in the PR description) by moving it all in symbol.rs. I don't think it'll be "private" by staying in types.rs as it already contains a public usage of it. Apologies if it wasn't obvious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry totally missed that in the PR description! Great that we are independently thinking along the same lines :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base automatically changed from dhruv/re-export-2 to main February 14, 2025 09:47
@dhruvmanila dhruvmanila merged commit 63dd68e into main Feb 14, 2025
21 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/re-export-3 branch February 14, 2025 09:55
Comment on lines +271 to +281
/// Return the symbol for a member of `types.ModuleType`.
pub(crate) fn module_type_symbol<'db>(db: &'db dyn Db, name: &str) -> Symbol<'db> {
if module_type_symbols(db)
.iter()
.any(|module_type_member| &**module_type_member == name)
{
KnownClass::ModuleType.to_instance(db).member(db, name)
} else {
Symbol::Unbound
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the post-merge review. I think it might be good to add some more doc-comments to this function, because it's a bit weird as a standalone routine. In general we wouldn't check to see whether a symbol exists on a class before doing the .member() call on the instance type -- we'd just do the .member() call on the instance type, since it has the same end result. The reason for doing the funny dance here to only call KnownClass::ModuleType.to_instance(db).member(db, name) when absolutely necessary is that it was a fairly significant performance regression to fallback to doing that for every name lookup that wasn't found in the module's globals. So we use less idiomatic (and much more verbose) code here as a micro-optimisation because it's used in a very hot path.

Copy link
Member Author

@dhruvmanila dhruvmanila Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 89cefbe (#16152)

Comment on lines +331 to +349
/// Lookup the type of `symbol` in the builtins namespace.
///
/// Returns `Symbol::Unbound` if the `builtins` module isn't available for some reason.
///
/// Note that this function is only intended for use in the context of the builtins *namespace*
/// and should not be used when a symbol is being explicitly imported from the `builtins` module
/// (e.g. `from builtins import int`).
pub(crate) fn builtins_symbol<'db>(db: &'db dyn Db, symbol: &str) -> Symbol<'db> {
resolve_module(db, &KnownModule::Builtins.name())
.map(|module| {
external_symbol_impl(db, module.file(), symbol).or_fall_back_to(db, || {
// We're looking up in the builtins namespace and not the module, so we should
// do the normal lookup in `types.ModuleType` and not the special one as in
// `imported_symbol`.
module_type_symbol(db, symbol)
})
})
.unwrap_or(Symbol::Unbound)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, sorry for the delayed review... I'm a bit confused by the changes here. Why is the builtins namespace being handled so differently to all other module namespaces? Yes, it's true that __name__, __doc__ and other attributes found on types.ModuleType are present in the builtins namespace:

>>> import builtins
>>> builtins_dict = builtins.__dict__
>>> builtins_dict['__name__']
'builtins'
>>> builtins_dict['__doc__']
"Built-in functions, types, exceptions, and other objects.\n\nThis module provides direct access to all 'built-in'\nidentifiers of Python; for example, builtins.len is\nthe full name for the built-in function len().\n\nThis module is not normally accessed explicitly by most\napplications, but can be useful in modules that provide\nobjects with the same name as a built-in value, but in\nwhich the built-in of that name is also needed."

but that's also true for any other module:

>>> import typing
>>> typing_dict = typing.__dict__
>>> typing_dict['__name__']
'typing'
>>> typing_dict['__doc__']
'\nThe typing module: Support for gradual typing as defined by PEP 484 and subsequent PEPs.\n\nAmong other things, the module includes the following:\n* Generic, Protocol, and internal machinery to support generic aliases.\n  All subscripted types like X[int], Union[int, str] are generic aliases.\n* Various "special forms" that have unique meanings in type annotations:\n  NoReturn, Never, ClassVar, Self, Concatenate, Unpack, and others.\n* Classes whose instances can be type arguments to generic classes and functions:\n  TypeVar, ParamSpec, TypeVarTuple.\n* Public helper functions: get_type_hints, overload, cast, final, and others.\n* Several protocols to support duck-typing:\n  SupportsFloat, SupportsIndex, SupportsAbs, and others.\n* Special types: NewType, NamedTuple, TypedDict.\n* Deprecated aliases for builtin types and collections.abc ABCs.\n\nAny name not present in __all__ is an implementation detail\nthat may be changed without notice. Use at your own risk!\n'

Why are we adding special handling to builtins specifically so that builtins_symbol(db, "__doc__") returns a bound symbol, but not to all of the functions in stdlib.rs? __doc__ is also present in the global namespace of typing, typing_extensions, or any other core module, and it's a different object to builtins.__doc__

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point.

So, while looking into the way builtins should be handled, I saw the other functions in stdlib.rs as well. The main reason for special handling builtins is related to #15476 where we need to treat the builtins namespace as different to explicitly importing symbols from the builtins module.

  • When considering a symbol from a builtins namespace (via the fallback logic of name lookup), the lookup should go through the external symbol query which will make sure that explicit re-exports are required.
  • While, when considering a symbol from the builtins module (via explicit import statement), the lookup should follow the normal module lookup logic that's implemented via imported_symbol

The known_module_symbol is just a wrapper around imported_symbol lookup for the known modules i.e., they're not available in the current namespace but requires an external lookup. This change just makes that explicit such that typing_symbol and typing_extensions_symbol lookup happens as if the symbols were imported via from typing import ... and from typing_extensions import ... respectively. There's additional context in this thread: #16073 (comment)

dhruvmanila added a commit that referenced this pull request Feb 17, 2025
## Summary

This PR does the following:
* Moves the following from `types.rs` in `symbol.rs`:
	* `symbol`
	* `global_symbol`
	* `imported_symbol`
	* `symbol_from_bindings`
	* `symbol_from_declarations`
	* `SymbolAndQualifiers`
	* `SymbolFromDeclarationsResult`
* Moves the following from `stdlib.rs` in `symbol.rs` and removes
`stdlib.rs`:
	* `known_module_symbol`
	* `builtins_symbol`
	* `typing_symbol` (only for tests)
	* `typing_extensions_symbol`
	* `builtins_module_scope`
	* `core_module_scope`
* Add `symbol_from_bindings_impl` and `symbol_from_declarations_impl` to
keep `RequiresExplicitReExport` an implementation detail
* Make `declaration_type` a `pub(crate)` as it's required in
`symbol_from_declarations` (`binding_type` is already `pub(crate)`

The main motivation is to keep the implementation details private and
only expose an ergonomic API which uses sane defaults for various
scenario to avoid any mistakes from the caller. Refer to
#16133 (comment),
#16133 (comment) for
details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants