Refactor symbol lookup APIs to hide re-export implementation details#16133
Refactor symbol lookup APIs to hide re-export implementation details#16133dhruvmanila merged 2 commits intomainfrom
Conversation
| ty.inner_type() | ||
| }); | ||
| let declared_ty = | ||
| symbol_from_declarations(self.db(), declarations, RequiresExplicitReExport::No) |
There was a problem hiding this comment.
I think maybe we should rename the current symbol_from_declarations and symbol_from_bindings to private functions symbol_from_declarations_impl and symbol_from_bindings_impl and then have public versions that assume RequiresExplicitReExport::No. IMO it would be ideal if RequiresExplicitReExport enum could be a private implementation detail of the lookup functions, and not part of the public lookup API at all. (Also maybe at some point we should move the lookup functions to a submodule so they actually can have implementation details that are private from inference.)
The reasoning is that type inference should only ever infer direct from some set of bindings or declarations when it's doing within-file inference. All cross-file stuff should go through APIs like imported_symbol or builtin_symbol; it's never correct for a different file to peek into another file's individual bindings and declarations. So it doesn't make sense to expose from-bindings and from-declarations APIs that let you specify RequiresExplicitReExport (and also it's annoying to have to specify it everywhere.)
There was a problem hiding this comment.
Yes, this is basically my plan as a follow-up (mentioned in the PR description) by moving it all in symbol.rs. I don't think it'll be "private" by staying in types.rs as it already contains a public usage of it. Apologies if it wasn't obvious.
There was a problem hiding this comment.
Oops sorry totally missed that in the PR description! Great that we are independently thinking along the same lines :)
86c5876 to
bff9112
Compare
11fed30 to
884da77
Compare
884da77 to
96b5a35
Compare
| /// Return the symbol for a member of `types.ModuleType`. | ||
| pub(crate) fn module_type_symbol<'db>(db: &'db dyn Db, name: &str) -> Symbol<'db> { | ||
| if module_type_symbols(db) | ||
| .iter() | ||
| .any(|module_type_member| &**module_type_member == name) | ||
| { | ||
| KnownClass::ModuleType.to_instance(db).member(db, name) | ||
| } else { | ||
| Symbol::Unbound | ||
| } | ||
| } |
There was a problem hiding this comment.
Sorry for the post-merge review. I think it might be good to add some more doc-comments to this function, because it's a bit weird as a standalone routine. In general we wouldn't check to see whether a symbol exists on a class before doing the .member() call on the instance type -- we'd just do the .member() call on the instance type, since it has the same end result. The reason for doing the funny dance here to only call KnownClass::ModuleType.to_instance(db).member(db, name) when absolutely necessary is that it was a fairly significant performance regression to fallback to doing that for every name lookup that wasn't found in the module's globals. So we use less idiomatic (and much more verbose) code here as a micro-optimisation because it's used in a very hot path.
| /// Lookup the type of `symbol` in the builtins namespace. | ||
| /// | ||
| /// Returns `Symbol::Unbound` if the `builtins` module isn't available for some reason. | ||
| /// | ||
| /// Note that this function is only intended for use in the context of the builtins *namespace* | ||
| /// and should not be used when a symbol is being explicitly imported from the `builtins` module | ||
| /// (e.g. `from builtins import int`). | ||
| pub(crate) fn builtins_symbol<'db>(db: &'db dyn Db, symbol: &str) -> Symbol<'db> { | ||
| resolve_module(db, &KnownModule::Builtins.name()) | ||
| .map(|module| { | ||
| external_symbol_impl(db, module.file(), symbol).or_fall_back_to(db, || { | ||
| // We're looking up in the builtins namespace and not the module, so we should | ||
| // do the normal lookup in `types.ModuleType` and not the special one as in | ||
| // `imported_symbol`. | ||
| module_type_symbol(db, symbol) | ||
| }) | ||
| }) | ||
| .unwrap_or(Symbol::Unbound) | ||
| } |
There was a problem hiding this comment.
Again, sorry for the delayed review... I'm a bit confused by the changes here. Why is the builtins namespace being handled so differently to all other module namespaces? Yes, it's true that __name__, __doc__ and other attributes found on types.ModuleType are present in the builtins namespace:
>>> import builtins
>>> builtins_dict = builtins.__dict__
>>> builtins_dict['__name__']
'builtins'
>>> builtins_dict['__doc__']
"Built-in functions, types, exceptions, and other objects.\n\nThis module provides direct access to all 'built-in'\nidentifiers of Python; for example, builtins.len is\nthe full name for the built-in function len().\n\nThis module is not normally accessed explicitly by most\napplications, but can be useful in modules that provide\nobjects with the same name as a built-in value, but in\nwhich the built-in of that name is also needed."but that's also true for any other module:
>>> import typing
>>> typing_dict = typing.__dict__
>>> typing_dict['__name__']
'typing'
>>> typing_dict['__doc__']
'\nThe typing module: Support for gradual typing as defined by PEP 484 and subsequent PEPs.\n\nAmong other things, the module includes the following:\n* Generic, Protocol, and internal machinery to support generic aliases.\n All subscripted types like X[int], Union[int, str] are generic aliases.\n* Various "special forms" that have unique meanings in type annotations:\n NoReturn, Never, ClassVar, Self, Concatenate, Unpack, and others.\n* Classes whose instances can be type arguments to generic classes and functions:\n TypeVar, ParamSpec, TypeVarTuple.\n* Public helper functions: get_type_hints, overload, cast, final, and others.\n* Several protocols to support duck-typing:\n SupportsFloat, SupportsIndex, SupportsAbs, and others.\n* Special types: NewType, NamedTuple, TypedDict.\n* Deprecated aliases for builtin types and collections.abc ABCs.\n\nAny name not present in __all__ is an implementation detail\nthat may be changed without notice. Use at your own risk!\n'Why are we adding special handling to builtins specifically so that builtins_symbol(db, "__doc__") returns a bound symbol, but not to all of the functions in stdlib.rs? __doc__ is also present in the global namespace of typing, typing_extensions, or any other core module, and it's a different object to builtins.__doc__
There was a problem hiding this comment.
That's a good point.
So, while looking into the way builtins should be handled, I saw the other functions in stdlib.rs as well. The main reason for special handling builtins is related to #15476 where we need to treat the builtins namespace as different to explicitly importing symbols from the builtins module.
- When considering a symbol from a builtins namespace (via the fallback logic of name lookup), the lookup should go through the external symbol query which will make sure that explicit re-exports are required.
- While, when considering a symbol from the builtins module (via explicit import statement), the lookup should follow the normal module lookup logic that's implemented via
imported_symbol
The known_module_symbol is just a wrapper around imported_symbol lookup for the known modules i.e., they're not available in the current namespace but requires an external lookup. This change just makes that explicit such that typing_symbol and typing_extensions_symbol lookup happens as if the symbols were imported via from typing import ... and from typing_extensions import ... respectively. There's additional context in this thread: #16073 (comment)
## Summary This PR does the following: * Moves the following from `types.rs` in `symbol.rs`: * `symbol` * `global_symbol` * `imported_symbol` * `symbol_from_bindings` * `symbol_from_declarations` * `SymbolAndQualifiers` * `SymbolFromDeclarationsResult` * Moves the following from `stdlib.rs` in `symbol.rs` and removes `stdlib.rs`: * `known_module_symbol` * `builtins_symbol` * `typing_symbol` (only for tests) * `typing_extensions_symbol` * `builtins_module_scope` * `core_module_scope` * Add `symbol_from_bindings_impl` and `symbol_from_declarations_impl` to keep `RequiresExplicitReExport` an implementation detail * Make `declaration_type` a `pub(crate)` as it's required in `symbol_from_declarations` (`binding_type` is already `pub(crate)` The main motivation is to keep the implementation details private and only expose an ergonomic API which uses sane defaults for various scenario to avoid any mistakes from the caller. Refer to #16133 (comment), #16133 (comment) for details.
Summary
This PR refactors the symbol lookup APIs to better facilitate the re-export implementation. Specifically,
module_type_symbolwhich returns theSymbolthat's a member oftypes.ModuleTypesymbol->symbol_impl; addsymbolwhich delegates tosymbol_implwithRequireExplicitReExport::Noglobal_symbolto dosymbol_impl-> fall back tomodule_type_symboland default toRequireExplicitReExport::Noimported_symbolto dosymbol_implwithRequireExplicitReExportasYesif the module is in a stub file elseNoknown_module_symbolto useimported_symbolwith a fallback tomodule_type_symbolModuleLiteralType::memberto useimported_symbolwith a custom fallbackWe could potentially also update
symbol_from_declarationsandsymbol_from_bindingsto avoid passing in theRequireExplicitReExportas it would be alwaysNoif called directly. We could addsymbol_from_declarations_implandsymbol_from_bindings_impl.Looking at the
_implfunctions, I think we should move all of these symbol related logic intosymbol.rswhereSymbolis defined and the_implcould be private while we expose the public APIs at the crate level. This would also make theRequireExplicitReExportan implementation detail and the caller doesn't need to worry about it.Happy to hear others thoughts on this.