Refactor symbol lookup APIs to hide re-export implementation details by dhruvmanila · Pull Request #16133 · astral-sh/ruff

dhruvmanila · 2025-02-13T06:58:18Z

Summary

This PR refactors the symbol lookup APIs to better facilitate the re-export implementation. Specifically,

* Add module_type_symbol, which returns the Symbol that's a member of types.ModuleType
* Rename symbol -> symbol_impl; add symbol, which delegates to symbol_impl with RequiresExplicitReExport::No
* Update global_symbol to call symbol_impl with RequiresExplicitReExport::No and fall back to module_type_symbol
* Add imported_symbol, which calls symbol_impl with RequiresExplicitReExport::Yes if the module is in a stub file, else RequiresExplicitReExport::No
* Update known_module_symbol to use imported_symbol with a fallback to module_type_symbol
* Update ModuleLiteralType::member to use imported_symbol with a custom fallback
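
A minimal sketch of how the symbol and imported_symbol entry points listed above could delegate to one shared implementation. The Module and Member types and the string-valued Symbol are hypothetical stand-ins, not the crate's real types, and the db argument that the real functions take is omitted here.

```rust
/// Hypothetical stand-in for the crate's `Symbol<'db>` type.
#[derive(Debug, Clone, PartialEq)]
enum Symbol {
    Bound(&'static str),
    Unbound,
}

/// Whether a name must be explicitly re-exported to be visible.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RequiresExplicitReExport {
    Yes,
    No,
}

/// Toy model of one module-level name (assumption for illustration).
struct Member {
    name: &'static str,
    ty: &'static str,
    re_exported: bool,
}

/// Toy model of a module (assumption for illustration).
struct Module {
    is_stub: bool,
    members: Vec<Member>,
}

/// Shared implementation: the only function that takes the re-export flag.
fn symbol_impl(module: &Module, name: &str, re_export: RequiresExplicitReExport) -> Symbol {
    module
        .members
        .iter()
        .find(|m| m.name == name && (re_export == RequiresExplicitReExport::No || m.re_exported))
        .map_or(Symbol::Unbound, |m| Symbol::Bound(m.ty))
}

/// Plain lookup: never requires an explicit re-export.
fn symbol(module: &Module, name: &str) -> Symbol {
    symbol_impl(module, name, RequiresExplicitReExport::No)
}

/// Lookup as if via an import: stub files require explicit re-exports.
fn imported_symbol(module: &Module, name: &str) -> Symbol {
    let re_export = if module.is_stub {
        RequiresExplicitReExport::Yes
    } else {
        RequiresExplicitReExport::No
    };
    symbol_impl(module, name, re_export)
}
```

In this sketch a name that a stub file does not re-export is still visible through symbol, but comes back as Symbol::Unbound through imported_symbol.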

We could potentially also update symbol_from_declarations and symbol_from_bindings to avoid passing in RequiresExplicitReExport, as it would always be No when they are called directly. We could add symbol_from_declarations_impl and symbol_from_bindings_impl for that.

Looking at the _impl functions, I think we should move all of this symbol-related logic into symbol.rs, where Symbol is defined; the _impl functions could be private while we expose the public APIs at the crate level. This would also make RequiresExplicitReExport an implementation detail that the caller doesn't need to worry about.

Happy to hear others' thoughts on this.

crates/red_knot_python_semantic/src/types.rs

carljm

Looks great!

crates/red_knot_python_semantic/src/types.rs

carljm · 2025-02-13T21:52:06Z

crates/red_knot_python_semantic/src/types/infer.rs

-                ty.inner_type()
-            });
+        let declared_ty =
+            symbol_from_declarations(self.db(), declarations, RequiresExplicitReExport::No)


I think maybe we should rename the current symbol_from_declarations and symbol_from_bindings to private functions symbol_from_declarations_impl and symbol_from_bindings_impl and then have public versions that assume RequiresExplicitReExport::No. IMO it would be ideal if RequiresExplicitReExport enum could be a private implementation detail of the lookup functions, and not part of the public lookup API at all. (Also maybe at some point we should move the lookup functions to a submodule so they actually can have implementation details that are private from inference.)

The reasoning is that type inference should only ever infer directly from some set of bindings or declarations when it's doing within-file inference. All cross-file lookups should go through APIs like imported_symbol or builtin_symbol; it's never correct for a different file to peek into another file's individual bindings and declarations. So it doesn't make sense to expose from-bindings and from-declarations APIs that let you specify RequiresExplicitReExport (and it's also annoying to have to specify it everywhere).

Yes, this is basically my plan as a follow-up (mentioned in the PR description): moving it all into symbol.rs. I don't think it can be "private" while staying in types.rs, as that file already contains a public usage of it. Apologies if it wasn't obvious.

Oops sorry totally missed that in the PR description! Great that we are independently thinking along the same lines :)
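
The visibility arrangement discussed in this thread can be sketched with an ordinary Rust submodule. This is an illustration under assumptions, not the code that landed: the lookup module name, the Binding struct, and the "most recent visible binding wins" rule are all hypothetical simplifications.

```rust
mod lookup {
    /// Private to `lookup`: code outside this module cannot name the enum.
    #[derive(Clone, Copy, PartialEq)]
    enum RequiresExplicitReExport {
        Yes,
        No,
    }

    /// Toy binding: a type name plus a re-export flag (assumption).
    pub struct Binding {
        pub ty: &'static str,
        pub re_exported: bool,
    }

    /// Private implementation that takes the re-export flag.
    fn symbol_from_bindings_impl(
        bindings: &[Binding],
        re_export: RequiresExplicitReExport,
    ) -> Option<&'static str> {
        bindings
            .iter()
            .rev() // simplification: the most recent visible binding wins
            .find(|b| re_export == RequiresExplicitReExport::No || b.re_exported)
            .map(|b| b.ty)
    }

    /// Within-file lookup: re-export rules never apply.
    pub fn symbol_from_bindings(bindings: &[Binding]) -> Option<&'static str> {
        symbol_from_bindings_impl(bindings, RequiresExplicitReExport::No)
    }

    /// Cross-file lookup: the flag is derived here, not chosen by the caller.
    pub fn imported_symbol(bindings: &[Binding], is_stub: bool) -> Option<&'static str> {
        let re_export = if is_stub {
            RequiresExplicitReExport::Yes
        } else {
            RequiresExplicitReExport::No
        };
        symbol_from_bindings_impl(bindings, re_export)
    }
}
```

Callers outside lookup can only reach the two public functions, so they cannot pass the wrong flag for their context.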

AlexWaygood · 2025-02-14T13:35:15Z

crates/red_knot_python_semantic/src/types.rs

+/// Return the symbol for a member of `types.ModuleType`.
+pub(crate) fn module_type_symbol<'db>(db: &'db dyn Db, name: &str) -> Symbol<'db> {
+    if module_type_symbols(db)
+        .iter()
+        .any(|module_type_member| &**module_type_member == name)
+    {
+        KnownClass::ModuleType.to_instance(db).member(db, name)
+    } else {
+        Symbol::Unbound
+    }
+}


Sorry for the post-merge review. I think it might be good to add some more doc-comments to this function, because it's a bit weird as a standalone routine. In general we wouldn't check to see whether a symbol exists on a class before doing the .member() call on the instance type -- we'd just do the .member() call on the instance type, since it has the same end result. The reason for doing the funny dance here to only call KnownClass::ModuleType.to_instance(db).member(db, name) when absolutely necessary is that it was a fairly significant performance regression to fall back to doing that for every name lookup that wasn't found in the module's globals. So we use less idiomatic (and much more verbose) code here as a micro-optimisation because it's used in a very hot path.

Added in 89cefbe (#16152)
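
The fast-path check described in this comment can be sketched as follows. The Db struct, its call counter, and the hard-coded member list are hypothetical; they only make it observable that the expensive lookup is skipped for names that cannot be on types.ModuleType.

```rust
use std::cell::Cell;

/// Hypothetical list of names known to exist on `types.ModuleType`.
const MODULE_TYPE_MEMBERS: [&str; 3] = ["__name__", "__doc__", "__file__"];

/// Toy database that counts how often the expensive path runs.
struct Db {
    expensive_lookups: Cell<u32>,
}

impl Db {
    fn new() -> Self {
        Db { expensive_lookups: Cell::new(0) }
    }

    fn lookup_count(&self) -> u32 {
        self.expensive_lookups.get()
    }

    /// Stand-in for the costly `.member()` call on the instance type.
    fn expensive_member_lookup(&self, name: &str) -> Option<String> {
        self.expensive_lookups.set(self.expensive_lookups.get() + 1);
        if MODULE_TYPE_MEMBERS.iter().any(|member| *member == name) {
            Some(format!("ModuleType.{}", name))
        } else {
            None
        }
    }
}

/// Cheap name check first; only pay for the full lookup on a hit.
fn module_type_symbol(db: &Db, name: &str) -> Option<String> {
    if MODULE_TYPE_MEMBERS.iter().any(|member| *member == name) {
        db.expensive_member_lookup(name)
    } else {
        None
    }
}
```

Both branches return the same result the unconditional lookup would, which is why the pre-check is purely an optimisation.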

AlexWaygood · 2025-02-16T17:46:02Z

crates/red_knot_python_semantic/src/types.rs

+/// Lookup the type of `symbol` in the builtins namespace.
+///
+/// Returns `Symbol::Unbound` if the `builtins` module isn't available for some reason.
+///
+/// Note that this function is only intended for use in the context of the builtins *namespace*
+/// and should not be used when a symbol is being explicitly imported from the `builtins` module
+/// (e.g. `from builtins import int`).
+pub(crate) fn builtins_symbol<'db>(db: &'db dyn Db, symbol: &str) -> Symbol<'db> {
+    resolve_module(db, &KnownModule::Builtins.name())
+        .map(|module| {
+            external_symbol_impl(db, module.file(), symbol).or_fall_back_to(db, || {
+                // We're looking up in the builtins namespace and not the module, so we should
+                // do the normal lookup in `types.ModuleType` and not the special one as in
+                // `imported_symbol`.
+                module_type_symbol(db, symbol)
+            })
+        })
+        .unwrap_or(Symbol::Unbound)
+}


Again, sorry for the delayed review... I'm a bit confused by the changes here. Why is the builtins namespace being handled so differently to all other module namespaces? Yes, it's true that __name__, __doc__ and other attributes found on types.ModuleType are present in the builtins namespace:

>>> import builtins
>>> builtins_dict = builtins.__dict__
>>> builtins_dict['__name__']
'builtins'
>>> builtins_dict['__doc__']
"Built-in functions, types, exceptions, and other objects.\n\nThis module provides direct access to all 'built-in'\nidentifiers of Python; for example, builtins.len is\nthe full name for the built-in function len().\n\nThis module is not normally accessed explicitly by most\napplications, but can be useful in modules that provide\nobjects with the same name as a built-in value, but in\nwhich the built-in of that name is also needed."

but that's also true for any other module:

>>> import typing
>>> typing_dict = typing.__dict__
>>> typing_dict['__name__']
'typing'
>>> typing_dict['__doc__']
'\nThe typing module: Support for gradual typing as defined by PEP 484 and subsequent PEPs.\n\nAmong other things, the module includes the following:\n* Generic, Protocol, and internal machinery to support generic aliases.\n All subscripted types like X[int], Union[int, str] are generic aliases.\n* Various "special forms" that have unique meanings in type annotations:\n NoReturn, Never, ClassVar, Self, Concatenate, Unpack, and others.\n* Classes whose instances can be type arguments to generic classes and functions:\n TypeVar, ParamSpec, TypeVarTuple.\n* Public helper functions: get_type_hints, overload, cast, final, and others.\n* Several protocols to support duck-typing:\n SupportsFloat, SupportsIndex, SupportsAbs, and others.\n* Special types: NewType, NamedTuple, TypedDict.\n* Deprecated aliases for builtin types and collections.abc ABCs.\n\nAny name not present in __all__ is an implementation detail\nthat may be changed without notice. Use at your own risk!\n'

Why are we adding special handling to builtins specifically so that builtins_symbol(db, "__doc__") returns a bound symbol, but not to all of the functions in stdlib.rs? __doc__ is also present in the global namespace of typing, typing_extensions, or any other core module, and it's a different object to builtins.__doc__

That's a good point.

So, while looking into the way builtins should be handled, I saw the other functions in stdlib.rs as well. The main reason for special-casing builtins is related to #15476, where we need to treat the builtins namespace differently from explicitly importing symbols from the builtins module.

When considering a symbol from the builtins namespace (via the fallback logic of name lookup), the lookup should go through the external symbol query, which will make sure that explicit re-exports are required.

When considering a symbol from the builtins module (via an explicit import statement), by contrast, the lookup should follow the normal module lookup logic that's implemented via imported_symbol.

known_module_symbol is just a wrapper around the imported_symbol lookup for the known modules, i.e., modules whose symbols are not available in the current namespace and require an external lookup. This change just makes that explicit, so that typing_symbol and typing_extensions_symbol lookups happen as if the symbols were imported via from typing import ... and from typing_extensions import ... respectively. There's additional context in this thread: #16073 (comment)
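
To make the builtins-namespace path concrete, here is a simplified sketch of the fallback chain from the builtins_symbol diff above. The Module struct and the string-valued Symbol are hypothetical, and this or_fall_back_to is a reduced version that takes no db and ignores possibly-unbound symbols.

```rust
/// Hypothetical stand-in for the crate's `Symbol<'db>` type.
#[derive(Debug, Clone, PartialEq)]
enum Symbol {
    Bound(&'static str),
    Unbound,
}

impl Symbol {
    /// Simplified combinator: run the fallback only if nothing was found.
    fn or_fall_back_to(self, fallback: impl FnOnce() -> Symbol) -> Symbol {
        match self {
            Symbol::Unbound => fallback(),
            bound => bound,
        }
    }
}

/// Toy module: a list of (name, type) pairs for its globals (assumption).
struct Module {
    globals: Vec<(&'static str, &'static str)>,
}

impl Module {
    fn global(&self, name: &str) -> Symbol {
        self.globals
            .iter()
            .find(|entry| entry.0 == name)
            .map_or(Symbol::Unbound, |entry| Symbol::Bound(entry.1))
    }
}

/// Hypothetical subset of the names defined on `types.ModuleType`.
const MODULE_TYPE_MEMBERS: [&str; 2] = ["__name__", "__doc__"];

fn module_type_symbol(name: &str) -> Symbol {
    if MODULE_TYPE_MEMBERS.iter().any(|member| *member == name) {
        Symbol::Bound("str")
    } else {
        Symbol::Unbound
    }
}

/// Namespace lookup: module globals first, then `types.ModuleType` members.
/// `None` models the `builtins` module failing to resolve.
fn builtins_symbol(builtins: Option<&Module>, name: &str) -> Symbol {
    builtins
        .map(|module| module.global(name).or_fall_back_to(|| module_type_symbol(name)))
        .unwrap_or(Symbol::Unbound)
}
```

Because the fallback is a closure, the types.ModuleType lookup only runs when the module's own globals do not define the name.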

## Summary

This PR does the following:

* Moves the following from `types.rs` into `symbol.rs`:
  * `symbol`
  * `global_symbol`
  * `imported_symbol`
  * `symbol_from_bindings`
  * `symbol_from_declarations`
  * `SymbolAndQualifiers`
  * `SymbolFromDeclarationsResult`
* Moves the following from `stdlib.rs` into `symbol.rs` and removes `stdlib.rs`:
  * `known_module_symbol`
  * `builtins_symbol`
  * `typing_symbol` (only for tests)
  * `typing_extensions_symbol`
  * `builtins_module_scope`
  * `core_module_scope`
* Adds `symbol_from_bindings_impl` and `symbol_from_declarations_impl` to keep `RequiresExplicitReExport` an implementation detail
* Makes `declaration_type` `pub(crate)`, as it's required in `symbol_from_declarations` (`binding_type` is already `pub(crate)`)

The main motivation is to keep the implementation details private and only expose an ergonomic API that uses sane defaults for the various scenarios, to avoid any mistakes from the caller. Refer to #16133 (comment), #16133 (comment) for details.

dhruvmanila added the ty Multi-file analysis & type inference label Feb 13, 2025

dhruvmanila changed the title ~~WIP: Follow-up from re-export implementation~~ Refactor symbol lookup APIs to hide re-export implementation details Feb 13, 2025

dhruvmanila marked this pull request as ready for review February 13, 2025 13:08

dhruvmanila requested review from AlexWaygood, MichaReiser, carljm and sharkdp as code owners February 13, 2025 13:08

MichaReiser reviewed Feb 13, 2025

carljm approved these changes Feb 13, 2025

dhruvmanila force-pushed the dhruv/re-export-2 branch from 86c5876 to bff9112 Compare February 14, 2025 02:13

dhruvmanila force-pushed the dhruv/re-export-3 branch from 11fed30 to 884da77 Compare February 14, 2025 03:04

This was referenced Feb 14, 2025

red-knot: move symbol lookups in symbol.rs #16152

Merged

[red-knot] Support re-export conventions for stub files #16073

Merged

Base automatically changed from dhruv/re-export-2 to main February 14, 2025 09:47

dhruvmanila added 2 commits February 14, 2025 15:18

Follow-up from re-export implementation

feb96fb

Address review feedback

96b5a35

dhruvmanila force-pushed the dhruv/re-export-3 branch from 884da77 to 96b5a35 Compare February 14, 2025 09:48

dhruvmanila merged commit 63dd68e into main Feb 14, 2025
21 checks passed

dhruvmanila deleted the dhruv/re-export-3 branch February 14, 2025 09:55

AlexWaygood reviewed Feb 14, 2025

AlexWaygood reviewed Feb 16, 2025
