Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[java] Resolve explicit types using FQCNs, without hitting the classloader #1207

Closed
oowekyala opened this issue Jun 25, 2018 · 4 comments
Closed
Labels
in:type-resolution Affects the type resolution code
Milestone

Comments

@oowekyala
Copy link
Member

Many types are completely explicit in Java given the info that's present in the source file. These includes types in an extends/ implements clause, import statements, method and constructor formal parameter types, wildcard bounds, return types, (non-var) local variable types, etc. Even though the actual .class files may not be on our auxclasspath, we can resolve the qualified name of these in most cases, which could enhance the robustness of our type resolution when the auxclasspath is not set.

Implementation

The key would be to build some part of our type resolution around TypeQualifiedNames.

  • I think we could have a subclass of JavaTypeDefinition that wraps a TypeQualifiedName instead of necessarily a Class. For most operations of this class, we don't need a Class object anyway, since we return other type definitions for example.
  • A clean solution would need a specialised data structure to resolve qualified names from simple names. This would declutter ClassTypeResolver, which mixes that logic with the rest for now, and sync our implementations
    • We could have a procedure that uses the import declarations, package declaration, and type declarations of the file to build this data structure and attach it to the compilation unit node.
    • That data structure could, given the simple name of a type, determine whether
      1. we can unambiguously resolve it to a FQCN, then it returns it
      2. we don't have enough info to resolve it unambiguously (e.g. in the presence of several wildcard import statements and no correct auxclasspath)
    • We could reuse that implementation in UnnecessaryFQCN, which currently takes its own decisions ad-hoc. With such a "TypeScope", UnnecessaryFQCN could be simply formulated as:
      • For each FQCN n in the file with simple name s,
        • if TypeScope(s) equals n, then add a violation
  • Using simple lookups to that data structure, the QualifiedNameResolver could then populate the unambiguous types it finds with a type definition that wraps a qualified name.
@oowekyala oowekyala added the a:RFC A drafted proposal on changes to PMD, up for feedback form the team and community label Jun 25, 2018
@oowekyala
Copy link
Member Author

Some additionnal thoughts:

We wouldn't really need a separate TypeScope object. A more flexible approach would be to enrich the SourceFileScope, which for the moment is next to useless. The point of having a symbol table is to resolve names to the entity they reference. For now, the symbol table only cares about the entities that are defined in the same file, but it should be more general than that, i.e., be able to resolve a reference even though we have no accompanying node. This could later be used as a stepping stone for multifile analysis.

The idea of "reference" here would have to be defined precisely, since it's more general than the idea of FQName . E.g. a name could make a reference to a local variable, even though these variables don't have a canonical name.

Among other things, classifying names w.r.t. the entity they describe might help improving our symbol table framework, as well as simplify some analysis steps like type resolution. This is described in JLS 6.5, and most of it doesn't even require a ClassLoader but uses only syntactical context information, which suggests that we could set some attribute in the ASTName node directly in the parser.

@jsotuyod
Copy link
Member

jsotuyod commented Jan 20, 2019

@oowekyala I'm not sure what your idea actually is... without hitting the classloader at all, the only names you can unambiguously resolve are:

  • java.lang primitives
  • void
  • explicit imports (no wildcards)
  • types defined within the same file being analyzed

For everything else, you need to hit the classloader to make sure if a class exists according to the order in which they take precedence:

  1. same package
  2. implicit imports (everything in java.lang)
  3. wildcard imports
  4. actual fully qualified class references

Notice this last one is counter-intuitive, but finding a FQCN in the code is actually hard to deal with, as it can technically be anything:

  • foo.bar.Baz -> class Baz in package foo.bar? or is it currentPackage.foo.bar.Baz, with foo, bar and Baz being nested types?
  • Baz -> class Baz in the default package? or is it currentPackage.Baz?
  • Foo.Bar -> class Baz in package Foo? nested type Bar in default package's Foo? or is it currentPackage.Foo.Bar?

Currently, TypeSet is the best implementation we have to do this, and is already self-contained. And if we are already hitting the classloader, just keeping the class reference makes sense.

I agree the current type resolution codebase duplicates a lot of this logic (mostly down to the current split between symbol table and type resolution), and that's something worth fixing.

I'd also love to see type resolution being more on-demand (ie: actually computing types as needed when asking for a node type) rather than done for everything just in case, but that's probably gonna be harder.

@oowekyala
Copy link
Member Author

I'm not sure what your idea actually is... without hitting the classloader at all, the only names you can unambiguously resolve are:

Yes you're right, that was very optimistic. We will anyway need to rely on the classloader for type resolution.

For everything else, you need to hit the classloader to make sure if a class exists according to the order in which they take precedence:
same package
implicit imports (everything in java.lang)
wildcard imports
actual fully qualified class references

That is the precedence order at the top of a compilation unit. Inside classes the shadowing order is much more complicated. TypeSet relies only on the classloader, and doesn't account at all for the context the reference is found in, which is crucial. E.g.

  • Member types can be referenced by simple name in the declaring class, in subclasses of the declaring class, in their own nested classes, in the nested classes of the subclasses of the declaring class
  • Type parameters can shadow other type names, even in method declarations
  • Local classes can be referenced by simple name, but they don't have a canonical name so asking the classloader is pointless

The JLS goes in details about when and how shadowing occurs, and how to resolve names. It supports itself with the notion of scope, which TypeSet is oblivious to. On the other end of the spectrum, our current Scope infrastructure only indexes the local AST and is oblivious to things that happen outside of the compilation unit.

The framework I present in #1566 aims to unify those approaches. That PR only presents the reference framework, but what I have implemented around SymbolTable already supersedes TypeSet, and is general enough to replace Scope eventually. It's designed with multifile analysis in mind, and it's already lazy --which may allow some to someday resolve types on-demand. I think it's an attractive alternative to our current situation, and it will allow type resolution to concentrate on things like method selection and type inference rather than resolving explicit things.

@oowekyala oowekyala added this to the 7.0.0 milestone Oct 25, 2020
@oowekyala
Copy link
Member Author

I think this is already implemented in PMD 7. Should we add this to the release notes?

@adangel adangel added in:type-resolution Affects the type resolution code and removed a:RFC A drafted proposal on changes to PMD, up for feedback form the team and community labels Jan 13, 2023
@adangel adangel mentioned this issue Jan 23, 2023
55 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in:type-resolution Affects the type resolution code
Projects
None yet
Development

No branches or pull requests

3 participants