Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Simplify the rulechain #2490

Merged
merged 85 commits into from
Jun 26, 2020
Merged

[core] Simplify the rulechain #2490

merged 85 commits into from
Jun 26, 2020

Conversation

oowekyala
Copy link
Member

@oowekyala oowekyala commented May 18, 2020

Describe the PR

Simplify the way the rulechain is implemented, clarify the contract of Rule::apply, remove some needless API about the rulechain. In effect, this makes applying rules on a tree index the only way to apply rules. This subsumes the two previous ways to apply rules (rulechain, which uses an index, and Rule.apply, which doesn't).

Current problems

  • Currently rules are artificially split between those that use the rulechain and those that don't. This split affects several classes: RuleSet, RuleSets, Rulechain, etc.
  • Rulechain rules, for some reason, use one custom "RulechainVisitor" per language. All of those are nearly perfect duplicates, and are completely unnecessary. We can traverse a tree without using the language-specific visitor interface.
  • RulechainVisitors also circumvent the Rule::apply method for XPath rules (they call XPathRule::evaluate). This is inconsistent and unnecessary. It weakens the contract of Rule::apply.
  • Rule::apply, by taking a list of nodes as an argument, potentially hides errors: if an error occurs, the remaining nodes are not processed.
    • In most usages we use a singleton list anyway, so passing a list is useless
  • Rule::apply is also implemented the same way everywhere: basically loop over all nodes and visit them. But implementations are inconsistent w.r.t. error handling:
    • some test the node explicitly for some node type and ignore the rest (even if the node is from the same language)

protected void visitAll(List<? extends Node> nodes, RuleContext ctx) {
for (Object element : nodes) {
/*
It is important to note that we are assuming that all nodes here are of type Compilation Unit,
but our caller method may be called with any type of node, and that's why we need to check the kind
of instance of each element
*/
if (element instanceof ASTCompilationUnit) {
ASTCompilationUnit node = (ASTCompilationUnit) element;
visit(node, ctx);
}
}
}

  • some cast the node without checking if this is appropriate, which may fail at runtime

protected void visitAll(List<? extends Node> nodes, RuleContext ctx) {
for (Object element : nodes) {
JspNode node = (JspNode) element;
visit(node, ctx);
}
}

  • Another problem with those implementations is that they define an auxilliary visitAll method. This multiplies the extension points (and hence the failure points) for no benefit at all.

This PR irons out this part of the API, by removing the divide between rulechain rules and other ones, and simplifying those extension points.

Changes

The basic API change to Rule is the following:

  • Every rule exposes a RuleTargetSelector, a strategy that selects the nodes that a rule should visit. There are several strategies:
    • Select nodes by their XPath names: used to implement XPath rulechain rules
    • Select nodes by their class: used to implement Java rulechain rules
    • Just select the root node: used to implement other rules (matches the current behavior of making a recursive top-down visit starting from the root)
  • Foreach tree, foreach rule, the Rule::apply method is systematically fed with the nodes of the tree that match the selector. This means Rule::apply is now the only entry point to apply a rule (not circumvented by XPath rules).

The selectors select nodes from an index of the tree, similar to the one previously built by RulechainVisitors. This is now used by all rules though. The same implementation is used by all languages.

Other API changes include:

  • RulechainVisitor and Rulechain are removed. Rule application is managed by a RuleApplicator object
    • The constructor of BaseLanguageModule is altered, since there's no need to provide a rulechain visitor class. Language::getRulechainVisitorClass is removed
  • RuleSet::apply is deprecated. RuleSets with an 's' should be used for that
  • Rule::apply now takes a single node instead of a list
  • All the visitAll methods are removed. Since by contract apply is only called with nodes selected by the target selector, it's safe to rely on this and throw an exception if the node is not of the correct type. So implementations of apply always cast without checking now.
  • Rule::addRuleChainVisit is deprecated. Users should override AbstractRule::buildTargetSelector instead. It's still there temporarily, for compatibility

In the non-API changes, there is now a way to index and visit abstract types efficiently. This simplifies some rulechain rules (though eg for the java tree, many abstractions only exist in the java-grammar branch). This represents most of the code additions of this PR mind you.

Related issues

Ready?

Deprecations due on master

  • Rulesets. This is an implementation detail
  • Benchmark API (internal)

Future work

As part of the broader API changes around eg RuleContext and SourceCodeProcessor, I think removing RuleSets::apply would be appropriate. Possibly removing RuleSets altogether would be beneficial.

@adangel adangel added this to the 7.0.0 milestone May 22, 2020
Change behavior of LoosePackageCoupling

It stores state between visits, should not
use the rulechain
@ghost
Copy link

ghost commented Jun 15, 2020

1 Message
📖 No java rules are changed!

Generated by 🚫 Danger

@oowekyala oowekyala marked this pull request as ready for review June 15, 2020 15:26
@adangel adangel self-requested a review June 20, 2020 13:16
@adangel adangel linked an issue Jun 26, 2020 that may be closed by this pull request
adangel added a commit that referenced this pull request Jun 26, 2020
Copy link
Member

@adangel adangel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, looks like a lot of work! Please don't take my comments personal, I'm just trying to understand what you did... (you have definitively lost me with the LatticeRelation ...).

I'll try to merge this now...

Rulesets. This is an implementation detail

I've done the deprecations, see 91868f3
We probably need to look at RuleSetFactory as well, there are a couple of usages of RuleSets... But that's a separate task.

Port remaining usages of addRuleChainVisit

That's a follow up PR.

Benchmark API (internal)

What do you mean with that? Benchmark is currently possible to be called from CLI (-benchmark), so we need to define, what exactly the API is...
But I guess, this can be done later as well, as I didn't see, that you modified the benchmark classes here.

* @param nodeClass
* the AST node to add to the RuleChain visit list
*/
void addRuleChainVisit(Class<? extends Node> nodeClass);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this can't be deprecated on master, since there is no way to migrate before. I've added a point to #1139 for the migration documentation.

When we have finished the changes on PMD 7, we can go back and look, whether we introduce e.g. RuleTargetSelector already in PMD 6.x and providing a compatibility API - that would allow us to deprecate this method in PMD 6.x. But - if we do that - only as a last step and maybe only as a small minor release for PMD 6.x after PMD 7 is out (for helping migration).

import net.sourceforge.pmd.lang.ast.Node;
import net.sourceforge.pmd.lang.rule.internal.RuleApplicator;

/**
* Grouping of Rules per Language in a RuleSet.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's btw something we need to think about: The comment probably was the intention, but was never implemented this way - all the rules are put together into the same list regardless of their language....
So, how do we deal with a ruleset that contains rules of multiple languages, etc.... (in terms of internal processing and expected unsurprising behavior of PMD).

import net.sourceforge.pmd.lang.ast.Node;
import net.sourceforge.pmd.lang.rule.internal.RuleApplicator;

/**
* Grouping of Rules per Language in a RuleSet.
*
* @author pieter_van_raemdonck - Application Engineers NV/SA - www.ae.be
*/
public class RuleSets {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll @Deprecate and @InternalApi it on master....
Note: there are quite some usages in the PMD Eclipse plugin. This is due to the unclear API on how to call PMD programmatically, e.g. how to provide programmatically the rulesets to be used for analyzing....

*
* @see Rule#getTargetSelector()
*/
public abstract class RuleTargetSelector extends TargetSelectorInternal {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit weird... RuleTargetSelector is API, but the supertype is internal.... effectively, this makes the internal supertype API as well

Ok, the class is abstract and the constructor is hidden - so no one can implement their own RuleTargetSelector...

Right now, RuleTargetSelector serves as a utility class, that provides defined (hidden) implementations of itself...
Not sure yet, what to do here, but it still looks weird to me....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is super weird. I wanted to make RuleTargetSelector published because it's somewhere on the Rule interface, but still keep everything about TreeIndex internal... So no one can implement their RuleTargetSelector because that would require us to expose TreeIndex.

I'm truly not sure this is the best way to do it, if you have a better idea go ahead

import net.sourceforge.pmd.lang.rule.internal.GraphUtils.DotColor;

/**
* Indexes data of type {@code <V>} with keys of type {@code <K>}, where
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question: Do we really need to be that generic? We are dealing with AST Nodes, don't we?
That's btw one example for #2611 - it will be very difficult to understand this class without looking at many more classes in parallel to understand, what's going on....
Maybe the question in a different way: Do we really implement it that abstract? We only need it for PMD purposes, not for general purpose....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, as far as I understand, that is your solution for dealing with Abstract node types.... right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, as far as I understand, that is your solution for dealing with Abstract node types.... right?

Yes that's it.

Honestly making it generic is easy and makes tests much more comprehensive and easier to write. This data structure is a kind of multimap where values are added at once to several keys, doing minimal work. So we can add Node values to several Class keys at once, just by knowing node.getClass(), without having to traverse all the supertypes for every instance. We also don't accumulate values for keys that won't be queried (those Class instances for which no rulechain rule has added a visit request).

The API is really simple I think, you just use put and get as if it was a regular map.

* Map each element of the given iterable with the given function,
* and accumulates it into the collector.
*/
public static <T, U, A, C> C map(Collector<? super U, A, ? extends C> collector,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example of overly used generics... What is T, U, A ,C?... Maybe I'm just not that into streams...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just "used generics" rather than overused... It's not my fault that the Collector API uses 3 type parameters...

U, A, C in that order correspond to the T, A, R type parameters of Collector:

T - the type of input elements to the reduction operation
A - the mutable accumulation type of the reduction operation (often hidden as an implementation detail)
R - the result type of the reduction operation

On this method we have a source of T elements, a mapper from T to U, and a collector that accumulates a bunch of Us to produce a C, which is generally a collection of U.

The A is an implementation detail, when you make methods that return collectors, this is generally some wildcard to not expose it to clients. But when you use the collector to perform the collection operation, then you need to declare A to capture it. You don't need to know that to use methods that use collectors, but the implementation of the method has to capture it.

It takes some time to get used to the collector API but once you get it there's really nothing voodoo happening.

Consider that this method is super reusable, you can do map(toSet(), listOfT, t -> t.getU()) and end up with a Set<U>. You can use any of the standard collectors in Collectors. It's much better than writing one specialized map function for each kind of result type (mapToSet, mapToList, etc).

@@ -18,32 +19,25 @@
*/
public abstract class AbstractJavaRulechainRule extends AbstractJavaRule {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can get rid of this class eventually?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes. But the fact that rules also implement the visitor interface is a problem for that, as the default visit method traverses the tree, which we want to avoid.

adangel added a commit that referenced this pull request Jun 26, 2020
Merge branch 'pr-2490' into pmd/7.0.x

Fixes #1785
@adangel adangel merged commit 00dbe08 into pmd:pmd/7.0.x Jun 26, 2020
@oowekyala oowekyala deleted the type-heap branch June 26, 2020 14:10
@oowekyala
Copy link
Member Author

Thanks for merging!

What do you mean with that? Benchmark is currently possible to be called from CLI (-benchmark), so we need to define, what exactly the API is...
But I guess, this can be done later as well, as I didn't see, that you modified the benchmark classes here.

Yeah I don't remember why I mentioned the benchmark api here. What I meant, is that the implementation classes are probably internal API (and the -benchmark option is the published API to use it). We need to review that somewhere before we decide on this

oowekyala added a commit to pmd/pmd-designer that referenced this pull request Jun 27, 2020
@adangel adangel mentioned this pull request Jan 23, 2023
55 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[core] Allow abstract node types to be valid rulechain visits
2 participants