Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] NodeStream API #1622

Merged
merged 73 commits into from
Dec 14, 2019
Merged

[core] NodeStream API #1622

merged 73 commits into from
Dec 14, 2019

Conversation

oowekyala
Copy link
Member

Context

With the removal of Jaxen for 7.0.0, the method Node.findChildrenWithXPath will be broken (it throws a checked JaxenException). It's usually only used when the path at which the node looked for is too deep to be roamed in a type-safe, null-safe manner without making the code size explode, e.g. in ForLoopCanBeForeach:

return "./StatementExpressionList[count(*)=1]"
+ "/StatementExpression"
+ "/*[self::PostfixExpression and @Image='++' or self::PreIncrementExpression]"
+ "/PrimaryExpression"
+ "/PrimaryPrefix"
+ "/Name"
+ (itName == null ? "" : "[@Image='" + itName + "']");
}

Doing the same check in Java without XPath would probably involve around 30 lines of clumsy null-check and type checks/casts. Changing the grammar to flatten the AST will simplify this partially, but it doesn't remove the problem.

An alternative that Java 8 enables is node stream processing. Basically we can design an API where we can stream the descendants of a node, allowing to filter, flatmap to children of a certain type, flatmap to descendants, etc, in a completely type safe and null safe manner.

Saxon itself has an API to write XPath-like queries in Java, leveraging the standard Stream API. From the Saxon HE doc:

From Saxon 9.9, additional methods exploiting the Java 8 Streams API become available to navigate directly from XdmNode instances. In many cases these will be faster and more convenient than navigation using XPath. For example, instead of executing the XPath expression .//chapter[@title='Introduction'] to find a specific element within a document, it is possible to use the Java expression node.select(descendant("chapter").where(eq(attribute("title"),"Introduction)")).asNode().

That API is nice, but it's not typesafe, uses strings everywhere, and using it directly would force us to wrap the nodes in a Saxon adapter.

This PR

This PR implements a feature-complete, self-contained API to stream nodes. NodeStreams

  • are lazy
  • can be iterated multiple times
  • are type-safe and null-safe
  • can be easily converted to a regular Stream
  • can be easily obtained from a node or a collection of nodes
  • are as expressive as XPath expressions, faster, probably more memory efficient (albeit more verbose)

See the javadoc on the NodeStream interface for more details.

The PR also refactors ForLoopCanBeForeachRule to use that new API. You can see how it blends well with Optional and regular Streams. What's more interesting is that I observed that the rule runs at least 30% faster!

I would argue that with that new API in 7.0.0, we could deprecate the methods getFirstDescendantOfType, getFirstChildOfType, findFirstParentOfType, etc. until 8.0.0. Their functionality is superseded by the new API, and they're not lazy.

What's missing yet:

  • a benchmark between the two possible implementations of descendantStream(). I would think the current implementation is the fastest, but it could also be implemented with a lazy tree visit.
  • support for stopping descendantStream() on find boundaries. This can be added later, because it needs the Stream#takeWhile function, which was introduced in JDK 9, so we'll have to implement it ourselves.

@oowekyala oowekyala added an:enhancement An improvement on existing features / rules in:ast About the AST structure or API, the parsing step labels Jan 27, 2019
@oowekyala oowekyala added this to the 7.0.0 milestone Jan 27, 2019
@ghost
Copy link

ghost commented Jan 27, 2019

1 Message
📖 No java rules are changed!

Generated by 🚫 Danger

@adangel adangel self-assigned this Dec 14, 2019
@adangel adangel changed the title [core] Node stream api [core] Node Stream API Dec 14, 2019
@adangel adangel changed the title [core] Node Stream API [core] NodeStream API Dec 14, 2019
@adangel adangel merged commit 8e89b80 into pmd:pmd/7.0.x Dec 14, 2019
@oowekyala oowekyala deleted the node-stream-api branch December 14, 2019 18:00
@adangel adangel mentioned this pull request May 23, 2020
14 tasks
@adangel adangel mentioned this pull request Jan 23, 2023
55 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
an:enhancement An improvement on existing features / rules in:ast About the AST structure or API, the parsing step
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants