
Pull #16628: use SLL prediction mode for fast javadoc parsing to improve performance #16628

Merged
merged 1 commit on Mar 25, 2025

Conversation

mahfouz72
Member

@mahfouz72 mahfouz72 commented Mar 22, 2025

This is the fastest prediction mode, and it provides correct results for many grammars (from ANTLR's PredictionMode Javadoc):

public enum PredictionMode {
	/**
	 * The SLL(*) prediction mode. This prediction mode ignores the current
	 * parser context when making predictions. This is the fastest prediction
	 * mode, and provides correct results for many grammars. This prediction
	 * mode is more powerful than the prediction mode provided by ANTLR 3, but
	 * may result in syntax errors for grammar and input combinations which are
	 * not SLL.
	 *
	 * <p>
	 * When using this prediction mode, the parser will either return a correct
	 * parse tree (i.e. the same parse tree that would be returned with the
	 * {@link #LL} prediction mode), or it will report a syntax error. If a
	 * syntax error is encountered when using the {@link #SLL} prediction mode,
	 * it may be due to either an actual syntax error in the input or indicate
	 * that the particular combination of grammar and input requires the more
	 * powerful {@link #LL} prediction abilities to complete successfully.</p>
	 *
	 * <p>
	 * This prediction mode does not provide any guarantees for prediction
	 * behavior for syntactically-incorrect inputs.</p>
	 */
	SLL,
        .....

Diff Regression config: https://gist.githubusercontent.com/mahfouz72/33967bf86e4a29cafa429771b0e95808/raw/f621465b3c7dd6e64ac603d396570bf92a1a69d6/check_patch.xml

@mahfouz72
Member Author

Interesting!

===================== BENCHMARK SUMMARY ====================
Execution Time Baseline: 570.62 s
Average Execution Time: 474.83 s
============================================================
Execution Time Difference: 16.7800%
Difference exceeds the maximum allowed difference (16.7800%      > 10%)!

@mahfouz72
Member Author

Since the Javadoc grammar contains many ambiguities and many nested rules, it may not be safe to rely on single lookahead parsing. However, if we can rewrite the grammar to reduce nesting and ambiguity, we could significantly improve parsing time by using SLL parsing

@nrmancuso @rnveach @romani
What do you think?

@rnveach
Contributor

rnveach commented Mar 22, 2025

This is a 16% reduction, right? Maybe consider running the performance benchmark multiple times to confirm it is a stable reduction.

What are the downsides of this mode? I assume they didn't enable it by default for a reason.

Just curious if we have also seen what this does on the Java side?

@mahfouz72
Member Author

This is a 16% reduction, right? Maybe consider running the performance benchmark multiple times to confirm it is a stable reduction.

Yes, a reduction of about 100 seconds, which is 16%. I restarted CI by pushing again to see whether this reduction is stable.

What are the downsides of this mode? I assume they didn't enable it by default for a reason.

From Javadoc of this option:

If a syntax error is encountered when using the {@link #SLL} prediction mode, it may be due to either an actual syntax error in
the input or indicate that the particular combination of grammar and input requires the more powerful {@link #LL} prediction
abilities to complete successfully

This prediction mode does not provide any guarantees for prediction
behavior for syntactically-incorrect inputs.

As per my understanding, SLL mode doesn't handle ambiguities well: if a grammar allows multiple possible parse paths for a given input, SLL mode can fail. It doesn't backtrack; once it hits an ambiguity it cannot resolve, it stops, since it predicts with local lookahead only and does not take the whole parser context into account.

Just curious if we have also seen what this does on the Java side?

I didn't try this; maybe we should try it in another PR. But I think Java can be more complex than Javadoc, which might make it require the full LL approach for correct parsing. @nrmancuso can confirm this.

We could use SLL mode as the default parsing strategy and, if it encounters any issue or fails to parse, switch to LL mode.
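For context, this fallback idea mirrors the two-stage strategy commonly used with ANTLR: parse with `PredictionMode.SLL` and a `BailErrorStrategy`, and if that throws a `ParseCancellationException`, reset and reparse with `PredictionMode.LL`. A minimal self-contained sketch of the control flow (plain functions stand in for the two prediction modes, since a generated parser isn't available here; all names are hypothetical, not checkstyle's actual code):

```java
import java.util.function.Function;

public final class TwoStageParse {

    /**
     * Try the fast strategy first; if it bails out with an exception,
     * retry the same input with the full (slower but complete) strategy.
     * In ANTLR terms: SLL + BailErrorStrategy first, then reset and LL.
     */
    static <T> T parseWithFallback(String input,
                                   Function<String, T> fastSll,
                                   Function<String, T> fullLl) {
        try {
            return fastSll.apply(input);
        }
        catch (RuntimeException bail) {
            // ANTLR would throw ParseCancellationException here
            return fullLl.apply(input);
        }
    }

    public static void main(String[] args) {
        // Stand-in: the "SLL" parser rejects inputs it pretends are ambiguous.
        Function<String, Integer> fastSll = s -> {
            if (s.contains("{@")) {
                throw new IllegalStateException("not SLL-parsable");
            }
            return s.length();
        };
        Function<String, Integer> fullLl = String::length;

        System.out.println(parseWithFallback("plain text", fastSll, fullLl)); // fast path: 10
        System.out.println(parseWithFallback("{@code x}", fastSll, fullLl));  // falls back: 9
    }
}
```

The trade-off is that ambiguous inputs are parsed twice, but if most Javadoc comments are SLL-parsable, the average case should still be much faster.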

@mahfouz72
Member Author

Github, generate report

@rnveach
Contributor

rnveach commented Mar 22, 2025

If a syntax error is encountered ... indicate that the particular combination of grammar and input requires the more powerful {@link #LL} prediction abilities to complete successfully

I believe we need regression without suppressing grammar failures on all our regression projects. We need to know whether this change increases grammar failures. Since I believe CI was passing, it seems it's not really a big concern.

@mahfouz72
Member Author

Github, generate report


@romani
Member

romani commented Mar 22, 2025

If you think this config will be used frequently, you can put it as a folder in https://github.com/checkstyle/test-configs and re-use it at any time.

@mahfouz72
Member Author

Time decreased by 16%, so I reduced the baseline.
The report (without suppressing grammar failures) shows no difference, so parsing is unaffected.
I think it is OK to suppress this surviving mutation, as we just use a setter and I can't think of a test case specific to this line.

Please see if this is good; once the update is approved, I can apply a similar approach to Java parsing.

@romani
Member

romani commented Mar 23, 2025

We are moving to a zero-mutation phase; even the most weird mutations should be covered by pure unit testing if no reasonable coverage can be provided otherwise.

Member

@nrmancuso nrmancuso left a comment


Very good. I tried this when I implemented the Java grammar, and it did not work at the time. Maybe some improvements have been made since then; we should at least try it out again.

I don't see any point in holding up this PR over a pitest suppression.

Member

@romani romani left a comment


Items

@@ -112,6 +113,7 @@ public ParseStatus parseJavadocAsDetailNode(DetailAST javadocCommentAst) {

try {
final JavadocParser javadocParser = createJavadocParser(javadocComment, errorListener);
javadocParser.getInterpreter().setPredictionMode(PredictionMode.SLL);
Member


Please move it inside the create..... method.

Let's try to update pitest to avoid calls to the interpreter, like:

<avoidCallsTo>org.apache.commons.logging</avoidCallsTo>

We need to find a way to avoid pitest suppression.
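In the pitest-maven configuration, the suggestion above could look roughly like the sketch below; the ANTLR runtime package name to exclude is an assumption, not a confirmed value from this PR:

```xml
<!-- Sketch of a pitest-maven configuration fragment; package name is an assumption -->
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <configuration>
    <avoidCallsTo>
      <!-- lines containing calls into these packages are not mutated -->
      <avoidCallsTo>org.antlr.v4.runtime</avoidCallsTo>
    </avoidCallsTo>
  </configuration>
</plugin>
```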

Member


A last resort is an approach like:

<param>lazyLoad</param>

Member Author


Done with avoidCallsTo; the xwiki failure is unrelated.

Member

@romani romani left a comment


ok to merge,
thanks a lot

@romani romani merged commit 1a88f34 into checkstyle:master Mar 25, 2025
50 of 101 checks passed
@romani romani changed the title Pull #16628: Try SLL prediction mode for fast javadoc parsing Pull #16628: use SLL prediction mode for fast javadoc parsing to improve performance Mar 25, 2025
@github-actions github-actions bot added this to the 10.22.0 milestone Mar 25, 2025
Abdelrhmansersawy pushed a commit to Abdelrhmansersawy/checkstyle that referenced this pull request Mar 28, 2025