Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #16005: fix parse-error if @see ends with dot #16355

Merged
merged 1 commit into from
Mar 13, 2025

Conversation

@mohitsatr
Copy link
Contributor Author

@romani now I need to add tests in JavadocParserTreeTest.java to show problem is actually fixed, right ?

@romani
Copy link
Member

romani commented Feb 18, 2025

no, we need prove that it works by some checks usage over parser functionality.

Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please run regression diff report.

items:

@romani
Copy link
Member

romani commented Feb 18, 2025

@mahfouz72 , please review this PR, great chance to look at ANTLR changes.

Copy link
Member

@mahfouz72 mahfouz72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item

@mahfouz72
Copy link
Member

@romani @mohitsatr please take a look at this comment

@mohitsatr
Copy link
Contributor Author

mohitsatr commented Feb 19, 2025

I thought issue was about allowing a . at the end of Html element.
We don't really wanted the description text to appear immediately after closing tag. so anything other than '.' at the end should result in parsing error as it is right behaviour.

@mahfouz72
Copy link
Member

so anything other than '.' at the end should result in parsing error as it is right behaviour.

Why right behaviour? can you share a proof that javadoc tool is not passing if there is a description right after the html element other than .

@mohitsatr
Copy link
Contributor Author

mohitsatr commented Feb 20, 2025

okay I misinterpreted the issue.
all this time I was assuming that we only wanted a dot at the end of closing Html tag.

@see <a href="xxx" >link </a>. some description          // legal
@see <a href="xxx" >link </a>  some description          // also legal

@see <a href="xxx" >link </a>e  some description        // illegal because not dot

I could not see the whole picture and I became too fixated on a single blade of grass-- in this case a Dot.
@mahfouz72 you'r approach is much better. We can have the description started immediately after closing tag and keep the whitespace optional.

I also got the point that we don't want the parser to be too strict because well it is Check's job to check for those type of things.

@mohitsatr mohitsatr force-pushed the literal-see-parse-error branch 2 times, most recently from 9ee2929 to c644a01 Compare February 22, 2025 08:25
@mohitsatr
Copy link
Contributor Author

Github, generate report

@romani
Copy link
Member

romani commented Feb 22, 2025

no much reason to generate report if you have regular CI failures, please address them first.

@mohitsatr mohitsatr force-pushed the literal-see-parse-error branch from c644a01 to 083aec7 Compare February 23, 2025 02:02
@mohitsatr
Copy link
Contributor Author

Github, generate report

@romani
Copy link
Member

romani commented Feb 23, 2025

Please put in our Input all edge cases of that we're mentioned in this PR, and be creative in making content for Input.

@mohitsatr
Copy link
Contributor Author

I wonder why report is not getting triggered ?

@mohitsatr
Copy link
Contributor Author

Please put in our Input all edge cases of that we're mentioned in this PR, and be creative in making content for Input.

sure

@mohitsatr
Copy link
Contributor Author

Github, generate report for configs in PR description

Copy link
Contributor

Report generation failed. Please check the logs for more details.

Link: https://github.com/checkstyle/checkstyle/actions/runs/13490580406

@mohitsatr mohitsatr force-pushed the literal-see-parse-error branch 2 times, most recently from 5bf30f7 to 780a045 Compare February 24, 2025 05:05
@mohitsatr
Copy link
Contributor Author

Github, generate report for configs in PR description

Copy link
Contributor

Report generation failed. Please check the logs for more details.

Link: https://github.com/checkstyle/checkstyle/actions/runs/13491196203

@mohitsatr
Copy link
Contributor Author

@romani can you please take a look at why report generation is failing

@romani
Copy link
Member

romani commented Feb 24, 2025

@mohitsatr
Copy link
Contributor Author

Github, generate report

@romani
Copy link
Member

romani commented Mar 2, 2025

@mahfouz72 , please finish review

Copy link
Member

@mahfouz72 mahfouz72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item

Comment on lines +133 to +138
@Test
public void testExample3() throws Exception {
final String[] expected = CommonUtil.EMPTY_STRING_ARRAY;
verifyWithInlineConfigParser(
getPath("InputNonEmptyAtclauseDescriptionThree.java"), expected);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a defect in the parser, not the check we just found it using a check execution, so we should add a test in javadoc ASTTest

@romani do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we covering all by testing of Checks and regular Inputs, no pure unit testing.
We add additional test that class if we think it helps.

@mohitsatr, please add one more test to that test class to visualize AST.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to visualize AST

I didn't quite get what you mean by that ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohitsatr like I did in #16458

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mohitsatr mohitsatr force-pushed the literal-see-parse-error branch from 780a045 to cfa185c Compare March 4, 2025 02:39
Copy link
Member

@mahfouz72 mahfouz72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok to merge

@romani
Copy link
Member

romani commented Mar 8, 2025

@mohitsatr , please resolve conflict

@mohitsatr mohitsatr force-pushed the literal-see-parse-error branch from cfa185c to 8e40e2d Compare March 8, 2025 07:27
Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last

Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good rule of thumb:

@romani
Copy link
Member

romani commented Mar 8, 2025

Github, generate report

@romani romani assigned nrmancuso and unassigned mahfouz72 Mar 8, 2025
@romani romani requested a review from nrmancuso March 8, 2025 17:57
@romani
Copy link
Member

romani commented Mar 8, 2025

I have only minors, @nrmancuso , please confirm parser update

(reference | STRING | htmlElement) (WS | NEWLINE)* ((WS | NEWLINE) description)?
(
htmlElement (WS | NEWLINE)* description?
| (reference | STRING) (WS | NEWLINE)* ((WS | NEWLINE) description)?
Copy link
Contributor

@rnveach rnveach Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(WS | NEWLINE)* ((WS | NEWLINE) description)? seems repetative.
Any reason this isn't (WS | NEWLINE)* description?? The * should already handle multiple, so you don't need yo say an extra item of a multiple.

It is also a samilar pattern above.
It should be able to be reduced to (htmlElement | reference | STRING) (WS | NEWLINE)* description?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now, the (WS | NEWLINE) says one is required. While the other doesn't need a space between the html element. Still seems like there should be a way to reduce the repetative-ness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(reference | STRING) ((WS | NEWLINE)+ description)? ?

Copy link
Contributor

@rnveach rnveach Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Assuming you meant to replace both lines and not the last one) Reference or STRING doesn't include htmlElement, so we can't drop that if the current code change (and original) is correct, and I don't doubt it.

We can't move (WS | NEWLINE) inside the parenthesis of the ?, because it will fail on things like STRING WS NEWLINE (assume complete match) when description is not present but everything else is.

description does include NEWLINE and WS (by virtue of text). You may be able to change ((WS | NEWLINE) description)? to just description? like the line above, but is best for regression to confirm.
Edit: but this could mean the description starts with a space/newline instead of the first non-whitespace text.

This grammar just seems excessively complex for the goal to be complex. The WS | NEWLINE (or similar) has gotten this grammar in trouble in the past since everything includes it, like I showed above with description.

Just another reason to put this grammar away and start over to see if it could be made leaner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WS | NEWLINE (or similar) has gotten this grammar in trouble in the past since everything includes it

I completely agree, without them the grammar will look a lot better.

Why are we interested in parsing WS or NEWLINE in the first place? If a check needs to enforce or use some whitespace rules, why not leave this responsibility to the check itself rather than the parser? This is how we handle Java code, whitespace is not explicitly parsed in the Java grammar, yet we have multiple checks that enforce whitespace rules.

Can't we just completely skip WS and NEWLINE?

Copy link
Contributor

@rnveach rnveach Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a check needs to enforce or use some whitespace rules, why not leave this responsibility to the check itself rather than the parser

My guess is the grammar was written to overcome Java's AST shortcomings in a way.

A check only sees what we provide it. For some reason we deemed java checks can only see the AST and not the source file anymore. This can cause some issues with whitespace (indentation check) and new lines as now we have to make complex logic on determining where newlines and spaces are because the AST does not include them. This made more complex by the fact our AST isn't in source order, so you have to do scans to determine what the source order next token even is before determining if there is a simple space/newline next. We also lose identification of what the whitespace (space, tab, etc...) and newlines (\r, \r\n, \n) actually is, going back to relying on the source code.

The fact that the Javadoc grammar is in source order may be a benefit to not needing to identify all WS/NEWLINE, but also if you remove all of them then things like descriptions may become a little weird to look at.

For example, this PR has TEXT -> . website of checkstyle and it isn't broken into 4 different TEXTs because of the whitespace. I think was the main reason the grammar keeps them in, so we could easily make chunks of text and we thought that would be easier to handle in checks.

It could be the Javadoc grammar is actually in a somewhat decent state (we still have some issues with excessive WS/NEWLINE IMO), it just looks more complex to those who are not fully familiar with it and is just not very new eyes friendly.

Copy link
Member

@nrmancuso nrmancuso Mar 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, due to how our Javadoc grammar is written, our lexer really only "scans" (which is typically only the first part of a lexer's job) in many cases. This means that we need to pass on this additional information (like newlines, whitespace) to the parser because it is also doing some of the work that the lexer normally does to recognize tokens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it just looks more complex to those who are not fully familiar with it

couldn't agree more.

Copy link
Member

@nrmancuso nrmancuso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Item:

Comment on lines +1155 to +1156
htmlElement (WS | NEWLINE)* description?
| (reference | STRING) (WS | NEWLINE)* ((WS | NEWLINE) description)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, there are a bunch of ways that we can achieve the goal of fixing the bug in the issue. While I think that @mahfouz72 's recommendation is reasonable, it is likely that the best performance can be achieved by reducing the original rule and avoiding extra alternatives:

Suggested change
htmlElement (WS | NEWLINE)* description?
| (reference | STRING) (WS | NEWLINE)* ((WS | NEWLINE) description)?
(htmlElement | reference | STRING) (WS | NEWLINE)* description?

@mahfouz72 can you help to analyze the difference in performance between your suggestion and this simplified version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is no much difference in performance. I tried several runs and my suggestion works slightly better.

Note: For this, I analyze performance on a single, small Javadoc so we may not capture many details.

Although your rule looks cleaner and has fewer alternatives, I’m unsure why it doesn’t improve performance.

2

1

The Javadoc used for testing

* Some summary.
*
* @param j some param
* @see Class  afdaf
* @see Classafdaf
* @see "ACADF" adfaf
* @see "adfaf"dafadf 
* @see <a href="www.checkstyle.org/">website</a>...
* @see <a href="www.checkstyle.org/">website</a> ...
* "some description"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mahfouz72 I assume there was not any change in the tree itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests inside JavadocParseTreeTest are passing but we need regression on large projects also to be confident about this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although your rule looks cleaner and has fewer alternatives, I’m unsure why it doesn’t improve performance.

Probably specificity :). It is ideal to write grammars to be as exact as possible in most cases. @mahfouz72 Thanks a lot for checking this out.

@nrmancuso nrmancuso assigned romani and unassigned nrmancuso Mar 10, 2025
@romani
Copy link
Member

romani commented Mar 11, 2025

two review items at #16355 (review) should be resolved before merge.

@mohitsatr mohitsatr force-pushed the literal-see-parse-error branch from 8e40e2d to b2a400b Compare March 12, 2025 03:23
@mohitsatr
Copy link
Contributor Author

not sure why CI is red.

@romani
Copy link
Member

romani commented Mar 13, 2025

CI failure Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.12.1:site (default-site) on project sample: Execution default-site of goal org.apache.maven.plugins:maven-site-plugin:3.12.1:site failed: Plugin org.apache.maven.plugins:maven-site-plugin:3.12.1 or one of its dependencies could not be resolved: The following artifacts could not be resolved: org.apache.maven.reporting:maven-reporting-exec:jar:1.6.0 (absent): Could not transfer artifact org.apache.maven.reporting:maven-reporting-exec:jar:1.6.0 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/146.75.32.215] failed: connect timed out

network problem, happens periodically, restarted.

Copy link
Member

@romani romani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot !!

good to merge after CI pass.

@romani romani merged commit 296ef58 into checkstyle:master Mar 13, 2025
112 checks passed
@mohitsatr mohitsatr deleted the literal-see-parse-error branch March 14, 2025 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parse errors if @see spans multiple lines
5 participants