-
-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #16005: fix parse-error if @see ends with dot #16355
Conversation
@romani now I need to add tests in JavadocParserTreeTest.java to show problem is actually fixed, right ? |
no, we need prove that it works by some checks usage over parser functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please run regression diff report.
items:
...ces/com/puppycrawl/tools/checkstyle/checks/javadoc/nonemptyatclausedescription/Example3.java
Outdated
Show resolved
Hide resolved
@mahfouz72 , please review this PR, great chance to look at ANTLR changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
item
src/main/resources/com/puppycrawl/tools/checkstyle/grammar/javadoc/JavadocParser.g4
Outdated
Show resolved
Hide resolved
@romani @mohitsatr please take a look at this comment |
I thought issue was about allowing a |
Why right behaviour? can you share a proof that javadoc tool is not passing if there is a description right after the html element other than |
okay I misinterpreted the issue.
I could not see the whole picture and I became too fixated on a single blade of grass-- in this case a Dot. I also got the point that we don't want the parser to be too strict because well it is Check's job to check for those type of things. |
9ee2929
to
c644a01
Compare
Github, generate report |
no much reason to generate report if you have regular CI failures, please address them first. |
c644a01
to
083aec7
Compare
Github, generate report |
Please put in our Input all edge cases of that we're mentioned in this PR, and be creative in making content for Input. |
I wonder why report is not getting triggered ? |
sure |
Github, generate report for configs in PR description |
Report generation failed. Please check the logs for more details. |
5bf30f7
to
780a045
Compare
Github, generate report for configs in PR description |
Report generation failed. Please check the logs for more details. |
@romani can you please take a look at why report generation is failing |
Please read https://github.com/checkstyle/contribution/tree/master/checkstyle-tester#executing-generation-using-github-action you missed special keywords in PR description |
Github, generate report |
@mahfouz72 , please finish review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
item
@Test | ||
public void testExample3() throws Exception { | ||
final String[] expected = CommonUtil.EMPTY_STRING_ARRAY; | ||
verifyWithInlineConfigParser( | ||
getPath("InputNonEmptyAtclauseDescriptionThree.java"), expected); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a defect in the parser, not the check we just found it using a check execution, so we should add a test in javadoc ASTTest
@romani do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we covering all by testing of Checks and regular Inputs, no pure unit testing.
We add additional test that class if we think it helps.
@mohitsatr, please add one more test to that test class to visualize AST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to visualize AST
I didn't quite get what you mean by that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohitsatr like I did in #16458
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
780a045
to
cfa185c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok to merge
@mohitsatr , please resolve conflict |
cfa185c
to
8e40e2d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last
...kstyle/checks/javadoc/nonemptyatclausedescription/InputNonEmptyAtclauseDescriptionThree.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good rule of thumb:
...kstyle/checks/javadoc/nonemptyatclausedescription/InputNonEmptyAtclauseDescriptionThree.java
Show resolved
Hide resolved
Github, generate report |
I have only minors, @nrmancuso , please confirm parser update |
(reference | STRING | htmlElement) (WS | NEWLINE)* ((WS | NEWLINE) description)? | ||
( | ||
htmlElement (WS | NEWLINE)* description? | ||
| (reference | STRING) (WS | NEWLINE)* ((WS | NEWLINE) description)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(WS | NEWLINE)* ((WS | NEWLINE) description)?
seems repetative.
Any reason this isn't (WS | NEWLINE)* description?
? The *
should already handle multiple, so you don't need yo say an extra item of a multiple.
It is also a samilar pattern above.
It should be able to be reduced to (htmlElement | reference | STRING) (WS | NEWLINE)* description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now, the (WS | NEWLINE)
says one is required. While the other doesn't need a space between the html element. Still seems like there should be a way to reduce the repetative-ness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(reference | STRING) ((WS | NEWLINE)+ description)?
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Assuming you meant to replace both lines and not the last one) Reference
or STRING
doesn't include htmlElement
, so we can't drop that if the current code change (and original) is correct, and I don't doubt it.
We can't move (WS | NEWLINE)
inside the parenthesis of the ?
, because it will fail on things like STRING WS NEWLINE
(assume complete match) when description
is not present but everything else is.
description
does include NEWLINE
and WS
(by virtue of text
). You may be able to change ((WS | NEWLINE) description)?
to just description?
like the line above, but is best for regression to confirm.
Edit: but this could mean the description starts with a space/newline instead of the first non-whitespace text.
This grammar just seems excessively complex for the goal to be complex. The WS | NEWLINE
(or similar) has gotten this grammar in trouble in the past since everything includes it, like I showed above with description.
Just another reason to put this grammar away and start over to see if it could be made leaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WS | NEWLINE (or similar) has gotten this grammar in trouble in the past since everything includes it
I completely agree, without them the grammar will look a lot better.
Why are we interested in parsing WS
or NEWLINE
in the first place? If a check needs to enforce or use some whitespace rules, why not leave this responsibility to the check itself rather than the parser? This is how we handle Java code, whitespace is not explicitly parsed in the Java grammar, yet we have multiple checks that enforce whitespace rules.
Can't we just completely skip WS
and NEWLINE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a check needs to enforce or use some whitespace rules, why not leave this responsibility to the check itself rather than the parser
My guess is the grammar was written to overcome Java's AST shortcomings in a way.
A check only sees what we provide it. For some reason we deemed java checks can only see the AST and not the source file anymore. This can cause some issues with whitespace (indentation check) and new lines as now we have to make complex logic on determining where newlines and spaces are because the AST does not include them. This made more complex by the fact our AST isn't in source order, so you have to do scans to determine what the source order next token even is before determining if there is a simple space/newline next. We also lose identification of what the whitespace (space, tab, etc...) and newlines (\r, \r\n, \n) actually is, going back to relying on the source code.
The fact that the Javadoc grammar is in source order may be a benefit to not needing to identify all WS/NEWLINE, but also if you remove all of them then things like descriptions may become a little weird to look at.
For example, this PR has TEXT -> . website of checkstyle
and it isn't broken into 4 different TEXTs because of the whitespace. I think was the main reason the grammar keeps them in, so we could easily make chunks of text and we thought that would be easier to handle in checks.
It could be the Javadoc grammar is actually in a somewhat decent state (we still have some issues with excessive WS/NEWLINE IMO), it just looks more complex to those who are not fully familiar with it and is just not very new eyes friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, due to how our Javadoc grammar is written, our lexer really only "scans" (which is typically only the first part of a lexer's job) in many cases. This means that we need to pass on this additional information (like newlines, whitespace) to the parser because it is also doing some of the work that the lexer normally does to recognize tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it just looks more complex to those who are not fully familiar with it
couldn't agree more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Item:
htmlElement (WS | NEWLINE)* description? | ||
| (reference | STRING) (WS | NEWLINE)* ((WS | NEWLINE) description)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, there are a bunch of ways that we can achieve the goal of fixing the bug in the issue. While I think that @mahfouz72 's recommendation is reasonable, it is likely that the best performance can be achieved by reducing the original rule and avoiding extra alternatives:
htmlElement (WS | NEWLINE)* description? | |
| (reference | STRING) (WS | NEWLINE)* ((WS | NEWLINE) description)? | |
(htmlElement | reference | STRING) (WS | NEWLINE)* description? |
@mahfouz72 can you help to analyze the difference in performance between your suggestion and this simplified version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there is no much difference in performance. I tried several runs and my suggestion works slightly better.
Note: For this, I analyze performance on a single, small Javadoc so we may not capture many details.
Although your rule looks cleaner and has fewer alternatives, I’m unsure why it doesn’t improve performance.
The Javadoc used for testing
* Some summary.
*
* @param j some param
* @see Class afdaf
* @see Classafdaf
* @see "ACADF" adfaf
* @see "adfaf"dafadf
* @see <a href="www.checkstyle.org/">website</a>...
* @see <a href="www.checkstyle.org/">website</a> ...
* "some description"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mahfouz72 I assume there was not any change in the tree itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tests inside JavadocParseTreeTest
are passing but we need regression on large projects also to be confident about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although your rule looks cleaner and has fewer alternatives, I’m unsure why it doesn’t improve performance.
Probably specificity :). It is ideal to write grammars to be as exact as possible in most cases. @mahfouz72 Thanks a lot for checking this out.
two review items at #16355 (review) should be resolved before merge. |
8e40e2d
to
b2a400b
Compare
not sure why CI is red. |
CI failure network problem, happens periodically, restarted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot !!
good to merge after CI pass.
fixes #16005
Diff Regression config: https://gist.githubusercontent.com/mohitsatr/f5a524a06294d907c84b9fc5b68661dc/raw/604872961df92db35e66c26218dc74c61bb10097/config.xml