Skip to content

[JEP 467] Add support for MarkdownComments#4875

Closed
johannescoetzee wants to merge 14 commits intojavaparser:masterfrom
johannescoetzee:johannes/markdown-comments
Closed

[JEP 467] Add support for MarkdownComments#4875
johannescoetzee wants to merge 14 commits intojavaparser:masterfrom
johannescoetzee:johannes/markdown-comments

Conversation

@johannescoetzee
Copy link
Copy Markdown
Collaborator

Fixes #4660.

This PR adds support for Markdown comments as described in JEP 467. I ran into a few issues while implementing this, so the end result is a compromise between the ideal design and what is doable in a reasonable amount of time and without rewriting all of the comment handling (and requiring users to do the same).

MarkdownComment node

As it is, this PR adds the MarkdownComment node, which is distinct from JavadocComment. I considered merging these under a common class, but in my opinion the benefits to doing so is far outweighed by the complexity of the change required. The content field of the MarkdownComment contains the raw text, including the /// and leading spaces after the first line, which is somewhat consistent with how block comments are handled. Keeping this information is necessary for the pretty printer and LPP.

It also contains a method getMarkdownContent which strips the leading whitespace, /// and indents the remaining content consistent with the description provided in JEP 467. I suspect that this will be more useful to users than using the raw content.

Parser changes

I initially tried updating the grammar to parse markdown comments as a single token, but ran into some difficulties doing this. From what I can see it looks like it should be possible, but would require manually manipulating the tokenizer input stream and I wasn't confident enough that this would work out to spend too much time on it. Instead, the MarkdownComment consists of a range of tokens starting with the first SINGLE_LINE_COMMENT, including all leading and trailing whitespace for the body and ending with the last SINGLE_LINE_COMMENT. The handling of all this is done in the CommonTokenAction method defined in java.jj which is added to the GeneratedJavaParserTokenManager.

Pretty printer and LPP

Support in the pretty printer is mostly a copy-paste of block comments are handled with only minor tweaks to account for the lack of /** and */. Support for the LPP is more complicated since the assumption was made that a comment will always consist of a single token. I've updated the code handling adding, removing, and replacing comments to account for the possibility of having multiple tokens corresponding to the comment.

Note on the implementation

The implementation and general commit history for this is a bit messy again. The comment handling logic was surprisingly tricky to extend to handle this case, so I tried to implement things in the least disruptive way. As I progressed, and especially when getting to the printers, I ran into issues with my initial implementation (especially the content field) and had to backtrack and change things a bit. Sorry about that.

@johannescoetzee johannescoetzee changed the title Johannes/markdown comments [WIP] Add support for MarkdownComments Oct 16, 2025
@johannescoetzee johannescoetzee force-pushed the johannes/markdown-comments branch 2 times, most recently from 4814c13 to 96f311b Compare October 16, 2025 16:05
@johannescoetzee johannescoetzee changed the title [WIP] Add support for MarkdownComments Add support for MarkdownComments Oct 16, 2025
@johannescoetzee johannescoetzee changed the title Add support for MarkdownComments [JEP 467] Add support for MarkdownComments Oct 16, 2025
@codecov
Copy link
Copy Markdown

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 60.30769% with 129 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.368%. Comparing base (51d5210) to head (9ac0726).
⚠️ Report is 25 commits behind head on master.

Files with missing lines Patch % Lines
.../lexicalpreservation/LexicalPreservingPrinter.java 63.736% 27 Missing and 6 partials ⚠️
.../github/javaparser/printer/PrettyPrintVisitor.java 0.000% 11 Missing ⚠️
...va/com/github/javaparser/ast/comments/Comment.java 0.000% 10 Missing ⚠️
...ithub/javaparser/ast/comments/MarkdownComment.java 83.928% 8 Missing and 1 partial ⚠️
...avaparser/GeneratedJavaParserTokenManagerBase.java 60.869% 4 Missing and 5 partials ⚠️
...parser/ast/comments/TraditionalJavadocComment.java 70.000% 6 Missing ⚠️
...om/github/javaparser/ast/visitor/CloneVisitor.java 25.000% 6 Missing ⚠️
...github/javaparser/ast/comments/JavadocComment.java 0.000% 5 Missing ⚠️
...aparser/ast/visitor/GenericListVisitorAdapter.java 0.000% 5 Missing ⚠️
.../javaparser/ast/visitor/GenericVisitorAdapter.java 0.000% 4 Missing ⚠️
... and 16 more
Additional details and impacted files

Impacted file tree graph

@@               Coverage Diff               @@
##              master     #4875       +/-   ##
===============================================
- Coverage     58.405%   58.368%   -0.037%     
+ Complexity      2537      2534        -3     
===============================================
  Files            677       681        +4     
  Lines          39046     39302      +256     
  Branches        7089      7134       +45     
===============================================
+ Hits           22805     22940      +135     
- Misses         13341     13448      +107     
- Partials        2900      2914       +14     
Flag Coverage Δ
AlsoSlowTests 58.368% <60.307%> (-0.037%) ⬇️
javaparser-core 58.368% <60.307%> (-0.037%) ⬇️
javaparser-symbol-solver 58.368% <60.307%> (-0.037%) ⬇️
jdk-10 57.919% <60.493%> (-0.048%) ⬇️
jdk-11 57.936% <60.493%> (-0.033%) ⬇️
jdk-12 57.933% <60.493%> (-0.036%) ⬇️
jdk-13 57.936% <60.493%> (-0.033%) ⬇️
jdk-14 58.170% <60.493%> (-0.019%) ⬇️
jdk-15 58.173% <60.493%> (-0.035%) ⬇️
jdk-16 58.147% <60.493%> (-0.035%) ⬇️
jdk-17 58.300% <60.493%> (-0.036%) ⬇️
jdk-18 58.300% <60.493%> (-0.036%) ⬇️
jdk-8 57.937% <60.493%> (-0.033%) ⬇️
jdk-9 57.934% <60.493%> (-0.033%) ⬇️
macos-latest 58.360% <60.307%> (-0.037%) ⬇️
ubuntu-latest 58.355% <60.307%> (-0.037%) ⬇️
windows-latest 58.350% <60.307%> (-0.037%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...n/java/com/github/javaparser/StaticJavaParser.java 70.238% <100.000%> (ø)
...ava/com/github/javaparser/ast/CompilationUnit.java 74.524% <ø> (ø)
...thub/javaparser/ast/visitor/NodeFinderVisitor.java 4.788% <ø> (ø)
...hub/javaparser/ast/visitor/VoidVisitorAdapter.java 99.065% <100.000%> (+0.004%) ⬆️
...thub/javaparser/metamodel/JavaParserMetaModel.java 99.707% <100.000%> (+0.001%) ⬆️
.../javaparser/metamodel/JavadocCommentMetaModel.java 100.000% <100.000%> (ø)
...javaparser/metamodel/MarkdownCommentMetaModel.java 100.000% <100.000%> (ø)
.../metamodel/TraditionalJavadocCommentMetaModel.java 100.000% <100.000%> (ø)
.../java/com/github/javaparser/JavaParserAdapter.java 97.297% <0.000%> (ø)
...parser/ast/visitor/GenericVisitorWithDefaults.java 96.226% <0.000%> (-0.917%) ⬇️
... and 24 more

... and 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a111ba0...9ac0726. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jlerbsc
Copy link
Copy Markdown
Collaborator

jlerbsc commented Oct 20, 2025

I tried several markdown examples provided by the specification https://openjdk.org/jeps/467 without causing any parsing errors. Ultimately, why should we make changes in JP since it seems to work with the current version?

@johannescoetzee
Copy link
Copy Markdown
Collaborator Author

@jlerbsc I assumed you wanted this based on your comment in #4869. This PR treats markdown comments as a single comment instead of a series of line comments, which makes it easier for users to use them. With how line comments are handled, only the last line comment would be associated with the commented method. Any preceding comments would be orphans, so correctly handling those currently involves handling ranges manually.

In that sense, I still think the change here is beneficial, but if simply parsing the comments without errors or crashing is sufficient, then this PR isn't needed and can be closed without issue

@jlerbsc
Copy link
Copy Markdown
Collaborator

jlerbsc commented Oct 21, 2025

@johannescoetzee I agree with you that markdown comments are multi-line comments similar to javadoc comments. What I don't understand is that grouping these elements into a common class or using a class hierarchy makes change more complex. Could you elaborate on this point?

@ftomassetti , what is your opinion on integrating Markdown comments into the existing comment architecture? Do you agree with the suggestion made in this PR? Personally, I don't know JP well enough to have a strong opinion on this subject, but it seems to me that this PR introduces additional complexity into comment management.

@johannescoetzee
Copy link
Copy Markdown
Collaborator Author

@jlerbsc I've thought about your question/concern some more and I've changed my mind. I don't think this is really a problem anymore.

I had the idea that grouping them into a common class while I was still using the first representation I tried for this (seen in the earlier commits). In that representation, the content field contained what is now returned by the getMarkdownContent method. In that version, the usual reconstruction of comments comment.getHeader() + comment.getContent() + comment.getFooter() no longer made sense, and in my opinion changing that anywhere javadoc comments were handled (including in user code) was not worth the benefit of uniting those comments.

In the current version proposed by this PR, this is no longer an issue. I would therefore propose doing the following:

  1. Rename JavadocComment to TraditionalJavadocComment, following the naming convention from https://docs.oracle.com/javase/specs/jls/se25/html/jls-3.html#jls-3.7 (where it's called TraditionalComment, but keeping the Javadoc in the name is useful in JavaParser to distinguish it from BlockComment
  2. Create an abstract JavadocComment class that is extended by both TraditionalJavadocComment and MarkdownComment
  3. Update the Javadoc and JavadocParser classes to handle MarkdownComments, which should now be a fairly small change.

Please let me know if you agree with this approach. Once I hear back from you, I'll update the PR with what we've discussed.

@johannescoetzee johannescoetzee force-pushed the johannes/markdown-comments branch from 96f311b to 1b1ce43 Compare October 23, 2025 09:22
@johannescoetzee johannescoetzee force-pushed the johannes/markdown-comments branch from 3adeb35 to 4157ca4 Compare October 23, 2025 13:29
@johannescoetzee
Copy link
Copy Markdown
Collaborator Author

I've pushed changes starting with the Rename JavadocComment to TraditionalJavadocComment commit that implement the shared ancestor for TraditionalJavadocComment and MarkdownComment that also demonstrate the code changes required to handle this case. Overall the changes on the user side will be fairly minor. The majority of the manual changes I had to make were for javaparser internal components. It's possible I missed something, but I've covered the things I found and I would, of course, fix any issues related to this that come up.

@johannescoetzee
Copy link
Copy Markdown
Collaborator Author

johannescoetzee commented Oct 23, 2025

It looks like the test failures are slow tests that aren't run with mvn test by default. I'm updating expectations for those now

Edit: Done

@johannescoetzee johannescoetzee force-pushed the johannes/markdown-comments branch from 50ff8f3 to 5705012 Compare October 23, 2025 14:05
@jlerbsc
Copy link
Copy Markdown
Collaborator

jlerbsc commented Oct 24, 2025

@johannescoetzee Thank you for the valuable work you have done. For now, I will wait for @ftomassetti 's opinion to shed further light on how to proceed.

@johannescoetzee
Copy link
Copy Markdown
Collaborator Author

@jlerbsc Sounds good. Since this isn't required to parse Java 23 code, would you be open to accepting Java 25 contributions before this is finalised? I'm on vacation next week, but I've made good progress with handling module imports, so will likely have that ready soon after I'm back.

@jlerbsc
Copy link
Copy Markdown
Collaborator

jlerbsc commented Oct 30, 2025

Hi @johannescoetzee , I am going to ask you to modify your PR to simplify it and isolate the integration of Markdown comments from the current comment management. Would it be possible for you to separate the comment class hierarchy into a separate PR (this would involve preparing the code to support Markdown comments) and then create a new PR to manage Markdown comments?

@jlerbsc
Copy link
Copy Markdown
Collaborator

jlerbsc commented Oct 30, 2025

Do you think that a solution consisting, initially, of parsing Markdown comments as inline comments and then creating and transforming these comments into Markdown in post-processing would be easier to maintain than a solution consisting of parsing Markdown comments directly?

@johannescoetzee
Copy link
Copy Markdown
Collaborator Author

Do you think that a solution consisting, initially, of parsing Markdown comments as inline comments and then creating and transforming these comments into Markdown in post-processing would be easier to maintain than a solution consisting of parsing Markdown comments directly?

I did consider this approach as well, but thought it would end up being more complex. The main problem is that all but the last line comment before a method end up as orphan comments, so, to correctly reconstruct the markdown comments, it would necessary to connect these line comments by looking at token ranges and the tokens between them. While separating this logic from the parser would bring some benefit, I think it would be harder to reason about than the approach implemented in this PR where the token stream is being processed directly. It would also only work if tokens are saved (as far as I understand), which is a significant downside.

@jlerbsc
Copy link
Copy Markdown
Collaborator

jlerbsc commented Nov 5, 2025

Do you think that a solution consisting, initially, of parsing Markdown comments as inline comments and then creating and transforming these comments into Markdown in post-processing would be easier to maintain than a solution consisting of parsing Markdown comments directly?

I did consider this approach as well, but thought it would end up being more complex. The main problem is that all but the last line comment before a method end up as orphan comments, so, to correctly reconstruct the markdown comments, it would necessary to connect these line comments by looking at token ranges and the tokens between them. While separating this logic from the parser would bring some benefit, I think it would be harder to reason about than the approach implemented in this PR where the token stream is being processed directly. It would also only work if tokens are saved (as far as I understand), which is a significant downside.

Could you add this analysis, along with the other approaches you considered, to the description of the PR, in order to document in the JP project the approach that was chosen and those that were considered? Thank you.

@jlerbsc
Copy link
Copy Markdown
Collaborator

jlerbsc commented Nov 20, 2025

Can you close this PR if it is no longer relevant?

@johannescoetzee
Copy link
Copy Markdown
Collaborator Author

Closed in favour of #4885 and #4899

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider supporting the document comments in the form of the markdown included in Java23 (JEP 467)?

2 participants