Skip to content

Fix sourcegen markup#2009

Merged
ischoegl merged 2 commits intoCantera:mainfrom
ischoegl:fix-sourcegen-backticks
Oct 14, 2025
Merged

Fix sourcegen markup#2009
ischoegl merged 2 commits intoCantera:mainfrom
ischoegl:fix-sourcegen-backticks

Conversation

@ischoegl
Copy link
Copy Markdown
Member

@ischoegl ischoegl commented Oct 13, 2025

Changes proposed in this pull request

Fix issues where XML markup causes docstrings extracted by sourcegen to cut off (<bold>, <emphasis>, <computeroutput>).

Checklist

  • The pull request includes a clear description of this code change
  • Commit messages have short titles and reference relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • Style & formatting of contributed code follows contributing guidelines
  • The pull request is ready for review

@ischoegl ischoegl force-pushed the fix-sourcegen-backticks branch from 108b688 to ed042ff Compare October 13, 2025 00:13
@ischoegl ischoegl added the clib label Oct 13, 2025
@ischoegl ischoegl marked this pull request as ready for review October 13, 2025 00:44
@ischoegl ischoegl requested a review from a team October 13, 2025 00:44
@bryanwweber
Copy link
Copy Markdown
Member

Can you provide an example of this happening? If the tags are properly closed, etree shouldn't cut them off afaict

@ischoegl
Copy link
Copy Markdown
Member Author

Can you provide an example of this happening? If the tags are properly closed, etree shouldn't cut them off afaict

usesHDF5 - the issue is that we need to translate XML back to markup. (not necessarily that etree misbehaves)

@bryanwweber
Copy link
Copy Markdown
Member

I'm sorry for my lack of familiarity with this code, I don't feel like I can review this 😕 I was hoping if I asked I'd be able to better understand but I think it's just too deep in the weeds for me anymore! I'm not sure how to evaluate whether the change works or not, or whether it has any other side effects, or why it's necessary in the first place 🫣

@ischoegl
Copy link
Copy Markdown
Member Author

ischoegl commented Oct 13, 2025

I'm sorry for my lack of familiarity with this code, I don't feel like I can review this 😕 I was hoping if I asked I'd be able to better understand but I think it's just too deep in the weeds for me anymore! I'm not sure how to evaluate whether the change works or not, or whether it has any other side effects, or why it's necessary in the first place 🫣

Sorry, @bryanwweber ... if you look at the doxygen XML output that is generated for usesHDF5:

       <briefdescription>
<para>Returns true if Cantera was compiled with C++ <computeroutput>HDF5</computeroutput> support. </para>
        </briefdescription>

sourcegen needs to convert this back to a conventional markdown string for the doxygen \brief description, i.e.,

Returns true if Cantera was compiled with C++ `HDF5` support.

Before this fix, doxygen (or Jinja2?) cut off like so:

Returns true if Cantera was compiled with C++

as the XML tags hadn't been removed.

PS: I updated the code to better document what is going on.

@ischoegl ischoegl force-pushed the fix-sourcegen-backticks branch from ed042ff to 28f013e Compare October 13, 2025 20:27
@bryanwweber
Copy link
Copy Markdown
Member

Thanks for the detailed explanation! I think I understand a little bit better now. I'm going to ask some questions/speculation still from my ignorance, so I'm not asking for any changes or further investigation in this PR! @speth feel free to approve and merge if this looks good.

sourcegen needs to convert this back to a conventional markdown string for the doxygen \brief description

I see, sourcegen is... generating... the C++ source/header files that have to contain the documentation strings. Since parsing comments from the actual C++ source/header files is not at all worth pursuing, the most convenient option is to use the Doxygen XML output as the source for the documentation to be copied into the generated code.

I wonder if there's a way to pass the XML output to Doxygen when building docs for the generated code to say, this is the source for the docstrings of the generated files? That would avoid needing to copy the strings around.

Before this fix, doxygen (or Jinja2?) cut off like so:

If Jinja2 is responsible for this, it's probably due to somehow escaping the reserved HTML characters. Is Doxygen XML valid in docstrings in the main source code? If not, Doxygen may indeed be at fault...

@ischoegl
Copy link
Copy Markdown
Member Author

ischoegl commented Oct 14, 2025

@bryanwwebersourcegen is the code I adopted for the generated CLib. So doxygen first documents the Cantera code base, which is available as XML. That XML is then used to auto-generate CLib. CLib is now itself fully documented, which is where Jinja2 templates come in. For processed output, see https://cantera.org/dev/cxx/dd/df6/group__CAPIindex.html (that documentation is generated from a second doxygen pass).

The legacy (hand-coded) CLib will be removed after 3.2. See https://cantera.org/dev/clib/index.html

Copy link
Copy Markdown
Member

@bryanwweber bryanwweber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @ischoegl I think I'm not being clear, but that's on me. I'm asking questions here since 1) I don't think we found the root cause of this problem and 2) I really don't like editing XML with regex 😬 In any case, this seems to work for the job that it has, so it looks good to me!

@ischoegl ischoegl merged commit 2b6bf8f into Cantera:main Oct 14, 2025
49 checks passed
@ischoegl ischoegl deleted the fix-sourcegen-backticks branch October 14, 2025 13:35
@ischoegl
Copy link
Copy Markdown
Member Author

@bryanwweber ... I am 99% certain that Jinja2 swallows the output, as the C++ headers generated from their templates cut off after the first XML tag. As we need markdown for doxygen anyways, there's no need to further investigate. I certainly see the point of XML/regex, but the editing only touches what I consider terminal nodes. Any of the XML formatting tags that are replaced only apply to those terminal nodes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants