
Copilot AI (Contributor) commented Dec 11, 2025

Fix markdown parser to support HtmlInline elements

  • Understand the current code structure and identify the issue
  • Add handling for HtmlInline in the GetText method
  • Add a test case to verify HtmlInline support with the provided example
  • Add comprehensive test for multiple inline HTML elements
  • Run tests to ensure the fix works correctly
  • Validate no regressions in existing functionality
  • Complete code review with no issues found
  • All tests passing (10/10 MarkdownReaderTests)
  • Address PR feedback: simplify tests and verify both Text and GetMarkdown

Summary:
Fixed a bug where the markdown parser threw NotSupportedException when it encountered inline HTML elements (such as <sup>, <strong>, or <em>) in markdown documents. Added support for the HtmlInline type in the GetText method by appending the HTML tag content. Added two test cases to verify the fix with single and multiple inline HTML elements. The tests use explicit Assert.Equal assertions on both the Text property and the output of GetMarkdown().
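The dispatch described in the summary can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the PR's C# code: LiteralInline, HtmlInline, and UnhandledInline are hypothetical stand-ins (the first two only borrow Markdig's type names), and get_text is a hypothetical analogue of GetText. Unhandled node types raise NotImplementedError, playing the role of NotSupportedException.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical stand-ins for inline node types (NOT Markdig's API).
@dataclass
class LiteralInline:
    content: str  # plain text run


@dataclass
class HtmlInline:
    tag: str  # raw HTML tag text, e.g. "<sup>" or "</sup>"


@dataclass
class UnhandledInline:
    pass  # hypothetical node type this sketch deliberately leaves unhandled


def get_text(inlines: Iterable[object]) -> str:
    """Hypothetical analogue of GetText: concatenate the text of inline nodes."""
    parts: List[str] = []
    for inline in inlines:
        if isinstance(inline, LiteralInline):
            parts.append(inline.content)
        elif isinstance(inline, HtmlInline):
            # The fix: keep the raw tag instead of rejecting the node.
            parts.append(inline.tag)
        else:
            # Any other node type still fails loudly, like NotSupportedException.
            raise NotImplementedError(
                f"Inline type {type(inline).__name__} is not supported"
            )
    return "".join(parts)


if __name__ == "__main__":
    # A node sequence a parser might produce for part of the issue's sample.
    nodes = [
        LiteralInline("allocate a new managed object "),
        HtmlInline("<sup>"),
        LiteralInline("[1]"),
        HtmlInline("</sup>"),
        LiteralInline(", and if that fails"),
    ]
    print(get_text(nodes))
    # allocate a new managed object <sup>[1]</sup>, and if that fails
```

Before the fix, the HtmlInline branch is effectively missing, so the same input falls through to the error branch; adding one case is enough to make the sample round-trip with its tags intact.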

Original prompt

This section details the original issue you should resolve

<issue_title>[MEDI] Markdown parser fails when document contains InlineHtml</issue_title>
<issue_description>A bug reported offline by @KrystofS :

Sample markdown that breaks the reader:

When getting the managed exception object for this exception, the runtime will first try to allocate a new managed object <sup>[1]</sup>, and if that fails, will return a pre-allocated, shared, global out of memory exception object.

IIRC the Markdig library represents it with HtmlInline, and the parser currently throws because such a type is not expected.</issue_description>

<agent_instructions>I am referring to the code that is present here: https://github.com/dotnet/extensions/tree/main/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig. Don't forget to add a test with a repro.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



@adamsitnik (Member) left a comment


@copilot PTAL at my comments. Overall it LGTM, just needs some polishing.

Copilot AI changed the title "[WIP] Fix markdown parser issue with inline HTML" to "Fix markdown parser crash on inline HTML elements" Dec 11, 2025
Copilot AI requested a review from adamsitnik December 11, 2025 10:21
@adamsitnik marked this pull request as ready for review December 11, 2025 11:15
Copilot AI review requested due to automatic review settings December 11, 2025 11:15
@adamsitnik (Member) left a comment


LGTM, thank you!

Copilot AI (Contributor) left a comment


Pull request overview

This PR fixes a bug where the markdown parser threw NotSupportedException when encountering HTML inline elements in markdown documents. The fix adds support for the Markdig library's HtmlInline type in the GetText method, enabling the parser to handle common inline HTML tags like <sup>, <strong>, <em>, <sub>, etc.

Key Changes:

  • Added handling for HtmlInline inline elements in the MarkdownParser's GetText method
  • Added two comprehensive test cases to verify single and multiple inline HTML elements work correctly
  • Tests verify that both the Text property and GetMarkdown() produce the expected output

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs: Added an HtmlInline case in the GetText method that appends the HTML tag content instead of throwing NotSupportedException
  • test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Readers/MarkdownReaderTests.cs: Added the SupportsInlineHtml and SupportsMultipleInlineHtmlElements test methods to verify the fix with various inline HTML elements

After a thorough review of this pull request, I found no issues to report. The implementation is clean, follows existing code patterns, includes appropriate test coverage, and correctly resolves the reported bug. The fix is minimal and focused, affecting only the necessary code path to support HTML inline elements in markdown documents.

@adamsitnik enabled auto-merge (squash) December 11, 2025 11:28
@adamsitnik merged commit e491777 into main Dec 11, 2025
11 of 12 checks passed
@adamsitnik deleted the copilot/fix-markdown-parser-inline-html branch December 11, 2025 13:10
ericstj pushed a commit to ericstj/extensions that referenced this pull request Dec 11, 2025
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: adamsitnik <[email protected]>


Linked issue: [MEDI] Markdown parser fails when document contains InlineHtml

3 participants