-
Notifications
You must be signed in to change notification settings - Fork 850
Fix markdown parser crash on inline HTML elements #7131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: adamsitnik <[email protected]>
Co-authored-by: adamsitnik <[email protected]>
adamsitnik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot PTAL at my comments. Overall it LGTM, just needs some polishing.
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Readers/MarkdownReaderTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Readers/MarkdownReaderTests.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: adamsitnik <[email protected]>
adamsitnik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug where the markdown parser threw NotSupportedException when encountering HTML inline elements in markdown documents. The fix adds support for the Markdig library's HtmlInline type in the GetText method, enabling the parser to handle common inline HTML tags like <sup>, <strong>, <em>, <sub>, etc.
Key Changes:
- Added handling for
HtmlInlineinline elements in the MarkdownParser'sGetTextmethod - Added two comprehensive test cases to verify single and multiple inline HTML elements work correctly
- Tests verify both
TextandGetMarkdown()properties produce expected output
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs |
Added HtmlInline case in the GetText method to append the HTML tag content instead of throwing NotSupportedException |
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Readers/MarkdownReaderTests.cs |
Added SupportsInlineHtml and SupportsMultipleInlineHtmlElements test methods to verify the fix works correctly with various HTML inline elements |
After a thorough review of this pull request, I found no issues to report. The implementation is clean, follows existing code patterns, includes appropriate test coverage, and correctly resolves the reported bug. The fix is minimal and focused, affecting only the necessary code path to support HTML inline elements in markdown documents.
Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: adamsitnik <[email protected]>
Fix markdown parser to support HtmlInline elements
Summary:
Fixed a bug where the markdown parser threw
NotSupportedExceptionwhen encountering HTML inline elements (like<sup>,<strong>,<em>, etc.) in markdown documents. Added support forHtmlInlinetype in theGetTextmethod by appending the HTML tag content. Added two comprehensive test cases to verify the fix works correctly with single and multiple inline HTML elements. Tests now use explicit assertions withAssert.Equalfor bothTextandGetMarkdown()properties.Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.
Microsoft Reviewers: Open in CodeFlow