Copilot AI commented Dec 11, 2025

Markdig parses tables without trailing | characters differently from tables with them, so table.ColumnDefinitions.Count varies between the two forms. The code assumed every table needed a -1 adjustment to the column count, which failed for tables without trailing pipes.

Example of failing markdown:

| Flag                                       |      Value | Description
|:-------------------------------------------|-----------:|:-----------
| READYTORUN_FLAG_PLATFORM_NEUTRAL_SOURCE    | 0x00000001 | Set if the original IL
| READYTORUN_FLAG_COMPOSITE                  | 0x00000002 | The image represents

Changes

  • MarkdownParser.cs: Replace hardcoded table.ColumnDefinitions.Count - 1 with dynamic GetColumnCount() that examines actual row cells and their column spans
  • MarkdownReaderTests.cs: Add test case with table from issue demonstrating both multi-column parsing and empty cell handling

The fix calculates the column count by iterating over the rows, summing the ColumnSpan values of each row's cells, and returning the largest sum found. This handles both table formats correctly.
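
The calculation described above can be sketched in a language-neutral way. The Python model below is only an illustration under stated assumptions: `Cell`, `Row`, and `get_column_count` are hypothetical stand-ins for Markdig's table cell and row types and for the PR's GetColumnCount() helper. It is not the C# code from MarkdownParser.cs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    """Hypothetical stand-in for a parsed table cell."""
    text: str = ""
    column_span: int = 1  # number of columns this cell covers


@dataclass
class Row:
    """Hypothetical stand-in for a parsed table row."""
    cells: List[Cell] = field(default_factory=list)


def get_column_count(rows: List[Row]) -> int:
    """Return the width of the widest row, measured as the sum of cell spans."""
    widest = 0
    for row in rows:
        width = sum(cell.column_span for cell in row.cells)
        widest = max(widest, width)
    return widest


rows = [
    Row([Cell("Flag"), Cell("Value"), Cell("Description")]),
    # A row whose last cell is empty still contributes three columns.
    Row([Cell("READYTORUN_FLAG_PARTIAL"), Cell("0x00000004"), Cell("")]),
    # A cell spanning two columns counts twice toward the row width.
    Row([Cell("spanning cell", column_span=2), Cell("x")]),
]
print(get_column_count(rows))  # 3
```

Because the width is derived from the rows themselves, the result does not depend on whether the parser reports an extra column definition for one table form and not the other.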

Warning

I tried to connect to the following addresses, but was blocked by firewall rules:

  • securitytools.pkgs.visualstudio.com
    • Triggering command: /opt/hostedtoolcache/CodeQL/2.23.6/x64/codeql/csharp/tools/linux64/Semmle.Autobuild.CSharp /opt/hostedtoolcache/CodeQL/2.23.6/x64/codeql/csharp/tools/linux64/Semmle.Autobuild.CSharp (dns block)

Original prompt

This section details the original issue you should resolve

<issue_title>[MEDI] Parsing markdown fails for tables without trailing |</issue_title>
<issue_description>A bug reported offline by @KrystofS :

Sample markdown that breaks the reader, taken from here:

| Flag                                       |      Value | Description
|:-------------------------------------------|-----------:|:-----------
| READYTORUN_FLAG_PLATFORM_NEUTRAL_SOURCE    | 0x00000001 | Set if the original IL image was platform neutral. The platform neutrality is part of assembly name. This flag can be used to reconstruct the full original assembly name.
| READYTORUN_FLAG_COMPOSITE                  | 0x00000002 | The image represents a composite R2R file resulting from a combined compilation of a larger number of input MSIL assemblies.
| READYTORUN_FLAG_PARTIAL                    | 0x00000004 |
| READYTORUN_FLAG_NONSHARED_PINVOKE_STUBS    | 0x00000008 | PInvoke stubs compiled into image are non-shareable (no secret parameter)
| READYTORUN_FLAG_EMBEDDED_MSIL              | 0x00000010 | Input MSIL is embedded in the R2R image.
| READYTORUN_FLAG_COMPONENT                  | 0x00000020 | This is a component assembly of a composite R2R image
| READYTORUN_FLAG_MULTIMODULE_VERSION_BUBBLE | 0x00000040 | This R2R module has multiple modules within its version bubble (For versions before version 6.3, all modules are assumed to possibly have this characteristic)
| READYTORUN_FLAG_UNRELATED_R2R_CODE         | 0x00000080 | This R2R module has code in it that would not be naturally encoded into this module
| READYTORUN_FLAG_PLATFORM_NATIVE_IMAGE      | 0x00000100 | The owning composite executable is in the platform native format
fail: Microsoft.Extensions.DataIngestion.IngestionPipeline[6]
      An error occurred while ingesting document 'C:\Users\Krystof\Documents\GitHub\chat-demo\MyLocalAIApp\wwwroot\Data\readytorun-format.md'.
      System.IndexOutOfRangeException: Index was outside the bounds of the array.
         at Microsoft.Extensions.DataIngestion.MarkdownParser.GetCells(Table table, String outputContent)
         at Microsoft.Extensions.DataIngestion.MarkdownParser.MapBlock(String documentMarkdown, Boolean previousWasBreak, Block block)
         at Microsoft.Extensions.DataIngestion.MarkdownParser.Map(MarkdownDocument markdownDocument, String documentMarkdown, String identifier)
         at Microsoft.Extensions.DataIngestion.MarkdownParser.Parse(String markdown, String identifier)
         at Microsoft.Extensions.DataIngestion.MarkdownReader.ReadAsync(Stream source, String identifier, String mediaType, CancellationToken cancellationToken)
         at Microsoft.Extensions.DataIngestion.IngestionDocumentReader.ReadAsync(FileInfo source, String identifier, String mediaType, CancellationToken cancellationToken)
         at Microsoft.Extensions.DataIngestion.IngestionPipeline`1.ProcessAsync(IEnumerable`1 files, Activity rootActivity, CancellationToken cancellationToken)+MoveNext()

Ideally we would recognize this pattern and map it to a valid table structure.
</issue_description>

<agent_instructions>I am referring to the code that is present here: https://github.com/dotnet/extensions/tree/main/src/Libraries/Microsoft.Extensions.DataIngestion.Markdig Don't forget to add a test with repro.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Fix parsing of markdown tables without trailing pipe" to "Fix IndexOutOfRangeException when parsing markdown tables without trailing pipes" Dec 11, 2025
Copilot AI requested a review from adamsitnik December 11, 2025 10:26
@adamsitnik adamsitnik marked this pull request as ready for review December 11, 2025 11:13
Copilot AI review requested due to automatic review settings December 11, 2025 11:13
@adamsitnik adamsitnik left a comment

It looks great! thanks

@adamsitnik adamsitnik enabled auto-merge (squash) December 11, 2025 11:14
Copilot AI left a comment

Pull request overview

This PR fixes an IndexOutOfRangeException that occurred when parsing markdown tables without trailing pipe (|) characters. The Markdig library parses these tables differently, causing table.ColumnDefinitions.Count to return inconsistent values: for tables with trailing pipes it returns actualColumns + 1, but for tables without trailing pipes it returns the actual column count. The previous code assumed all tables needed a -1 adjustment, which failed for tables without trailing pipes.
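
The off-by-one described in this overview can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the code from MarkdownParser.cs: `fill_row` is a hypothetical helper, the reported counts (4 with trailing pipes, 3 without) follow the description above for a three-column table, and Python's IndexError stands in for .NET's IndexOutOfRangeException.

```python
from typing import List


def fill_row(cells: List[str], reported_count: int, adjustment: int) -> List[str]:
    """Copy parsed cells into a fixed-size row buffer.

    `reported_count` models the column count the parser reports for the
    table; `adjustment` models the correction subtracted from it.
    """
    buffer = [""] * (reported_count - adjustment)
    for index, text in enumerate(cells):
        buffer[index] = text  # raises IndexError if the buffer is too small
    return buffer


cells = ["READYTORUN_FLAG_COMPOSITE", "0x00000002", "The image represents"]

# With trailing pipes: the reported count is actual + 1, so subtracting 1 fits.
print(len(fill_row(cells, reported_count=4, adjustment=1)))  # 3

# Without trailing pipes: the reported count is already 3, so subtracting 1
# leaves a two-slot buffer for three cells.
try:
    fill_row(cells, reported_count=3, adjustment=1)
except IndexError as error:
    print("failed:", error)
```

The same fixed adjustment cannot be right for both inputs, which is why the PR derives the width from the rows instead.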

Key Changes

  • Replaced hardcoded table.ColumnDefinitions.Count - 1 with a dynamic GetColumnCount() helper function that calculates columns by examining actual row cells and summing their ColumnSpan values
  • Added comprehensive test case SupportsTablesWithoutTrailingPipes() demonstrating the fix works for tables with empty cells and multi-row scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File: Description

  • src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/MarkdownParser.cs: Introduces a GetColumnCount() local function that calculates the column count dynamically by iterating through table rows and summing cell column spans, replacing the unreliable hardcoded ColumnDefinitions.Count - 1
  • test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Readers/MarkdownReaderTests.cs: Adds a test case with a real-world table from the runtime docs, featuring no trailing pipes, empty cells, and 10 rows, to validate the fix

@adamsitnik adamsitnik merged commit e801349 into main Dec 11, 2025
11 of 12 checks passed
@adamsitnik adamsitnik deleted the copilot/fix-markdown-table-parsing branch December 11, 2025 13:11
ericstj pushed a commit to ericstj/extensions that referenced this pull request Dec 11, 2025
…iling pipes (dotnet#7133)

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: adamsitnik <[email protected]>

Development

Successfully merging this pull request may close these issues.

[MEDI] Parsing markdown fails for tables without trailing |
