Skip to content

feat: Basic table support#980

Merged
daveallie merged 3 commits intocrosspoint-reader:masterfrom
allgoewer:feature/basic-tables
Feb 19, 2026
Merged

feat: Basic table support#980
daveallie merged 3 commits intocrosspoint-reader:masterfrom
allgoewer:feature/basic-tables

Conversation

@allgoewer
Copy link
Contributor

@allgoewer allgoewer commented Feb 18, 2026

I've been reading "Children of Time" over the last days and that book, annyoingly, has some tabular content.
This content is relevant for the story so I needed some really basic way to at least be able to read those tables.

This commit simply renders the contents of table cells as separate paragraphs with a small header describing its position in the table. For me, it's better than nothing.

Summary

  • What is the goal of this PR?

Implements really basic table support

  • What changes are included?

    • Minimal changes to ChapterHtmlSlimParser
    • A demo book in test/epubs

Additional Context

Here's some screenshots of the demo-book I provide with this PR.

PXL_20260218_211446510

PXL_20260218_211456379


AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? PARTIALLY
Little bit of guidance on what to touch, parts of the impl, rest manually.

This commit simply renders the content of table
cells as a separate paragraph with a small header
describing its position in the table.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 18, 2026

📝 Walkthrough

Walkthrough

Adds table-flattening to the HTML parser: introduces isTableStructuralTag() and table-tracking state, treats table/tr/td/th as flush boundaries, emits main table cell content as per-cell text blocks with prefixed headers, and skips nested tables.

Changes

Cohort / File(s) Summary
Table Parsing Implementation
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
Added isTableStructuralTag(const char*). Extended startElement, characterData, and endElement to track tableDepth, flatten top-level tables into per-cell blocks (prepend header like "Tab Row N, Cell M:" with inline styling), skip nested-table content, and treat table structural tags as flush boundaries. Updated flush/inline-tag logic to ignore table structural tags.
Parser State Variables
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
Added private members for table tracking: tableDepth (int), tableRowIndex (int), tableColIndex (int).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: Basic table support' clearly and concisely summarizes the main change: adding basic table rendering support to the parser.
Description check ✅ Passed The description is directly related to the changeset, explaining the motivation (reading tables in an ebook), implementation approach (rendering cells as paragraphs with headers), and providing context about included demo book.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)

55-57: Consider marking as static for internal linkage.

This helper is only used within this translation unit. Adding static would clarify intent and potentially allow better compiler optimization.

Suggested change
-bool isTableStructuralTag(const char* name) {
+static bool isTableStructuralTag(const char* name) {
   return strcmp(name, "table") == 0 || strcmp(name, "tr") == 0 || strcmp(name, "td") == 0 || strcmp(name, "th") == 0;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 55 - 57, The
helper function isTableStructuralTag has internal-only usage and should be given
internal linkage: mark it static (or anonymous-namespace if preferred) so the
compiler knows it's translation-unit local; update the function
declaration/definition for isTableStructuralTag to use static and rebuild to
ensure no external linkage remains.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d02e21a and b47f4c7.

📒 Files selected for processing (3)
  • lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
  • lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
  • test/epubs/test_tables.epub
🧰 Additional context used
🧬 Code graph analysis (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)
lib/Epub/Epub/blocks/BlockStyle.h (1)
  • Justify (10-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: cppcheck
  • GitHub Check: build
🔇 Additional comments (4)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h (1)

65-69: LGTM!

The new table-tracking member variables are well-placed alongside other style-tracking state, and their initializations are consistent with the reset values in endElement.

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (3)

165-170: LGTM!

Row handling correctly increments the row index and resets the cell index for each new row.


172-212: LGTM!

Cell handling is well-structured:

  • Defensive handling for cells without explicit <tr> parent
  • Proper style stack management for the header text
  • Clean separation between header and cell content

600-640: LGTM!

The endElement changes correctly integrate table structural tags into flush decisions and properly reset state when exiting cells and tables. The depth checks ensure state is only reset at the correct nesting level.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 151-163: The parser currently overwrites outer-table state when
encountering a nested table because the table handling block sets inTable,
tableDepth, tableRowIndex, tableCellIndex and tableCellDepth unconditionally;
change this to guard against nested tables by checking the existing inTable flag
(or otherwise detecting depth) and skipping nested table initialization when
already inTable, or alternatively implement a small stack to save/restore table
context (push on table open, pop on table close) so closing an inner table
doesn't wipe the outer table's
tableDepth/tableRowIndex/tableCellIndex/tableCellDepth; update the block that
sets these fields (the code around inTable, tableDepth, tableRowIndex,
tableCellIndex, tableCellDepth and flushPartWordBuffer) accordingly.

---

Nitpick comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 55-57: The helper function isTableStructuralTag has internal-only
usage and should be given internal linkage: mark it static (or
anonymous-namespace if preferred) so the compiler knows it's translation-unit
local; update the function declaration/definition for isTableStructuralTag to
use static and rebuild to ensure no external linkage remains.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b47f4c7 and 07b6c27.

📒 Files selected for processing (1)
  • lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)
lib/Epub/Epub/blocks/BlockStyle.h (1)
  • Justify (10-29)
🔇 Additional comments (2)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (2)

55-57: Centralized table tag detection looks good.
Keeps table-structural checks consistent and readable.


600-610: Good: table structural tags act as flush boundaries.
This should prevent dropped content around table transitions.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 151-210: The parser's table-handling mutates shared state
(inTable, tableDepth, tableRowIndex, tableCellIndex, tableCellDepth)
unconditionally, so nested <table> elements overwrite outer-table context;
change the logic to guard or stack table state: when encountering a "table" in
the start-element handler (the block that currently sets
inTable/tableDepth/row/cell) push the current table state onto a small stack (or
only initialize these fields if not already inTable) and when encountering the
corresponding "table" end-element pop/restore the previous table state instead
of blindly clearing it; do the same guarded/stacked approach for the
"tr"/"td"/"th" handling and their end handlers (same pattern referenced around
the other block at lines ~633-640) so inner tables do not clobber outer-table
indices and depths.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (2)

184-187: Consider extracting alignment resolution to a helper.

This alignment resolution logic (defaulting CssTextAlign::None to Justify) is duplicated from parseAndBuildPages (lines 689-692). Extracting it to a small helper method would reduce duplication.

Example helper
// In class declaration or as private inline helper
CssTextAlign resolveAlignment() const {
  return (paragraphAlignment == static_cast<uint8_t>(CssTextAlign::None))
             ? CssTextAlign::Justify
             : static_cast<CssTextAlign>(paragraphAlignment);
}

Then use resolveAlignment() in both locations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 184 - 187,
Extract the duplicated alignment-resolution logic into a small helper (e.g.,
CssTextAlign resolveAlignment() const) on the ChapterHtmlSlimParser class and
replace both occurrences: the current inline ternary using
self->paragraphAlignment (which sets tableCellBlockStyle.alignment) and the
identical logic in parseAndBuildPages (lines ~689-692). The helper should return
CssTextAlign::Justify when paragraphAlignment equals
static_cast<uint8_t>(CssTextAlign::None) and otherwise return
static_cast<CssTextAlign>(paragraphAlignment); then call resolveAlignment() in
place of the duplicated expressions.

641-654: Minor: Conditions can be consolidated.

Lines 641-647 both set nextWordContinues = false for different table structural tags. Consider combining for clarity:

Consolidated condition
-  if (self->tableDepth == 1 && (strcmp(name, "td") == 0 || strcmp(name, "th") == 0)) {
-    self->nextWordContinues = false;
-  }
-
-  if (self->tableDepth == 1 && (strcmp(name, "tr") == 0)) {
-    self->nextWordContinues = false;
-  }
-
-  if (self->tableDepth == 1 && strcmp(name, "table") == 0) {
+  if (self->tableDepth == 1 && isTableStructuralTag(name)) {
+    self->nextWordContinues = false;
+  }
+
+  if (self->tableDepth == 1 && strcmp(name, "table") == 0) {
     self->tableDepth -= 1;
     self->tableRowIndex = 0;
     self->tableColIndex = 0;
-    self->nextWordContinues = false;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 641 - 654,
Consolidate repeated conditional branches that set self->nextWordContinues =
false when at tableDepth == 1 by merging checks for "td", "th", and "tr" into a
single if that checks tableDepth == 1 and whether name is "td" || "th" || "tr",
and keep the existing separate block for "table" which additionally decrements
self->tableDepth and resets self->tableRowIndex and self->tableColIndex; update
the code around the occurrences of self->tableDepth, self->tableRowIndex,
self->tableColIndex and nextWordContinues in ChapterHtmlSlimParser.cpp to remove
duplication and preserve current behavior.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07b6c27 and 6ad9c85.

📒 Files selected for processing (2)
  • lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
  • lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
🧰 Additional context used
🧬 Code graph analysis (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)
lib/Epub/Epub/blocks/BlockStyle.h (1)
  • Justify (10-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: cppcheck
  • GitHub Check: build
🔇 Additional comments (6)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h (1)

65-67: LGTM!

The table tracking state variables are well-placed alongside other style-tracking fields and follow existing naming conventions. Using int with 0-initialization is appropriate for depth/index tracking.

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (5)

55-57: LGTM!

The helper correctly identifies core table structural tags. For consistency with HEADER_TAGS/BLOCK_TAGS patterns, you could use a static array, but the inline approach is fine for this small set.


151-167: LGTM!

The nested table skip logic correctly addresses the previously identified issue - incrementing tableDepth without modifying depth ensures proper tracking when nested </table> is encountered later.


497-500: LGTM!

Correctly skips all character data inside nested tables. The early return placement ensures this check happens before other processing.


607-613: LGTM!

The nested table close handling correctly balances the early return in startElement - neither increments/decrements depth for nested tables, maintaining consistent tracking. Clearing the buffer ensures no nested content leaks through.


618-623: LGTM!

Correctly treating table structural tags as flush boundaries rather than inline elements ensures proper word separation at cell boundaries.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 184-187: Extract the duplicated alignment-resolution logic into a
small helper (e.g., CssTextAlign resolveAlignment() const) on the
ChapterHtmlSlimParser class and replace both occurrences: the current inline
ternary using self->paragraphAlignment (which sets
tableCellBlockStyle.alignment) and the identical logic in parseAndBuildPages
(lines ~689-692). The helper should return CssTextAlign::Justify when
paragraphAlignment equals static_cast<uint8_t>(CssTextAlign::None) and otherwise
return static_cast<CssTextAlign>(paragraphAlignment); then call
resolveAlignment() in place of the duplicated expressions.
- Around line 641-654: Consolidate repeated conditional branches that set
self->nextWordContinues = false when at tableDepth == 1 by merging checks for
"td", "th", and "tr" into a single if that checks tableDepth == 1 and whether
name is "td" || "th" || "tr", and keep the existing separate block for "table"
which additionally decrements self->tableDepth and resets self->tableRowIndex
and self->tableColIndex; update the code around the occurrences of
self->tableDepth, self->tableRowIndex, self->tableColIndex and nextWordContinues
in ChapterHtmlSlimParser.cpp to remove duplication and preserve current
behavior.

@daveallie daveallie merged commit 103fac2 into crosspoint-reader:master Feb 19, 2026
6 checks passed
saslv pushed a commit to saslv/crosspoint-reader that referenced this pull request Feb 19, 2026
I've been reading "Children of Time" over the last days and that book,
annyoingly, has some tabular content.
This content is relevant for the story so I needed some really basic way
to at least be able to read those tables.


This commit simply renders the contents of table cells as separate
paragraphs with a small header describing its position in the table. For
me, it's better than nothing.

## Summary

* **What is the goal of this PR?**

Implements really basic table support

* **What changes are included?**

  * Minimal changes to ChapterHtmlSlimParser
  * A demo book in test/epubs

## Additional Context

Here's some screenshots of the demo-book I provide with this PR.


![PXL_20260218_211446510](https://github.com/user-attachments/assets/49ef81b8-2fa0-4f0d-bb6f-4ef885be6772)


![PXL_20260218_211456379](https://github.com/user-attachments/assets/e7c82b35-b4a9-4a7d-9ec5-2b4bc2ff3514)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? _**PARTIALLY**_ 
_Little bit of guidance on what to touch, parts of the impl, rest
manually._
el pushed a commit to el/crosspoint-reader that referenced this pull request Feb 19, 2026
I've been reading "Children of Time" over the last days and that book,
annyoingly, has some tabular content.
This content is relevant for the story so I needed some really basic way
to at least be able to read those tables.


This commit simply renders the contents of table cells as separate
paragraphs with a small header describing its position in the table. For
me, it's better than nothing.

## Summary

* **What is the goal of this PR?**

Implements really basic table support

* **What changes are included?**

  * Minimal changes to ChapterHtmlSlimParser
  * A demo book in test/epubs

## Additional Context

Here's some screenshots of the demo-book I provide with this PR.


![PXL_20260218_211446510](https://github.com/user-attachments/assets/49ef81b8-2fa0-4f0d-bb6f-4ef885be6772)


![PXL_20260218_211456379](https://github.com/user-attachments/assets/e7c82b35-b4a9-4a7d-9ec5-2b4bc2ff3514)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? _**PARTIALLY**_ 
_Little bit of guidance on what to touch, parts of the impl, rest
manually._
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 20, 2026
I've been reading "Children of Time" over the last days and that book,
annyoingly, has some tabular content.
This content is relevant for the story so I needed some really basic way
to at least be able to read those tables.


This commit simply renders the contents of table cells as separate
paragraphs with a small header describing its position in the table. For
me, it's better than nothing.

## Summary

* **What is the goal of this PR?**

Implements really basic table support

* **What changes are included?**

  * Minimal changes to ChapterHtmlSlimParser
  * A demo book in test/epubs

## Additional Context

Here's some screenshots of the demo-book I provide with this PR.


![PXL_20260218_211446510](https://github.com/user-attachments/assets/49ef81b8-2fa0-4f0d-bb6f-4ef885be6772)


![PXL_20260218_211456379](https://github.com/user-attachments/assets/e7c82b35-b4a9-4a7d-9ec5-2b4bc2ff3514)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? _**PARTIALLY**_ 
_Little bit of guidance on what to touch, parts of the impl, rest
manually._
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants