feat: Basic table support by allgoewer · Pull Request #980 · crosspoint-reader/crosspoint-reader

allgoewer · 2026-02-18T21:22:05Z

I've been reading "Children of Time" over the last days and that book, annyoingly, has some tabular content.
This content is relevant for the story so I needed some really basic way to at least be able to read those tables.

This commit simply renders the contents of table cells as separate paragraphs with a small header describing its position in the table. For me, it's better than nothing.

Summary

What is the goal of this PR?

Implements really basic table support

What changes are included?
- Minimal changes to ChapterHtmlSlimParser
- A demo book in test/epubs

Additional Context

Here's some screenshots of the demo-book I provide with this PR.

AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? PARTIALLY
Little bit of guidance on what to touch, parts of the impl, rest manually.

This commit simply renders the content of table cells as a separate paragraph with a small header describing its position in the table.

coderabbitai · 2026-02-18T21:22:19Z

📝 Walkthrough

Walkthrough

Adds table-flattening to the HTML parser: introduces isTableStructuralTag() and table-tracking state, treats table/tr/td/th as flush boundaries, emits main table cell content as per-cell text blocks with prefixed headers, and skips nested tables.

Changes

Cohort / File(s)	Summary
Table Parsing Implementation `lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`	Added `isTableStructuralTag(const char*)`. Extended `startElement`, `characterData`, and `endElement` to track `tableDepth`, flatten top-level tables into per-cell blocks (prepend header like "Tab Row N, Cell M:" with inline styling), skip nested-table content, and treat table structural tags as flush boundaries. Updated flush/inline-tag logic to ignore table structural tags.
Parser State Variables `lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h`	Added private members for table tracking: `tableDepth` (int), `tableRowIndex` (int), `tableColIndex` (int).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: Basic table support' clearly and concisely summarizes the main change: adding basic table rendering support to the parser.
Description check	✅ Passed	The description is directly related to the changeset, explaining the motivation (reading tables in an ebook), implementation approach (rendering cells as paragraphs with headers), and providing context about included demo book.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)

55-57: Consider marking as static for internal linkage.

This helper is only used within this translation unit. Adding static would clarify intent and potentially allow better compiler optimization.

Suggested change

-bool isTableStructuralTag(const char* name) {
+static bool isTableStructuralTag(const char* name) {
   return strcmp(name, "table") == 0 || strcmp(name, "tr") == 0 || strcmp(name, "td") == 0 || strcmp(name, "th") == 0;
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 55 - 57, The
helper function isTableStructuralTag has internal-only usage and should be given
internal linkage: mark it static (or anonymous-namespace if preferred) so the
compiler knows it's translation-unit local; update the function
declaration/definition for isTableStructuralTag to use static and rebuild to
ensure no external linkage remains.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d02e21a and b47f4c7.

📒 Files selected for processing (3)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
test/epubs/test_tables.epub

🧰 Additional context used

🧬 Code graph analysis (1)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)

lib/Epub/Epub/blocks/BlockStyle.h (1)

Justify (10-29)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: cppcheck
GitHub Check: build

🔇 Additional comments (4)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h (1)

65-69: LGTM!

The new table-tracking member variables are well-placed alongside other style-tracking state, and their initializations are consistent with the reset values in endElement.

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (3)

165-170: LGTM!

Row handling correctly increments the row index and resets the cell index for each new row.

172-212: LGTM!

Cell handling is well-structured:

Defensive handling for cells without explicit <tr> parent

Proper style stack management for the header text

Clean separation between header and cell content

600-640: LGTM!

The endElement changes correctly integrate table structural tags into flush decisions and properly reset state when exiting cells and tables. The depth checks ensure state is only reset at the correct nesting level.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 151-163: The parser currently overwrites outer-table state when
encountering a nested table because the table handling block sets inTable,
tableDepth, tableRowIndex, tableCellIndex and tableCellDepth unconditionally;
change this to guard against nested tables by checking the existing inTable flag
(or otherwise detecting depth) and skipping nested table initialization when
already inTable, or alternatively implement a small stack to save/restore table
context (push on table open, pop on table close) so closing an inner table
doesn't wipe the outer table's
tableDepth/tableRowIndex/tableCellIndex/tableCellDepth; update the block that
sets these fields (the code around inTable, tableDepth, tableRowIndex,
tableCellIndex, tableCellDepth and flushPartWordBuffer) accordingly.

---

Nitpick comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 55-57: The helper function isTableStructuralTag has internal-only
usage and should be given internal linkage: mark it static (or
anonymous-namespace if preferred) so the compiler knows it's translation-unit
local; update the function declaration/definition for isTableStructuralTag to
use static and rebuild to ensure no external linkage remains.

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

coderabbitai

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b47f4c7 and 07b6c27.

📒 Files selected for processing (1)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

🧰 Additional context used

🧬 Code graph analysis (1)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)

lib/Epub/Epub/blocks/BlockStyle.h (1)

Justify (10-29)

🔇 Additional comments (2)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (2)

55-57: Centralized table tag detection looks good.
Keeps table-structural checks consistent and readable.

600-610: Good: table structural tags act as flush boundaries.
This should prevent dropped content around table transitions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 151-210: The parser's table-handling mutates shared state
(inTable, tableDepth, tableRowIndex, tableCellIndex, tableCellDepth)
unconditionally, so nested <table> elements overwrite outer-table context;
change the logic to guard or stack table state: when encountering a "table" in
the start-element handler (the block that currently sets
inTable/tableDepth/row/cell) push the current table state onto a small stack (or
only initialize these fields if not already inTable) and when encountering the
corresponding "table" end-element pop/restore the previous table state instead
of blindly clearing it; do the same guarded/stacked approach for the
"tr"/"td"/"th" handling and their end handlers (same pattern referenced around
the other block at lines ~633-640) so inner tables do not clobber outer-table
indices and depths.

coderabbitai

🧹 Nitpick comments (2)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (2)

184-187: Consider extracting alignment resolution to a helper.

This alignment resolution logic (defaulting CssTextAlign::None to Justify) is duplicated from parseAndBuildPages (lines 689-692). Extracting it to a small helper method would reduce duplication.

Example helper

// In class declaration or as private inline helper
CssTextAlign resolveAlignment() const {
  return (paragraphAlignment == static_cast<uint8_t>(CssTextAlign::None))
             ? CssTextAlign::Justify
             : static_cast<CssTextAlign>(paragraphAlignment);
}

Then use resolveAlignment() in both locations.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 184 - 187,
Extract the duplicated alignment-resolution logic into a small helper (e.g.,
CssTextAlign resolveAlignment() const) on the ChapterHtmlSlimParser class and
replace both occurrences: the current inline ternary using
self->paragraphAlignment (which sets tableCellBlockStyle.alignment) and the
identical logic in parseAndBuildPages (lines ~689-692). The helper should return
CssTextAlign::Justify when paragraphAlignment equals
static_cast<uint8_t>(CssTextAlign::None) and otherwise return
static_cast<CssTextAlign>(paragraphAlignment); then call resolveAlignment() in
place of the duplicated expressions.

641-654: Minor: Conditions can be consolidated.

Lines 641-647 both set nextWordContinues = false for different table structural tags. Consider combining for clarity:

Consolidated condition

-  if (self->tableDepth == 1 && (strcmp(name, "td") == 0 || strcmp(name, "th") == 0)) {
-    self->nextWordContinues = false;
-  }
-
-  if (self->tableDepth == 1 && (strcmp(name, "tr") == 0)) {
-    self->nextWordContinues = false;
-  }
-
-  if (self->tableDepth == 1 && strcmp(name, "table") == 0) {
+  if (self->tableDepth == 1 && isTableStructuralTag(name)) {
+    self->nextWordContinues = false;
+  }
+
+  if (self->tableDepth == 1 && strcmp(name, "table") == 0) {
     self->tableDepth -= 1;
     self->tableRowIndex = 0;
     self->tableColIndex = 0;
-    self->nextWordContinues = false;
   }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 641 - 654,
Consolidate repeated conditional branches that set self->nextWordContinues =
false when at tableDepth == 1 by merging checks for "td", "th", and "tr" into a
single if that checks tableDepth == 1 and whether name is "td" || "th" || "tr",
and keep the existing separate block for "table" which additionally decrements
self->tableDepth and resets self->tableRowIndex and self->tableColIndex; update
the code around the occurrences of self->tableDepth, self->tableRowIndex,
self->tableColIndex and nextWordContinues in ChapterHtmlSlimParser.cpp to remove
duplication and preserve current behavior.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07b6c27 and 6ad9c85.

📒 Files selected for processing (2)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h

🧰 Additional context used

🧬 Code graph analysis (1)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)

lib/Epub/Epub/blocks/BlockStyle.h (1)

Justify (10-29)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: cppcheck
GitHub Check: build

🔇 Additional comments (6)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h (1)

65-67: LGTM!

The table tracking state variables are well-placed alongside other style-tracking fields and follow existing naming conventions. Using int with 0-initialization is appropriate for depth/index tracking.

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (5)

55-57: LGTM!

The helper correctly identifies core table structural tags. For consistency with HEADER_TAGS/BLOCK_TAGS patterns, you could use a static array, but the inline approach is fine for this small set.

151-167: LGTM!

The nested table skip logic correctly addresses the previously identified issue - incrementing tableDepth without modifying depth ensures proper tracking when nested </table> is encountered later.

497-500: LGTM!

Correctly skips all character data inside nested tables. The early return placement ensures this check happens before other processing.

607-613: LGTM!

The nested table close handling correctly balances the early return in startElement - neither increments/decrements depth for nested tables, maintaining consistent tracking. Clearing the buffer ensures no nested content leaks through.

618-623: LGTM!

Correctly treating table structural tags as flush boundaries rather than inline elements ensures proper word separation at cell boundaries.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 184-187: Extract the duplicated alignment-resolution logic into a
small helper (e.g., CssTextAlign resolveAlignment() const) on the
ChapterHtmlSlimParser class and replace both occurrences: the current inline
ternary using self->paragraphAlignment (which sets
tableCellBlockStyle.alignment) and the identical logic in parseAndBuildPages
(lines ~689-692). The helper should return CssTextAlign::Justify when
paragraphAlignment equals static_cast<uint8_t>(CssTextAlign::None) and otherwise
return static_cast<CssTextAlign>(paragraphAlignment); then call
resolveAlignment() in place of the duplicated expressions.
- Around line 641-654: Consolidate repeated conditional branches that set
self->nextWordContinues = false when at tableDepth == 1 by merging checks for
"td", "th", and "tr" into a single if that checks tableDepth == 1 and whether
name is "td" || "th" || "tr", and keep the existing separate block for "table"
which additionally decrements self->tableDepth and resets self->tableRowIndex
and self->tableColIndex; update the code around the occurrences of
self->tableDepth, self->tableRowIndex, self->tableColIndex and nextWordContinues
in ChapterHtmlSlimParser.cpp to remove duplication and preserve current
behavior.

I've been reading "Children of Time" over the last days and that book, annyoingly, has some tabular content. This content is relevant for the story so I needed some really basic way to at least be able to read those tables. This commit simply renders the contents of table cells as separate paragraphs with a small header describing its position in the table. For me, it's better than nothing. ## Summary * **What is the goal of this PR?** Implements really basic table support * **What changes are included?** * Minimal changes to ChapterHtmlSlimParser * A demo book in test/epubs ## Additional Context Here's some screenshots of the demo-book I provide with this PR. ![PXL_20260218_211446510](https://github.com/user-attachments/assets/49ef81b8-2fa0-4f0d-bb6f-4ef885be6772) ![PXL_20260218_211456379](https://github.com/user-attachments/assets/e7c82b35-b4a9-4a7d-9ec5-2b4bc2ff3514) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? _**PARTIALLY**_ _Little bit of guidance on what to touch, parts of the impl, rest manually._

Add some really basic table support

b47f4c7

This commit simply renders the content of table cells as a separate paragraph with a small header describing its position in the table.

Run clang-format-fix

07b6c27

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp Show resolved Hide resolved

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

Skip nested tables, cleanup unnecessary variables

6ad9c85

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

daveallie approved these changes Feb 19, 2026

View reviewed changes

daveallie merged commit 103fac2 into crosspoint-reader:master Feb 19, 2026
6 checks passed

This was referenced Feb 19, 2026

chore(release): 1.1.0 Release Candidate #992

Merged

[Feature Request] table support #876

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Basic table support#980

feat: Basic table support#980
daveallie merged 3 commits intocrosspoint-reader:masterfrom
allgoewer:feature/basic-tables

allgoewer commented Feb 18, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 18, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

allgoewer commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Additional Context

AI Usage

Uh oh!

coderabbitai bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

allgoewer commented Feb 18, 2026 •

edited

Loading

coderabbitai bot commented Feb 18, 2026 •

edited

Loading