feat: Basic table support#980
Conversation
This commit simply renders the content of table cells as a separate paragraph with a small header describing its position in the table.
📝 WalkthroughWalkthroughAdds table-flattening to the HTML parser: introduces Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)
55-57: Consider marking asstaticfor internal linkage.This helper is only used within this translation unit. Adding
staticwould clarify intent and potentially allow better compiler optimization.Suggested change
-bool isTableStructuralTag(const char* name) { +static bool isTableStructuralTag(const char* name) { return strcmp(name, "table") == 0 || strcmp(name, "tr") == 0 || strcmp(name, "td") == 0 || strcmp(name, "th") == 0; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 55 - 57, The helper function isTableStructuralTag has internal-only usage and should be given internal linkage: mark it static (or anonymous-namespace if preferred) so the compiler knows it's translation-unit local; update the function declaration/definition for isTableStructuralTag to use static and rebuild to ensure no external linkage remains.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpplib/Epub/Epub/parsers/ChapterHtmlSlimParser.htest/epubs/test_tables.epub
🧰 Additional context used
🧬 Code graph analysis (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)
lib/Epub/Epub/blocks/BlockStyle.h (1)
Justify(10-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: cppcheck
- GitHub Check: build
🔇 Additional comments (4)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h (1)
65-69: LGTM!The new table-tracking member variables are well-placed alongside other style-tracking state, and their initializations are consistent with the reset values in
endElement.lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (3)
165-170: LGTM!Row handling correctly increments the row index and resets the cell index for each new row.
172-212: LGTM!Cell handling is well-structured:
- Defensive handling for cells without explicit
<tr>parent- Proper style stack management for the header text
- Clean separation between header and cell content
600-640: LGTM!The
endElementchanges correctly integrate table structural tags into flush decisions and properly reset state when exiting cells and tables. The depth checks ensure state is only reset at the correct nesting level.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 151-163: The parser currently overwrites outer-table state when
encountering a nested table because the table handling block sets inTable,
tableDepth, tableRowIndex, tableCellIndex and tableCellDepth unconditionally;
change this to guard against nested tables by checking the existing inTable flag
(or otherwise detecting depth) and skipping nested table initialization when
already inTable, or alternatively implement a small stack to save/restore table
context (push on table open, pop on table close) so closing an inner table
doesn't wipe the outer table's
tableDepth/tableRowIndex/tableCellIndex/tableCellDepth; update the block that
sets these fields (the code around inTable, tableDepth, tableRowIndex,
tableCellIndex, tableCellDepth and flushPartWordBuffer) accordingly.
---
Nitpick comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 55-57: The helper function isTableStructuralTag has internal-only
usage and should be given internal linkage: mark it static (or
anonymous-namespace if preferred) so the compiler knows it's translation-unit
local; update the function declaration/definition for isTableStructuralTag to
use static and rebuild to ensure no external linkage remains.
There was a problem hiding this comment.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
🧰 Additional context used
🧬 Code graph analysis (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)
lib/Epub/Epub/blocks/BlockStyle.h (1)
Justify(10-29)
🔇 Additional comments (2)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (2)
55-57: Centralized table tag detection looks good.
Keeps table-structural checks consistent and readable.
600-610: Good: table structural tags act as flush boundaries.
This should prevent dropped content around table transitions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 151-210: The parser's table-handling mutates shared state
(inTable, tableDepth, tableRowIndex, tableCellIndex, tableCellDepth)
unconditionally, so nested <table> elements overwrite outer-table context;
change the logic to guard or stack table state: when encountering a "table" in
the start-element handler (the block that currently sets
inTable/tableDepth/row/cell) push the current table state onto a small stack (or
only initialize these fields if not already inTable) and when encountering the
corresponding "table" end-element pop/restore the previous table state instead
of blindly clearing it; do the same guarded/stacked approach for the
"tr"/"td"/"th" handling and their end handlers (same pattern referenced around
the other block at lines ~633-640) so inner tables do not clobber outer-table
indices and depths.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (2)
184-187: Consider extracting alignment resolution to a helper.This alignment resolution logic (defaulting
CssTextAlign::NonetoJustify) is duplicated fromparseAndBuildPages(lines 689-692). Extracting it to a small helper method would reduce duplication.Example helper
// In class declaration or as private inline helper CssTextAlign resolveAlignment() const { return (paragraphAlignment == static_cast<uint8_t>(CssTextAlign::None)) ? CssTextAlign::Justify : static_cast<CssTextAlign>(paragraphAlignment); }Then use
resolveAlignment()in both locations.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 184 - 187, Extract the duplicated alignment-resolution logic into a small helper (e.g., CssTextAlign resolveAlignment() const) on the ChapterHtmlSlimParser class and replace both occurrences: the current inline ternary using self->paragraphAlignment (which sets tableCellBlockStyle.alignment) and the identical logic in parseAndBuildPages (lines ~689-692). The helper should return CssTextAlign::Justify when paragraphAlignment equals static_cast<uint8_t>(CssTextAlign::None) and otherwise return static_cast<CssTextAlign>(paragraphAlignment); then call resolveAlignment() in place of the duplicated expressions.
641-654: Minor: Conditions can be consolidated.Lines 641-647 both set
nextWordContinues = falsefor different table structural tags. Consider combining for clarity:Consolidated condition
- if (self->tableDepth == 1 && (strcmp(name, "td") == 0 || strcmp(name, "th") == 0)) { - self->nextWordContinues = false; - } - - if (self->tableDepth == 1 && (strcmp(name, "tr") == 0)) { - self->nextWordContinues = false; - } - - if (self->tableDepth == 1 && strcmp(name, "table") == 0) { + if (self->tableDepth == 1 && isTableStructuralTag(name)) { + self->nextWordContinues = false; + } + + if (self->tableDepth == 1 && strcmp(name, "table") == 0) { self->tableDepth -= 1; self->tableRowIndex = 0; self->tableColIndex = 0; - self->nextWordContinues = false; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 641 - 654, Consolidate repeated conditional branches that set self->nextWordContinues = false when at tableDepth == 1 by merging checks for "td", "th", and "tr" into a single if that checks tableDepth == 1 and whether name is "td" || "th" || "tr", and keep the existing separate block for "table" which additionally decrements self->tableDepth and resets self->tableRowIndex and self->tableColIndex; update the code around the occurrences of self->tableDepth, self->tableRowIndex, self->tableColIndex and nextWordContinues in ChapterHtmlSlimParser.cpp to remove duplication and preserve current behavior.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpplib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
🧰 Additional context used
🧬 Code graph analysis (1)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)
lib/Epub/Epub/blocks/BlockStyle.h (1)
Justify(10-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: cppcheck
- GitHub Check: build
🔇 Additional comments (6)
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h (1)
65-67: LGTM!The table tracking state variables are well-placed alongside other style-tracking fields and follow existing naming conventions. Using
intwith 0-initialization is appropriate for depth/index tracking.lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (5)
55-57: LGTM!The helper correctly identifies core table structural tags. For consistency with
HEADER_TAGS/BLOCK_TAGSpatterns, you could use a static array, but the inline approach is fine for this small set.
151-167: LGTM!The nested table skip logic correctly addresses the previously identified issue - incrementing
tableDepthwithout modifyingdepthensures proper tracking when nested</table>is encountered later.
497-500: LGTM!Correctly skips all character data inside nested tables. The early return placement ensures this check happens before other processing.
607-613: LGTM!The nested table close handling correctly balances the early return in
startElement- neither increments/decrementsdepthfor nested tables, maintaining consistent tracking. Clearing the buffer ensures no nested content leaks through.
618-623: LGTM!Correctly treating table structural tags as flush boundaries rather than inline elements ensures proper word separation at cell boundaries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 184-187: Extract the duplicated alignment-resolution logic into a
small helper (e.g., CssTextAlign resolveAlignment() const) on the
ChapterHtmlSlimParser class and replace both occurrences: the current inline
ternary using self->paragraphAlignment (which sets
tableCellBlockStyle.alignment) and the identical logic in parseAndBuildPages
(lines ~689-692). The helper should return CssTextAlign::Justify when
paragraphAlignment equals static_cast<uint8_t>(CssTextAlign::None) and otherwise
return static_cast<CssTextAlign>(paragraphAlignment); then call
resolveAlignment() in place of the duplicated expressions.
- Around line 641-654: Consolidate repeated conditional branches that set
self->nextWordContinues = false when at tableDepth == 1 by merging checks for
"td", "th", and "tr" into a single if that checks tableDepth == 1 and whether
name is "td" || "th" || "tr", and keep the existing separate block for "table"
which additionally decrements self->tableDepth and resets self->tableRowIndex
and self->tableColIndex; update the code around the occurrences of
self->tableDepth, self->tableRowIndex, self->tableColIndex and nextWordContinues
in ChapterHtmlSlimParser.cpp to remove duplication and preserve current
behavior.
I've been reading "Children of Time" over the last days and that book, annyoingly, has some tabular content. This content is relevant for the story so I needed some really basic way to at least be able to read those tables. This commit simply renders the contents of table cells as separate paragraphs with a small header describing its position in the table. For me, it's better than nothing. ## Summary * **What is the goal of this PR?** Implements really basic table support * **What changes are included?** * Minimal changes to ChapterHtmlSlimParser * A demo book in test/epubs ## Additional Context Here's some screenshots of the demo-book I provide with this PR.   --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? _**PARTIALLY**_ _Little bit of guidance on what to touch, parts of the impl, rest manually._
I've been reading "Children of Time" over the last days and that book, annyoingly, has some tabular content. This content is relevant for the story so I needed some really basic way to at least be able to read those tables. This commit simply renders the contents of table cells as separate paragraphs with a small header describing its position in the table. For me, it's better than nothing. ## Summary * **What is the goal of this PR?** Implements really basic table support * **What changes are included?** * Minimal changes to ChapterHtmlSlimParser * A demo book in test/epubs ## Additional Context Here's some screenshots of the demo-book I provide with this PR.   --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? _**PARTIALLY**_ _Little bit of guidance on what to touch, parts of the impl, rest manually._
I've been reading "Children of Time" over the last days and that book, annyoingly, has some tabular content. This content is relevant for the story so I needed some really basic way to at least be able to read those tables. This commit simply renders the contents of table cells as separate paragraphs with a small header describing its position in the table. For me, it's better than nothing. ## Summary * **What is the goal of this PR?** Implements really basic table support * **What changes are included?** * Minimal changes to ChapterHtmlSlimParser * A demo book in test/epubs ## Additional Context Here's some screenshots of the demo-book I provide with this PR.   --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? _**PARTIALLY**_ _Little bit of guidance on what to touch, parts of the impl, rest manually._
I've been reading "Children of Time" over the last days and that book, annyoingly, has some tabular content.
This content is relevant for the story so I needed some really basic way to at least be able to read those tables.
This commit simply renders the contents of table cells as separate paragraphs with a small header describing its position in the table. For me, it's better than nothing.
Summary
Implements really basic table support
What changes are included?
Additional Context
Here's some screenshots of the demo-book I provide with this PR.
AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? PARTIALLY
Little bit of guidance on what to touch, parts of the impl, rest manually.