fix: Skip large CSS files to prevent crashes #952
daveallie merged 1 commit into crosspoint-reader:master
Conversation
📝 Walkthrough
Introduces pre-parsing safeguards in `lib/Epub/Epub.cpp`.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/Epub/Epub.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/Epub/Epub.cpp`:
- Around line 285-293: The code currently continues when getItemSize(cssPath,
&cssFileSize) fails, which leaves cssFileSize unknown; change the logic in the
block around getItemSize so that a false return is treated as an error: log an
error via LOG_ERR (include cssPath) and continue (skip the file) instead of
proceeding; ensure the check still enforces cssFileSize > MAX_CSS_FILE_SIZE when
getItemSize returns true and only parses when getItemSize succeeded and size is
within MAX_CSS_FILE_SIZE.
```cpp
// Check CSS file size before decompressing - skip files that are too large
size_t cssFileSize = 0;
if (getItemSize(cssPath, &cssFileSize)) {
  if (cssFileSize > MAX_CSS_FILE_SIZE) {
    LOG_ERR("EBP", "CSS file too large (%zu bytes > %zu max), skipping: %s",
            cssFileSize, MAX_CSS_FILE_SIZE, cssPath.c_str());
    continue;
  }
}
```
Fail-closed when CSS size cannot be determined.
If getItemSize fails, the code still proceeds to parse the CSS, which can reintroduce the crash risk this change is meant to avoid. Treat an unknown size as unsafe and skip the file.
🔧 Proposed fix
```diff
- size_t cssFileSize = 0;
- if (getItemSize(cssPath, &cssFileSize)) {
-   if (cssFileSize > MAX_CSS_FILE_SIZE) {
-     LOG_ERR("EBP", "CSS file too large (%zu bytes > %zu max), skipping: %s",
-             cssFileSize, MAX_CSS_FILE_SIZE, cssPath.c_str());
-     continue;
-   }
- }
+ size_t cssFileSize = 0;
+ if (!getItemSize(cssPath, &cssFileSize)) {
+   LOG_ERR("EBP", "Could not determine CSS file size, skipping: %s", cssPath.c_str());
+   continue;
+ }
+ if (cssFileSize > MAX_CSS_FILE_SIZE) {
+   LOG_ERR("EBP", "CSS file too large (%zu bytes > %zu max), skipping: %s",
+           cssFileSize, MAX_CSS_FILE_SIZE, cssPath.c_str());
+   continue;
+ }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
// Check CSS file size before decompressing - skip files that are too large
size_t cssFileSize = 0;
if (!getItemSize(cssPath, &cssFileSize)) {
  LOG_ERR("EBP", "Could not determine CSS file size, skipping: %s", cssPath.c_str());
  continue;
}
if (cssFileSize > MAX_CSS_FILE_SIZE) {
  LOG_ERR("EBP", "CSS file too large (%zu bytes > %zu max), skipping: %s",
          cssFileSize, MAX_CSS_FILE_SIZE, cssPath.c_str());
  continue;
}
```
A CSS file bigger than 128 KB sounds like a malformed epub. We can probably just skip them without trying to parse. What do you think @jdk2pq? I seem to remember you had a file limit in your recent PR but can't find it now.
Force-pushed from 4bd156d to 4cd00b1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/Epub/Epub.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: cppcheck
🔇 Additional comments (2)
lib/Epub/Epub.cpp (2)
261-265: LGTM! The threshold constants are well-chosen for ESP32 memory constraints and appropriately documented with inline comments explaining their purpose.
277-283: LGTM! The heap check is correctly placed before each iteration to detect memory pressure from prior parsing. The logging provides useful diagnostics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@lib/Epub/Epub.cpp`:
- Around line 285-293: The current logic in Epub::parse (around
getItemSize/cssFileSize/MAX_CSS_FILE_SIZE/cssPath) treats a false return from
getItemSize as if size is safe; change this to fail-closed by skipping the CSS
when getItemSize returns false: when getItemSize(cssPath, &cssFileSize) returns
false, log an error via LOG_ERR (include cssPath) and continue, rather than
falling through to parsing; ensure cssFileSize is only used when getItemSize
succeeded and that the MAX_CSS_FILE_SIZE check remains unchanged for the
successful case.
Thank you for the fix @sam-lord!!
Not necessarily malformed, but definitely poorly put together by the publisher (especially in @sam-lord's case where the publisher appears to be including their website's CSS inside of the epub 🙃). This fix makes sense to me! As for the file limit that was in my PR previously, I removed that since the thought was we were already buffering the CSS file, and we'd hit the MAX_RULES limit anyway (and it looks like we do indeed 😄). I tested the existing code out by making an EPUB with ~200KB of Tailwind's CSS, and I hit the same error. After the fix, it's working great. I'll add that test EPUB to my repo here. Longer-term, we should think about:
## Summary

**What is the goal of this PR?** (e.g., Implements the new feature for file uploading.)

* Fixes: crosspoint-reader#947

**What changes are included?**

* Check to see if there's free heap memory before processing CSS (should we be doing this type of check, or is it better to just crash if we exhaust the memory?)
* Skip CSS files larger than 128kb

## Additional Context

* I found that a copy of `Release it` contained a 250kb+ CSS file, from the homepage of the publisher. It has nothing to do with the epub, so we should just skip it
* Major question: Are there better ways to detect CSS that doesn't belong in a book, or is this size-based approach valid?
* Another question: Are there any epubs we know of that legitimately include >128kb CSS files?

Code changes themselves created with an agent, all investigation and write-up done by human. If you (the maintainers) would prefer a different fix for this issue, let me know.

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers.

Did you use AI tools to help write this code? _**< YES >**_