perf: optimize drawPixel() #748
Merged
daveallie merged 8 commits into crosspoint-reader:master on Feb 8, 2026
0a246e8 to 3dd246c
557d2b8 to a6d2444
Contributor
Author
Small clarification: once #737 is merged, there will be a small conflict, so I will need to push a merge commit. FOR REVIEWER: please merge #737 before reviewing this one. Thanks!
CaptainFrito
previously approved these changes
Feb 8, 2026
Contributor
CaptainFrito
left a comment
Great optimization work!
Contributor
Author
Nice, thanks! Should be ready now
osteotek
approved these changes
Feb 8, 2026
daveallie
approved these changes
Feb 8, 2026
daveallie
pushed a commit
that referenced
this pull request
Feb 8, 2026
## Summary

Ref #737

This PR further reduces rendering time by ~25 ms, measured inside the Settings screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in the extreme case (filling the entire screen with black or gray):

```
master:
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms

PR:
[1334] [ ] Test fillRectDither drawn in 225 ms
[1455] [ ] Test fillRect drawn in 121 ms
```

Note that #737 is NOT applied on top of this PR. With the two combined, rendering time should drop further, from 47 ms to 42 ms.

## Details

This PR is based on the fact that call overhead is costly relative to the useful work when the function itself is small. For example, this simple call:

```
int rotatedX = 0;
int rotatedY = 0;
rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" />

This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's prologue/epilogue. Inlining it removes all of these:

<img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" />

Of course, this optimization is not magic. It's only beneficial under 3 conditions:

- The function is small, not in code size but in terms of effective instructions executed. For example, `rotateCoordinates` is simply a jump table, where each branch is just 3-4 instructions
- The function has multiple input arguments, which require some moves to put them in the correct place
- The function is called very frequently (i.e. it is on the critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
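The inlining idea from the Details section can be illustrated with a minimal sketch. This is not the code from the PR: the orientation enum, the panel dimensions, the coordinate mapping, the framebuffer layout (1 bit per pixel, 0 = black), and every function name here are assumptions made for illustration, and the `noinline`/`always_inline` attributes assume a GCC-compatible toolchain.

```cpp
#include <cstdint>

// All names and constants below are hypothetical stand-ins, not the project's code.
constexpr int kPanelWidth = 800;   // assumed physical width in pixels
constexpr int kPanelHeight = 480;  // assumed physical height in pixels

enum class Orientation : uint8_t { Rot0, Rot90, Rot180, Rot270 };

// Inlined variant: each branch is only a few instructions, and because the
// body is expanded at the call site, the outputs can stay in registers
// instead of being written through pointers to stack slots.
__attribute__((always_inline)) inline void rotateCoordinatesInline(
    Orientation o, int x, int y, int& outX, int& outY) {
  switch (o) {
    case Orientation::Rot0:
      outX = x;
      outY = y;
      break;
    case Orientation::Rot90:
      outX = kPanelWidth - 1 - y;
      outY = x;
      break;
    case Orientation::Rot180:
      outX = kPanelWidth - 1 - x;
      outY = kPanelHeight - 1 - y;
      break;
    case Orientation::Rot270:
      outX = y;
      outY = kPanelHeight - 1 - x;
      break;
  }
}

// Out-of-line variant, kept only for comparison: the caller has to set up
// five arguments, take the address of two locals, and pay for call/return.
__attribute__((noinline)) void rotateCoordinatesCall(
    Orientation o, int x, int y, int* outX, int* outY) {
  rotateCoordinatesInline(o, x, y, *outX, *outY);
}

// Sketch of a per-pixel hot path that uses the inlined rotation.
inline void drawPixelSketch(uint8_t* frameBuffer, Orientation o, int x, int y,
                            bool black) {
  int px = 0;
  int py = 0;
  rotateCoordinatesInline(o, x, y, px, py);

  // Discard pixels that fall outside the physical panel.
  if (px < 0 || px >= kPanelWidth || py < 0 || py >= kPanelHeight) return;

  // Assumed layout: row-major, 8 pixels per byte, most significant bit first.
  const int byteIndex = py * (kPanelWidth / 8) + px / 8;
  const uint8_t mask = static_cast<uint8_t>(0x80u >> (px % 8));
  if (black) {
    frameBuffer[byteIndex] &= static_cast<uint8_t>(~mask);  // clear bit = black
  } else {
    frameBuffer[byteIndex] |= mask;  // set bit = white
  }
}
```

Both variants compute the same mapping; the only difference is whether the compiler is allowed to emit a real call. Whether inlining actually helps on a given target is best confirmed by inspecting the generated assembly and timing the hot path, as the PR does.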
lukestein
pushed a commit
to lukestein/crosspoint-reader
that referenced
this pull request
Feb 8, 2026
jdk2pq
added a commit
to jdk2pq/crosspoint-reader
that referenced
this pull request
Feb 9, 2026
…king-space

* master:
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Add release candidate workflow
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  fix(ui): Add Back label in KOReader Sync screen (crosspoint-reader#770)
  fix: Add EPUB 3 cover image detection (crosspoint-reader#760)
  feat: A web editor for settings (crosspoint-reader#667)
  feat: add HalStorage (crosspoint-reader#656)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  feat: optimize fillRectDither (crosspoint-reader#737)
Marma92
pushed a commit
to Marma92/crosspoint-reader
that referenced
this pull request
Feb 10, 2026
jdk2pq
added a commit
to jdk2pq/crosspoint-reader
that referenced
this pull request
Feb 11, 2026
…king-space

* master:
  feat: use natural sort in file browser (crosspoint-reader#722)
  fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741)
  feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600)
  feat: Add Italian hyphenation support (crosspoint-reader#584)
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Move release candidate workflow to manual dispatch
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  feat: optimize fillRectDither (crosspoint-reader#737)
  fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727)
  fix: Remove separations after style changes (crosspoint-reader#720)
  fix: Lag before displaying covers on home screen (crosspoint-reader#721)
  feat: Add Settings for toggling CSS on or off (crosspoint-reader#717)
  Use GITHUB_HEAD_REF
  release: 1.0.0
jdk2pq
added a commit
to jdk2pq/crosspoint-reader
that referenced
this pull request
Feb 13, 2026
* master: (25 commits)
  fix: Reduce MIN_SIZE_FOR_POPUP to 10KB (crosspoint-reader#809)
  docs: Update USER_GUIDE.md (crosspoint-reader#817)
  fix: Prevent sleeping when in OPDS browser / downloading books (crosspoint-reader#818)
  feat: Extend python debugging monitor functionality (keyword filter / suppress) (crosspoint-reader#810)
  docs: Update USER_GUIDE.md (crosspoint-reader#808)
  feat: Connect to last wifi by default (crosspoint-reader#752)
  feat: use natural sort in file browser (crosspoint-reader#722)
  fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741)
  feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600)
  feat: Add Italian hyphenation support (crosspoint-reader#584)
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Move release candidate workflow to manual dispatch
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  feat: optimize fillRectDither (crosspoint-reader#737)
  fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727)
  ...
Unintendedsideeffects
pushed a commit
to Unintendedsideeffects/crosspoint-reader
that referenced
this pull request
Feb 17, 2026