feat: optimize fillRectDither#737
Conversation
|
I did some more measurements and concluded that there are some other ops that need to be optimized too (Would appreciate if this PR can be merged in a short time frame, so I can proceed with other optimizations) This measurement is took by pressing navigation button on "Setting" screen, using Lyra theme. This PR only reduces 4ms from the drawing time: The time used by the original theme is significantly lower:
|
lukestein
left a comment
There was a problem hiding this comment.
No code review, but testing on device and everything I can find looks good. I do think the Lyra UI feels a bit snappier with this. Thank you! 🎉
|
#748 should add a more significant improvement in perf |
jdk2pq
left a comment
There was a problem hiding this comment.
Reviewed the code and this seems like a great optimization. The Bayer 4x4 matrix was a little bit of overkill, at least for our rendering options in greys right now.
I'd still like @daveallie to weigh in before we merge this one.
lukestein
left a comment
There was a problem hiding this comment.
Re-approving (with same caveats) given only comments added
## Summary
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
## Visual validation
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
## Details
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
## Visual validation
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
## Details
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary Ref #737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that #737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
## Summary
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
## Visual validation
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
## Details
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary Ref #737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that #737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
## Summary
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
## Visual validation
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
## Details
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary Ref crosspoint-reader#737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that crosspoint-reader#737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
## Summary
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
## Visual validation
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
## Details
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary Ref crosspoint-reader#737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that crosspoint-reader#737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
## Summary
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
## Visual validation
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
## Details
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary Ref crosspoint-reader#737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that crosspoint-reader#737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
…king-space * master: feat: Add percentage support to CSS properties (crosspoint-reader#738) Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow Add release candidate workflow fix: Allow OTA update from RC build to full release (crosspoint-reader#778) fix(ui): Add Back label in KOReader Sync screen (crosspoint-reader#770) fix: Add EPUB 3 cover image detection (crosspoint-reader#760) feat: A web editor for settings (crosspoint-reader#667) feat: add HalStorage (crosspoint-reader#656) perf: optimize drawPixel() (crosspoint-reader#748) feat: wakeup target detection (crosspoint-reader#731) fix: Scrolling page items calculation (crosspoint-reader#716) refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746) feat: optimize fillRectDither (crosspoint-reader#737)
## Summary Ref crosspoint-reader#737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that crosspoint-reader#737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary Ref crosspoint-reader#737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that crosspoint-reader#737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
…king-space * master: feat: use natural sort in file browser (crosspoint-reader#722) fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741) feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600) feat: Add Italian hyphenation support (crosspoint-reader#584) feat: Add percentage support to CSS properties (crosspoint-reader#738) Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow Move release candidate workflow to manual dispatch fix: Allow OTA update from RC build to full release (crosspoint-reader#778) refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746) perf: optimize drawPixel() (crosspoint-reader#748) feat: wakeup target detection (crosspoint-reader#731) fix: Scrolling page items calculation (crosspoint-reader#716) feat: optimize fillRectDither (crosspoint-reader#737) fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727) fix: Remove separations after style changes (crosspoint-reader#720) fix: Lag before displaying covers on home screen (crosspoint-reader#721) feat: Add Settings for toggling CSS on or off (crosspoint-reader#717) Use GITHUB_HEAD_REF release: 1.0.0
* master: (25 commits) fix: Reduce MIN_SIZE_FOR_POPUP to 10KB (crosspoint-reader#809) docs: Update USER_GUIDE.md (crosspoint-reader#817) fix: Prevent sleeping when in OPDS browser / downloading books (crosspoint-reader#818) feat: Extend python debugging monitor functionality (keyword filter / suppress) (crosspoint-reader#810) docs: Update USER_GUIDE.md (crosspoint-reader#808) feat: Connect to last wifi by default (crosspoint-reader#752) feat: use natural sort in file browser (crosspoint-reader#722) fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741) feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600) feat: Add Italian hyphenation support (crosspoint-reader#584) feat: Add percentage support to CSS properties (crosspoint-reader#738) Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow Move release candidate workflow to manual dispatch fix: Allow OTA update from RC build to full release (crosspoint-reader#778) refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746) perf: optimize drawPixel() (crosspoint-reader#748) feat: wakeup target detection (crosspoint-reader#731) fix: Scrolling page items calculation (crosspoint-reader#716) feat: optimize fillRectDither (crosspoint-reader#737) fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727) ...
## Summary
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
## Visual validation
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
## Details
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary Ref crosspoint-reader#737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that crosspoint-reader#737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
## Summary
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`
Testing code:
```cpp
{
auto start_t = millis();
renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
}
{
auto start_t = millis();
renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
auto elapsed = millis() - start_t;
Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
}
```
Before:
```
[1125] [ ] Test fillRectDither drawn in 327 ms
[1347] [ ] Test fillRect drawn in 222 ms
```
After:
```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```
## Visual validation
Before:
<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>
After:
<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>
## Details
The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:
<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>
With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:
<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>
---
### AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? **NO**
## Summary Ref crosspoint-reader#737 This PR further reduce ~25ms from rendering time, testing inside the Setting screen: ``` master: [68440] [GFX] Time = 73 ms from clearScreen to displayBuffer PR: [97806] [GFX] Time = 47 ms from clearScreen to displayBuffer ``` And in extreme case (fill the entire screen with black or gray color): ``` master: [1125] [ ] Test fillRectDither drawn in 327 ms [1347] [ ] Test fillRect drawn in 222 ms PR: [1334] [ ] Test fillRectDither drawn in 225 ms [1455] [ ] Test fillRect drawn in 121 ms ``` Note that crosspoint-reader#737 is NOT applied on top of this PR. But with 2 of them combined, it should reduce from 47ms --> 42ms ## Details This PR based on the fact that function calls are costly if the function is small enough. For example, this simple call: ``` int rotatedX = 0; int rotatedY = 0; rotateCoordinates(x, y, &rotatedX, &rotatedY); ``` Generated assembly code: <img width="771" height="215" alt="image" src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82" /> This adds ~10 instructions just to prepare the registers prior to the function call, plus some more instructions for the function's epilogue/prologue. Inlining it removing all of these: <img width="1471" height="832" alt="image" src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914" /> Of course, this optimization is not magic. It's only beneficial under 3 conditions: - The function is small, not in size, but in terms of effective instructions. For example, the `rotateCoordinates` is simply a jump table, where each branch is just 3-4 inst - The function has multiple input arguments, which requires some move to put it onto the correct place - The function is called very frequently (i.e. critical path) --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? **NO**
Summary
This PR optimizes the
fillRectDitherfunction, making it as fast as a normalfillRectTesting code:
{ auto start_t = millis(); renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray); auto elapsed = millis() - start_t; Serial.printf("[%lu] [ ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed); } { auto start_t = millis(); renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true); auto elapsed = millis() - start_t; Serial.printf("[%lu] [ ] Test fillRect drawn in %lu ms\n", millis(), elapsed); }Before:
After:
Visual validation
Before:
After:
Details
The original version is quite slow because it does quite a lot of computations. A single pixel needs around 20 instructions just to know if it's black or white:
With the new, templated and more light-weight approach, each pixel takes only 3-4 instructions, the modulo operator is translated into bitwise ops:
AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? NO