Skip to content

feat: optimize fillRectDither#737

Merged
osteotek merged 2 commits intocrosspoint-reader:masterfrom
ngxson:xsn/optimize_fill_dither
Feb 8, 2026
Merged

feat: optimize fillRectDither#737
osteotek merged 2 commits intocrosspoint-reader:masterfrom
ngxson:xsn/optimize_fill_dither

Conversation

@ngxson
Copy link
Contributor

@ngxson ngxson commented Feb 7, 2026

Summary

This PR optimizes the fillRectDither function, making it as fast as a normal fillRect

Testing code:

  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }

Before:

[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

After:

[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms

Visual validation

Before:

Screenshot 2026-02-07 at 01 04 19

After:

Screenshot 2026-02-07 at 01 36 30

Details

The original version is quite slow because it does quite a lot of computations. A single pixel needs around 20 instructions just to know if it's black or white:

Screenshot 2026-02-07 at 00 15 54

With the new, templated and more light-weight approach, each pixel takes only 3-4 instructions, the modulo operator is translated into bitwise ops:

Screenshot 2026-02-07 at 01 47 51

AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? NO

CaptainFrito
CaptainFrito previously approved these changes Feb 7, 2026
Copy link
Contributor

@CaptainFrito CaptainFrito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!

@ngxson
Copy link
Contributor Author

ngxson commented Feb 7, 2026

I did some more measurements and concluded that there are some other ops that need to be optimized too (Would appreciate if this PR can be merged in a short time frame, so I can proceed with other optimizations)

This measurement is took by pressing navigation button on "Setting" screen, using Lyra theme. This PR only reduces 4ms from the drawing time:

master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[12679] [GFX] Time = 69 ms from clearScreen to displayBuffer

The time used by the original theme is significantly lower:

[42367] [GFX] Time = 40 ms from clearScreen to displayBuffer

I hope with the right optimizations, Lyra theme can reduce the drawing time to 50ms yep I managed to reduce the rendering time to 42ms

lukestein
lukestein previously approved these changes Feb 7, 2026
Copy link
Contributor

@lukestein lukestein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No code review, but testing on device and everything I can find looks good. I do think the Lyra UI feels a bit snappier with this. Thank you! 🎉

@lukestein lukestein requested a review from a team February 7, 2026 15:08
@ngxson ngxson mentioned this pull request Feb 7, 2026
@ngxson
Copy link
Contributor Author

ngxson commented Feb 7, 2026

#748 should add a more significant improvement in perf

Copy link
Contributor

@jdk2pq jdk2pq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the code and this seems like a great optimization. The Bayer 4x4 matrix was a little bit of overkill, at least for our rendering options in greys right now.

I'd still like @daveallie to weigh in before we merge this one.

@jdk2pq jdk2pq requested a review from daveallie February 7, 2026 16:24
@ngxson ngxson dismissed stale reviews from lukestein and CaptainFrito via 7800b47 February 7, 2026 17:39
Copy link
Contributor

@lukestein lukestein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving (with same caveats) given only comments added

@lukestein
Copy link
Contributor

Per @ngxson, this PR should merge before #748 (to which it will introduce a small conflict)

@osteotek osteotek merged commit 76908d3 into crosspoint-reader:master Feb 8, 2026
5 checks passed
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
## Summary

This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

## Visual validation

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

## Details

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
## Summary

This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

## Visual validation

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

## Details

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
daveallie pushed a commit that referenced this pull request Feb 8, 2026
## Summary

Ref #737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
daveallie pushed a commit that referenced this pull request Feb 8, 2026
## Summary

This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

## Visual validation

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

## Details

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
daveallie pushed a commit that referenced this pull request Feb 8, 2026
## Summary

Ref #737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
## Summary

This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

## Visual validation

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

## Details

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
## Summary

Ref crosspoint-reader#737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
crosspoint-reader#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
## Summary

This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

## Visual validation

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

## Details

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
## Summary

Ref crosspoint-reader#737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
crosspoint-reader#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
## Summary

This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

## Visual validation

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

## Details

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
## Summary

Ref crosspoint-reader#737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
crosspoint-reader#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
jdk2pq added a commit to jdk2pq/crosspoint-reader that referenced this pull request Feb 9, 2026
…king-space

* master:
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Add release candidate workflow
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  fix(ui): Add Back label in KOReader Sync screen (crosspoint-reader#770)
  fix: Add EPUB 3 cover image detection (crosspoint-reader#760)
  feat: A web editor for settings (crosspoint-reader#667)
  feat: add HalStorage (crosspoint-reader#656)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  feat: optimize fillRectDither (crosspoint-reader#737)
Marma92 pushed a commit to Marma92/crosspoint-reader that referenced this pull request Feb 10, 2026
## Summary

Ref crosspoint-reader#737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
crosspoint-reader#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
Marma92 pushed a commit to Marma92/crosspoint-reader that referenced this pull request Feb 10, 2026
This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
Marma92 pushed a commit to Marma92/crosspoint-reader that referenced this pull request Feb 10, 2026
## Summary

Ref crosspoint-reader#737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
crosspoint-reader#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
jdk2pq added a commit to jdk2pq/crosspoint-reader that referenced this pull request Feb 11, 2026
…king-space

* master:
  feat: use natural sort in file browser (crosspoint-reader#722)
  fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741)
  feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600)
  feat: Add Italian hyphenation support (crosspoint-reader#584)
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Move release candidate workflow to manual dispatch
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  feat: optimize fillRectDither (crosspoint-reader#737)
  fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727)
  fix: Remove separations after style changes (crosspoint-reader#720)
  fix: Lag before displaying covers on home screen (crosspoint-reader#721)
  feat: Add Settings for toggling CSS on or off (crosspoint-reader#717)
  Use GITHUB_HEAD_REF
  release: 1.0.0
jdk2pq added a commit to jdk2pq/crosspoint-reader that referenced this pull request Feb 13, 2026
* master: (25 commits)
  fix: Reduce MIN_SIZE_FOR_POPUP to 10KB (crosspoint-reader#809)
  docs: Update USER_GUIDE.md (crosspoint-reader#817)
  fix: Prevent sleeping when in OPDS browser / downloading books (crosspoint-reader#818)
  feat: Extend python debugging monitor functionality (keyword filter / suppress) (crosspoint-reader#810)
  docs: Update USER_GUIDE.md (crosspoint-reader#808)
  feat: Connect to last wifi by default (crosspoint-reader#752)
  feat: use natural sort in file browser (crosspoint-reader#722)
  fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741)
  feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600)
  feat: Add Italian hyphenation support (crosspoint-reader#584)
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Move release candidate workflow to manual dispatch
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  feat: optimize fillRectDither (crosspoint-reader#737)
  fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727)
  ...
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026
## Summary

This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

## Visual validation

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

## Details

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026
## Summary

Ref crosspoint-reader#737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
crosspoint-reader#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026
## Summary

This PR optimizes the `fillRectDither` function, making it as fast as a
normal `fillRect`

Testing code:

```cpp
  {
    auto start_t = millis();
    renderer.fillRectDither(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), Color::LightGray);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRectDither drawn in %lu ms\n", millis(), elapsed);
  }

  {
    auto start_t = millis();
    renderer.fillRect(0, 0, renderer.getScreenWidth(), renderer.getScreenHeight(), true);
    auto elapsed = millis() - start_t;
    Serial.printf("[%lu] [   ] Test fillRect drawn in %lu ms\n", millis(), elapsed);
  }
```

Before:

```
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms
```

After:

```
[1065] [ ] Test fillRectDither drawn in 238 ms
[1287] [ ] Test fillRect drawn in 222 ms
```

## Visual validation

Before:

<img width="415" height="216" alt="Screenshot 2026-02-07 at 01 04 19"
src="https://github.com/user-attachments/assets/5802dbba-187b-4d2b-a359-1318d3932d38"
/>

After:

<img width="420" height="191" alt="Screenshot 2026-02-07 at 01 36 30"
src="https://github.com/user-attachments/assets/3c3c8e14-3f3a-4205-be78-6ed771dcddf4"
/>

## Details

The original version is quite slow because it does quite a lot of
computations. A single pixel needs around 20 instructions just to know
if it's black or white:

<img width="1170" height="693" alt="Screenshot 2026-02-07 at 00 15 54"
src="https://github.com/user-attachments/assets/7c5a55e7-0598-4340-8b7b-17307d7921cb"
/>

With the new, templated and more light-weight approach, each pixel takes
only 3-4 instructions, the modulo operator is translated into bitwise
ops:

<img width="1175" height="682" alt="Screenshot 2026-02-07 at 01 47 51"
src="https://github.com/user-attachments/assets/4ec2cf74-6cc0-4b5b-87d5-831563ef164f"
/>

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026
## Summary

Ref crosspoint-reader#737

This PR further reduce ~25ms from rendering time, testing inside the
Setting screen:

```
master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
```

And in extreme case (fill the entire screen with black or gray color):

```
master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
```

Note that
crosspoint-reader#737 is NOT
applied on top of this PR. But with 2 of them combined, it should reduce
from 47ms --> 42ms

## Details

This PR based on the fact that function calls are costly if the function
is small enough. For example, this simple call:

```
  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);
```

Generated assembly code:

<img width="771" height="215" alt="image"
src="https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82"
/>

This adds ~10 instructions just to prepare the registers prior to the
function call, plus some more instructions for the function's
epilogue/prologue. Inlining it removing all of these:

<img width="1471" height="832" alt="image"
src="https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914"
/>

Of course, this optimization is not magic. It's only beneficial under 3
conditions:
- The function is small, not in size, but in terms of effective
instructions. For example, the `rotateCoordinates` is simply a jump
table, where each branch is just 3-4 inst
- The function has multiple input arguments, which requires some move to
put it onto the correct place
- The function is called very frequently (i.e. critical path)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **NO**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants