perf: optimize drawPixel() #748

Merged
daveallie merged 8 commits into crosspoint-reader:master from ngxson:xsn/optimize_gfx
Feb 8, 2026

@ngxson ngxson commented Feb 7, 2026

Summary

Ref #737

This PR further reduces rendering time by ~25 ms, measured on the Settings screen:

master:
[68440] [GFX] Time = 73 ms from clearScreen to displayBuffer

PR:
[97806] [GFX] Time = 47 ms from clearScreen to displayBuffer
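
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a scoped timer that prints a line in a similar shape. It is an assumption about how such a log could be produced, not the project's actual logging code; `ScopedTimer` is a hypothetical name, and on-device code would more likely use the platform's millisecond counter than `std::chrono`.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: measures wall-clock time from construction to destruction.
class ScopedTimer {
 public:
  explicit ScopedTimer(const char* label)
      : label_(label), start_(std::chrono::steady_clock::now()) {}

  // Milliseconds elapsed since the timer was created.
  long long elapsedMs() const {
    const auto delta = std::chrono::steady_clock::now() - start_;
    return static_cast<long long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(delta).count());
  }

  // Prints the elapsed time when the timed scope ends.
  ~ScopedTimer() { std::printf("[GFX] Time = %lld ms %s\n", elapsedMs(), label_); }

 private:
  const char* label_;
  std::chrono::steady_clock::time_point start_;
};
```

Creating a `ScopedTimer` just before the first drawing call and letting it go out of scope after the buffer is displayed would time the whole render pass.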

And in the extreme case (filling the entire screen with black or gray):

master:
[1125] [   ] Test fillRectDither drawn in 327 ms
[1347] [   ] Test fillRect drawn in 222 ms

PR:
[1334] [   ] Test fillRectDither drawn in 225 ms
[1455] [   ] Test fillRect drawn in 121 ms
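
As a quick check on the figures above, the absolute and relative savings can be computed directly. The helper names below are illustrative and not part of the codebase; the inputs are the millisecond values from the logs.

```cpp
#include <cstdio>

// Absolute and relative improvement between two timings.
struct Saving {
  int savedMs;
  double percent;
};

inline Saving saving(int beforeMs, int afterMs) {
  const int saved = beforeMs - afterMs;
  return Saving{saved, 100.0 * saved / beforeMs};
}

// Prints one line per benchmark quoted in this PR.
inline void printSavings() {
  const Saving settings = saving(73, 47);     // Settings screen render
  const Saving dither = saving(327, 225);     // full-screen fillRectDither
  const Saving fill = saving(222, 121);       // full-screen fillRect
  std::printf("Settings screen: -%d ms (%.1f%%)\n", settings.savedMs, settings.percent);
  std::printf("fillRectDither:  -%d ms (%.1f%%)\n", dither.savedMs, dither.percent);
  std::printf("fillRect:        -%d ms (%.1f%%)\n", fill.savedMs, fill.percent);
}
```

That works out to roughly 26 ms (about 36%) on the Settings screen, 102 ms (about 31%) for fillRectDither, and 101 ms (about 45%) for fillRect.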

Note that #737 is NOT applied on top of this PR. With the two combined, rendering time should drop further, from 47 ms to about 42 ms.

Details

This PR is based on the fact that a function call is relatively costly when the function itself is small. For example, this simple call:

  int rotatedX = 0;
  int rotatedY = 0;
  rotateCoordinates(x, y, &rotatedX, &rotatedY);

Generated assembly code:

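[image: generated assembly for the call, https://github.com/user-attachments/assets/37991659-3304-41c3-a3b2-fb967da53f82]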

This adds ~10 instructions just to prepare the registers before the function call, plus some more instructions for the function's prologue/epilogue. Inlining it removes all of these:

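[image: generated assembly after inlining, https://github.com/user-attachments/assets/b67a22ee-93ba-4017-88ed-c973e28ec914]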

Of course, this optimization is not magic. It's only beneficial under 3 conditions:

  • The function is small, not in source size, but in terms of instructions actually executed. For example, rotateCoordinates is simply a jump table where each branch is just 3-4 instructions
  • The function has multiple input arguments, which require extra moves to put them in the right registers
  • The function is called very frequently (i.e. it is on the critical path)
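
The three conditions above can be illustrated with a minimal sketch. The enum, panel dimensions, and rotation math here are assumptions for illustration only; the real rotateCoordinates is not shown in this description and may differ. `__attribute__((always_inline))` is a GCC/Clang extension, so this assumes one of those compilers.

```cpp
#include <cstdint>

// Hypothetical orientation values; the project's real enum may differ.
enum class Orientation : uint8_t {
  Portrait,
  LandscapeClockwise,
  PortraitInverted,
  LandscapeCounterClockwise
};

// Assumed physical panel size, for illustration only.
constexpr int kPanelWidth = 800;
constexpr int kPanelHeight = 480;

// Small body (one switch, a few instructions per branch) with several
// arguments: the case where call overhead rivals the useful work.
// always_inline asks GCC/Clang to expand the body at every call site.
__attribute__((always_inline)) static inline void rotateCoordinates(
    Orientation o, int x, int y, int& outX, int& outY) {
  switch (o) {
    case Orientation::Portrait:
      outX = y;
      outY = kPanelHeight - 1 - x;
      break;
    case Orientation::LandscapeClockwise:
      outX = kPanelWidth - 1 - x;
      outY = kPanelHeight - 1 - y;
      break;
    case Orientation::PortraitInverted:
      outX = kPanelWidth - 1 - y;
      outY = x;
      break;
    case Orientation::LandscapeCounterClockwise:
      outX = x;
      outY = y;
      break;
  }
}

// Hot-path caller: once the helper is inlined, the compiler no longer has to
// set up argument registers, and the outputs can stay in registers.
static inline int physicalByteIndex(Orientation o, int x, int y) {
  int px = 0;
  int py = 0;
  rotateCoordinates(o, x, y, px, py);
  return (py * kPanelWidth + px) / 8;  // 1 bit per pixel is assumed here
}
```

Whether the compiler would inline such a helper on its own depends on optimization flags and on whether the definition is visible at the call site, which is why checking the generated assembly, as done above, is the reliable test.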

AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? NO

@lukestein lukestein requested review from a team and daveallie February 7, 2026 19:30
lukestein previously approved these changes Feb 7, 2026
@lukestein lukestein left a comment

No code review conducted, but I have been using this on device and everything seems to render exactly as expected.

(Note I have tested this separately from #737 so no assessment of how they interact.)

@lukestein lukestein requested a review from jdk2pq February 7, 2026 19:33
ngxson commented Feb 7, 2026

Small clarification:

  • feat: optimize fillRectDither #737 only optimizes the code path of fillRectDither, which will eventually call drawPixel
  • This PR optimizes drawPixel itself, which is used by all other APIs inside GfxRenderer, including the fillRectDither mentioned above
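
To see why per-pixel overhead matters so much, consider a toy renderer in which every fill goes through drawPixel. This is a hypothetical sketch, not GfxRenderer's actual code: the `TinyRenderer` type, the 16x8 size, and the 1-bit buffer layout are all assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy dimensions; width is a multiple of 8 so rows align to whole bytes.
constexpr int kWidth = 16;
constexpr int kHeight = 8;

// Hypothetical 1-bit-per-pixel renderer (most significant bit is the leftmost pixel).
struct TinyRenderer {
  std::array<uint8_t, kWidth * kHeight / 8> buffer{};
  int pixelCalls = 0;  // counts how often the hot path runs

  void drawPixel(int x, int y, bool black) {
    ++pixelCalls;
    if (x < 0 || x >= kWidth || y < 0 || y >= kHeight) return;
    const std::size_t byteIndex = static_cast<std::size_t>((y * kWidth + x) / 8);
    const uint8_t mask = static_cast<uint8_t>(0x80u >> (x % 8));
    if (black) {
      buffer[byteIndex] = static_cast<uint8_t>(buffer[byteIndex] | mask);
    } else {
      buffer[byteIndex] = static_cast<uint8_t>(buffer[byteIndex] & ~mask);
    }
  }

  // Every higher-level primitive funnels into drawPixel, so any fixed cost
  // inside drawPixel is paid once per pixel.
  void fillRect(int x, int y, int w, int h, bool black) {
    for (int row = y; row < y + h; ++row) {
      for (int col = x; col < x + w; ++col) {
        drawPixel(col, row, black);
      }
    }
  }
};
```

Filling this 16x8 toy screen already calls drawPixel 128 times; on a real panel the count is in the hundreds of thousands, so a handful of instructions saved per call adds up to milliseconds.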

Just note that once #737 is merged, there will be a small conflict so I will need to push a merge commit.

FOR REVIEWER: please merge #737 before reviewing this one. Thanks!

CaptainFrito previously approved these changes Feb 8, 2026
@CaptainFrito CaptainFrito left a comment

Great optimization work!

@lukestein

> Just note that once #737 is merged, there will be a small conflict so I will need to push a merge commit.
>
> FOR REVIEWER: please merge #737 before reviewing this one. Thanks!

Thanks @ngxson. I note #737 is now merged so this can get ready whenever you are. 🚀

@ngxson ngxson dismissed stale reviews from CaptainFrito and lukestein via 67a738a February 8, 2026 13:18
ngxson commented Feb 8, 2026

Nice, thanks! Should be ready now

@daveallie daveallie merged commit a87eacc into crosspoint-reader:master Feb 8, 2026
5 checks passed
daveallie pushed a commit that referenced this pull request Feb 8, 2026
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 8, 2026
jdk2pq added a commit to jdk2pq/crosspoint-reader that referenced this pull request Feb 9, 2026
…king-space

* master:
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Add release candidate workflow
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  fix(ui): Add Back label in KOReader Sync screen (crosspoint-reader#770)
  fix: Add EPUB 3 cover image detection (crosspoint-reader#760)
  feat: A web editor for settings (crosspoint-reader#667)
  feat: add HalStorage (crosspoint-reader#656)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  feat: optimize fillRectDither (crosspoint-reader#737)
Marma92 pushed a commit to Marma92/crosspoint-reader that referenced this pull request Feb 10, 2026
Marma92 pushed a commit to Marma92/crosspoint-reader that referenced this pull request Feb 10, 2026
jdk2pq added a commit to jdk2pq/crosspoint-reader that referenced this pull request Feb 11, 2026
…king-space

* master:
  feat: use natural sort in file browser (crosspoint-reader#722)
  fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741)
  feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600)
  feat: Add Italian hyphenation support (crosspoint-reader#584)
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Move release candidate workflow to manual dispatch
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  feat: optimize fillRectDither (crosspoint-reader#737)
  fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727)
  fix: Remove separations after style changes (crosspoint-reader#720)
  fix: Lag before displaying covers on home screen (crosspoint-reader#721)
  feat: Add Settings for toggling CSS on or off (crosspoint-reader#717)
  Use GITHUB_HEAD_REF
  release: 1.0.0
jdk2pq added a commit to jdk2pq/crosspoint-reader that referenced this pull request Feb 13, 2026
* master: (25 commits)
  fix: Reduce MIN_SIZE_FOR_POPUP to 10KB (crosspoint-reader#809)
  docs: Update USER_GUIDE.md (crosspoint-reader#817)
  fix: Prevent sleeping when in OPDS browser / downloading books (crosspoint-reader#818)
  feat: Extend python debugging monitor functionality (keyword filter / suppress) (crosspoint-reader#810)
  docs: Update USER_GUIDE.md (crosspoint-reader#808)
  feat: Connect to last wifi by default (crosspoint-reader#752)
  feat: use natural sort in file browser (crosspoint-reader#722)
  fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741)
  feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600)
  feat: Add Italian hyphenation support (crosspoint-reader#584)
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Move release candidate workflow to manual dispatch
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  feat: optimize fillRectDither (crosspoint-reader#737)
  fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727)
  ...
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026
5 participants