Skip to content

tests/bench/runtime_coreapis: add clist_sort() benchmark#21403

Merged
mguetschow merged 2 commits intoRIOT-OS:masterfrom
maribu:tests/bench/runtime_coreapis
Apr 14, 2025
Merged

tests/bench/runtime_coreapis: add clist_sort() benchmark#21403
mguetschow merged 2 commits intoRIOT-OS:masterfrom
maribu:tests/bench/runtime_coreapis

Conversation

@maribu
Copy link
Copy Markdown
Member

@maribu maribu commented Apr 12, 2025

Contribution description

This adds a benchmark for clist_sort() and calls that on three different list sizes.

Testing procedure

Run the benchmark and check whether the run times are plausible.

Issues/PRs references

None

@maribu maribu added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Apr 12, 2025
@riot-ci
Copy link
Copy Markdown

riot-ci commented Apr 12, 2025

Murdock results

✔️ PASSED

6b29409 tests/bench/runtime_coreapis: reduce benchmark runs in the CI

Success Failures Total Runtime
20 0 21 01m:50s

Artifacts

@crasbe
Copy link
Copy Markdown
Contributor

crasbe commented Apr 12, 2025

Following the discussion from the Matrix chat, would it make sense to add a "reverse list" and "mostly sorted" list to the Benchmark as well?

@maribu
Copy link
Copy Markdown
Member Author

maribu commented Apr 12, 2025

This is currently the reversed list (worst case) from the sorting algorithm's PoV. (All relevant have perfectly sorted in wrong order as worst case, or have no worst case).

The issue that Kaspar has is about cache locality: Right now two neighbouring nodes would also be adjacent in memory, as each cycle would now reverses the list, leaving topological neighbours also adjacent in mem.

For more realistical numbers, not having all list nodes comeing from the same buffer but being more scattered would be nices.

@github-actions github-actions bot added the Area: sys Area: System label Apr 12, 2025
@kaspar030
Copy link
Copy Markdown
Contributor

Does the current merge sort behave very weird? As in, sorted data seems to be its worst case?

@kaspar030
Copy link
Copy Markdown
Contributor

The issue that Kaspar has is about cache locality: Right now two neighbouring nodes would also be adjacent in memory, as each cycle would now reverses the list, leaving topological neighbours also adjacent in mem.

Are you sure? I wasn't aware the nrf52840 has a memory cache.

I'm about to get pretty nerd sniped here. :)

Copy link
Copy Markdown
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now not using the warmup anymore. IMO it is not needed (did you observe values changing?). Anyhow, feel free to re-add that, split out the change to benchmark.h, or just leave it in. :)

The formatting is a bit off here:

2025-04-13 22:24:53,206 #      clist_sort (#4, rev):         7us  ---   7.000us per call  ---     142857 calls per sec    
2025-04-13 22:24:53,229 #     clist_sort (#4, prng):         7us  ---   7.000us per call  ---     142857 calls per sec    
2025-04-13 22:24:53,251 #     clist_sort (#4, sort):         7us  ---   7.000us per call  ---     142857 calls per sec    
2025-04-13 22:24:53,273 #  clist_sort (#4, ≈sort):         7us  ---   7.000us per call  ---     142857 calls per sec      
2025-04-13 22:24:53,296 #      clist_sort (#8, rev):        17us  ---  17.000us per call  ---      58823 calls per sec    
2025-04-13 22:24:53,318 #     clist_sort (#8, prng):        15us  ---  15.000us per call  ---      66666 calls per sec     
2025-04-13 22:24:53,340 #     clist_sort (#8, sort):        17us  ---  17.000us per call  ---      58823 calls per sec     
2025-04-13 22:24:53,362 #  clist_sort (#8, ≈sort):        16us  ---  16.000us per call  ---      62500 calls per sec       

... but I don't think that matters.

ACK.

@kaspar030
Copy link
Copy Markdown
Contributor

Ah, I did a couple of runs with even larger sizes, with this and the insertion sort from #21398. The values make sense to me now. 👍

@maribu maribu mentioned this pull request Apr 13, 2025
2 tasks
@maribu maribu force-pushed the tests/bench/runtime_coreapis branch from 3e231de to a893300 Compare April 13, 2025 20:47
@maribu
Copy link
Copy Markdown
Member Author

maribu commented Apr 13, 2025

The issue that Kaspar has is about cache locality: Right now two neighbouring nodes would also be adjacent in memory, as each cycle would now reverses the list, leaving topological neighbours also adjacent in mem.

Are you sure? I wasn't aware the nrf52840 has a memory cache.

As far as I know the nRF52840 has no data cache, only an instruction cache (but I haven't looke d into the data sheet for a long time). I got confused with a test run on native.

@maribu maribu force-pushed the tests/bench/runtime_coreapis branch from a893300 to 7601771 Compare April 13, 2025 20:52
@maribu maribu force-pushed the tests/bench/runtime_coreapis branch from 7601771 to 7c59103 Compare April 13, 2025 20:56
@github-actions github-actions bot removed the Area: sys Area: System label Apr 13, 2025
@maribu
Copy link
Copy Markdown
Member Author

maribu commented Apr 13, 2025

The formatting is a bit off here:

I rephrased the ≈sort to alm.srt. The approx sign is two bytes in length.

This is now not using the warmup anymore. IMO it is not needed (did you observe values changing?). Anyhow, feel free to re-add that, split out the change to benchmark.h, or just leave it in. :)

I split it out. I think it helped a bit with the numbers on native being a bit more stable, but that might be luck.

@maribu maribu force-pushed the tests/bench/runtime_coreapis branch 2 times, most recently from 4235f8b to 1353f2f Compare April 14, 2025 09:44
@maribu maribu force-pushed the tests/bench/runtime_coreapis branch from 1353f2f to 865a96d Compare April 14, 2025 11:19
@maribu maribu force-pushed the tests/bench/runtime_coreapis branch from 865a96d to d6d14c7 Compare April 14, 2025 11:23
maribu and others added 2 commits April 14, 2025 13:36
This adds a benchmark for clist_sort() and calls that on three different
list sizes.

Co-authored-by: Kaspar Schleiser <[email protected]>
Co-authored-by: Mikolai Gütschow <[email protected]>
On `native32` / `native64` in the CI, the benchmark results will be
unusable due to the background load of other build tasks anyway. There
is a risk that due to the high number of benchmark runs the test will
time out.

So to avoid the timeouts and because the results are unusable anyway,
the benchmark runs have been drastically reduced. This will still
allow the CI to confirm that the benchmarks do run successfully to
completion.
@maribu maribu force-pushed the tests/bench/runtime_coreapis branch from d6d14c7 to 6b29409 Compare April 14, 2025 11:36
Copy link
Copy Markdown
Contributor

@mguetschow mguetschow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

(Adding a benchmark test right before hard freeze shouldn't do any harm, should it? 🪀 )

@mguetschow mguetschow added this pull request to the merge queue Apr 14, 2025
@crasbe crasbe added this to the Release 2025.04 milestone Apr 14, 2025
Merged via the queue into RIOT-OS:master with commit ce273ac Apr 14, 2025
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants