tests/bench/runtime_coreapis: add clist_sort() benchmark#21403
tests/bench/runtime_coreapis: add clist_sort() benchmark#21403mguetschow merged 2 commits intoRIOT-OS:masterfrom
Conversation
|
Following the discussion from the Matrix chat, would it make sense to add a "reverse list" and "mostly sorted" list to the Benchmark as well? |
|
This is currently the reversed list (worst case) from the sorting algorithm's PoV. (All relevant have perfectly sorted in wrong order as worst case, or have no worst case). The issue that Kaspar has is about cache locality: Right now two neighbouring nodes would also be adjacent in memory, as each cycle would now reverses the list, leaving topological neighbours also adjacent in mem. For more realistical numbers, not having all list nodes comeing from the same buffer but being more scattered would be nices. |
|
Does the current merge sort behave very weird? As in, sorted data seems to be its worst case? |
Are you sure? I wasn't aware the nrf52840 has a memory cache. I'm about to get pretty nerd sniped here. :) |
kaspar030
left a comment
There was a problem hiding this comment.
This is now not using the warmup anymore. IMO it is not needed (did you observe values changing?). Anyhow, feel free to re-add that, split out the change to benchmark.h, or just leave it in. :)
The formatting is a bit off here:
2025-04-13 22:24:53,206 # clist_sort (#4, rev): 7us --- 7.000us per call --- 142857 calls per sec
2025-04-13 22:24:53,229 # clist_sort (#4, prng): 7us --- 7.000us per call --- 142857 calls per sec
2025-04-13 22:24:53,251 # clist_sort (#4, sort): 7us --- 7.000us per call --- 142857 calls per sec
2025-04-13 22:24:53,273 # clist_sort (#4, ≈sort): 7us --- 7.000us per call --- 142857 calls per sec
2025-04-13 22:24:53,296 # clist_sort (#8, rev): 17us --- 17.000us per call --- 58823 calls per sec
2025-04-13 22:24:53,318 # clist_sort (#8, prng): 15us --- 15.000us per call --- 66666 calls per sec
2025-04-13 22:24:53,340 # clist_sort (#8, sort): 17us --- 17.000us per call --- 58823 calls per sec
2025-04-13 22:24:53,362 # clist_sort (#8, ≈sort): 16us --- 16.000us per call --- 62500 calls per sec
... but I don't think that matters.
ACK.
|
Ah, I did a couple of runs with even larger sizes, with this and the insertion sort from #21398. The values make sense to me now. 👍 |
3e231de to
a893300
Compare
As far as I know the nRF52840 has no data cache, only an instruction cache (but I haven't looke d into the data sheet for a long time). I got confused with a test run on native. |
a893300 to
7601771
Compare
7601771 to
7c59103
Compare
I rephrased the
I split it out. I think it helped a bit with the numbers on |
4235f8b to
1353f2f
Compare
1353f2f to
865a96d
Compare
865a96d to
d6d14c7
Compare
This adds a benchmark for clist_sort() and calls that on three different list sizes. Co-authored-by: Kaspar Schleiser <[email protected]> Co-authored-by: Mikolai Gütschow <[email protected]>
On `native32` / `native64` in the CI, the benchmark results will be unusable due to the background load of other build tasks anyway. There is a risk that due to the high number of benchmark runs the test will time out. So to avoid the timeouts and because the results are unusable anyway, the benchmark runs have been drastically reduced. This will still allow the CI to confirm that the benchmarks do run successfully to completion.
d6d14c7 to
6b29409
Compare
mguetschow
left a comment
There was a problem hiding this comment.
Thanks!
(Adding a benchmark test right before hard freeze shouldn't do any harm, should it? 🪀 )
Contribution description
This adds a benchmark for clist_sort() and calls that on three different list sizes.
Testing procedure
Run the benchmark and check whether the run times are plausible.
Issues/PRs references
None