Conversation
|
Crash can be reproduced with |
|
Not super surprising: various components like |
|
Missing a |

```cpp
void singelisort_i32(int32_t* data, size_t len) {
    std::vector<int32_t> aux_memory{};
    aux_memory.reserve(aux_alloc_size(len));
    sort32(data, static_cast<uint64_t>(len), aux_memory.data(),
           aux_memory.capacity() * 4);
}
```

(also, debug diff) |
|
And I've changed |
|
@dzaima good catch. You are right, the aux len is passed as size in bytes, not as size in elements. That's an unfortunate inconsistency compared to how the input len is passed. |
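Under that convention (input len in elements, aux size in bytes), a corrected wrapper might look as follows. This is only a sketch: `sort32` here is a toy stand-in (the real entry point is generated from sort.singeli), and `aux_alloc_size` is a placeholder assumed to return an element count. The other fix is constructing the vector with a size instead of `reserve()`, since writing through `data()` into merely reserved storage is undefined behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stand-in for the real SingeliSort entry point, just to exercise the
// wrapper. Assumed convention: `len` is in elements, `aux_bytes` in BYTES.
static void sort32(int32_t* data, uint64_t len, void* aux, uint64_t aux_bytes) {
    assert(aux_bytes >= len * sizeof(int32_t));  // enough scratch for a copy
    std::memcpy(aux, data, len * sizeof(int32_t));
    int32_t* a = static_cast<int32_t*>(aux);
    std::sort(a, a + len);
    std::memcpy(data, aux, len * sizeof(int32_t));
}

// Placeholder: assumed to return the required aux size in ELEMENTS.
static size_t aux_alloc_size(size_t len) { return len; }

void singelisort_i32(int32_t* data, size_t len) {
    // Sized construction instead of reserve(): the buffer must actually
    // exist before sort32 writes into it.
    std::vector<int32_t> aux_memory(aux_alloc_size(len));
    sort32(data, static_cast<uint64_t>(len), aux_memory.data(),
           aux_memory.size() * sizeof(int32_t));  // bytes, not elements
}
```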
Thanks will check out. |
I have a generalized version of parity merge: https://github.com/Voultapher/sort-research-rs/blob/main/src/graveyard/ipn_stable_with_probe_common.rs#L1064. If you don't care about Ord safety you can even remove some of the checks. |
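For readers unfamiliar with the technique, the core of a parity merge for two equal-length sorted runs can be sketched in C++ as follows (a simplified illustration of the idea, not the code behind the link above):

```cpp
#include <cstddef>

// Parity merge sketch: merge two sorted runs of equal length n, laid out
// contiguously in src (left run, then right run), into dest. One cursor
// pair merges n elements forward from the fronts, another merges n
// elements backward from the backs; for equal-length runs the two passes
// together produce the full merged order, with no per-iteration bounds
// checks and branchless cursor updates.
void parity_merge(const int* src, int* dest, size_t n) {
    const int* l = src;               // forward cursor, left run
    const int* r = src + n;           // forward cursor, right run
    int* d = dest;                    // forward output
    const int* tl = src + n - 1;      // backward cursor, left run
    const int* tr = src + 2 * n - 1;  // backward cursor, right run
    int* td = dest + 2 * n - 1;       // backward output

    for (size_t i = 0; i < n; i++) {
        bool c = *l <= *r;            // comparison steers the cursor bump
        *d++ = c ? *l : *r;
        l += c;
        r += !c;

        bool tc = *tl <= *tr;         // ties taken from the right, keeping
        *td-- = tc ? *tr : *tl;       // the merge stable
        tl -= !tc;
        tr -= tc;
    }
}
```

Note the precondition: both runs must have the same length n >= 1; the generalized version linked above is what relaxes this.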
|
Aux is interpreted as many different types by different algorithms, so I felt like bytes was the only measure that made sense. The |
(force-pushed from bed1ff2 to badf8cb)
|
Tests now pass, but I haven't done any fancy fuzzing or anything like that. |
(force-pushed from badf8cb to b6dfa83)
|
Running benchmarks now, will post results here. |
That is an internal detail; you could easily divide by |
|
Interactive graphs: analysis.tar.gz. Results look impressive. Surprisingly though, the pattern perf sees large differences between |
|
SingeliSort is modular, so if you want, say, a pure counting sort, you can make a variation on sort.singeli and compile one for yourself. Building in a division makes this more confusing. Also, aux is consistently a void pointer and I just think it's weird to measure it in varying units; it's bench.c that's sloppy about this, not SingeliSort.

There's a lot to clean up about aux usage, and I'd like to be able to operate with less space for large arrays. The distribution base cases generally check aux size; merging in glidesort should use swaps to cut down on space if there isn't enough. I'll add an allocating version at some point.

I can reproduce the random_d20 slowdown in your benchmarks, but not in C (including compiled as C++) using |
I guess the way I produce my |

```rust
.map(|val| -> u64 {
    // Extends the value into the 64 bit range,
    // while preserving input order.
    let x = ((*val as i64) + (i32::MAX as i64) + 1) as u64;
    x.checked_mul(i32::MAX as u64).unwrap()
})
```
|
All right, so it no longer fits into a small range. Using |
|
The slower results from Zipf make sense given that Robin Hood sorting doesn't apply to such a distribution, so it has to actually do some merging. RH doesn't apply to d20 either, but quicksort's low-cardinality handling does. For Zipf there's apparently not enough duplication; it improves to be nearly equal to uniform random when the size is bumped to a million. I'd suggest having at least one benchmark that does some distribution over a small number of values picked randomly from the entire range (or some large range in the case of strings etc.). |
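Such a benchmark pattern might look like the following sketch (the function name and parameters are my own, not from either benchmark suite): pick a small set of values uniformly from the whole u64 range, then fill the input by sampling from that set, so duplicates are frequent but the values themselves are spread over the full range.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical benchmark input generator: `cardinality` distinct values
// drawn from the full 64-bit range, then `len` uniform samples from them.
std::vector<uint64_t> random_small_cardinality(size_t len, size_t cardinality,
                                               uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::vector<uint64_t> values(cardinality);
    for (uint64_t& v : values) v = rng();  // spread over the full u64 range
    std::uniform_int_distribution<size_t> pick(0, cardinality - 1);
    std::vector<uint64_t> out(len);
    for (uint64_t& x : out) x = values[pick(rng)];
    return out;
}
```

Replacing the uniform pick with a Zipf-like weighting would cover the skewed-duplication case as well.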
I have a collection of additional patterns: sort-research-rs/benches/bench.rs, line 187 in 5e07fb6.
That doesn't seem to be true. |
|
I'd suggest randomizing the range of |
|
Sorry for the confusion: the |
|
The 99p_zero_1p_random-10000 test uncovered a segfault caused by |
|
SingeliSort does poorly with repeated element tests like u64-random_d1024-10000, somewhat worse than fluxsort and more like pdq. This is a case where Robin Hood sorting is used but is detrimental. The ratio of ~10 unique copies of each element is the worst case, because it doesn't get caught by a sample but ends up with about 5 collisions per hash insertion, which is a lot. On the other hand, if there were a way of dealing with exact duplicates, probably gated on some sort of collision counter, then RH would end up much faster than other sorts. I tested out a method using a hash table with 4-byte ints in BQN, and it came out even better than a radix sort. This would be a not-even-unstable optimization like counting sort.

For larger numbers of duplicates, as in u64-random_d64-10000, switching to merge sort so quickly (<192) is detrimental because it doesn't adapt to duplicates. Also, that threshold was tuned with i32, and it looks like something around 96 is better for u64. When the number gets large enough, as in u64-random_d20-10000, this no longer matters, but SingeliSort is still a little behind fluxsort. Annoying, since I designed fluxsort's low-cardinality handling in the first place (with reference to pdqsort). I think SingeliSort is doing too much work when it gets to a partition that has all equal values. [I'll just edit this bit in: wrong, that part could maybe be improved but is about as good as fluxsort. The cost is a mix of switching to merge sort or other base cases, and overhead from doing a sqrt(n) sample, which becomes relevant because the total cost is so low.]

I haven't said it yet: thanks for all your work on these tests! |
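For illustration, the count-exact-duplicates idea can be sketched in plain C++ with a hash table. This is not SingeliSort's Robin Hood mechanism or the BQN experiment mentioned above, just the general shape of the optimization: tally each distinct value, sort only the distinct keys, then expand the counts, so ~10 copies per value means sorting d keys instead of n elements.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch: sort by counting exact duplicates. For plain integer keys the
// result is indistinguishable from a stable sort, making this a
// "not-even-unstable" optimization in the counting-sort family.
void sort_by_counting_duplicates(std::vector<int32_t>& data) {
    std::unordered_map<int32_t, size_t> counts;
    for (int32_t v : data) counts[v]++;           // tally each distinct value

    std::vector<int32_t> keys;
    keys.reserve(counts.size());
    for (const auto& kv : counts) keys.push_back(kv.first);
    std::sort(keys.begin(), keys.end());          // sort only distinct keys

    size_t i = 0;                                  // expand counts in order
    for (int32_t k : keys)
        for (size_t c = counts[k]; c-- > 0;) data[i++] = k;
}
```

In a real sort this path would only be taken when a collision counter (or similar probe) indicates heavy duplication; for mostly-distinct inputs the hash table is pure overhead.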
I want to look into that in detail, thanks for the suggestion. Not sure when I'll find the time.
Interesting; you are right that, in addition to my test suite, the benchmark suite can, by virtue of running many times and with additional patterns, be seen as a kind of test.
I have to admit that I haven't looked closely at radix or counting sorts yet, so I only understood half of that.
One idea I can provide, which also fixes some of the issues in glidesort, is to have |
I wasn't aware you were involved in the development of fluxsort. You are not credited as far as I can tell. |
|
Glidesort is using a sqrt(n/2) threshold as of v0.1.2: commit. This seems not very principled to me. Working it out now, if merge is 20% slower than partition and the whole array is made of length- |
|
I've mangled glidesort quite a bit. The run finder actually always searches until it finds a long enough run, so unsorted sections all get grouped together. Then in the logical merge part it ignores sortedness if it would merge an unsorted section with a substantially smaller sorted one. Still not perfect, but one run in the middle can't force everything to use merges.

I just thought of a simple test for a single repeated element: once you find the run, compare the elements 1/3 and 2/3 along it. They're always equal if 2/3 of the elements are the same, and never if less than 1/3 are. Not a full solution, but it seems nice and practical.

The issue with u64-random_d64-10000 was quicksort switching to the base merge, though: the glide layer never gets involved, as there aren't natural runs.

There's a comment in fluxsort's source. I don't care about attribution though, and have told Scandum that. |
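The 1/3-2/3 probe described above is essentially a one-liner; a sketch (the function name is made up). One rounding caveat with these exact indices: within a sorted run equal elements are contiguous, and the "always equal" direction holds once a single value fills strictly more than 2/3 of the run, since any contiguous block longer than 2len/3 must cover both probe positions; a block shorter than len/3 can never cover both.

```cpp
#include <cstddef>

// Probe a sorted run for a dominant repeated element: the positions at
// 1/3 and 2/3 of the run are len/3 apart, so a value filling more than
// 2/3 of the run always matches, and one filling less than 1/3 never does.
bool run_mostly_one_value(const int* run, size_t len) {
    return run[len / 3] == run[(2 * len) / 3];
}
```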
Still WIP: SingeliSort crashes for certain patterns, and the u64 sort is missing.