
Iterate backwards on zdiff/zinter/zunion to optimize for zslInsert#8105

Merged
oranagra merged 2 commits into redis:unstable from
felipou:optimize-zdiffinterunion-insertion
Dec 3, 2020

Conversation

@felipou
Contributor

@felipou felipou commented Nov 28, 2020

In the iterator for these functions, we'll traverse the sorted sets
in a reversed way so that largest elements come first. We prefer
this order because it's optimized for insertion in a skiplist, which
is the destination of the elements being iterated in these functions.

I didn't modify the iteration code for sets (since it isn't ordered anyway),
and I also kept the function names (didn't rename to "reverse" or something),
just added a comment to zuiInitIterator. I feel like this should be further
documented, but I wasn't sure where to put further explanations.

I just noticed that the "ui" in "zuiInitIterator" is probably "union" and "inter".
Maybe we should rename all "zui" functions to "zuid"?

In the iterator for these functions, we'll traverse the sorted sets
in a reversed way so that largest elements come first. We prefer
this order because it's optimized for insertion in a skiplist, which
is the destination of the elements being iterated in these functions.
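The intuition behind "optimized for insertion" can be illustrated with a much simpler structure than a skiplist. The sketch below is an editorial illustration, not Redis code: inserts into a sorted singly-linked list search from the head, the same direction zslInsert walks each level. Feeding elements largest-first means every search stops at the head, while smallest-first degrades to a full scan. A real skiplist stays O(log n) either way, but descending input still short-circuits the per-level search in the same fashion.

```python
class Node:
    def __init__(self, score):
        self.score = score
        self.next = None

class SortedList:
    """Ascending singly-linked list whose insert searches from the head,
    loosely mimicking how zslInsert walks a skiplist from its header."""
    def __init__(self):
        self.head = None
        self.comparisons = 0

    def insert(self, score):
        prev, cur = None, self.head
        while cur is not None:
            self.comparisons += 1
            if cur.score >= score:  # found the insertion point
                break
            prev, cur = cur, cur.next
        node = Node(score)
        node.next = cur
        if prev is None:
            self.head = node
        else:
            prev.next = node

def total_comparisons(scores):
    lst = SortedList()
    for s in scores:
        lst.insert(s)
    return lst.comparisons

ascending = total_comparisons(range(1000))          # each insert scans the whole list
descending = total_comparisons(range(999, -1, -1))  # each insert stops at the head
print(ascending, descending)  # 499500 999
```

The gap (quadratic vs. linear total work) is much larger than what a skiplist sees, which is consistent with the modest benchmark gains reported later in this thread.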
@felipou felipou marked this pull request as draft November 28, 2020 23:31
@felipou
Contributor Author

felipou commented Nov 28, 2020

@oranagra

@oranagra
Member

My initial thought was that this iterator (zuiInitIterator) should take a boolean input indicating if the iteration should be forward or backward, and support both, but if it's only gonna be used in one direction, i guess there's no need for that.

Regarding the name, interesting observation about the "ui" prefix, but in that case we should have changed it to "uid" (union, intersection, diff), but IMHO that's lame.
Note that the object it uses for the iteration is called "zsetopval" (i.e. "op" for "operation"). that looks better IMHO, so if anything we can rename all of these to "op".
However, i don't mind leaving them as is for now.

I would like to ask that you conduct a simple benchmark on big objects to show that this change really improves performance (or at least doesn't hurt it). i.e. run ZUNIONSTORE, ZINTERSTORE and ZDIFFSTORE (both algorithms) on large sets with the old and new code and measure the time (maybe using CONFIG RESETSTAT and INFO COMMANDSTATS). maybe do that once on skiplists, and once for ziplists (after setting zset-max-ziplist-entries to be very high).

IIUC in ZUNION, there's no benefit to going backwards or forwards; maybe we'll notice that forward is more cache friendly.
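The CONFIG RESETSTAT / INFO COMMANDSTATS measurement suggested above boils down to reading lines like `cmdstat_zdiffstore:calls=30,usec=...,usec_per_call=...` after each run. A minimal parser for that section might look like this (a sketch assuming the raw INFO text is already in hand, e.g. from `redis-cli info commandstats`; the sample reply reuses numbers posted later in this thread):

```python
def parse_commandstats(info_text):
    """Parse the 'commandstats' section of an INFO reply into per-command dicts."""
    stats = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip section headers and blank lines
        name, fields = line.split(":", 1)
        entry = {}
        for field in fields.split(","):
            key, value = field.split("=")
            entry[key] = float(value) if "." in value else int(value)
        stats[name] = entry
    return stats

reply = """# Commandstats
cmdstat_zdiffstore:calls=30,usec=61575448,usec_per_call=2052514.88
cmdstat_zunionstore:calls=30,usec=81069605,usec_per_call=2702320.25
"""
stats = parse_commandstats(reply)
print(stats["cmdstat_zdiffstore"]["usec_per_call"])  # 2052514.88
```

Calling CONFIG RESETSTAT before each batch of commands makes the `calls` and `usec` counters reflect only that batch.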

@itamarhaber
Member

Heya @felipou and thanks for this. Would it be possible to back up the "We prefer this order..." claim with a simple before/after benchmark?

@felipou
Contributor Author

felipou commented Nov 29, 2020

Heya @felipou and thanks for this. Would it be possible to back up the "We prefer this order..." claim with a simple before/after benchmark?

@itamarhaber Yes, will do, I'll follow @oranagra suggestions!

@felipou
Contributor Author

felipou commented Nov 29, 2020

My initial thought was that this iterator (zuiInitIterator) should take a boolean input indicating if the iteration should be forward or backward, and support both, but if it's only gonna be used in one direction, i guess there's no need for that.

I started thinking along those lines; I was going to implement a zuiInitReverseIterator and zuiPrev, but then I noticed that we could use the same direction in all cases, so I decided it was simpler to just always reverse the order.

Regarding the name, interesting observation about the "ui" prefix, but in that case we should have changed it to "uid" (union, intersection, diff), but IMHO that's lame.
Note that the object it uses for the iteration is called "zsetopval" (i.e. "op" for "operation"). that looks better IMHO, so if anything we can rename all of these to "op".
However, i don't mind leaving them as is for now.

Ok, I'll leave them as is (good idea for the new name though).

I would like to ask that you conduct a simple benchmark on big objects to show that this change really improves performance (or at least doesn't hurt it). i.e. run ZUNIONSTORE, ZINTERSTORE and ZDIFFSTORE (both algorithms) on large sets with the old and new code and measure the time (maybe using CONFIG RESETSTAT and INFO COMMANDSTATS). maybe do that once on skiplists, and once for ziplists (after setting zset-max-ziplist-entries to be very high).

IIUC in ZUNION, there's no benefit to going backwards or forwards; maybe we'll notice that forward is more cache friendly.

Ok, I'll try to run some benchmarks today, thanks for the tips!

@felipou
Contributor Author

felipou commented Nov 29, 2020

I created a simple script for an initial benchmark with Python: https://gist.github.com/felipou/50a697856a24e1910f332648f34ba94c

Here are the results so far:

optimize-zdiffinterunion-insertion branch:
$ python bench_zdiff.py 
cmdstat_zdiffstore {'calls': 30, 'usec': 61575448, 'usec_per_call': 2052514.88}
cmdstat_zunionstore {'calls': 30, 'usec': 81069605, 'usec_per_call': 2702320.25}


unstable branch:
$ python bench_zdiff.py 
cmdstat_zdiffstore {'calls': 30, 'usec': 63695331, 'usec_per_call': 2123177.75}
cmdstat_zunionstore {'calls': 30, 'usec': 82097329, 'usec_per_call': 2736577.5}

That shows a 3.33% reduction in time for zdiff, and a 1.25% reduction for zunion. I expected better results, but after inspecting the zslInsert code more closely, I've come to think that the performance gain from inserting in this order isn't that great compared to the constant costs of the algorithm.

But anyway, if you think this seems worth it, I can run some improved benchmarks (computing the standard deviation to check whether there really was a significant improvement, for example), and also create a benchmark for zinter.
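For reference, the percentages quoted above are just the relative drop in usec_per_call between the two branches; recomputing them from the raw numbers:

```python
def reduction_pct(old, new):
    """Percentage reduction in average time per call."""
    return (old - new) / old * 100

# usec_per_call values from the two runs above
zdiff_gain = reduction_pct(2123177.75, 2052514.88)
zunion_gain = reduction_pct(2736577.5, 2702320.25)
print(round(zdiff_gain, 2), round(zunion_gain, 2))  # 3.33 1.25
```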

@itamarhaber
Member

I created a simple script for an initial benchmark...

Nicely done, thank you!

@oranagra
Member

i also expected a bigger impact, maybe if you use bigger objects the impact will be greater. (now each call takes just 2ms, maybe scale by 100 so that each call takes 200ms).

since it's not a big effort and you're almost there, i'd rather complete the effort (i'm paranoid that in some of the cases maybe we're causing damage and we won't realize it).
i don't think we need standard deviation; either the impact will be clearly apparent, or not. if not, then at least we didn't cause any apparent damage.

i do think we want to cover all cases (i.e. including ziplists, and both zdiff algorithms)... assuming the effort isn't big.
p.s. ziplists may be slow, so you won't need very big objects (although still bigger than the default zset-max-ziplist-entries).
p.p.s. if the insertion takes too long, IMHO you can drop the random score for ZADD and either use sequential scores, or just save these to an RDB file to be re-used by the followup tests.
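One way to act on the p.p.s. above is to pre-generate the ZADD commands with sequential scores and feed them to `redis-cli --pipe`, which accepts raw RESP-encoded commands for mass insertion. This is an assumed workflow (key and member names are hypothetical), not something from the thread:

```python
def resp_command(*args):
    # Encode a single command in RESP, the wire format `redis-cli --pipe` expects:
    # "*<argc>\r\n" followed by "$<len>\r\n<arg>\r\n" for each argument.
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = str(arg).encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)

def zadd_payload(key, n):
    # Sequential scores instead of random ones, so generation is cheap and
    # the same dataset can be reproduced for follow-up runs.
    return b"".join(resp_command("ZADD", key, i, f"member:{i}") for i in range(n))

payload = zadd_payload("bench:zset", 1)
```

Writing the payload to a file and running `redis-cli --pipe < data.resp` avoids one round trip per ZADD; saving an RDB after loading, as suggested above, makes re-runs even cheaper.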

@felipou
Contributor Author

felipou commented Nov 29, 2020

i also expected a bigger impact, maybe if you use bigger objects the impact will be greater. (now each call takes just 2ms, maybe scale by 100 so that each call takes 200ms).

It's actually around 2s. At least that's what I observed, and I assumed "usec" was microseconds, i.e. 10^-6 seconds, giving 2.12 seconds from the result of 2123177.75 usec per call. Previously I had tested with 10 million elements, and it took around 20 seconds.

since it's not a big effort and you're almost there, i'd rather complete the effort (i'm paranoid that in some of the cases maybe we're causing damage and we won't realize it).
i don't think we need standard deviation; either the impact will be clearly apparent, or not. if not, then at least we didn't cause any apparent damage.

i do think we want to cover all cases (i.e. including ziplists, and both zdiff algorithms)... assuming the effort isn't big.
p.s. ziplists may be slow, so you won't need very big objects (although still bigger than the default zset-max-ziplist-entries).
p.p.s. if the insertion takes too long, IMHO you can drop the random score for ZADD and either use sequential scores, or just save these to an RDB file to be re-used by the followup tests.

No problem, I don't think the effort will be too big. I'll come up with a zinter benchmark, then try some of the optimization tricks you suggested to avoid the slow process of zadds. Then I'll just prepare the code to run everything and that's it. Should be finished by tomorrow.

@felipou
Contributor Author

felipou commented Dec 1, 2020

Just updated the benchmark codes, here are the results for the skiplist (the percentages are the reduction in average time):

zdiffstore   8.2%
zunionstore  0.7%
zinterstore  4.9%
zdiff        6.5%
zunion       0.8%
zinter       4.4%

Raw results:

unstable branch

cmdstat_zdiffstore {'calls': 30, 'usec': 20479755, 'usec_per_call': 682658.56}
cmdstat_zunionstore {'calls': 30, 'usec': 82061999, 'usec_per_call': 2735400.0}
cmdstat_zinterstore {'calls': 30, 'usec': 12285768, 'usec_per_call': 409525.59}
cmdstat_zdiff {'calls': 30, 'usec': 26110391, 'usec_per_call': 870346.38}
cmdstat_zunion {'calls': 30, 'usec': 96385441, 'usec_per_call': 3212848.0}
cmdstat_zinter {'calls': 30, 'usec': 14372887, 'usec_per_call': 479096.22}


optimize-zdiffinterunion-insertion branch

cmdstat_zdiffstore {'calls': 30, 'usec': 18801110, 'usec_per_call': 626703.69}
cmdstat_zunionstore {'calls': 30, 'usec': 81522011, 'usec_per_call': 2717400.25}
cmdstat_zinterstore {'calls': 30, 'usec': 11679567, 'usec_per_call': 389318.91}
cmdstat_zdiff {'calls': 30, 'usec': 24413496, 'usec_per_call': 813783.19}
cmdstat_zunion {'calls': 30, 'usec': 95623414, 'usec_per_call': 3187447.25}
cmdstat_zinter {'calls': 30, 'usec': 13741229, 'usec_per_call': 458040.97}

Now I'll benchmark with ziplists (setting zset-max-ziplist-entries as @oranagra suggested), will post the results soon.
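Forcing the ziplist encoding for the benchmark can be done at runtime rather than in redis.conf; a sketch of that setup (key name hypothetical; note Redis converts a zset to a skiplist once either limit is exceeded):

```shell
# Raise the encoding limits well above the benchmark sizes so the
# source zsets stay ziplist-encoded, then verify before measuring.
redis-cli config set zset-max-ziplist-entries 1000000
redis-cli config set zset-max-ziplist-value 1000
redis-cli object encoding bench:zset:1
```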

@oranagra
Member

oranagra commented Dec 1, 2020

is there a reason to benchmark both zdiff and zdiffstore? shouldn't they give similar results? (8% and 6% are similar in my eyes 8-).
on the other hand i think it would be good to benchmark both zdiff algorithms.

@itamarhaber
Member

FYI @filipecosta90

@felipou
Contributor Author

felipou commented Dec 2, 2020

is there a reason to benchmark both zdiff and zdiffstore? shouldn't they give similar results? (8% and 6% are similar in my eyes 8-).
on the other hand i think it would be good to benchmark both zdiff algorithms.

Actually no, I thought someone had asked for it, but now I see that I was mistaken. Anyway, I just finished the benchmarks forcing ziplists with 50k elements (it was already slow enough to build the zsets at this size):

zdiffstore   3.7%
zunionstore  1.1%
zinterstore  0.9%
zdiff        0.6%
zunion       -1.4%
zinter       1.2%

And the raw results:

unstable branch

cmdstat_zdiffstore {'calls': 30, 'usec': 3614248, 'usec_per_call': 120474.93}
cmdstat_zunionstore {'calls': 30, 'usec': 6515481, 'usec_per_call': 217182.7}
cmdstat_zinterstore {'calls': 30, 'usec': 212673932, 'usec_per_call': 7089131.0}
cmdstat_zdiff {'calls': 30, 'usec': 3067841, 'usec_per_call': 102261.37}
cmdstat_zunion {'calls': 30, 'usec': 4973754, 'usec_per_call': 165791.8}
cmdstat_zinter {'calls': 30, 'usec': 212498954, 'usec_per_call': 7083298.5}


optimize-zdiffinterunion-insertion branch

cmdstat_zdiffstore {'calls': 30, 'usec': 3479659, 'usec_per_call': 115988.63}
cmdstat_zunionstore {'calls': 30, 'usec': 6444188, 'usec_per_call': 214806.27}
cmdstat_zinterstore {'calls': 30, 'usec': 210749675, 'usec_per_call': 7024989.5}
cmdstat_zdiff {'calls': 30, 'usec': 3049990, 'usec_per_call': 101666.34}
cmdstat_zunion {'calls': 30, 'usec': 5042516, 'usec_per_call': 168083.86}
cmdstat_zinter {'calls': 30, 'usec': 210030685, 'usec_per_call': 7001023.0}

If I'm not mistaken, the zslInserts will always be performed on a skiplist; I believe that's how the target zset is initialized in these algorithms. Since we're using much smaller zsets here (the previous benchmark ran with zsets of 1 million elements, and now just 50k), I think that's why it didn't show a larger performance gain.

I'll try to benchmark both zdiff algorithms now using a skiplist.

@felipou
Contributor Author

felipou commented Dec 2, 2020

The results (reduction in average time):
algo1 3.2%
algo2 9.9%

@felipou felipou marked this pull request as ready for review December 2, 2020 02:06
@oranagra
Member

oranagra commented Dec 2, 2020

well, the improvement isn't huge, but it's certainly there, and there's probably no regression (zunion -1.4% is probably a flaky benchmark).

since we went that distance, let's make sure of that before we close this campaign.
i.e. please re-test zunion; if the new iteration is consistently slower, we may want to go the other way for it, but i suppose they're both the same.

@felipou
Contributor Author

felipou commented Dec 2, 2020

In the case of large (100k) ziplists, it seems zunion is usually a little bit slower. It's hard to say for sure without deeper statistics. I ran the zunion benchmark multiple times, and then ran it again with skiplists to check:

zunion ziplist 100k

unstable
cmdstat_zunion {'calls': 30, 'usec': 12566170, 'usec_per_call': 418872.34}
cmdstat_zunion {'calls': 30, 'usec': 12633060, 'usec_per_call': 421102.0}
cmdstat_zunion {'calls': 30, 'usec': 12857628, 'usec_per_call': 428587.59}
cmdstat_zunion {'calls': 30, 'usec': 12899303, 'usec_per_call': 429976.78}

optimize-zdiffinterunion-insertion
cmdstat_zunion {'calls': 30, 'usec': 12935758, 'usec_per_call': 431191.94}
cmdstat_zunion {'calls': 30, 'usec': 12986985, 'usec_per_call': 432899.5}
cmdstat_zunion {'calls': 30, 'usec': 12490244, 'usec_per_call': 416341.47}
cmdstat_zunion {'calls': 30, 'usec': 13053896, 'usec_per_call': 435129.88}



zunion skiplist 100k

unstable
cmdstat_zunion {'calls': 30, 'usec': 11332653, 'usec_per_call': 377755.09}
cmdstat_zunion {'calls': 30, 'usec': 11396162, 'usec_per_call': 379872.06}
cmdstat_zunion {'calls': 30, 'usec': 11323758, 'usec_per_call': 377458.59}
cmdstat_zunion {'calls': 30, 'usec': 11358097, 'usec_per_call': 378603.22}

optimize-zdiffinterunion-insertion
cmdstat_zunion {'calls': 30, 'usec': 11060640, 'usec_per_call': 368688.0}
cmdstat_zunion {'calls': 30, 'usec': 11294824, 'usec_per_call': 376494.12}
cmdstat_zunion {'calls': 30, 'usec': 11564499, 'usec_per_call': 385483.31}
cmdstat_zunion {'calls': 30, 'usec': 11441641, 'usec_per_call': 381388.03}

I optimized the benchmark code to dump the binary values and load them from a file; now it's a lot faster because we don't need to rebuild the zsets every time, which was the slowest part of the benchmark. So now I can try anything else very quickly, using the same data structures as input.
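Averaging the four runs in each group above (a quick check, using the usec_per_call values as posted):

```python
def mean(xs):
    return sum(xs) / len(xs)

# usec_per_call values from the four ZUNION runs of each configuration above
ziplist_unstable   = [418872.34, 421102.0, 428587.59, 429976.78]
ziplist_optimized  = [431191.94, 432899.5, 416341.47, 435129.88]
skiplist_unstable  = [377755.09, 379872.06, 377458.59, 378603.22]
skiplist_optimized = [368688.0, 376494.12, 385483.31, 381388.03]

def delta_pct(before, after):
    # positive = regression (slower), negative = improvement
    return (mean(after) - mean(before)) / mean(before) * 100

print(round(delta_pct(ziplist_unstable, ziplist_optimized), 1))   # 1.0  (~1% slower)
print(round(delta_pct(skiplist_unstable, skiplist_optimized), 1)) # -0.1 (~0.1% faster)
```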

@oranagra
Member

oranagra commented Dec 3, 2020

from the numbers you posted, it looks like the average regression for ZUNION is about 1% for ziplists.
and about 0.1% improvement for skiplists.

considering ziplists are not expected to grow that big (100k entries), i think we should ignore this.
adding an option to iterate the other way just for this case is an unnecessary code complication IMHO.

@oranagra oranagra merged commit 4cd1fb1 into redis:unstable Dec 3, 2020
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Mar 2, 2021
…edis#8105)

In the iterator for these functions, we'll traverse the sorted sets
in a reversed way so that largest elements come first. We prefer
this order because it's optimized for insertion in a skiplist, which
is the destination of the elements being iterated in these functions.