
Iterate backwards on zdiff/zinter/zunion to optimize for zslInsert#8105

Merged
oranagra merged 2 commits into redis:unstable from
felipou:optimize-zdiffinterunion-insertion
Dec 3, 2020

Conversation

@felipou
Contributor

@felipou felipou commented Nov 28, 2020

In the iterator for these functions, we'll traverse the sorted sets
in a reversed way so that largest elements come first. We prefer
this order because it's optimized for insertion in a skiplist, which
is the destination of the elements being iterated in these functions.

I didn't modify the iteration code for sets (since it isn't ordered anyway),
and I also kept the function names (didn't rename to "reverse" or something),
just added a comment to zuiInitIterator. I feel like this should be further
documented, but I wasn't sure where to put further explanations.

I just noticed that the "ui" in "zuiInitIterator" is probably "union" and "inter".
Maybe we should rename all "zui" functions to "zuid"?

In the iterator for these functions, we'll traverse the sorted sets
in a reversed way so that largest elements come first. We prefer
this order because it's optimized for insertion in a skiplist, which
is the destination of the elements being iterated in these functions.
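The intuition behind "optimized for insertion" can be illustrated with a much simpler structure than a skiplist. The sketch below is an editorial illustration, not Redis code: inserts into a sorted singly-linked list search from the head, the same direction zslInsert walks each level. Feeding elements largest-first means every search stops at the head, while smallest-first degrades to a full scan. A real skiplist stays O(log n) either way, but descending input still short-circuits the per-level search in the same fashion.

```python
class Node:
    def __init__(self, score):
        self.score = score
        self.next = None

class SortedList:
    """Ascending singly-linked list whose insert searches from the head,
    loosely mimicking how zslInsert walks a skiplist from its header."""
    def __init__(self):
        self.head = None
        self.comparisons = 0

    def insert(self, score):
        prev, cur = None, self.head
        while cur is not None:
            self.comparisons += 1
            if cur.score >= score:  # found the insertion point
                break
            prev, cur = cur, cur.next
        node = Node(score)
        node.next = cur
        if prev is None:
            self.head = node
        else:
            prev.next = node

def total_comparisons(scores):
    lst = SortedList()
    for s in scores:
        lst.insert(s)
    return lst.comparisons

ascending = total_comparisons(range(1000))          # each insert scans the whole list
descending = total_comparisons(range(999, -1, -1))  # each insert stops at the head
print(ascending, descending)  # 499500 999
```

The gap (quadratic vs. linear total work) is much larger than what a skiplist sees, which is consistent with the modest benchmark gains reported later in this thread.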
@felipou felipou marked this pull request as draft November 28, 2020 23:31
@felipou
Contributor Author

felipou commented Nov 28, 2020

@oranagra

@oranagra
Member

My initial thought was that this iterator (zuiInitIterator) should take a boolean input indicating if the iteration should be forward or backward, and support both, but if it's only gonna be used in one direction, i guess there's no need for that.

Regarding the name, interesting observation about the "ui" prefix, but in that case we should have changed it to "uid" (union, intersection, diff), but IMHO that's lame.
Note that the object it uses for the iteration is called "zsetopval" (i.e. "op" for "operation"). that looks better IMHO, so if anything we can rename all of these to "op".
However, i don't mind leaving them as is for now.

I would like to ask that you conduct a simple benchmark on big objects to show that this change really improves performance (or at least doesn't hurt it). i.e. run ZUNIONSTORE, ZINTERSTORE and ZDIFFSTORE (both algorithms) on large sets with the old and new code and measure the time (maybe using CONFIG RESETSTAT and INFO COMMANDSTATS). maybe do that once on skiplists, and once for ziplists (after setting zset-max-ziplist-entries to be very high).

IIUC in ZUNION, there's no benefit to going backwards or forwards; maybe we'll notice that forward is more cache friendly.
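The CONFIG RESETSTAT / INFO COMMANDSTATS measurement suggested above boils down to reading lines like `cmdstat_zdiffstore:calls=30,usec=...,usec_per_call=...` after each run. A minimal parser for that section might look like this (a sketch assuming the raw INFO text is already in hand, e.g. from `redis-cli info commandstats`; the sample reply reuses numbers posted later in this thread):

```python
def parse_commandstats(info_text):
    """Parse the 'commandstats' section of an INFO reply into per-command dicts."""
    stats = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip section headers and blank lines
        name, fields = line.split(":", 1)
        entry = {}
        for field in fields.split(","):
            key, value = field.split("=")
            entry[key] = float(value) if "." in value else int(value)
        stats[name] = entry
    return stats

reply = """# Commandstats
cmdstat_zdiffstore:calls=30,usec=61575448,usec_per_call=2052514.88
cmdstat_zunionstore:calls=30,usec=81069605,usec_per_call=2702320.25
"""
stats = parse_commandstats(reply)
print(stats["cmdstat_zdiffstore"]["usec_per_call"])  # 2052514.88
```

Calling CONFIG RESETSTAT before each batch of commands makes the `calls` and `usec` counters reflect only that batch.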

@itamarhaber
Member

Heya @felipou and thanks for this. Would it be possible to back up the "We prefer this order..." claim with a simple before/after benchmark?

@felipou
Contributor Author

felipou commented Nov 29, 2020

Heya @felipou and thanks for this. Would it be possible to back up the "We prefer this order..." claim with a simple before/after benchmark?

@itamarhaber Yes, will do, I'll follow @oranagra suggestions!

@felipou
Contributor Author

felipou commented Nov 29, 2020

My initial thought was that this iterator (zuiInitIterator) should take a boolean input indicating if the iteration should be forward or backward, and support both, but if it's only gonna be used in one direction, i guess there's no need for that.

I started thinking along those lines; I was going to implement a zuiInitReverseIterator and zuiPrev, but then I noticed that we could use the same direction in all cases, so I decided it was simpler to just always reverse the order.

Regarding the name, interesting observation about the "ui" prefix, but in that case we should have changed it to "uid" (union, intersection, diff), but IMHO that's lame.
Note that the object it uses for the iteration is called "zsetopval" (i.e. "op" for "operation"). that looks better IMHO, so if anything we can rename all of these to "op".
However, i don't mind leaving them as is for now.

Ok, I'll leave them as is (good idea for the new name though).

I would like to ask that you conduct a simple benchmark on big objects to show that this change really improves performance (or at least doesn't hurt it). i.e. run ZUNIONSTORE, ZINTERSTORE and ZDIFFSTORE (both algorithms) on large sets with the old and new code and measure the time (maybe using CONFIG RESETSTAT and INFO COMMANDSTATS). maybe do that once on skiplists, and once for ziplists (after setting zset-max-ziplist-entries to be very high).

IIUC in ZUNION, there's no benefit to going backwards or forwards; maybe we'll notice that forward is more cache friendly.

Ok, I'll try to run some benchmarks today, thanks for the tips!

@felipou
Contributor Author

felipou commented Nov 29, 2020

I created a simple script for an initial benchmark with Python: https://gist.github.com/felipou/50a697856a24e1910f332648f34ba94c

Here are the results so far:

optimize-zdiffinterunion-insertion branch:
$ python bench_zdiff.py 
cmdstat_zdiffstore {'calls': 30, 'usec': 61575448, 'usec_per_call': 2052514.88}
cmdstat_zunionstore {'calls': 30, 'usec': 81069605, 'usec_per_call': 2702320.25}


unstable branch:
$ python bench_zdiff.py 
cmdstat_zdiffstore {'calls': 30, 'usec': 63695331, 'usec_per_call': 2123177.75}
cmdstat_zunionstore {'calls': 30, 'usec': 82097329, 'usec_per_call': 2736577.5}

That shows a 3.33% reduction in time for zdiff, and a 1.25% reduction for zunion. I expected better results, but after inspecting the zslInsert code more closely, I've come to think that the performance gain from inserting in this order isn't that great compared to the constant costs of the algorithm.

But anyway, if you think this seems worth it, I can run some improved benchmarks (computing the standard deviation to check whether there really was a significant improvement, for example), and also create a benchmark for zinter.
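For reference, the percentages quoted above are just the relative drop in usec_per_call between the two branches; recomputing them from the raw numbers:

```python
def reduction_pct(old, new):
    """Percentage reduction in average time per call."""
    return (old - new) / old * 100

# usec_per_call values from the two runs above
zdiff_gain = reduction_pct(2123177.75, 2052514.88)
zunion_gain = reduction_pct(2736577.5, 2702320.25)
print(round(zdiff_gain, 2), round(zunion_gain, 2))  # 3.33 1.25
```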

@itamarhaber
Member

I created a simple script for an initial benchmark...

Nicely done, thank you!

@oranagra
Member

i also expected a bigger impact, maybe if you use bigger objects the impact will be greater. (now each call takes just 2ms, maybe scale by 100 so that each call takes 200ms).

since it's not a big effort and you're almost there, i'd rather complete the effort (i'm paranoid that in some of the cases maybe we're causing damage and we won't realize it).
i don't think we need standard deviation; either the impact will be clearly apparent, or not. if not, then at least we didn't cause any apparent damage.

i do think we want to cover all cases (i.e. including ziplists, and both zdiff algorithms)... assuming the effort isn't big.
p.s. ziplists may be slow, so you won't need very big objects (although still bigger than the default zset-max-ziplist-entries).
p.p.s. if the insertion takes too long, IMHO you can drop the random score for ZADD and either use sequential scores, or just save these to an RDB file to be re-used by the followup tests.
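One way to act on the p.p.s. above is to pre-generate the ZADD commands with sequential scores and feed them to `redis-cli --pipe`, which accepts raw RESP-encoded commands for mass insertion. This is an assumed workflow (key and member names are hypothetical), not something from the thread:

```python
def resp_command(*args):
    # Encode a single command in RESP, the wire format `redis-cli --pipe` expects:
    # "*<argc>\r\n" followed by "$<len>\r\n<arg>\r\n" for each argument.
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = str(arg).encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)

def zadd_payload(key, n):
    # Sequential scores instead of random ones, so generation is cheap and
    # the same dataset can be reproduced for follow-up runs.
    return b"".join(resp_command("ZADD", key, i, f"member:{i}") for i in range(n))

payload = zadd_payload("bench:zset", 1)
```

Writing the payload to a file and running `redis-cli --pipe < data.resp` avoids one round trip per ZADD; saving an RDB after loading, as suggested above, makes re-runs even cheaper.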

@felipou
Contributor Author

felipou commented Nov 29, 2020

i also expected a bigger impact, maybe if you use bigger objects the impact will be greater. (now each call takes just 2ms, maybe scale by 100 so that each call takes 200ms).

It's actually around 2s. At least that's what I observed, and I assumed "usec" was microseconds, i.e. 10^-6 seconds, giving 2.12 seconds from the result of 2123177.75 usec per call. Previously I had tested with 10 million elements, and it took around 20 seconds.

since it's not a big effort and you're almost there, i'd rather complete the effort (i'm paranoid that in some of the cases maybe we're causing damage and we won't realize it).
i don't think we need standard deviation; either the impact will be clearly apparent, or not. if not, then at least we didn't cause any apparent damage.

i do think we want to cover all cases (i.e. including ziplists, and both zdiff algorithms)... assuming the effort isn't big.
p.s. ziplists may be slow, so you won't need very big objects (although still bigger than the default zset-max-ziplist-entries).
p.p.s. if the insertion takes too long, IMHO you can drop the random score for ZADD and either use sequential scores, or just save these to an RDB file to be re-used by the followup tests.

No problem, I don't think the effort will be too big. I'll come up with a zinter benchmark, then try some of the optimization tricks you suggested to avoid the slow process of zadds. Then I'll just prepare the code to run everything and that's it. Should be finished by tomorrow.

@felipou
Contributor Author

felipou commented Dec 1, 2020

Just updated the benchmark codes, here are the results for the skiplist (the percentages are the reduction in average time):

zdiffstore   8.2%
zunionstore  0.7%
zinterstore  4.9%
zdiff        6.5%
zunion       0.8%
zinter       4.4%

Raw results:

unstable branch

cmdstat_zdiffstore {'calls': 30, 'usec': 20479755, 'usec_per_call': 682658.56}
cmdstat_zunionstore {'calls': 30, 'usec': 82061999, 'usec_per_call': 2735400.0}
cmdstat_zinterstore {'calls': 30, 'usec': 12285768, 'usec_per_call': 409525.59}
cmdstat_zdiff {'calls': 30, 'usec': 26110391, 'usec_per_call': 870346.38}
cmdstat_zunion {'calls': 30, 'usec': 96385441, 'usec_per_call': 3212848.0}
cmdstat_zinter {'calls': 30, 'usec': 14372887, 'usec_per_call': 479096.22}


optimize-zdiffinterunion-insertion branch

cmdstat_zdiffstore {'calls': 30, 'usec': 18801110, 'usec_per_call': 626703.69}
cmdstat_zunionstore {'calls': 30, 'usec': 81522011, 'usec_per_call': 2717400.25}
cmdstat_zinterstore {'calls': 30, 'usec': 11679567, 'usec_per_call': 389318.91}
cmdstat_zdiff {'calls': 30, 'usec': 24413496, 'usec_per_call': 813783.19}
cmdstat_zunion {'calls': 30, 'usec': 95623414, 'usec_per_call': 3187447.25}
cmdstat_zinter {'calls': 30, 'usec': 13741229, 'usec_per_call': 458040.97}

Now I'll benchmark with ziplists (setting zset-max-ziplist-entries as @oranagra suggested), will post the results soon.
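Forcing the ziplist encoding for the benchmark can be done at runtime rather than in redis.conf; a sketch of that setup (key name hypothetical; note Redis converts a zset to a skiplist once either limit is exceeded):

```shell
# Raise the encoding limits well above the benchmark sizes so the
# source zsets stay ziplist-encoded, then verify before measuring.
redis-cli config set zset-max-ziplist-entries 1000000
redis-cli config set zset-max-ziplist-value 1000
redis-cli object encoding bench:zset:1
```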

@oranagra
Member

oranagra commented Dec 1, 2020

is there a reason to benchmark both zdiff and zdiffstore? shouldn't they give similar results? (8% and 6% are similar in my eyes 8-).
on the other hand i think it would be good to benchmark both zdiff algorithms.

@itamarhaber
Member

FYI @filipecosta90

@felipou
Contributor Author

felipou commented Dec 2, 2020

is there a reason to benchmark both zdiff and zdiffstore? shouldn't they give similar results? (8% and 6% are similar in my eyes 8-).
on the other hand i think it would be good to benchmark both zdiff algorithms.

Actually no, I thought someone had asked for it, but now I see that I was mistaken. Anyway, I just finished the benchmarks forcing ziplists with 50k elements (it was already slow enough to build the zsets at this size):

zdiffstore   3.7%
zunionstore  1.1%
zinterstore  0.9%
zdiff        0.6%
zunion       -1.4%
zinter       1.2%

And the raw results:

unstable branch

cmdstat_zdiffstore {'calls': 30, 'usec': 3614248, 'usec_per_call': 120474.93}
cmdstat_zunionstore {'calls': 30, 'usec': 6515481, 'usec_per_call': 217182.7}
cmdstat_zinterstore {'calls': 30, 'usec': 212673932, 'usec_per_call': 7089131.0}
cmdstat_zdiff {'calls': 30, 'usec': 3067841, 'usec_per_call': 102261.37}
cmdstat_zunion {'calls': 30, 'usec': 4973754, 'usec_per_call': 165791.8}
cmdstat_zinter {'calls': 30, 'usec': 212498954, 'usec_per_call': 7083298.5}


optimize-zdiffinterunion-insertion branch

cmdstat_zdiffstore {'calls': 30, 'usec': 3479659, 'usec_per_call': 115988.63}
cmdstat_zunionstore {'calls': 30, 'usec': 6444188, 'usec_per_call': 214806.27}
cmdstat_zinterstore {'calls': 30, 'usec': 210749675, 'usec_per_call': 7024989.5}
cmdstat_zdiff {'calls': 30, 'usec': 3049990, 'usec_per_call': 101666.34}
cmdstat_zunion {'calls': 30, 'usec': 5042516, 'usec_per_call': 168083.86}
cmdstat_zinter {'calls': 30, 'usec': 210030685, 'usec_per_call': 7001023.0}

If I'm not mistaken, the zslInserts will always be performed on a skiplist; I believe that's how the target zset is initialized in these algorithms. Since we're using much smaller zsets here (the previous benchmark ran with zsets of 1 million elements, and now just 50k), I think that's why it didn't show a larger performance gain.

I'll try to benchmark both zdiff algorithms now using a skiplist.

@felipou
Contributor Author

felipou commented Dec 2, 2020

The results (reduction in average time):
algo1 3.2%
algo2 9.9%

@felipou felipou marked this pull request as ready for review December 2, 2020 02:06
@oranagra
Member

oranagra commented Dec 2, 2020

well, the improvement isn't huge, but it's certainly there, and there's probably no regression (zunion -1.4% is probably a flaky benchmark).

since we went that distance, let's make sure of that before we close this campaign.
i.e. please re-test zunion; if the new iteration is consistently slower, we may want to go the other way for it, but i suppose they're both the same.

@felipou
Contributor Author

felipou commented Dec 2, 2020

In the case of large (100k) ziplists, it seems zunion is usually a little bit slower. It's hard to say for sure without deeper statistics. I ran the zunion benchmark multiple times, and then ran it again with skiplists to check:

zunion ziplist 100k

unstable
cmdstat_zunion {'calls': 30, 'usec': 12566170, 'usec_per_call': 418872.34}
cmdstat_zunion {'calls': 30, 'usec': 12633060, 'usec_per_call': 421102.0}
cmdstat_zunion {'calls': 30, 'usec': 12857628, 'usec_per_call': 428587.59}
cmdstat_zunion {'calls': 30, 'usec': 12899303, 'usec_per_call': 429976.78}

optimize-zdiffinterunion-insertion
cmdstat_zunion {'calls': 30, 'usec': 12935758, 'usec_per_call': 431191.94}
cmdstat_zunion {'calls': 30, 'usec': 12986985, 'usec_per_call': 432899.5}
cmdstat_zunion {'calls': 30, 'usec': 12490244, 'usec_per_call': 416341.47}
cmdstat_zunion {'calls': 30, 'usec': 13053896, 'usec_per_call': 435129.88}



zunion skiplist 100k

unstable
cmdstat_zunion {'calls': 30, 'usec': 11332653, 'usec_per_call': 377755.09}
cmdstat_zunion {'calls': 30, 'usec': 11396162, 'usec_per_call': 379872.06}
cmdstat_zunion {'calls': 30, 'usec': 11323758, 'usec_per_call': 377458.59}
cmdstat_zunion {'calls': 30, 'usec': 11358097, 'usec_per_call': 378603.22}

optimize-zdiffinterunion-insertion
cmdstat_zunion {'calls': 30, 'usec': 11060640, 'usec_per_call': 368688.0}
cmdstat_zunion {'calls': 30, 'usec': 11294824, 'usec_per_call': 376494.12}
cmdstat_zunion {'calls': 30, 'usec': 11564499, 'usec_per_call': 385483.31}
cmdstat_zunion {'calls': 30, 'usec': 11441641, 'usec_per_call': 381388.03}

I optimized the benchmark code to dump the binary values and load them from a file; now it's a lot faster because we don't need to rebuild the zsets every time, which was the slowest part of the benchmark. So now I can try anything else very quickly, using the same data structures as input.
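Averaging the four runs in each group above (a quick check, using the usec_per_call values as posted):

```python
def mean(xs):
    return sum(xs) / len(xs)

# usec_per_call values from the four ZUNION runs of each configuration above
ziplist_unstable   = [418872.34, 421102.0, 428587.59, 429976.78]
ziplist_optimized  = [431191.94, 432899.5, 416341.47, 435129.88]
skiplist_unstable  = [377755.09, 379872.06, 377458.59, 378603.22]
skiplist_optimized = [368688.0, 376494.12, 385483.31, 381388.03]

def delta_pct(before, after):
    # positive = regression (slower), negative = improvement
    return (mean(after) - mean(before)) / mean(before) * 100

print(round(delta_pct(ziplist_unstable, ziplist_optimized), 1))   # 1.0  (~1% slower)
print(round(delta_pct(skiplist_unstable, skiplist_optimized), 1)) # -0.1 (~0.1% faster)
```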

@oranagra
Member

oranagra commented Dec 3, 2020

from the numbers you posted, it looks like the average regression for ZUNION is about 1% for ziplists.
and about 0.1% improvement for skiplists.

considering ziplists are not expected to grow that big (100k entries), i think we should ignore this.
adding an option to iterate the other way just for this case is an unnecessary code complication IMHO.

@oranagra oranagra merged commit 4cd1fb1 into redis:unstable Dec 3, 2020
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Mar 2, 2021
…edis#8105)

In the iterator for these functions, we'll traverse the sorted sets
in a reversed way so that largest elements come first. We prefer
this order because it's optimized for insertion in a skiplist, which
is the destination of the elements being iterated in these functions.