⚡ Avoid allocating arrays for SequenceSet bsearch (♻️ extract abstract strategy methods) #569
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This was primarily written as a big step towards replacing the internal data structure used by SequenceSet, from an array of
[min, max]tuples, into either two arrays (mins and maxes), packing the u32x2 runs into u64 Integers (no allocation required), or chunks---where each chunk could implement its own strategy to match the data: a simple array of Integers, a packed string of 16x2 runs, an 8kb bitmap string (2**16 bits), or whatever.So, most of the commits refactor away from the assumption that the
minmaxesarray/enum is cheap, and prefer to use the more abstract "runs" or some abstraction that ultimately just delegates tominmaxes(for now)I expected it to have similar or worse performance vs the original. And, after running benchmarks against every commit, some of them do slow down some benchmarks a little bit. But, I was surprised to find that (when YJIT is enabled) overall performance was actually improved in many of the most important benchmarks. The bulk of that speedup happens in the last commit: ♻️ Avoid using minmax arrays in SequenceSet update. And that commit wouldn't be possible without several of the earlier commits that individually reduced performance.
Benchmark results
As you can see, the "new 1,000" benchmark gains are modest, but they held up consistently across multiple runs. (I have another upcoming PR that gives a much bigger boost to those.) But the set ops benchmarks are more significant: ~10% for union, difference, intersection, xor, and ~5% for complement.
A bit caveat: I haven't (recently) checked these numbers without YJIT. But, if you're in a situation to impacted by SequenceSet performance (most users aren't), then you should definitely enable YJIT! I've found that, at least since 3.4, SequenceSet is always faster with YJIT.
FWIW, I ran these benchmarks with local versions of the gem, made from 559b86a and a228ee1, and the following patch: