
Conversation

@nevans commented Dec 10, 2025

This was primarily written as a big step towards replacing the internal data structure used by SequenceSet. Instead of an array of [min, max] tuples, the data could be stored as two arrays (mins and maxes); as u32x2 runs packed into u64 Integers (no allocation required); or as chunks, where each chunk could implement its own strategy to match the data: a simple array of Integers, a packed string of 16x2 runs, an 8kb bitmap string (2**16 bits), or whatever.
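To illustrate the packed-Integer idea, here is a minimal sketch (mine, not code from this PR) of packing a [min, max] run of two 32-bit values into one Integer:

```ruby
# Sketch only (not from this PR): pack a [min, max] run of two unsigned
# 32-bit values into a single Integer.  On 64-bit builds, packed values
# below 2**62 are immediate Fixnums, so they need no heap allocation.
def pack_run(min, max)
  (min << 32) | max
end

def unpack_run(packed)
  [packed >> 32, packed & 0xFFFF_FFFF]
end

packed = pack_run(1_000, 2_000)
p unpack_run(packed)  # => [1000, 2000]
```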

So most of the commits refactor away from the assumption that the minmaxes array/enum is cheap, and prefer the more abstract "runs" or some other abstraction that (for now) ultimately just delegates to minmaxes.
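Concretely, the shape of that indirection looks something like this (a hypothetical sketch with made-up method names, not the actual net-imap internals):

```ruby
# Hypothetical sketch of the indirection described above; method names are
# illustrative, not the real net-imap internals.
class SequenceSet
  private

  # Enumerate runs as [min, max] pairs.  Today this just walks the tuple
  # array, but a future storage strategy could produce runs however it likes.
  def each_run(&block)
    @set_data.each(&block)
  end

  # Callers ask for a run's bounds instead of indexing tuples directly.
  def run_min(run) = run.first
  def run_max(run) = run.last
end
```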

I expected it to have similar or worse performance vs the original. And, after running benchmarks against every commit, some of them do slow down some benchmarks a little bit. But I was surprised to find that (when YJIT is enabled) overall performance actually improved in many of the most important benchmarks. The bulk of that speedup happens in the last commit: ♻️ Avoid using minmax arrays in SequenceSet update. And that commit wouldn't have been possible without several of the earlier commits that individually reduced performance.

Benchmark results

As you can see below, the "new 1,000" benchmark gains are modest, but they held up consistently across multiple runs. (I have another upcoming PR that gives a much bigger boost to those.) The set-ops benchmark gains are more significant: ~10% for union, difference, intersection, and xor, and ~5% for complement.

A big caveat: I haven't (recently) checked these numbers without YJIT. But if you're in a situation to be impacted by SequenceSet performance (most users aren't), then you should definitely enable YJIT! I've found that, at least since Ruby 3.4, SequenceSet is always faster with YJIT.

$ export RUBY_YJIT_ENABLE=1
$ export BENCHMARK_WARMUP_RUNS=1000
$ export BENCHMARK_RANDOM_SEED=42
$ benchmark-driver benchmarks/sequence_set-new.yml --filter 1,000
Warming up --------------------------------------
n=  1,000   ints   (sorted)     2.038k i/s -      2.233k times in 1.095582s (490.63μs/i)
n=  1,000 string   (sorted)     2.043k i/s -      2.050k times in 1.003333s (489.43μs/i)
n=  1,000   ints (shuffled)     1.910k i/s -      1.910k times in 1.000154s (523.64μs/i)
n=  1,000 string (shuffled)     1.618k i/s -      1.640k times in 1.013639s (618.07μs/i)
Calculating -------------------------------------
                                 local  53-a228ee1d5  38-559b86a8b     v0.5.12 
n=  1,000   ints   (sorted)     2.067k        2.025k        2.012k      1.333k i/s -      6.114k times in 2.957376s 3.019141s 3.038651s 4.586535s
n=  1,000 string   (sorted)     2.039k        2.025k        2.000k      1.559k i/s -      6.129k times in 3.006008s 3.026695s 3.064887s 3.930728s
n=  1,000   ints (shuffled)     1.799k        1.835k        1.812k      1.256k i/s -      5.729k times in 3.184944s 3.121641s 3.161801s 4.560806s
n=  1,000 string (shuffled)     1.509k        1.465k        1.465k      1.466k i/s -      4.853k times in 3.215193s 3.311855s 3.312962s 3.310028s

Comparison:
             n=  1,000   ints   (sorted)
                      local:      2067.4 i/s 
               53-a228ee1d5:      2025.1 i/s - 1.02x  slower
               38-559b86a8b:      2012.1 i/s - 1.03x  slower
                    v0.5.12:      1333.0 i/s - 1.55x  slower

             n=  1,000 string   (sorted)
                      local:      2038.9 i/s 
               53-a228ee1d5:      2025.0 i/s - 1.01x  slower
               38-559b86a8b:      1999.7 i/s - 1.02x  slower
                    v0.5.12:      1559.3 i/s - 1.31x  slower

             n=  1,000   ints (shuffled)
               53-a228ee1d5:      1835.3 i/s 
               38-559b86a8b:      1811.9 i/s - 1.01x  slower
                      local:      1798.8 i/s - 1.02x  slower
                    v0.5.12:      1256.1 i/s - 1.46x  slower

             n=  1,000 string (shuffled)
                      local:      1509.4 i/s 
                    v0.5.12:      1466.2 i/s - 1.03x  slower
               53-a228ee1d5:      1465.3 i/s - 1.03x  slower
               38-559b86a8b:      1464.9 i/s - 1.03x  slower
$ benchmark-driver benchmarks/sequence_set-ops.yml 
Warming up --------------------------------------
               union     2.576k i/s -      2.827k times in 1.097469s (388.21μs/i)
        intersection     2.203k i/s -      2.409k times in 1.093279s (453.83μs/i)
          difference     2.906k i/s -      2.910k times in 1.001249s (344.07μs/i)
                 xor     1.054k i/s -      1.155k times in 1.096150s (949.05μs/i)
          complement     5.367k i/s -      5.885k times in 1.096465s (186.32μs/i)
Calculating -------------------------------------
                          local  53-a228ee1d5  38-559b86a8b     v0.5.12 
               union     2.400k        2.178k        2.169k      2.178k i/s -      7.727k times in 3.219364s 3.547838s 3.562686s 3.548432s
        intersection     2.045k        1.837k        1.875k      1.888k i/s -      6.610k times in 3.232691s 3.598856s 3.525360s 3.501345s
          difference     2.847k        2.509k        2.535k      2.564k i/s -      8.719k times in 3.062715s 3.474725s 3.439710s 3.401161s
                 xor    991.799       896.170       895.933     861.054 i/s -      3.161k times in 3.187137s 3.527233s 3.528164s 3.671081s
          complement     5.143k        4.814k        4.875k      4.829k i/s -     16.101k times in 3.130841s 3.344953s 3.302562s 3.334549s

Comparison:
                            union
               local:      2400.2 i/s 
        53-a228ee1d5:      2177.9 i/s - 1.10x  slower
             v0.5.12:      2177.6 i/s - 1.10x  slower
        38-559b86a8b:      2168.9 i/s - 1.11x  slower

                     intersection
               local:      2044.7 i/s 
             v0.5.12:      1887.8 i/s - 1.08x  slower
        38-559b86a8b:      1875.0 i/s - 1.09x  slower
        53-a228ee1d5:      1836.7 i/s - 1.11x  slower

                       difference
               local:      2846.8 i/s 
             v0.5.12:      2563.5 i/s - 1.11x  slower
        38-559b86a8b:      2534.8 i/s - 1.12x  slower
        53-a228ee1d5:      2509.3 i/s - 1.13x  slower

                              xor
               local:       991.8 i/s 
        53-a228ee1d5:       896.2 i/s - 1.11x  slower
        38-559b86a8b:       895.9 i/s - 1.11x  slower
             v0.5.12:       861.1 i/s - 1.15x  slower

                       complement
               local:      5142.7 i/s 
        38-559b86a8b:      4875.3 i/s - 1.05x  slower
             v0.5.12:      4828.5 i/s - 1.07x  slower
        53-a228ee1d5:      4813.5 i/s - 1.07x  slower

FWIW, I ran these benchmarks with local versions of the gem, made from 559b86a and a228ee1, and the following patch:

diff --git a/benchmarks/sequence_set-new.yml b/benchmarks/sequence_set-new.yml
--- a/benchmarks/sequence_set-new.yml
+++ b/benchmarks/sequence_set-new.yml
@@ -129,21 +129,16 @@ contexts:
   - name: local
     prelude: |
       $LOAD_PATH.unshift "./lib"
-      $allowed_to_profile = true # only profile local code
     require: false
-  - name: v0.5.12
-    gems:
-      net-imap: 0.5.12
-    require: false
-  - name: v0.5.9
+  - name: 53-a228ee1d5
     gems:
-      net-imap: 0.5.9
+      net-imap: 0.6.0.dev.53.ga228ee1d5
     require: false
-  - name: v0.5.0
+  - name: 38-559b86a8b
     gems:
-      net-imap: 0.5.0
+      net-imap: 0.6.0.dev.38.g559b86a8b
     require: false
-  - name: v0.4.21
+  - name: v0.5.12
     gems:
-      net-imap: 0.4.21
+      net-imap: 0.5.12
     require: false
diff --git a/benchmarks/sequence_set-ops.yml b/benchmarks/sequence_set-ops.yml
--- a/benchmarks/sequence_set-ops.yml
+++ b/benchmarks/sequence_set-ops.yml
@@ -39,15 +39,15 @@ contexts:
     prelude: |
       $LOAD_PATH.unshift "./lib"
     require: false
-  - name: v0.5.9
+  - name: 53-a228ee1d5
     gems:
-      net-imap: 0.5.9
+      net-imap: 0.6.0.dev.53.ga228ee1d5
     require: false
-  - name: v0.5.0
+  - name: 38-559b86a8b
     gems:
-      net-imap: 0.5.0
+      net-imap: 0.6.0.dev.38.g559b86a8b
     require: false
-  - name: v0.4.21
+  - name: v0.5.12
     gems:
-      net-imap: 0.4.21
+      net-imap: 0.5.12
     require: false

Some benchmarks are very noisy (the slice benchmark in particular), and results can vary wildly based on random data.

These benchmarks would benefit from generating fixture data up front. That way, they can all run against the same data sets, but the data can still be randomized. Alternatively, using srand may help.
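For example (a sketch, assuming the benchmark preludes build their data with Kernel#rand), seeding the RNG would let every context see identical "random" input:

```ruby
# Sketch: seed Ruby's RNG so every benchmark context generates identical
# "random" fixture data.  BENCHMARK_RANDOM_SEED mirrors the env var set in
# the shell session above; whether the benchmarks read it today is an
# assumption, this is only an illustration.
srand(Integer(ENV.fetch("BENCHMARK_RANDOM_SEED", "42")))

# e.g. the same "shuffled" ids for every context and every run:
ids = (1..1_000).to_a.shuffle
```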
This pulls out the code to initialize, clone, or freeze `@set_data`, to
make it easier to replace all of the set_data methods (or move them into
a SetData class) later.

Additionally, `#clear` now calls `@set_data.clear` rather than
allocating a new array.
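A rough sketch of what that shape could look like (helper names here are my own illustration, not necessarily the ones in the commit):

```ruby
# Illustrative sketch only: centralize the @set_data lifecycle in private
# helpers so a later change can swap the underlying structure in one place.
# Method names are assumptions, not the exact net-imap internals.
class SequenceSet
  def clear
    @set_data.clear  # reuse the existing object rather than allocating a new array
    self
  end

  private

  def initialize_set_data
    @set_data = []
  end

  def copy_set_data(other_data)
    @set_data = other_data.map(&:dup)
  end

  def freeze_set_data
    @set_data.each(&:freeze)
    @set_data.freeze
  end
end
```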
This abstracts away from using `minmaxes`, because a different implementation will prefer to access these values differently.

Don't assign directly to minmax arrays, because we are refactoring towards an implementation that doesn't store runs in arrays.

Extracted private methods for updating set_data, rather than directly updating `minmaxes` or `runs` as arrays. This provides a set of abstract update methods that can later be changed to replace the underlying data structure.
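Something like the following shape (the method names are purely illustrative), where the rest of the class calls these helpers instead of mutating arrays in place:

```ruby
# Purely illustrative sketch of "abstract update methods": all mutation of the
# set data goes through a few private helpers, so replacing the storage later
# only touches these methods.  Names are assumptions, not the actual API.
class SequenceSet
  private

  def set_data_insert(index, min, max)
    @set_data.insert(index, [min, max])
  end

  def set_data_delete(range)
    @set_data.slice!(range)
  end

  def set_data_replace_run(index, min, max)
    @set_data[index] = [min, max]
  end
end
```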
@nevans added the `sequence-set` label Dec 10, 2025
@nevans merged commit f361f6d into master Dec 10, 2025
32 checks passed
@nevans deleted the sequence_set/extract-abstract-storage-impl-methods branch December 10, 2025 19:39
@nevans added the `performance` label Dec 10, 2025