
Conversation

@nevans commented Dec 10, 2025

This was primarily written as a big step towards replacing the internal data structure used by SequenceSet. Instead of an array of [min, max] tuples, the data could be stored as two arrays (mins and maxes); as u32x2 runs packed into u64 Integers (no allocation required); or as chunks, where each chunk could implement its own strategy to match the data: a simple array of Integers, a packed string of 16x2 runs, an 8kb bitmap string (2**16 bits), or whatever.
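To illustrate the packed-Integer idea, here is a minimal sketch (mine, not code from this PR) of packing a [min, max] run of two 32-bit values into one Integer:

```ruby
# Sketch only (not from this PR): pack a [min, max] run of two unsigned
# 32-bit values into a single Integer.  On 64-bit builds, packed values
# below 2**62 are immediate Fixnums, so they need no heap allocation.
def pack_run(min, max)
  (min << 32) | max
end

def unpack_run(packed)
  [packed >> 32, packed & 0xFFFF_FFFF]
end

packed = pack_run(1_000, 2_000)
p unpack_run(packed)  # => [1000, 2000]
```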

So most of the commits refactor away from the assumption that the minmaxes array/enum is cheap, and prefer the more abstract "runs" or some other abstraction that (for now) ultimately just delegates to minmaxes.
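Concretely, the shape of that indirection looks something like this (a hypothetical sketch with made-up method names, not the actual net-imap internals):

```ruby
# Hypothetical sketch of the indirection described above; method names are
# illustrative, not the real net-imap internals.
class SequenceSet
  private

  # Enumerate runs as [min, max] pairs.  Today this just walks the tuple
  # array, but a future storage strategy could produce runs however it likes.
  def each_run(&block)
    @set_data.each(&block)
  end

  # Callers ask for a run's bounds instead of indexing tuples directly.
  def run_min(run) = run.first
  def run_max(run) = run.last
end
```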

I expected it to have similar or worse performance vs the original. And, after running benchmarks against every commit, some of them do slow down some benchmarks a little bit. But I was surprised to find that (when YJIT is enabled) overall performance actually improved in many of the most important benchmarks. The bulk of that speedup happens in the last commit: ♻️ Avoid using minmax arrays in SequenceSet update. And that commit wouldn't have been possible without several of the earlier commits that individually reduced performance.

Benchmark results

As you can see below, the "new 1,000" benchmark gains are modest, but they held up consistently across multiple runs. (I have another upcoming PR that gives a much bigger boost to those.) The set-ops benchmark gains are more significant: ~10% for union, difference, intersection, and xor, and ~5% for complement.

A big caveat: I haven't (recently) checked these numbers without YJIT. But if you're in a situation to be impacted by SequenceSet performance (most users aren't), then you should definitely enable YJIT! I've found that, at least since Ruby 3.4, SequenceSet is always faster with YJIT.

$ export RUBY_YJIT_ENABLE=1
$ export BENCHMARK_WARMUP_RUNS=1000
$ export BENCHMARK_RANDOM_SEED=42
$ benchmark-driver benchmarks/sequence_set-new.yml --filter 1,000
Warming up --------------------------------------
n=  1,000   ints   (sorted)     2.038k i/s -      2.233k times in 1.095582s (490.63μs/i)
n=  1,000 string   (sorted)     2.043k i/s -      2.050k times in 1.003333s (489.43μs/i)
n=  1,000   ints (shuffled)     1.910k i/s -      1.910k times in 1.000154s (523.64μs/i)
n=  1,000 string (shuffled)     1.618k i/s -      1.640k times in 1.013639s (618.07μs/i)
Calculating -------------------------------------
                                 local  53-a228ee1d5  38-559b86a8b     v0.5.12 
n=  1,000   ints   (sorted)     2.067k        2.025k        2.012k      1.333k i/s -      6.114k times in 2.957376s 3.019141s 3.038651s 4.586535s
n=  1,000 string   (sorted)     2.039k        2.025k        2.000k      1.559k i/s -      6.129k times in 3.006008s 3.026695s 3.064887s 3.930728s
n=  1,000   ints (shuffled)     1.799k        1.835k        1.812k      1.256k i/s -      5.729k times in 3.184944s 3.121641s 3.161801s 4.560806s
n=  1,000 string (shuffled)     1.509k        1.465k        1.465k      1.466k i/s -      4.853k times in 3.215193s 3.311855s 3.312962s 3.310028s

Comparison:
             n=  1,000   ints   (sorted)
                      local:      2067.4 i/s 
               53-a228ee1d5:      2025.1 i/s - 1.02x  slower
               38-559b86a8b:      2012.1 i/s - 1.03x  slower
                    v0.5.12:      1333.0 i/s - 1.55x  slower

             n=  1,000 string   (sorted)
                      local:      2038.9 i/s 
               53-a228ee1d5:      2025.0 i/s - 1.01x  slower
               38-559b86a8b:      1999.7 i/s - 1.02x  slower
                    v0.5.12:      1559.3 i/s - 1.31x  slower

             n=  1,000   ints (shuffled)
               53-a228ee1d5:      1835.3 i/s 
               38-559b86a8b:      1811.9 i/s - 1.01x  slower
                      local:      1798.8 i/s - 1.02x  slower
                    v0.5.12:      1256.1 i/s - 1.46x  slower

             n=  1,000 string (shuffled)
                      local:      1509.4 i/s 
                    v0.5.12:      1466.2 i/s - 1.03x  slower
               53-a228ee1d5:      1465.3 i/s - 1.03x  slower
               38-559b86a8b:      1464.9 i/s - 1.03x  slower
$ benchmark-driver benchmarks/sequence_set-ops.yml 
Warming up --------------------------------------
               union     2.576k i/s -      2.827k times in 1.097469s (388.21μs/i)
        intersection     2.203k i/s -      2.409k times in 1.093279s (453.83μs/i)
          difference     2.906k i/s -      2.910k times in 1.001249s (344.07μs/i)
                 xor     1.054k i/s -      1.155k times in 1.096150s (949.05μs/i)
          complement     5.367k i/s -      5.885k times in 1.096465s (186.32μs/i)
Calculating -------------------------------------
                          local  53-a228ee1d5  38-559b86a8b     v0.5.12 
               union     2.400k        2.178k        2.169k      2.178k i/s -      7.727k times in 3.219364s 3.547838s 3.562686s 3.548432s
        intersection     2.045k        1.837k        1.875k      1.888k i/s -      6.610k times in 3.232691s 3.598856s 3.525360s 3.501345s
          difference     2.847k        2.509k        2.535k      2.564k i/s -      8.719k times in 3.062715s 3.474725s 3.439710s 3.401161s
                 xor    991.799       896.170       895.933     861.054 i/s -      3.161k times in 3.187137s 3.527233s 3.528164s 3.671081s
          complement     5.143k        4.814k        4.875k      4.829k i/s -     16.101k times in 3.130841s 3.344953s 3.302562s 3.334549s

Comparison:
                            union
               local:      2400.2 i/s 
        53-a228ee1d5:      2177.9 i/s - 1.10x  slower
             v0.5.12:      2177.6 i/s - 1.10x  slower
        38-559b86a8b:      2168.9 i/s - 1.11x  slower

                     intersection
               local:      2044.7 i/s 
             v0.5.12:      1887.8 i/s - 1.08x  slower
        38-559b86a8b:      1875.0 i/s - 1.09x  slower
        53-a228ee1d5:      1836.7 i/s - 1.11x  slower

                       difference
               local:      2846.8 i/s 
             v0.5.12:      2563.5 i/s - 1.11x  slower
        38-559b86a8b:      2534.8 i/s - 1.12x  slower
        53-a228ee1d5:      2509.3 i/s - 1.13x  slower

                              xor
               local:       991.8 i/s 
        53-a228ee1d5:       896.2 i/s - 1.11x  slower
        38-559b86a8b:       895.9 i/s - 1.11x  slower
             v0.5.12:       861.1 i/s - 1.15x  slower

                       complement
               local:      5142.7 i/s 
        38-559b86a8b:      4875.3 i/s - 1.05x  slower
             v0.5.12:      4828.5 i/s - 1.07x  slower
        53-a228ee1d5:      4813.5 i/s - 1.07x  slower

FWIW, I ran these benchmarks with local versions of the gem, made from 559b86a and a228ee1, and the following patch:

diff --git a/benchmarks/sequence_set-new.yml b/benchmarks/sequence_set-new.yml
--- a/benchmarks/sequence_set-new.yml
+++ b/benchmarks/sequence_set-new.yml
@@ -129,21 +129,16 @@ contexts:
   - name: local
     prelude: |
       $LOAD_PATH.unshift "./lib"
-      $allowed_to_profile = true # only profile local code
     require: false
-  - name: v0.5.12
-    gems:
-      net-imap: 0.5.12
-    require: false
-  - name: v0.5.9
+  - name: 53-a228ee1d5
     gems:
-      net-imap: 0.5.9
+      net-imap: 0.6.0.dev.53.ga228ee1d5
     require: false
-  - name: v0.5.0
+  - name: 38-559b86a8b
     gems:
-      net-imap: 0.5.0
+      net-imap: 0.6.0.dev.38.g559b86a8b
     require: false
-  - name: v0.4.21
+  - name: v0.5.12
     gems:
-      net-imap: 0.4.21
+      net-imap: 0.5.12
     require: false
diff --git a/benchmarks/sequence_set-ops.yml b/benchmarks/sequence_set-ops.yml
--- a/benchmarks/sequence_set-ops.yml
+++ b/benchmarks/sequence_set-ops.yml
@@ -39,15 +39,15 @@ contexts:
     prelude: |
       $LOAD_PATH.unshift "./lib"
     require: false
-  - name: v0.5.9
+  - name: 53-a228ee1d5
     gems:
-      net-imap: 0.5.9
+      net-imap: 0.6.0.dev.53.ga228ee1d5
     require: false
-  - name: v0.5.0
+  - name: 38-559b86a8b
     gems:
-      net-imap: 0.5.0
+      net-imap: 0.6.0.dev.38.g559b86a8b
     require: false
-  - name: v0.4.21
+  - name: v0.5.12
     gems:
-      net-imap: 0.4.21
+      net-imap: 0.5.12
     require: false

Some benchmarks are very noisy (the slice benchmark in particular), and results can vary wildly based on random data.

These benchmarks would benefit from generating fixture data up front. That way, they can all run against the same data sets, but the data can still be randomized. Alternatively, using srand may help.
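For example (a sketch, assuming the benchmark preludes build their data with Kernel#rand), seeding the RNG would let every context see identical "random" input:

```ruby
# Sketch: seed Ruby's RNG so every benchmark context generates identical
# "random" fixture data.  BENCHMARK_RANDOM_SEED mirrors the env var set in
# the shell session above; whether the benchmarks read it today is an
# assumption, this is only an illustration.
srand(Integer(ENV.fetch("BENCHMARK_RANDOM_SEED", "42")))

# e.g. the same "shuffled" ids for every context and every run:
ids = (1..1_000).to_a.shuffle
```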
This pulls out the code to initialize, clone, or freeze `@set_data`, to
make it easier to replace all of the set_data methods (or move them into
a SetData class) later.

Additionally, `#clear` now calls `@set_data.clear` rather than
allocating a new array.
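A rough sketch of what that shape could look like (helper names here are my own illustration, not necessarily the ones in the commit):

```ruby
# Illustrative sketch only: centralize the @set_data lifecycle in private
# helpers so a later change can swap the underlying structure in one place.
# Method names are assumptions, not the exact net-imap internals.
class SequenceSet
  def clear
    @set_data.clear  # reuse the existing object rather than allocating a new array
    self
  end

  private

  def initialize_set_data
    @set_data = []
  end

  def copy_set_data(other_data)
    @set_data = other_data.map(&:dup)
  end

  def freeze_set_data
    @set_data.each(&:freeze)
    @set_data.freeze
  end
end
```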
This abstracts away from using `minmaxes`, because a different implementation will prefer to access these values differently.

Don't assign directly to minmax arrays, because we are refactoring towards an implementation that doesn't store runs in arrays.

Extracted private methods for updating set_data, rather than directly updating `minmaxes` or `runs` as arrays. This provides a set of abstract update methods that can later be changed to replace the underlying data structure.
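Something like the following shape (the method names are purely illustrative), where the rest of the class calls these helpers instead of mutating arrays in place:

```ruby
# Purely illustrative sketch of "abstract update methods": all mutation of the
# set data goes through a few private helpers, so replacing the storage later
# only touches these methods.  Names are assumptions, not the actual API.
class SequenceSet
  private

  def set_data_insert(index, min, max)
    @set_data.insert(index, [min, max])
  end

  def set_data_delete(range)
    @set_data.slice!(range)
  end

  def set_data_replace_run(index, min, max)
    @set_data[index] = [min, max]
  end
end
```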
@nevans added the `sequence-set` label Dec 10, 2025
@nevans merged commit f361f6d into master Dec 10, 2025
32 checks passed
@nevans deleted the sequence_set/extract-abstract-storage-impl-methods branch December 10, 2025 19:39
@nevans added the `performance` label Dec 10, 2025