⚡🔎 Abridge `SequenceSet#inspect` output for more than 512 entries #502

nevans · 2025-08-07T21:59:06Z

When using SequenceSet to store mailbox state for mailboxes with many millions of messages, the last thing you want to do is accidentally generate a string and output that!

So this establishes a maximum number of entries to output as an inspect string, currently 512. Above that number, we output only a much smaller number of entries from the head and tail, currently 16 from each end.

Net::IMAP::SequenceSet(((1..5000) % 2).to_a).inspect
#=> #<Net::IMAP::SequenceSet 2500 entries "1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,...(2468 entries omitted)...,4969,4971,4973,4975,4977,4979,4981,4983,4985,4987,4989,4991,4993,4995,4997,4999">

A note on the implementation

When the sequence set has an internal denormalized string, we use regexps to grab the first and last groups of entries from that.

At first, I used a simple Regexp match anchored to \z to grab the tail entries from the string. But that was shockingly slow even when the string isn't extremely large.

Next I benchmarked replacing Regexp#match with String#rpartition and that performed fine, using the same naive regexps. String#rindex was also fine. They are both slower than Regexp#match anchored to \A, but not too bad and not dramatically slower based on string size. I found myself wishing for a String#rscan or String#rsplit. 😆

Then I benchmarked replacing \d+ with \d{0,10} or [1-9]\d{1,9}, and those both made the regexp fast... but Regexp.linear_time? doesn't like nested quantifiers. Fully expanding both the \d+ and the (entry){100} keeps it fast and satisfies Regexp.linear_time?.

But that regexp is UGLY! 🙁

When using SequenceSet to store mailbox state for mailboxes with many millions of messages, the last thing you want to do is accidentally generate a string and output that! So this establishes a maximum number of entries to output as an inspect string, currently 512. Above that number, we output only a much smaller number of entries from the head and tail, currently 16 from each end. When the sequence set has an internal denormalized string, we use regexps to grab the first and last groups of entries from that. A note on the implementation: At first, I used a simple Regexp match anchored to `\z` to grab the tail entries from the string. But that was shockingly slow even when the string isn't extremely large. Next I benchmarked replacing `Regexp#match` with `String#rpartition` and that performed fine, using the same naive regexps. `String#rindex` was also fine. They are both slower than `Regexp#match` anchored to `\A`, but not too bad and not dramatically slower based on string size. Then I benchmarked replacing `\d+` with `\d{0,10}` or `[1-9]\d{1,9}`, and those both made the regexp fast... but `Regexp.linear_time?` doesn't like nested quantifiers. Fully expanding both the `\d+` and the `(entry){100}` keeps it fast _and_ satisfies `Regexp.linear_time?`. But that regexp is UGLY! 🙁

nevans · 2025-08-28T20:27:48Z

lib/net/imap/sequence_set.rb

+      #
+      # This unrolls all nested quantifiers.  It's much harder to read, but it's
+      # also the fastest out of all the versions I tested.
+      nz_uint32   = /[1-9](?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d)?)?)?)?)?)?)?)?)?/


n.b:

# max digits in nz-number (2**32 - 1).digits.count => 10 # to unroll /[1-9]\d{0,9}/ /[1-9]#{9.times.inject(//) { /(?:\d#{_1.source})?/ }.source}/ == /[1-9](?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d(?:\d)?)?)?)?)?)?)?)?)?/ => true

📚 Update SequenceSet string rdoc

123b632

nevans changed the title ~~⚡🔎 Abridged SequenceSet#inspect output~~ ⚡🔎 Abridged SequenceSet#inspect output Aug 7, 2025

nevans force-pushed the sequence_set/abridged-inspect branch from a9f8422 to ef3a6e7 Compare August 7, 2025 22:01

nevans force-pushed the sequence_set/abridged-inspect branch from ef3a6e7 to 3585984 Compare August 7, 2025 22:13

nevans merged commit bac67ca into master Aug 7, 2025
37 checks passed

nevans deleted the sequence_set/abridged-inspect branch August 7, 2025 22:39

nevans commented Aug 28, 2025

View reviewed changes

nevans changed the title ~~⚡🔎 Abridged SequenceSet#inspect output~~ ⚡🔎 Abridge SequenceSet#inspect output for more than 512 entries Aug 28, 2025

nevans added the enhancement New feature or request label Sep 30, 2025

nevans added the sequence-set Any code the IMAP `sequence-set` data type or grammar rule, especially the SequenceSet class. label Dec 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡🔎 Abridge `SequenceSet#inspect` output for more than 512 entries #502

⚡🔎 Abridge `SequenceSet#inspect` output for more than 512 entries #502

Uh oh!

nevans commented Aug 7, 2025 •

edited

Loading

Uh oh!

Uh oh!

nevans Aug 28, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

⚡🔎 Abridge SequenceSet#inspect output for more than 512 entries #502

⚡🔎 Abridge SequenceSet#inspect output for more than 512 entries #502

Uh oh!

Conversation

nevans commented Aug 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

A note on the implementation

Uh oh!

Uh oh!

nevans Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

⚡🔎 Abridge `SequenceSet#inspect` output for more than 512 entries #502

⚡🔎 Abridge `SequenceSet#inspect` output for more than 512 entries #502

nevans commented Aug 7, 2025 •

edited

Loading

nevans Aug 28, 2025 •

edited

Loading