♻️ Rename `SequenceSet` internals #562

nevans · 2025-12-06T17:44:29Z

This PR renames the poorly named @tuples ivar and all of the methods dealing with that data structure. It should not contain any significant refactoring besides renaming (and changing direct ivar access to use an attr_reader).

The instance variable and attribute were renamed to the much more abstract set_data. But most uses were updated to use either runs or minmaxes, based on how they are used:

#minmaxes is currently an alias for #set_data, but an upcoming PR will change how @set_data is structured. This prepares for that by documenting which code expects an array (or enum) of (min, max) tuples. Some of this code could remain unchanged if #minmaxes were changed to return an enumerator.
#runs is also currently an alias for #set_data, and is used to document code that expects an array of "runs" but can be abstract about how those runs are implemented. An upcoming PR will replace [min, max] arrays with u32x2 packed integers, which dramatically reduces memory allocations and speeds up some operations (although it does slow down others).

Most of the internal private methods were renamed:

import_{type} - for converting external inputs into internal types
parse_{type} - for converting strings into internal types
export_{type} - for converting internal types into exported types
bsearch_{type} - for binary searching set_data
other methods were renamed to replace tuple(s) with minmax(es) or run(s)

This refactoring is a small step towards #484.

* `str_to_tuples` => `parse_runs` * `str_to_tuple` => `parse_minmax` (alias: `parse_run`) * `parse_string_entry` => `parse_entry`

* `input_to_tuples` => `import_runs` * `input_to_tuple` => `import_minmax` (alias: `import_run`) * `range_to_tuple` => `import_range_minmax` * `to_tuple_int` => `import_num`

* `export_string_entries` => `export_minmaxes`/`export_runs` * `tuple_to_str` => `export_minmax`/`export_run` * `tuple_to_entry` => `export_minmax_entry`/`export_run_entry` * `each_number_in_tuple` => `each_number_in_minmax` * `from_tuple_int` => `export_num`

* `include_tuple?` => `include_run?` * `intersect_tuple?` => `intersect_run?`

* `tuple_gte_with_index` => `bsearch_minmax_with_index` * `tuple_gte_to` => `bsearch_range`

* `tuples_add` => `add_minmaxes` or `add_runs` * `tuples_subtract` => `subtract_minmaxes` or `subtract_runs` * `tuple_add` => `add_minmax` or `add_run` * `tuple_coalesce` => `add_coalesced_minmax` * `tuple_subtract` => `subtract_minmax` or `subtract_run` * `tuple_trim_or_split` => `trim_or_split` * `tuples_trim_or_delete` => `trim_or_delete` Also renamed _some_ local variables for consistency. Mostly `tuple` => `minmax`.

* `each_entry_tuple` => `each_entry_minmax` * `each_tuple_with_index` => `each_minmax_with_index` * `reverse_each_tuple_with_index` => `reverse_each_minmax_with_index` * `lookup_number_by_tuple_index` => `seek_number_in_minmaxes`

'#minmaxes` is currently an alias for `#set_data`, but an upcoming PR will change how the `@set_data` is structured. This prepares for that by documenting which code expects an array (or enum) of `(min, max)` tuples. Some of this code could remain unchanged if `#minmaxes` were changed to return an enumerator.

This is an alias for `#set_data`, and is used to document code that expects an array of "runs" but can be abstract about how those runs are implemented. An upcoming PR will replace `[min, max]` arrays with `u32x2` packed integers, which dramatically reduces memory allocations and speeds up _some_ operations (although it does slow down others).

The instance variable and attribute were renamed to `set_data`. But most uses were replaced with either `runs` or `minmaxes`, based on how they are used.

The results are highly dependant on data, YJIT, etc... but in my tests this implementation seems to _consistently_ run faster. ``` Comparison: builtin: L ^ R local: 967.0 i/s before #567 (559b86a): 917.5 i/s - 1.05x slower before #562 (a228ee1): 901.2 i/s - 1.07x slower v0.5.12: 840.0 i/s - 1.15x slower v0.5.0: 830.7 i/s - 1.16x slower v0.4.21: 797.0 i/s - 1.21x slower v0.5.9: 790.4 i/s - 1.22x slower builtin: L.xor! R local: 960.9 i/s before #562 (a228ee1): 954.3 i/s - 1.01x slower before #567 (559b86a): 940.0 i/s - 1.02x slower v0.5.12: 796.4 i/s - 1.21x slower v0.5.0: 633.9 i/s - 1.52x slower v0.4.21: 620.1 i/s - 1.55x slower v0.5.9: 619.0 i/s - 1.55x slower ``` When I first wrote this code, the `master` branch was consistently 1.3x-1.5x slower. The speedup is much less now, so other implementation details are probably more important. When the internal set data implementation is changed (see #484), this and other operations should be re-evaluated. I suspect an iterative "merge" may be the fastest overall approach, but that code is more complex than simply relying on the core add/subtract/complement methods with simple boolean algebra.

nevans added 11 commits December 6, 2025 11:58

♻️ Add aliases for SequenceSet ivar (pre-rename)

04d11c1

♻️ Rename SequenceSet's private parse methods

e0ebe71

* `str_to_tuples` => `parse_runs` * `str_to_tuple` => `parse_minmax` (alias: `parse_run`) * `parse_string_entry` => `parse_entry`

♻️ Rename SequenceSet's private import methods

34f89fb

* `input_to_tuples` => `import_runs` * `input_to_tuple` => `import_minmax` (alias: `import_run`) * `range_to_tuple` => `import_range_minmax` * `to_tuple_int` => `import_num`

♻️ Rename SequenceSet's private intersect/include predicates

86308dc

* `include_tuple?` => `include_run?` * `intersect_tuple?` => `intersect_run?`

♻️ Rename SequenceSet's private bsearch methods

8a76cb6

* `tuple_gte_with_index` => `bsearch_minmax_with_index` * `tuple_gte_to` => `bsearch_range`

♻️ Rename SequenceSet's private enumeration methods

f922eb4

* `each_entry_tuple` => `each_entry_minmax` * `each_tuple_with_index` => `each_minmax_with_index` * `reverse_each_tuple_with_index` => `reverse_each_minmax_with_index` * `lookup_number_by_tuple_index` => `seek_number_in_minmaxes`

♻️ Rename SequenceSet's private set_data ivar/attr

7d73944

The instance variable and attribute were renamed to `set_data`. But most uses were replaced with either `runs` or `minmaxes`, based on how they are used.

nevans force-pushed the sequence_set/rename-internals branch from c1cd9f5 to 7d73944 Compare December 6, 2025 18:00

nevans merged commit 5015fbe into master Dec 6, 2025
32 checks passed

nevans deleted the sequence_set/rename-internals branch December 6, 2025 18:55

nevans added the sequence-set Any code the IMAP `sequence-set` data type or grammar rule, especially the SequenceSet class. label Dec 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

♻️ Rename `SequenceSet` internals #562

♻️ Rename `SequenceSet` internals #562

Uh oh!

nevans commented Dec 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

♻️ Rename SequenceSet internals #562

♻️ Rename SequenceSet internals #562

Uh oh!

Conversation

nevans commented Dec 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

♻️ Rename `SequenceSet` internals #562

♻️ Rename `SequenceSet` internals #562

nevans commented Dec 6, 2025 •

edited

Loading