Skip to content

Conversation

@nevans
Copy link
Collaborator

@nevans nevans commented Dec 6, 2025

This PR renames the poorly named @tuples ivar and all of the methods dealing with that data structure. It should not contain any significant refactoring besides renaming (and changing direct ivar access to use an attr_reader).

The instance variable and attribute were renamed to the much more abstract set_data. But most uses were updated to use either runs or minmaxes, based on how they are used:

  • #minmaxes is currently an alias for #set_data, but an upcoming PR will change how @set_data is structured. This prepares for that by documenting which code expects an array (or enum) of (min, max) tuples. Some of this code could remain unchanged if #minmaxes were changed to return an enumerator.
  • #runs is also currently an alias for #set_data, and is used to document code that expects an array of "runs" but can be abstract about how those runs are implemented. An upcoming PR will replace [min, max] arrays with u32x2 packed integers, which dramatically reduces memory allocations and speeds up some operations (although it does slow down others).

Most of the internal private methods were renamed:

  • import_{type} - for converting external inputs into internal types
  • parse_{type} - for converting strings into internal types
  • export_{type} - for converting internal types into exported types
  • bsearch_{type} - for binary searching set_data
  • other methods were renamed to replace tuple(s) with minmax(es) or run(s)

This refactoring is a small step towards #484.

nevans added 11 commits December 6, 2025 11:58
* `str_to_tuples`        => `parse_runs`
* `str_to_tuple`         => `parse_minmax` (alias: `parse_run`)
* `parse_string_entry`   => `parse_entry`
* `input_to_tuples`      => `import_runs`
* `input_to_tuple`       => `import_minmax` (alias: `import_run`)
* `range_to_tuple`       => `import_range_minmax`
* `to_tuple_int`         => `import_num`
* `export_string_entries` => `export_minmaxes`/`export_runs`
* `tuple_to_str`          => `export_minmax`/`export_run`
* `tuple_to_entry`        => `export_minmax_entry`/`export_run_entry`
* `each_number_in_tuple`  => `each_number_in_minmax`
* `from_tuple_int`        => `export_num`
* `include_tuple?`   => `include_run?`
* `intersect_tuple?` => `intersect_run?`
* `tuple_gte_with_index` => `bsearch_minmax_with_index`
* `tuple_gte_to`         => `bsearch_range`
* `tuples_add`            => `add_minmaxes` or `add_runs`
* `tuples_subtract`       => `subtract_minmaxes` or `subtract_runs`
* `tuple_add`             => `add_minmax` or `add_run`
* `tuple_coalesce`        => `add_coalesced_minmax`
* `tuple_subtract`        => `subtract_minmax` or `subtract_run`
* `tuple_trim_or_split`   => `trim_or_split`
* `tuples_trim_or_delete` => `trim_or_delete`

Also renamed _some_ local variables for consistency.
Mostly `tuple` => `minmax`.
* `each_entry_tuple`              => `each_entry_minmax`
* `each_tuple_with_index`         => `each_minmax_with_index`
* `reverse_each_tuple_with_index` => `reverse_each_minmax_with_index`
* `lookup_number_by_tuple_index`  => `seek_number_in_minmaxes`
'#minmaxes` is currently an alias for `#set_data`, but an upcoming PR
will change how the `@set_data` is structured.  This prepares for that
by documenting which code expects an array (or enum) of `(min, max)`
tuples.  Some of this code could remain unchanged if `#minmaxes` were
changed to return an enumerator.
This is an alias for `#set_data`, and is used to document code that
expects an array of "runs" but can be abstract about how those runs
are implemented.  An upcoming PR will replace `[min, max]` arrays with
`u32x2` packed integers, which dramatically reduces memory allocations
and speeds up _some_ operations (although it does slow down others).
The instance variable and attribute were renamed to `set_data`.  But
most uses were replaced with either `runs` or `minmaxes`, based on how
they are used.
@nevans nevans force-pushed the sequence_set/rename-internals branch from c1cd9f5 to 7d73944 Compare December 6, 2025 18:00
@nevans nevans merged commit 5015fbe into master Dec 6, 2025
32 checks passed
@nevans nevans deleted the sequence_set/rename-internals branch December 6, 2025 18:55
nevans added a commit that referenced this pull request Dec 8, 2025
The results are highly dependant on data, YJIT, etc... but in my tests
this implementation seems to _consistently_ run faster.

```
Comparison:
             builtin: L ^ R
                  local:       967.0 i/s
before #567 (559b86a):       917.5 i/s - 1.05x  slower
before #562 (a228ee1):       901.2 i/s - 1.07x  slower
                v0.5.12:       840.0 i/s - 1.15x  slower
                 v0.5.0:       830.7 i/s - 1.16x  slower
                v0.4.21:       797.0 i/s - 1.21x  slower
                 v0.5.9:       790.4 i/s - 1.22x  slower

             builtin: L.xor! R
                  local:       960.9 i/s
before #562 (a228ee1):       954.3 i/s - 1.01x  slower
before #567 (559b86a):       940.0 i/s - 1.02x  slower
                v0.5.12:       796.4 i/s - 1.21x  slower
                 v0.5.0:       633.9 i/s - 1.52x  slower
                v0.4.21:       620.1 i/s - 1.55x  slower
                 v0.5.9:       619.0 i/s - 1.55x  slower
```

When I first wrote this code, the `master` branch was consistently
1.3x-1.5x slower.  The speedup is much less now, so other implementation
details are probably more important.  When the internal set data
implementation is changed (see #484), this and other operations should
be re-evaluated.  I suspect an iterative "merge" may be the fastest
overall approach, but that code is more complex than simply relying on
the core add/subtract/complement methods with simple boolean algebra.
@nevans nevans added the sequence-set Any code the IMAP `sequence-set` data type or grammar rule, especially the SequenceSet class. label Dec 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sequence-set Any code the IMAP `sequence-set` data type or grammar rule, especially the SequenceSet class.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants