Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: coregx/coregex
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: v0.12.18
Choose a base ref
...
head repository: coregx/coregex
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: v0.12.19
Choose a head ref
  • 1 commit
  • 18 files changed
  • 1 contributor

Commits on Mar 24, 2026

  1. perf: v0.12.19 — zero-alloc captures, 95% less memory (#152)

    * perf: remove dual transition storage — State.transitions eliminated
    
    Remove transitions []StateID and transitionCount from State struct.
    Transitions now stored exclusively in DFACache.flatTrans flat table.
    
    - Remove State.AddTransition(), Transition(), Stride(), TransitionCount()
    - Remove Builder.move() (unused after DetectAcceleration simplification)
    - Simplify DetectAcceleration/DetectAccelerationFromCached to return nil
    - Add DetectAccelerationFromFlat() reading from flat table
    - Simplify tryDetectAccelerationWithCache (flatTrans-only path)
    - Remove 3 redundant AddTransition calls from determinize
    - Update tests: add TestDetectAccelerationFromFlat, remove State transition tests
    
    Memory: ~222MB -> ~150MB (eliminates redundant per-state transition slices)
    
    * perf: Rust-aligned BT visited limit for UseNFA — 72% less memory
    
    Add NewBoundedBacktrackerSmall() with 128K entries (256KB) visited
    capacity, matching Rust regex's default visited_capacity.
    
    UseNFA path now creates BT with small limit. When haystack exceeds
    BT capacity, falls back to PikeVM (correct for leftmost-first).
    UseBoundedBacktracker strategy retains 32M limit for POSIX longest-match.
    
    LangArena LogParser (7MB log, 13 patterns, 10 iterations):
    - Total alloc: 89MB -> 25MB (-72%)
    - RSS (Sys): 353MB -> 41MB (-88%)
    - errors pattern: 66MB -> 2.4MB (-96%)
    - Speed: no regression (113-126ms per iter)
    
    * perf: byte-based DFA cache limit — 2MB default like Rust
    
    Replace MaxStates (count) with CacheCapacityBytes (bytes).
    Default: 2MB matching Rust regex's hybrid_cache_capacity.
    
    - Add DFACache.MemoryUsage() (mirrors Rust Cache::memory_usage)
    - Insert checks MemoryUsage() >= capacityBytes instead of state count
    - Config: CacheCapacityBytes (new), MaxStates (deprecated, backward compat)
    - Self-adjusting: fewer states for large stride, more for small
    - effectiveCapacityBytes() bridges legacy MaxStates to bytes (~100B/state)
    
    * wip: SlotTable-based capture search — greedy loop capture bug
    
    SearchWithSlotTableCapturesAt now uses SlotTable instead of legacy COW.
    Works for simple patterns like (foo)(bar), but greedy repetitions
    (a+)(b+) lose group start positions during loop iterations.
    
    Root cause: addSearchThread CopySlots overwrites capture slots on each
    loop iteration. Need stack-based epsilon closure with RestoreCapture
    frames (Rust approach) to preserve capture context through loops.
    
    TODO: Convert recursive addSearchThread to stack-based with save/restore
    Status: 2 NFA unit test failures, all meta tests pass (meta still on COW)
    
    * wip: stack-based epsilon closure with RestoreCapture
    
    Converted addSearchThread and addSearchThreadToNext from recursive to
    stack-based with captureFrame (Explore + RestoreCapture frames).
    Mirrors Rust pikevm.rs FollowEpsilon::RestoreCapture pattern.
    
    Still failing: greedy loop captures (a+)(b+) — per-state SlotTable
    overwrites group start on each loop iteration (State visited again
    in next generation). Per-thread COW preserves all variants.
    
    Root issue: per-state storage loses capture history across byte
    transitions in greedy loops. Need either per-thread indexing or
    generation-aware slot preservation.
    
    Status: 2 NFA unit tests fail, all meta tests pass
    
    * feat: dual SlotTable capture tracking — zero-alloc FindSubmatch
    
    Implement Rust-style dual SlotTable (curr/next) for capture propagation
    across byte transitions. Stack-based epsilon closure with RestoreCapture
    frames preserves capture context through greedy loops.
    
    Key changes:
    - Add NextSlotTable + captureStack + currSlots to PikeVMState
    - addSearchThread: stack-based with captureFrame (Explore + RestoreCapture)
    - addSearchThreadToNext: loads from curr SlotTable, writes to next
    - Swap SlotTable/NextSlotTable after each byte (Rust mem::swap pattern)
    - Don't clear Visited before seed — prevents SlotTable row overwrite
    - Wire meta FindSubmatch to use SlotTable path
    - Fix empty match capture groups (buildCapturesFromSlots)
    
    FindAllSubmatch (5 patterns, 50K matches, 800KB input):
    - Alloc: 554MB -> 26MB (-95%)
    - Mallocs: 12.5M -> 440K (-96%)
    - Time: 1.48s -> 0.45s (3.3x faster)
    
    * docs: update CHANGELOG, OPTIMIZATIONS, add ARCHITECTURE.md for v0.12.19
    
    - CHANGELOG: add SlotTable capture tracking entry
    - OPTIMIZATIONS: add #10 Dual SlotTable (95% less memory), update version
    - ARCHITECTURE.md: new file documenting engine architecture, memory model,
      thread safety, and Rust alignment
    kolkov authored Mar 24, 2026
    Configuration menu
    Copy the full SHA
    ab4039b View commit details
    Browse the repository at this point in the history
Loading