* perf: remove dual transition storage — State.transitions eliminated
Remove transitions []StateID and transitionCount from State struct.
Transitions now stored exclusively in DFACache.flatTrans flat table.
- Remove State.AddTransition(), Transition(), Stride(), TransitionCount()
- Remove Builder.move() (unused after DetectAcceleration simplification)
- Simplify DetectAcceleration/DetectAccelerationFromCached to return nil
- Add DetectAccelerationFromFlat() reading from flat table
- Simplify tryDetectAccelerationWithCache (flatTrans-only path)
- Remove 3 redundant AddTransition calls from determinize
- Update tests: add TestDetectAccelerationFromFlat, remove State transition tests
Memory: ~222MB -> ~150MB (eliminates redundant per-state transition slices)
* perf: Rust-aligned BT visited limit for UseNFA — 72% less memory
Add NewBoundedBacktrackerSmall() with 128K entries (256KB) visited
capacity, matching Rust regex's default visited_capacity.
UseNFA path now creates BT with small limit. When haystack exceeds
BT capacity, falls back to PikeVM (correct for leftmost-first).
UseBoundedBacktracker strategy retains 32M limit for POSIX longest-match.
LangArena LogParser (7MB log, 13 patterns, 10 iterations):
- Total alloc: 89MB -> 25MB (-72%)
- RSS (Sys): 353MB -> 41MB (-88%)
- errors pattern: 66MB -> 2.4MB (-96%)
- Speed: no regression (113-126ms per iter)
* perf: byte-based DFA cache limit — 2MB default like Rust
Replace MaxStates (count) with CacheCapacityBytes (bytes).
Default: 2MB matching Rust regex's hybrid_cache_capacity.
- Add DFACache.MemoryUsage() (mirrors Rust Cache::memory_usage)
- Insert checks MemoryUsage() >= capacityBytes instead of state count
- Config: CacheCapacityBytes (new), MaxStates (deprecated, backward compat)
- Self-adjusting: fewer states for large stride, more for small
- effectiveCapacityBytes() bridges legacy MaxStates to bytes (~100B/state)
* wip: SlotTable-based capture search — greedy loop capture bug
SearchWithSlotTableCapturesAt now uses SlotTable instead of legacy COW.
Works for simple patterns like (foo)(bar), but greedy repetitions
(a+)(b+) lose group start positions during loop iterations.
Root cause: addSearchThread CopySlots overwrites capture slots on each
loop iteration. Need stack-based epsilon closure with RestoreCapture
frames (Rust approach) to preserve capture context through loops.
TODO: Convert recursive addSearchThread to stack-based with save/restore
Status: 2 NFA unit test failures, all meta tests pass (meta still on COW)
* wip: stack-based epsilon closure with RestoreCapture
Converted addSearchThread and addSearchThreadToNext from recursive to
stack-based with captureFrame (Explore + RestoreCapture frames).
Mirrors Rust pikevm.rs FollowEpsilon::RestoreCapture pattern.
Still failing: greedy loop captures (a+)(b+) — per-state SlotTable
overwrites group start on each loop iteration (State visited again
in next generation). Per-thread COW preserves all variants.
Root issue: per-state storage loses capture history across byte
transitions in greedy loops. Need either per-thread indexing or
generation-aware slot preservation.
Status: 2 NFA unit tests fail, all meta tests pass
* feat: dual SlotTable capture tracking — zero-alloc FindSubmatch
Implement Rust-style dual SlotTable (curr/next) for capture propagation
across byte transitions. Stack-based epsilon closure with RestoreCapture
frames preserves capture context through greedy loops.
Key changes:
- Add NextSlotTable + captureStack + currSlots to PikeVMState
- addSearchThread: stack-based with captureFrame (Explore + RestoreCapture)
- addSearchThreadToNext: loads from curr SlotTable, writes to next
- Swap SlotTable/NextSlotTable after each byte (Rust mem::swap pattern)
- Don't clear Visited before seed — prevents SlotTable row overwrite
- Wire meta FindSubmatch to use SlotTable path
- Fix empty match capture groups (buildCapturesFromSlots)
FindAllSubmatch (5 patterns, 50K matches, 800KB input):
- Alloc: 554MB -> 26MB (-95%)
- Mallocs: 12.5M -> 440K (-96%)
- Time: 1.48s -> 0.45s (3.3x faster)
* docs: update CHANGELOG, OPTIMIZATIONS, add ARCHITECTURE.md for v0.12.19
- CHANGELOG: add SlotTable capture tracking entry
- OPTIMIZATIONS: add #10 Dual SlotTable (95% less memory), update version
- ARCHITECTURE.md: new file documenting engine architecture, memory model,
thread safety, and Rust alignment