perf: add ASCII fast path and codepoint iteration to avoid Intl.Segmenter #72

Merged
sindresorhus merged 2 commits into sindresorhus:main from privatenumber:perf/optimizations on Feb 18, 2026

@privatenumber (Contributor) commented Feb 15, 2026

Problem

Intl.Segmenter and stripAnsi run on every call regardless of input. For printable-ASCII strings like "hello world", the segmenter alone costs ~4 µs, even though the width is simply the string's length.

Changes

ASCII fast path

  • A regex check, /^[\u0020-\u007E]*$/, detects pure printable ASCII; such strings skip the segmenter, the regex checks, and the East Asian Width (EAW) lookup entirely
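
A minimal sketch of the fast path described above, not the merged code. The names `PRINTABLE_ASCII`, `sketchStringWidth`, and `slowPathWidth` are hypothetical, and the slow path here is a crude stand-in that counts one column per grapheme cluster; the real library also applies East Asian Width rules.

```javascript
// Matches strings made only of printable ASCII (space through tilde).
// The empty string also matches, which is fine: its width is 0.
const PRINTABLE_ASCII = /^[\u0020-\u007E]*$/;

// Hypothetical stand-in for the existing slow path (assumption): counts
// grapheme clusters with Intl.Segmenter and treats each as one column.
const graphemeSegmenter = new Intl.Segmenter();

function slowPathWidth(string) {
	let width = 0;
	for (const _segment of graphemeSegmenter.segment(string)) {
		width++;
	}

	return width;
}

function sketchStringWidth(string) {
	// Fast path: every printable ASCII character occupies exactly one column,
	// so the width equals the string length and no segmentation is needed.
	if (PRINTABLE_ASCII.test(string)) {
		return string.length;
	}

	// Anything else (control characters, CJK, emoji, ...) falls through.
	return slowPathWidth(string);
}

console.log(sketchStringWidth('hello world')); // 11
```

Control characters such as tab and newline sit outside the \u0020-\u007E range, so strings containing them do not take the fast path in this sketch.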

stripAnsi guard

  • Skip stripAnsi when the input contains neither ESC (\x1B) nor CSI (\x9B)
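
The guard could look roughly like the sketch below. `simpleStripAnsi` is a hypothetical, deliberately simplified stand-in for the real stripAnsi (it only handles basic CSI sequences), and `stripAnsiIfNeeded` is an assumed name, not one taken from the PR.

```javascript
// Simplified pattern (assumption): ESC[ or single-character CSI, followed by
// numeric parameters and a final letter. The real strip-ansi regex covers more.
const SIMPLE_ANSI = /(?:\u001B\[|\u009B)[0-9;]*[A-Za-z]/g;

function simpleStripAnsi(string) {
	return string.replace(SIMPLE_ANSI, '');
}

function stripAnsiIfNeeded(string) {
	// An escape sequence must start with ESC (\x1B) or CSI (\x9B). If neither
	// character is present, there is nothing to strip, so skip the regex work.
	if (!string.includes('\u001B') && !string.includes('\u009B')) {
		return string;
	}

	return simpleStripAnsi(string);
}

console.log(stripAnsiIfNeeded('\u001B[31mhello\u001B[39m')); // hello
```

Two `includes` scans are cheap compared with running a full escape-sequence regex, which is presumably where the gain on escape-free input comes from.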

Performance (Apple M2 Max, Node 25.2.1)

| Input | Before | After | Speedup |
| --- | --- | --- | --- |
| ascii short (11) | 3.87 µs | 63.85 ns | 61x |
| ascii long (1000) | 137.52 µs | 777 ns | 177x |
| ANSI short (5) | 2.08 µs | 147.91 ns | 14x |
| ANSI heavy (100) | 30.56 µs | 4.81 µs | 6.4x |
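
The PR does not say how these figures were measured. As a rough illustration only, a comparison of this kind could be timed in Node with a loop like the one below; `averageNanoseconds` is a hypothetical helper, and a dedicated benchmarking tool would give more reliable numbers than this sketch.

```javascript
// Hypothetical helper: average nanoseconds per call of `fn`.
function averageNanoseconds(fn, iterations = 20_000) {
	let sink = 0; // Accumulate results so the calls are not optimized away.

	// Warm-up so the JIT has a chance to optimize before timing starts.
	for (let i = 0; i < 1_000; i++) {
		sink += fn();
	}

	const start = process.hrtime.bigint();
	for (let i = 0; i < iterations; i++) {
		sink += fn();
	}

	const elapsed = process.hrtime.bigint() - start;

	if (Number.isNaN(sink)) {
		throw new Error('Benchmarked function returned a non-number');
	}

	return Number(elapsed) / iterations;
}

const benchSegmenter = new Intl.Segmenter();
const benchInput = 'hello world';

// Baseline: walk the string with Intl.Segmenter on every call.
const viaSegmenter = averageNanoseconds(() => {
	let count = 0;
	for (const _segment of benchSegmenter.segment(benchInput)) {
		count++;
	}

	return count;
});

// Fast path: one regex test, then read the length.
const viaFastPath = averageNanoseconds(() =>
	/^[\u0020-\u007E]*$/.test(benchInput) ? benchInput.length : -1,
);

console.log(`segmenter: ${viaSegmenter.toFixed(0)} ns/call`);
console.log(`fast path: ${viaFastPath.toFixed(0)} ns/call`);
```

Absolute numbers will vary with hardware and Node version, so only the ratio between the two lines is meaningful.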

Non-ASCII inputs (CJK, emoji, mixed) are unaffected — they fall through to the existing Intl.Segmenter path.

All 198 tests pass.

Upstream

Ref: #71

privatenumber marked this pull request as ready for review February 15, 2026 03:52
privatenumber changed the title from "perf: 22% smaller, 6–142x faster" to "perf: add ASCII fast path and codepoint iteration to avoid Intl.Segmenter" Feb 15, 2026
privatenumber force-pushed the perf/optimizations branch 2 times, most recently from e547caf to 4ba7f25 February 15, 2026 09:09
@sindresorhus (Owner)

I'm happy to accept 1, those are easy wins, but I don't really want 2. It complicates the codebase significantly again, and Intl.Segmenter performance will increase with time.

@privatenumber (Contributor, Author)

Updated:

  • Dropped the codepoint iteration commit (commit 2) — agree that Intl.Segmenter perf will improve over time
  • Switched ASCII detection to regex per your suggestion (/^[\u0020-\u007E]*$/)
  • Added \x9B (CSI) check to the stripAnsi guard
  • Rebased on new main (minimally-qualified emoji fix)
  • 198 tests pass

sindresorhus merged commit 91b9857 into sindresorhus:main on Feb 18, 2026
3 checks passed