perf: memchr-accelerated fast path for relative() #27

Merged
hyf0 merged 2 commits into main from perf/memchr-accelerated-relative on Feb 23, 2026
@Brooooooklyn
Collaborator

Replace the component-based relative() implementation with a memchr SIMD-accelerated fast path for absolute UTF-8 paths on both Unix and Windows. The key optimization is avoiding the absolutize() call, which triggers current_dir() (a syscall) on every invocation, even when both paths are already absolute.

Fast path (absolute + UTF-8):

  • Uses memchr to jump between / positions instead of scanning byte by byte
  • Operates directly on &str slices with zero PathBuf allocations
  • Handles . and .. normalization only when needed (rare slow path)
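
The fast path described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `find_slash`, `components`, and `relative_fast` are hypothetical names, `find_slash` is a std-only stand-in for `memchr::memchr(b'/', haystack)`, and a plain `Vec` is used where the commit message mentions `SmallVec`. Both inputs are assumed to be absolute and free of `.` and `..`.

```rust
/// Std-only stand-in for `memchr::memchr(b'/', haystack)` so the sketch compiles alone.
fn find_slash(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'/')
}

/// Split a path into components by jumping from one `/` to the next.
/// The returned slices borrow from `path`; no `PathBuf` is created.
fn components(path: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = path;
    while !rest.is_empty() {
        match find_slash(rest.as_bytes()) {
            Some(i) => {
                if i > 0 {
                    out.push(&rest[..i]);
                }
                // `/` is ASCII, so `i + 1` is always a valid char boundary.
                rest = &rest[i + 1..];
            }
            None => {
                out.push(rest);
                break;
            }
        }
    }
    out
}

/// Relative path that leads from `base` to `target`.
/// Assumption: both are absolute and already normalized.
fn relative_fast(base: &str, target: &str) -> String {
    let b = components(base);
    let t = components(target);
    // Length of the shared leading run of components.
    let common = b.iter().zip(&t).take_while(|(x, y)| x == y).count();
    let mut parts: Vec<&str> = Vec::new();
    // One `..` for every component of `base` past the shared prefix.
    parts.extend(std::iter::repeat("..").take(b.len() - common));
    // Then descend into what remains of `target`.
    parts.extend(t[common..].iter().copied());
    parts.join("/")
}
```

With these definitions, `relative_fast("/a/b/c", "/a/d")` yields `../../d`, and identical inputs yield an empty string.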

Windows fast path additionally:

  • Normalizes \ to / via memchr SIMD (zero-alloc when none present)
  • Extracts and compares drive/UNC root prefixes case-insensitively
  • Falls back to self.normalize() for different-root paths
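
As a rough sketch of the Windows-specific steps, assuming hypothetical helper names (`to_forward_slashes`, `split_drive`, `same_root`) and covering drive-letter roots only, not UNC prefixes: the separator rewrite borrows the input when no backslash is present, and roots are compared without regard to ASCII case. The `position` call stands in for `memchr::memchr(b'\\', ...)`.

```rust
use std::borrow::Cow;

/// Replace `\` with `/`, borrowing the input when there is nothing to replace.
fn to_forward_slashes(path: &str) -> Cow<'_, str> {
    // Std-only stand-in for `memchr::memchr(b'\\', path.as_bytes())`.
    match path.bytes().position(|b| b == b'\\') {
        None => Cow::Borrowed(path),
        Some(_) => Cow::Owned(path.replace('\\', "/")),
    }
}

/// Split `C:/dir` into (`C:`, `/dir`). Returns `None` without a drive prefix.
/// UNC roots are deliberately left out of this sketch.
fn split_drive(path: &str) -> Option<(&str, &str)> {
    let bytes = path.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        // Both leading bytes are ASCII, so index 2 is a char boundary.
        Some(path.split_at(2))
    } else {
        None
    }
}

/// True when both paths carry the same drive letter, ignoring ASCII case.
fn same_root(a: &str, b: &str) -> bool {
    match (split_drive(a), split_drive(b)) {
        (Some((ra, _)), Some((rb, _))) => ra.eq_ignore_ascii_case(rb),
        _ => false,
    }
}
```

When `same_root` is false, a caller would take the fallback route instead of computing a relative path, as the last bullet describes.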

Slow path also improved:

  • Uses normalize() instead of absolutize() for already-absolute paths, avoiding the unnecessary current_dir() syscall
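
The idea behind the slow-path change can be illustrated with a small std-only sketch. `absolute_base` is a hypothetical helper, not a function from this crate; it only shows that the working directory needs to be queried for relative inputs.

```rust
use std::borrow::Cow;
use std::env;
use std::io;
use std::path::Path;

/// Return an absolute version of `path`, calling `current_dir()` only
/// when the input is relative. Absolute inputs are returned borrowed.
fn absolute_base(path: &Path) -> io::Result<Cow<'_, Path>> {
    if path.is_absolute() {
        // No syscall and no allocation on this branch.
        Ok(Cow::Borrowed(path))
    } else {
        Ok(Cow::Owned(env::current_dir()?.join(path)))
    }
}
```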

Benchmark results (Unix):

  • relative_simple: 252µs → 796ns (~317x faster)
  • relative_deep_nesting: 170µs → 1.42µs (~120x faster)
  • relative_with_dots: 56µs → 750ns (~75x faster)
  • relative_same_path: 55µs → 123ns (~447x faster)
  • relative_parent_child: 74µs → 191ns (~387x faster)

@codspeed-hq

codspeed-hq bot commented Feb 22, 2026

Merging this PR will improve performance by ×6.1

⚡ 7 improved benchmarks
✅ 12 untouched benchmarks

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| as_path_chaining | 59.3 µs | 53.6 µs | +10.73% |
| relative_parent_child | 22.2 µs | 4.6 µs | ×4.8 |
| relative_deep_nesting | 79.5 µs | 13 µs | ×6.1 |
| relative_same_path | 14.1 µs | 3 µs | ×4.7 |
| relative_with_dots | 23.7 µs | 11.7 µs | ×2 |
| relative_simple | 53.1 µs | 9.5 µs | ×5.6 |
| normalize | 59.1 µs | 53.4 µs | +10.52% |

Comparing perf/memchr-accelerated-relative (72a97fe) with main (10cf037)

@Brooooooklyn Brooooooklyn requested a review from hyf0 February 22, 2026 17:34
Brooooooklyn and others added 2 commits February 23, 2026 01:44
Replace the component-based `relative()` implementation with a memchr
SIMD-accelerated fast path for absolute UTF-8 paths on both Unix and
Windows. The key optimization is avoiding the `absolutize()` call which
triggers `current_dir()` (a syscall) on every invocation even when both
paths are already absolute.

Fast path (absolute + UTF-8):
- Uses memchr to jump between `/` positions instead of byte-by-byte
- Operates directly on `&str` slices with zero PathBuf allocations
- Handles `.` and `..` normalization only when needed (rare slow path)
- Uses SmallVec<[&str; 8]> to avoid heap allocation for typical paths

Windows fast path additionally:
- Normalizes `\` to `/` via memchr SIMD (zero-alloc when none present)
- Extracts and compares drive/UNC root prefixes case-insensitively
- Falls back to `self.normalize()` for different-root paths

Slow path also improved:
- Uses `normalize()` instead of `absolutize()` for already-absolute
  paths, avoiding the unnecessary `current_dir()` syscall

Benchmark results (Unix):
- relative_simple:       252µs → 796ns  (~317x faster)
- relative_deep_nesting: 170µs → 1.42µs (~120x faster)
- relative_with_dots:     56µs → 750ns  (~75x faster)
- relative_same_path:     55µs → 123ns  (~447x faster)
- relative_parent_child:  74µs → 191ns  (~387x faster)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Since edition 2018+, Cargo auto-discovers all `benches/*.rs` files as
benchmark targets regardless of explicit `[[bench]]` entries.
`benches/fixtures.rs` is a shared data module (not a runnable benchmark),
so it gets picked up without `harness = false`, causing CodSpeed to
reject it.

Setting `autobenches = false` in `[package]` disables auto-discovery
so only the explicitly declared `[[bench]]` targets are used.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
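
For illustration, the manifest change described in the second commit might look roughly like the fragment below. The `[[bench]]` target name is hypothetical, and `harness = false` on the declared target is an assumption inferred from the commit message rather than copied from the repository.

```toml
[package]
name = "sugar_path"
# Stop Cargo from treating every benches/*.rs file as a benchmark target.
autobenches = false

[[bench]]
name = "relative"   # hypothetical target name
harness = false     # assumption: the real targets opt out of the default harness
```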
@Brooooooklyn Brooooooklyn force-pushed the perf/memchr-accelerated-relative branch from f332d5a to 72a97fe Compare February 22, 2026 17:47
@hyf0
Owner

hyf0 commented Feb 23, 2026

Well done!

@hyf0 hyf0 merged commit 94aca38 into main Feb 23, 2026
5 checks passed
@github-actions github-actions bot mentioned this pull request Feb 23, 2026
graphite-app bot pushed a commit to rolldown/rolldown that referenced this pull request Feb 23, 2026
## Summary

Bump `sugar_path` from 1.2.1 to `^2`.

### What changed in sugar_path 2.0

The 2.0 release is focused on reducing allocations in hot paths. The key optimizations:

- **`normalize()` returns `Cow<'_, Path>` instead of `PathBuf`** — a `needs_normalization()` fast-path check (using `memchr`) detects already-clean paths and returns `Cow::Borrowed` with zero allocation ([#32](hyf0/sugar_path#32))
- **`absolutize()` / `absolutize_with()` return `Cow<'_, Path>`** — same idea: already-absolute clean paths are returned borrowed ([#34](hyf0/sugar_path#34))
- **`memchr`-accelerated fast path for `relative()`** — replaces the component-iterator approach with SIMD-accelerated `/` scanning, avoids the `absolutize()` → `current_dir()` syscall when both paths are already absolute, and uses `SmallVec<[&str; 8]>` to stay on the stack ([#27](hyf0/sugar_path#27))
- **Reduced allocations across the board** — reuse buffers, `SmallVec` for component lists, avoid `collect()` into `Vec` ([#26](hyf0/sugar_path#26))
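
The borrowed fast path mentioned in the first bullet can be sketched on plain `/`-separated strings. This is an assumption-laden stand-in, not the crate's code: `needs_normalization` here is a simple `split`-based check rather than a `memchr` scan, `normalize_str` is a hypothetical name, and inputs are assumed to be absolute.

```rust
use std::borrow::Cow;

/// True when an absolute `/`-separated path contains `.`, `..`,
/// a doubled slash, or a trailing slash.
fn needs_normalization(path: &str) -> bool {
    // skip(1) drops the empty piece in front of the leading `/`.
    path != "/" && path.split('/').skip(1).any(|c| matches!(c, "" | "." | ".."))
}

/// Return the input borrowed when it is already clean; allocate only otherwise.
fn normalize_str(path: &str) -> Cow<'_, str> {
    if !needs_normalization(path) {
        return Cow::Borrowed(path);
    }
    let mut stack: Vec<&str> = Vec::new();
    for component in path.split('/') {
        match component {
            "" | "." => {}
            ".." => {
                // At the root this is a no-op, matching `/..` == `/`.
                stack.pop();
            }
            other => stack.push(other),
        }
    }
    Cow::Owned(format!("/{}", stack.join("/")))
}
```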

### Breaking change

`normalize()`, `absolutize()`, and `absolutize_with()` now return `Cow<'_, Path>` instead of `PathBuf`. Call sites that need an owned `PathBuf` require `.into_owned()`, and chained operations like `.join().normalize().to_slash_lossy()` need to be split so the intermediate `Cow` lives long enough.
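
A minimal sketch of what this means at a call site, using a local stand-in rather than the real crate: the free function `normalize` below only mimics the 2.0-style return type and never rewrites the path, and std's `to_string_lossy()` takes the place of `to_slash_lossy()`. The helper names are hypothetical.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

/// Stand-in with the 2.0-style signature. It always borrows its input.
fn normalize(path: &Path) -> Cow<'_, Path> {
    Cow::Borrowed(path)
}

/// A call site that needs ownership converts the `Cow` explicitly.
fn owned_normalized(path: &Path) -> PathBuf {
    normalize(path).into_owned()
}

/// A former one-line chain, split so each intermediate value has a binding.
fn joined_display(base: &Path, name: &str) -> String {
    let joined = base.join(name); // owns the PathBuf that used to be a temporary
    let normalized = normalize(&joined); // may borrow from `joined`
    let text = normalized.to_string_lossy(); // borrows from `normalized`
    text.into_owned()
}
```

Binding `joined` and `normalized` to names keeps the borrowed data alive for as long as `text` is in use, which is the reason the chain has to be split.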

## Test plan

- [x] CI passes (same API surface, just `Cow` unwrapping at call sites)