perf: memchr-accelerated fast path for relative() #27
Merged
Merging this PR will improve performance by ×6.1
Replace the component-based `relative()` implementation with a memchr SIMD-accelerated fast path for absolute UTF-8 paths on both Unix and Windows. The key optimization is avoiding the `absolutize()` call, which triggers `current_dir()` (a syscall) on every invocation even when both paths are already absolute.

Fast path (absolute + UTF-8):
- Uses memchr to jump between `/` positions instead of scanning byte-by-byte
- Operates directly on `&str` slices with zero PathBuf allocations
- Handles `.` and `..` normalization only when needed (rare slow path)
- Uses SmallVec<[&str; 8]> to avoid heap allocation for typical paths

Windows fast path additionally:
- Normalizes `\` to `/` via memchr SIMD (zero-alloc when none present)
- Extracts and compares drive/UNC root prefixes case-insensitively
- Falls back to `self.normalize()` for different-root paths

Slow path also improved:
- Uses `normalize()` instead of `absolutize()` for already-absolute paths, avoiding the unnecessary `current_dir()` syscall

Benchmark results (Unix):
- relative_simple: 252µs → 796ns (~317x faster)
- relative_deep_nesting: 170µs → 1.42µs (~120x faster)
- relative_with_dots: 56µs → 750ns (~75x faster)
- relative_same_path: 55µs → 123ns (~447x faster)
- relative_parent_child: 74µs → 191ns (~387x faster)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
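For readers who want the shape of the fast path without opening the diff, here is a minimal sketch of the common-prefix idea. It is not the code from this PR: `relative_fast` and `components` are hypothetical names, `str::split` stands in for the memchr-based `/` scanning, a plain `Vec` stands in for `SmallVec<[&str; 8]>`, and both inputs are assumed to be absolute, UTF-8, `/`-separated, and free of `.` and `..`.

```rust
/// Split an absolute, already-normalized path into its components.
/// `str::split` is a std-only stand-in for memchr-based `/` scanning.
fn components(path: &str) -> Vec<&str> {
    path.split('/').filter(|c| !c.is_empty()).collect()
}

/// Compute the path that leads from `base` to `target`.
/// Assumption: both inputs are absolute, `/`-separated, and contain
/// no `.` or `..` components, so no normalization pass is needed.
fn relative_fast(target: &str, base: &str) -> String {
    let t = components(target);
    let b = components(base);

    // Number of leading components the two paths share.
    let common = t.iter().zip(b.iter()).take_while(|(x, y)| x == y).count();

    let mut out: Vec<&str> = Vec::new();
    // One ".." for every component of `base` past the shared prefix.
    out.extend(std::iter::repeat("..").take(b.len() - common));
    // Then descend into what is left of `target`.
    out.extend_from_slice(&t[common..]);
    out.join("/")
}
```

With these assumptions, `relative_fast("/a/b/c/d", "/a/b/x")` yields `../c/d`, and two identical paths yield an empty string. No `PathBuf` is built and `current_dir()` is never consulted, which is the property the benchmark numbers above depend on.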
Since the 2018 edition, Cargo auto-discovers all `benches/*.rs` files as benchmark targets regardless of explicit `[[bench]]` entries. `benches/fixtures.rs` is a shared data module (not a runnable benchmark), so it gets picked up as a target without `harness = false`, causing CodSpeed to reject it.

Setting `autobenches = false` in `[package]` disables auto-discovery so only the explicitly declared `[[bench]]` targets are used.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
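As a rough illustration of the manifest change described above, the relevant part of `Cargo.toml` might look like the fragment below. The bench target name `relative` is a hypothetical placeholder, not taken from this PR; only `autobenches`, `[[bench]]`, and `harness` are actual Cargo manifest keys.

```toml
[package]
# name, version, and edition omitted
autobenches = false   # do not treat every benches/*.rs file as a bench target

[[bench]]
name = "relative"     # hypothetical target; would map to benches/relative.rs
harness = false       # the benchmark framework supplies its own main()
```

With auto-discovery off, a helper module such as `benches/fixtures.rs` is only compiled when a declared target pulls it in.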
f332d5a to 72a97fe
hyf0 (Owner) approved these changes on Feb 23, 2026

Well done!
graphite-app bot pushed a commit to rolldown/rolldown that referenced this pull request on Feb 23, 2026
## Summary

Bump `sugar_path` from 1.2.1 to 2^

### What changed in sugar_path 2.0

The 2.0 release is focused on reducing allocations in hot paths. The key optimizations:

- **`normalize()` returns `Cow<'_, Path>` instead of `PathBuf`**: a `needs_normalization()` fast-path check (using `memchr`) detects already-clean paths and returns `Cow::Borrowed` with zero allocation ([#32](hyf0/sugar_path#32))
- **`absolutize()` / `absolutize_with()` return `Cow<'_, Path>`**: same idea, already-absolute clean paths are returned borrowed ([#34](hyf0/sugar_path#34))
- **`memchr`-accelerated fast path for `relative()`**: replaces the component-iterator approach with SIMD-accelerated `/` scanning, avoids the `absolutize()` → `current_dir()` syscall when both paths are already absolute, and uses `SmallVec<[&str; 8]>` to stay on the stack ([#27](hyf0/sugar_path#27))
- **Reduced allocations across the board**: reuse buffers, `SmallVec` for component lists, avoid `collect()` into `Vec` ([#26](hyf0/sugar_path#26))

### Breaking change

`normalize()`, `absolutize()`, and `absolutize_with()` now return `Cow<'_, Path>` instead of `PathBuf`. Call sites that need an owned `PathBuf` require `.into_owned()`, and chained operations like `.join().normalize().to_slash_lossy()` need to be split so the intermediate `Cow` lives long enough.

## Test plan

- [x] CI passes (same API surface, just `Cow` unwrapping at call sites)
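To make the breaking change concrete, here is a std-only sketch of the `Cow`-returning pattern and of why chained calls have to be split. The free function `normalize` below is a hypothetical stand-in, not the `sugar_path` implementation; it assumes Unix-style absolute paths, and `to_string_lossy()` from std plays the role of `to_slash_lossy()`.

```rust
use std::borrow::Cow;
use std::path::{Component, Path, PathBuf};

/// Hypothetical stand-in for a `normalize()` that returns `Cow<'_, Path>`.
/// Assumption: Unix-style absolute paths only.
fn normalize(path: &Path) -> Cow<'_, Path> {
    // Cheap pre-check: if no component is "." or "..", borrow the input as-is.
    let needs_work = path
        .to_str()
        .map_or(true, |s| s.split('/').any(|c| c == "." || c == ".."));
    if !needs_work {
        return Cow::Borrowed(path);
    }

    // Otherwise rebuild the path, resolving ".." by dropping the last component.
    let mut out = PathBuf::new();
    for c in path.components() {
        match c {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    Cow::Owned(out)
}

/// Each intermediate value gets its own binding, so every borrow
/// has an owner that outlives it.
fn joined_display(base: &Path, rel: &str) -> String {
    let joined = base.join(rel);             // owned PathBuf
    let normalized = normalize(&joined);     // may borrow from `joined`
    let text = normalized.to_string_lossy(); // borrows from `normalized`
    text.into_owned()
}
```

Writing `let text = normalize(&joined).to_string_lossy();` in one statement would not compile, because the temporary `Cow` is dropped at the end of the statement while `text` still borrows from it. Binding `normalized` first is the kind of split the migration note refers to; call sites that only need an owned `PathBuf` can call `.into_owned()` directly.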