feat(oxc_str): upgrade Ident with precomputed hash for fast equality#18400
feat(oxc_str): upgrade Ident with precomputed hash for fast equality#18400
Conversation
41be0e3 to
5274050
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors the Ident type to optimize equality checks and hash operations by precomputing and storing a hash value alongside the string data. The implementation packs the string length and hash into a single 64-bit value, with length in the lower 32 bits and FxHash in the upper 32 bits.
Changes:
- Upgraded
Identfrom a simple&strwrapper to a structure containingNonNull<u8>, packedlen_and_hash, and lifetime marker - Implemented fast equality comparison by checking
len_and_hashbefore full string comparison - Added multiple
PartialEqimplementations for interoperability with&str,String,Atom, etc. - Fixed
Hashimplementation to delegate tostr::hash()for compatibility with mixed-type hash containers
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| crates/oxc_str/src/ident.rs | Complete refactoring of Ident structure with new memory layout, hash precomputation, equality optimizations, and comprehensive trait implementations |
| crates/oxc_str/Cargo.toml | Added rustc-hash dependency for FxHasher |
| Cargo.lock | Updated lockfile to reflect new rustc-hash dependency |
Comments suppressed due to low confidence (1)
crates/oxc_str/src/ident.rs:523
- The test for hash compatibility only tests Ident-to-Ident lookup, which doesn't actually verify the stated purpose. To properly test hash compatibility with str, you should use the Borrow trait to look up an Ident using a &str key. For example:
set.get("foo")or test in a HashMap where you insert with Ident and lookup with &str.
use rustc_hash::FxHashSet;
// This test verifies that Ident's Hash impl is compatible with str lookups
// which is required for correct behavior in mixed hash containers
let mut set: FxHashSet<Ident<'_>> = FxHashSet::default();
set.insert(Ident::new("foo"));
// The hash of Ident("foo") should allow finding it via another Ident("foo")
let lookup = Ident::new("foo");
assert!(set.contains(&lookup));
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
CodSpeed Performance ReportMerging this PR will degrade performance by 4.09%Comparing Summary
Performance Changes
Footnotes
|
Design Constraint:
|
|
@camchenry see above, we have a problem :-) |
5274050 to
d002ce7
Compare
|
Oh, we just need to fix the call site, but it's not trivial for people 🤔 |
Solution FoundAfter research into how Rust compilers handle this (rustc's Remove
|
|
I can see another potential solution:
#[repr(transparent)]
struct FastHashIdent<'a>(Ident<'a>);
// Optimized hash
impl Hash for FastHashIdent<'_> {
#[inline]
fn hash<H: Hasher>(&self, hasher: &mut H) {
let hash = self.0.len_and_hash().rotate_left(25);
hasher.write_u64(hash);
}
}
struct IdentHashMap<'a, V> {
inner: hashbrown::HashMap<FastHashIdent<'a>, V, NoopBuildHasher, &'a Bump>,
}
impl IdentHashMap<'a, V> {
pub fn insert(&mut self, ident: Ident<'a>, v: V) -> Option<V> {
let wrapped = FastHashIdent(ident);
self.inner.insert(wrapped, v)
}
// ... other `HashMap` methods
}We need an Result:
EDIT: Or just ditch Or actually, |
|
@overlookmotel Question about the The new This alignment change cascades through the AST - for example, Alternative design using two pub struct Ident<'a> {
ptr: NonNull<u8>, // 4 bytes on 32-bit
len: u32, // 4 bytes
hash: u32, // 4 bytes
_marker: PhantomData<&'a str>,
}This would be:
This keeps 4-byte alignment on 32-bit like the old Should we use separate |
|
This is a pain, but I think we need different implementations for 32 bit and 64 bit. I found that packing length + hash into a single But, as you say, it's advantageous to store length and hash in 2 x I'd suggest sticking with |
I don't, Claude did, you are talking to Claude ;-)
|
😄 🤖
Set a custom layout for |
|
I'm going to continue from 2 u32s to get this PR rolling |
|
About I suggest a separate #[cfg(target_pointer_width = "64")]
mod len_and_hash {
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct LenAndHash(u64);
impl LenAndHash {
pub fn new(len: u32, hash: u32) -> Self { Self(u64::from(len) | (u64::from(hash) << 32)) }
pub fn len(self) -> u32 { self.0 as u32 }
pub fn hash(self) -> u32 { (self.0 >> 32) as u32 }
pub fn hash_for_hashing(self) -> u64 { self.0.rotate_left(25) }
}
}
#[cfg(not(target_pointer_width = "64"))]
mod len_and_hash {
#[derive(Clone, Copy)]
pub struct LenAndHash {
len: u32,
hash: u32,
}
impl LenAndHash {
pub fn new(len: u32, hash: u32) -> Self { Self { len, hash} }
pub fn len(self) -> u32 { self.len }
pub fn hash(self) -> u32 { self.hash }
pub fn hash_for_hashing(self) -> u64 { self.hash as u64 }
}
}
use len_and_hash::LenAndHash;
#[derive(Clone, Copy)]
#[repr(C)]
pub struct Ident<'a> {
ptr: NonNull<u8>,
len_and_hash: LenAndHash,
_marker: PhantomData<&'a str>,
}Then use |
1e0f0a9 to
4b0ab24
Compare
4b0ab24 to
c731466
Compare
Merge activity
|
…18400) ## Summary Upgrade `Ident` to store a precomputed hash for fast equality checks and efficient hash map operations. ### Changes - Change `Ident` layout from simple `&str` wrapper to `NonNull<u8>` + `len: u32` + `hash: u32` - Fast `PartialEq` between `Ident`s: compare length and hash first before doing full string comparison - `Hash` impl uses precomputed hash (no `Borrow<str>` - lookups must use `&Ident`) - Add `Ident::new()` and `Ident::len()` methods - Add specific `PartialEq` implementations for `&Ident`, `&str`, `String`, `Atom`, `&Atom` - Add `rustc-hash` dependency for `FxHasher` - Fix enum layout calculation in ast_tools to round size to alignment ### Layout - **64-bit**: 16 bytes, align 8 (ptr 8 + len 4 + hash 4) - **32-bit**: 12 bytes, align 4 (ptr 4 + len 4 + hash 4) Using two separate `u32` fields instead of a single `u64` keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs. 🤖 Generated with [Claude Code](https://claude.ai/code)
c731466 to
cb0beca
Compare
de43879 to
ec7dd07
Compare
ec7dd07 to
a2af5d6
Compare
…18400) ## Summary Upgrade `Ident` to store a precomputed hash for fast equality checks and efficient hash map operations. ### Changes - Change `Ident` layout from simple `&str` wrapper to `NonNull<u8>` + `len: u32` + `hash: u32` - Fast `PartialEq` between `Ident`s: compare length and hash first before doing full string comparison - `Hash` impl uses precomputed hash (no `Borrow<str>` - lookups must use `&Ident`) - Add `Ident::new()` and `Ident::len()` methods - Add specific `PartialEq` implementations for `&Ident`, `&str`, `String`, `Atom`, `&Atom` - Add `rustc-hash` dependency for `FxHasher` - Fix enum layout calculation in ast_tools to round size to alignment ### Layout - **64-bit**: 16 bytes, align 8 (ptr 8 + len 4 + hash 4) - **32-bit**: 12 bytes, align 4 (ptr 4 + len 4 + hash 4) Using two separate `u32` fields instead of a single `u64` keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs. 🤖 Generated with [Claude Code](https://claude.ai/code)
a2af5d6 to
f460a88
Compare
…18400) ## Summary Upgrade `Ident` to store a precomputed hash for fast equality checks and efficient hash map operations. ### Changes - Change `Ident` layout from simple `&str` wrapper to `NonNull<u8>` + `len: u32` + `hash: u32` - Fast `PartialEq` between `Ident`s: compare length and hash first before doing full string comparison - `Hash` impl uses precomputed hash (no `Borrow<str>` - lookups must use `&Ident`) - Add `Ident::new()` and `Ident::len()` methods - Add specific `PartialEq` implementations for `&Ident`, `&str`, `String`, `Atom`, `&Atom` - Add `rustc-hash` dependency for `FxHasher` - Fix enum layout calculation in ast_tools to round size to alignment ### Layout - **64-bit**: 16 bytes, align 8 (ptr 8 + len 4 + hash 4) - **32-bit**: 12 bytes, align 4 (ptr 4 + len 4 + hash 4) Using two separate `u32` fields instead of a single `u64` keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs. 🤖 Generated with [Claude Code](https://claude.ai/code)

Summary
Upgrade
Identto store a precomputed hash for fast equality checks and efficient hash map operations.Changes
Identlayout from simple&strwrapper toNonNull<u8>+len: u32+hash: u32PartialEqbetweenIdents: compare length and hash first before doing full string comparisonHashimpl uses precomputed hash (noBorrow<str>- lookups must use&Ident)Ident::new()andIdent::len()methodsPartialEqimplementations for&Ident,&str,String,Atom,&Atomrustc-hashdependency forFxHasherLayout
Using two separate
u32fields instead of a singleu64keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs.🤖 Generated with Claude Code