Skip to content

feat(oxc_str): upgrade Ident with precomputed hash for fast equality#18400

Closed
Boshen wants to merge 1 commit intomainfrom
refactor/ident-precomputed-hash
Closed

feat(oxc_str): upgrade Ident with precomputed hash for fast equality#18400
Boshen wants to merge 1 commit intomainfrom
refactor/ident-precomputed-hash

Conversation

@Boshen
Copy link
Member

@Boshen Boshen commented Jan 22, 2026

Summary

Upgrade Ident to store a precomputed hash for fast equality checks and efficient hash map operations.

Changes

  • Change Ident layout from simple &str wrapper to NonNull<u8> + len: u32 + hash: u32
  • Fast PartialEq between Idents: compare length and hash first before doing full string comparison
  • Hash impl uses precomputed hash (no Borrow<str> - lookups must use &Ident)
  • Add Ident::new() and Ident::len() methods
  • Add specific PartialEq implementations for &Ident, &str, String, Atom, &Atom
  • Add rustc-hash dependency for FxHasher
  • Fix enum layout calculation in ast_tools to round size to alignment

Layout

  • 64-bit: 16 bytes, align 8 (ptr 8 + len 4 + hash 4)
  • 32-bit: 12 bytes, align 4 (ptr 4 + len 4 + hash 4)

Using two separate u32 fields instead of a single u64 keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings January 22, 2026 11:29
@github-actions github-actions bot added the C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior label Jan 22, 2026
@Boshen Boshen force-pushed the refactor/ident-precomputed-hash branch from 41be0e3 to 5274050 Compare January 22, 2026 11:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Ident type to optimize equality checks and hash operations by precomputing and storing a hash value alongside the string data. The implementation packs the string length and hash into a single 64-bit value, with length in the lower 32 bits and FxHash in the upper 32 bits.

Changes:

  • Upgraded Ident from a simple &str wrapper to a structure containing NonNull<u8>, packed len_and_hash, and lifetime marker
  • Implemented fast equality comparison by checking len_and_hash before full string comparison
  • Added multiple PartialEq implementations for interoperability with &str, String, Atom, etc.
  • Fixed Hash implementation to delegate to str::hash() for compatibility with mixed-type hash containers

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 9 comments.

File Description
crates/oxc_str/src/ident.rs Complete refactoring of Ident structure with new memory layout, hash precomputation, equality optimizations, and comprehensive trait implementations
crates/oxc_str/Cargo.toml Added rustc-hash dependency for FxHasher
Cargo.lock Updated lockfile to reflect new rustc-hash dependency
Comments suppressed due to low confidence (1)

crates/oxc_str/src/ident.rs:523

  • The test for hash compatibility only tests Ident-to-Ident lookup, which doesn't actually verify the stated purpose. To properly test hash compatibility with str, you should use the Borrow trait to look up an Ident using a &str key. For example: set.get("foo") or test in a HashMap where you insert with Ident and lookup with &str.
        use rustc_hash::FxHashSet;

        // This test verifies that Ident's Hash impl is compatible with str lookups
        // which is required for correct behavior in mixed hash containers
        let mut set: FxHashSet<Ident<'_>> = FxHashSet::default();
        set.insert(Ident::new("foo"));

        // The hash of Ident("foo") should allow finding it via another Ident("foo")
        let lookup = Ident::new("foo");
        assert!(set.contains(&lookup));
    }
}


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codspeed-hq
Copy link

codspeed-hq bot commented Jan 22, 2026

CodSpeed Performance Report

Merging this PR will degrade performance by 4.09%

Comparing refactor/ident-precomputed-hash (f460a88) with main (87684d7)1

Summary

❌ 2 regressed benchmarks
✅ 40 untouched benchmarks
⏩ 3 skipped benchmarks2

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation parser[binder.ts] 3.1 ms 3.3 ms -4.09%
Simulation parser[react.development.js] 1.2 ms 1.3 ms -3.06%

Footnotes

  1. No successful run was found on main (993fd2b) during the generation of this report, so 87684d7 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@Boshen
Copy link
Member Author

Boshen commented Jan 22, 2026

Design Constraint: Borrow<str> vs Precomputed Hash

There's a fundamental conflict between two requirements:

1. Precomputed Hash for Performance

The optimized Ident stores a precomputed FxHash in len_and_hash for fast hash map lookups:

impl Hash for Ident<'_> {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        let hash = self.len_and_hash.rotate_left(32 - 7);
        hasher.write_u64(hash);
    }
}

2. Borrow<str> for Heterogeneous Lookups

Code like this requires Borrow<str>:

// In oxc_isolated_declarations/src/lib.rs:634
can_expando_function_names.contains(ident.name.as_str())

The Problem

Rust's Borrow trait contract requires: if T: Borrow<Q>, then hash(t) == hash(t.borrow()).

This means if Ident: Borrow<str>, then ident.hash() must equal ident.as_str().hash().

We cannot have both:

  • ✅ Custom Hash using precomputed hash
  • Borrow<str> for heterogeneous lookups

Options

  1. Keep Borrow<str>, delegate Hash to str - Correct but loses hash optimization
  2. Remove Borrow<str>, use precomputed Hash - Fast but requires changing call sites to use &Ident instead of &str for lookups
  3. Use hashbrown::raw_entry API - Allows custom hashing during lookup, but more complex

@Boshen Which approach do you prefer?

@Boshen
Copy link
Member Author

Boshen commented Jan 22, 2026

@camchenry see above, we have a problem :-)

@Boshen Boshen force-pushed the refactor/ident-precomputed-hash branch from 5274050 to d002ce7 Compare January 22, 2026 12:04
@Boshen Boshen requested a review from Dunqing as a code owner January 22, 2026 12:04
@github-actions github-actions bot added the A-isolated-declarations Isolated Declarations label Jan 22, 2026
@Boshen
Copy link
Member Author

Boshen commented Jan 22, 2026

Oh, we just need to fix the call site, but it's not trivial for people 🤔

@Boshen
Copy link
Member Author

Boshen commented Jan 22, 2026

Solution Found

After research into how Rust compilers handle this (rustc's Symbol, string interning crates), the solution is:

Remove Borrow<str>, Use Precomputed Hash

Why Borrow<str> is incompatible with precomputed hash:

The Rust Borrow trait contract requires: if T: Borrow<Q>, then hash(t) == hash(t.borrow()).

This means if Ident: Borrow<str>, then ident.hash() must equal ident.as_str().hash() - which defeats the purpose of precomputed hashing.

How rustc's Symbol handles this:

  • Does NOT implement Borrow<str>
  • Uses derived Hash on the index (fast integer hash)
  • Requires interning strings first - all lookups use Symbol, not &str

Changes in this PR:

  1. Remove Borrow<str> from Ident
  2. Hash::hash() uses precomputed hash (fast!)
  3. Call sites updated: set.contains(name.as_str())set.contains(&name)

Sources:

@github-actions github-actions bot added A-linter Area - Linter A-parser Area - Parser A-cli Area - CLI A-ast Area - AST A-ast-tools Area - AST tools A-linter-plugins Area - Linter JS plugins labels Jan 22, 2026
@overlookmotel
Copy link
Member

overlookmotel commented Jan 22, 2026

I can see another potential solution:

  1. Implement Hash on Ident as ident.as_str().hash(hasher). So Ident::hash and str::hash are equivalent.

  2. Add a wrapper type which uses optimized hashing scheme.

#[repr(transparent)]
struct FastHashIdent<'a>(Ident<'a>);

// Optimized hash
impl Hash for FastHashIdent<'_> {
    #[inline]
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        let hash = self.0.len_and_hash().rotate_left(25);
        hasher.write_u64(hash);
    }
}
  1. Only use the optimized hashing in IdentHashMap.

IdentHashMap uses the wrapper type for hash map keys.

struct IdentHashMap<'a, V> {
    inner: hashbrown::HashMap<FastHashIdent<'a>, V, NoopBuildHasher, &'a Bump>,
}

impl IdentHashMap<'a, V> {
    pub fn insert(&mut self, ident: Ident<'a>, v: V) -> Option<V> {
        let wrapped = FastHashIdent(ident);
        self.inner.insert(wrapped, v)
    }

    // ... other `HashMap` methods
}

We need an IdentHashMap type anyway, as we need to use a no-op BuildHasher (not FxHashBuilder) for these hashmaps.

Result:

  • Ident can remain Borrow<str>.
  • Hashmaps keyed with Ident should use IdentHashMap to get better perf, but Ident still works as key of normal HashMaps.
  • Switching hashmaps over to IdentHashMap can be done incrementally.

EDIT: Or just ditch Borrow<str>. I don't think we need it. We can still implement Deref to &str and AsRef<str> for Ident. Those don't have the same requirements.

Or actually, Ident should probably IMO deref to Atom, and Atom deref to &str.

@Boshen
Copy link
Member Author

Boshen commented Jan 22, 2026

@overlookmotel Question about the Ident design:

The new Ident uses a u64 for len_and_hash (length in bottom 32 bits, hash in top 32 bits), which requires 8-byte alignment on 32-bit platforms. The old Ident (which was just &str) had 4-byte alignment on 32-bit.

This alignment change cascades through the AST - for example, IdentifierName goes from 16 bytes to 24 bytes on 32-bit, and enums containing it also grow.

Alternative design using two u32 fields:

pub struct Ident<'a> {
    ptr: NonNull<u8>,    // 4 bytes on 32-bit
    len: u32,            // 4 bytes
    hash: u32,           // 4 bytes
    _marker: PhantomData<&'a str>,
}

This would be:

  • 64-bit: 16 bytes, align 8 (with padding)
  • 32-bit: 12 bytes, align 4

This keeps 4-byte alignment on 32-bit like the old &str-based Ident.

Should we use separate len and hash fields to avoid the alignment/size increase on 32-bit platforms? Or is the current u64 approach acceptable?

@overlookmotel
Copy link
Member

overlookmotel commented Jan 22, 2026

This is a pain, but I think we need different implementations for 32 bit and 64 bit. I found that packing length + hash into a single u64 produced tighter assembly on 64 bit platforms, and allows Ident to be passed between functions in 2 x registers, instead of requiring 3 (quite a significant difference).

But, as you say, it's advantageous to store length and hash in 2 x u32s on 32 bit.

I'd suggest sticking with len_and_hash: u64 to start with, and then producing a more optimal 32 bit version later on.

@Boshen
Copy link
Member Author

Boshen commented Jan 22, 2026

But, as you say, it's advantageous to store length and hash in 2 x u32s on 32 bit.

I don't, Claude did, you are talking to Claude ;-)

I'd suggest sticking with len_and_hash: u64 to start with, and then producing a more optimal 32 bit version later on.

  error[E0080]: evaluation panicked: assertion failed: size_of::<IdentifierName>() == 16
      --> crates/oxc_ast/src/generated/assert_layouts.rs:1650:5
       |
  1650 |     assert!(size_of::<IdentifierName>() == 16);
       |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ evaluation of `generated::assert_layouts::_` failed here

@overlookmotel
Copy link
Member

https://oxc.rs/docs/contribute/introduction.html

You are responsible for all AI-generated issues or PRs you submit

😄 🤖

error[E0080]: evaluation panicked: assertion failed: size_of::<IdentifierName>() == 16

Set a custom layout for Ident in calculate_layout_for_primitive in ast_tools assert_layouts.rs.

@Boshen
Copy link
Member Author

Boshen commented Jan 22, 2026

I'm going to continue from 2 u32s to get this PR rolling

@overlookmotel
Copy link
Member

overlookmotel commented Jan 22, 2026

About len_and_hash: Actually, the hash implementation needs to be different on 32 bit too. If I remember right, hashbrown behaves differently on 32-bit systems - it only uses the bottom 32 bits of the u64 hash or something.

I suggest a separate LenAndHash struct which abstracts away the platform difference. Something like:

#[cfg(target_pointer_width = "64")]
mod len_and_hash {
    #[derive(Clone, Copy)]
    #[repr(transparent)]
    pub struct LenAndHash(u64);

    impl LenAndHash {
        pub fn new(len: u32, hash: u32) -> Self { Self(u64::from(len) | (u64::from(hash) << 32)) }
        pub fn len(self) -> u32 { self.0 as u32 }
        pub fn hash(self) -> u32 { (self.0 >> 32) as u32 }
        pub fn hash_for_hashing(self) -> u64 { self.0.rotate_left(25) }
    }
}

#[cfg(not(target_pointer_width = "64"))]
mod len_and_hash {
    #[derive(Clone, Copy)]
    pub struct LenAndHash {
        len: u32,
        hash: u32,
    }

    impl LenAndHash {
        pub fn new(len: u32, hash: u32) -> Self { Self { len, hash} }
        pub fn len(self) -> u32 { self.len }
        pub fn hash(self) -> u32 { self.hash }
        pub fn hash_for_hashing(self) -> u64 { self.hash as u64 }
    }
}

use len_and_hash::LenAndHash;

#[derive(Clone, Copy)]
#[repr(C)]
pub struct Ident<'a> {
    ptr: NonNull<u8>,
    len_and_hash: LenAndHash,
    _marker: PhantomData<&'a str>,
}

Then use hash_for_hashing for the Hash impl.

Copy link
Member Author

Boshen commented Jan 23, 2026


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Boshen Boshen force-pushed the refactor/ident-precomputed-hash branch from 1e0f0a9 to 4b0ab24 Compare January 23, 2026 08:14
@Boshen Boshen changed the title refactor(oxc_str): upgrade Ident with precomputed hash for fast equality feat(oxc_str): upgrade Ident with precomputed hash for fast equality Jan 23, 2026
@github-actions github-actions bot added the C-enhancement Category - New feature or request label Jan 23, 2026
@Boshen Boshen force-pushed the refactor/ident-precomputed-hash branch from 4b0ab24 to c731466 Compare January 23, 2026 08:45
@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Jan 23, 2026
@graphite-app
Copy link
Contributor

graphite-app bot commented Jan 23, 2026

Merge activity

graphite-app bot pushed a commit that referenced this pull request Jan 23, 2026
…18400)

## Summary

Upgrade `Ident` to store a precomputed hash for fast equality checks and efficient hash map operations.

### Changes

- Change `Ident` layout from simple `&str` wrapper to `NonNull<u8>` + `len: u32` + `hash: u32`
- Fast `PartialEq` between `Ident`s: compare length and hash first before doing full string comparison
- `Hash` impl uses precomputed hash (no `Borrow<str>` - lookups must use `&Ident`)
- Add `Ident::new()` and `Ident::len()` methods
- Add specific `PartialEq` implementations for `&Ident`, `&str`, `String`, `Atom`, `&Atom`
- Add `rustc-hash` dependency for `FxHasher`
- Fix enum layout calculation in ast_tools to round size to alignment

### Layout

- **64-bit**: 16 bytes, align 8 (ptr 8 + len 4 + hash 4)
- **32-bit**: 12 bytes, align 4 (ptr 4 + len 4 + hash 4)

Using two separate `u32` fields instead of a single `u64` keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs.

🤖 Generated with [Claude Code](https://claude.ai/code)
@graphite-app graphite-app bot force-pushed the refactor/ident-precomputed-hash branch from c731466 to cb0beca Compare January 23, 2026 12:45
@Boshen Boshen marked this pull request as draft January 23, 2026 12:45
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Jan 23, 2026
@Boshen Boshen force-pushed the refactor/ident-precomputed-hash branch 2 times, most recently from de43879 to ec7dd07 Compare January 24, 2026 03:12
@Boshen Boshen marked this pull request as ready for review January 24, 2026 03:12
@Boshen Boshen marked this pull request as draft January 24, 2026 05:26
@Boshen Boshen force-pushed the refactor/ident-precomputed-hash branch from ec7dd07 to a2af5d6 Compare January 24, 2026 11:16
@Boshen Boshen marked this pull request as ready for review January 24, 2026 11:16
@Boshen Boshen marked this pull request as draft January 24, 2026 11:17
…18400)

## Summary

Upgrade `Ident` to store a precomputed hash for fast equality checks and efficient hash map operations.

### Changes

- Change `Ident` layout from simple `&str` wrapper to `NonNull<u8>` + `len: u32` + `hash: u32`
- Fast `PartialEq` between `Ident`s: compare length and hash first before doing full string comparison
- `Hash` impl uses precomputed hash (no `Borrow<str>` - lookups must use `&Ident`)
- Add `Ident::new()` and `Ident::len()` methods
- Add specific `PartialEq` implementations for `&Ident`, `&str`, `String`, `Atom`, `&Atom`
- Add `rustc-hash` dependency for `FxHasher`
- Fix enum layout calculation in ast_tools to round size to alignment

### Layout

- **64-bit**: 16 bytes, align 8 (ptr 8 + len 4 + hash 4)
- **32-bit**: 12 bytes, align 4 (ptr 4 + len 4 + hash 4)

Using two separate `u32` fields instead of a single `u64` keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs.

🤖 Generated with [Claude Code](https://claude.ai/code)
@Boshen Boshen force-pushed the refactor/ident-precomputed-hash branch from a2af5d6 to f460a88 Compare January 24, 2026 13:41
Boshen added a commit that referenced this pull request Jan 24, 2026
…18400)

## Summary

Upgrade `Ident` to store a precomputed hash for fast equality checks and efficient hash map operations.

### Changes

- Change `Ident` layout from simple `&str` wrapper to `NonNull<u8>` + `len: u32` + `hash: u32`
- Fast `PartialEq` between `Ident`s: compare length and hash first before doing full string comparison
- `Hash` impl uses precomputed hash (no `Borrow<str>` - lookups must use `&Ident`)
- Add `Ident::new()` and `Ident::len()` methods
- Add specific `PartialEq` implementations for `&Ident`, `&str`, `String`, `Atom`, `&Atom`
- Add `rustc-hash` dependency for `FxHasher`
- Fix enum layout calculation in ast_tools to round size to alignment

### Layout

- **64-bit**: 16 bytes, align 8 (ptr 8 + len 4 + hash 4)
- **32-bit**: 12 bytes, align 4 (ptr 4 + len 4 + hash 4)

Using two separate `u32` fields instead of a single `u64` keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs.

🤖 Generated with [Claude Code](https://claude.ai/code)
@Boshen Boshen closed this Feb 6, 2026
@overlookmotel overlookmotel deleted the refactor/ident-precomputed-hash branch February 10, 2026 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-ast Area - AST A-ast-tools Area - AST tools A-cli Area - CLI A-isolated-declarations Isolated Declarations A-linter Area - Linter A-linter-plugins Area - Linter JS plugins A-parser Area - Parser C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior C-enhancement Category - New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments