feat(oxc_str): upgrade Ident with precomputed hash for fast equality by Boshen · Pull Request #18400 · oxc-project/oxc

Boshen · 2026-01-22T11:29:07Z

Summary

Upgrade Ident to store a precomputed hash for fast equality checks and efficient hash map operations.

Changes

Change Ident layout from simple &str wrapper to NonNull<u8> + len: u32 + hash: u32
Fast PartialEq between Idents: compare length and hash first before doing full string comparison
Hash impl uses precomputed hash (no Borrow<str> - lookups must use &Ident)
Add Ident::new() and Ident::len() methods
Add specific PartialEq implementations for &Ident, &str, String, Atom, &Atom
Add rustc-hash dependency for FxHasher
Fix enum layout calculation in ast_tools to round size to alignment

Layout

64-bit: 16 bytes, align 8 (ptr 8 + len 4 + hash 4)
32-bit: 12 bytes, align 4 (ptr 4 + len 4 + hash 4)

Using two separate u32 fields instead of a single u64 keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs.

🤖 Generated with Claude Code

crates/oxc_str/src/ident.rs

Copilot

Pull request overview

This PR refactors the Ident type to optimize equality checks and hash operations by precomputing and storing a hash value alongside the string data. The implementation packs the string length and hash into a single 64-bit value, with length in the lower 32 bits and FxHash in the upper 32 bits.

Changes:

Upgraded Ident from a simple &str wrapper to a structure containing NonNull<u8>, packed len_and_hash, and lifetime marker
Implemented fast equality comparison by checking len_and_hash before full string comparison
Added multiple PartialEq implementations for interoperability with &str, String, Atom, etc.
Fixed Hash implementation to delegate to str::hash() for compatibility with mixed-type hash containers

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 9 comments.

File	Description
crates/oxc_str/src/ident.rs	Complete refactoring of Ident structure with new memory layout, hash precomputation, equality optimizations, and comprehensive trait implementations
crates/oxc_str/Cargo.toml	Added rustc-hash dependency for FxHasher
Cargo.lock	Updated lockfile to reflect new rustc-hash dependency

Comments suppressed due to low confidence (1)

crates/oxc_str/src/ident.rs:523

The test for hash compatibility only tests Ident-to-Ident lookup, which doesn't actually verify the stated purpose. To properly test hash compatibility with str, you should use the Borrow trait to look up an Ident using a &str key. For example: set.get("foo") or test in a HashMap where you insert with Ident and lookup with &str.

        use rustc_hash::FxHashSet;

        // This test verifies that Ident's Hash impl is compatible with str lookups
        // which is required for correct behavior in mixed hash containers
        let mut set: FxHashSet<Ident<'_>> = FxHashSet::default();
        set.insert(Ident::new("foo"));

        // The hash of Ident("foo") should allow finding it via another Ident("foo")
        let lookup = Ident::new("foo");
        assert!(set.contains(&lookup));
    }
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

crates/oxc_str/src/ident.rs

codspeed-hq · 2026-01-22T11:34:51Z

CodSpeed Performance Report

Merging this PR will degrade performance by 4.09%

_{Comparing refactor/ident-precomputed-hash (f460a88) with main (87684d7)¹}

Summary

❌ 2 regressed benchmarks
✅ 40 untouched benchmarks
⏩ 3 skipped benchmarks²

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`parser[binder.ts]`	3.1 ms	3.3 ms	-4.09%
❌	Simulation	`parser[react.development.js]`	1.2 ms	1.3 ms	-3.06%

No successful run was found on main (993fd2b) during the generation of this report, so 87684d7 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

Boshen · 2026-01-22T11:57:10Z

Design Constraint: `Borrow<str>` vs Precomputed Hash

There's a fundamental conflict between two requirements:

1. Precomputed Hash for Performance

The optimized Ident stores a precomputed FxHash in len_and_hash for fast hash map lookups:

impl Hash for Ident<'_> {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        let hash = self.len_and_hash.rotate_left(32 - 7);
        hasher.write_u64(hash);
    }
}

2. `Borrow<str>` for Heterogeneous Lookups

Code like this requires Borrow<str>:

// In oxc_isolated_declarations/src/lib.rs:634
can_expando_function_names.contains(ident.name.as_str())

The Problem

Rust's Borrow trait contract requires: if T: Borrow<Q>, then hash(t) == hash(t.borrow()).

This means if Ident: Borrow<str>, then ident.hash() must equal ident.as_str().hash().

We cannot have both:

✅ Custom Hash using precomputed hash
✅ Borrow<str> for heterogeneous lookups

Options

Keep Borrow<str>, delegate Hash to str - Correct but loses hash optimization
Remove Borrow<str>, use precomputed Hash - Fast but requires changing call sites to use &Ident instead of &str for lookups
Use hashbrown::raw_entry API - Allows custom hashing during lookup, but more complex

@Boshen Which approach do you prefer?

Boshen · 2026-01-22T11:57:50Z

@camchenry see above, we have a problem :-)

Boshen · 2026-01-22T12:05:01Z

Oh, we just need to fix the call site, but it's not trivial for people 🤔

Boshen · 2026-01-22T12:05:23Z

Solution Found

After research into how Rust compilers handle this (rustc's Symbol, string interning crates), the solution is:

Remove `Borrow<str>`, Use Precomputed Hash

Why Borrow<str> is incompatible with precomputed hash:

The Rust Borrow trait contract requires: if T: Borrow<Q>, then hash(t) == hash(t.borrow()).

This means if Ident: Borrow<str>, then ident.hash() must equal ident.as_str().hash() - which defeats the purpose of precomputed hashing.

How rustc's Symbol handles this:

Does NOT implement Borrow<str>
Uses derived Hash on the index (fast integer hash)
Requires interning strings first - all lookups use Symbol, not &str

Changes in this PR:

Remove Borrow<str> from Ident
Hash::hash() uses precomputed hash (fast!)
Call sites updated: set.contains(name.as_str()) → set.contains(&name)

Sources:

overlookmotel · 2026-01-22T12:22:41Z

I can see another potential solution:

Implement Hash on Ident as ident.as_str().hash(hasher). So Ident::hash and str::hash are equivalent.
Add a wrapper type which uses optimized hashing scheme.

#[repr(transparent)]
struct FastHashIdent<'a>(Ident<'a>);

// Optimized hash
impl Hash for FastHashIdent<'_> {
    #[inline]
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        let hash = self.0.len_and_hash().rotate_left(25);
        hasher.write_u64(hash);
    }
}

Only use the optimized hashing in IdentHashMap.

IdentHashMap uses the wrapper type for hash map keys.

struct IdentHashMap<'a, V> {
    inner: hashbrown::HashMap<FastHashIdent<'a>, V, NoopBuildHasher, &'a Bump>,
}

impl IdentHashMap<'a, V> {
    pub fn insert(&mut self, ident: Ident<'a>, v: V) -> Option<V> {
        let wrapped = FastHashIdent(ident);
        self.inner.insert(wrapped, v)
    }

    // ... other `HashMap` methods
}

We need an IdentHashMap type anyway, as we need to use a no-op BuildHasher (not FxHashBuilder) for these hashmaps.

Result:

Ident can remain Borrow<str>.
Hashmaps keyed with Ident should use IdentHashMap to get better perf, but Ident still works as key of normal HashMaps.
Switching hashmaps over to IdentHashMap can be done incrementally.

EDIT: Or just ditch Borrow<str>. I don't think we need it. We can still implement Deref to &str and AsRef<str> for Ident. Those don't have the same requirements.

Or actually, Ident should probably IMO deref to Atom, and Atom deref to &str.

Boshen · 2026-01-22T12:51:15Z

@overlookmotel Question about the Ident design:

The new Ident uses a u64 for len_and_hash (length in bottom 32 bits, hash in top 32 bits), which requires 8-byte alignment on 32-bit platforms. The old Ident (which was just &str) had 4-byte alignment on 32-bit.

This alignment change cascades through the AST - for example, IdentifierName goes from 16 bytes to 24 bytes on 32-bit, and enums containing it also grow.

Alternative design using two u32 fields:

pub struct Ident<'a> {
    ptr: NonNull<u8>,    // 4 bytes on 32-bit
    len: u32,            // 4 bytes
    hash: u32,           // 4 bytes
    _marker: PhantomData<&'a str>,
}

This would be:

64-bit: 16 bytes, align 8 (with padding)
32-bit: 12 bytes, align 4

This keeps 4-byte alignment on 32-bit like the old &str-based Ident.

Should we use separate len and hash fields to avoid the alignment/size increase on 32-bit platforms? Or is the current u64 approach acceptable?

overlookmotel · 2026-01-22T13:00:27Z

This is a pain, but I think we need different implementations for 32 bit and 64 bit. I found that packing length + hash into a single u64 produced tighter assembly on 64 bit platforms, and allows Ident to be passed between functions in 2 x registers, instead of requiring 3 (quite a significant difference).

But, as you say, it's advantageous to store length and hash in 2 x u32s on 32 bit.

I'd suggest sticking with len_and_hash: u64 to start with, and then producing a more optimal 32 bit version later on.

Boshen · 2026-01-22T13:16:58Z

But, as you say, it's advantageous to store length and hash in 2 x u32s on 32 bit.

I don't, Claude did, you are talking to Claude ;-)

I'd suggest sticking with len_and_hash: u64 to start with, and then producing a more optimal 32 bit version later on.

  error[E0080]: evaluation panicked: assertion failed: size_of::<IdentifierName>() == 16
      --> crates/oxc_ast/src/generated/assert_layouts.rs:1650:5
       |
  1650 |     assert!(size_of::<IdentifierName>() == 16);
       |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ evaluation of `generated::assert_layouts::_` failed here

overlookmotel · 2026-01-22T13:50:02Z

https://oxc.rs/docs/contribute/introduction.html

You are responsible for all AI-generated issues or PRs you submit

😄 🤖

error[E0080]: evaluation panicked: assertion failed: size_of::<IdentifierName>() == 16

Set a custom layout for Ident in calculate_layout_for_primitive in ast_tools assert_layouts.rs.

Boshen · 2026-01-22T13:53:36Z

I'm going to continue from 2 u32s to get this PR rolling

overlookmotel · 2026-01-22T13:59:41Z

About len_and_hash: Actually, the hash implementation needs to be different on 32 bit too. If I remember right, hashbrown behaves differently on 32-bit systems - it only uses the bottom 32 bits of the u64 hash or something.

I suggest a separate LenAndHash struct which abstracts away the platform difference. Something like:

#[cfg(target_pointer_width = "64")]
mod len_and_hash {
    #[derive(Clone, Copy)]
    #[repr(transparent)]
    pub struct LenAndHash(u64);

    impl LenAndHash {
        pub fn new(len: u32, hash: u32) -> Self { Self(u64::from(len) | (u64::from(hash) << 32)) }
        pub fn len(self) -> u32 { self.0 as u32 }
        pub fn hash(self) -> u32 { (self.0 >> 32) as u32 }
        pub fn hash_for_hashing(self) -> u64 { self.0.rotate_left(25) }
    }
}

#[cfg(not(target_pointer_width = "64"))]
mod len_and_hash {
    #[derive(Clone, Copy)]
    pub struct LenAndHash {
        len: u32,
        hash: u32,
    }

    impl LenAndHash {
        pub fn new(len: u32, hash: u32) -> Self { Self { len, hash} }
        pub fn len(self) -> u32 { self.len }
        pub fn hash(self) -> u32 { self.hash }
        pub fn hash_for_hashing(self) -> u64 { self.hash as u64 }
    }
}

use len_and_hash::LenAndHash;

#[derive(Clone, Copy)]
#[repr(C)]
pub struct Ident<'a> {
    ptr: NonNull<u8>,
    len_and_hash: LenAndHash,
    _marker: PhantomData<&'a str>,
}

Then use hash_for_hashing for the Hash impl.

crates/oxc_str/src/ident.rs

Boshen · 2026-01-23T03:01:03Z

How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

0-merge - adds this PR to the back of the merge queue
hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

graphite-app · 2026-01-23T12:44:45Z

Merge activity

Jan 23, 12:44 PM UTC: Boshen added this pull request to the Graphite merge queue.
Jan 23, 12:47 PM UTC: The Graphite merge queue couldn't merge this PR because it was in draft mode.

…18400) ## Summary Upgrade `Ident` to store a precomputed hash for fast equality checks and efficient hash map operations. ### Changes - Change `Ident` layout from simple `&str` wrapper to `NonNull<u8>` + `len: u32` + `hash: u32` - Fast `PartialEq` between `Ident`s: compare length and hash first before doing full string comparison - `Hash` impl uses precomputed hash (no `Borrow<str>` - lookups must use `&Ident`) - Add `Ident::new()` and `Ident::len()` methods - Add specific `PartialEq` implementations for `&Ident`, `&str`, `String`, `Atom`, `&Atom` - Add `rustc-hash` dependency for `FxHasher` - Fix enum layout calculation in ast_tools to round size to alignment ### Layout - **64-bit**: 16 bytes, align 8 (ptr 8 + len 4 + hash 4) - **32-bit**: 12 bytes, align 4 (ptr 4 + len 4 + hash 4) Using two separate `u32` fields instead of a single `u64` keeps 4-byte alignment on 32-bit platforms, avoiding size increases in AST structs. 🤖 Generated with [Claude Code](https://claude.ai/code)

Copilot AI review requested due to automatic review settings January 22, 2026 11:29

github-actions bot added the C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior label Jan 22, 2026

Copilot started reviewing on behalf of Boshen January 22, 2026 11:29 View session

Boshen force-pushed the refactor/ident-precomputed-hash branch from 41be0e3 to 5274050 Compare January 22, 2026 11:29

Boshen commented Jan 22, 2026

View reviewed changes

crates/oxc_str/src/ident.rs Outdated Show resolved Hide resolved

Copilot AI reviewed Jan 22, 2026

View reviewed changes

Boshen force-pushed the refactor/ident-precomputed-hash branch from 5274050 to d002ce7 Compare January 22, 2026 12:04

Boshen requested a review from Dunqing as a code owner January 22, 2026 12:04

github-actions bot added the A-isolated-declarations Isolated Declarations label Jan 22, 2026

Boshen requested review from camc314 and overlookmotel as code owners January 22, 2026 12:22

github-actions bot added A-linter Area - Linter A-parser Area - Parser A-cli Area - CLI A-ast Area - AST A-ast-tools Area - AST tools A-linter-plugins Area - Linter JS plugins labels Jan 22, 2026

camchenry reviewed Jan 22, 2026

View reviewed changes

crates/oxc_str/src/ident.rs Show resolved Hide resolved

Boshen mentioned this pull request Jan 23, 2026

refactor(semantic): add ArenaIdentHashMap and replace IdentHashMap callsites #18425

Closed

Boshen force-pushed the refactor/ident-precomputed-hash branch from 1e0f0a9 to 4b0ab24 Compare January 23, 2026 08:14

Boshen changed the title ~~refactor(oxc_str): upgrade Ident with precomputed hash for fast equality~~ feat(oxc_str): upgrade Ident with precomputed hash for fast equality Jan 23, 2026

github-actions bot added the C-enhancement Category - New feature or request label Jan 23, 2026

Boshen force-pushed the refactor/ident-precomputed-hash branch from 4b0ab24 to c731466 Compare January 23, 2026 08:45

This was referenced Jan 23, 2026

perf(all): experiment: add dedicated identifier string type #16416

Closed

refactor(parser): add cur_ident() for identifiers with precomputed hash #18449

Closed

graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Jan 23, 2026

graphite-app bot force-pushed the refactor/ident-precomputed-hash branch from c731466 to cb0beca Compare January 23, 2026 12:45

Boshen marked this pull request as draft January 23, 2026 12:45

graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Jan 23, 2026

This was referenced Jan 23, 2026

refactor(oxc_str): add IncrementalIdentHasher for byte-by-byte hash computation #18451

Closed

perf(semantic): compute identifier hash incrementally during lexing #18454

Closed

Boshen force-pushed the refactor/ident-precomputed-hash branch 2 times, most recently from de43879 to ec7dd07 Compare January 24, 2026 03:12

Boshen marked this pull request as ready for review January 24, 2026 03:12

Boshen marked this pull request as draft January 24, 2026 05:26

Boshen mentioned this pull request Jan 24, 2026

perf(semantic): preserve precomputed identifier hash throughout semantic analysis #18476

Closed

Boshen force-pushed the refactor/ident-precomputed-hash branch from ec7dd07 to a2af5d6 Compare January 24, 2026 11:16

Boshen marked this pull request as ready for review January 24, 2026 11:16

Boshen marked this pull request as draft January 24, 2026 11:17

Boshen force-pushed the refactor/ident-precomputed-hash branch from a2af5d6 to f460a88 Compare January 24, 2026 13:41

Boshen mentioned this pull request Jan 24, 2026

Create dedicated Ident type for identifier strings oxc-project/backlog#193

Closed

Boshen closed this Feb 6, 2026

overlookmotel deleted the refactor/ident-precomputed-hash branch February 10, 2026 20:48

Uh oh!

Conversation

Boshen commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Layout

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codspeed-hq bot commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging this PR will degrade performance by 4.09%

Summary

Performance Changes

Footnotes

Uh oh!

Boshen commented Jan 22, 2026

Design Constraint: Borrow<str> vs Precomputed Hash

1. Precomputed Hash for Performance

2. Borrow<str> for Heterogeneous Lookups

The Problem

Options

Uh oh!

Boshen commented Jan 22, 2026

Uh oh!

Boshen commented Jan 22, 2026

Uh oh!

Boshen commented Jan 22, 2026

Solution Found

Remove Borrow<str>, Use Precomputed Hash

Uh oh!

overlookmotel commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Boshen commented Jan 22, 2026

Uh oh!

overlookmotel commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Boshen commented Jan 22, 2026

Uh oh!

overlookmotel commented Jan 22, 2026

Uh oh!

Boshen commented Jan 22, 2026

Uh oh!

overlookmotel commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Boshen commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

graphite-app bot commented Jan 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge activity

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Boshen commented Jan 22, 2026 •

edited

Loading

codspeed-hq bot commented Jan 22, 2026 •

edited

Loading

Design Constraint: `Borrow<str>` vs Precomputed Hash

2. `Borrow<str>` for Heterogeneous Lookups

Remove `Borrow<str>`, Use Precomputed Hash

overlookmotel commented Jan 22, 2026 •

edited

Loading

overlookmotel commented Jan 22, 2026 •

edited

Loading

overlookmotel commented Jan 22, 2026 •

edited

Loading

Boshen commented Jan 23, 2026 •

edited

Loading

graphite-app bot commented Jan 23, 2026 •

edited

Loading