Skip to content

Reduce bounds checks in IndexVec #109

@overlookmotel

Description

@overlookmotel

oxc_index::IndexVec contains the following implementation of push:

pub fn push(&mut self, d: T) -> I {
    let idx = I::from_usize(self.len());
    self.raw.push(d);
    idx
}

AstNodeId, ScopeId, SymbolId and ReferenceId are used as indexes on IndexVecs. All are wrappers around NonMaxU32. NonMaxU32::from_usize bounds checks self.len() and panics if it's >= u32::MAX.

IndexVec::push is used extensively, so this extra panicking branch:

  1. Is on a hot path.
  2. Increases the size of push function, which makes it less likely to be inlined.

Vec::push already checks len vs capacity, and has a cold branch where the Vec needs to grow.

We can remove the extra bounds check in IndexVec::push by moving the bounds check into this cold path. As follows:

1. Extend Idx trait

trait Idx {
    const MAX: usize;

    unsafe fn from_usize_unchecked(idx: usize) -> Self;

    fn index(self) -> usize;

    fn from_usize(idx: usize) -> Self {
        assert!(idx <= Self::MAX);
        // SAFETY: We checked `idx` is valid
        unsafe { Self::from_usize_unchecked(idx) }
    }
}
struct AstNodeId(NonMaxU32);

impl Idx for AstNodeId {
    const MAX: usize = NonMaxU32::MAX.get() as usize;

    unsafe fn from_usize_unchecked(idx: usize) -> Self {
        Self(unsafe { NonMaxU32::new_unchecked(idx as u32) })
    }

    fn index(self) -> usize {
        self.0.get() as usize
    }
}

2. Reimplement push

impl<I: Idx, T> IndexVec<I, T> {
    // `std::vec::Vec`'s minimum
    const MIN_CAPACITY: usize = match std::mem::size_of::<T>() {
        1 => 8,
        s if s <= 1024 => 4,
        _ => 1,
    };
    // Smaller of `I::MAX` items or `isize::MAX` bytes
    const MAX_CAPACITY: usize = min(
        I::MAX,
        isize::MAX as usize / max(std::mem::size_of::<T>(), 1),
    );

    #[inline]
    pub fn push(&mut self, value: T) -> I {
        let len = self.raw.len();
        // This branch is rarely taken
        if len == self.raw.capacity() {
            self.grow_one();
        }
        // Manually add `value` to the `Vec` (code copied from `std::vec::Vec`)
        unsafe {
            let end = self.raw.as_mut_ptr().add(len);
            ptr::write(end, value);
            self.raw.set_len(len + 1);
        }
        // SAFETY: `len` cannot exceed `capacity`, and `capacity` is always `<= I::MAX`
        unsafe { I::from_usize_unchecked(len) }
    }

    #[cold]
    #[inline(never)]
    fn grow_one(&mut self) {
        let current_capacity = self.raw.capacity();
        let new_capacity = (current_capacity * 2).max(Self::MIN_CAPACITY).min(Self::MAX_CAPACITY);
        let additional = new_capacity - current_capacity;

        // Bounds check moved here, in cold branch
        assert!(additional > 0);

        // Avoid `reserve_exact()` doing bounds checks again
        let len = self.raw.len();
        unsafe {
            assert_unchecked!(len == current_capacity);
            assert_unchecked!(len.checked_add(additional).is_some());
        }

        self.raw.reserve_exact(additional);
    }
}

const fn min(n1: usize, n2: usize) -> usize {
    if n1 < n2 { n1 } else { n2 }
}
const fn max(n1: usize, n2: usize) -> usize {
    if n1 > n2 { n1 } else { n2 }
}

3. Prevent capacity ever exceeding I::MAX

Amend all other methods to ensure that capacity can never exceed MAX_CAPACITY.

Note: It's capacity which must not exceed MAX_CAPACITY, not len. This allows if len == self.0.capacity() in push to do double-duty of checking if Vec needs to grow, and ensuring indexes are in bounds, with a single check.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions