Skip to content

Explore using Aho-Corasick in Regex's FindFirstChar #62447

@stephentoub

Description

@stephentoub

Given an alternation like "abc|def|ghi", we will currently do an IndexOfAny to find the next position for a possible match. At that point, we try to avoid false positives by doing a few set lookups, e.g. we IndexOfAny('a', 'd', 'g'), and then we test whether the next character is in the set [beh]. But specifically for expressions that are small alternations, or that begin with small alternations, or that begin with constructs we can convert to small alternations, we could instead use Aho-Corasick to actually validate whether any of the strings in the alternation are a prefix of the text at the current location. Rust uses a vectorized implementation named Teddy.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions