Skip to content

/./u and /[^x]/u matching surrogate halves individually #16

@mathiasbynens

Description

@mathiasbynens

Reported by Marja Hölttä:

var string = '𝌆𝌆';
var match = string.match(/(....)/u);
console.log(match[1]);

I checked that the same behavior occurs for other character classes too, like this:

var string = 'a𝌆b';
var match = string.match(/a([^c][^c])b/u);
console.log(match[1]); // 𝌆

And as a bonus, it transforms /(.+)\1/u to this:

> r = /((?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])+)\1/
> r.test('𝜆𝌆𝌇') // code units: D835 DF06 D834 DF06 D834 DF07
true

…which is pretty surprising :)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions