Double-escaped RegEx patterns in output JSON schema causing issues with some RegEx flavors #182

@nikitawootten-nist

Describe the bug

Some RegEx flavors (such as Go's regexp package, https://pkg.go.dev/regexp/syntax, and PHP's PCRE) do not support Unicode escapes written in the \u{code} syntax. The patterns used to validate certain datatypes, such as the token type, may improperly rely on this syntax.

Who is the bug affecting?

Tool developers who use the patterns in generated JSON schemas with RegEx flavors that lack this syntax (such as Go's regexp package or PHP's PCRE).

What is affected by this bug?

The regex present in some output JSON schema patterns is invalid for some RegEx flavors.

When does this occur?

Any time a Unicode escape sequence is placed within a RegEx pattern (such as for the token datatype).

Expected behavior (i.e. solution)

If the Unicode escape in the JSON string is single-escaped (\u... instead of \\u...), every JSON parser I've tested (Go, JS, and Python) decodes it to the Unicode character itself, so the regex engine never sees the escape. I have not tested how any of these regex flavors handle literal Unicode characters in a pattern, but this could be a simple solution to the issue.

Other Comments

This bug is related to #181

Labels: bug
