TomStrepsil/regex-partial-match

regex-partial-match

A zero-dependency regular expression transform for partial matching, enabling validation of incomplete input strings against regex patterns.

Problem statement

Unlike C/C++ (via PCRE/PCRE2, RE2, Boost.Regex), Python (via the third-party regex module) or Java (via hitEnd), JavaScript has no built-in or canonical support for partial matching of regular expressions.

Overview

This library transforms regular expressions to support partial matching on a best-effort basis, allowing you to test whether an incomplete string could still go on to match the full pattern. This is particularly useful for real-time input validation, autocomplete systems, progressive form validation, and stream chunk matching.

Based on an algorithm created by Lucas Trzesniewski, repackaged for npm under the ISC license, with permission.

Installation

npm install regex-partial-match

Usage

Basic Usage

import createPartialMatchRegex from "regex-partial-match";

const pattern = /hello world/;
const partial = createPartialMatchRegex(pattern);

partial.test("h"); // true - could match
partial.test("hello"); // true - could match
partial.test("hello world"); // true - full match
partial.test("goodbye"); // false - cannot match

Extending RegExp.prototype

import "regex-partial-match/extend";

const partial = /hello world/.toPartialMatchRegex();

partial.test("hel"); // true

How It Works

The library transforms a regular expression by wrapping each atomic element in a non-capturing group with an alternation to end-of-input ($):

/abc/ → /(?:a|$)(?:b|$)(?:c|$)/

This allows the pattern to match prefixes of the original pattern, enabling validation of incomplete input.
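To make the idea concrete, here is a minimal sketch that applies the same wrapping to a plain literal string. It is not the library's implementation: `naivePartial` is a hypothetical helper that handles only literal characters (no groups, quantifiers or character classes), and the leading `^` anchor is an added assumption that keeps the examples free of stray matches at the end of the input.

```javascript
// Hypothetical helper, for illustration only: wraps each character of a
// literal string in (?:x|$), mirroring the transform described above.
function naivePartial(literal) {
  const body = Array.from(literal)
    // Escape regex metacharacters so each character is matched literally
    .map((ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .map((ch) => `(?:${ch}|$)`)
    .join("");
  return new RegExp("^" + body);
}

const partial = naivePartial("abc");

console.log(partial.source); // ^(?:a|$)(?:b|$)(?:c|$)
console.log(partial.test("a")); // true
console.log(partial.test("ab")); // true
console.log(partial.test("abc")); // true
console.log(partial.test("abd")); // false
```

Each wrapped atom either consumes its character or succeeds at the end of the input, which is why every prefix of "abc" is accepted while "abd" is rejected.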

Since the library accepts only valid regular expressions (see footnote 1), the algorithm can make many unguarded assumptions about the source of the expression.

The library has been stress-tested with various regular expression features in isolation, and some in likely combinations, but the test space is unbounded, and syntactically valid regular expressions can still express contradictory patterns, e.g.

  • /\b\B/ - impossible to match both a word boundary and a non-word boundary
  • /$^/ - end cannot come before start
  • /x{2}?/ - a lazy modifier is meaningless on a fixed repetition count

Such combinations have not been tested.

Supported Features

Unsupported Features

The following regex features are not currently supported:

Browser Compatibility

The library is compiled to ES5 for broad compatibility with older browsers and JavaScript environments. However, certain regular expression features naturally require ES2015+ support:

Caveats

.test() behaviour and non-matching results from .exec() and .match()

The library produces an expression that always matches an empty string, at the end of the input. Feasibly, this is the start of a new partial match.

Hence:

/x/.test("a") === false; /* untransformed regex */
/(?:x|$)/.test("a") === true; /* what's produced by the library */

To mitigate, a start boundary anchor can prevent anything but an empty string matching:

/^(?:x|$)/.test("") === true;
/^(?:x|$)/.test("x") === true;
/^(?:x|$)/.test("a") === false;

On this basis, .test() should be used with caution: when validating everything that came before, a match of an empty string at the end of the input should be treated as "no match".

i.e.

/(?:x|$)/.exec("a"); // ['', index: 1, input: "a", groups: undefined];
"a".match(/(?:x|$)/); // ['', index: 1, input: "a", groups: undefined];

Since the library produces a native RegExp object, it makes no attempt to proxy or translate this output to null, though a helper could be added in future for clarity. See issue.
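In the meantime, a wrapper along the following lines could apply that rule. This is only a sketch: `partialExec` is a hypothetical name, not part of the library, and the regex below is a hand-written stand-in for the transformed form of /x/. An empty input is assumed to count as a valid (empty) prefix.

```javascript
// Hypothetical helper (not part of the library): like exec(), but reports an
// empty match at the end of a non-empty input as "no match".
function partialExec(partialRegex, input) {
  partialRegex.lastIndex = 0; // start from a known position for g / y flags
  const match = partialRegex.exec(input);
  if (match === null) return null;
  const emptyAtEnd = match[0] === "" && match.index === input.length;
  return emptyAtEnd && input.length > 0 ? null : match;
}

// Hand-written stand-in for the transformed form of /x/
const partial = /(?:x|$)/;

console.log(partialExec(partial, "x")); // match "x" at index 0
console.log(partialExec(partial, "a")); // null
console.log(partialExec(partial, "ax")); // match "x" at index 1
```

Note that the helper resets lastIndex on the regex it is given, so it should not be mixed with code that relies on lastIndex for iteration.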

Backreferences

Backreferences cannot be partially matched because they are atomic. A backreference like \1 must match the complete captured text or fail entirely, and cannot be split into individual characters for partial matching like regular atoms can.

Fixed-length patterns like /(abc)\1/ could theoretically become /(?:(a)|$)(?:(b)|$)(?:(c)|$)(?:\1|$)(?:\2|$)(?:\3|$)/ (accepting polluted capture indexes as a side-effect), but this doesn't work for variable-length captures.
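The fixed-length case can be tried out with a hand-written regex. The sketch below is typed out manually from the expression above rather than produced by the library, and the leading `^` anchor is an added assumption that rules out empty matches at the end of the input.

```javascript
// Hand-written per-character form of /(abc)\1/, as described above.
// The ^ anchor is an assumption added for this example.
const partial = /^(?:(a)|$)(?:(b)|$)(?:(c)|$)(?:\1|$)(?:\2|$)(?:\3|$)/;

console.log(partial.test("ab")); // true
console.log(partial.test("abca")); // true
console.log(partial.test("abcabc")); // true
console.log(partial.test("abcx")); // false

// Side effect: three capture groups instead of the original one
console.log(partial.exec("abcabc").slice(1)); // [ 'a', 'b', 'c' ]
```

This only works because each capture has a known length of one character. A capture such as (a+) has no fixed split, which is where the approach breaks down.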

Positive Lookbehinds

A positive lookbehind must match in its entirety before the pattern can match at all. This is inherent to zero-width assertions: they consume no input themselves, and only qualify the atoms that do.

e.g.

/(?<=foo)bar/;

"f" through "foo" is not a match, but "foob" is.

Surrogate Pair Matching

In Unicode-aware mode (u flag), only whole astral characters are supported. Partial matching of the individual surrogates that make up a surrogate pair is not supported. For example, /😀/u will match the complete emoji character, but not its first surrogate in isolation. Hence, if partially matching a byte stream, be sure to pipe it through a TextDecoder first.
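As a sketch of that advice, the standard TextDecoder can be run in streaming mode so that a multi-byte character split across chunks is only emitted once it is complete. The chunk boundaries below are made up for the example.

```javascript
// Simulate an emoji whose UTF-8 bytes arrive split across two chunks
const bytes = new TextEncoder().encode("😀"); // 4 bytes: F0 9F 98 80
const chunks = [bytes.slice(0, 2), bytes.slice(2)];

// { stream: true } makes the decoder hold back incomplete byte sequences
const decoder = new TextDecoder("utf-8");
const decoded = chunks.map((chunk) => decoder.decode(chunk, { stream: true }));

console.log(decoded); // [ '', '😀' ]
console.log(/😀/u.test(decoded.join(""))); // true
```

The first chunk decodes to an empty string, so no lone surrogate ever reaches the regex.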

Sticky Flag (y)

The sticky flag may not behave as expected in partial matching scenarios. The sticky flag requires matches to start at lastIndex, but a partial match failure resets lastIndex to 0. This means subsequent attempts cannot "continue" from where the previous match failed, making progressive character-by-character validation problematic.

Example:

const pattern = /hello/y;
const partial = createPartialMatchRegex(pattern);

partial.lastIndex = 0;
partial.test("h"); // succeeds, lastIndex advances
partial.test("he"); // attempted from the advanced lastIndex, not from the start of the input
// Cannot reliably continue partial matching with sticky flag

Recommendation: Avoid using the y flag with partial matching unless you fully understand the implications.
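The effect can be reproduced with native RegExp semantics alone. In the sketch below, the regex is a hand-written stand-in for the transformed form of /hello/y (the library's actual output may differ), and `testFromStart` is a hypothetical workaround that resets lastIndex before every call.

```javascript
// Hand-written stand-in for the transformed form of /hello/y
const partial = /(?:h|$)(?:e|$)(?:l|$)(?:l|$)(?:o|$)/y;

console.log(partial.test("h")); // true
console.log(partial.lastIndex); // 1

// The next attempt starts at index 1, where "e" is not a valid start
console.log(partial.test("he")); // false
console.log(partial.lastIndex); // 0 (reset by the failure)

// Hypothetical workaround: always validate from the start of the input
function testFromStart(regex, input) {
  regex.lastIndex = 0;
  return regex.test(input);
}

console.log(testFromStart(partial, "he")); // true
console.log(testFromStart(partial, "hel")); // true
console.log(testFromStart(partial, "hex")); // false
```

Resetting lastIndex makes the sticky flag behave like a start anchor here, which may or may not be what the original pattern intended.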

Global Flag (g)

The global flag is preserved but may not be necessary for partial matching use cases. The g flag affects behavior when using .exec() repeatedly to find all matches, but partial matching typically validates a single prefix at a time.

The global flag does not cause issues like the sticky flag, as partial patterns naturally match from the beginning of the input. However, if you're using lastIndex to track position, be aware that failed matches will reset it to 0.

Examples

Form Validation

const emailPattern = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;
const partial = createPartialMatchRegex(emailPattern);

function validateEmail(input) {
  return partial.test(input) ? "valid" : "invalid";
}

validateEmail("user"); // 'valid' - could become valid
validateEmail("user@"); // 'valid' - could become valid
validateEmail("user@example"); // 'valid' - could become valid
validateEmail("user@example.com"); // 'valid' - complete match
validateEmail("@@invalid"); // 'invalid' - cannot match
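In a form, it is often useful to tell a finished value apart from one that is merely still plausible. One possible approach, sketched below, is to test the original pattern first and fall back to the partial one. `classify` is a hypothetical helper, and the `partial` regex here is a hand-written stand-in (the library's output for the same pattern may look different); both patterns are assumed to be anchored with ^ and $.

```javascript
// Hypothetical helper: classifies input using the original pattern and its
// partial-match counterpart (both assumed to be anchored with ^ and $).
function classify(input, full, partial) {
  if (full.test(input)) return "complete";
  return partial.test(input) ? "incomplete" : "invalid";
}

// Original pattern, and a hand-written stand-in for its partial form
const full = /^\d{2}:\d{2}$/;
const partial = /^(?:\d|$)(?:\d|$)(?::|$)(?:\d|$)(?:\d|$)$/;

console.log(classify("12", full, partial)); // incomplete
console.log(classify("12:3", full, partial)); // incomplete
console.log(classify("12:34", full, partial)); // complete
console.log(classify("1a", full, partial)); // invalid
console.log(classify("12:345", full, partial)); // invalid
```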

Autocomplete

const commandPattern = /^(help|quit|save|load)/;
const partial = createPartialMatchRegex(commandPattern);

function getSuggestions(input) {
  return partial.test(input) ? "valid prefix" : "no suggestions";
}

getSuggestions("h"); // 'valid prefix'
getSuggestions("hel"); // 'valid prefix'
getSuggestions("help"); // 'valid prefix'
getSuggestions("xyz"); // 'no suggestions'

Stream Processing

// Process streaming data with pattern matching at chunk boundaries
const pattern = /\{"[^"]+":"[^"]+"\}/; // Match flat JSON objects with a single string value
const partial = createPartialMatchRegex(pattern);
let buffer = "";

function processChunk(chunk) {
  buffer += chunk;
  const matches = [];

  // Extract complete matches
  let match;
  while ((match = pattern.exec(buffer))) {
    matches.push(match[0]);
    buffer = buffer.slice(match.index + match[0].length);
  }

  // Keep only the trailing partial match, if any. An empty match at the end
  // of the input means "no match" (see Caveats), so the buffer is discarded.
  const tail = partial.exec(buffer);
  buffer = tail && tail[0] ? buffer.slice(tail.index) : "";

  return matches;
}

processChunk('{"na'); // [] - partial, buffer: '{"na'
processChunk('me":"Jo'); // [] - partial, buffer: '{"name":"Jo'
processChunk('hn"}{"age":'); // ['{"name":"John"}'] - buffer: '{"age":'
processChunk('"25"}'); // ['{"age":"25"}'] - buffer: ''
processChunk("invalid"); // [] - discarded, buffer: ''

Useful for parsing log files, network streams, or any chunked data where records may be split across boundaries.

API

createPartialMatchRegex(regex: RegExp): RegExp

Transforms a regular expression to support partial matching.

Available via the default entry point of the package.

Parameters:

  • regex - The regular expression to transform

Returns:

  • A new RegExp that matches partial strings

RegExp.prototype.toPartialMatchRegex(): RegExp

When using import 'regex-partial-match/extend', this method is added to RegExp.prototype.

Returns:

  • A new RegExp that matches partial strings, created from the RegExp instance the method was called on.

License

ISC License - see LICENSE file for details.

Credits

Algorithm created by Lucas Trzesniewski.

Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

Related projects

  • incr-regex-package - Incremental regex matcher
  • dfa - Compiles a regular expression like syntax to fast deterministic finite automata, which could be used to partial match?
  • refa - Can convert regular expressions to an Abstract Syntax Tree, which might afford partial-match capability?
  • @eslint-community/regexpp - A regular expression parser for ECMAScript with AST generation and visitor implementation
  • Regex+ - Template literal, transforming native regular expressions
  • Awesome Regex - Curated list of tools, tutorials, libraries, and other resources, covering all major regex flavours

Footnotes

  1. To remain lightweight, no runtime type validation is applied, so non-TypeScript consumers will have to rely on whatever underlying errors are thrown if the library is used incorrectly.
