Regex throws out of memory while constructing (not matching) a simple regex pattern #24749

@CyrusNajmabadi

Description

Found while writing dotnet/roslyn#23984 (a regex parser usable in IDE scenarios).

Regex pattern is: a{2147483647,}

The exception is thrown here:

 	System.Text.RegularExpressions.RegexFCD.Prefix(tree)	Unknown
 	System.Text.RegularExpressions.RegexWriter.RegexCodeFromRegexTree(tree)	Unknown
 	System.Text.RegularExpressions.Regex.Regex(pattern, options, matchTimeout, useCache)	Unknown
 	System.Text.RegularExpressions.Regex.Regex(pattern, options)	Unknown
>	Microsoft.CodeAnalysis.CSharp.UnitTests.RegularExpressions.CSharpRegexParserTests.TryParseTree(stringText, options, conversionFailureOk) Line 111	C#
 	Microsoft.CodeAnalysis.CSharp.UnitTests.RegularExpressions.CSharpRegexParserTests.Test(stringText, expected, options, runSubTreeTests, name) Line 34	C#
 	Microsoft.CodeAnalysis.CSharp.UnitTests.RegularExpressions.CSharpRegexParserTests.TestLargeOpenRangeNumericQuantifier1() Line 1651	C#

It looks like this is due to:

https://github.com/dotnet/corefx/blob/353bdc62ccb5c56148bf0e31814f1fd4f84a7fd9/src/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexFCD.cs#L89-L93

There, a string is padded out to the full number of repetitions the quantifier asks for, so for this pattern it tries to build a string of 2,147,483,647 characters.

This is problematic on a few levels. First, it means the .NET regex library cannot be used to tell whether a pattern is legal. For example, if you just want to know whether a{2147483647,} is a legal pattern, your code will crash because there isn't enough memory to ask the question. Second, it makes these matches impossible to perform at all. Such quantifiers could be handled without encoding them as a full string, for example as an interpreted node of some sort.
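A minimal C# sketch of the "interpreted node" idea, assuming a simple single-character loop. Everything here is hypothetical: `OneCharLoop`, `Prefix`, `MatchAt`, and the 256-character prefix cap are illustrative names and values, not the internal types of System.Text.RegularExpressions. The point is only that the bounds are stored as numbers, so neither computing a prefix nor matching ever allocates a string of length `Min`.

```csharp
using System;

// Hypothetical node for a single-character loop such as a{2147483647,}.
// It stores the character and the bounds instead of an expanded string.
public sealed class OneCharLoop
{
    // Hypothetical sentinel for an open upper bound, as in a{n,}.
    public const int Unbounded = int.MaxValue;

    // Assumed cap on how much literal prefix is worth materializing.
    public const int MaxPrefixLength = 256;

    public char Ch { get; }
    public int Min { get; }
    public int Max { get; }

    public OneCharLoop(char ch, int min, int max)
    {
        if (min < 0) throw new ArgumentOutOfRangeException(nameof(min));
        if (max < min) throw new ArgumentOutOfRangeException(nameof(max));
        Ch = ch;
        Min = min;
        Max = max;
    }

    // Literal prefix for search optimizations, capped so that a huge Min
    // never turns into a huge allocation.
    public string Prefix() => new string(Ch, Math.Min(Min, MaxPrefixLength));

    // Greedy match at 'start': count consecutive Ch characters, up to Max,
    // without ever building a string of length Min.
    // Returns the number of characters matched, or -1 if fewer than Min.
    public int MatchAt(string input, int start)
    {
        int count = 0;
        while (count < Max && start + count < input.Length && input[start + count] == Ch)
        {
            count++;
        }
        return count >= Min ? count : -1;
    }
}
```

With this shape, `new OneCharLoop('a', 2147483647, OneCharLoop.Unbounded)` is cheap to construct; its prefix is at most 256 characters, and a match attempt costs time proportional to the input, not to the quantifier's bound.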

--

Note: dotnet/roslyn#23984 comes with several thousand regex tests that could be adopted by corefx to help ensure that parsing either succeeds or throws an ArgumentException, as per the regex constructor documentation: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.-ctor?view=netframework-4.7.1#System_Text_RegularExpressions_Regex__ctor_System_String_
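One way such tests might phrase that contract, sketched in C#. `ConstructionOutcome`, `RegexConstructionCheck`, and `Classify` are hypothetical names; only the `Regex(string, RegexOptions)` constructor is the real API. The helper constructs the pattern without matching and sorts the result into three buckets, so "succeeds" and "throws ArgumentException" both count as documented behavior and anything else stands out.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical outcome buckets for a "does construction behave as documented?" check.
public enum ConstructionOutcome
{
    Constructed,          // pattern accepted
    RejectedAsInvalid,    // ArgumentException, as the constructor docs describe
    UnexpectedException   // anything else, e.g. OutOfMemoryException
}

public static class RegexConstructionCheck
{
    // Constructs (but never matches) the pattern and reports which bucket it lands in.
    public static ConstructionOutcome Classify(string pattern, RegexOptions options = RegexOptions.None)
    {
        try
        {
            _ = new Regex(pattern, options);
            return ConstructionOutcome.Constructed;
        }
        catch (ArgumentException)
        {
            return ConstructionOutcome.RejectedAsInvalid;
        }
        catch (Exception)
        {
            return ConstructionOutcome.UnexpectedException;
        }
    }
}
```

A test suite could then assert that no pattern ever yields `UnexpectedException`. On the runtime described in this report, a{2147483647,} would be expected to land in that bucket; newer runtimes may behave differently.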
