{"id":1927,"date":"2023-06-01T15:11:28","date_gmt":"2023-06-01T08:11:28","guid":{"rendered":"https:\/\/csharptutorial.net\/csharp-regular-expression\/greedy-quantifiers\/"},"modified":"2023-06-01T15:49:15","modified_gmt":"2023-06-01T08:49:15","slug":"greedy-quantifiers","status":"publish","type":"page","link":"https:\/\/www.csharptutorial.net\/csharp-regular-expression\/greedy-quantifiers\/","title":{"rendered":"Greedy Quantifiers"},"content":{"rendered":"\n<p><strong>Summary<\/strong>: in this tutorial, you will learn how greedy quantifiers work and how to avoid unexpected results by using lazy quantifiers.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Quantifiers are greedy<\/h2>\n\n\n\n<p>Quantifiers are metacharacters in regular expressions that specify how many times the preceding element may occur. For example, the <code>+<\/code> quantifier matches one or more occurrences of the preceding element.<\/p>\n\n\n\n<p>By default, quantifiers are greedy: they try to match as much of the input text as possible while still allowing the whole pattern to match successfully.<\/p>\n\n\n\n<p>They are called greedy because they consume as many characters of the input string as they can.<\/p>\n\n\n\n<p>Let&#8217;s take an example to understand how greedy quantifiers work.<\/p>\n\n\n\n<p>Suppose you have a link tag in an HTML fragment:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"HTML, XML\" data-shcb-language-slug=\"xml\"><span><code class=\"hljs language-xml\"><span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">a<\/span> <span class=\"hljs-attr\">href<\/span>=<span class=\"hljs-string\">\"https:\/\/csharptutorial.net\/\"<\/span> <span class=\"hljs-attr\">target<\/span>=<span class=\"hljs-string\">\"_blank\"<\/span>&gt;<\/span>Click Me<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">a<\/span>&gt;<\/span><\/code><\/span><small class=\"shcb-language\" 
id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">HTML, XML<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">xml<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>To extract the quoted attribute values, such as <code>\"https:\/\/csharptutorial.net\/\"<\/code> and <code>\"_blank\"<\/code>, you may come up with the following pattern:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"plaintext\" data-shcb-language-slug=\"plaintext\"><span><code class=\"hljs language-plaintext\">\".+\"<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">plaintext<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">plaintext<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>This pattern matches text that starts with a double quote (<code>\"<\/code>), followed by one or more characters (<code>.+<\/code>), 
and ends with a double quote (<code>\"<\/code>).<\/p>\n\n\n\n<p>The following program applies the pattern to a similar link tag:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"C#\" data-shcb-language-slug=\"cs\"><span><code class=\"hljs language-cs\"><span class=\"hljs-keyword\">using<\/span> System.Text.RegularExpressions;\r\n<span class=\"hljs-keyword\">using<\/span> <span class=\"hljs-keyword\">static<\/span> System.Console;\r\n\r\n\r\n<span class=\"hljs-keyword\">var<\/span> html = <span class=\"hljs-string\">\"\"<\/span><span class=\"hljs-string\">\"&lt;a href=\"<\/span>https:<span class=\"hljs-comment\">\/\/csharptutorial.net\/\" target=\"blank\"&gt;C# Tutorial&lt;\/a&gt;\"\"\";<\/span>\r\n\r\n<span class=\"hljs-keyword\">var<\/span> pattern = <span class=\"hljs-string\">\"\"<\/span><span class=\"hljs-string\">\" \r\n              \"<\/span>.+<span class=\"hljs-string\">\"\r\n              \"<\/span><span class=\"hljs-string\">\"\"<\/span>;\r\n\r\n<span class=\"hljs-keyword\">var<\/span> matches = Regex.Matches(html, pattern);\r\n<span class=\"hljs-keyword\">foreach<\/span> (<span class=\"hljs-keyword\">var<\/span> match <span class=\"hljs-keyword\">in<\/span> matches)\n{\n    WriteLine(match);\n}<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">C#<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">cs<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"note\">Note that we use <a href=\"https:\/\/csharptutorial.net\/csharp-tutorial\/csharp-raw-string\/\">raw strings<\/a> for the HTML string and the regular expression pattern because both contain double quotes (<code>\"<\/code>). 
Raw string literals have been available since C# 11.<\/p>\n\n\n\n<p>Output:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"plaintext\" data-shcb-language-slug=\"plaintext\"><span><code class=\"hljs language-plaintext\">\"https:\/\/csharptutorial.net\/\" target=\"blank\"<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">plaintext<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">plaintext<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>The result is not what we expected. Instead of two separate attribute values, the pattern returns a single match that runs from the first quote to the last one.<\/p>\n\n\n\n<p>The following describes how the greedy quantifier works in this example:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The regex engine applies the pattern <code>\".+\"<\/code> to the input string, starting at the first character.<\/li>\n\n\n\n<li>The pattern begins with a literal <code>\"<\/code>. The first character of the input string is <code>&lt;<\/code>, which is not a quote, so the engine moves forward until it finds the first <code>\"<\/code>, the one right after <code>href=<\/code>.<\/li>\n\n\n\n<li>Next comes <code>.+<\/code>. The <code>.<\/code> matches any character except a newline, and the greedy quantifier <code>+<\/code> repeats it as many times as possible.<\/li>\n\n\n\n<li>The engine therefore consumes every remaining character up to the end of the input string: <code>https:\/\/csharptutorial.net\/\" target=\"blank\"&gt;C# Tutorial&lt;\/a&gt;<\/code>.<\/li>\n\n\n\n<li>At the end of the input string, the pattern still needs a closing <code>\"<\/code>, but no characters are left to match it.<\/li>\n\n\n\n<li>At this point, the engine triggers <strong>backtracking<\/strong>. 
It makes <code>.+<\/code> give up the last character it consumed, the final <code>&gt;<\/code>, and tries to match the closing <code>\"<\/code> there.<\/li>\n\n\n\n<li>Because <code>&gt;<\/code> is not a quote, the attempt fails.<\/li>\n\n\n\n<li>The backtracking continues: the engine gives back one character at a time and retries the closing <code>\"<\/code> after each step.<\/li>\n\n\n\n<li>Finally, the character given back is the <code>\"<\/code> after <code>blank<\/code>. The closing quote matches, and the engine returns the longest possible match: <code>\"https:\/\/csharptutorial.net\/\" target=\"blank\"<\/code>.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Turn off the greedy mode<\/h2>\n\n\n\n<p>To fix this issue, make the quantifier lazy (non-greedy) by adding a question mark (<code>?<\/code>) after the <code>+<\/code> quantifier. A lazy quantifier matches as few characters as possible, so the match ends at the first closing quote:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"JSON \/ JSON with Comments\" data-shcb-language-slug=\"json\"><span><code class=\"hljs language-json\"><span class=\"hljs-string\">\".+?\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JSON \/ JSON with Comments<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">json<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>For example:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"C#\" data-shcb-language-slug=\"cs\"><span><code class=\"hljs language-cs\"><span class=\"hljs-keyword\">using<\/span> System.Text.RegularExpressions;\r\n<span class=\"hljs-keyword\">using<\/span> <span class=\"hljs-keyword\">static<\/span> System.Console;\r\n\r\n\r\n<span class=\"hljs-keyword\">var<\/span> html = <span class=\"hljs-string\">\"\"<\/span><span 
class=\"hljs-string\">\"&lt;a href=\"<\/span>https:<span class=\"hljs-comment\">\/\/csharptutorial.net\/\" target=\"blank\"&gt;Click Me&lt;\/a&gt;\"\"\";<\/span>\r\n\r\n<span class=\"hljs-keyword\">var<\/span> pattern = <span class=\"hljs-string\">\"\"<\/span><span class=\"hljs-string\">\" \r\n              \"<\/span>.+?<span class=\"hljs-string\">\"\r\n              \"<\/span><span class=\"hljs-string\">\"\"<\/span>;\r\n\r\n<span class=\"hljs-keyword\">var<\/span> matches = Regex.Matches(html, pattern);\r\n<span class=\"hljs-keyword\">foreach<\/span> (<span class=\"hljs-keyword\">var<\/span> match <span class=\"hljs-keyword\">in<\/span> matches)\n{\n    WriteLine(match);\n}<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">C#<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">cs<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Output:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"plaintext\" data-shcb-language-slug=\"plaintext\"><span><code class=\"hljs language-plaintext\">\"https:\/\/csharptutorial.net\/\"\r\n\"blank\"<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">plaintext<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">plaintext<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Now, the program returns the expected result.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Quantifiers are greedy by default.<\/li>\n\n\n\n<li>A greedy quantifier matches as much of the input string as possible while still allowing the overall regular expression pattern to match successfully.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, you will learn how the greedy quantifiers work and how to avoid unexpected result by using the lazy quantifiers.<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":1892,"menu_order":5,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1927","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/pages\/1927","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/comments?post=1927"}],"version-history":[{"count":5,"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/pages\/1927\/revisions"}],"predecessor-version":[{"id":1938,"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/pages\/1927\/revisions\/1938"}],"up":[{"embeddable":true,"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/pages\/1892"}],"wp:attachment":[{"href":"https:\/\/www.csharptutorial.net\/wp-json\/wp\/v2\/media?parent=1927"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}