{"id":3161,"date":"2021-11-30T03:18:24","date_gmt":"2021-11-30T03:18:24","guid":{"rendered":"https:\/\/www.pythontutorial.net\/?page_id=3161"},"modified":"2022-02-17T00:39:49","modified_gmt":"2022-02-17T00:39:49","slug":"python-regex-greedy","status":"publish","type":"page","link":"https:\/\/www.pythontutorial.net\/python-regex\/python-regex-greedy\/","title":{"rendered":"Python Regex Greedy"},"content":{"rendered":"\n<p><strong>Summary<\/strong>: in this tutorial, you&#8217;ll learn about the Python regex greedy mode and how to change the mode from greedy to non-greedy.<\/p>\n\n\n\n<p>By default, all <a href=\"https:\/\/www.pythontutorial.net\/python-regex\/python-regex-quantifiers\/\">quantifiers<\/a> work in a greedy mode. It means that the quantifiers will try to match their preceding elements as much as possible.<\/p>\n\n\n\n<p>Let&#8217;s start with an example to understand how the regex greedy mode works.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-unexpected-result-with-greedy-mode\" id='the-unexpected-result-with-greedy-mode'>The unexpected result with greedy mode <a href=\"#the-unexpected-result-with-greedy-mode\" class=\"anchor\" id=\"the-unexpected-result-with-greedy-mode\" title=\"Anchor for The unexpected result with greedy mode\">#<\/a><\/h2>\n\n\n\n<p>Suppose you have the following HTML fragment that represents a button element:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">s = <span class=\"hljs-string\">'&lt;button type=\"submit\" class=\"btn\"&gt;Send&lt;\/button&gt;'<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span 
class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>And you want to match the text within the quotes (<code>\"\"<\/code>) like <code>submit<\/code> and <code>btn<\/code>. <\/p>\n\n\n\n<p>To do that, you may come up with the following pattern that includes the quote (<code>\"<\/code>), the dot (<code>.<\/code>) <a href=\"https:\/\/www.pythontutorial.net\/python-regex\/python-regex-character-set\/\">character set<\/a> and the (<code>+<\/code>) <a href=\"https:\/\/www.pythontutorial.net\/python-regex\/python-regex-quantifiers\/\">quantifier<\/a>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-string\">\".+\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>The meaning of the pattern is as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>\"<\/code> starts with a quote<\/li><li><code>.<\/code> matches any character except the newline<\/li><li><code>+<\/code> matches the preceding element one or more times<\/li><li><code>\"<\/code> ends with a quote<\/li><\/ul>\n\n\n\n<p>The following program uses the <code>finditer()<\/code> function to find all matches of the pattern in the string <code>s<\/code>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> re\n\ns = <span class=\"hljs-string\">'&lt;button type=\"submit\" class=\"btn\"&gt;Send&lt;\/button&gt;'<\/span>\n\npattern = <span class=\"hljs-string\">'\".+\"'<\/span>\nmatches = 
re.finditer(pattern, s)\n\n<span class=\"hljs-keyword\">for<\/span> match <span class=\"hljs-keyword\">in<\/span> matches:\n    print(match.group())<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>The program displays the following result:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-string\">\"submit\"<\/span> <span class=\"hljs-class\"><span class=\"hljs-keyword\">class<\/span>=\"<span class=\"hljs-title\">btn<\/span>\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>The result is not what you expected: instead of two separate matches, the pattern returns a single match that spans from the first quote to the last quote.<\/p>\n\n\n\n<p>By default, the quantifier (<code>+<\/code>) runs in the greedy mode, in which it tries to match the preceding element (<code>.<\/code>) as many times as possible.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-python-regex-greedy-mode-works\" id='how-python-regex-greedy-mode-works'>How Python regex greedy mode works <a href=\"#how-python-regex-greedy-mode-works\" class=\"anchor\" id=\"how-python-regex-greedy-mode-works\" title=\"Anchor for How Python regex greedy mode works\">#<\/a><\/h2>\n\n\n\n<p>First, the regex engine starts matching from the first character in the string <code>s<\/code>. 
<\/p>\n\n\n\n<p>Next, because the first character is <code>&lt;<\/code>, which does not match the quote (<code>\"<\/code>), the regex engine moves forward through the string until it reaches the first quote (<code>\"<\/code>):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/www.pythontutorial.net\/wp-content\/uploads\/2021\/11\/Python-Regex-Greedy-Step-1.svg\" alt=\"\" class=\"wp-image-3162\"\/><\/figure>\n\n\n\n<p>Then, the regex engine moves to the next rule in the pattern, <code>.+<\/code>, and matches it against the string. <\/p>\n\n\n\n<p>Because the <code>.+<\/code> rule matches any character except the newline one or more times, and the string contains no newline, the regex engine matches all the remaining characters until it reaches the end of the string:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/www.pythontutorial.net\/wp-content\/uploads\/2021\/11\/Python-Regex-Greedy-Step-2.svg\" alt=\"\" class=\"wp-image-3163\"\/><\/figure>\n\n\n\n<p>After that, the regex engine examines the last rule in the pattern, which is a quote (<code>\"<\/code>). However, it has already reached the end of the string, so there are no characters left to match. The greedy quantifier has gone too far.<\/p>\n\n\n\n<p>Finally, the regex engine steps back from the end of the string, one character at a time, until it finds a quote (<code>\"<\/code>). 
This step is called <strong>backtracking<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/www.pythontutorial.net\/wp-content\/uploads\/2021\/11\/Python-Regex-Greedy-Step-3.svg\" alt=\"\" class=\"wp-image-3164\"\/><\/figure>\n\n\n\n<p>As a result, the match is the following substring which is not what we expected:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/www.pythontutorial.net\/wp-content\/uploads\/2021\/11\/Python-Regex-Greedy-Step-4.svg\" alt=\"\" class=\"wp-image-3165\"\/><\/figure>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-string\">\"submit\"<\/span> <span class=\"hljs-class\"><span class=\"hljs-keyword\">class<\/span>=\"<span class=\"hljs-title\">btn<\/span>\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>To fix this issue, you need to instruct the quantifier (<code>+<\/code>) to use the non-greedy (or lazy) mode instead of the greedy mode. 
<\/p>\n\n\n\n<p>To do that, you add a question mark (<code>?<\/code>) after the quantifier like this:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-string\">\".+?\"<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>The following program returns the expected result:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> re\n\ns = <span class=\"hljs-string\">'&lt;button type=\"submit\" class=\"btn\"&gt;Send&lt;\/button&gt;'<\/span>\n\npattern = <span class=\"hljs-string\">'\".+?\"'<\/span>\nmatches = re.finditer(pattern, s)\n\n<span class=\"hljs-keyword\">for<\/span> match <span class=\"hljs-keyword\">in<\/span> matches:\n    print(match.group())<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Output:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-string\">\"submit\"<\/span>\n<span class=\"hljs-string\">\"btn\"<\/span><\/code><\/span><small class=\"shcb-language\" 
id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\" id=\"summary\" id='summary'>Summary <a href=\"#summary\" class=\"anchor\" id=\"summary\" title=\"Anchor for Summary\">#<\/a><\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>By default, all quantifiers use the greedy mode. <\/li><li>Greedy quantifiers will match their preceding elements as much as possible.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, you&#8217;ll learn about the Python regex greedy mode and how to change the mode from greedy mode to non-greedy 
mode.<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":3122,"menu_order":5,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-3161","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/pages\/3161","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/comments?post=3161"}],"version-history":[{"count":0,"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/pages\/3161\/revisions"}],"up":[{"embeddable":true,"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/pages\/3122"}],"wp:attachment":[{"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/media?parent=3161"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}