{"id":3202,"date":"2021-12-03T01:36:46","date_gmt":"2021-12-03T01:36:46","guid":{"rendered":"https:\/\/www.pythontutorial.net\/?page_id=3202"},"modified":"2022-09-19T06:42:08","modified_gmt":"2022-09-19T06:42:08","slug":"python-regex-lookahead","status":"publish","type":"page","link":"https:\/\/www.pythontutorial.net\/python-regex\/python-regex-lookahead\/","title":{"rendered":"Python Regex Lookahead"},"content":{"rendered":"\n<p><strong>Summary<\/strong>: in this tutorial, you&#8217;ll learn about Python regex lookahead and negative lookahead.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"introduction-to-the-python-regex-lookahead\" id='introduction-to-the-python-regex-lookahead'>Introduction to the Python regex lookahead <a href=\"#introduction-to-the-python-regex-lookahead\" class=\"anchor\" id=\"introduction-to-the-python-regex-lookahead\" title=\"Anchor for Introduction to the Python regex lookahead\">#<\/a><\/h2>\n\n\n\n<p>Sometimes you want to match <code>X<\/code> only if it is followed by <code>Y<\/code>. In this case, you can use a lookahead in <a href=\"https:\/\/www.pythontutorial.net\/python-regex\/python-regular-expressions\/\">regular expressions<\/a>. <\/p>\n\n\n\n<p>The syntax of the lookahead is as follows:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">X(?=Y)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>This syntax matches <code>X<\/code> only if it is immediately followed by <code>Y<\/code>. The lookahead is zero-width: it checks for <code>Y<\/code> without consuming it, so <code>Y<\/code> is not part of the match. 
<\/p>\n\n\n\n<p>For example, suppose you have the following string:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-string\">'1 Python is about 4 feet long'<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>You want to match the number (<code>4<\/code>) that is followed by optional whitespace and the literal string <code>feet<\/code>, but not the number <code>1<\/code>. To do that, you can use the following pattern, which contains a lookahead:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">\\d+(?=\\s*feet)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>In this pattern:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>\\d+<\/code> is the combination of the digit <a href=\"https:\/\/www.pythontutorial.net\/python-regex\/python-regex-character-set\/\">character set<\/a> with the <code>+<\/code> <a href=\"https:\/\/www.pythontutorial.net\/python-regex\/python-regex-quantifiers\/\">quantifier<\/a> that matches one or more digits.<\/li><li><code>(?=...)<\/code> is the lookahead syntax.<\/li><li><code>\\s*<\/code> is the combination of the whitespace character set with the <code>*<\/code> quantifier that 
matches zero or more whitespace characters.<\/li><li><code>feet<\/code> matches the literal string <code>feet<\/code>.<\/li><\/ul>\n\n\n\n<p>The following code uses this pattern to find every number that is followed by zero or more whitespace characters and the literal string <code>feet<\/code>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> re\n\ns = <span class=\"hljs-string\">'1 Python is about 4 feet long'<\/span>\npattern = <span class=\"hljs-string\">r'\\d+(?=\\s*feet)'<\/span>  <span class=\"hljs-comment\"># raw string keeps the backslashes intact<\/span>\n\nmatches = re.finditer(pattern, s)\n<span class=\"hljs-keyword\">for<\/span> match <span class=\"hljs-keyword\">in<\/span> matches:\n    print(match.group())<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Output:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"plaintext\" data-shcb-language-slug=\"plaintext\"><span><code class=\"hljs language-plaintext\">4<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">plaintext<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">plaintext<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\" id=\"regex-multiple-lookaheads\" id='regex-multiple-lookaheads'>Regex multiple lookaheads <a href=\"#regex-multiple-lookaheads\" class=\"anchor\" id=\"regex-multiple-lookaheads\" title=\"Anchor for Regex multiple 
lookaheads\">#<\/a><\/h2>\n\n\n\n<p>You can combine multiple lookaheads using the following syntax:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">X(?=Y)(?=Z)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>For this pattern, the regex engine performs the following steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Find X.<\/li><li>Test if Y matches immediately after X; if it doesn&#8217;t, skip this position.<\/li><li>Test if Z also matches immediately after X (not after Y, because a lookahead doesn&#8217;t consume any characters); if it doesn&#8217;t, skip this position.<\/li><li>If both tests pass, X is a match; otherwise, search for the next match.<\/li><\/ol>\n\n\n\n<p>So the <code>X(?=Y)(?=Z)<\/code> pattern matches <code>X<\/code> only if both <code>Y<\/code> and <code>Z<\/code> match at the position right after <code>X<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"regex-negative-lookaheads\" id='regex-negative-lookaheads'>Regex negative lookaheads <a href=\"#regex-negative-lookaheads\" class=\"anchor\" id=\"regex-negative-lookaheads\" title=\"Anchor for Regex negative lookaheads\">#<\/a><\/h2>\n\n\n\n<p>Suppose you want to match only the number <code>1<\/code> in the following text but not the number <code>4<\/code>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-string\">'1 Python is about 4 feet long'<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> 
<span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>To do that, you can use the negative lookahead syntax:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">X(?!Y)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>The <code>X(?!Y)<\/code> pattern matches <code>X<\/code> only if it is not followed by <code>Y<\/code>. In this example, <code>X<\/code> is <code>\\d+<\/code> and <code>Y<\/code> is <code>\\s*feet<\/code>, so the pattern matches a number that is not followed by optional whitespace and the literal string <code>feet<\/code>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> re\n\ns = <span class=\"hljs-string\">'1 Python is about 4 feet long'<\/span>\npattern = <span class=\"hljs-string\">r'\\d+(?!\\s*feet)'<\/span>  <span class=\"hljs-comment\"># raw string keeps the backslashes intact<\/span>\n\nmatches = re.finditer(pattern, s)\n<span class=\"hljs-keyword\">for<\/span> match <span class=\"hljs-keyword\">in<\/span> matches:\n    print(match.group())<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Output:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"plaintext\" 
data-shcb-language-slug=\"plaintext\"><span><code class=\"hljs language-plaintext\">1<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">plaintext<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">plaintext<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\" id=\"summary\" id='summary'>Summary <a href=\"#summary\" class=\"anchor\" id=\"summary\" title=\"Anchor for Summary\">#<\/a><\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>Use the Python regex lookahead <code>X(?=Y)<\/code> that matches <code>X<\/code> only if it is followed by <code>Y<\/code>.<\/li><li>Use the negative regex lookahead <code>X(?!Y)<\/code> that matches <code>X<\/code> only if it is not followed by <code>Y<\/code>.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, you&#8217;ll learn about Python regex lookahead and negative 
lookahead.<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":3122,"menu_order":12,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-3202","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/pages\/3202","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/comments?post=3202"}],"version-history":[{"count":0,"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/pages\/3202\/revisions"}],"up":[{"embeddable":true,"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/pages\/3122"}],"wp:attachment":[{"href":"https:\/\/www.pythontutorial.net\/wp-json\/wp\/v2\/media?parent=3202"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}