16 RegularExpressions

The document provides an overview of Regular Expressions (regex), which are sequences of characters used to specify search patterns in text and are supported by many programming languages, including Python. It details Python's built-in regex package and demonstrates various functions such as findall, search, split, and sub, along with examples of regex metacharacters and their meanings. Additionally, it includes references for further reading on regex.

Regular Expressions

Outline

Regular Expressions

Regular Expressions in Python
Regular Expression

regex or regexp: sequence of characters that specifies a
search pattern in text

used by string-searching algorithms for "find" or "find and
replace" operations on strings, or for input validation

Common syntaxes: the POSIX standard and Perl syntax

Used in text-processing utilities such as sed and AWK, and in lexical analysis

Most general-purpose programming languages support regex
capabilities either natively or via libraries, including Python, C,
C++, Java, Rust, OCaml, and JavaScript
https://en.wikipedia.org/wiki/Regular_expression
Regular Expressions in Python

Python’s built-in package for regular expressions

    import re

    string1 = "Hello World! Welcome to Regular Expressions in Python."
findall()

    import re

    string1 = "Hello World! Welcome to Regular Expressions in Python."
    re_out = re.findall('World', string1)
    print(re_out)

Output:

    ['World']
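
The slide shows a single hit. As a minimal sketch of how findall behaves with several hits or none, the example below uses a made-up sample string (`text` is an assumption, not from the slides) and the standard `re.IGNORECASE` flag.

```python
import re

# Hypothetical sample text, not taken from the slides.
text = "World cup, world record, World tour"

# findall returns every non-overlapping match, scanning left to right.
exact = re.findall('World', text)

# re.IGNORECASE makes the same pattern match regardless of letter case.
any_case = re.findall('World', text, flags=re.IGNORECASE)

# When nothing matches, the result is an empty list, not None.
missing = re.findall('Mars', text)

print(exact)     # ['World', 'World']
print(any_case)  # ['World', 'world', 'World']
print(missing)   # []
```

Because the result is always a list, it can be passed to `len()` or looped over without a None check.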
search()

    import re

    string1 = "Hello World! Welcome to Regular Expressions in Python."
    re_out = re.search('World', string1)
    print(re_out.start(), re_out.end())

Output:

    6 11
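
search() returns a match object for the first hit only, and None when there is no hit, so calling `.start()` directly can fail. A small sketch of the usual guard, reusing string1 from the slide (the variable names `match`, `span`, `found` and `no_match` are illustrative):

```python
import re

string1 = "Hello World! Welcome to Regular Expressions in Python."

match = re.search('World', string1)
if match is not None:
    span = match.span()    # (start, end) as a tuple
    found = match.group()  # the text that matched
    print(span, found)     # (6, 11) World

# A pattern that does not occur gives None instead of a match object.
no_match = re.search('Mars', string1)
print(no_match)            # None
```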
split()

    import re

    string1 = "Hello World! Welcome to Regular Expressions in Python."
    re_out = re.split('World', string1)
    print(re_out)

Output:

    ['Hello ', '! Welcome to Regular Expressions in Python.']
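
Splitting on a literal word could also be done with str.split; re.split becomes useful when the separator is a pattern. The sketch below assumes a made-up record string and uses `\s` and `[]`, which the metacharacter slides later in this deck explain.

```python
import re

# Hypothetical record: fields separated by commas or semicolons,
# with optional whitespace around each separator.
record = "alpha, beta;gamma ,  delta"

# The separator pattern: optional spaces, one of , or ; then optional spaces.
fields = re.split(r'\s*[,;]\s*', record)
print(fields)     # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit limits how many splits are made; the rest stays in one piece.
first_two = re.split(r'\s*[,;]\s*', record, maxsplit=1)
print(first_two)  # ['alpha', 'beta;gamma ,  delta']
```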
sub()

    import re

    string1 = "Hello World! Welcome to Regular Expressions in Python."
    re_out = re.sub('World', 'Universe', string1)
    print(re_out)

Output:

    Hello Universe! Welcome to Regular Expressions in Python.
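
By default sub() replaces every occurrence. A brief sketch, on a made-up string (`text` is an assumption), of limiting the replacements with `count` and of `re.subn`, which also reports how many replacements were made:

```python
import re

# Hypothetical sample text with three occurrences.
text = "World World World"

# count=1 replaces only the first occurrence.
once = re.sub('World', 'Universe', text, count=1)
print(once)       # Universe World World

# subn returns a tuple: (new string, number of replacements made).
result, n = re.subn('World', 'Universe', text)
print(result, n)  # Universe Universe Universe 3
```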
regex Metacharacters

All examples so far matched literal text

Say I want to match ‘best’ and ‘beast’

    string1 = "Hello best! Hello beast!"

    re_out = re.findall('bea?st', string1)
    print(re_out)

Output:

    ['best', 'beast']
Regex Metacharacters
Metacharacter  Meaning                                     Example
.              Single-character wildcard                   ‘hell.’ --> ‘hello’ or ‘helly’
\              Escape character, escape sequences          ‘hello\.’ --> ‘hello.’
                                                           ‘\d’ --> a digit
^              Starts with                                 ‘^Hel’ --> ‘Hello’ or ‘Hell’
$              Ends with                                   ‘hello$’ --> ‘to everyone hello’
^$             Start and end                               ‘^hello$’ --> ‘hello’. Not ‘hello.’
?              Zero or one occurrence of the               ‘bea?st’ --> ‘best’, ‘beast’. Not ‘beaast’
               preceding character
+              One or more occurrences of the              ‘bea+st’ --> ‘beast’, ‘beaast’, ‘beaaaaaast’.
               preceding character                         Not ‘best’
*              Zero or more occurrences of the             ‘bea*st’ --> ‘best’, ‘beast’, ‘beaaaast’
               preceding character
{}             Exactly the given number of occurrences     ‘bea{3}st’ --> ‘beaaast’. Not ‘beaast’,
               of the preceding character                  ‘beaaaaaast’, ‘best’
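
The table's quantifier rows can be checked directly. The sketch below assumes `re.fullmatch`, which succeeds only when the whole string matches, as the way to read the table's "Not ..." entries; the word list is illustrative.

```python
import re

candidates = ['best', 'beast', 'beaast', 'beaaast']
patterns = ['bea?st', 'bea+st', 'bea*st', 'bea{3}st']

# For each pattern, keep the candidates that match in full.
results = {
    p: [w for w in candidates if re.fullmatch(p, w)]
    for p in patterns
}
for p, matched in results.items():
    print(p, '->', matched)
# bea?st -> ['best', 'beast']
# bea+st -> ['beast', 'beaast', 'beaaast']
# bea*st -> ['best', 'beast', 'beaast', 'beaaast']
# bea{3}st -> ['beaaast']

# Anchors and the escape character, using re.search.
starts = bool(re.search('^Hel', 'Hello'))               # True
ends = bool(re.search('hello$', 'to everyone hello'))   # True
literal_dot = bool(re.search(r'hello\.', 'hello.'))     # True
not_wild = bool(re.search(r'hello\.', 'hellox'))        # False: \. is a literal dot
```

Writing patterns as raw strings (`r'...'`) keeps backslashes from being interpreted by Python before the regex engine sees them.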
Regex Metacharacters
More Examples

‘b..st’ --> ‘beast’, ‘beest’, ‘b12st’. Not ‘beaast’

‘b.*st’ --> ‘bst’, ‘best’, ‘baaaaaaaaaaaaaaaast’, ‘b12123st’

‘b.+st’ --> ‘best’, ‘baaaaaaaaaaaaaaaast’, ‘b12123st’. Not ‘bst’

‘b.?st’ --> ‘bst’, ‘best’, ‘b1st’. Not ‘beast’

‘b.{3}st’ --> ‘b123st’, ‘beaust’, ‘b1x#st’. Not ‘beast’

Regex Metacharacters
Metacharacter  Meaning                                 Example
[]             Set of characters; matches one          ‘be[a-z]t’ --> beat, bebt, ..., bezt. Not beabt.
               character from the set                  Common ranges: [a-z] [A-Z] [0-9] [a-zA-Z0-9]
|              Either or                               ‘beat|best’ --> beat, best. Not beast.
                                                       ‘be[as]t’ --> beat, best. Not beast.
                                                       (Inside [], | is a literal character, so
                                                       ‘be[a|s]t’ would also match ‘be|t’.)
()             Grouping of a sub-pattern               ‘be(a|s)t’ --> best, beat. Not beast.
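
One detail worth checking by hand is how | behaves inside square brackets: there it is an ordinary character, not "either or". A short sketch comparing the set, group and alternation forms on an illustrative word list:

```python
import re

words = ['beat', 'best', 'beast', 'be|t']
patterns = ['be[as]t', 'be(a|s)t', 'beat|best', 'be[a|s]t']

# Keep the words that each pattern matches in full.
matches = {
    p: [w for w in words if re.fullmatch(p, w)]
    for p in patterns
}
for p, matched in matches.items():
    print(p, '->', matched)
# be[as]t -> ['beat', 'best']
# be(a|s)t -> ['beat', 'best']
# beat|best -> ['beat', 'best']
# be[a|s]t -> ['beat', 'best', 'be|t']

# Parentheses also capture: group(1) is the part matched inside ().
captured = re.fullmatch('be(a|s)t', 'best').group(1)
print(captured)  # s
```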


More Examples
gray|grey            z{3}                    mi.....ft
gr(a|e)y             z{3,6}                  \d+\.\d+|\d+
gr[ae]y              z{3,}                   [^i*&2@]  (^ means negate here)
b[aeiou]bble         [Bb]rainf\*\*k          //[^\r\n]*[\r\n]
[b-chm-pP]at|ot      \d                      ^dog
colou?r              \d{5}(-\d{4})?          dog$
rege(x(es)?|xps?)    1\d{10}                 ^dog$
go*gle               [2-9]|[12]\d|3[0-6]
go+gle               3[0-6]|[12]\d|[2-9]
g(oog)+le            Hello\nworld

https://cs.lmu.edu/~ray/notes/regex/
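
Two of the patterns above are worth running. The sketch below assumes `\d{5}(-\d{4})?` is meant for US-style ZIP codes (the sample values are made up), and shows why the list gives the 2 to 36 range pattern in two orders: with findall, the first alternative that matches at a position wins.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
zip_pattern = r'\d{5}(-\d{4})?'
zips = ['90045', '90045-1234', '9004', '90045-12']  # hypothetical inputs
valid_zips = [z for z in zips if re.fullmatch(zip_pattern, z)]
print(valid_zips)  # ['90045', '90045-1234']

# Same three alternatives, different order.
short_first = re.findall(r'[2-9]|[12]\d|3[0-6]', '25')
long_first = re.findall(r'3[0-6]|[12]\d|[2-9]', '25')
print(short_first)  # ['2', '5']  single-digit alternative matched first
print(long_first)   # ['25']      two-digit alternatives tried first
```

When the whole string must match (`re.fullmatch`), both orderings accept '25', because the engine backtracks to the next alternative.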
Regex Metacharacter: \
Sequence  Meaning                                           Example
\A        Matches if the specified characters are at        ‘\AHello’
          the beginning of the string
\b        Matches where the specified characters are at
          the beginning or at the end of a word
\B        Matches where the specified characters are
          present, but NOT at the beginning (or at the
          end) of a word
\d        Matches a digit (numbers from 0-9)
\D        Matches any character that is NOT a digit
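
Since the table gives an example only for \A, here is a small sketch of the other sequences. The sample strings are made up, and match positions are collected with `re.finditer` so the word-boundary cases can be told apart.

```python
import re

# Hypothetical text; 'cat' starts at indexes 0, 7 (inside 'concat'), 11 and 23.
text = "cat concat category 42 cats"

# \b on both sides: 'cat' as a whole word only.
whole_word = [m.start() for m in re.finditer(r'\bcat\b', text)]
# \b before: 'cat' at the beginning of a word.
word_start = [m.start() for m in re.finditer(r'\bcat', text)]
# \B before: 'cat' that is NOT at the beginning of a word.
inside = [m.start() for m in re.finditer(r'\Bcat', text)]

print(whole_word)  # [0]
print(word_start)  # [0, 11, 23]
print(inside)      # [7]

# \d matches digits; \D matches everything else.
digits = re.findall(r'\d+', text)
only_digits = re.sub(r'\D', '', "Tel: 555-0199")
print(digits)       # ['42']
print(only_digits)  # 5550199

# \A anchors the match to the beginning of the string.
at_start = bool(re.search(r'\AHello', 'Hello World'))   # True
not_start = bool(re.search(r'\AWorld', 'Hello World'))  # False
```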
Regex Metacharacter: \
Sequence  Meaning                                           Example
\s        Matches a white space character
\S        Matches any character that is NOT a white
          space character
\w        Matches a word character (letters a to z and      [a-zA-Z0-9_]
          A to Z, digits from 0-9, and the underscore _
          character)
\W        Matches any character that is NOT a word
          character
\Z        Matches if the specified characters are at        ‘Python.\Z’ --> ‘Welcome to Python.’
          the end of the string
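
A short sketch of these sequences on string1 from the earlier slides, plus two made-up strings. Note that the dot is escaped as `\.` here so that it matches a literal period; the slide's ‘Python.\Z’ also matches, because an unescaped dot matches any character.

```python
import re

string1 = "Hello World! Welcome to Regular Expressions in Python."

# \w+ picks out runs of word characters, dropping spaces and punctuation.
words = re.findall(r'\w+', string1)
print(words)
# ['Hello', 'World', 'Welcome', 'to', 'Regular', 'Expressions', 'in', 'Python']

# \s+ treats any run of spaces, tabs or newlines as one separator.
tokens = re.split(r'\s+', "a  b\tc\nd")
print(tokens)  # ['a', 'b', 'c', 'd']

# \S+ is the complement: runs of non-whitespace characters.
chunks = re.findall(r'\S+', " a b ")
print(chunks)  # ['a', 'b']

# \W matches non-word characters; the underscore counts as a word character.
punctuation = re.findall(r'\W', "a_b-c!")
print(punctuation)  # ['-', '!']

# \Z anchors the match to the end of the string.
ends_with = bool(re.search(r'Python\.\Z', string1))
print(ends_with)  # True
```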
References

https://www.regular-expressions.info/

http://www.rexegg.com/

https://regex101.com/

https://docs.python.org/3/library/re.html

https://cs.lmu.edu/~ray/notes/regex/
Summary

Regular Expressions

Regular Expressions in Python
