Regular Expressions
Outline
● Regular Expressions
● Regular Expressions in Python
Regular Expression
● A regular expression (regex or regexp) is a sequence of characters that specifies a search pattern in text.
● Used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
● Two widely used syntaxes: the POSIX standard and the Perl syntax.
● Used in utilities such as sed and AWK, and in lexical analysis.
● Most general-purpose programming languages support regex capabilities either natively or via libraries, including Python, C, C++, Java, Rust, OCaml, and JavaScript.
https://en.wikipedia.org/wiki/Regular_expression
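As a small illustration of the input validation use case, here is a minimal sketch. The ID format (three capital letters followed by four digits) and the names ID_PATTERN and is_valid_id are made up for this example. The pattern syntax itself is covered in the metacharacter slides.

```python
import re

# Hypothetical ID format, assumed for illustration: 3 capital letters, then 4 digits.
ID_PATTERN = re.compile(r'[A-Z]{3}\d{4}')

def is_valid_id(text):
    """Return True only when the whole string fits the pattern."""
    return ID_PATTERN.fullmatch(text) is not None

print(is_valid_id("ABC1234"))    # well-formed
print(is_valid_id("abc1234"))    # lowercase letters are rejected
print(is_valid_id("ABC12345"))   # one digit too many is rejected
```

fullmatch() is used rather than search() so that extra characters before or after the ID make the input invalid.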
Regular Expressions in Python
● Python’s built-in package for regular expressions
import re
string1 = "Hello World! Welcome to Regular Expressions in Python."
findall()
string1 = "Hello World! Welcome to Regular Expressions in Python."
import re
re_out = re.findall('World', string1)
print(re_out)
['World']
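A short sketch of how findall() behaves when the pattern occurs several times or not at all. The sample string text and the variable names are assumptions made for this example.

```python
import re

text = "World one, World two, world three"   # hypothetical sample string

# findall() returns every non-overlapping match, left to right, as a list of strings.
all_hits = re.findall('World', text)

# Matching is case-sensitive by default; re.IGNORECASE relaxes that.
any_case = re.findall('World', text, flags=re.IGNORECASE)

# When nothing matches, the result is an empty list rather than an error.
no_hits = re.findall('Mars', text)

print(all_hits)
print(any_case)
print(no_hits)
```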
search()
string1 = "Hello World! Welcome to Regular Expressions in Python."
import re
re_out = re.search('World', string1)
print(re_out.start(), re_out.end())
6 11
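search() returns a match object for the first occurrence only, and None when there is no match, so calling .start() on a failed search raises an error. A minimal sketch of guarding against that follows; the helper name find_span is hypothetical.

```python
import re

string1 = "Hello World! Welcome to Regular Expressions in Python."

def find_span(pattern, text):
    """Return (start, end) of the first match, or None when nothing matches."""
    m = re.search(pattern, text)
    if m is None:        # search() signals failure with None
        return None
    return m.span()      # same pair as (m.start(), m.end())

print(find_span('World', string1))
print(find_span('Mars', string1))
```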
split()
string1 = "Hello World! Welcome to Regular Expressions in Python."
import re
re_out = re.split('World', string1)
print(re_out)
['Hello ', '! Welcome to Regular Expressions in Python.']
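Where re.split() goes beyond str.split() is that the separator can be a pattern. The sketch below assumes a made-up record string with inconsistent separators. The \s and [] syntax it uses is explained in the metacharacter slides.

```python
import re

record = "alpha, beta;gamma ,delta"   # hypothetical input with mixed separators

# Split on a comma or semicolon, absorbing any whitespace around it.
fields = re.split(r'\s*[,;]\s*', record)

# maxsplit limits the number of splits; the remainder stays in the last item.
first_two = re.split(r'\s*[,;]\s*', record, maxsplit=1)

print(fields)
print(first_two)
```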
sub()
string1 = "Hello World! Welcome to Regular Expressions in Python."
import re
re_out = re.sub('World', 'Universe', string1)
print(re_out)
Hello Universe! Welcome to Regular Expressions in Python.
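By default sub() replaces every occurrence. A brief sketch of limiting the number of replacements with count, and of getting the replacement count back with subn(); the sample string is an assumption for this example.

```python
import re

text = "Hello World! Goodbye World!"   # hypothetical sample string

# Replace every occurrence (the default).
replaced_all = re.sub('World', 'Universe', text)

# Replace only the first occurrence.
replaced_one = re.sub('World', 'Universe', text, count=1)

# subn() returns the new string together with the number of replacements made.
new_text, n_replaced = re.subn('World', 'Universe', text)

print(replaced_all)
print(replaced_one)
print(n_replaced)
```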
regex Metacharacters
● All examples so far matched literal text.
● Suppose we want to match both ‘best’ and ‘beast’:
string1 = "Hello best! Hello beast!"
re_out = re.findall('bea?st', string1)
print(re_out)
['best', 'beast']
Regex Metacharacters
Metacharacter   Meaning                                               Example
.               Single-character wildcard                             ‘hell.’ --> ‘hello’ or ‘helly’
\               Escape character, escape sequences                    ‘hello\.’ --> ‘hello.’
                                                                      ‘\d’ --> a digit (0-9)
^               Starts with                                           ‘^Hel’ --> ‘Hello’ or ‘Hell’
$               Ends with                                             ‘hello$’ --> ‘to everyone hello’
^ $             Start and end                                         ‘^hello$’ --> ‘hello’. Not ‘hello.’
?               Zero or one occurrence of the preceding character     ‘bea?st’ --> ‘best’, ‘beast’. Not ‘beaast’
+               One or more occurrences of the preceding character    ‘bea+st’ --> ‘beast’, ‘beaast’, ‘beaaaaaast’. Not ‘best’
*               Zero or more occurrences of the preceding character   ‘bea*st’ --> ‘best’, ‘beast’, ‘beaaaast’
{n}             Exactly n occurrences of the preceding character      ‘bea{3}st’ --> ‘beaaast’. Not ‘beaast’, ‘beaaaaaast’, ‘best’
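The quantifier rows of the table can be checked with a short sketch. The word list and the helper matching() are assumptions for this example; fullmatch() is used so that a word counts only when the whole word fits the pattern.

```python
import re

words = ["best", "beast", "beaast", "beaaast"]   # hypothetical test words

def matching(pattern):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

optional   = matching('bea?st')     # zero or one 'a'
one_plus   = matching('bea+st')     # one or more 'a'
zero_plus  = matching('bea*st')     # zero or more 'a'
exactly_3  = matching('bea{3}st')   # exactly three 'a'

print(optional)
print(one_plus)
print(zero_plus)
print(exactly_3)
```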
Regex Metacharacters
More Examples
‘b..st’ --> ‘beast’, ‘beest’, ‘b12st’. not ‘beaast’
‘b.*st’ --> ‘bst’, ‘best’, ‘baaaaaaaaaaaaaaaast’, ‘b12123st’
‘b.+st’ --> ‘best’, ‘baaaaaaaaaaaaaaaast’, ‘b12123st’. Not ‘bst’
‘b.?st’ --> ‘bst’, ‘best’, ‘b1st’. Not ‘beast’
‘b.{3}st’ --> ‘b123st’, ‘beaust’, ‘b1x#st’. Not ‘beast’
Regex Metacharacters
Metacharacter   Meaning                                   Example
[]              Set of characters (matches one of them)   ‘be[a-z]t’ --> ‘beat’, ‘bebt’, ..., ‘bezt’. Not ‘beabt’.
                                                          Common ranges: [a-z] [A-Z] [0-9] [a-zA-Z0-9]
|               Either or                                 ‘beat|best’ --> ‘beat’, ‘best’. Not ‘beast’.
                                                          ‘be[as]t’ --> ‘beat’, ‘best’. Not ‘beast’.
                                                          (Inside [], | is a literal character, so ‘be[a|s]t’ would also match ‘be|t’.)
()              Grouping                                  ‘be(a|s)t’ --> ‘best’, ‘beat’. Not ‘beast’.
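A minimal sketch comparing a character set with alternation inside a group. It also shows that a | written inside [] is treated as a literal character, so ‘be[a|s]t’ accepts ‘be|t’ as well. The candidate list and helper name are made up for this example.

```python
import re

candidates = ["beat", "best", "beast", "be|t"]   # hypothetical test words

def matching(pattern):
    """Return the candidates that the pattern matches in full."""
    return [w for w in candidates if re.fullmatch(pattern, w)]

with_set      = matching('be[as]t')     # one character from the set {a, s}
with_group    = matching('be(a|s)t')    # alternation inside a group
alternation   = matching('beat|best')   # alternation between whole words
pipe_in_set   = matching('be[a|s]t')    # here | is just another set member

print(with_set)
print(with_group)
print(alternation)
print(pipe_in_set)
```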
More Examples
gray|grey            z{3}                   mi.....ft
gr(a|e)y             z{3,6}                 \d+\.\d+|\d+
gr[ae]y              z{3,}                  [^i*&2@]  (^ means negate here)
b[aeiou]bble         [Bb]rainf\*\*k         //[^\r\n]*[\r\n]
[b-chm-pP]at|ot      \d                     ^dog
colou?r              \d{5}(-\d{4})?         dog$
rege(x(es)?|xps?)    1\d{10}                ^dog$
go*gle               [2-9]|[12]\d|3[0-6]
go+gle               3[0-6]|[12]\d|[2-9]
g(oog)+le            Hello\nworld
https://cs.lmu.edu/~ray/notes/regex/
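A few of the patterns above can be tried directly. This is a sketch under assumed inputs: the sample strings and variable names are made up, and only the three patterns are taken from the list.

```python
import re

zip_code = re.compile(r'\d{5}(-\d{4})?')   # 5 digits, optionally followed by -4 digits
colour   = re.compile(r'colou?r')          # 'u' is optional
number   = re.compile(r'\d+\.\d+|\d+')     # decimal number, or else a plain integer

# fullmatch() accepts a string only if the whole string fits the pattern.
zip_ok = [s for s in ["90045", "90045-2659", "9004", "90045-26"]
          if zip_code.fullmatch(s)]

spellings = colour.findall("color colour colouur")
numbers   = number.findall("pi is 3.14, e is about 2.718, and 42")

print(zip_ok)
print(spellings)
print(numbers)
```

In the number pattern the decimal alternative is listed first, so ‘3.14’ is returned whole instead of being split into ‘3’ and ‘14’.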
Regex Metacharacter: \
Sequence   Meaning                                                                                           Example
\A         Matches if the specified characters are at the beginning of the string                            \AHello
\b         Matches where the specified characters are at the beginning or at the end of a word
\B         Matches where the specified characters are present, but NOT at the beginning (or at the end) of a word
\d         Matches a digit (0-9)
\D         Matches any character that is NOT a digit
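A short sketch of \A, \b, \B, \d and \D on a made-up string. The patterns are written as raw strings (r'...') because in an ordinary Python string literal '\b' means a backspace character. The sample text and variable names are assumptions for this example.

```python
import re

text = "cat concat cats 42 cat7"   # hypothetical sample string

# Start positions of 'cat' at the beginning of a word.
word_starts = [m.start() for m in re.finditer(r'\bcat', text)]

# Start positions of 'cat' that are NOT at the beginning of a word (inside 'concat').
inside_word = [m.start() for m in re.finditer(r'\Bcat', text)]

# 'cat' as a whole word only.
whole_word = re.findall(r'\bcat\b', text)

digits_runs = re.findall(r'\d+', text)      # runs of digits
digits_only = re.sub(r'\D', '', text)       # drop everything that is not a digit
starts_with = re.search(r'\Acat', text) is not None

print(word_starts, inside_word, whole_word)
print(digits_runs, digits_only, starts_with)
```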
Regex Metacharacter: \
Sequence   Meaning                                                                                       Example
\s         Matches a whitespace character
\S         Matches any character that is NOT whitespace
\w         Matches a word character (letters a-z and A-Z, digits 0-9, and the underscore _ character)    [a-zA-Z0-9_]
\W         Matches any character that is NOT a word character
\Z         Matches if the specified characters are at the end of the string                              ‘Python.\Z’ --> ‘Welcome to Python.’
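A minimal sketch of \s, \w, \W and \Z. The sample line is made up. The last two checks show one way \Z differs from $ in Python: $ also matches just before a trailing newline, while \Z matches only at the very end of the string.

```python
import re

line = "user_1 logged  in\tat 09:30"   # hypothetical sample line

tokens = re.split(r'\s+', line)        # split on runs of whitespace
words  = re.findall(r'\w+', line)      # runs of word characters
joined = re.sub(r'\W+', '_', line)     # collapse non-word runs into '_'

ends_exact   = re.search(r'Python\.\Z', "Welcome to Python.") is not None
ends_newline = re.search(r'Python\.\Z', "Welcome to Python.\n") is not None
dollar_nl    = re.search(r'Python\.$', "Welcome to Python.\n") is not None

print(tokens)
print(words)
print(joined)
print(ends_exact, ends_newline, dollar_nl)
```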
References
● https://www.regular-expressions.info/
● http://www.rexegg.com/
● https://regex101.com/
● https://docs.python.org/3/library/re.html
● https://cs.lmu.edu/~ray/notes/regex/
Summary
● Regular Expressions
● Regular Expressions in Python