Lecture 04
Regular Expressions (regex)
Natural Language Processing
COSC-3121
Ms. Humaira Anwer
[email protected] Lecture 03 Regular Expressions {regex} 1
Agenda
• Regular Expressions
• Optional Expressions
• Kleene+
• Kleene*
• Wildcard expression
• Anchors
• Alphanumeric characters
• Whitespaces
• Python implementation
• Online resources
• Summary
Regular Expressions-Optional
Expression
• How can we talk about optional elements,
such as the optional s in woodchuck and woodchucks?
• For this we use the question mark: /?/
• /?/ means “the preceding character or nothing”,
i.e. “zero or one instances of the previous character”.
• E.g. /woodchucks?/ matches both woodchuck and woodchucks.
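The optional-element rule above can be tried out in Python. This is a minimal sketch that assumes the re module covered later in this lecture; the variable names are chosen only for illustration.

```python
import re

# "s?" makes the final s optional: zero or one occurrence.
pattern = r"woodchucks?"

# fullmatch() succeeds only if the whole string fits the pattern.
singular = re.fullmatch(pattern, "woodchuck")
plural = re.fullmatch(pattern, "woodchucks")
double_s = re.fullmatch(pattern, "woodchuckss")

print(singular is not None)  # True
print(plural is not None)    # True
print(double_s is not None)  # False: ? allows at most one s
```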
Regular Expressions: * + .
Pattern    Matches                                           Examples
/oo*h/     0 or more of the previous character               oh, ooh, oooh, oooooh
/[ab]*/    0 or more a’s and b’s                             abababab, aaaaabbbbb, bbbbb, a
/o+h/      1 or more of the previous character               oh, ooh, ooooooh
/beg.n/    Wildcard: any single character except a newline   begin, begun, began, beg3n, beg’n
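The table rows can be checked with a short script. This is a sketch only: the helper matches() is a hypothetical name introduced here, and it uses re.fullmatch so that a word counts only if the whole word fits the pattern.

```python
import re

def matches(pattern, words):
    """Return the words that the pattern matches in full (hypothetical helper)."""
    return [w for w in words if re.fullmatch(pattern, w)]

# * allows zero or more of the previous character, so one o is enough.
star = matches(r"oo*h", ["h", "oh", "ooh", "oooooh"])
# + requires at least one of the previous character.
plus = matches(r"o+h", ["h", "oh", "ooooooh"])
# [ab]* accepts any mix of a and b, including the empty string.
ab = matches(r"[ab]*", ["", "a", "abababab", "abc"])
# . stands for exactly one character.
dot = matches(r"beg.n", ["begin", "beg3n", "beg'n", "begn", "begiin"])

print(star)  # ['oh', 'ooh', 'oooooh']
print(plus)  # ['oh', 'ooooooh']
print(ab)    # ['', 'a', 'abababab']
print(dot)   # ['begin', 'beg3n', "beg'n"]
```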
Regular Expressions-Anchors
Pattern          Matches                                                   Examples
/^The/           The string The at the start of a line                     The KFUEIT is the best
/^[A-Z]/         An uppercase letter at the start of a line                Pakistan is my beloved Country
/$/              The end of a line                                         The end? The end!
/the shop\.$/    the shop. at the end of a line (\. is a literal period)   She went to the shop.
/\bthe\b/        The whole word the (\b is a word boundary)                She went to the shop to buy the birthday card for them.
/\b99\b/         99 as a whole word: matches the 99 in $99, not in 299     These 299 bottles are for a total of $99 only.
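The anchor patterns above can be tested against the example sentences from the table. This is a minimal sketch using Python's re module; the variable names are illustrative.

```python
import re

# ^ anchors the pattern to the start of the line.
starts_with_the = re.search(r"^The", "The KFUEIT is the best")
starts_upper = re.search(r"^[A-Z]", "Pakistan is my beloved Country")

# \. is a literal period and $ anchors to the end of the line.
ends_with_shop = re.search(r"the shop\.$", "She went to the shop.")

# \b marks a word boundary, so "them" is not counted as "the".
whole_the = re.findall(
    r"\bthe\b", "She went to the shop to buy the birthday card for them."
)

# 299 is skipped because the digit 2 sits directly before 99.
whole_99 = re.findall(r"\b99\b", "These 299 bottles are for a total of $99 only.")

print(starts_with_the.group())  # The
print(starts_upper.group())     # P
print(ends_with_shop.span())    # (12, 21)
print(whole_the)                # ['the', 'the']
print(whole_99)                 # ['99']
```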
Regular Expressions-Alphanumeric Characters and Whitespace
Pattern             Shorthand   Matches                                    Examples
/[a-zA-Z0-9_]/      /\w/        Any alphanumeric character or underscore   KFUEIT.
/[^\w]/             /\W/        A non-alphanumeric character               !!!!!
/[ \t\n\r\f\v]/     /\s/        Whitespace (space, tab, etc.)
/[^\s]/             /\S/        Non-whitespace
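The shorthand classes can be compared side by side on small sample strings. This is a sketch with made-up sample strings, not taken from the slides.

```python
import re

sample = "a_1 !"

# \w: letters, digits and underscore (a space is NOT a word character).
word_chars = re.findall(r"\w", sample)
# \W: everything that \w does not match.
non_word_chars = re.findall(r"\W", sample)

spaced = "a b\tc\n"
# \s: space, tab, newline and other whitespace.
whitespace = re.findall(r"\s", spaced)
# \S: everything that \s does not match.
non_whitespace = re.findall(r"\S", spaced)

print(word_chars)       # ['a', '_', '1']
print(non_word_chars)   # [' ', '!']
print(whitespace)       # [' ', '\t', '\n']
print(non_whitespace)   # ['a', 'b', 'c']
```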
Regular Expressions-Anchors
Pattern        Matches                                        Example string               Result
/Python\Z/     The characters Python at the very end          I like Python                Match
               of the string                                  I like Python Programming    No match
                                                              Python is fun.               No match
'^a...s$'      Any five-character string that starts with a   abs                          No match
               and ends with s                                alias                        Match
                                                              abyss                        Match
                                                              Alias                        No match
                                                              An abacus                    No match
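The Match / No match column can be reproduced with re.search. This is a minimal sketch; the dictionary names are chosen for illustration only.

```python
import re

# \Z matches only at the very end of the string.
end_results = {
    s: bool(re.search(r"Python\Z", s))
    for s in ["I like Python", "I like Python Programming", "Python is fun."]
}

# ^a...s$ : a, then any three characters, then s, and nothing else.
five_letter_results = {
    s: bool(re.search(r"^a...s$", s))
    for s in ["abs", "alias", "abyss", "Alias", "An abacus"]
}

for text, matched in {**end_results, **five_letter_results}.items():
    print(f"{text!r}: {'Match' if matched else 'No match'}")
```

Note that a trailing period after Python would prevent /Python\Z/ from matching, because the period would then be the last character of the string.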
Python Implementation- Importing
required modules
• Python’s re module is used to work with regex.
• It is a built-in library of Python for handling regex.
• The re module must be imported before it can be used.
• The following statement imports the re module into a Python
program.
import re
Python Implementation-Defining
patterns
• A Regular Expression (regex) is a sequence of
characters that defines a search pattern.
• E.g.
[pP]ython
• This defines a regex pattern.
• The pattern is: the word python starting with either
small p or capital P
Python Implementation-Defining
patterns
• Now that the re module is imported, we can start by defining
the string pattern to search for.
• The following code snippet can be used for this purpose.
import re
pattern = '[pP]ython'
• The pattern is written according to the rules of regex and is
enclosed in single quotes.
• The variable pattern stores the search pattern.
Python Implementation-Text String
to Search
• We have now defined the pattern that we want to
search for.
• Next, a text corpus must be given in which the search
is to be performed.
• An example text string can be created with the
following code snippet.
import re
pattern = '[pP]ython'
text_string = "I love Python."
Python Implementation-Search
function
• If there is a match anywhere in the string, the
search function returns a Match object for the
first occurrence.
• The Match object includes the location (span) of
the match in the provided string.
• If there is no match, the search function returns None.
import re
text_string = "I love Python."
x = re.search("[pP]ython", text_string)
print(x)
Output (Python 3.6 or earlier; Python 3.7+ prints re.Match instead of _sre.SRE_Match)
<_sre.SRE_Match object; span=(7, 13), match='Python'>
Process finished with exit code 0
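The returned Match object can be inspected further. The sketch below uses the standard group(), start(), end() and span() methods of Match objects; the second search string is made up to show the no-match case.

```python
import re

text_string = "I love Python."
found = re.search(r"[pP]ython", text_string)

# search() returns None when nothing matches, so check before using it.
if found:
    print(found.group())               # Python
    print(found.start(), found.end())  # 7 13
    print(found.span())                # (7, 13)

# No occurrence of the pattern: the result is None.
missing = re.search(r"[pP]ython", "I love Java.")
print(missing)  # None
```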
Python Implementation-match
function
• The re.match() function checks for the regular
expression pattern only at the start of the string.
• If a match is found at the start of the string, it returns
the Match object.
• If the pattern occurs only at some other place in the
string, re.match() returns None.
Python Implementation-match
function
Example 01
import re
text_string = "I love Python."
x = re.match("[pP]ython", text_string)
print(x)
Output
None
Process finished with exit code 0
Example 02
import re
text_string = "Python is my favourite."
x = re.match("[pP]ython", text_string)
print(x)
Output
<_sre.SRE_Match object; span=(0, 6), match='Python'>
Process finished with exit code 0
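The difference between match and search shows up most clearly when both are run on the same string. This is a minimal sketch reusing the pattern and text from the examples above.

```python
import re

pattern = r"[pP]ython"
text_string = "I love Python."

# match() only looks at the start of the string.
at_start = re.match(pattern, text_string)
# search() scans the whole string for the first occurrence.
anywhere = re.search(pattern, text_string)

print(at_start)         # None
print(anywhere.span())  # (7, 13)

# Always test for None before calling methods on the result.
label = at_start.group() if at_start else "no match at start"
print(label)            # no match at start
```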
Python Implementation-
findall function
• The findall() function returns a list containing all matches.
• The list contains the matches in the order they are found.
• If no matches are found, an empty list is returned.
Example
import re
text_string = "I love Python."
x = re.findall('[A-Za-z .]', text_string)
print(x)
Output
['I', ' ', 'l', 'o', 'v', 'e', ' ', 'P', 'y', 't', 'h', 'o', 'n', '.']
Process finished with exit code 0
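The example above returns single characters because the pattern matches one character at a time. The sketch below, with a made-up text string, shows how adding + collects whole words and how an empty list signals no matches.

```python
import re

text_string = "I love Python. python is fun."

# [A-Za-z]+ groups consecutive letters into whole words.
words = re.findall(r"[A-Za-z]+", text_string)
# Every occurrence is returned, in the order found.
pythons = re.findall(r"[pP]ython", text_string)
# No digits in the text, so the list is empty.
numbers = re.findall(r"\d+", text_string)

print(words)    # ['I', 'love', 'Python', 'python', 'is', 'fun']
print(pythons)  # ['Python', 'python']
print(numbers)  # []
```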
Python Implementation-Sub
Function
• The sub() function replaces the matched string with
the text of your choice.
• The following example code snippet replaces every whitespace
character with the digit 9.
Example
import re
text_string = "I love Python."
x = re.sub(r'\s', '9', text_string)
print(x)
Output
I9love9Python.
Process finished with exit code 0
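re.sub also accepts an optional count argument that limits the number of replacements. The sketch below shows it next to the replace-all form; the third input string is made up for illustration.

```python
import re

text_string = "I love Python."

# Replace every whitespace character.
all_replaced = re.sub(r"\s", "9", text_string)
# count=1 replaces only the first match.
first_only = re.sub(r"\s", "9", text_string, count=1)
# \s+ treats a run of whitespace as one match, which collapses it.
collapsed = re.sub(r"\s+", " ", "I   love\tPython.")

print(all_replaced)  # I9love9Python.
print(first_only)    # I9love Python.
print(collapsed)     # I love Python.
```

The patterns are written as raw strings (r"...") so that Python does not try to interpret backslash sequences such as \s itself.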
Online Resources
• https://www.programiz.com/python-programming/regex
• https://www.w3schools.com/python/python_regex.asp
• https://docs.python.org/3/library/re.html
• https://docs.python.org/3/howto/regex.html
Summing Up
• Regular expressions are good at representing
subsets of natural language
• But may be difficult for humans to understand for
any real (large) subset of a language
• Can be hard to scale up, e.g. when there are many choices at
any point (such as surnames)
• But quick, powerful and easy to use for small problems