Regular Expression
• ∑ be an alphabet which is used to denote the
input set. The regular expression over ∑ can
be defined as follows.
• Ø is a regular expression which denotes the
empty set.
• Ε is a regular expression, standing for the
language {ε} and it is a null string.
• a for some a in the alphabet ∑, standing for
the language {a}
• If r and s are regular expressions denoting the
languages L1 and L2 respectively, then
• r+s is equivalent to L1 U L2 (Union)
• rs is equivalent to L1L2 (Concatenation)
• r* is equivalent to L1* (closure)
• The r* is known as kleen closure or closure
which indicates occurrence of r for number of
times.
• For example if ∑={a} and we have regular
expression R=a*, then R is a set denoted by
R={є,a,aa,aaa,aaaa….}
• That is R includes any number of a’s as well as
empty string which indicates zero number of
a’s appearing, denoted by є character.
• Positive closure L+ denotes set of all strings
except є or null string .
• If ∑={a} and if we have regular expression R=a+
then R is a set denoted by
• R={a,aa,aaa,aaaa…}
• We can construct L*= є. L+
PRECEDENCE
• Tightest ( )
• Star (“*”)
• Concatenation (“.”, “”)
• Loosest Union (“∪”, “+”, “|”)
• EXAMPLE
• R1*R2 ∪ R3 = ((R1*)R2)UR3
Write RE for the given Language
• Beginning with 1: 1(0+1)*
• Ends with 1 : (0+1)* 1
• Substring 1: (0+1)* 1 (0+1)*
• Even number of a’s: (b+ab*a)*
• Write r.e for the language accepting the
strings which are starting with 1 and ending
with 0, over the set ∑={0,1}.
• the r.e to denote the language L over ∑={a,b}
such that all the strings do not contain the
substring “ab”.
• Construct r.e for the language L over the set
∑={a,b} in which the total number of a’s are
divisible by 3.
• Solution:
– The total number of a’s are divisible by 3. So valid
strings can be
L={baaa,babaa,bababab,ababab,…..}
The r.e can be,
R=(b*.a.b*.a.b*.a.b*)*
• Write r.e to denote a language L which accepts
all the strings which begin or end with either
00 or 11.
Solution:
The r.e can be categorized into two subparts.
R=L1+L2
L1= The strings which begin with 00 or 11.
L2= The strings which end with 00 or 11.
Let us find out L1 and L2:
L1=(00+11)(any number of 0’s and 1’s)
L1=(00+11)(0+1)*
• Similarly,
L2=(any number of 0’s and 1’s)(00+11)
L2=(0+1)*(00+11)
Hence
R=[(00+11)(0+1)*]+[(0+1)*(00+11)]
EQUIVALENCE
• L can be represented by a regexp
• ⇔
• L is a regular language
• Given regular expression R, we show there
exists NFA N such that R represents L(N)
Given regular expression R, we show there
exists NFA N such that R represents L(N)
Induction on the length of R:
Base Cases (R has length 1):
R=a a
(matches a single symbol)
R=ε ε
(matches the empty string)
R=∅
(matches nothing)
THOMPSON’S CONSTRUCTION
Inductive Step:
Assume R has length k > 1 and that any regular
expression of length < k represents a language
that can be recognized by an NFA
Four possibilities for R:
R=R∪S (Union Theorem)
R = RS (Concatenation Theorem)
R = R* (Closure Theorem)
R = ( R) (Parenthesized R)
THOMPSON’S CONSTRUCTION
UNION THEOREM - (R+S)
ε R ε
ε S ε
THOMPSON’S CONSTRUCTION
CONCATENATION THEOREM - (R·S)
ε
R S
THOMPSON’S CONSTRUCTION
CLOSURE OPERATION OF R- R*
R
ε ε
ε
THOMPSON’S CONSTRUCTION
PARENTHESIZED R- ( R)
R
Have Shown
L can be represented by a regexp
⇒
L is a regular language
Transform (1(0 ∪ 1))* to an NFA
ε 0 ε
1
ε ε
ε 0 ε
ε 1 5 6
ε ε
1 2 3 4 9 10
1
ε 7 8 ε
ε
THOMPSON’S CONSTRUCTION METHOD
RE to ε-NFA Example
• Convert R= (ab+a)* to an NFA
– We proceed in stages, starting from simple
elements and working our way up
a
a
b
b
a ε b
ab
THOMPSON’S CONSTRUCTION
RE to ε-NFA Example (2)
ab+a
a ε b
ε ε
a
ε ε
(ab+a)* a ε b
ε ε
ε ε
a
ε ε