Reg. Expression (Updated)
Reg. Expression (Updated)
Lecture Summary
1. Regular Language, Regular Expression
2. Converting Regular Languages to Regular Expressions
3. Algebraic properties of regular expressions
4. Identities for regular expression
5. Converting Regular Expression to Regular Language
6. Equivalent proof of regular expressions, Simplification of Regular Expressions
7. Closure properties of regular languages
Module-2: FA : Regular Expression, Regular Language, Pumping Lemma
(Dt.22.08.2024)
1 Introduction
1.1 Regular Language :
The language accepted by some regular expression or finite automata is known as regular
language.
We know that language have two type finite language and infinite language.
Finite language is regular because we can construct it’s regular expressions or finite automata.
Infinite language may or may not be regular language.
Space for class examples
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 1
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
1.3 Precedence of Operators in decreasing order: Rules for Specifying Regular Expressions:
Two operator characteristics determine how operands group with operators:
Precedence is the priority for grouping different types of operators with their operands.
Associativity is the left-to-right or right-to-left order for grouping operands to operators that
have the same precedence.
Sl. No. Operator Associativity Remarks
1 ()
2 * LTR Unary
3 . LTR Binary
4 + or | LTR Binary
LTR - Left to Right
If r is a regular expression and let L(r) denote the language associated with r. The language L(r)
denoted by any regular expression r is defined by the following rules.
i. Ø is a regular expression denoting the empty set {}. L(Φ) = {} =Φ
ii. ε is a regular expression denoting {ε}. L(ε) = {ε}
iii. For every a ∈ Σ, a is a regular expression denoting {a}. L(a) = {a}
iv. If r and s are regular expressions, then
L (r + s) = L (r) ∪ L (s)
L (r · s) = L (r). L (s)
L ((r1)) = L (r)
L ( r*) = (L (r))*.
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 2
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
7 Language accepting ab r = ab
only over the set Σ = {a, b}
8 Language accepting ab or r = ab + bb
bb only over the set Σ= {a,
b}
9 Language accepting any r = (ab + bb)*
combinations of ab or bb
including ε over the set Σ
= {a, b}
10 Accepting all strings r = a(a + b)*b
starting with “a” and end
with “b” over the set Σ =
{a, b}
11 Accepting the set of all (any)1 0|1
(when discussing DFAs,
strings whose second last we write it this way)
symbol is 1 over {0, 1} This can be written
in re
r = (0+1)*1(0+1)
12 Accepting the set of all 1 0|1(any)
strings whose second (when discussing DFAs,
we write it this way)
symbol is 1 over {0, 1} This can be written
in re
r = (0+1)1 (0+1)*
13 Accepting the set of all r = 1(0+1)*01
strings over {0, 1} that
starts with 1 and end with
01.
14 {w | w ∈ Σ* and w contains r = 00* + 11*
only 0’s or only 1’s} =0+ + 1+
In other words, the
acceptance criteria allow
for one or more zeros or
one or more ones.
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 3
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
a) Commutative: Commutativity is the property of an operator that says we can switch the order of
its operands and get the same result.
If r1, r2 are RE, then
r1+r2 = r2+r1. For example, for r1 =a and r2 =b, then RE a+ b and b+ a are equal.
r1.r2 ≠ r2.r1. For example, for r1 = a and r2 = b, then RE a.b is not equal to b.a.
b) Associativity: Associativity is the property of an operator that allow us to regroup the operands
when the operator is applied twice.
If r1, r2, r3 are RE, then
r1+ (r2+r3) = (r1+r2) +r3
Example:
r1 = a , r2 = b , r3 = c, then
The resultant regular expression in LHS becomes a+(b+ c) and the regular set for the
corresponding RE is {a, b, c}.
For the RE in RHS becomes (a+ b) + c and the regular set for this RE is {a, b, c}, which is same
in both cases. Therefore, the associativity property holds for union operator.
r1.(r2.r3) = (r1.r2).r3
Example:
r1 = a , r2 = b , r3 = c
Then the string accepted by RE a.(b.c) is only abc.
The string accepted by RE in RHS is (a.b).c is only abc ,which is same in both cases.
Therefore, the associativity property holds for concatenation operator.
Note: Associativity property does not hold for Kleene closure(*) because it is unary operator.
c) Identity : An identity for an operator is a value that when the operator is applied to the identity
and some other value, the result is the other value
In the case of union operators
if r+ x = r ⇒ x= ∅ as r ∪ ∅= r, therefore ∅ is the identity for +.
Therefore, ∅ is the identity element for a union operator.
In the case of concatenation operator –
if r.x = r , for x= ε
r.ε = r ⇒ ε is the identity element for concatenation operator(.) .
d) Annihilator: An annihilator for an operator is value that when the operator is applied to the
annihilator and some other value, the result is the annihilator.
If r+ x = x ⇒ r ∪ x= x , there is no annihilator for +
In the case of a concatenation operator, r.x = x, when x = ∅, then r.∅ = ∅, therefore ∅ is the
annihilator for the (.)operator. For example {a, aa, ab}.{ } = { }
e) Distributed property: A distributive law involves two operators and asserts that one operator can
be pushed down to be applied to each argument of the other operator individually.
If r1, r2, r3 are regular expressions, then
r1.(r2+ r3) = r1.r2 + r1.r3 i.e. left distribution
(r1+r2).r3 = r1.r3 + r2.r3 i.e. Right distribution
(r1.r2) +r3 ≠ (r1+r3)(r2+r3)
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 4
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
f) Idempotent law : An operator is idempotent if the result of applying it to two of the same values
as arguments is that value.
r1 + r1 = r1 ⇒ r1 ∪ r1 = r1 , therefore the union operator satisfies idempotent property.
r1.r1 ≠ r1 ⇒ concatenation operator does not satisfy idempotent property.
h) Closure laws :
(r*)* = r*, closing an expression that is already closed does not change the language.
∅* = ε, a string formed by concatenating any number of copies of an empty string is empty
itself.
r+ = r.r* = r*r, as r* = ε + r + rr+ rrr …. and r.r* = r+ rr + rrr ……
r* = r*+ ε
i) Identities for regular expression: Two regular expressions p and q are equivalent if p and q
represent same set of strings or they generate the same language.
There are many identities for the regular expression. Let p, q and r are regular expressions.
I1. ∅ + r = r
I2. ∅.r= r.∅ = ∅
I3. ε.r = r.ε =r
I4. ε* = ε and ∅* = ε
I5. r + r = r
I6. r*.r* = r*
I7. r.r* = r*.r = r+
I8. (r*)* = r*
I9. ε +r.r* = r* = ε + r.r*
I10. (p.q)*.p = p.(q.p)*
I11. (p + q)* = (p*.q*)* = (p* + q*)*
I12. (p+ q).r= p.r+ q.r and r.(p+q) = r.p + r.q
Example-2: Find out the regular expressions over Σ = {0,1} for the following languages.
(Regular Language to Regular Expression)
Sl. Regular Language Regular Expression Remarks
No.
1 {w | w ∈ Σ* and w contains r = 11* = 1+ Contains 1 ‘s means, one one
1’s only} or any number of 1’s..
2 {w | w ∈ Σ* and w contains r = 00* + 11* Here + is for or.
only 0’s or only 1’s}
3 All strings ending with r = (0 + 1)*011
“011”
4 All strings containing the r = (0 + 1)*000(0 + 1)∗
substring 000.
5 Find a regular expression r = 1* 0 1* 0 1* Explanation:
corresponding to the A string in this language
language of all strings over must have at least two 0's.
the alphabet { 0, 1 } that Since any string of 1's can be
contain exactly two 0's. placed in front of the first 0,
behind the second 0 and
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 5
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 6
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
Example-3: Find out the regular expressions over Σ = {0,1} for the following languages.
(Regular Language to Regular Expression)
Sl. Regular Language Regular Expression Remarks
No.
1 {w | w ∈ Σ* and w does not r = 0* If it does not contain 1’s
contain 1’s} means only contains 0’s or ∈.
2 All strings with no “0” after r = 1*0* No 0’s after 1, but 1 after
“1” zero possible, only 1*
possible, only 0* possible.. so
is the answer.
3 All strings starting with “1” r = (1 + 10)*
and containing no “00”
4 All strings with at least one r = 00*11* No zero after 1 means, 1 after
“0” and one “1”, and no “0” 1* possible.
after “1
5 All strings not ending in 01. r = ε + 0 + 1 + (0 + 1)* The expression ε+0+1
(00 + 10 + 11) describes the strings with
OR length zero or one, and the
r = ( 0 + 1 )*( 0 + 11 ) expression (0 + 1)* (00 + 10
Any string in a language + 11) describes the strings
over {0 , 1 } must end in 0 with length two or more.
or 1. Hence if a string does
not end with 01 then it
ends with 0 or if it ends
with 1 the last 1 must be
preceded by a symbol 1.
Since it can have any
string in front of the last 0
or 11, ( 0 + 1 )*( 0 + 11) is
a regular expression for the
language.
6 does not contain 11 r = (0+10)* (1+ ε )
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 7
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
Example-4: Find out the regular language for the given regular expressions.
(Regular Expression to Regular Language)
Sl. Regular Expression Regular Language
No.
1 r = (a + b)*(a + bb) L(r) ={w ∈ {a,b}* | w end with a or bb}
i.e. All strings end with either a or bb.
2 r = (0 +1)*00 (0 +1)* L(r) ={w ∈ {0,1}* | w contains a substring 00}
i.e. All strings over {0, 1} contains a substring 00.
3 r = (0+1).(00+11) L(r) = L((0+1).(00+11))
=L(0+1).L(00+11)
={0,1}{01, 11}
={000, 011, 100, 111}
4 r = (a + b)a * L(r) = L((a + b)a*)
= L((a + b)) L(a*)
= L(a + b) L(a*)
= {a, b}{∈, a, aa, aaa, …}
= {a,aa,aaa,...,b,ba,baa,...}
5 r = (aa)*(bb)*b Set of strings consisting of even number of a’s followed by
odd number of b’s ,
so L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, ……..}
6 r = ( aa + ab + ba + bb )* Language of strings of even lengths over the alphabet of {
a, b }
7 Describe as simply as A string in the language can start and end with a or b, it has
possible in English the at least one b, and after the first b all the b's in the string
language corresponding to appear in pairs. Any number of a's can appear any place in
the regular expression the string. Thus simply put, it is the set of strings over the
a*b(a*ba*b)*a* . alphabet { a, b } that contain an odd number of b's
8 Describe as simply as (( a + b )3) represents the strings of length 3.
possible in English the Hence (( a + b)3)* represents the strings of length a
language corresponding to multiple of 3. Since (( a + b )3)*( a + b ) represents the
the regular expression (( a + strings of length 3n + 1, where n is a natural number, the
b )3)*( ε + a + b ) given regular expression represents the strings of length
3n and 3n + 1, where n is a natural number.
9 Describe as simply as (b+ab)* represents strings which do not contain any
possible in English the substring aa and which end in b, and (a+ab)* represents
language corresponding to strings which do not contain any substring bb. Hence
the regular expression ( altogether it represents any string consisting of a
b + ab )*( a + ab )*. substring with no aa followed by one b followed by a
substring with no bb.
10 Find the shortest string that It can easily be seen that , a, b, which are strings in the
is not in the language language with length 1 or less. Of the strings wiht length 2
represented by the regular aa, bb and ab are in the language. However, ba is not in it.
expression a*(ab)*b*. Thus the answer is ba.
11 r = b*a b*a b The language of all strings over the alphabet { a, b } that
contain exactly two a's.
12 r = ( a + b )*( a + bb ) The language of all strings over the alphabet { a, b } that
do not end with ab.
13 (b + ab)* + (b +ab)*a The language having input alphabets a and b, in which two
a’s do not come together.
14 a(b + ba)* + (b + ba)* The language having input alphabets a and b, in which two
a’s do not come together.
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 8
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
15 What is the regular language The set of all strings containing at least two 0’s.
over the alphabet {0,1} is
described by the regular
expression
r = (0+1)*0(0+1)*0(0+1)*
Example-5: Prove the following. Show that LHS regular expression is equivalent to RHS
regular expression in each case. (Equivalent proof of regular expressions, Simplification of
Regular Expressions)
a) Prove that the regular expression R =ε + 1*(011)*(1*(011)*)* also describes the same set
of strings as S= (1 + 011)*
Solution:
Let P = 1*(011)*
LHS=R = ε + 1*(011)*(1*(011)*)*
= ε + PP*
= P* using I9
= 1*(011)*
= (Q*T*)* where Q=1, T=011
= (Q + T)* using I11
= (1 + 011)* = S = RHS (Proved)
RHS = (0+1)*
= { ε,0,00,1,11,111,01,10,…..}
= { ε, any combinations of 0’s, any combinations of 1’s, any combinations of 0 and
1}
Hence, it is proved that LHS=RHS
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 9
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
Example-6: Find the length of the shortest string not in the language (over given alphabet) of
given regular expression.
Sl. Regular Length of Explanation
No. Expression shortest string
NOT in the
language
1 r = a*b*(ba)*a* 3 Reference: GATE-CS-2014-(Set-3)/ Q-65
For the language L(r) = a*b*(ba)*a*
String of length 0 => Null ∈ L
String of length 1 => {a, b} ∈ L
String of length 2 => {aa, ab, ba, bb} ∈ L
String of length 3 => {aaa, aab, aba, baa, abb, bba,
bbb} ∈ L, but {bab}∉L
“bab” is the shortest string not acceptable by given
regular expression whose length is 3.
2 r=a*(ab)*b* 2 It can easily be seen that , a, b, which are strings in
the language with length 1 or less. Of the strings
wih length 2 aa, bb and ab are in the language.
However, ba is not in it. Thus the answer is ba.
3 r= 3 We have two ways to solve this, the first way
(1 +01)* 00 (1 being, check every string starting from 0 length it
+10)* + (1+01)* it is generaged by the regular expression or not.
(0+ε) But if we observe the regular expression carefully,
r is actually broken into two parts.
= (Exactly 1 pair of consecutive zeroes)+(No pair
of consecutive zeroes)
= (At most 1 pair of consecutive zeroes)
So if we can recognize the language, we already
know that the shortest string not generated will be
the shortest string having more than one pair of
consecutive 0's and it is none other than 000. So
the length will be equal to 3
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 10
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
Example-2: Construct a DFA that do not starts with 0 over Σ = {0, 1}.
Solution:
Let, L1 = { 0x | x ∈ {0,1}*} = strings that starts with 0
So, L1' = { 1x | x ∈ {0,1}*} U {ε} = strings that do not start with 0
L1 denote the language containing strings L1' denote the language containing strings
that starts with 0 over Σ = {0, 1} that do not start with 0 over Σ = {0, 1}
Step-1:DFA for L1 Step-2: DFA for L1'
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 11
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
Step-2 and 3
- For union, at least one original DFA must accept:
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 12
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
L1 and L2 are regular ⇒ L1' and L2' are regular (by Complementation property)
L1' ∪ L2' is regular (by Union property)
L1 ∩ L2 is regular (by De Morgan's law)
As De morgan’s law states
In terms of DFA, we can say that a DFA(L1 ∩ L2) accepts those strings that are accepted by
both DFA(L1) and DFA(L2).
Construction of DFA for (L1 ∩ L2)
Step-1 &2: Same as union (L1 U L2)
Step-3: Initial state = The state lebeled as both DFAs initial states, Final State (s) = The state
lebeled as both DFAs final states.
Step-2 and 3
- For intersection, both original DFAs must accept:
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 13
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
For more examples, refer DFA Home Assignment Solution (AN Series) Note.
Step-2: Change the final state to initial and intial to final states and reverse the arrows.
But this is not the DFA as A has no output and B we are having two outputs on a. So this is a NFA. If you
want to find out a DFA, then follow the process of conversion of NFA to DFA.
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 14
School of Computer Engineering, CS21003 Automata Theory 2025
Study Material KIIT University, Bhubaneswar and Formal Language AUTUMN
Sl. Set
Explanation & Proof
No. Operations
e The concatenation of two regular languages, L1 and L2, which are
represented using L1.L2 is also regular and which represents the set of
strings that are formed by taking any string in L1 concatenating it with
L1.L2
any string in L2.
or
L1 = { 0,1 } and L2 = { 00, 11} then L1.L2 = {000, 011, 100, 111}.
L1L2
Proof:
(Concatenati
If L1 and L2 are regular, then there exist regular expressions r1 and r2
on)
such that L1 = L(r1) and L2 = L(r2). By definition, r1r2 is a regular
expression denoting the languages L1L2.
Note: If you notice any errors, please mail to [email protected] or contact at +91-9938853866
CSE18/CSE29-Lecture Notes of Automata Theory & Formal Language Dr. Anil Kumar Swain, CSE, KIIT University, Bhubaneswar Page 15