CSC312
Automata Theory
Lecture # 2
Languages
Introduction to languages
Kinds of languages:
Talking language (Informal languages or
Semantic languages)
Programming language (Formal Languages or
Syntactic languages)
Alphabet and Strings
Alphabet: An alphabet is a finite set of symbols, usually
letters, digits, and punctuations.
Valid/In-valid alphabets: An alphabet may contain
letters consisting of group of symbols for example Σ=
{a, ba, bab, d}.
Remarks: While defining an alphabet of letters
consisting of more than one symbols, no letter should
be started with the letter of the same alphabet i.e.
one letter should not be the prefix of another.
However, a letter may be ended in a letter of same
alphabet.
a,:ba, c
Valid alphabet
a, ab
Invalid alphabet : , c
3
Alphabets and Strings
String or word: A finite sequence of
letters/alphabets
Examples: “cat”, “dog”, “house”, “read” …
Defined over an alphabet:
a, b, c, , z
Language: A language is a set of
strings constructed from some
alphabet e.g. Urdu, English, Java, the
set of all binary strings
4
Sentences are made up of certain
combinations of words.
Not all combinations of words lead to a valid English sentence.
So we see that some basic units are
combined to make bigger units.
Languages
How can you tell whether a given sentence
belongs to a particular languages
Black is cat the
The tea is hot
I like chocolates two much
Rules give a clue to forming as well as
validating sentences.
Formal vs. Informal Rules
Informal language -> abstract languages
Incoherent strings are also
understandable
Slang, idiom, dialect etc.
Raise ambiguity
Interpretation varies with region
I am through (BrE/AmE)
Same words have multiple meanings.
Like, light, base, etc.
Summary of Languages
Three aspects/specifications
Lexical
Defines valid words/units of a language
Syntactic
Defines rules for combining the units to
form valid sentences (computer programs
in context of machines)
Semantic
Concerned with the interpretation or
meaning of a sentence (what output to
produce in context of machines)
Affected by ambiguity the most.
Formal languages
Rules defined explicitly and clearly
No ambiguities
Universally uniform understanding
Lets the machine
Interpret an input uniformly every time. i.e.
always produces same output for a
particular input
Avoid crashes because of ambiguity.
Explicitly and categorically reject invalid
input
Formal Languages
Need uniformly understandable notation
Representations
Alphabet
Represents a finite set of fundamental units
of lanauges, e.g. for English ={a,b,….z.A,…
Z,}
∑ = {0,1}
∑ = {0,1,2,3,4,5,6,7,8,9}
Formal Languages
List of words
Set of all valid words of a given language,
e.g., a language English_Words that
contains all valid words of English would
have a = {all entries of the dictionary
+ punctuation marks and blank space}
Denoted by
Is Finite or Infinite set.
Strings: A string a finite sequence of symbols
chosen from alphabet. For example
0111100 , 123045, abbbcdeg etc.
String Variable: A letter used for
denoting a string. The author uses w, x, y
and z as string variable. For example
w = 0111100 , x = 123045, z = abbbcdeg
Length of String: The number of positions
for symbols in the string. For simplicity
we can say that it is the number of
symbols in the string. For example
|w| = 7 , |x| = ? , |z| = ?
Alphabets and Strings
We will use small letters for alphabets:
a, b
Strings
a
ab u ab
abba v bbbaaa
baba w abba
aaabbbaabab
13
String Operations
Let we have following strings
w a1a2 an abba
v b1b2 bm bbbaaa
Concatenation
wv a1a2 anb1b2 bm abbabbbaaa
Reverse
R
w an a2a1 abba
aaabbb
14
String Length
w a1a2 an
Length: w n
Examples: abba 4
aa 2
a 1
15
Length of Concatenation
uv u v
Example: u aab, u 3
v abaab, v 5
uv aababaab 8
uv u v 3 5 8
16
Empty String
A string with no letters: or or
Observations: 0
w w w
abba abba abba
Note-1: A language that does not
contain any word at all is denoted by
or { }. This language doesn’t contain
any word not even the NULL string.
i.e. { } ≠ {}
17
Empty String
Note-2: Suppose a language L doesn’t
contain NULL then
L=L+
but L ≠ L + {}.
Important : NULL is identity element
with respect to concatenation.
18
Substring
Substring of string:
a subsequence of consecutive characters
String Substring
abbab ab
abbab abba
abbab b
abbab bbab
19
Prefix and Suffix
Let the string is abbab
Prefixes Suffixes
abbab w uv
a bbab
prefix
ab bab
suffix
abb ab
abba b
abbab
20
Repeat Operation
n
w - w repeated n time; that is,
n
w ww
w
n
2
Example: abba abbaabba
0
w
0
Definition: abba
21
The * Operation
: the set of all possible strings from
* alphabet , called closure of alphabets also known as
Kleene star operator or Kleene star closure.
a , b
i.e. infinitely many words each of finite length.
* , a, b, aa, ab, ba, bb, aaa, aab,
22
The + Operation
: the set of all possible strings from
alphabet except , also known as Kleene plus operator.
a, b
*
Note :
, a , b , aa , ab , ba
are infinite
, bb, aaa , aab,
*
a, b, aa, ab, ba, bb, aaa, aab,
* and
23
Languages
A language is a set of strings OR
A language is any subset of *, usually
denoted by L. It may be finite or infinite.
Example: a, b
* , a, b, aa, ab, ba, bb, aaa,
Languages:
a, aa, aab
{ , abba, baba, aa, ab, aaaaaa}
If a string w is in L, we say that w is a
sentence of L. 24
Note that:
Sets { } {}
Set size {} 0
Set size {} 1
String length 0
25
Another Example
n n
An infinite language L {a b : n 0}
ab
L abb L
aabb
aaaaabbbbb
26
Operations on Languages
The usual set operations
a, ab, aaaabb, ab {a, ab, bb, aaaa}
a, ab, aaaabb, ab {ab}
a, ab, aaaa bb, ab a, aaaa
Complement: L * L
a, ba , b, aa, ab, bb, aaa,
27
Reverse
R R
Definition: L {w : w L}
R
Examples: ab, aab, baba ba, baa, abab
n n
L {a b : n 0}
R n n
L {b a : n 0}
Concatenation
Definition: L1L2 xy : x L1, y L2
Examples: a, ab, bab, aa
ab, aaa, abb, abaa, bab, baaa
28
Repeat Operation
Definition: n
L LL L
n
L concatenated with itself n times.
3
a, b a, ba, ba, b
aaa, aab, aba, abb, baa, bab, bba, bbb
Special case: L
0
a, bba, aaa
0
29
More Examples
n n
L {a b : n 0}
2 n n m m
L {a b a b : n, m 0}
2
aabbaaabbb L
30
Star-Closure (Kleene *)
0 1 2
Definition: L* L L L
Example:
,
a, bb,
a, bb*
aa , abb, bba , bbbb,
aaa, aabb, abba, abbbb,
31
Positive Closure
1 2
Definition: L L L
L *
a, bb,
a, bb aa, abb, bba, bbbb,
aaa, aabb, abba, abbbb,
Note: L+ includes if and only if L includes
32
Lexicographical Order
Assume that the symbols in are
themselves ordered.
Definition: A set of strings is in
lexicographical order if
-The strings are grouped first according to
their length.
-Then, within each group, the strings are
ordered “alphabetically” according to the
ordering of the symbols.
33
Lexicographical Order
Ex: Let the alphabet be {a, b}
The set of all strings in Lexicographical
order is
, a, b, aa, ab, ba, bb, aaa, …., bbb, aaaa, …,
bbbb, ….
34
PALINDROME
Let us define a new language called PALINDROME over the
alphabet S= {a, b}
The language consisting of Λ and the strings s
defined over Σ such that Rev(s)=s.
It is to be noted that the words of PALINDROME
are called palindromes.
Example: For Σ={a,b},
PALINDROME={Λ , a, b, aa, bb, aaa, aba, bab,
bbb, ...}
Remark: There are as many palindromes of length
2n as there are of length 2n-1.
35
Courtesy Costas Busch - RPI 36
Courtesy Costas Busch - RPI 37
Courtesy Costas Busch - RPI 38
Courtesy Costas Busch - RPI 39
Courtesy Costas Busch - RPI 40
Courtesy Costas Busch - RPI 41
Courtesy Costas Busch - RPI 42
Courtesy Costas Busch - RPI 43
Courtesy Costas Busch - RPI 44
Courtesy Costas Busch - RPI 45
Courtesy Costas Busch - RPI 46
Courtesy Costas Busch - RPI 47
Courtesy Costas Busch - RPI 48
Courtesy Costas Busch - RPI 49
Courtesy Costas Busch - RPI 50