Finite Automata
• The regular expression is more than just a convenient meta-
language for text searching.
• Any regular expression can be implemented as a finite-state
automaton.
• Symmetrically, any finite-state automaton can be described
with a regular expression.
• Regular expression is one way of characterizing a particular
kind of formal language called a regular language.
• Both regular expressions and finite-state automata can be
used to describe regular languages.
REGULAR EXPRESSIONS AND AUTOMATA 1
Finite Automata
Finite Automata
The relationship between finite state automata, regular
expression, and regular language
Finite state automata
(Computataional Device)
Regular Expression Regular language
(Descriptive Notation) (Set of Objects)
REGULAR EXPRESSIONS AND 3
AUTOMATA
What is a Finite-State
Automaton?
• An alphabet of symbols,
• A finite set of states,
• A transition function from states and symbols to states,
• A distinguished member of the set of states called the start state,
and
• A distinguished subset of the set of states called final states.
• FSA recognize the regular languages represented by regular
expressions
• Directed graph with labeled nodes and arc transitions
REGULAR EXPRESSIONS AND AUTOMATA 4
Formally
FSA is a 5-tuple consisting of
Q: a finite set of N states q0, q1, …, qN
: a finite input alphabet of symbols
q0: the start state
F: the set of final states, F Q
(q,i): a transition function mapping Q
x to Q
REGULAR EXPRESSIONS AND AUTOMATA 5
FSA Accepter
Input
String
Output
“Accept”
Finite
or
A utomaton
“ Reject”
REGULAR EXPRESSIONS AND 6
AUTOMATA
Transition Graph
abba -Finite Accepter a,b
q5
a,b
b a a b
initial
state q0 a q1 b q2 b q3 a q 4
final state “accept”
state transition
REGULAR EXPRESSIONS AND AUTOMATA 7
Initial Configuration
Input String
a b b a
a,b
q5
a,b
b a a b
q0 a q1 b q2 b q3 a q 4
REGULAR EXPRESSIONS AND AUTOMATA 8
Reading Input
Input String
a b b a
a,b
q5
a,b
b a a b
q0 a q1 b q2 b q3 a q 4
REGULAR EXPRESSIONS AND AUTOMATA 9
Reading Input
Input String
a b b a
a,b
q5
a,b
b a a b
q0 a q1 b q2 b q3 a q 4
REGULAR EXPRESSIONS AND AUTOMATA 10
Reading Input
Input String
a b b a
a,b
q5
a,b
b a a b
q0 a q1 b q2 b q3 a q 4
REGULAR EXPRESSIONS AND AUTOMATA 11
Reading Input
Input String
a b b a
a,b
q5
a,b
b a a b
q0 a q1 b q2 b q3 a q 4
REGULAR EXPRESSIONS AND AUTOMATA 12
Reading Input
Input String
a b b a
a,b
q5
a,b
b a a b
q 0
a q 1
b q 2
b q 3
a q 4 Output: Ac ce p te d
REGULAR EXPRESSIONS AND AUTOMATA 13
Using an FSA to Recognize Sheep
talk
REGULAR EXPRESSIONS AND AUTOMATA 14
Using an FSA to Recognize
Sheep Talk
Sheep language can be defined as any string from the following (infinite) set:
The regular expression for this kind of sheeptalk is
/baa+!/
baa!
All RE can be represented as baaa!
baaaa!
FSA baaaaa
!
....
a
b a a !
q0
q0 q1
q1 q2 q3 q4
REGULAR EXPRESSIONS AND AUTOMATA 15
State Transition Table for Sheep
Talk
State Input
a b a !
b a a ! 0(null) 1 Ø Ø
1 Ø 2 Ø
q0 q1 q2 q3 q4
2 Ø 3 Ø
3 Ø 3 4
4: Ø Ø Ø
REGULAR EXPRESSIONS AND AUTOMATA 16
Algorithm
function D-RECOGNIZE(tape,machine) returns accept or reject
index <- Beginning of tape
current-state <- Initial state of machine
loop
if End of input has been reached then if current-state is an
accept state then return accept
else
return reject
elseif transition-table[current-state,tape[index]] is empty then return reject
else
current-state <- transition-table[current-state,tape[index]] index <- index +1
REGULAR EXPRESSIONS AND 1
AUTOMATA 7
Using an FSA to Recognize Sheep
Talk
FSA recognizes (accepts) strings of a regular language
baa! a
baaa! b a a !
baaaa! q0 q1 q2 q3 q4
…
Tracing the execution of FSA on some sheep talk
q q q q q q
0 1 2 3 3 4
... ... ... b a a a ! ... ... ... ... ... ... ... ...
REGULAR EXPRESSIONS AND AUTOMATA 18
Adding a fail state to FSA
a
b a a !
q0 q1 q2 q
3 q4
! ! b
b ! b !
b
?
a
c a
qf
REGULAR EXPRESSIONS AND AUTOMATA 19
Adding an else arch
REGULAR EXPRESSIONS AND AUTOMATA 20
Adding ϵ Transition
b a a !
q q q q
0 1 2 3 q4
REGULAR EXPRESSIONS AND AUTOMATA 21
Example FSA
An FSA for the words of English
numbers 1-99
REGULAR EXPRESSIONS AND AUTOMATA 22
FSA for NLP
Word Recognition
Dictionary Lookup
Spelling Conventions
REGULAR EXPRESSIONS AND AUTOMATA 23
Word Recognition
A word recognizer takes a string of characters as input and returns
“yes” or “no” according as the word is or is not in a given set.
Solves the membership problem.
e.g. Spell Checking, Scrabble(Un-ordered Concatenation)
Approximate methods
Has right set of letters (any order).
Has right sounds (Soundex).
Random (suprimposed) coding (Unix Spell)
REGULAR EXPRESSIONS AND AUTOMATA 24
Word Recognition
Exact Methods
Hashing
Search (linear, binary ...)
Digital search (“Tries”)
Finite-state automata
REGULAR EXPRESSIONS AND AUTOMATA 25
Dictionary Lookup
Dictionary lookup takes a string of characters as input and returns
“yes” or “no” according as the word is or is not in a given set and
returns information about the word.
Lookup Methods
Approximate — guess the information
If it ends in “ed”, it’s a past-tense verb.
Exact — store the information for finitely many words
Table Lookup
Hash
Search
REGULAR EXPRESSIONS AND AUTOMATA 26
Finite State Transducers
• A finite state transducer defines a relationship between
two regular languages and
REGULAR EXPRESSIONS AND AUTOMATA 27
Finite State Transducers
A finite state transducer essentially is a finite state
automaton that works on two (or more) tapes. The most
common way to think about transducers is as a kind of
``translating machine''. They read from one of the tapes and
write onto the other.
a:b
q 0
a:b at the arc means that in this transition the transducer
reads a from the first tape and writes b onto the second.
REGULAR EXPRESSIONS AND AUTOMATA 28
Finite State Transducers
Transducer behaves as follows in the different modes.
generation mode: It writes a string of as on one tape and a string bs on the
other tape. Both strings have the same length.
recognition mode: It accepts when the word on the first tape consists of exactly
as many as as the word on the second tape consists of bs.
translation mode (left to right): It reads as from the first tape and writes an b for
every a that it reads onto the second tape.
translation mode (right to left): It reads bs from the second tape and writes an a
for every f that it reads onto the first tape.
relator mode: Computes relations between sets
REGULAR EXPRESSIONS AND AUTOMATA 29
Finite State Transducers
• FST as recognizer:
• Take a pair of strings and accepts or rejects them.
• FST as generator:
• Outputs a pair of strings for a language
• FST as translator:
• Reads a string and outputs another string.
• Morphological parsing: letters (input), morphemes (output)
• FST as relator:
• Computes relations between sets
REGULAR EXPRESSIONS AND AUTOMATA 30
FST vs FSA
FSA can act as a FST can act as a
Recognizer Recognizer
Generator Generator
5 tuple Representation Translator
Equivalent to regular Set relator
languages 7 tuple Representation
Equivalent to regular
relations
REGULAR EXPRESSIONS AND AUTOMATA 31
REGULAR EXPRESSIONS AND AUTOMATA 32
REGULAR EXPRESSIONS AND AUTOMATA 33
REGULAR EXPRESSIONS AND AUTOMATA 34
REGULAR EXPRESSIONS AND AUTOMATA 35
FST Operations
Inversion: Switching input and output labels
If T maps from I to O, T-1 maps from O to I
Composition:
If T1 is a transducer from I1 to O1 and T2 is a transducer
from I2 to O2, then T1 T2 is a transducer from I1 to O2.
REGULAR EXPRESSIONS AND AUTOMATA 36
FST Operations
Inversion: Switching input and output labels
If T maps from I to O, T-1 maps from O to I
Composition:
If T1 is a transducer from I1 to O1 and T2 is a transducer
from I2 to O2, then T1 T2 is a transducer from I1 to O2.
REGULAR EXPRESSIONS AND AUTOMATA 37
FST for NLP
Tokenization
Morphological analysis
Transliteration
Parsing
Translation
Speech recognition
Spoken language
understanding
REGULAR EXPRESSIONS AND AUTOMATA 38
References
• Daniel Jurafsky, Speech and Language Processing
• [Link]
striegnk/courses/nlp-with-prolog/html/[Link]#[Link]
ansducers
• [Link]
REGULAR EXPRESSIONS AND AUTOMATA 39
END
REGULAR EXPRESSIONS AND AUTOMATA 40