Lecture Notes of Compiler Design Lab
Module -1
Introduction to Compiling:
1.1 INTRODUCTION OF LANGUAGE PROCESSING SYSTEM
Skeletal Source Program → Preprocessor → Source Program → Compiler → Assembly Program → Assembler → Relocatable Machine Code → Loader/Linker-editor (together with library and relocatable object files) → Absolute Machine Code
Fig 1.1: Language Processing System
Preprocessor
A preprocessor produces input to compilers. It may perform the following functions.
1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessor: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
4. Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.
COMPILER
A compiler is a translator program that takes a program written in a high-level language (HLL), the source program, and translates it into an equivalent program in machine-level language (MLL), the target program. An important part of a compiler is reporting errors to the programmer.

Source Program → Compiler → Target Program (with error messages reported to the programmer)
Fig 1.2: Structure of Compiler

Executing a program written in an HLL programming language basically involves two parts: the source program must first be compiled (translated) into an object program; then the resulting object program is loaded into memory and executed.

Source Program → Compiler → Object Program;  Object Program + Input → Object Program execution → Output
Fig 1.3: Execution process of source program in Compiler
ASSEMBLER
Programmers found it difficult to write or read programs in machine language. They began to use mnemonics (symbols) for each machine instruction, which they would subsequently translate into machine language. Such a mnemonic machine language is now called an assembly language. Programs known as assemblers were written to automate the translation of assembly language into machine language. The input to an assembler program is called the source program; the output is a machine language translation (object program).
INTERPRETER
An interpreter is a program that appears to execute a source program as if it were machine language.
Input → Interpreter (process) → Output
Fig 1.4: Execution in Interpreter
Languages such as BASIC, SNOBOL, LISP can be translated using interpreters. JAVA also uses
interpreter. The process of interpretation can be carried out in following phases.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct Execution
Advantages:
Modification of the user program can easily be made and implemented as execution proceeds.
The type of object that a variable denotes may change dynamically.
Debugging a program and finding errors is a simplified task for a program used for interpretation.
The interpreter for the language makes it machine independent.
Disadvantages:
The execution of the program is slower.
Memory consumption is more.
LOADER AND LINK-EDITOR:
Once the assembler produces an object program, that program must be placed into memory and executed. The assembler could place the object program directly in memory and transfer control to it, thereby causing the machine language program to be executed. This would waste core by leaving the assembler in memory while the user's program was being executed. Also, the programmer would have to retranslate the program with each execution, thus wasting translation time. To overcome these problems of wasted translation time and memory, system programmers developed another component called a loader.
"A loader is a program that places programs into memory and prepares them for execution." It would be more efficient if subroutines could be translated into object form which the loader could "relocate" directly behind the user's program. The task of adjusting programs so they may be placed in arbitrary core locations is called relocation. Relocating loaders perform four functions.
1.2 TRANSLATOR
A translator is a program that takes as input a program written in one language and produces as output a program in another language. Besides program translation, the translator performs another very important role: error detection. Any violation of the HLL specification is detected and reported to the programmer. The important roles of a translator are:
1. Translating the HLL program input into an equivalent machine-language (ML) program.
2. Providing diagnostic messages wherever the programmer violates the specification of the HLL.
1.3 LIST OF COMPILERS
1. Ada compilers
2. ALGOL compilers
3. BASIC compilers
4. C# compilers
5. C compilers
6. C++ compilers
7. COBOL compilers
8. Common Lisp compilers
9. ECMAScript interpreters
10. Fortran compilers
11. Java compilers
12. Pascal compilers
13. PL/I compilers
14. Python compilers
15. Smalltalk compilers
1.4 STRUCTURE OF THE COMPILER DESIGN
Phases of a compiler: A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation. The phases of a compiler are shown below.
There are two parts of compilation:
a. Analysis (machine independent / language dependent)
b. Synthesis (machine dependent / language independent)
The compilation process is partitioned into a number of sub-processes called 'phases'.
Lexical Analysis:
The LA or scanner reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens.

source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator → target program (the symbol-table manager and the error handler interact with every phase)
Fig 1.5: Phases of Compiler
Syntax Analysis:
The second stage of translation is called syntax analysis or parsing. In this phase expressions, statements, declarations etc. are identified by using the results of lexical analysis. Syntax analysis is aided by using techniques based on the formal grammar of the programming language.
Intermediate Code Generation:
An intermediate representation of the final machine language code is produced. This phase bridges
the analysis and synthesis phases of translation.
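For example, a common intermediate form is three-address code, in which each instruction has at most one operator; a statement such as position := initial + rate * 60 (the names here are only illustrative) might become:
    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3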
Code Optimization:
This is an optional phase designed to improve the intermediate code so that the output runs faster and takes less space.
Code Generation:
The last phase of translation is code generation. A number of optimizations to reduce the length of the machine language program are carried out during this phase. The output of the code generator is the machine language program for the specified computer.
Table Management (or) Book-keeping: This portion keeps the names used by the program and records essential information about each. The data structure used to record this information is called a 'Symbol Table'.
Error Handlers:
The error handler is invoked when a flaw in the source program is detected.
The output of the LA is a stream of tokens, which is passed to the next phase, the syntax analyzer or parser. The SA groups the tokens together into syntactic structures called expressions. Expressions may further be combined to form statements. The syntactic structure can be regarded as a tree whose leaves are the tokens; such trees are called parse trees.
The parser has two functions. It checks whether the tokens from the lexical analyzer occur in patterns that are permitted by the specification for the source language. It also imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler.
For example, if a program contains the expression A+/B, after lexical analysis this expression might appear to the syntax analyzer as the token sequence id +/ id. On seeing the /, the syntax analyzer should detect an error situation, because the presence of these two adjacent binary operators violates the formation rules of an expression. Syntax analysis makes explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped.
For example, A/B*C has two possible interpretations:
1. divide A by B and then multiply by C, or
2. multiply B by C and then use the result to divide A.
Each of these two interpretations can be represented in terms of a parse tree.
Intermediate Code Generation:
The intermediate code generator uses the structure produced by the syntax analyzer to create a stream of simple instructions. Many styles of intermediate code are possible. One common style uses instructions with one operator and a small number of operands. The output of the syntax analyzer is some representation of a parse tree. The intermediate code generation phase transforms this parse tree into an intermediate language representation of the source program.
Code Optimization
This is an optional phase designed to improve the intermediate code so that the output runs faster and takes less space. Its output is another intermediate code program that does the same job as the original, but in a way that saves time and/or space.
a. Local Optimization:
There are local transformations that can be applied to a program to make an improvement. For example,
  If A > B goto L2
  Goto L3
  L2:
can be replaced by the single statement
  If A <= B goto L3

Example grammar for simple expressions:
  list → list + digit | list - digit | digit
  digit → 0|1|2|3|4|5|6|7|8|9
• list, digit : grammar variables (nonterminals), grammar symbols
• 0, 1, 2, ..., 9, -, + : tokens, terminal symbols
Conventions for specifying a grammar:
• Terminal symbols : boldface strings such as if, num, id
• Nonterminal symbols, grammar symbols : italicized names such as list, digit, A, B
Grammar G = (N, T, P, S)
• N : a set of nonterminal symbols
• T : a set of terminal symbols (tokens)
• P : a set of production rules
• S : the start symbol, S ∈ N
Grammar G for the language L = { 9-5+2, 3-1, ... }
• G = (N, T, P, S)
  N = { list, digit }
  T = { 0,1,2,3,4,5,6,7,8,9,-,+ }
  P : list → list + digit
      list → list - digit
      list → digit
      digit → 0|1|2|3|4|5|6|7|8|9
  S = list
Some definitions for a language L and its grammar G
• Derivation:
  A sequence of replacements S ⇒ α1 ⇒ α2 ⇒ ... ⇒ αn is a derivation of αn.
  Example: a derivation of 1+9 from the grammar G
  • left most derivation
    list ⇒ list + digit ⇒ digit + digit ⇒ 1 + digit ⇒ 1 + 9
  • right most derivation
    list ⇒ list + digit ⇒ list + 9 ⇒ digit + 9 ⇒ 1 + 9
• Language of a grammar, L(G)
  L(G) is the set of sentences that can be generated from the grammar G.
  L(G) = { x | S ⇒* x }, where x is a sequence of terminal symbols.
• Example: Consider a grammar G = (N, T, P, S):
  N = { S }   T = { a, b }
  S = S       P = { S → aSb | ε }
• Is aabb a sentence of L(G)? (derivation of the string aabb)
  S ⇒ aSb ⇒ aaSbb ⇒ aaεbb = aabb (i.e. S ⇒* aabb), so aabb ∈ L(G)
  There is no derivation for aa, so aa ∉ L(G).
  Note: L(G) = { a^n b^n | n ≥ 0 }, where a^n b^n means n a's followed by n b's.
Parse Tree
A derivation can be conveniently represented by a derivation tree (parse tree).
• The root is labeled by the start symbol.
• Each leaf is labeled by a token or ε.
• Each interior node is labeled by a nonterminal symbol.
• When a production A → x1 ... xn is applied in the derivation, nodes labeled x1, ..., xn are made children of the node labeled A.
• root : the start symbol
• internal nodes : nonterminals
• leaf nodes : terminals
Example G:
  list → list + digit | list - digit | digit
  digit → 0|1|2|3|4|5|6|7|8|9
• left most derivation for 9-5+2:
  list ⇒ list + digit ⇒ list - digit + digit ⇒ digit - digit + digit ⇒ 9 - digit + digit ⇒ 9 - 5 + digit ⇒ 9 - 5 + 2
• right most derivation for 9-5+2:
  list ⇒ list + digit ⇒ list + 2 ⇒ list - digit + 2 ⇒ list - 5 + 2 ⇒ digit - 5 + 2 ⇒ 9 - 5 + 2
Parse tree for 9-5+2: the root list has children list, +, digit(2); that list has children list, -, digit(5); the innermost list has the single child digit(9).
Fig 2.2: Parse tree for 9-5+2 according to the grammar in the example
Ambiguity
• A grammar is said to be ambiguous if the grammar has more than one parse tree for a given string of tokens.
• Example 2.5. Suppose a grammar G that cannot distinguish between lists and digits as in Example 2.1:
  G : string → string + string | string - string | 0|1|2|3|4|5|6|7|8|9

[Figure: two different parse trees for 9-5+2 under grammar G, one grouping (9-5)+2 and the other 9-(5+2)]
Fig 2.3: Two parse trees for 9-5+2
9-5+2 has two parse trees ⇒ grammar G is ambiguous.
Associativity of operators
An operator is said to be left associative if an operand with operators on both sides of it is taken by the operator to its left.
  e.g.) 9+5+2 = (9+5)+2
• Left Associative Grammar :
  list → list + digit | list - digit
  digit → 0 | 1 | ... | 9
• Right Associative Grammar :
  (e.g. a = b = c is taken as a = (b = c))
  right → letter = right | letter
  letter → a | b | ... | z
[Figure: parse tree of 9-5-2 under the left-associative grammar, grouping (9-5)-2, and of a=b=c under the right-associative grammar, grouping a=(b=c)]
Fig 2.4: Parse trees for left- and right-associative operators
Precedence of operators
We say that an operator (*) has higher precedence than another operator (+) if the operator (*) takes its operands before the other operator (+) does.
• e.g.) 9+5*2 = 9+(5*2), 9*5+2 = (9*5)+2
• left associative operators : +, -, *, /
• right associative operators : =, **
Syntax of full expressions
  operator | associativity | precedence
  +, -     | left          | 1 (lower)
  *, /     | left          | 2 (higher)

  expr → expr + term | expr - term | term
  term → term * factor | term / factor | factor
  factor → digit | ( expr )
  digit → 0 | 1 | ... | 9
Syntax of statements
  stmt → id = expr ;
       | if ( expr ) stmt ;
       | if ( expr ) stmt else stmt ;
       | while ( expr ) stmt ;
  expr → expr + term | expr - term | term
  term → term * factor | term / factor | factor
  factor → digit | ( expr )
  digit → 0 | 1 | ... | 9
2.3 SYNTAX-DIRECTED TRANSLATION(SDT)
A formalism for specifying translations of programming language constructs
(attributes of a construct: type, string, location, etc.)
• Syntax-directed definition (SDD) for the translation of constructs
• Syntax-directed translation scheme (SDTS) for specifying translations
Postfix notation for an expression E
• If E is a variable or constant, then the postfix notation for E is E itself (E.t = E).
• If E is an expression of the form E1 op E2, where op is a binary operator:
  E1' is the postfix of E1, E2' is the postfix of E2;
  then E1' E2' op is the postfix for E1 op E2.
• If E is (E1), and E1' is the postfix of E1, then E1' is the postfix for E.
• e.g.) 9 - 5 + 2  ⇒  9 5 - 2 +
        9 - (5 + 2)  ⇒  9 5 2 + -
Syntax-Directed Definition (SDD) for translation
• An SDD is a set of semantic rules predefined for each production, for translation.
• A translation is an input-output mapping procedure: to translate an input X,
  • construct a parse tree for X;
  • synthesize attributes over the parse tree:
    suppose a node n in the parse tree is labeled by X, and X.a denotes the value of attribute a of X at that node;
    compute X's attribute X.a using the semantic rules associated with X.
Example 2.6. SDD for infix to postfix translation
  PRODUCTION            SEMANTIC RULE
  expr → expr1 + term   expr.t := expr1.t || term.t || '+'
  expr → expr1 - term   expr.t := expr1.t || term.t || '-'
  expr → term           expr.t := term.t
  term → 0              term.t := '0'
  term → 1              term.t := '1'
  ...
  term → 9              term.t := '9'
Fig 2.5: Syntax-directed definition for infix to postfix translation
An example of synthesized attributes for input X=9-5+2
[Figure: parse tree for 9-5+2 annotated with synthesized attribute values; at the root expr.t = 95-2+, below it expr.t = 95-, term.t = 2, then expr.t = 9, term.t = 5, term.t = 9]
Fig 2.6: Attribute values at nodes in a parse tree
Syntax-directed Translation Schemes(SDTS)
• A translation scheme is a context-free grammar in which program fragments called translation actions are embedded within the right sides of the productions.
  production (postfix)   SDD for postfix notation            SDTS
  list → list + term     list.t := list.t || term.t || '+'   list → list + term { print("+") }
• { print("+"); } : a translation (semantic) action.
• An SDTS generates an output for each sentence x generated by the underlying grammar by executing the actions in the order they appear during a depth-first traversal of a parse tree for x.
• To translate:
  a) parse the input string x, and
  b) emit the action results encountered during the depth-first traversal of the parse tree.
Fig 2.7: Example of a depth-first traversal of a tree
Design of translation schemes (SDTS) for translation
  rest → + term { print('+') } rest   (an extra leaf is constructed for the semantic action)
Fig 2.8: An extra leaf is constructed for a semantic action
Example 2.8
• SDD vs. SDTS for infix to postfix translation.
  productions           SDD                                 SDTS
  expr → expr + term    expr.t := expr.t || term.t || '+'   expr → expr + term { printf("+") }
  expr → expr - term    expr.t := expr.t || term.t || '-'   expr → expr - term { printf("-") }
  expr → term           expr.t := term.t                    expr → term
  term → 0              term.t := '0'                       term → 0 { printf("0") }
  term → 1              term.t := '1'                       term → 1 { printf("1") }
  ...
  term → 9              term.t := '9'                       term → 9 { printf("9") }
• Action translation for input 9-5+2
  1) Parse.
  2) Translate.
  Do we have to maintain the whole parse tree?
[Figure: parse tree for 9-5+2 with the semantic actions {print('9')}, {print('-')}, {print('5')}, {print('+')}, {print('2')} attached as extra leaves]
Fig 2.9: Actions translating 9-5+2 into 95-2+
No. Semantic actions are performed during parsing, and we do not need to keep the nodes whose semantic actions are already done.
2.4 PARSING
Parsing: if the token string x ∈ L(G), then produce a parse tree, else produce an error message.
Top-Down parsing
1. At a node n labeled with nonterminal A, select one of the productions whose left part is A and construct children of node n with the symbols on the right side of that production.
2. Find the next node at which a sub-tree is to be constructed.
ex. G: type → simple
           | ↑ id
           | array [ simple ] of type
       simple → integer
           | char
           | num dotdot num
Fig 2.10: Top-down parsing while scanning the input from left to right
[Figure: steps in the top-down construction of a parse tree for the input array [ num dotdot num ] of integer: type is expanded to array [ simple ] of type, simple to num dotdot num, and the final type to simple and then integer]
Fig 2.11: Steps in the top-down construction of a parse tree
The selection of a production for a nonterminal may involve trial and error ⇒ backtracking.
G: { S → aSb | c | ab }
According to the top-down parsing procedure, are acb and aabb in L(G)?
  S/acb ⇒ aSb/acb ⇒ aSb/acb ⇒ aaSbb/acb ⇒ ✗
    (S→aSb)  move  (S→aSb)  backtracking
  ⇒ aSb/acb ⇒ acb/acb ⇒ acb/acb ⇒ acb/acb
    (S→c)  move  move
  so acb ∈ L(G). It is finished in 7 steps including one backtracking.
  S/aabb ⇒ aSb/aabb ⇒ aSb/aabb ⇒ aaSbb/aabb ⇒ aaSbb/aabb ⇒ aaaSbbb/aabb ⇒ ✗
    (S→aSb)  move  (S→aSb)  move  (S→aSb)  backtracking
  ⇒ aaSbb/aabb ⇒ aacbb/aabb ⇒ ✗
    (S→c)  backtracking
  ⇒ aaSbb/aabb ⇒ aaabbb/aabb ⇒ ✗
    (S→ab)  backtracking
  ⇒ aaSbb/aabb ⇒ ✗
    backtracking
  ⇒ aSb/aabb ⇒ acb/aabb ⇒ ✗
    (S→c)  backtracking
  ⇒ aSb/aabb ⇒ aabb/aabb ⇒ aabb/aabb ⇒ aabb/aabb ⇒ aabb/aabb
    (S→ab)  move  move  move
  so aabb ∈ L(G), but the process is laborious: it needs 18 steps including 5 backtrackings.
• Procedure of top-down parsing
  Let the pointed grammar symbol and the pointed input symbol be g and a respectively.
  • if (g ∈ N) select and expand a production whose left part equals g, next to the current production.
  • else if (g = a) then make g and a be the symbols next to the current symbols (move).
  • else if (g ≠ a) backtrack:
    - let the pointed input symbol a move back by as many steps as the number of current symbols of the underlying production,
    - eliminate the right-side symbols of the current production and let the pointed symbol g be the left-side symbol of the current production.
Predictive parsing (Recursive Descent Parsing, RDP)
• The general top-down parsing strategy: guess a production, see if it matches; if not, backtrack and try another.
  ⇒ It may fail to recognize a correct string in some grammar G and is tedious in processing.
• Predictive parsing
  • is a kind of top-down parsing that predicts a production whose derived terminal symbol is equal to the next input symbol while expanding in top-down parsing, without backtracking.
  • A recursive descent parser is a kind of predictive parser that is implemented by disjoint recursive procedures, one procedure for each nonterminal; the procedures are patterned after the productions.
• Procedure of predictive parsing (RDP):
  Let the pointed grammar symbol and the pointed input symbol be g and a respectively.
  • if (g ∈ N)
    - select the next production P whose left symbol equals g and whose set of first terminal symbols derivable from the right side of P includes the input symbol a;
    - expand the derivation with that production P.
  • else if (g = a) then make g and a be the symbols next to the current symbols (move).
  • else if (g ≠ a) error.
• G: { S → aSb | c | ab }  ⇒  G1: { S → aS' | c,  S' → Sb | b }
  According to the predictive parsing procedure, are acb and aabb in L(G)?
  • S/acb ⇒ confused between { S → aSb, S → ab }
  • So a predictive parser requires some restriction on the grammar: for each nonterminal there should be only one applicable production for a given lookahead, i.e. the alternatives for a nonterminal must begin with distinct first terminal symbols.
• Requirements for a grammar to be suitable for RDP: for each nonterminal, either
  1. A → Bα, or
  2. A → a1 α1 | a2 α2 | ... | an αn
     1) for 1 ≤ i, j ≤ n and i ≠ j, ai ≠ aj
     2) A → ε may also occur if none of the ai can follow A in a derivation and if we have A → ε.
If the grammar is suitable, we can parse efficiently without backtracking.
General top-down parser with backtracking
  ↓
Recursive Descent Parser without backtracking
  ↓
Predictive parsing without backtracking
Left Factoring
If a grammar contains two productions of the form
  S → aα and S → aβ
it is not suitable for top-down parsing without backtracking. Troubles of this form can sometimes be removed from the grammar by a technique called left factoring.
• In left factoring, we replace { S → aα, S → aβ } by
  { S → aS', S' → α, S' → β }   (cf. S → a(α | β))
  (Hopefully α and β start with different symbols.)
• Left factoring for G { S → aSb | c | ab }:
  S → aS' | c   (cf. S → aSb | ab | c = a(Sb | b) | c → aS' | c)
  S' → Sb | b
• A concrete example:
  S → IF E THEN S1 | IF E THEN S1 ELSE S2
  is transformed into
  S → IF E THEN S1 S'
  S' → ELSE S2 | ε
• Example, for G: { S → aSb | c | ab }
  According to the predictive parsing procedure, are acb and aabb in L(G)?
  • S/aabb ⇒ unable to choose between { S → aSb, S → ab }
• According to the left-factored grammar G1, are acb and aabb in L(G)?
  G1: { S → aS' | c,  S' → Sb | b }   ( ⇐ { S → a(Sb | b) | c } )
  • S/acb ⇒ aS'/acb ⇒ aS'/acb ⇒ aSb/acb ⇒ acb/acb ⇒ acb/acb ⇒ acb/acb
    (S→aS')  move  (S'→Sb)  (S→c)  move  move
    so acb ∈ L(G). It needs only 6 steps without any backtracking.
    cf. General top-down parsing needs 7 steps and 1 backtracking.
  • S/aabb ⇒ aS'/aabb ⇒ aS'/aabb ⇒ aSb/aabb ⇒ aaS'b/aabb ⇒ aaS'b/aabb ⇒ aabb/aabb ⇒ aabb/aabb ⇒ aabb/aabb
    (S→aS')  move  (S'→Sb)  (S→aS')  move  (S'→b)  move  move
    so aabb ∈ L(G); the process is finished in 8 steps without any backtracking.
    cf. General top-down parsing needs 18 steps including 5 backtrackings.
Left Recursion
• A grammar is left recursive iff it contains a nonterminal A such that A ⇒+ Aα, where α is any string.
  • Grammar { S → Sα | c } is left recursive because of S ⇒ Sα.
  • Grammar { S → Aα, A → Sβ | c } is also left recursive because of S ⇒ Aα ⇒ Sβα.
• If a grammar is left recursive, you cannot build a predictive top-down parser for it.
  1) If a parser is trying to match S and S → Sα, it has no idea how many times S must be applied.
  2) Given a left-recursive grammar, it is always possible to find another grammar that generates the same language and is not left recursive.
  3) The resulting grammar might or might not be suitable for RDP.
• If, after this, we still need left factoring, the grammar may still not be suitable for RDP.
• Right recursion: needs special care; it is harder than left recursion for SDT to handle.
Eliminating Left Recursion
Let G be S → S A | A.
Note that a top-down parser cannot parse the grammar G, regardless of the order in which the productions are tried.
⇒ The productions generate strings of the form AA···A
⇒ They can be replaced by S → A S' and S' → A S' | ε
Example:
  A → Aα | β   can be rewritten as   A → βR,  R → αR | ε
Fig 2.12: Left- and right-recursive ways of generating a string
• In general, the rule is that
  • if A → Aα1 | Aα2 | ... | Aαn and
    A → β1 | β2 | ... | βm   (no βi starts with A),
  • then replace them by
    A → β1R | β2R | ... | βmR and
    R → α1R | α2R | ... | αnR | ε
Exercise: Remove the left recursion in the following grammar:
  expr → expr + term | expr - term
  expr → term
solution:
  expr → term rest
  rest → + term rest | - term rest | ε

2.5 A TRANSLATOR FOR SIMPLE EXPRESSIONS
• Convert infix into postfix (Polish notation) using SDT.
• Abstract syntax tree (annotated parse tree) vs. concrete syntax tree
  • Concrete syntax tree : parse tree
  • Abstract syntax tree : syntax tree
  • Concrete syntax : underlying grammar
Adapting the Translation Scheme
«Embed the semantic action in the production
Design a translation scheme
© Left recursion elimination and Left factoring
© Example
3) Design a translation scheme and eliminate left recursion.
   With left recursion:
     E → E + T { print('+') }
     E → E - T { print('-') }
     E → T
     T → 0 { print('0') } | ... | 9 { print('9') }
   After eliminating left recursion:
     E → T R
     R → + T { print('+') } R
     R → - T { print('-') } R
     R → ε
     T → 0 { print('0') } | ... | 9 { print('9') }
Translation of the input string 9-5+2 : parsing and SDT
[Figure: parse tree of 9-5+2 under this scheme, with the print actions as extra leaves; reading the actions left to right gives 9 5 - 2 +]
Result: 95-2+
Example of translator design and execution
Initial specification for the infix-to-postfix translator (with left recursion):
  expr → expr + term { printf("+") }
  expr → expr - term { printf("-") }
  expr → term
  term → 0 { printf("0") }
  term → 1 { printf("1") }
  ...
  term → 9 { printf("9") }
With the left recursion eliminated:
  expr → term rest
  rest → + term { printf("+") } rest
  rest → - term { printf("-") } rest
  rest → ε
  term → 0 { printf("0") }
  term → 1 { printf("1") }
  ...
  term → 9 { printf("9") }
[Figure: parse tree of 9-5+2 under the left-recursion-free scheme, with the actions {print('9')}, {print('-')}, {print('5')}, {print('+')}, {print('2')} as extra leaves]
Fig 2.13: Translation of 9-5+2 into 95-2+
Procedure for the Nonterminal expr, term, and rest
expe () |/cexgr > tem rsd
: :
rest () [IGiest-+ + tm pint'+I ret | ~ term print
term() (Jeter 0 pint} ~ term + 9 print}
else error()
Fig 2.14, Function for the nonterminals expr, rest, and term,Optimizer and Translator
1. expr) {
2. tormO: rest
3
4. rest) roct()
Bt ‘
6 Hlcokaread = 5°) ¢ T Wlookahead == F TT
7. (+ term(): p+): rest 0: term; B('+): goto L
| ote miookanead ==~'){ | = dice iflookahead == =!) {
2. m=": termO: 9(' rest: (=; term: p=; gato L
10._} ese Toe
nt }
12. exor0) £
13. tend
1A while(t) £
"5 isokahead == 4!) £
16 (+): torm0: p+"
17. Pelee itbookshead =") {
18. m(—'r: form: pt
10. Pelee break:
a)
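A self-contained C sketch of this translator, using a single-character lookahead and a match() helper (the helper names and error handling are illustrative, not the notes' exact code; the grammar and print actions are the ones above, with rest() written in the loop form of the optimized version):

  #include <stdio.h>
  #include <ctype.h>
  #include <stdlib.h>

  int lookahead;                       /* current input character (the token) */

  void match(int t) {                  /* advance if the lookahead matches t */
      if (lookahead == t) lookahead = getchar();
      else { fprintf(stderr, "syntax error\n"); exit(1); }
  }

  void term(void) {                    /* term -> 0 {print '0'} | ... | 9 {print '9'} */
      if (isdigit(lookahead)) { putchar(lookahead); match(lookahead); }
      else { fprintf(stderr, "syntax error\n"); exit(1); }
  }

  void rest(void) {                    /* rest -> + term {print '+'} rest | - term {print '-'} rest | e */
      while (1) {
          if (lookahead == '+')      { match('+'); term(); putchar('+'); }
          else if (lookahead == '-') { match('-'); term(); putchar('-'); }
          else break;                /* the loop replaces the tail-recursive call to rest() */
      }
  }

  void expr(void) {                    /* expr -> term rest */
      term();
      rest();
  }

  int main(void) {
      lookahead = getchar();
      expr();
      putchar('\n');
      return 0;
  }

Fed the input 9-5+2, this sketch prints 95-2+, matching Fig 2.13.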
2.6 LEXICAL ANALYSIS
• The lexical analyzer reads and converts the input into a stream of tokens to be analyzed by the parser.
• lexeme : a sequence of characters of the input which comprises a single token.
• input → Lexical Analyzer → lexeme / token → Parser
Removal of White Space and Comments
«Remove white space(blank, tab, new line etc.) and comments
Constants
• Constants: for now, consider only integers.
• e.g.) for input 31 + 28, the output (token representation) is:
  input : 31 + 28
  output: <num, 31> <+, > <num, 28>
  num : token
  31, 28 : attribute values (or lexemes) of the integer token num
Recognizing Identifiers
• Identifiers are names of variables, arrays, functions, etc.
• A grammar treats an identifier as a token.
  e.g.) input : count = count + increment;
  output : <id, "count"> <=, > <id, "count"> <+, > <id, "increment"> <;, >
Symbol table
  tokens | attributes (lexeme)
  id     | count
  id     | increment
• Keywords are reserved, i.e. they cannot be used as identifiers. A character string forms an identifier only if it is not a keyword.
‘© punctuation symbols
© operators : +
Interface to lexical analyzer
[Figure: the lexical analyzer sits between the input and the parser; it reads characters, can push a character back, and passes a token and its attributes to the parser]
Fig 2.15: Inserting a lexical analyzer between the input and the parser
A Lexical Analyzer
[Figure: lexan() uses getchar() to read a character and ungetc(c, stdin) to push a character back; it returns the token to the caller and sets the global variable tokenval to the attribute value]
Fig 2.16: Implementing the interactions in Fig. 2.15
• c = getchar();  ungetc(c, stdin);
• token representation
  #define NUM 256
• Function lexan()
  e.g.) input string 76 + a
  input → output (returned value)
  76 → NUM, tokenval = 76 (integer)
  +  → '+'
  a  → id, tokenval = "a"
• The way the parser handles the token NUM returned by lexan():
  consider the translation scheme
    factor → ( expr )
           | num { print(num.value) }
#define NUM 256
factor() {
    if (lookahead == '(') {
        match('('); expr(); match(')');
    } else if (lookahead == NUM) {
        printf(" %d ", tokenval); match(NUM);
    } else error();
}
The implementation of function lexan:

#include <stdio.h>
#include <ctype.h>
int lineno = 1;
int tokenval = NONE;
int lexan() {
    int t;
    while (1) {
        t = getchar();
        if (t == ' ' || t == '\t')
            ;                              /* strip blanks and tabs */
        else if (t == '\n')
            lineno += 1;
        else if (isdigit(t)) {             /* collect an integer constant */
            tokenval = t - '0';
            t = getchar();
            while (isdigit(t)) {
                tokenval = tokenval * 10 + t - '0';
                t = getchar();
            }
            ungetc(t, stdin);
            return NUM;
        } else {
            tokenval = NONE;
            return t;
        }
    }
}
2.7 INCORPORATION A SYMBOL TABLE
The symbol table interface operations, usually called by the parser:
• insert(s,t): input s: lexeme, t: token; output: index of the new entry
• lookup(s): input s: lexeme; output: index of the entry for string s, or 0 if s is not found in the symbol table.
Handling reserved keywords
1. Insert all keywords into the symbol table in advance, e.g.
   insert("div", div); insert("mod", mod);
2. While parsing,
   whenever an identifier s is encountered:
   if (lookup(s)'s token is in {keywords}) s is a keyword; else s is an identifier.
• example
  preset:
    insert("div", div);
    insert("mod", mod);
  while parsing:
    lookup("count") ⇒ 0, so insert("count", id)
    lookup("i")     ⇒ 0, so insert("i", id)
    lookup("i")     ⇒ 4, id
    lookup("div")   ⇒ 1, div
[Figure: ARRAY symtable holds (lexptr, token, attributes) rows such as div, mod, id(count), id(i); ARRAY lexemes stores the lexeme strings "div", "mod", "count", "i", each terminated by an EOS marker]
Fig 2.17: Symbol table and array for storing strings
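A minimal C sketch of this interface, loosely following the symtable layout of Fig 2.17 (the array size, the strdup-based storage and the entry type are illustrative choices, not the notes' exact code):

  #include <string.h>
  #include <stdlib.h>

  struct entry { char *lexptr; int token; };   /* one row of symtable */

  #define SYMMAX 100
  struct entry symtable[SYMMAX];
  int lastentry = 0;                           /* index of the last used entry; index 0 is unused */

  int lookup(const char *s) {                  /* return index of the entry for s, or 0 if absent */
      for (int p = lastentry; p > 0; p--)
          if (strcmp(symtable[p].lexptr, s) == 0)
              return p;
      return 0;
  }

  int insert(const char *s, int tok) {         /* add lexeme s with token tok, return its index */
      lastentry++;                             /* overflow checking omitted in this sketch */
      symtable[lastentry].token = tok;
      symtable[lastentry].lexptr = strdup(s);  /* simplified: one allocation per lexeme */
      return lastentry;
  }

Keywords can then be preloaded exactly as described above, e.g. insert("div", DIV); insert("mod", MOD);.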
2.8 ABSTRACT STACK MACHINE
© An abstract machine is for intermediate code generation/execution.
© Instruction classes: arithmetic / stack manipulation / control flow
* 3 components of abstract stack machine
1) Instruction memory : abstract machine code, intermediate code(instruction)
2) Stack
3) Data memory
• An example of stack machine operation
  • For the input (5 - a) * b, the intermediate code is: push 5, rvalue 2, -, rvalue 3, *
[Figure: the instruction memory holding this code, the data memory with a at location 2 and b at location 3 (values 1 and 7), and the evaluation stack during execution]
L-value and r-value
• lvalue a : the address of location a
• rvalue a : if a is a location, the contents of location a; if a is a constant, the value a
• e.g.) a := 5 + b;  the lvalue of a is its address; the rvalue of 5 is 5; the rvalue of b here is 7 (its current contents)
Stack Manipulation
Some instructions for the assignment operation (see the C sketch below):
  push v   : push v onto the stack
  rvalue a : push the contents of data location a
  lvalue a : push the address of data location a
  pop      : throw away the top element of the stack
  :=       : assignment using the top two elements of the stack: the r-value on top is stored at the l-value below it, and both are popped
  copy     : push a copy of the top element of the stack
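A small C sketch of how these instructions could be interpreted (the instruction encoding, array sizes and the example program are assumptions made for illustration, not the notes' machine):

  #include <stdio.h>

  enum op { PUSH, RVALUE, LVALUE, POP, ASSIGN, COPY, HALT };
  struct instr { enum op op; int arg; };       /* arg: a constant or a data-memory address */

  int data[100];                               /* data memory */
  int stack[100], top = -1;                    /* evaluation stack */

  void run(struct instr *code) {
      for (int pc = 0; ; pc++) {
          struct instr i = code[pc];
          switch (i.op) {
          case PUSH:   stack[++top] = i.arg;        break;  /* push v                 */
          case RVALUE: stack[++top] = data[i.arg];  break;  /* push contents of a     */
          case LVALUE: stack[++top] = i.arg;        break;  /* push address of a      */
          case POP:    top--;                       break;
          case ASSIGN: data[stack[top-1]] = stack[top];     /* := on the top 2 items  */
                       top -= 2;                    break;
          case COPY:   stack[top+1] = stack[top]; top++; break;
          case HALT:   return;
          }
      }
  }

  int main(void) {                             /* e.g. day := 7, i.e. lvalue day, push 7, := */
      struct instr prog[] = { {LVALUE, 0}, {PUSH, 7}, {ASSIGN, 0}, {HALT, 0} };
      run(prog);
      printf("data[0] = %d\n", data[0]);       /* prints 7 */
      return 0;
  }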
Translation of Expressions
• Infix expression (IE) → SDD/SDTS → abstract machine code (AMC) of the postfix expression, for stack-machine evaluation.
  e.g.)
  IE: a + b   (PE: a b +)   IC: rvalue a
                                 rvalue b
                                 +
  day := (1461 * y) div 4 + (153 * m + 2) div 5 + d
  (postfix: day 1461 y * 4 div 153 m * 2 + 5 div + d + :=)
  ⇒  1) lvalue day   6) div        11) push 5     16) :=
     2) push 1461    7) push 153   12) div
     3) rvalue y     8) rvalue m   13) +
     4) *            9) push 2     14) rvalue d
     5) push 4      10) +          15) +
• A translation scheme for an assignment statement into abstract stack machine code can be expressed formally in the form:
  stmt → id := expr
    { stmt.t := 'lvalue' || id.lexeme || expr.t || ':=' }
  e.g.) day := a + b  ⇒  lvalue day rvalue a rvalue b + :=
Control Flow
• 3 types of jump targets:
  • absolute target location
  • relative target location (distance: current → target)
  • symbolic target location (i.e. the machine supports labels)
• Control-flow instructions:
  label a   : the jump's target a
  goto a    : the next instruction is taken from the statement labeled a
  gofalse a : pop the top; if it is 0 then jump to a
  gotrue a  : pop the top; if it is nonzero then jump to a
  halt      : stop execution
Translation of Statements
• Translation scheme for translating an if-statement into abstract machine code:
  stmt → if expr then stmt1
    { out := newlabel();
      stmt.t := expr.t || 'gofalse' out || stmt1.t || 'label' out }

  if-statement:         while-statement:
    code for expr          label test
    gofalse out            code for expr
    code for stmt1         gofalse out
    label out              code for stmt1
                           goto test
                           label out
Fig 2.18: Code layout for conditional and while statements
Translation scheme for the while-statement?
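Following the while-statement layout in Fig 2.18, one plausible answer (a sketch in the same notation, not quoted from the notes) is:
  stmt → while expr do stmt1
    { test := newlabel(); out := newlabel();
      stmt.t := 'label' test || expr.t || 'gofalse' out || stmt1.t || 'goto' test || 'label' out }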
Emitting a Translation
• Semantic actions (translation scheme):
  1. stmt → if
       expr  { out := newlabel(); emit('gofalse', out) }
     then
       stmt1 { emit('label', out) }
  2. stmt → id { emit('lvalue', id.lexeme) }
     := expr   { emit(':=') }
  3. stmt → if
       expr  { out := newlabel(); emit('gofalse', out) }
     then
       stmt1 { emit('label', out); out1 := newlabel(); emit('goto', out1); }
     else
       stmt2 { emit('label', out1); }
  Generated layout for rule 3:
     if (expr is false) goto out
     stmt1
     goto out1
     out:  stmt2
     out1: (bottom)
Implementation
• procedure stmt()
  var test, out : integer;
  begin
    if lookahead = id then begin
      emit('lvalue', tokenval); match(id);
      match(':='); expr(); emit(':=');
    end
    else if lookahead = 'if' then begin
      match('if');
      expr();
      out := newlabel();
      emit('gofalse', out);
      match('then');
      stmt();
      emit('label', out);
    end
    else error();
  end
Control Flow with Analysis
• if E1 or E2 then S  vs.  if E1 and E2 then S
  E1 or E2  ≡  if E1 then true else E2
  E1 and E2 ≡  if E1 then E2 else false
• The code for E1 or E2:
    codes for E1      (evaluation result: e1)
    copy
    gotrue OUT
    pop
    codes for E2      (evaluation result: e2)
  label OUT
• The full code for if E1 or E2 then S:
    codes for E1
    copy
    gotrue OUT1
    pop
    codes for E2
  label OUT1
    gofalse OUT2
    code for S
  label OUT2
• Exercise: How about if E1 and E2 then S;
  and if E1 and E2 then S1 else S2;

2.9 Putting the techniques together!
• infix expression ⇒ postfix expression
  e.g.) id + (id - id) * num / id  ⇒  id id id - num * id / +
Description of the Translator
• Syntax-directed translation scheme (SDTS) to translate infix expressions into postfix expressions:
  start → list eof
  list → expr ; list | ε
  expr → expr + term { print('+') } | expr - term { print('-') } | term
  term → term * factor { print('*') } | term / factor { print('/') }
       | term div factor { print('DIV') } | term mod factor { print('MOD') } | factor
  factor → ( expr ) | id { print(id.lexeme) } | num { print(num.value) }
Fig 2.19: Specification for infix-to-postfix translation
Structure of the translator:
[Figure: infix expressions → lexer.c → parser.c → emitter.c → postfix expressions, supported by symbol.c, init.c and error.c]
Fig 2.19: Modules of the infix to postfix translator
• global header file "header.h"
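A plausible sketch of this shared header; only #define NUM 256 appears explicitly elsewhere in these notes, so the remaining token codes, prototypes and the symtable row type are illustrative assumptions:

  /* header.h -- shared declarations for the translator modules (sketch) */
  #include <stdio.h>
  #include <ctype.h>

  #define NUM  256          /* token codes beyond the character set (codes are assumed) */
  #define DIV  257
  #define MOD  258
  #define ID   259
  #define DONE 260
  #define NONE -1           /* "no attribute" marker used with tokenval */

  extern int tokenval;      /* attribute value of the current token */
  extern int lineno;        /* current input line, maintained by the lexer */

  struct entry { char *lexptr; int token; };   /* one symbol-table row */
  extern struct entry symtable[];

  int  lexan(void);                 /* lexer.c   */
  void parse(void);                 /* parser.c  */
  int  insert(char *s, int tok);    /* symbol.c  */
  int  lookup(char *s);             /* symbol.c  */
  void emit(int t, int tval);       /* emitter.c */
  void error(char *m);              /* error.c   */
  void init(void);                  /* init.c    */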
‘The Lexical Analysis Module lexer.c
• Description of tokens: + - * / DIV MOD ( ) ID NUM DONE

  LEXEME                              TOKEN            ATTRIBUTE VALUE
  white space                         -                -
  sequence of digits                  NUM              numeric value of the sequence
  div                                 DIV              -
  mod                                 MOD              -
  other sequence of a letter then
  letters and digits                  ID               index into symtable
  end-of-file character               DONE             -
  any other character                 that character   NONE

Fig 2.20: Description of tokens
The Parser Module parser.c
  SDTS  → (left recursion elimination) →  new SDTS
  The SDTS of Fig 2.19, after eliminating left recursion, becomes:
    start → list eof
    list → expr ; list | ε
    expr → term moreterms
    moreterms → + term { print('+') } moreterms
              | - term { print('-') } moreterms
              | ε
    term → factor morefactors
    morefactors → * factor { print('*') } morefactors
                | / factor { print('/') } morefactors
                | div factor { print('DIV') } morefactors
                | mod factor { print('MOD') } morefactors
                | ε
    factor → ( expr )
           | id { print(id.lexeme) }
           | num { print(num.value) }
Fig 2.20: Specification for the infix to postfix translator: the syntax-directed translation scheme after eliminating left recursion
The Emitter Module emitter.c
emit (t,tval)
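A sketch of emit(), assuming the token codes from the header sketched above and the symtable layout of Fig 2.17 (not the notes' exact code):

  void emit(int t, int tval) {      /* print the output for token t with attribute tval */
      switch (t) {
      case '+': case '-': case '*': case '/':
          printf("%c\n", t); break;
      case DIV: printf("DIV\n"); break;
      case MOD: printf("MOD\n"); break;
      case NUM: printf("%d\n", tval); break;
      case ID:  printf("%s\n", symtable[tval].lexptr); break;
      default:  printf("token %d, tokenval %d\n", t, tval);
      }
  }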
The Symbol-Table Modules symbol.c and init.c
Symbol.c
data structure of symbol table Fig 2.29 p62
insert(s,t)
lookup(s)
The Error Module error.c
Example of execution
  input : 12 div 5 + 2
  output: 12
          5
          DIV
          2
          +

3. Lexical Analysis
3.1 OVER VIEW OF LEXICAL ANALYSIS
© To identify the tokens we need some method of describing the possible tokens that can
appear in the input stream, For this purpose we introduce regular expression, a notation
that can be used to describe essentially all the tokens of programming language.
* Secondly , having decided what the tokens are, we need some mechanism to recognize
these in the input stream. This is done by the token recognizers, which are designed using
transition diagrams and finite automata.
3.2 ROLE OF LEXICAL ANALYZER:
The LA is the first phase of a compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
[Figure: source program → LEXICAL ANALYZER → tokens → PARSER, both interacting with the SYMBOL TABLE]
Fig. 3.1: Role of the Lexical Analyzer
Upon receiving a 'get next token' command from the parser, the lexical analyzer reads input characters until it can identify the next token. The LA returns to the parser a representation of the token it has found. The representation will be an integer code if the token is a simple construct such as a parenthesis, comma or colon.
The LA may also perform certain secondary tasks at the user interface. One such task is stripping out from the source program comments and white space in the form of blank, tab and newline characters. Another is correlating error messages from the compiler with the source program.
3.3 TOKEN, LEXEME, PATTERN:
Token: Token is a sequence of characters that can be treated as a single logical entity.
Typical tokens are,
1) Identifiers 2) keywords 3) operators 4) special symbols 5)constants
Pattern: A set of strings in the input for which the same token is produced as output. This set
of strings is described by a rule called a pattern associated with the token.
Lexeme: A lexeme is a sequence of characters in the source program that is matched by the
pattern for a token.

  Token     Lexeme               Pattern
  const     const                const
  if        if                   if
  relation  <, <=, =, <>, >, >=  < or <= or = or <> or > or >=
  id        pi                   a letter followed by letters and digits
  num       3.14                 any numeric constant
  literal   "core dumped"        any characters between " and " except "

Fig. 3.2: Examples of Token, Lexeme and Pattern
3.4, LEXICAL ERRORS:
Lexical errors are the errors thrown by your lexer when it is unable to continue, which means that there is no way to recognise a lexeme as a valid token for your lexer. Syntax errors, on the other hand, are thrown by your parser when a given set of already recognised valid tokens does not match any of the right sides of your grammar rules. A simple panic-mode error handling system requires that we return to a high-level parsing function when a parsing or lexical error is detected.
Error-recovery actions are:
i. Delete one character from the remaining input
ii, Insert a missing character in to the remaining input.
iii, Replace a character by another character.
iv. Transpose two adjacent characters.
3.5, REGULAR EXPRESSIONS
A regular expression is a formula that describes a possible set of strings. Components of a regular expression:
  x       the character x
  .       any character, usually except a newline
  [xyz]   any of the characters x, y, z, ...
  R?      an R or nothing (optionally an R)
  R*      zero or more occurrences of R
  R+      one or more occurrences of R
  R1R2    an R1 followed by an R2
  R1|R2   either an R1 or an R2
A token is either a single string or one of a collection of strings of a certain type. If we view the set of strings in each token class as a language, we can use the regular-expression notation to describe tokens.
Consider an identifier, which is defined to be a letter followed by zero or more letters or digits. In regular expression notation we would write:
  Identifier = letter (letter | digit)*
Here are the rules that define the regular expressions over an alphabet Σ:
• ε is a regular expression denoting { ε }, that is, the language containing only the empty string.
• For each 'a' in Σ, a is a regular expression denoting { a }, the language with only one string, consisting of the single symbol 'a'.
• If R and S are regular expressions, then
  (R) | (S) denotes L(R) ∪ L(S)
  R.S denotes L(R)L(S)
  R* denotes (L(R))*
3.6, REGULAR DEFINITIONS
For notational convenience, we may wish to give names to regular expressions and to define
regular expressions using these names as if they were symbols.
Identifiers are the set or string of letters and digits beginning with a letter. The following regular
definition provides a precise specification for this class of string,
Example 1:
  ab*|cd? is equivalent to (a(b*)) | (c(d?))
Pascal identifier:
  letter → A | B | ... | Z | a | b | ... | z
  digit  → 0 | 1 | 2 | ... | 9
  id     → letter (letter | digit)*
Recognition of tokens:
We have learned how to express patterns using regular expressions. Now we must study how to take the patterns for all the needed tokens and build a piece of code that examines the input string and finds a prefix that is a lexeme matching one of the patterns.
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | ε
  expr → term relop term
       | term
  term → id
       | number
For relop, we use the comparison operators of languages like Pascal or SQL, where = is "equals" and <> is "not equals", because it presents an interesting structure of lexemes.
The terminals of the grammar, which are if, then, else, relop, id and number, are the names of tokens as far as the lexical analyzer is concerned. The patterns for these tokens are described using regular definitions:
  digit  → [0-9]
  digits → digit+
  number → digits (. digits)? (E [+-]? digits)?
  letter → [A-Za-z]
  id     → letter (letter | digit)*
  if     → if
  then   → then
  else   → else
  relop  → < | > | <= | >= | = | <>
In addition, we assign the lexical analyzer the job of stripping out white space, by recognizing the "token" ws defined by:
  ws → (blank | tab | newline)+
Here, blank, tab and newline are abstract symbols that we use to express the ASCII characters of the same names. Token ws is different from the other tokens in that when we recognize it, we do not return it to the parser, but rather restart the lexical analysis from the character that follows the white space. It is the following token that gets returned to the parser.
  Lexeme       Token Name   Attribute Value
  any ws       -            -
  if           if           -
  then         then         -
  else         else         -
  any id       id           pointer to table entry
  any number   number       pointer to table entry
  <            relop        LT
  <=           relop        LE
  =            relop        EQ
  <>           relop        NE
  >            relop        GT
  >=           relop        GE
3.7. TRANSITION DIAGRAM:
Transition Diagram has a collection of nodes or circles, called states. Each state represents a
condition that could occur during the process of scanning the input looking for a lexeme that
matches one of several patterns
Edges are directed from one state of the transition diagram to another. each edge is labeled by a
symbol or set of symbols.
If we are in one state s, and the next input symbol is a, we look for an edge out of state s labeled
by a. if we find such an edge we advance the forward pointer and enter the state of the transition
diagram to which that edge leads.
Some important conventions about transition diagrams are:
1. Certain states are said to be accepting or final. These states indicate that a lexeme has been found, although the actual lexeme may not consist of all positions between the lexemeBegin and forward pointers. We always indicate an accepting state by a double circle.
2. In addition, if it is necessary to retract the forward pointer one position, then we shall additionally place a * near that accepting state.
3. One state is designated the start state, or initial state; it is indicated by an edge labeled "start" entering from nowhere. The transition diagram always begins in the start state before any input symbols have been used.
[Figure: transition diagram for the relational operators, with accepting states returning relop with attributes LT, LE, EQ, NE, GT, GE]
Fig. 3.3: Transition diagram of relational operators
As an intermediate step in the construction of a LA, we first produce a stylized flowchart, called a transition diagram. Positions in a transition diagram are drawn as circles and are called states.
[Figure: transition diagram for identifiers: start → letter → loop on letter or digit → accepting state*, return (gettoken(), installID())]
Fig. 3.4: Transition diagram of Identifier
The above TD is for an identifier, defined to be a letter followed by any number of letters or digits. A sequence of transition diagrams can be converted into a program to look for the tokens specified by the diagrams. Each state gets a segment of code.
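For instance, the identifier diagram of Fig. 3.4 might turn into a code segment like the following C sketch (the token code ID and the helper behaviour are assumptions; a real scanner would also call installID()/gettoken() as the figure indicates):

  #include <stdio.h>
  #include <ctype.h>

  #define ID 259                       /* illustrative token code */

  /* recognize  letter (letter | digit)*  by simulating the diagram */
  int get_identifier(void) {
      int c = getchar();
      if (!isalpha(c)) {               /* start state: must begin with a letter */
          ungetc(c, stdin);
          return 0;                    /* not an identifier */
      }
      while (isalnum(c))               /* loop state: stay while letter or digit */
          c = getchar();
      ungetc(c, stdin);                /* accepting state*: retract one character */
      return ID;                       /* real code would install the lexeme here */
  }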
3.8. FINITE AUTOMATA
• A recognizer for a language is a program that takes a string x and answers "yes" if x is a sentence of that language, and "no" otherwise.
• We call the recognizer of the tokens a finite automaton.
• A finite automaton can be deterministic (DFA) or non-deterministic (NFA).
• This means that we may use a deterministic or non-deterministic automaton as a lexical analyzer.
• Both deterministic and non-deterministic finite automata recognize regular sets.
• Which one?
  - deterministic : faster recognizer, but it may take more space
  - non-deterministic : slower, but it may take less space
  - Deterministic automata are widely used in lexical analyzers.
• First, we define regular expressions for tokens; then we convert them into a DFA to get a lexical analyzer for our tokens.
3.9. Non-Deterministic Finite Automaton (NFA)
• A non-deterministic finite automaton (NFA) is a mathematical model that consists of:
  S    - a set of states
  Σ    - a set of input symbols (alphabet)
  move - a transition function mapping state-symbol pairs to sets of states
  s0   - a start (initial) state
  F    - a set of accepting (final) states
• ε-transitions are allowed in NFAs. In other words, we can move from one state to another without consuming any symbol.
• An NFA accepts a string x if and only if there is a path from the starting state to one of the accepting states such that the edge labels along this path spell out x.
Example:
[Transition graph of an NFA with states 0, 1, 2; start state 0; accepting state 2]
Transition function:
  state | a     | b
  0     | {0,1} | {0}
  1     | -     | {2}
  2     | -     | -
The language recognized by this NFA is (a|b)*ab.
3.10. Deterministic Finite Automaton (DFA)
• A Deterministic Finite Automaton (DFA) is a special form of an NFA.
• No state has an ε-transition.
• For each symbol a and state s, there is at most one edge labeled a leaving s, i.e. the transition function maps a state-symbol pair to a single state (not to a set of states).
Example: The DFA to recognize the language (a|b)*ab is as follows.
  0 is the start state s0
  {2} is the set of final states F
  Σ = { a, b }
  S = { 0, 1, 2 }
Transition function:
  state | a | b
  0     | 1 | 0
  1     | 1 | 2
  2     | 1 | 0
Note that the entries in this function are single values and not sets of values (unlike an NFA).
3.11. Converting a RE to an NFA
• This is one way to convert a regular expression into an NFA; there can be other (more efficient) ways for the conversion.
• Thompson's Construction is a simple and systematic method.
• It guarantees that the resulting NFA will have exactly one final state and one start state.
• Construction starts from the simplest parts (alphabet symbols). To create an NFA for a complex regular expression, the NFAs of its sub-expressions are combined to create its NFA.
• Small two-state NFAs recognize the empty string ε and a single symbol a. N(r1) and N(r2) denote the NFAs for regular expressions r1 and r2.
• For the regular expression r1 | r2, N(r1) and N(r2) are combined using a new start and a new final state; for the regular expression r1 r2, the final state of N(r1) is merged with the start state of N(r2); for r*, ε-transitions allow zero or more passes through N(r).
Example: For the RE (a|b)*a, the NFA construction is shown below.
[Figure: Thompson-construction NFAs for a, for b, for a|b, for (a|b)*, and finally for (a|b)*a]
3.12. Converting NFA to DFA (Subset Construction)
We merge together NFA states by looking at them from the point of view of the input characters:
• From the point of view of the input, any two states that are connected by an ε-transition may as well be the same, since we can move from one to the other without consuming any character. Thus states which are connected by an ε-transition will be represented by the same state in the DFA.
• If it is possible to have multiple transitions based on the same symbol, then we can regard a transition on a symbol as moving from a state to a set of states (i.e. the union of all those states reachable by a transition on the current symbol). Thus these states will be combined into a single DFA state.
To perform this operation, let us define two functions:
• The ε-closure function takes a state and returns the set of states reachable from it based on (one or more) ε-transitions. Note that this will always include the state itself. We should be able to get from a state to any state in its ε-closure without consuming any input.
• The function move takes a state and a character, and returns the set of states reachable by one transition on this character.
We can generalize both these functions to apply to sets of states by taking the union of the application to individual states.
For example, if A, B and C are states, move({A,B,C}, 'a') = move(A,'a') ∪ move(B,'a') ∪ move(C,'a').
The Subset Construction Algorithm is as follows:
  put ε-closure({s0}) as an unmarked state into the set of DFA states (DS)
  while (there is an unmarked state S1 in DS) do
  begin
    mark S1
    for each input symbol a do
    begin
      S2 ← ε-closure(move(S1, a))
      if (S2 is not in DS) then
        add S2 into DS as an unmarked state
      transfunc[S1, a] ← S2
    end
  end
• a state S in DS is an accepting state of the DFA if a state in S is an accepting state of the NFA
• the start state of the DFA is ε-closure({s0})
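As a small illustration, here is an ε-closure routine in C over an NFA with at most 32 states, using a bit mask as the state set (the eps[] table and the size limit are assumptions made for this sketch):

  #include <stdint.h>

  #define MAXSTATES 32
  uint32_t eps[MAXSTATES];        /* eps[s]: states reachable from s by one epsilon-move */

  /* epsilon-closure of a set of states, the set represented as a bit mask */
  uint32_t eps_closure(uint32_t set) {
      uint32_t closure = set, old;
      do {
          old = closure;
          for (int s = 0; s < MAXSTATES; s++)
              if (closure & (1u << s))   /* s is already in the closure      */
                  closure |= eps[s];     /* add its epsilon-successors       */
      } while (closure != old);          /* repeat until no new state appears */
      return closure;
  }

move(S1, a) can be computed the same way from a transition table, and each new set ε-closure(move(S1, a)) becomes a DFA state, exactly as in the algorithm above.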
3.13. Lexical Analyzer Generator
[Figure: a Lex source program lex.l → Lex compiler → lex.yy.c; lex.yy.c → C compiler → a.out; input stream → a.out → sequence of tokens]
3.18, Lex specifications:
A Lex program (the .l file) consists of three parts:
declarations
%%
translation rules
%%
auxiliary procedures1. The declarations section includes declarations of variables,manifest constants(A manifest
constant is an identifier that is declared to represent a constant e.g. # define PIE 3.14),
and regular definitions
2. The translation rules of a Lex program are statements of the form
   p1 { action 1 }
   p2 { action 2 }
   p3 { action 3 }
   where each p is a regular expression and each action is a program fragment describing what action the lexical analyzer should take when a pattern p matches a lexeme. In Lex the actions are written in C.
3. The third section holds whatever auxiliary procedures are needed by the
actions. Alternatively these procedures can be compiled separately and loaded with the
lexical analyzer.
Note: You can refer to a sample lex program given in page no. 109 of chapter 3 of the book:
Compilers: Principles, Techniques, and Tools by Aho, Sethi & Ullman for more clarity.
3.19, INPUT BUFFERING
The LA scans the characters of the source program one at a time to discover tokens. Because a large amount of time can be consumed scanning characters, specialized buffering techniques have been developed to reduce the amount of overhead required to process an input character.
Buffering techniques:
1. Buffer pairs
2. Sentinels
The lexical analyzer scans the characters of the source program one at a time to discover tokens. Often, however, many characters beyond the next token may have to be examined before the next token itself can be determined. For this and other reasons, it is desirable for the lexical analyzer to read its input from an input buffer. The figure shows a buffer divided into two halves of, say, 100 characters each. One pointer marks the beginning of the token being discovered. A lookahead pointer scans ahead of the beginning point, until the token is discovered. We view the position of each pointer as being between the character last read and the character next to be read. In practice each buffering scheme adopts one convention: either a pointer is at the symbol last read or at the symbol it is ready to read.
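A rough C sketch of the buffer-pair idea (the buffer size, the refill helper and the pointer names are assumptions for illustration; error handling and the sentinel refinement are omitted):

  #include <stdio.h>

  #define HALF 100
  char buf[2 * HALF];               /* the two buffer halves, back to back         */
  char *lexeme_beginning = buf;     /* marks the start of the token being scanned  */
  char *forward = buf;              /* the lookahead pointer                       */

  /* read the next HALF characters into one half of the buffer */
  void load_half(char *start) {
      size_t n = fread(start, 1, HALF, stdin);
      if (n < HALF) start[n] = '\0';    /* crude end-of-input marker */
  }

  /* advance the lookahead pointer, reloading a half whenever its end is crossed */
  int advance(void) {
      forward++;
      if (forward == buf + HALF)            /* crossed into the second half: refill it      */
          load_half(buf + HALF);
      else if (forward == buf + 2 * HALF) { /* ran off the end: refill the first half, wrap */
          load_half(buf);
          forward = buf;
      }
      return *forward;
  }

load_half(buf) would be called once before scanning starts. The sentinel technique listed above adds a special character at the end of each half so that the two boundary tests collapse into a single comparison per advance.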
[Figure: the input buffer in two halves, with the token-beginning pointer and the lookahead pointer]
The distance which the lookahead pointer may have to travel past the actual token may be large. For example, in a PL/I program we may see DECLARE (ARG1, ARG2, ..., ARGn) without knowing whether DECLARE is a keyword or an array name until we see the character that follows the right parenthesis. In either case, the token itself ends at the second E. If the lookahead pointer travels beyond the buffer half in which it began, the other half must be loaded with the next characters from the source file. Since the buffer shown in the above figure is of limited size, there is an implied constraint on how much lookahead can be used before the next token is discovered. In the above example, if the lookahead traveled to the left half and all the way through the left half to the middle, we could not reload the right half, because we would lose characters that had not yet been grouped into tokens. While we can make the buffer larger if we choose, or use another buffering scheme, we cannot ignore the fact that overhead is limited.

4.1 ROLE OF THE PARSER
A parser for a grammar is a program that takes as input a string w (a stream of tokens obtained from the lexical analyzer) and produces as output either a parse tree for w, if w is a valid sentence of the grammar, or an error message indicating that w is not a valid sentence of the given grammar. The goal of the parser is to determine the syntactic validity of a source string. If the string is valid, a tree is built for use by the subsequent phases of the compiler. The tree reflects the sequence of derivations or reductions used during parsing; hence it is called a parse tree. If the string is invalid, the parser has to issue diagnostic messages identifying the nature and cause of the errors in the string. Every elementary subtree in the parse tree corresponds to a production of the grammar.
There are two ways of identifying an elementary subtree:
1. By deriving a string from a non-terminal, or
2. By reducing a string of symbols to a non-terminal.
The two types of parsers employed are:
a. Top-down parser: builds parse trees from the top (root) to the bottom (leaves).
b. Bottom-up parser: builds parse trees from the leaves and works up to the root.
[Figure: source program → lexical analyzer → (token / get next token) ↔ parser → parse tree → rest of front end → intermediate representation; all components consult the symbol table]
Fig. 4.1: Position of the parser in the compiler model
4.2 CONTEXT FREE GRAMMARS
Inherently recursive structures of a programming language are defined by a context-free grammar. In a context-free grammar we have four components G = (V, T, P, S).
Here, V is a finite set of terminals (in our case, this will be the set of tokens),
T is a finite set of non-terminals (syntactic variables),
P is a finite set of production rules, each of the form
  A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string),
S is the start symbol (one of the non-terminal symbols).
L(G), the language of G (the language generated by G), is a set of sentences. A sentence of L(G) is a string of terminal symbols of G. If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒* ω, where ω is a string of terminals of G. If G is a context-free grammar, L(G) is a context-free language. Two grammars G1 and G2 are equivalent if they produce the same language.
Consider a derivation of the form S ⇒* α. If α contains non-terminals, it is called a sentential form of G. If α does not contain non-terminals, it is called a sentence of G.
4.2.1 Derivations
In general, a derivation step is
  αAβ ⇒ αγβ   if there is a production rule A → γ in our grammar,
where α and β are arbitrary strings of terminal and non-terminal symbols. A sequence α1 ⇒ α2 ⇒ ... ⇒ αn (αn is derived from α1, or α1 derives αn) is a derivation. There are two types of derivation:
1. At each derivation step we can choose any of the non-terminals in the sentential form of G for the replacement.
2. If we always choose the left-most non-terminal in each derivation step, this derivation is called a left-most derivation.
  E → E+E | E-E | E*E | E/E | -E
  E → (E)
  E → id
Leftmost derivation:
  E ⇒ E + E ⇒ E * E + E ⇒ id * E + E ⇒ id * id + E ⇒ id * id + id
The string w = id*id+id is derived from the grammar and consists of all terminal symbols.
Rightmost derivation:
  E ⇒ E + E ⇒ E + id ⇒ E * E + id ⇒ E * id + id ⇒ id * id + id
Given grammar G : E → E+E | E*E | (E) | -E | id
Sentence to be derived : -(id+id)
Es-E E+-E
E+-(E) E+-(E)
E—- (E+E) E-- (E+E)
E-- (id+E) E--(Esid)
Es. (idtid ) E—- (idtid )
String that appear in leftmost derivation are called left sentinel forms.
* String that appear in rightmost derivation are called right sentinel forms.
Sentinels:
© Given a grammar G with start symbol S, if $ — a, where a may contain non-
terminals or terminals, then a is called the sentinel form of G.
Yield or frontier of tree:
‘© Each interior node of a parse tree is a non-terminal. The children of node can be a
terminal or non-terminal of the sentinel forms that are read from left to right. The
sentinel form in the parse tree is called yield or frontier of the tree.
4.2.2 PARSE TREE
• Inner nodes of a parse tree are non-terminal symbols.
• Leaves of a parse tree are terminal symbols.
• A parse tree can be seen as a graphical representation of a derivation.
[Figure: step-by-step construction of the parse tree for -(id+id), following E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+id)]
Ambiguity:
A grammar that produces more than one parse tree for some sentence is said to be an ambiguous grammar.
Example: Given grammar G : E → E+E | E*E | (E) | -E | id
The sentence id+id*id has the following two distinct leftmost derivations:
  E ⇒ E + E                    E ⇒ E * E
  E ⇒ id + E                   E ⇒ E + E * E
  E ⇒ id + E * E               E ⇒ id + E * E
  E ⇒ id + id * E              E ⇒ id + id * E
  E ⇒ id + id * id             E ⇒ id + id * id
The two corresponding parse trees are:
[Figure: one parse tree grouping id+(id*id), the other grouping (id+id)*id]
Example:
To disambiguate the grammar E → E+E | E*E | E^E | id | (E), we can use the precedence of operators as follows:
  ^     (right to left)
  *, /  (left to right)
  +, -  (left to right)
We get the following unambiguous grammar:
  E → E+T | T
  T → T*F | F
  F → G^F | G
  G → id | (E)
Consider this example, G: stmt → if expr then stmt | if expr then stmt else stmt | other
This grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has the following two parse trees for its leftmost derivation:
[Figure: two parse trees, one attaching the else to the inner if and one attaching it to the outer if]
To eliminate the ambiguity, the following grammar may be used:
  stmt → matched_stmt | unmatched_stmt
  matched_stmt → if expr then matched_stmt else matched_stmt | other
  unmatched_stmt → if expr then stmt | if expr then matched_stmt else unmatched_stmt
Eliminating Left Recursion:
A grammar is said to be left recursive if it has a non-terminal A such that there is a derivation
“Aa. for some string a, Top-down parsing methods cannot handle left-recursive grammars.
Hence, left recursion can be eliminated as follows:If there is a production A — Aa | f it can be replaced with a sequence of two
productions
As BA’
Ai aA’ le
Without changing the set of strings derivable from A.
Example : Consider the following grammar for arithmetic expressions:
E → E+T | T
T → T*F | F
F → (E) | id
First eliminate the left recursion for E as
E → TE'
E' → +TE' | ε
Then eliminate it for T as
T → FT'
T' → *FT' | ε
Thus the grammar obtained after eliminating left recursion is
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Algorithm to eliminate left recursion:
1. Arrange the non-terminals in some order A1, A2, ..., An.
2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
           replace each production of the form Ai → Aj γ
           by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
           where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions;
       end
       eliminate the immediate left recursion among the Ai-productions
   end
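As a rough illustration of the immediate left-recursion step used above, here is a small Python sketch (the function name, the representation of a grammar as lists of symbol lists, and the use of [] for ε are assumptions made for the example):

def eliminate_immediate_left_recursion(nonterminal, productions):
    """Replace A -> A alpha1 | ... | beta1 | ... with
       A  -> beta1 A' | beta2 A' | ...
       A' -> alpha1 A' | ... | epsilon   ([] stands for epsilon)."""
    recursive, nonrecursive = [], []
    for rhs in productions:                    # rhs is a list of symbols
        if rhs and rhs[0] == nonterminal:
            recursive.append(rhs[1:])          # the alpha part
        else:
            nonrecursive.append(rhs)           # the beta part
    if not recursive:                          # nothing to do
        return {nonterminal: productions}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in nonrecursive],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }

# Example: E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))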
Left factoring:
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. When it is not clear which of two alternative productions to use to expand a non-terminal A, we can rewrite the A-productions to defer the decision until we have seen enough of the input to make the right choice.
If there is any production A → αβ1 | αβ2, it can be rewritten as
A → αA'
A' → β1 | β2
Consider the grammar, G : S → iEtS | iEtSeS | a
E → b
Left factored, this grammar becomes
S → iEtSS' | a
S' → eS | ε
E → b
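The same transformation can be sketched in Python. The helper below performs one round of left factoring on the alternatives of a single non-terminal (the function names and the list-of-symbol-lists representation are assumptions; a complete tool would repeat this until no common prefixes remain):

def common_prefix(seqs):
    """Longest common prefix of a list of symbol sequences."""
    prefix = []
    for symbols in zip(*seqs):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor(nt, alternatives):
    """One round of left factoring for non-terminal nt.
       Alternatives sharing a first symbol are replaced by
       nt -> prefix nt'  and  nt' -> tail1 | tail2 | ...  ([] = epsilon)."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {nt: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[nt].extend(group)
            continue
        prefix = common_prefix(group)
        new_nt = nt + "'"
        result[nt].append(prefix + [new_nt])
        result[new_nt] = [alt[len(prefix):] for alt in group]
    return result

# S -> iEtS | iEtSeS | a   becomes   S -> iEtS S' | a,   S' -> eps | eS
print(left_factor("S", [list("iEtS"), list("iEtSeS"), ["a"]]))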
TOP-DOWN PARSING
It can be viewed as an attempt to find a left-most derivation for an input string, or as an attempt to construct a parse tree for the input starting from the root and working down to the leaves.
Types of top-down parsing :
1. Recursive descent parsing
2. Predictive parsing
1. RECURSIVE DESCENT PARSING
> Recursive descent parsing is one of the top-down parsing techniques that uses a set of recursive procedures to scan its input.
> This parsing method may involve backtracking, that is, making repeated scans of the input.
Example for backtracking :
Consider the grammar G : S → cAd
A → ab | a
and the input string w = cad.
The parse tree can be constructed using the following top-down approach :
Step 1:
Initially create a tree with a single node labeled S. An input pointer points to 'c', the first symbol of w. Expand the tree with the production of S.
Step 2:
The leftmost leaf 'c' matches the first symbol of w, so advance the input pointer to the second symbol of w, 'a', and consider the next leaf 'A'. Expand A using the first alternative.
[Figure: parse tree with S expanded to c A d and A expanded to a b]
Step 3:
The second symbol 'a' of w also matches the second leaf of the tree. So advance the input pointer to the third symbol of w, 'd'. But the third leaf of the tree is 'b', which does not match the input symbol 'd'.
Hence discard the chosen production and reset the input pointer to the second position. This is called backtracking.
Step 4:
Now try the second alternative for A.
[Figure: parse tree with S expanded to c A d and A expanded to a; all leaves now match w = cad]
Now we can halt and announce the successful completion of parsing.
Example for recursive descent parsing:
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop.
Hence, elimination of left-recursion must be done before parsing.
Consider the grammar for arithmetic expressions
E → E+T | T
T → T*F | F
F → (E) | id
After eliminating the left recursion the grammar becomes
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Now we can write the procedures for the grammar as follows:
Recursive procedures:

procedure E( )
begin
    T( );
    EPRIME( );
end

procedure EPRIME( )
begin
    if input_symbol = '+' then
    begin
        ADVANCE( );
        T( );
        EPRIME( );
    end
end

procedure T( )
begin
    F( );
    TPRIME( );
end

procedure TPRIME( )
begin
    if input_symbol = '*' then
    begin
        ADVANCE( );
        F( );
        TPRIME( );
    end
end

procedure F( )
begin
    if input_symbol = 'id' then
        ADVANCE( )
    else if input_symbol = '(' then
    begin
        ADVANCE( );
        E( );
        if input_symbol = ')' then
            ADVANCE( )
        else ERROR( );
    end
    else ERROR( );
end
Trace of the procedure calls for the input id+id*id:

PROCEDURE       REMAINING INPUT (after the call)
E( )            id + id * id
T( )            id + id * id
F( )            id + id * id
ADVANCE( )      + id * id
TPRIME( )       + id * id
EPRIME( )       + id * id
ADVANCE( )      id * id
T( )            id * id
F( )            id * id
ADVANCE( )      * id
TPRIME( )       * id
ADVANCE( )      id
F( )            id
ADVANCE( )      (input exhausted)
TPRIME( )       (input exhausted)
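The procedures above translate almost directly into runnable code. Below is a minimal Python sketch of a recursive-descent parser for the left-recursion-free grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id (the class name, the token-list input and the error handling are assumptions made for the example):

class RecursiveDescentParser:
    """Recursive-descent parser for:
       E -> T E',  E' -> + T E' | eps,  T -> F T',  T' -> * F T' | eps,
       F -> ( E ) | id"""
    def __init__(self, tokens):
        self.tokens = tokens + ['$']   # '$' marks the end of the input
        self.pos = 0

    def look(self):       return self.tokens[self.pos]
    def advance(self):    self.pos += 1
    def error(self, msg): raise SyntaxError(f"{msg} at token {self.pos}")

    def parse(self):
        self.E()
        if self.look() != '$':
            self.error("extra input")
        return True

    def E(self):
        self.T(); self.EPRIME()

    def EPRIME(self):
        if self.look() == '+':
            self.advance(); self.T(); self.EPRIME()   # otherwise: epsilon

    def T(self):
        self.F(); self.TPRIME()

    def TPRIME(self):
        if self.look() == '*':
            self.advance(); self.F(); self.TPRIME()   # otherwise: epsilon

    def F(self):
        if self.look() == 'id':
            self.advance()
        elif self.look() == '(':
            self.advance(); self.E()
            if self.look() == ')': self.advance()
            else: self.error("expected ')'")
        else:
            self.error("expected 'id' or '('")

print(RecursiveDescentParser(['id', '+', 'id', '*', 'id']).parse())   # True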
PREDICTIVE PARSING
* Predictive parsing is a special case of recursive descent parsing where no backtracking is required.
* The key problem of predictive parsing is to determine the production to be applied for a non-terminal when there are several alternatives.
Non-recursive predictive parser
[Figure: model of a non-recursive predictive parser: input buffer, stack, predictive parsing program, parsing table M, and output]
The table-driven predictive parser has an input buffer, a stack, a parsing table and an output stream.
Input buffer:
It contains the string to be parsed, followed by $ to indicate the end of the input string.
Stack:
It contains a sequence of grammar symbols with $ at the bottom to mark the bottom of the stack. Initially, the stack contains the start symbol of the grammar on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where 'A' is a non-terminal and 'a' is a terminal or the symbol $.
Predictive parsing program:
The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the parser action. There are three possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (so that U is on top). If M[X, a] = error, the parser calls an error recovery routine.
Algorithm for nonrecursive predictive parsing:
Input : A string w and a parsing table M for grammar G.
Output : If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method : Initially, the parser has $S on the stack, with S, the start symbol of G, on top, and w$ in the input buffer. The program that uses the predictive parsing table M to produce a parse for the input is as follows:

set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error( )
    else /* X is a non-terminal */
        if M[X, a] = X → Y1 Y2 ... Yk then begin
            pop X from the stack;
            push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
            output the production X → Y1 Y2 ... Yk
        end
        else error( )
until X = $
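A compact Python sketch of this table-driven loop is given below (the function name, the representation of the table as a dictionary keyed by (non-terminal, terminal), and the use of an empty list for an ε-production are assumptions; the table M written out here corresponds to the predictive parsing table constructed for the expression grammar later in this section):

def predictive_parse(table, start, tokens):
    """Table-driven (non-recursive) predictive parser.
       table maps (non-terminal, terminal) -> right-hand side (list of
       symbols, [] meaning epsilon); returns the productions used,
       i.e. a leftmost derivation."""
    stack = ['$', start]              # '$' marks the bottom of the stack
    tokens = tokens + ['$']
    i, output = 0, []
    while stack[-1] != '$':
        X, a = stack[-1], tokens[i]
        if X == a:                    # terminal on top matches input: pop and advance
            stack.pop(); i += 1
        elif (X, a) in table:         # non-terminal: expand using M[X, a]
            rhs = table[(X, a)]
            output.append((X, rhs))
            stack.pop()
            stack.extend(reversed(rhs))   # push Yk ... Y1 so that Y1 is on top
        else:
            raise SyntaxError(f"no rule for ({X}, {a})")
    if tokens[i] != '$':
        raise SyntaxError("input not fully consumed")
    return output

M = {('E','id'): ['T',"E'"], ('E','('): ['T',"E'"],
     ("E'",'+'): ['+','T',"E'"], ("E'",')'): [], ("E'",'$'): [],
     ('T','id'): ['F',"T'"], ('T','('): ['F',"T'"],
     ("T'",'+'): [], ("T'",'*'): ['*','F',"T'"], ("T'",')'): [], ("T'",'$'): [],
     ('F','id'): ['id'], ('F','('): ['(','E',')']}
for lhs, rhs in predictive_parse(M, 'E', ['id','+','id','*','id']):
    print(lhs, '->', ' '.join(rhs) or 'eps')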
Predictive parsing table construction:
The construction of a predictive parser is aided by two functions associated with a grammar
G:
1. FIRST
2. FOLLOW
Rules for FIRST( ):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → aα is a production, where a is a terminal, then add a to FIRST(X).
4. If X is a non-terminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X).
Rules for FOLLOW( ):
1. If S is the start symbol, then FOLLOW(S) contains $.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Algorithm for construction of predictive parsing table:
Input : Grammar G
Output : Parsing table M
Method :
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
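Assuming FIRST and FOLLOW are already available as dictionaries of sets (with 'eps' standing for ε and FIRST of a terminal being the terminal itself), steps 1 to 3 of this algorithm can be sketched in Python as follows (the function names and this data representation are assumptions made for the example):

def first_of_string(symbols, FIRST):
    """FIRST of a sentential form, given FIRST sets of single symbols."""
    result = set()
    for X in symbols:
        result |= FIRST[X] - {'eps'}
        if 'eps' not in FIRST[X]:
            return result
    result.add('eps')                 # every symbol can derive epsilon
    return result

def build_table(productions, FIRST, FOLLOW):
    """Predictive parsing table M[(A, a)] = rhs, following steps 1-3 above.
       productions is a list of (A, rhs) pairs; a clash means not LL(1)."""
    M = {}
    for A, rhs in productions:
        f = first_of_string(rhs, FIRST)
        targets = (f - {'eps'}) | (FOLLOW[A] if 'eps' in f else set())
        for a in targets:
            if (A, a) in M and M[(A, a)] != rhs:
                raise ValueError(f"grammar is not LL(1): conflict at M[{A}, {a}]")
            M[(A, a)] = rhs
    return M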
Example:
Consider the following grammar:
E → E+T | T
T → T*F | F
F → (E) | id
After eliminating left recursion the grammar is
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
First( ):
FIRST(E)  = { (, id }
FIRST(E') = { +, ε }
FIRST(T)  = { (, id }
FIRST(T') = { *, ε }
FIRST(F)  = { (, id }
Follow( ):
FOLLOW(E)  = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T)  = { +, $, ) }
FOLLOW(T') = { +, $, ) }
FOLLOW(F)  = { +, *, $, ) }
Predictive parsing table:

NON-TERMINAL |  id       |  +          |  *          |  (        |  )        |  $
E            |  E → TE'  |             |             |  E → TE'  |           |
E'           |           |  E' → +TE'  |             |           |  E' → ε   |  E' → ε
T            |  T → FT'  |             |             |  T → FT'  |           |
T'           |           |  T' → ε     |  T' → *FT'  |           |  T' → ε   |  T' → ε
F            |  F → id   |             |             |  F → (E)  |           |
Stack implementation:

STACK        INPUT          OUTPUT
$E           id+id*id $
$E'T         id+id*id $     E → TE'
$E'T'F       id+id*id $     T → FT'
$E'T'id      id+id*id $     F → id
$E'T'        +id*id $
$E'          +id*id $       T' → ε
$E'T+        +id*id $       E' → +TE'
$E'T         id*id $
$E'T'F       id*id $        T → FT'
$E'T'id      id*id $        F → id
$E'T'        *id $
$E'T'F*      *id $          T' → *FT'
$E'T'F       id $
$E'T'id      id $           F → id
$E'T'        $
$E'          $              T' → ε
$            $              E' → ε
LL(1) grammar:
If every entry of the parsing table contains at most one production (i.e., there are no multiply defined entries), the grammar is called an LL(1) grammar.
Consider the following grammar:
S → iEtS | iEtSeS | a
E → b
After left factoring, we have
S → iEtSS' | a
S' → eS | ε
E → b
To construct a parsing table, we need FIRST( ) and FOLLOW( ) for all the non-terminals.
FIRST(S)  = { i, a }
FIRST(S') = { e, ε }
FIRST(E)  = { b }
FOLLOW(S)  = { e, $ }
FOLLOW(S') = { e, $ }
FOLLOW(E)  = { t }
Parsing table:

NON-TERMINAL |  a      |  b      |  e                 |  i            |  t  |  $
S            |  S → a  |         |                    |  S → iEtSS'   |     |
S'           |         |         |  S' → eS , S' → ε  |               |     |  S' → ε
E            |         |  E → b  |                    |               |     |

Since the entry M[S', e] contains more than one production, the grammar is not LL(1).
Actions performed in predictive parsing:
1. Shift
2. Reduce
3. Accept
4. Error
Implementation of predictive parser:
1. Eliminate left recursion and ambiguity from the grammar, and apply left factoring.
2. Construct FIRST( ) and FOLLOW( ) for all non-terminals.
3. Construct the predictive parsing table.
4. Parse the given input string using the stack and the parsing table.
BOTTOM-UP PARSING
Constructing a parse tree for an input string beginning at the leaves and going towards the
root is called bottom-up parsing.
A general type of bottom-up parser is a shift-reduce parser.
SHIFT-REDUCE PARSING
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
Example:
Consider the grammar:
S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde.

REDUCTION (LEFTMOST)           RIGHTMOST DERIVATION
abbcde    (A → b)              S ⇒ aABe
aAbcde    (A → Abc)              ⇒ aAde
aAde      (B → d)                ⇒ aAbcde
aABe      (S → aABe)             ⇒ abbcde
S
The reductions trace out the right-most derivation in reverse.
Handles
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Example:
Consider the grammar:
E → E+E | E*E | (E) | id
and the input string id1+id2*id3.
The rightmost derivation is:
E ⇒ E+E
  ⇒ E+E*E
  ⇒ E+E*id3
  ⇒ E+id2*id3
  ⇒ id1+id2*id3
Reading this derivation in reverse, the substring reduced at each step (id1, then id2, then id3, then E*E, then E+E) is the handle at that step.
Handle pruning:
A rightmost derivation in reverse can be obtained by "handle pruning".
(i.e.) if w is a sentence of the grammar at hand, then w = γn, where γn is the n-th right-sentential form of some rightmost derivation.

Stack implementation of shift-reduce parsing for the input id1+id2*id3:

STACK            INPUT              ACTION
$                id1+id2*id3 $      shift
$ id1            +id2*id3 $         reduce by E → id
$ E              +id2*id3 $         shift
$ E+             id2*id3 $          shift
$ E+id2          *id3 $             reduce by E → id
$ E+E            *id3 $             shift
$ E+E*           id3 $              shift
$ E+E*id3        $                  reduce by E → id
$ E+E*E          $                  reduce by E → E*E
$ E+E            $                  reduce by E → E+E
$ E              $                  accept

Actions in shift-reduce parsing:
* shift : the next input symbol is shifted onto the top of the stack.
* reduce : the parser replaces the handle at the top of the stack with a non-terminal.
* accept : the parser announces successful completion of parsing.
* error : the parser discovers that a syntax error has occurred and calls an error recovery routine.
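To make the shift/reduce mechanics concrete, the sketch below replays a hand-supplied list of actions on a stack; in a real parser the choice between shift and reduce is made by a table, here the action sequence is written out by hand for the trace above (the function name and the action encoding are assumptions made for the example):

def replay_shift_reduce(tokens, actions, grammar):
    """Replay a hand-supplied list of actions to illustrate shift-reduce
       mechanics.  actions entries are 'shift' or ('reduce', production_index);
       grammar is a list of (lhs, rhs) pairs."""
    stack, i = [], 0
    for act in actions:
        if act == 'shift':
            stack.append(tokens[i]); i += 1
        else:                                   # ('reduce', k)
            lhs, rhs = grammar[act[1]]
            assert stack[-len(rhs):] == rhs, "handle not on top of stack"
            del stack[-len(rhs):]               # pop the handle ...
            stack.append(lhs)                   # ... and push the non-terminal
        print(f"{' '.join(stack):<15} {' '.join(tokens[i:]):<12} {act}")
    return stack

G = [('E', ['E', '+', 'E']), ('E', ['E', '*', 'E']), ('E', ['id'])]
replay_shift_reduce(['id', '+', 'id', '*', 'id'],
                    ['shift', ('reduce', 2), 'shift', 'shift', ('reduce', 2),
                     'shift', 'shift', ('reduce', 2), ('reduce', 1), ('reduce', 0)],
                    G)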
Conflicts in shift-reduce parsing:
There are two kinds of conflicts that occur in shift-reduce parsing:
1. Shift-reduce conflict: the parser cannot decide whether to shift or to reduce.
2. Reduce-reduce conflict: the parser cannot decide which of several reductions to make.
1. Shift-reduce conflict:
Example:
Consider the grammar
E → E+E | E*E | id and the input id+id*id
With $E+E on the stack and *id$ remaining, the parser can either reduce by E → E+E or shift:

Choice 1 (reduce first):
STACK      INPUT     ACTION
$ E+E      *id $     reduce by E → E+E
$ E        *id $     shift
$ E*       id $      shift
$ E*id     $         reduce by E → id
$ E*E      $         reduce by E → E*E
$ E        $

Choice 2 (shift first):
STACK      INPUT     ACTION
$ E+E      *id $     shift
$ E+E*     id $      shift
$ E+E*id   $         reduce by E → id
$ E+E*E    $         reduce by E → E*E
$ E+E      $         reduce by E → E+E
$ E        $
2. Reduce-reduce conflict:
Consider the grammar:
M → R+R | R+c | R
R → c
and the input c+c.
With $R+c on the stack and $ remaining, the parser can reduce by R → c or by M → R+c:

Choice 1 (reduce by R → c):
STACK     INPUT    ACTION
$         c+c $    shift
$ c       +c $     reduce by R → c
$ R       +c $     shift
$ R+      c $      shift
$ R+c     $        reduce by R → c
$ R+R     $        reduce by M → R+R
$ M       $

Choice 2 (reduce by M → R+c):
STACK     INPUT    ACTION
$         c+c $    shift
$ c       +c $     reduce by R → c
$ R       +c $     shift
$ R+      c $      shift
$ R+c     $        reduce by M → R+c
$ M       $

Viable prefixes:
* α is a viable prefix of the grammar if there is a w such that αw is a right-sentential form.
* The prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.
* The set of viable prefixes is a regular language.

OPERATOR-PRECEDENCE PARSING
An efficient way of constructing a shift-reduce parser is called operator-precedence parsing. An operator-precedence parser can be constructed from a grammar called an operator grammar. These grammars have the property that no production right side is ε or has two adjacent non-terminals.
Example:
Consider the grammar:
E → EAE | (E) | -E | id
A → + | - | * | / | ^
Since the right side EAE has adjacent non-terminals, the grammar can be rewritten as follows:
E → E+E | E-E | E*E | E/E | E^E | -E | id
Operator precedence relations:
There are three disjoint precedence relations, namely
<·   less than
=    equal to
·>   greater than
The relations have the following meaning:
a <· b   means a yields precedence to b
a = b    means a has the same precedence as b
a ·> b   means a takes precedence over b
Rules for binary operators:
1. If operator θ1 has higher precedence than operator θ2, then make
   θ1 ·> θ2 and θ2 <· θ1
2. If operators θ1 and θ2 are of equal precedence, then make
   θ1 ·> θ2 and θ2 ·> θ1 if the operators are left associative
   θ1 <· θ2 and θ2 <· θ1 if they are right associative
3. Make the following for all operators θ:
   θ <· id ,   id ·> θ
   θ <· ( ,    ( <· θ
   ) ·> θ ,    θ ·> )
   θ ·> $ ,    $ <· θ
Also make
( = ) ,  ( <· ( ,  ( <· id ,  ) ·> ) ,  id ·> ) ,  $ <· id ,  $ <· ( ,  id ·> $ ,  ) ·> $
Example:
The operator-precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E^E | (E) | -E | id
are given in the following table, assuming
1. ^ is of highest precedence and right-associative,
2. * and / are of next higher precedence and left-associative, and
3. + and - are of lowest precedence and left-associative
(together with ( = ), $ <· id, id ·> $, $ <· ( and ) ·> $).
Note that the blanks in the table denote error entries.
TABLE : Operator-precedence relations
      |  +    -    *    /    ^    id   (    )    $
  +   |  ·>   ·>   <·   <·   <·   <·   <·   ·>   ·>
  -   |  ·>   ·>   <·   <·   <·   <·   <·   ·>   ·>
  *   |  ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
  /   |  ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
  ^   |  ·>   ·>   ·>   ·>   <·   <·   <·   ·>   ·>
  id  |  ·>   ·>   ·>   ·>   ·>             ·>   ·>
  (   |  <·   <·   <·   <·   <·   <·   <·   =
  )   |  ·>   ·>   ·>   ·>   ·>             ·>   ·>
  $   |  <·   <·   <·   <·   <·   <·   <·
Operator precedence parsing algorithm:
Input : An input string w and a table of precedence relations.
Output : If w is well formed, a skeletal parse tree, with a placeholder non-terminal E labeling all interior nodes; otherwise, an error indication.
Method : Initially the stack contains $ and the input buffer the string w$. To parse, we execute the following program:

(1)  set ip to point to the first symbol of w$;
(2)  repeat forever
(3)    if $ is on top of the stack and ip points to $ then
(4)      return
       else begin
(5)      let a be the topmost terminal symbol on the stack
         and let b be the symbol pointed to by ip;
(6)      if a <· b or a = b then begin      /* shift */
(7)        push b onto the stack;
(8)        advance ip to the next input symbol;
         end
(9)      else if a ·> b then                /* reduce */
(10)       repeat
(11)         pop the stack
(12)       until the top stack terminal is related by <·
             to the terminal most recently popped
(13)     else error( )
       end
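A small Python sketch of this driver is shown below, using a hand-written precedence table for id, +, *, ( and ) only, which is a subset of the full table given earlier (the table encoding with '<', '=', '>' strings and the placeholder non-terminal 'E' are assumptions made for the example):

PREC = {  # PREC[a][b] is the relation between stack-top terminal a and input b
    '+':  {'+': '>', '*': '<', 'id': '<', '(': '<', ')': '>', '$': '>'},
    '*':  {'+': '>', '*': '>', 'id': '<', '(': '<', ')': '>', '$': '>'},
    'id': {'+': '>', '*': '>',            ')': '>', '$': '>'},
    '(':  {'+': '<', '*': '<', 'id': '<', '(': '<', ')': '='},
    ')':  {'+': '>', '*': '>',            ')': '>', '$': '>'},
    '$':  {'+': '<', '*': '<', 'id': '<', '(': '<'},
}

def op_precedence_parse(tokens):
    """Shift/reduce driver following the algorithm above.
       Only terminals take part in precedence comparisons."""
    stack = ['$']
    tokens = tokens + ['$']
    i = 0
    while True:
        top = next(s for s in reversed(stack) if s in PREC)   # topmost terminal
        b = tokens[i]
        if top == '$' and b == '$':
            return True                                       # accept
        rel = PREC.get(top, {}).get(b)
        if rel in ('<', '='):                                 # shift
            stack.append(b); i += 1
        elif rel == '>':                                      # reduce
            while True:
                popped = stack.pop()
                if popped in PREC:
                    new_top = next(s for s in reversed(stack) if s in PREC)
                    if PREC[new_top].get(popped) == '<':
                        break
            if stack[-1] not in PREC:
                stack.pop()                  # non-terminal below is part of the handle
            stack.append('E')                # placeholder non-terminal
        else:
            raise SyntaxError(f"no precedence relation between {top!r} and {b!r}")

print(op_precedence_parse(['id', '+', 'id', '*', 'id']))      # True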
Stack implementation of operator precedence parsing:
Operator precedence parsing uses a stack and the precedence relation table to implement the above algorithm. It is a shift-reduce parsing method containing all four actions: shift, reduce, accept and error.
The initial configuration of an operator precedence parser is
STACK : $            INPUT : w $
where w is the input string to be parsed.
Example:
Consider the grammar E → E+E | E-E | E*E | E/E | E^E | (E) | id. The input string is id+id*id. The implementation is as follows:
STACK      RELATION   INPUT           COMMENT
$          <·         id+id*id $      shift id
$ id       ·>         +id*id $        pop id
$          <·         +id*id $        shift +
$ +        <·         id*id $         shift id
$ + id     ·>         *id $           pop id
$ +        <·         *id $           shift *
$ + *      <·         id $            shift id
$ + * id   ·>         $               pop id
$ + *      ·>         $               pop *
$ +        ·>         $               pop +
$                     $               accept
Advantages of operator precedence parsing:
1. It is easy to implement.
2. Once the operator precedence relations are established between all pairs of terminals of a grammar, the grammar itself can be ignored; it is not referred to anymore during implementation.
Disadvantages of operator precedence parsing:
1. It is hard to handle tokens like the minus sign (-), which has two different precedences (unary and binary).
2. Only a small class of grammars can be parsed using an operator-precedence parser.
LR PARSERS
An efficient bottom-up syntax analysis technique that can be used to parse a large class of CFGs is called LR(k) parsing. The 'L' is for left-to-right scanning of the input, the 'R' for constructing a rightmost derivation in reverse, and the 'k' for the number of input symbols of lookahead. When 'k' is omitted, it is assumed to be 1.
Advantages of LR parsing:
* It recognizes virtually all programming language constructs for which a CFG can be written.
* It is an efficient non-backtracking shift-reduce parsing method.
* The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
* It detects a syntactic error as soon as possible.
Drawbacks of the LR method:
It is too much work to construct an LR parser by hand for a typical programming-language grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
Types of LR parsing method:
1. SLR - Simple LR
   * Easiest to implement, least powerful.
2. CLR - Canonical LR
   * Most powerful, most expensive.
3. LALR - Look-Ahead LR
   * Intermediate in size and cost between the other two methods.
The LR parsing algorithm:
The schematic form of an LR parser is as follows:
[Figure: LR parser model: input buffer, stack of alternating states and grammar symbols, LR parsing program, and a parsing table with action and goto parts]
It consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
> The driver program is the same for all LR parsers; only the parsing table changes from one parser to another.
> The parsing program reads characters from an input buffer one at a time.
> The program uses a stack to store a string of the form s0 X1 s1 X2 s2 ... Xm sm, where sm is on top. Each Xi is a grammar symbol and each si is a state.
> The parsing table consists of two parts: the action and goto functions.
Action : The parsing program determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai] in the action table, which can have one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.
Goto : The function goto takes a state and a grammar symbol as arguments and produces a state.
LR parsing algorithm:
Input : An input string w and an LR parsing table with functions action and goto for grammar G.
Output : If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method : Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the following program:

set ip to point to the first input symbol of w$;
repeat forever begin
    let s be the state on top of the stack and
    a the symbol pointed to by ip;
    if action[s, a] = shift s' then begin
        push a then s' on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2 * |β| symbols off the stack;
        let s' be the state now on top of the stack;
        push A then goto[s', A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return
    else error( )
end
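The driver loop can be sketched in Python as follows (the function name and the encoding of the action and goto tables as dictionaries are assumptions; any LR parsing table, such as the SLR table constructed in the next section, can be plugged in):

def lr_parse(action, goto, grammar, tokens):
    """LR driver following the algorithm above.
       action[(state, terminal)] is ('shift', s), ('reduce', k) or ('accept',);
       goto[(state, non-terminal)] is a state; grammar[k] = (A, len(beta))."""
    stack = [0]                      # the initial state s0
    tokens = tokens + ['$']
    i, output = 0, []
    while True:
        s, a = stack[-1], tokens[i]
        act = action.get((s, a))
        if act is None:
            raise SyntaxError(f"error in state {s} on input {a!r}")
        if act[0] == 'shift':
            stack.append(a); stack.append(act[1])   # push a then s'
            i += 1
        elif act[0] == 'reduce':
            A, beta_len = grammar[act[1]]
            del stack[-2 * beta_len:]               # pop 2*|beta| symbols
            s_prime = stack[-1]
            stack.append(A); stack.append(goto[(s_prime, A)])
            output.append(act[1])                   # record the production used
        else:                                       # accept
            return output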
CONSTRUCTING SLR(1) PARSING TABLE
To perform SLR parsing, take the grammar as input and do the following:
1. Find the LR(0) items.
2. Complete the closure.
3. Compute goto(I, X), where I is a set of items and X is a grammar symbol.
LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. For example, the production A → XYZ yields the four items
A → . XYZ
A → X . YZ
A → XY . Z
A → XYZ .
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).
Goto operation:
goto(I, X) is defined to be the closure of the set of all items [A → αX . β] such that [A → α . Xβ] is in I.
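The closure and goto operations can be sketched directly in Python (items are represented here as (left-hand side, right-hand side tuple, dot position) triples; this representation and the dictionary form of the grammar are assumptions made for the example):

def closure(items, grammar):
    """closure(I): items are (lhs, rhs_tuple, dot_position) triples;
       grammar maps a non-terminal to its list of right-hand sides."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in grammar:      # dot before a non-terminal B
                for gamma in grammar[rhs[dot]]:
                    item = (rhs[dot], tuple(gamma), 0)      # add B -> . gamma
                    if item not in result:
                        result.add(item); changed = True
    return frozenset(result)

def goto(items, X, grammar):
    """goto(I, X): move the dot over X in every applicable item, then close."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved, grammar)

# Augmented expression grammar, with E' -> E added as the new start production
G = {"E'": [['E']],
     'E': [['E', '+', 'T'], ['T']],
     'T': [['T', '*', 'F'], ['F']],
     'F': [['(', 'E', ')'], ['id']]}
I0 = closure({("E'", ('E',), 0)}, G)      # the initial item set I0
for item in sorted(I0):
    print(item)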
Steps to construct the SLR parsing table for grammar G:
1. Augment G to produce G'.
2. Construct the canonical collection of sets of LR(0) items C for G'.
3. Construct the parsing action function action and the goto function using the following algorithm, which requires FOLLOW(A) for each non-terminal A of the grammar.

Algorithm for construction of SLR parsing table:
Input : An augmented grammar G'
Output : The SLR parsing table functions action and goto for G'
Method :
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
   (a) If [A → α . aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
   (b) If [A → α .] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A).
   (c) If [S' → S .] is in Ii, then set action[i, $] to "accept".
   If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing [S' → . S].
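Continuing the sketch above (and reusing the closure, goto and G definitions from it), the canonical collection of sets of LR(0) items can be computed as follows; for the augmented expression grammar this yields the twelve item sets I0-I11 listed in the example below:

def items(grammar, start):
    """Canonical collection of sets of LR(0) items.
       Returns the list of item sets C = [I0, I1, ...] and the transition map."""
    I0 = closure({(start, tuple(grammar[start][0]), 0)}, grammar)
    C, transitions = [I0], {}
    symbols = {sym for rhss in grammar.values() for rhs in rhss for sym in rhs}
    changed = True
    while changed:
        changed = False
        for i, I in enumerate(C):
            for X in symbols:
                J = goto(I, X, grammar)
                if J and J not in C:
                    C.append(J); changed = True
                if J:
                    transitions[(i, X)] = C.index(J)
    return C, transitions

C, trans = items(G, "E'")
print(len(C), "item sets")   # 12 for the expression grammar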
Example for SLR parsing:
Construct the SLR parsing table for the following grammar:
G : E → E+T | T
    T → T*F | F
    F → (E) | id

The given grammar is
G : E → E+T        ... (1)
    E → T          ... (2)
    T → T*F        ... (3)
    T → F          ... (4)
    F → (E)        ... (5)
    F → id         ... (6)

Step 1 : Convert the given grammar into an augmented grammar.
Augmented grammar :
E' → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id

Step 2 : Find the LR(0) items.
I0 : E' → . E
     E → . E+T
     E → . T
     T → . T*F
     T → . F
     F → . (E)
     F → . id

GOTO (I0, E) :
I1 : E' → E .
     E → E . +T

GOTO (I0, T) :
I2 : E → T .
     T → T . *F

GOTO (I0, F) :
I3 : T → F .

GOTO (I0, ( ) :
I4 : F → ( . E)
     E → . E+T
     E → . T
     T → . T*F
     T → . F
     F → . (E)
     F → . id

GOTO (I0, id) :
I5 : F → id .

GOTO (I1, +) :
I6 : E → E + . T
     T → . T*F
     T → . F
     F → . (E)
     F → . id

GOTO (I2, *) :
I7 : T → T * . F
     F → . (E)
     F → . id

GOTO (I4, E) :
I8 : F → ( E . )
     E → E . +T

GOTO (I4, T) = I2 ,  GOTO (I4, F) = I3 ,  GOTO (I4, ( ) = I4 ,  GOTO (I4, id) = I5

GOTO (I6, T) :
I9 : E → E + T .
     T → T . *F

GOTO (I6, F) = I3 ,  GOTO (I6, ( ) = I4 ,  GOTO (I6, id) = I5

GOTO (I7, F) :
I10 : T → T * F .

GOTO (I7, ( ) = I4 ,  GOTO (I7, id) = I5

GOTO (I8, ) ) :
I11 : F → ( E ) .

GOTO (I8, +) = I6
FOLLOW(E) = { +, ), $ }
FOLLOW(T) = { +, *, ), $ }
FOLLOW(F) = { +, *, ), $ }
SLR parsing table:

STATE |          ACTION                   |      GOTO
      |  id    +     *     (     )    $   |   E    T    F
  0   |  s5               s4              |   1    2    3
  1   |        s6                    acc  |
  2   |        r2    s7         r2   r2   |
  3   |        r4    r4         r4   r4   |
  4   |  s5               s4              |   8    2    3
  5   |        r6    r6         r6   r6   |
  6   |  s5               s4              |        9    3
  7   |  s5               s4              |             10
  8   |        s6               s11       |
  9   |        r1    s7         r1   r1   |
 10   |        r3    r3         r3   r3   |
 11   |        r5    r5         r5   r5   |

Blank entries are error entries.
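For completeness, the table above can be written out and fed to the lr_parse driver sketched earlier in this section (the dictionary encoding and the reuse of that earlier function are assumptions; the reductions it reports, r6 r4 r2 r6 r4 r6 r3 r1, match the trace that follows):

# The SLR table above, written out for the lr_parse driver sketched earlier.
productions = [None,                 # index 0 unused so r1..r6 match the numbering
               ('E', 3), ('E', 1),   # r1: E -> E+T   r2: E -> T
               ('T', 3), ('T', 1),   # r3: T -> T*F   r4: T -> F
               ('F', 3), ('F', 1)]   # r5: F -> (E)   r6: F -> id
ACTION = {}
for st, sym, act in [
    (0,'id',('shift',5)), (0,'(',('shift',4)),
    (1,'+',('shift',6)),  (1,'$',('accept',)),
    (2,'+',('reduce',2)), (2,'*',('shift',7)),  (2,')',('reduce',2)), (2,'$',('reduce',2)),
    (3,'+',('reduce',4)), (3,'*',('reduce',4)), (3,')',('reduce',4)), (3,'$',('reduce',4)),
    (4,'id',('shift',5)), (4,'(',('shift',4)),
    (5,'+',('reduce',6)), (5,'*',('reduce',6)), (5,')',('reduce',6)), (5,'$',('reduce',6)),
    (6,'id',('shift',5)), (6,'(',('shift',4)),
    (7,'id',('shift',5)), (7,'(',('shift',4)),
    (8,'+',('shift',6)),  (8,')',('shift',11)),
    (9,'+',('reduce',1)), (9,'*',('shift',7)),  (9,')',('reduce',1)), (9,'$',('reduce',1)),
    (10,'+',('reduce',3)),(10,'*',('reduce',3)),(10,')',('reduce',3)),(10,'$',('reduce',3)),
    (11,'+',('reduce',5)),(11,'*',('reduce',5)),(11,')',('reduce',5)),(11,'$',('reduce',5))]:
    ACTION[(st, sym)] = act
GOTO = {(0,'E'):1, (0,'T'):2, (0,'F'):3, (4,'E'):8, (4,'T'):2, (4,'F'):3,
        (6,'T'):9, (6,'F'):3, (7,'F'):10}
print(lr_parse(ACTION, GOTO, productions, ['id','+','id','*','id']))
# prints the productions used: [6, 4, 2, 6, 4, 6, 3, 1]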
Stack implementation:
Check whether the input id + id * id is valid or not.

STACK                    INPUT          ACTION
0                        id+id*id $     action[0, id] = s5 ; shift
0 id 5                   +id*id $       action[5, +] = r6 ; reduce by F → id
0 F 3                    +id*id $       goto[0, F] = 3
                                        action[3, +] = r4 ; reduce by T → F
0 T 2                    +id*id $       goto[0, T] = 2
                                        action[2, +] = r2 ; reduce by E → T
0 E 1                    +id*id $       goto[0, E] = 1
                                        action[1, +] = s6 ; shift
0 E 1 + 6                id*id $        action[6, id] = s5 ; shift
0 E 1 + 6 id 5           *id $          action[5, *] = r6 ; reduce by F → id
0 E 1 + 6 F 3            *id $          goto[6, F] = 3
                                        action[3, *] = r4 ; reduce by T → F
0 E 1 + 6 T 9            *id $          goto[6, T] = 9
                                        action[9, *] = s7 ; shift
0 E 1 + 6 T 9 * 7        id $           action[7, id] = s5 ; shift
0 E 1 + 6 T 9 * 7 id 5   $              action[5, $] = r6 ; reduce by F → id
0 E 1 + 6 T 9 * 7 F 10   $              goto[7, F] = 10
                                        action[10, $] = r3 ; reduce by T → T*F
0 E 1 + 6 T 9            $              goto[6, T] = 9
                                        action[9, $] = r1 ; reduce by E → E+T
0 E 1                    $              goto[0, E] = 1
                                        action[1, $] = accept