Operator precedence parsing
Bottom-up parsing methods that follow the idea of shift-reduce parsers
Several flavors: operator, simple, and weak precedence.
In this course, only weak precedence
Main differences with respect to LR parsers:
  - There is no explicit state associated with the parser (and thus no state pushed on the stack)
  - The decision of whether to shift or reduce is taken based solely on the symbol on the top of the stack and the next input symbol (and stored in a shift-reduce table)
  - In case of reduction, the handle is the longest sequence at the top of the stack matching the RHS of a rule
Syntax analysis
Structure of the weak precedence parser
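[Figure: the weak precedence parser. It reads the input a_1 ... a_i ... a_n, keeps a stack X_1 X_2 ... X_{m-1} X_m (X_m on top), and produces an output. It is driven by a shift-reduce table indexed by the symbol on top of the stack (terminals, nonterminals and $) and the next input symbol (terminals and $), with entries Shift/Reduce/Error.]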
Weak precedence parsing algorithm
Create a stack with the special symbol $
a = getnexttoken()
while (True)
    if (Stack == $S and a == $)
        break    // Parsing is over
    X_m = top(Stack)
    if (SRT[X_m, a] = shift)
        Push a onto the stack
        a = getnexttoken()
    elseif (SRT[X_m, a] = reduce)
        Search for the longest RHS that matches the top of the stack
        if no match found
            call error-recovery routine
        Let this rule be denoted by Y → X_{m-r+1} ... X_m
        Pop r elements off the stack
        Push Y onto the stack
        Output Y → X_{m-r+1} ... X_m
    else
        call error-recovery routine
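The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: it hard-codes the expression grammar and the shift/reduce table of the example that follows, and the names `RULES`, `SRT`, `build_table` and `parse` are assumptions made for this sketch. Error recovery is reduced to raising `SyntaxError`.

```python
# Expression grammar of the slides; all Python names here are illustrative.
RULES = [
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
START = "E"


def build_table():
    """Return the shift/reduce table as {(stack_top, lookahead): 'S' or 'R'}."""
    srt = {}
    for top in ("+", "*", "(", "$"):
        for a in ("(", "id"):
            srt[(top, a)] = "S"
    srt[("E", "+")] = srt[("E", ")")] = "S"
    srt[("T", "*")] = "S"
    srt[("E", "$")] = "R"
    for a in ("+", ")", "$"):
        srt[("T", a)] = "R"
    for top in ("F", ")", "id"):
        for a in ("+", "*", ")", "$"):
            srt[(top, a)] = "R"
    return srt


SRT = build_table()


def parse(tokens):
    """Parse a list of tokens; return the rules used, in reduction order."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        a = tokens[pos]
        if stack == ["$", START] and a == "$":
            return output  # parsing is over
        action = SRT.get((stack[-1], a))
        if action == "S":
            stack.append(a)
            pos += 1
        elif action == "R":
            # Keep every rule whose RHS matches the top of the stack,
            # then pick the longest one as the handle.
            candidates = [(lhs, rhs) for lhs, rhs in RULES
                          if tuple(stack[-len(rhs):]) == rhs]
            if not candidates:
                raise SyntaxError(f"no handle on stack {stack}")
            lhs, rhs = max(candidates, key=lambda rule: len(rule[1]))
            del stack[-len(rhs):]
            stack.append(lhs)
            output.append((lhs, rhs))
        else:
            raise SyntaxError(f"unexpected {a!r} after {stack[-1]!r}")
```

Calling `parse(["id", "+", "id", "*", "id"])` returns the eight reductions in the order in which they are performed. The table is written out by hand here only to keep the sketch short; in a real parser it would be generated from the grammar.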
Example for the expression grammar
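Grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id

Shift/reduce table (rows: symbol on top of the stack; columns: next input symbol; S = shift, R = reduce, . = error):

      +   *   (   )   id  $
  E   S   .   .   S   .   R
  T   R   S   .   R   .   R
  F   R   R   .   R   .   R
  +   .   .   S   .   S   .
  *   .   .   S   .   S   .
  (   .   .   S   .   S   .
  )   R   R   .   R   .   R
  id  R   R   .   R   .   R
  $   .   .   S   .   S   .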
Example of parsing
Input word: id + id * id

  Stack          Input             Action
  $              id + id * id $    Shift
  $id            + id * id $       Reduce by F → id
  $F             + id * id $       Reduce by T → F
  $T             + id * id $       Reduce by E → T
  $E             + id * id $       Shift
  $E +           id * id $         Shift
  $E + id        * id $            Reduce by F → id
  $E + F         * id $            Reduce by T → F
  $E + T         * id $            Shift
  $E + T *       id $              Shift
  $E + T * id    $                 Reduce by F → id
  $E + T * F     $                 Reduce by T → T * F
  $E + T         $                 Reduce by E → E + T
  $E             $                 Accept
Precedence relation: principle
We define the (weak precedence) relations ⋖ and ⋗ between symbols of the grammar (terminals or nonterminals)
  - X ⋖ Y if XY appears in the RHS of a rule or if X precedes a reducible word whose leftmost symbol is Y
  - X ⋗ Y if X is the rightmost symbol of a reducible word and Y the symbol immediately following that word
Shift when X_m ⋖ a, reduce when X_m ⋗ a
Reducing changes the precedence relation only at the top of the stack (there is thus no need to shift backward)
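Precedence relation: formal definition
Let $G = (V, \Sigma, R, S)$ be a context-free grammar and \$ a new symbol acting as left and right end-marker for the input word. Define $V' = V \cup \{\$\}$.
The weak precedence relations ⋖ ($\lessdot$) and ⋗ ($\gtrdot$) are defined respectively on $V' \times V$ and $V \times V'$ as follows:
1. $X \lessdot Y$ if $A \to \alpha X B \beta$ is in $R$, and $B \Rightarrow^{+} Y \gamma$
2. $X \lessdot Y$ if $A \to \alpha X Y \beta$ is in $R$
3. $\$ \lessdot X$ if $S \Rightarrow^{+} X \alpha$
4. $X \gtrdot a$ if $A \to \alpha B \beta$ is in $R$, and $B \Rightarrow^{+} \gamma X$ and $\beta \Rightarrow^{*} a \gamma$
5. $X \gtrdot \$$ if $S \Rightarrow^{+} \alpha X$
for some $\alpha$, $\beta$, $\gamma$, and $B$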
Construction of the SR table: shift
Shift relation, ⋖:
Initialize S to the empty set.
1. add $ ⋖ S to S
2. for each production X → L_1 L_2 ... L_k
       for i = 1 to k - 1
           add L_i ⋖ L_{i+1} to S
3. repeat
       for each pair X ⋖ Y in S
           for each production Y → L_1 L_2 ... L_k
               add X ⋖ L_1 to S
   until S did not change in this iteration.
(In "$ ⋖ S", S is the start symbol of the grammar; everywhere else on this slide S is the set of shift pairs being built.)
In step 3, we only need to consider the pairs X ⋖ Y with Y a nonterminal that were added to S at the previous iteration
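The three steps above can be sketched as a small worklist computation. This is an illustrative sketch under assumptions: productions are given as `(lhs, rhs_tuple)` pairs, no right-hand side is empty, and the function name `shift_relation` is made up for this example. The worklist `frontier` implements the remark that only the pairs added at the previous iteration need to be revisited.

```python
# Expression grammar of the slides, as (LHS, RHS) pairs.
RULES = [
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def shift_relation(rules, start):
    """Return the set of pairs (X, Y) such that X is shift-related to Y."""
    by_lhs = {}
    for lhs, rhs in rules:
        by_lhs.setdefault(lhs, []).append(rhs)

    shift = {("$", start)}                 # step 1
    for _, rhs in rules:                   # step 2: adjacent symbols in a RHS
        shift.update(zip(rhs, rhs[1:]))

    frontier = set(shift)                  # step 3: closure
    while frontier:
        new = set()
        for x, y in frontier:
            # by_lhs.get returns nothing for terminals, so only pairs whose
            # second component is a nonterminal contribute new pairs.
            for rhs in by_lhs.get(y, ()):
                pair = (x, rhs[0])
                if pair not in shift:
                    new.add(pair)
        shift |= new
        frontier = new
    return shift
```

For the expression grammar, `shift_relation(RULES, "E")` yields 20 pairs; the pairs whose second component is a terminal are exactly the S entries of the shift/reduce table.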
Example of the expression grammar: shift
Grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id

Step 1:    $ ⋖ E   (E is the start symbol)
Step 2:    E ⋖ +   + ⋖ T   T ⋖ *   * ⋖ F   ( ⋖ E   E ⋖ )
Step 3.1:  $ ⋖ T   + ⋖ F   * ⋖ (   * ⋖ id   ( ⋖ T
Step 3.2:  $ ⋖ F   + ⋖ (   + ⋖ id   ( ⋖ F
Step 3.3:  $ ⋖ (   $ ⋖ id   ( ⋖ (   ( ⋖ id
Construction of the SR table: reduce
Reduce relation, ⋗:
Initialize R to the empty set.
1. add S ⋗ $ to R
2. for each production X → L_1 L_2 ... L_k
       for each pair X ⋖ Y in S
           add L_k ⋗ Y to R
3. repeat
       for each pair X ⋗ Y in R
           for each production X → L_1 L_2 ... L_k
               add L_k ⋗ Y to R
   until R did not change in this iteration.
(In "S ⋗ $", S is the start symbol of the grammar; in step 2, S is the shift relation built previously. R is the set of reduce pairs being built, not the rule set of the grammar.)
In step 3, we only need to consider the pairs X ⋗ Y with X a nonterminal that were added to R at the previous iteration.
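A possible Python rendering of this construction is given below. It is a sketch under assumptions: the shift relation of the expression grammar is written out by hand as input, and the names `pairs`, `SHIFT` and `reduce_relation` are hypothetical. As in the shift case, a worklist restricts step 3 to the pairs added at the previous iteration.

```python
RULES = [
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def pairs(table):
    """Expand {'X': 'a b'} into the set {('X', 'a'), ('X', 'b')}."""
    return {(x, y) for x, ys in table.items() for y in ys.split()}


# Shift relation of the expression grammar (input of the construction).
SHIFT = pairs({"$": "E T F ( id", "E": "+ )", "+": "T F ( id",
               "T": "*", "*": "F ( id", "(": "E T F ( id"})


def reduce_relation(rules, start, shift):
    """Return the set of pairs (X, Y) such that X is reduce-related to Y."""
    by_lhs = {}
    for lhs, rhs in rules:
        by_lhs.setdefault(lhs, []).append(rhs)

    red = {(start, "$")}                                   # step 1
    for lhs, rhs in rules:                                 # step 2
        red.update((rhs[-1], y) for x, y in shift if x == lhs)

    frontier = set(red)                                    # step 3: closure
    while frontier:
        new = {(rhs[-1], y)
               for x, y in frontier
               for rhs in by_lhs.get(x, ())} - red
        red |= new
        frontier = new
    return red
```

For the expression grammar the result contains 16 pairs, one per R entry of the shift/reduce table, and it is disjoint from `SHIFT`, which is what the absence of conflicts in the table means.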
Example of the expression grammar: reduce
Grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id

Step 1:    E ⋗ $
Step 2:    T ⋗ +   T ⋗ )   F ⋗ *
Step 3.1:  T ⋗ $   F ⋗ +   F ⋗ )   ) ⋗ *   id ⋗ *
Step 3.2:  F ⋗ $   ) ⋗ +   id ⋗ +   ) ⋗ )   id ⋗ )
Step 3.3:  ) ⋗ $   id ⋗ $
Weak precedence grammars
Weak precedence grammars are those that can be analysed by a weak precedence parser.
A grammar G = (V, Σ, R, S) is called a weak precedence grammar if it satisfies the following conditions:
1. There exists no pair of productions with the same right-hand side
2. There are no empty right-hand sides (A → ε)
3. There is at most one weak precedence relation between any two symbols
4. Whenever there are two syntactic rules of the form A → αXβ and B → β, we don't have X ⋖ B
Conditions 1 and 2 are easy to check.
Conditions 3 and 4 can be checked by constructing the SR table.
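The four conditions lend themselves to a mechanical check once the two relations are known. The sketch below is only an illustration: the relations of the expression grammar are written out by hand, and `pairs`, `SHIFT`, `REDUCE` and `check_weak_precedence` are names invented for this example. Condition 4 is tested by looking for a rule whose RHS is a proper suffix of another RHS and inspecting the symbol X just before that suffix.

```python
RULES = [
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def pairs(table):
    """Expand {'X': 'a b'} into the set {('X', 'a'), ('X', 'b')}."""
    return {(x, y) for x, ys in table.items() for y in ys.split()}


# Relations of the expression grammar, as sets of (left, right) pairs.
SHIFT = pairs({"$": "E T F ( id", "E": "+ )", "+": "T F ( id",
               "T": "*", "*": "F ( id", "(": "E T F ( id"})
REDUCE = pairs({"E": "$", "T": "+ ) $", "F": "+ * ) $",
                ")": "+ * ) $", "id": "+ * ) $"})


def check_weak_precedence(rules, shift, reduce_rel):
    """Return a list of violated conditions (empty if none is violated)."""
    problems = []
    rhss = [rhs for _, rhs in rules]
    if len(set(rhss)) != len(rhss):
        problems.append("condition 1: two productions share a RHS")
    if any(len(rhs) == 0 for rhs in rhss):
        problems.append("condition 2: empty RHS")
    for pair in sorted(shift & reduce_rel):
        problems.append(f"condition 3: both relations hold for {pair}")
    for a, rhs_a in rules:
        for b, rhs_b in rules:
            n = len(rhs_b)
            # rhs_b must be a proper, non-empty suffix of rhs_a.
            if 0 < n < len(rhs_a) and rhs_a[-n:] == rhs_b:
                x = rhs_a[-n - 1]  # symbol X just before the suffix
                if (x, b) in shift:
                    problems.append(
                        f"condition 4: {a} -> {' '.join(rhs_a)} and "
                        f"{b} -> {' '.join(rhs_b)}, but {x} shifts before {b}")
    return problems
```

`check_weak_precedence(RULES, SHIFT, REDUCE)` returns an empty list for the expression grammar.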
Example of the expression grammar
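Grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → (E)
  F → id

Shift/reduce table (rows: symbol on top of the stack; columns: next input symbol; S = shift, R = reduce, . = error):

      +   *   (   )   id  $
  E   S   .   .   S   .   R
  T   R   S   .   R   .   R
  F   R   R   .   R   .   R
  +   .   .   S   .   S   .
  *   .   .   S   .   S   .
  (   .   .   S   .   S   .
  )   R   R   .   R   .   R
  id  R   R   .   R   .   R
  $   .   .   S   .   S   .

Conditions 1-3 are satisfied (there is no conflict in the SR table)
Condition 4:
  - E → E + T and E → T, but we don't have + ⋖ E (see slide 202, "Example of the expression grammar: shift")
  - T → T * F and T → F, but we don't have * ⋖ T (see slide 202, "Example of the expression grammar: shift")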
Removing ε rules
Removing rules of the form A → ε is not difficult
For each rule with A in the RHS, add a set of new rules consisting of the different combinations of A replaced or not with ε.
Example:
  S → AbA | B
  B → b | c
  A → ε
is transformed into
  S → AbA | Ab | bA | b | B
  B → b | c
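The transformation can be sketched with `itertools.product`, which enumerates every keep-or-drop choice for the occurrences of A in a right-hand side. This is a simplified illustration: the function name `remove_epsilon_rule` is an assumption, only one ε-rule is handled at a time, and alternatives that become empty are dropped instead of being propagated to other nonterminals.

```python
from itertools import product


def remove_epsilon_rule(rules, a):
    """Remove the rule a -> epsilon and expand every use of a.

    rules is a list of (lhs, rhs_tuple) pairs; the empty tuple stands for
    epsilon. Alternatives that become empty are dropped (no cascading).
    """
    result = []
    for lhs, rhs in rules:
        if lhs == a and rhs == ():
            continue  # this is the epsilon rule being removed
        positions = [i for i, sym in enumerate(rhs) if sym == a]
        # One boolean per occurrence of a: True keeps it, False drops it.
        for keep in product([True, False], repeat=len(positions)):
            dropped = {p for p, k in zip(positions, keep) if not k}
            new_rhs = tuple(s for i, s in enumerate(rhs) if i not in dropped)
            if new_rhs and (lhs, new_rhs) not in result:
                result.append((lhs, new_rhs))
    return result


# The example of the slide: S -> AbA | B, B -> b | c, A -> epsilon.
EXAMPLE = [
    ("S", ("A", "b", "A")),
    ("S", ("B",)),
    ("B", ("b",)),
    ("B", ("c",)),
    ("A", ()),
]
```

Applied to `EXAMPLE` with `a="A"`, the function produces the alternatives AbA, Ab, bA, b and B for S, and leaves the rules of B unchanged.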
Summary of weak precedence parsing
Construction of a weak precedence parser:
  - Eliminate ambiguity (or not, see later)
  - Eliminate productions with ε and ensure that there are no two productions with identical RHS
  - Construct the shift/reduce table
  - Check that there is no conflict during the construction
  - Check condition 4 of the definition of weak precedence grammars (slide 205, "Weak precedence grammars")
Using ambiguous grammars with bottom-up parsers
All grammars used in the construction of shift/reduce parsing tables must be unambiguous
We can still create a parsing table for an ambiguous grammar, but there will be conflicts
We can often resolve these conflicts in favor of one of the choices to disambiguate the grammar
Why use an ambiguous grammar?
  - Because the ambiguous grammar is much more natural and the corresponding unambiguous one can be very complex
  - Using an ambiguous grammar may eliminate unnecessary reductions
Example:
  E → E + E | E * E | (E) | id
  ⇒
  E → E + T | T
  T → T * F | F
  F → (E) | id
Set of LR(0) items of the ambiguous expression grammar
E → E + E | E * E | (E) | id
[Figure: sets of LR(0) items of this grammar (Dragon book)]
Follow(E) = {$, +, *, )} ⇒ states 7 and 8 have shift/reduce conflicts for + and *.
Disambiguation
Example:
Parsing of id + id * id will give the configuration (0 E 1 + 4 E 7, * id $)
We can choose:
  - ACTION[7, *] = shift ⇒ precedence to *
  - ACTION[7, *] = reduce E → E + E ⇒ precedence to +
Parsing of id + id + id will give the configuration (0 E 1 + 4 E 7, + id $)
We can choose:
  - ACTION[7, +] = shift ⇒ + is right-associative
  - ACTION[7, +] = reduce E → E + E ⇒ + is left-associative
(same analysis for I8)
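The two choices above can be captured by a small rule driven by a precedence level and an associativity per operator. The sketch below assumes the conventional setting (* binds tighter than +, both left-associative); the tables `PRECEDENCE` and `ASSOC` and the function `resolve` are hypothetical names for this illustration and are not taken from the slides.

```python
# Assumed conventions: higher number means tighter binding.
PRECEDENCE = {"+": 1, "*": 2}
ASSOC = {"+": "left", "*": "left"}


def resolve(rule_operator, lookahead):
    """Resolve a shift/reduce conflict between reducing E -> E op E
    (op = rule_operator) and shifting the operator `lookahead`."""
    if PRECEDENCE[lookahead] > PRECEDENCE[rule_operator]:
        return "shift"    # the incoming operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[rule_operator]:
        return "reduce"   # the operator already on the stack binds tighter
    # Same precedence level: associativity decides.
    return "reduce" if ASSOC[rule_operator] == "left" else "shift"
```

With these tables, state 7 (reduction by E → E + E) shifts on * and reduces on +, and state 8 (reduction by E → E * E) reduces on both operators.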
Error detection and recovery
In table-driven parsers, there is an error as soon as the table contains no entry (or an error entry) for the current stack (state) and input symbols
The least one can do: report a syntax error and give information about the position in the input file and the tokens that were expected at that position
In practice, it is however desirable to continue parsing to report more errors
There are several ways to recover from an error:
  - Panic mode
  - Phrase-level recovery
  - Introduce specific productions for errors
  - Global error repair
Panic-mode recovery
In case of syntax error within a phrase, skip until the next synchronizing token is found (e.g., semicolon, right parenthesis) and then resume parsing
In LR parsing:
  - Scan down the stack until a state s with a goto on a particular nonterminal A is found
  - Discard zero or more input symbols until a symbol a is found that can follow A
  - Stack the state GOTO(s, A) and resume normal parsing
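The input-skipping half of panic mode can be illustrated independently of any parsing table. The sketch below only shows how the input position is advanced past the next synchronizing token; the set `SYNC_TOKENS` and the function `skip_to_sync` are assumptions made for this example, and the stack adjustment described for LR parsing is not modelled.

```python
SYNC_TOKENS = {";", ")"}  # assumed synchronizing tokens


def skip_to_sync(tokens, pos):
    """Return the index just after the next synchronizing token at or
    after pos, or len(tokens) if there is none."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return min(pos + 1, len(tokens))
```

After an error detected at position `pos`, a parser using this helper would report the error, call `skip_to_sync`, and resume with the token at the returned index.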
Phrase-level recovery
Examine each error entry in the parsing table and decide on an appropriate recovery procedure based on the most likely programmer error.
Examples in LR parsing, for E → E + E | E * E | (E) | id:
  - id + * id: * is unexpected after a +: report a missing operand error, push an arbitrary number on the stack and go to the appropriate next state
  - id + id ) + id: report an unbalanced right parenthesis error and remove the right parenthesis from the input
Other error recovery approaches
Introduce specific productions for detecting errors:
  - Add rules in the grammar to detect common errors
  - Examples for a C compiler:
      I → if E I (parentheses are missing around the expression)
      I → if (E) then I (then is not needed in C)
Global error repair:
  - Try to find globally the smallest set of insertions and deletions that would turn the program into a syntactically correct string
  - Very costly and not always effective
Building the syntax tree
Parsing algorithms presented so far only check that the program is syntactically correct
In practice, the parser also needs to build the parse tree (also called concrete syntax tree)
Its construction is easily embedded into the parsing algorithm
Top-down parsing:
  - Recursive descent: let each parsing function return the sub-trees for the parts of the input they parse
  - Table-driven: each nonterminal on the stack points to its node in the partially built syntax tree. When the nonterminal is replaced by one of its RHS, nodes for the symbols on the RHS are added as children to the nonterminal node
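For the recursive descent case, the idea can be sketched as follows. Two assumptions are made that do not come from the slides: the left-recursive expression grammar is rewritten in the iterative form E → T { + T }, T → F { * F }, F → (E) | id, and the functions return simplified trees as nested tuples rather than full concrete parse trees. The name `parse_expression` is illustrative.

```python
def parse_expression(tokens):
    """Parse a token list and return a tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_e():
        tree = parse_t()              # sub-tree for the first operand
        while peek() == "+":
            expect("+")
            tree = ("+", tree, parse_t())
        return tree

    def parse_t():
        tree = parse_f()
        while peek() == "*":
            expect("*")
            tree = ("*", tree, parse_f())
        return tree

    def parse_f():
        if peek() == "(":
            expect("(")
            tree = parse_e()
            expect(")")
            return tree
        expect("id")
        return "id"

    tree = parse_e()
    expect("$")  # the whole input must have been consumed
    return tree
```

Each parsing function returns the sub-tree for the part of the input it consumed, and its caller combines the returned sub-trees into a larger one.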
Building the syntax tree
Bottom-up parsing:
  - Each stack element points to a subtree of the syntax tree
  - When performing a reduce, a new syntax tree is built with the nonterminal at the root and the popped-off stack elements as children
Note:
  - In practice, the concrete syntax tree is not built but rather a simplified abstract syntax tree
  - Depending on the complexity of the compiler, the syntax tree might not even be constructed
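The bottom-up case can be illustrated with a stack whose elements are tree nodes. The sketch below is a minimal illustration with invented names (`Node`, `shift`, `reduce_by`, `to_sexpr`); the sequence of shift and reduce actions for the input id + id is written out by hand instead of being driven by a shift/reduce table.

```python
class Node:
    """A node of the concrete syntax tree."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def shift(stack, token):
    """Push a leaf for the shifted token."""
    stack.append(Node(token))


def reduce_by(stack, lhs, r):
    """Pop r subtrees and push a node labelled lhs having them as children."""
    children = stack[-r:]
    del stack[-r:]
    stack.append(Node(lhs, children))


def to_sexpr(node):
    """Render a tree as a parenthesised string, for inspection."""
    if not node.children:
        return node.label
    return "(" + " ".join([node.label] + [to_sexpr(c) for c in node.children]) + ")"


# Actions of a bottom-up parse of "id + id" with the expression grammar.
stack = []
shift(stack, "id")
reduce_by(stack, "F", 1)   # F -> id
reduce_by(stack, "T", 1)   # T -> F
reduce_by(stack, "E", 1)   # E -> T
shift(stack, "+")
shift(stack, "id")
reduce_by(stack, "F", 1)   # F -> id
reduce_by(stack, "T", 1)   # T -> F
reduce_by(stack, "E", 3)   # E -> E + T
```

At the end the stack holds a single node, the root of the tree. In a complete parser, `shift` and `reduce_by` would be called from the shift and reduce branches of the parsing loop.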
Conclusion: top-down versus bottom-up parsing
Top-down:
  - Easier to implement (recursively), enough for most standard programming languages
  - Need to modify the grammar, sometimes strongly; less general than bottom-up parsers
  - Used in most hand-written compilers
Bottom-up:
  - More general, less strict rules on the grammar, SLR(1) powerful enough for most standard programming languages
  - More difficult to implement, less easy to maintain (add new rules, etc.)
  - Used in most parser generators like Yacc or Bison (but JavaCC is top-down)
For your project
The choice of a parsing technique is left open for the project, but we ask you to implement the parser by yourself (Yacc, Bison or other parser generators are forbidden)
Weak precedence parsing was the recommended method in previous implementations of this course
Motivate your choice in your report and explain any transformation you had to apply to your grammar to make it fit the parser's constraints
To avoid mistakes, you should build the parsing tables by program