CD - CH3 - Syntax Analysis (Parsing)

The document outlines the concepts of syntax analysis in compiler design, focusing on the role of syntax analyzers (parsers) in verifying program structure and reporting errors. It discusses context-free grammars (CFGs) as a specification tool for programming languages and details various parsing techniques, including top-down and bottom-up methods. Additionally, it covers error handling strategies and the construction of predictive parsing tables using FIRST and FOLLOW functions.

CSE 4310 – Compiler Design

CH3 – Syntax Analysis

1
Outline
• Introduction
• What is a Syntax Analyzer (Parser)?
• The Role of Syntax Analysis (Parsing)
• Specification of Programming Languages
– Context Free Grammars (CFGs)
• Parsing Context Free Languages
– Deterministic Pushdown Automata (DPDA)
• Parsing Techniques
– Top-down Parsing [LL(1) and Recursive-Descent Parsing
(RDP)]
– Bottom-up Parsing [LR Parsing]

2
Introduction
• Every programming language has precise grammar rules that
describe the syntactic structure of well-formed programs.
• In C, for example, the rules state
– how functions are made out of parameter lists, declarations, and
statements;
– how statements are made of expressions, etc.
• Grammars are easy to understand, and parsers for
programming languages can be constructed automatically
from certain classes of grammars.
• Parsers or syntax analyzers are generated for a particular
grammar.
• Context-free grammars (CFGs) are usually used for the
syntax specification of programming languages.

3
What is a Syntax Analyzer (Parser)?
• A Syntax Analyzer (Parser) for a grammar of a programming
language:
1. Verifies that the string of tokens for a program in that language
can indeed be generated from that grammar
2. Reports any syntax errors in the program
3. Constructs a parse tree representation of the program (not
necessarily explicit)
• The parser usually calls the lexical analyzer to supply it
with the next token
• It can be hand-written or automatically generated when
necessary
• Is based on context-free grammars (CFGs)
– Grammars are generative mechanisms like regular expressions
– Pushdown automata are machines recognizing context-free
languages (like FSA for RL)
4
The Role of Syntax Analyzer (Parser)
• The parser obtains a string of tokens from the lexical
analyzer, as shown in the figure below, and verifies
that the string of token names can be generated by
the grammar for the source language.

5
The Role of Syntax Analyzer (Parser)
• The Parser performs:
– Context-free syntax analysis
– Guides context-sensitive analysis
– Constructs an intermediate representation
– Produces meaningful error messages
– Attempts error correction
• The parser is expected to report any syntax errors in an
intelligible fashion and to recover from commonly
occurring errors to continue processing the remainder
of the program.
• Conceptually, for well-formed programs, the parser
constructs a parse tree and passes it to the rest of the
compiler for further processing.
6
The Role of Syntax Analyzer (Parser)
Syntax Error Handling
• Common programming errors include:
– Lexical errors, syntactic errors, semantic errors, and logical errors
• Error handler goals are:
– Report the presence of errors clearly and accurately
– Recover from each error quickly enough to detect subsequent errors
– Add minimal overhead to the processing of correct programs
• Common error-recovery strategies include:
1. Panic mode recovery: discard input symbols one at a time until one of a
designated set of synchronization tokens is found.
2. Phrase level recovery: replace a prefix of the remaining input by some
string that allows the parser to continue.
3. Error productions: augment the grammar with productions that
generate the erroneous constructs.
4. Global correction: choose a minimal sequence of changes to obtain a
globally least-cost correction.
7
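Strategy 1 can be made concrete with a minimal sketch of panic-mode skipping; the synchronization set `{';', '}'}` below is a hypothetical choice for illustration, not one prescribed by the slides:

```python
# Minimal sketch of panic-mode recovery: on an error, discard input
# tokens one at a time until a designated synchronization token appears.
SYNC_TOKENS = {";", "}"}  # hypothetical synchronization set


def skip_to_sync(tokens, i):
    """Return the index of the first synchronization token at or after i."""
    while i < len(tokens) and tokens[i] not in SYNC_TOKENS:
        i += 1
    return i


tokens = ["x", "=", "@", "#", ";", "y", "=", "2", ";"]
resume = skip_to_sync(tokens, 2)   # suppose an error was detected at index 2
print(resume)                      # 4: index of ';', where parsing can resume
```

After skipping, the parser resumes at the synchronization token, trading some skipped input for the ability to report further errors.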
Specification of Programming Languages
• A grammar gives a precise, yet easy-to-understand, syntactic
specification of a programming language.
– From certain classes of grammars, we can automatically construct
an efficient parser that determines the syntactic structure of a
source program.
– As a side benefit, the parser-construction process can reveal
syntactic ambiguities and trouble spots that might have slipped
through the initial design phase of a language.
– The structure imparted to a language by a properly designed
grammar is useful for translating source programs into correct
object code and for detecting errors.
• A grammar allows a language to be evolved or developed
iteratively, by adding new constructs to perform new tasks.
– These new constructs can be integrated more easily into an
implementation that follows the grammatical structure of the
language.
8
Specification of Programming Languages

• Context-Free Grammars (CFGs) are used as a
tool to describe the syntax of a programming
language.
• Parsers use a CFG as an input to construct
the parse tree.
– [Please refer to your Formal Language and
Automata Theory course for more information
about CFGs and related concepts like:
• Derivation, leftmost and rightmost derivation, ambiguity,
• Pushdown Automata, etc.]
9
Parsing Context Free Languages
• In practice we need DPDAs, since they have exactly one possible move at any
instant. Our parsers are all DPDAs. [Read more about deterministic PDAs.]
• Parsing is the process of constructing a parse tree for a sentence generated
by a given grammar.
• If there are no restrictions on the language and the form of grammar used,
parsers for context-free languages require O(n³) time (n being the length of
the string parsed).
Example algorithms:
• Cocke-Younger-Kasami's (CYK) algorithm
• Earley's algorithm
• Subsets of context-free languages typically require O(n) time
– Predictive parsing using LL(1) grammars (top-down parsing method)
– Shift-reduce parsing using LR(1) grammars (bottom-up parsing method)

10
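To make the O(n³) bound concrete, here is a sketch of the CYK algorithm for a toy grammar in Chomsky normal form; the grammar S → AB, A → a, B → b is an illustrative assumption, not one used elsewhere in these slides:

```python
# Sketch of the CYK algorithm: three nested loops over span length,
# start position, and split point give the O(n^3) running time.
from itertools import product

rules = {("A", "B"): {"S"}}            # binary rules: (B, C) -> {A | A -> BC}
lexical = {"a": {"A"}, "b": {"B"}}     # terminal rules: A -> a


def cyk(w):
    n = len(w)
    # table[i][j] = set of nonterminals deriving the substring w[i : i+j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][0] = set(lexical.get(ch, set()))
    for span in range(2, n + 1):                  # substring length
        for i in range(n - span + 1):             # start position
            for k in range(1, span):              # split point
                for B, C in product(table[i][k - 1], table[i + k][span - k - 1]):
                    table[i][span - 1] |= rules.get((B, C), set())
    return "S" in table[0][n - 1]


print(cyk("ab"))   # True
print(cyk("ba"))   # False
```

CYK requires the grammar in Chomsky normal form; Earley's algorithm achieves the same worst-case bound without that restriction.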
Parsing Context Free Languages
• There are three general types of parsers for grammars: universal, top-down,
and bottom-up.
1. Universal parsing methods such as the Cocke-Younger-Kasami algorithm and
Earley's algorithm can parse any grammar [Read more on this].
– These general methods are, however, too inefficient to use in production
compilers.
• The methods commonly used in compilers can be classified as being either
top-down or bottom-up.
2. Top-Down Methods: as implied by their names, top-down methods build
parse trees from the top (root) to the bottom (leaves ).
3. Bottom-Up Methods: start from the leaves and work their way up to the root
to build the parse tree.
• In either case, the input to the parser is scanned from left to right, one
symbol at a time.
– The most efficient top-down and bottom-up methods work only for sub-classes
of grammars, but several of these classes, particularly the LL and LR grammars, are
expressive enough to describe most of the syntactic constructs in modern
programming languages.

11
Parsing Techniques
• Syntax analyzers follow production rules
defined by means of a context-free grammar.
• The way the production rules are implemented
(derivation) divides parsing into two general
types (approaches):
1. Top-down parsing
• Recursive Descent Parsing
• Non-recursive Predictive Parsing
2. Bottom-up parsing
• LR Parsing
12
Top-down Parsing using LL Grammars
• The top-down parsing technique parses the input, and
starts constructing a parse tree from the root node
gradually moving down to the leaf nodes.
• The types of top-down parsing are depicted below:

13
Top-down Parsing using LL Grammars
Recursive Descent Parsing (RDP)

• This method of top-down parsing can be considered as
an attempt to find the leftmost derivation for an input
string.
• It may involve backtracking (making repeated scans of the
input),
– which can be inefficient.
• This parsing technique is regarded as recursive as it uses
a context-free grammar, which is recursive in nature.
14
Top-down Parsing using LL Grammars
Recursive Descent Parsing (RDP)
• To construct the parse tree using RDP:
1. Create a one-node tree consisting of S (the start symbol).
2. Two pointers, one for the tree and one for the input,
will be used to indicate where the parsing process is.
3. Initially, they will be on S and the first input symbol,
respectively.
4. Then we use the first S-production to expand the
tree. The tree pointer will be positioned on the left-
most symbol of the newly created sub-tree.
15
Top-down Parsing using LL Grammars
Recursive Descent Parsing (RDP)
• To construct the parse tree using RDP:
5. As the symbol pointed by the tree pointer matches that of
the symbol pointed by the input pointer, both pointers are
moved to the right.
6. Whenever the tree pointer points on a non-terminal, we
expand it using the first production of the non-terminal.
7. Whenever the pointers point on different terminals, the
production that was used is not correct, thus another
production should be used. We have to go back to the step
just before we replaced the non-terminal and use another
production.(Backtracking)
8. If we reach the end of the input and the tree pointer
passes the last symbol of the tree, we have finished
parsing.
16
Top-down Parsing using LL Grammars
Recursive Descent Parsing (RDP)
• Example:

17
Top-down Parsing using LL Grammars
Recursive Descent Parsing (RDP)
• Exercise: Given the grammar G with
productions:
S → cAd
A → ab | a
– Draw the parse tree for the input string "cad" using
RDP.
(Hint: follow the previous steps)
18
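The steps above can be sketched as a backtracking recursive-descent parser. The grammar S → cAd, A → ab | a is a standard small example, assumed here for illustration; on the input "cad" it backtracks exactly once, from A → ab to A → a:

```python
# Backtracking recursive-descent parser for the illustrative grammar
#   S -> c A d
#   A -> a b | a
# Each function returns the input position after a successful match,
# or None on failure (which triggers backtracking in the caller).

def parse_S(s, i):
    # S -> c A d
    if i < len(s) and s[i] == 'c':
        j = parse_A(s, i + 1)
        if j is not None and j < len(s) and s[j] == 'd':
            return j + 1
    return None


def parse_A(s, i):
    # Try A -> a b first; on failure, backtrack and try A -> a.
    if i + 1 < len(s) and s[i] == 'a' and s[i + 1] == 'b':
        return i + 2
    if i < len(s) and s[i] == 'a':
        return i + 1
    return None


def accepts(s):
    return parse_S(s, 0) == len(s)


print(accepts("cad"))   # True (requires backtracking from A -> ab to A -> a)
print(accepts("cabd"))  # True
print(accepts("cbd"))   # False
```

Trying the A-productions in a fixed order and retrying on failure is exactly the backtracking described in steps 6-7 above.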
Top-down Parsing using LL Grammars
Predictive Parsing

19
Top-down Parsing using LL Grammars
Predictive Parsing
• It is possible to build a non-recursive parser by explicitly
maintaining a stack.
• This method uses a parsing table that determines the
next production to be applied.
• Predictive parsing uses a stack and a parsing table to
parse the input and generate a parse tree.
• Both the stack and the input contain an end symbol $ to
denote that the stack is empty and the input is consumed.
• The parser refers to the parsing table to take any decision
on the input and stack element combination.
20
Top-down Parsing using LL Grammars
Predictive Parsing
• Top-down parsing using predictive parsing, traces the left-most
derivation of the string while constructing the parse tree.
– Starts from the start symbol of the grammar, and “predicts” the next
production used in the derivation.
– Such “prediction” is aided by parsing tables (constructed off-line).
– The next production to be used in the derivation is determined using the
next input symbol to lookup the parsing table (look-ahead symbol).

• Placing restrictions on the grammar ensures that no slot in the


parsing table contains more than one production.
– At the time of parsing table construction, if two productions become
eligible to be placed in the same slot of the parsing table, the grammar is
declared unfit for predictive parsing.

21
Top-down Parsing using LL Grammars
Predictive Parsing
• The non-recursive predictive parser uses an input
buffer, a stack and a parsing table M.

22
Top-down Parsing using LL Grammars
Predictive Parsing
• The input buffer contains the string to be parsed
followed by $ (the right end marker).

• The stack contains a sequence of grammar symbols


with $ at the bottom.
– Initially, the stack contains the start symbol of the
grammar followed by $.

• The parsing table (M) is a two-dimensional array
M[X, a] where X is a non-terminal of the grammar
and a is a terminal or $.
23
Top-down Parsing using LL Grammars
Predictive Parsing
• The parser program behaves as follows:
– The program always considers X, the symbol on top of
the stack, and a, the current input symbol.
– There are three possibilities:
i. X = a = $ : the parser halts and announces successful
completion of parsing
ii. X = a ≠ $ : the parser pops X off the stack and
advances the input pointer to the next symbol
iii. X is a non-terminal : the program consults entry
M[X, a], which can be an X-production or an error
entry.
24
Top-down Parsing using LL Grammars
Predictive Parsing
• The parser program behaves as follows:
(cont.…)
iii. X is a non-terminal : the program consults entry
M[X, a], which can be an X-production or an error
entry:
– If M[X, a] = {X → UVW}, X on top of the stack will be replaced
by UVW, pushed in reverse order so that U ends up at the top of
the stack. As an output, any code associated with this
X-production can be executed.
– If M[X, a] = error, the parser calls the error recovery
method.

25
Top-down Parsing using LL Grammars
Predictive Parsing
• Example: Consider a grammar G:
E → TR
R → +TR
R → -TR
R→λ
T → 0|1|...|9
• The parsing table for G is:

X\a   0        1        ...  9        +          -          $
E     E → TR   E → TR   ...  E → TR   Error      Error      Error
R     Error    Error    ...  Error    R → +TR    R → -TR    R → λ
T     T → 0    T → 1    ...  T → 9    Error      Error      Error
26
Top-down Parsing using LL Grammars
Predictive Parsing
• Example (cont.): parsing the input string 1 + 2:

Input   Stack
1+2$    E$
1+2$    TR$
1+2$    1R$
+2$     R$
+2$     +TR$
2$      TR$
2$      2R$
$       R$
$       $
27
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
• To construct the parsing table we make use of
two functions:
– FIRST and FOLLOW to fill in the entries of a
predictive parsing table for grammar G, whenever
possible.
• Sets of tokens yielded by the FOLLOW function
can be used as synchronizing tokens during
panic-mode error recovery.
28
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
FIRST
• FIRST(α) = set of terminals that begin the strings
derived from α.
• If α ⇒ λ in zero or more steps, λ is in FIRST(α).
• FIRST(X), where X is a grammar symbol, can be found
using the following rules:
i. If X is a terminal, then FIRST(X) = {X}
ii. If X is a non-terminal
• If X → λ is a production, then add λ to FIRST(X)
• For each production X → Y1 Y2 … Yk, place a in FIRST(X) if for some
i, a ∈ FIRST(Yi) and λ ∈ FIRST(Yj), for 1 ≤ j < i
• If λ ∈ FIRST(Yi), for i = 1, …, k then λ ∈ FIRST(X)
29
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
FIRST (cont....)
• For any string α = X1 X2 … Xk
– Add all non-λ symbols of FIRST(X1) to FIRST(α)
– Add all non-λ symbols of FIRST(Xi), for i ≠ 1, if for
all j < i, λ ∈ FIRST(Xj)
– λ ∈ FIRST(α) if λ ∈ FIRST(Xi) for 1 ≤ i ≤ k.

30
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
FIRST (cont....)
• Example: Given a grammar G with productions
E → TR
R → +TR
R → -TR
R→λ
T → 0|1|...|9
• Find: FIRST(TR), FIRST(+TR), FIRST(-TR) and FIRST(0)?
• Solution:
– FIRST(TR) = FIRST(T) = {0, 1, …, 9}
– FIRST(+TR) = {+}
– FIRST(-TR) = {-}
– FIRST(0) = {0}
31
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
FOLLOW
• FOLLOW(X) = set of terminals that can appear
immediately to the right of X in some sentential
form.
1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in
FIRST(β), except λ, should be added to
FOLLOW(B).
3. If there is a production A → αB, or A → αBβ and λ ∈
FIRST(β), all elements of FOLLOW(A) should be
added to FOLLOW(B).
32
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
FOLLOW(cont....)
• Example: Given a grammar G with productions
E → TR
R → +TR
R → -TR
R→λ
T → 0|1|...|9
• Find: FOLLOW(E), FOLLOW(R) and FOLLOW(T)?
• Solution:
FOLLOW (E) = {$}
FOLLOW(R) = FOLLOW(E) = {$}
FOLLOW(T) = (FIRST(R) − {λ}) ∪ FOLLOW(R) = {+, -, $}
33
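The FIRST and FOLLOW rules above can be computed by fixed-point iteration (one standard way to implement them); a sketch for the example grammar, with λ denoting the empty string as on the slides:

```python
# Fixed-point computation of FIRST and FOLLOW for the grammar
#   E -> TR,  R -> +TR | -TR | λ,  T -> 0 | 1 | ... | 9
LAMBDA = "λ"
grammar = {
    "E": [["T", "R"]],
    "R": [["+", "T", "R"], ["-", "T", "R"], [LAMBDA]],
    "T": [[d] for d in "0123456789"],
}
nonterminals = set(grammar)


def first_of_string(symbols, first):
    """FIRST of a string X1...Xk, per the rules for strings above."""
    out = set()
    for X in symbols:
        fx = first[X] if X in nonterminals else {X}
        out |= fx - {LAMBDA}
        if LAMBDA not in fx:
            return out
    out.add(LAMBDA)          # every Xi can derive λ (or the string is empty)
    return out


first = {A: set() for A in nonterminals}
changed = True
while changed:               # iterate until no set grows
    changed = False
    for A, bodies in grammar.items():
        for body in bodies:
            symbols = [s for s in body if s != LAMBDA]
            f = first_of_string(symbols, first)
            if not f <= first[A]:
                first[A] |= f
                changed = True

follow = {A: set() for A in nonterminals}
follow["E"].add("$")         # rule 1: $ into FOLLOW of the start symbol
changed = True
while changed:
    changed = False
    for A, bodies in grammar.items():
        for body in bodies:
            symbols = [s for s in body if s != LAMBDA]
            for i, B in enumerate(symbols):
                if B not in nonterminals:
                    continue
                tail = first_of_string(symbols[i + 1:], first)
                new = tail - {LAMBDA}        # rule 2: FIRST(β) − {λ}
                if LAMBDA in tail:           # rule 3: β empty or nullable
                    new |= follow[A]
                if not new <= follow[B]:
                    follow[B] |= new
                    changed = True

print(sorted(first["R"]))    # ['+', '-', 'λ']
print(sorted(follow["T"]))   # ['$', '+', '-']
```

The computed sets match the worked example: FOLLOW(T) = {+, -, $} and FOLLOW(R) = FOLLOW(E) = {$}.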
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
• Now let us see how the parsing table M is
constructed:
1. For each production of the form A → α of the
grammar do
a) For each terminal a in FIRST(α), add A → α to
M[A, a].
b) If λ is in FIRST(α), add A → α to M[A, b] for each b
in FOLLOW(A). If λ is in FIRST(α) and $ is in
FOLLOW(A), add A → α to M[A, $].
2. Make each undefined entry of M an error.
34
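The two construction rules translate directly into code. In this sketch the FIRST set of each production body and the FOLLOW sets are taken as given (they are the standard sets for the grammar E → TE', E' → +TE' | λ, T → FT', T' → *FT' | λ, F → (E) | id used in the next example):

```python
# Sketch of predictive-table construction from given FIRST/FOLLOW sets.
LAMBDA = "λ"
# (nonterminal, production body, FIRST of the body)
productions = [
    ("E",  ("T", "E'"),      {"(", "id"}),
    ("E'", ("+", "T", "E'"), {"+"}),
    ("E'", (LAMBDA,),        {LAMBDA}),
    ("T",  ("F", "T'"),      {"(", "id"}),
    ("T'", ("*", "F", "T'"), {"*"}),
    ("T'", (LAMBDA,),        {LAMBDA}),
    ("F",  ("(", "E", ")"),  {"("}),
    ("F",  ("id",),          {"id"}),
]
follow = {"E": {")", "$"}, "E'": {")", "$"},
          "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
          "F": {"+", "*", ")", "$"}}

M = {}
for A, alpha, first_alpha in productions:
    for a in first_alpha - {LAMBDA}:   # rule 1a: a in FIRST(alpha)
        M[A, a] = alpha
    if LAMBDA in first_alpha:          # rule 1b: FOLLOW(A) already contains $
        for b in follow[A]:
            M[A, b] = alpha

print(M["E", "id"])    # ('T', "E'")
print(M["T'", "+"])    # ('λ',)
```

Undefined entries of M are the error entries (rule 2); a real implementation would also report a conflict whenever a slot is written twice.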
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
Example: Given a grammar G with productions:
E → TE'
E' → +TE' | λ
T → FT'
T' → *FT' | λ
F → (E) | id
• Then construct the parsing table for G and construct a
parse tree for the input id + id * id.
Solution:
• A predictive parsing table for the grammar G is given
below.
– Blanks are error entries; non-blanks indicate a production
with which to expand the top nonterminal on the stack.
35
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
Solution: cont.…
Steps:
1. Compute the FIRST and FOLLOW sets for each nonterminal.
2. Construct the table using the table-construction steps above.

                         INPUT SYMBOL
NONTERMINAL  id        +           *            (         )        $
E            E → TE'                            E → TE'
E'                     E' → +TE'                          E' → λ   E' → λ
T            T → FT'                            T → FT'
T'                     T' → λ      T' → *FT'              T' → λ   T' → λ
F            F → id                             F → (E)
36
Top-down Parsing using LL Grammars
Constructing a Predictive Parsing Table
Solution: cont.…
• Now, with input id + id * id the predictive parser makes the
sequence of moves shown in the table below.
• The input pointer points to the leftmost symbol of the string in
the INPUT column.

STACK       INPUT            OUTPUT
$E          id + id * id $
$E'T        id + id * id $   E → TE'
$E'T'F      id + id * id $   T → FT'
$E'T'id     id + id * id $   F → id
$E'T'       + id * id $
$E'         + id * id $      T' → λ
$E'T+       + id * id $      E' → +TE'
$E'T        id * id $
$E'T'F      id * id $        T → FT'
$E'T'id     id * id $        F → id
$E'T'       * id $
$E'T'F*     * id $           T' → *FT'
$E'T'F      id $
$E'T'id     id $             F → id
$E'T'       $
$E'         $                T' → λ
$           $                E' → λ
37
Top-down Parsing using LL Grammars
LL Parsers
• An LL parser accepts an LL grammar.
• An LL grammar is a subset of context-free grammar, but with
some restrictions to get a simplified version, in order
to achieve easy implementation.
• LL grammars can be implemented by means of both
algorithms, namely recursive-descent and table-driven.
• An LL parser is denoted as LL(k).
– The first L in LL(k) stands for parsing the input from left to right,
– The second L in LL(k) stands for left-most derivation, and
– k itself represents the number of look-aheads.
• Generally k = 1, so LL(k) may also be written as LL(1).
38
Top-down Parsing using LL Grammars
LL(1) Grammars
• A grammar for which the parsing table does not have
multiply-defined entries is called an LL(1) grammar.
Determining LL(1) grammars:
• A grammar is LL(1) iff for each pair of A-productions
A → α | β
– For no terminal a do both α and β derive strings beginning
with a.
– At most one of α and β can derive the empty string (λ).
– If β ⇒* λ, then α does not derive any string beginning with
a terminal in FOLLOW(A).

39
Top-down Parsing using LL Grammars
LL(1) Grammars
Example: Let G be given by:
S → iEtSS' | a
S' → eS | λ
E → b
– Is G an LL(1) grammar?
Solution: The parsing table for this grammar is given below:

                   INPUT SYMBOL
NONTERMINAL  a      b      e         i            t   $
S            S → a                   S → iEtSS'
S'                         S' → eS
                           S' → λ                     S' → λ
E                   E → b
40
Top-down Parsing using LL Grammars
LL(1) Grammars
Solution: cont.…
• The grammar G is not LL(1).
– The entry for M[S', e] contains both S' → eS and
S' → λ, since FOLLOW(S') = {e, $}.
– The grammar is ambiguous, and the ambiguity is
manifested by a choice in what production to use
when an e (else) is seen.

41
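The conflict can be reproduced mechanically: applying table-construction rule (1) to the two S'-productions writes the same slot twice. The FIRST/FOLLOW values below are taken as given from the solution above:

```python
# Sketch: detecting that the dangling-else grammar is not LL(1).
# (nonterminal, body, FIRST of body without λ, whether λ ∈ FIRST(body))
entries = [
    ("S'", ("e", "S"), {"e"}, False),   # S' -> eS, FIRST = {e}
    ("S'", ("λ",), set(), True),        # S' -> λ
]
follow_S_prime = {"e", "$"}             # FOLLOW(S')

M = {}
conflicts = []
for lhs, body, first, has_lambda in entries:
    targets = set(first)
    if has_lambda:                      # rule 1b: use FOLLOW(S')
        targets |= follow_S_prime
    for a in targets:
        if (lhs, a) in M:               # slot already filled: LL(1) conflict
            conflicts.append(((lhs, a), M[lhs, a], body))
        M[lhs, a] = body

print(conflicts)   # M[S', e] receives both S' -> eS and S' -> λ
```

A table generator would report exactly this multiply-defined entry and declare the grammar unfit for predictive parsing.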
Top-down Parsing using LL Grammars
LL Parsing Algorithm
Input: string ω, parsing table M for grammar G
Output: If ω is in L(G), a left-most derivation of ω; error otherwise.
Initial State: $S on stack (with S being the start symbol)
ω$ in the input buffer
SET ip to point to the first symbol of ω$.
repeat
    let X be the top stack symbol and a the symbol pointed to by ip.
    if X ∈ VT or X = $
        if X = a then POP X and advance ip.
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X → Y1Y2...Yk
            POP X
            PUSH Yk, Yk-1, ..., Y1 /* Y1 on top */
            Output the production X → Y1Y2...Yk
        else error()
until X = $ /* empty stack */
42
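A sketch of the algorithm above in Python, using the digit-expression grammar E → TR, R → +TR | -TR | λ, T → 0|…|9 and its parsing table from the earlier example:

```python
# Table-driven LL(1) parser sketch (λ = empty string, encoded as []).
LAMBDA = "λ"
table = {}
for d in "0123456789":
    table[("E", d)] = ["T", "R"]
    table[("T", d)] = [d]
table[("R", "+")] = ["+", "T", "R"]
table[("R", "-")] = ["-", "T", "R"]
table[("R", "$")] = []                   # R -> λ


def ll1_parse(tokens):
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]                   # start symbol on top
    i = 0
    output = []                          # productions used (leftmost derivation)
    while stack:
        X, a = stack.pop(), tokens[i]
        if X == a == "$":
            return output                # success
        if X == a:                       # terminal match: advance input pointer
            i += 1
        elif (X, a) in table:
            body = table[X, a]
            output.append((X, body or [LAMBDA]))
            stack.extend(reversed(body))  # push body, leftmost symbol on top
        else:
            raise SyntaxError(f"unexpected {a!r} with {X!r} on stack")
    raise SyntaxError("input not fully consumed")


for prod in ll1_parse("1+2"):
    print(prod)   # E→TR, T→1, R→+TR, T→2, R→λ (as tuples)
```

The output is exactly the sequence of productions of the leftmost derivation, mirroring the trace on slide 27.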
Error Recovery in Predictive Parsing
• An error is detected in predictive parsing when:
– The terminal on top of the stack does not match the next input
symbol, or
– There is a non-terminal A on top of the stack, a is the next
input symbol, and M[A, a] = error.
• Panic mode error recovery method with synchronization
tokens:
1. If A is the non-terminal on top of the stack, all symbols of
FOLLOW(A) can be in the set of synchronization symbols for that
non-terminal.
• Drop tokens until an element in FOLLOW(A) is seen and pop A from the
stack, most likely parsing can continue.
2. If the symbol in FOLLOW(A) is missing, then the next constructs
will also be skipped until the synchronization token of a similar
construct is obtained.
• Read more on this issue from the book: page 192
43
Bottom-up Parsing [LR(k) Parsing]
• Bottom-up parsing is a parsing method that
consists of producing the parse tree from the
leaves to the root.
• Bottom-up parsing starts from the leaf nodes of
a tree and works in upward direction till it
reaches the root node.
• Here, we start from a sentence and then apply
production rules in reverse manner in order to
reach the start symbol.
44
Bottom-up Parsing [LR(k) Parsing]
• The figure given below depicts the bottom-up
parsers available.

• We will see a general style of bottom-up parsing


called shift-reduce parsing.
45
Bottom-up Parsing [LR(k) Parsing]
Shift-Reduce Parsing
• Shift-reduce parsing uses two unique steps for bottom-
up parsing: the shift step and the reduce step.
• Shift step: refers to the advancement of the input pointer to
the next input symbol (the shifted symbol).
  • This symbol is PUSHED onto the stack.
  • The shifted symbol is treated as a single node of the parse tree.
• Reduce step: when the parser finds a complete grammar rule
(RHS) and replaces it with its (LHS), it is known as a reduce step.
  • This occurs when the top of the stack contains a handle.
  • To reduce, a POP function is performed on the stack, which pops off
  the handle and replaces it with the LHS non-terminal symbol.
46
Bottom-up Parsing [LR(k) Parsing]
Shift-Reduce Parsing
Example:
S → aABe
A → Abc | b
B → d
– Input string: abbcde
• A bottom-up derivation of abbcde:
abbcde ⟸ aAbcde ⟸ aAde ⟸ aABe ⟸ S
• At each step, we have to find β such that β is a
substring of the sentence and replace β by A, where
A → β.
– Here β is called a handle, and the process of replacing β by A
is referred to as handle pruning.

47
Bottom-up Parsing [LR(k) Parsing]
Shift-Reduce Parsing
Handle
• Informally, a "handle" of a string is a production of the
grammar:
– whose right side matches a substring of the string.
– whose use in the reduction of the substring to the non-terminal
on the left side of the production represents one step along the
reverse of a rightmost derivation.
• Definition: A handle of a right-sentential form γ is a
production A → β and a position of γ where the string β
may be found and replaced by A to produce the previous
right-sentential form in the rightmost derivation of γ.
– That is, if S ⇒* αAw ⇒ αβw, then A → β in the position
following α is a handle of αβw.
48
Bottom-up Parsing [LR(k) Parsing]
Shift-Reduce Parsing
Handle Pruning
– Replacing β by A in the string αβw using the production A → β is called handle
pruning.
– Therefore, a rightmost derivation can be obtained in reverse by handle pruning.
Example: Let G be given by: E → E + E | E * E | (E) | id.
– The rightmost derivation for id1 + id2 * id3 is:
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id3 ⇒ E + id2 * id3 ⇒ id1 + id2 * id3
– The sequence of reductions of the input id1 + id2 * id3 to the start symbol E
using handle pruning is as follows:
Right-Sentential Form   Handle   Reducing Production
id1 + id2 * id3         id1      E → id
E + id2 * id3           id2      E → id
E + E * id3             id3      E → id
E + E * E               E * E    E → E * E
E + E                   E + E    E → E + E
E
49
Bottom-up Parsing [LR(k) Parsing]
Stack Implementation of Shift-Reduce Parsing
• In LR parsing the two major problems are:
– locating the substring that is to be reduced, and
– locating the production to use for the reduction.
• A shift/reduce parser operates by shifting zero or
more input symbols onto the stack until the right side of a
handle (β) is on top of the stack.
– The parser then replaces β by the non-terminal of the
production.
– This is repeated until the start symbol is in the stack
and the input is empty, or until an error is detected.
50
Bottom-up Parsing [LR(k) Parsing]
Stack Implementation of Shift-Reduce Parsing
• Four actions are possible:
1. shift: the next input is shifted on to the top of the
stack
2. reduce: the parser knows the right end of the
handle is at the top of the stack.
• It should then decide what non-terminal should replace
that substring
3. accept: the parser announces successful
completion of parsing
4. error: the parser discovers a syntax error
51
Bottom-up Parsing [LR(k) Parsing]
Stack Implementation of Shift-Reduce Parsing
Example: Let us step through the operations of a shift/reduce parser in parsing the
input string id1 + id2 * id3 according to the grammar in the example above.
• The sequence of actions is shown below:

STACK            INPUT               ACTION
$                id1 + id2 * id3 $   Shift
$ id1            + id2 * id3 $       Reduce by E → id
$ E              + id2 * id3 $       Shift
$ E +            id2 * id3 $         Shift
$ E + id2        * id3 $             Reduce by E → id
$ E + E          * id3 $             Shift
$ E + E *        id3 $               Shift
$ E + E * id3    $                   Reduce by E → id
$ E + E * E      $                   Reduce by E → E * E
$ E + E          $                   Reduce by E → E + E
$ E              $                   Accept
52
Bottom-up Parsing [LR(k) Parsing]
Stack Implementation of Shift-Reduce Parsing
Conflicts during shift/reduce parsing
• There are two types of conflicts in
shift/reduce parsing:
1. shift/reduce conflict: the parser knows the entire
stack content and the next k symbols but cannot
decide whether it should shift or reduce.
2. reduce/reduce conflict: the parser cannot decide
which of several productions it should use for
a reduction.
53
Bottom-up Parsing [LR(k) Parsing]
LR(k) Grammars
• Grammars for which we can construct an
LR(k) parsing table are called LR(k) grammars.
• Most of the grammars that are used in practice
are LR(1).
– k stands for the number of look-ahead symbols
used by the parser.
54
LR Parsers
• The LR parser is a non-recursive, shift-reduce,
bottom-up parser.
– It uses a wide class of context-free grammar which
makes it the most efficient syntax analysis technique.
• LR parsers are also known as LR(k) parsers,
where
– L stands for left-to-right scanning of the input stream;
– R stands for the construction of a right-most derivation in
reverse, and
– k denotes the number of lookahead symbols used to make
decisions.
55
LR Parsers
• There are three widely used algorithms available
for constructing an LR parser:
1. SLR(1) – Simple LR Parser:
• works on the smallest class of grammars and has a small number of
states, hence a very small table, with simple and fast
construction.
2. LR(1) – Canonical LR Parser:
• works on the complete set of LR(1) grammars, and generates a
large table and a large number of states, with slow construction.
3. LALR(1) – Look-Ahead LR Parser:
• works on an intermediate size of grammar, and the number of
states is the same as in SLR(1).

56
LR Parsers
• The block diagram for an LR parser is shown
below:

57
LR Parsers
• The LR(k) parser stack stores strings of the form:
s0 X1 s1 X2 s2 . . . Xm sm where
– si is a new symbol called a state that summarizes the
information contained in the stack below it,
– sm is the state on top of the stack, and
– Xi is a grammar symbol.
• The parser program decides the next step by using
– the state on top of the stack (sm),
– the input symbol (ai), and
– the parsing table, which has two parts: action and goto.
58
LR Parsers
• Action[sm, ai] can have one of the following values:
1. Action[sm, ai] = shift s:
• In this case the parser program executes a shift move, entering
the configuration (s0 X1 s1 X2 s2 . . . Xm sm ai s, ai+1 . . . an $)
• Here the parser has shifted both the current input symbol ai and
the next state s, which is given in Action[sm, ai], onto the stack;
ai+1 becomes the current input symbol.
2. If Action[sm, ai] = reduce A → β:
• The parser pops the first 2r symbols off the stack, where r = |β| (at
this point, sm−r will be the state on top of the stack).
• Then A and s are pushed on top of the stack, where s =
goto[sm−r, A].
• The input buffer is not modified.
3. If Action[sm, ai] = accept: parsing is completed.
4. If Action[sm, ai] = error: the parser has discovered an
error and calls an error recovery routine.
59
LR Parsers
• The LR parsing algorithm is summarized below.
– All LR parsers behave in this fashion; the only
difference between one LR parser and another is the
information in the parsing action and goto fields of the
parsing table.
Input: string w, parsing table with action and goto
functions for given grammar G.
Output: if w in L(G), a bottom up parse of w, otherwise
error.
Method: initially, the parser has s0 on its stack, where s0
is the initial state, and w$ in the input buffer.
– The parser then repeats the following steps until an
accept or error action is encountered.
60
LR Parsers
• The LR parsing algorithm is summarized below.
token = next_token()
repeat forever begin
    s = top of stack
    if action[s, token] = "shift si" then
        PUSH token
        PUSH si
        token = next_token()
    else if action[s, token] = "reduce A → β" then
        POP 2 * |β| symbols
        s = top of stack
        PUSH A
        PUSH goto[s, A]
    else if action[s, token] = "accept" then
        return
    else
        error()
end
61
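A sketch of this driver loop in Python, with the action and goto functions for the arithmetic-expression grammar of the following example encoded as dictionaries:

```python
# LR driver sketch: SLR(1) table for E -> E+T | T, T -> T*F | F,
# F -> (E) | id, with states numbered as on the slide.
prods = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
         4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}   # production: (lhs, |rhs|)
action = {
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): ("acc", None),
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}
goto = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    tokens = list(tokens) + ["$"]
    stack, i, out = [0], 0, []            # stack interleaves symbols and states
    while True:
        s, a = stack[-1], tokens[i]
        kind, arg = action.get((s, a), ("err", None))
        if kind == "s":                   # shift a and state arg
            stack += [a, arg]
            i += 1
        elif kind == "r":                 # reduce by production arg
            lhs, n = prods[arg]
            del stack[len(stack) - 2 * n:]   # pop 2*|beta| entries
            stack += [lhs, goto[stack[-1], lhs]]
            out.append(arg)
        elif kind == "acc":
            return out                    # production numbers, reverse rightmost
        else:
            raise SyntaxError(f"error at token {a!r} in state {s}")


print(lr_parse(["id", "*", "id", "+", "id"]))   # [6, 4, 6, 3, 2, 6, 4, 1]
```

The returned production numbers are the reverse of the rightmost derivation, matching the 14-step trace in the example that follows.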
LR Parsers
Example: Consider the grammar G for arithmetic
expressions and its parsing table below:

1) E → E + T
2) E → T
3) T → T * F
4) T → F
5) F → (E)
6) F → id

                  action                         goto
STATE   id    +     *     (     )     $      E    T    F
0       s5                s4                 1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                 8    2    3
5             r6    r6          r6    r6
6       s5                s4                      9    3
7       s5                s4                           10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5

Legend:
• si means shift and stack state i,
• rj means reduce by the production numbered j,
• acc means accept, and
• a blank entry means an error.
62
LR Parsers
Example: cont.…
– On input w = id * id + id, the sequence of stack and input
contents is shown in the table below:

     STACK             INPUT            ACTION
(1)  0                 id * id + id $   Shift
(2)  0 id 5            * id + id $      Reduce by F → id
(3)  0 F 3             * id + id $      Reduce by T → F
(4)  0 T 2             * id + id $      Shift
(5)  0 T 2 * 7         id + id $        Shift
(6)  0 T 2 * 7 id 5    + id $           Reduce by F → id
(7)  0 T 2 * 7 F 10    + id $           Reduce by T → T * F
(8)  0 T 2             + id $           Reduce by E → T
(9)  0 E 1             + id $           Shift
(10) 0 E 1 + 6         id $             Shift
(11) 0 E 1 + 6 id 5    $                Reduce by F → id
(12) 0 E 1 + 6 F 3     $                Reduce by T → F
(13) 0 E 1 + 6 T 9     $                Reduce by E → E + T
(14) 0 E 1             $                Accept
63
Constructing LR Parsing Tables
• There are three methods for constructing LR
parsing tables:
– SLR (Simple LR) method
– LALR method
– Canonical LR method

64
Constructing SLR Parsing Tables
• This method is the simplest of the three methods
used to construct LR parsing tables.
• It is called SLR (simple LR) because it is the easiest
to implement.
• However, it is also the weakest in terms of the
number of grammars for which it succeeds.
• A parsing table constructed by this method is
called an SLR table.
• A grammar for which an SLR table can be
constructed is said to be an SLR grammar.
65
Constructing SLR Parsing Tables
LR(0) Items
• An LR(0) item (item for short) is a production of a grammar
G with a dot at some position of the right side.
– For example, for the production A → XYZ we have four items:
A → ·XYZ
A → X·YZ
A → XY·Z
A → XYZ·
– For the production A → λ we only have one item: A → · .
• An item indicates what part of a production we have seen
and what we expect to see.
• One collection of sets of LR(0) items, called the canonical
LR(0) collection, provides the basis for constructing SLR
parsers.
66
Constructing SLR Parsing Tables
• The central idea of the SLR method is to construct, from the
  grammar, a deterministic finite automaton that recognizes
  viable prefixes.
• A viable prefix is a prefix of a right-sentential form that
  can appear on the stack of a shift/reduce parser.
  – If we have a viable prefix on the stack, it is still possible for the
    remaining input to reduce to the start symbol.
  – Otherwise, we can never reach the start symbol; therefore
    we have to call the error recovery procedure.
• To construct the canonical LR(0) collection for a
  grammar, we define an augmented grammar and two
  functions, closure and goto.
67
Constructing SLR Parsing Tables
The Closure Operation
• If I is a set of items of G, then closure(I) is
  the set of items constructed by the two rules:
  1. Initially, every item in I is added to closure(I).
  2. If A → α·Bβ is in closure(I) and B → γ is a
     production, then add the item B → ·γ to closure(I),
     if it is not already there.
• Rule 2 is applied until no more new items can be
  added to closure(I).
68
Constructing SLR Parsing Tables
The Closure Operation
Example:
• Consider the augmented expression grammar G':
      E' → E
      E → E + T
      E → T
      T → T * F
      T → F
      F → (E)
      F → id
• If I is the set of the one item {[E' → ·E]}, then closure(I)
  contains the items:
      E' → ·E
      E → ·E + T
      E → ·T
      T → ·T * F
      T → ·F
      F → ·(E)
      F → ·id
• Here, E' → ·E is put in closure(I) by rule (1).
  – Since there is an E immediately to the right of a dot, by rule (2) we add the E-
    productions with dots at the left end, that is, E → ·E + T and E → ·T.
  – Now there is a T immediately to the right of a dot, so we add T → ·T * F and
    T → ·F.
  – Next, the F to the right of a dot forces F → ·(E) and F → ·id to be added.
  – No other items are put into closure(I) by rule (2).
69
Constructing SLR Parsing Tables
The Goto Operation
• The second useful function is goto(I, X), where I is a set of items and X is a
  grammar symbol.
• goto(I, X) is defined to be the closure of the set of all items
  [A → αX·β] such that [A → α·Xβ] is in I.
• Intuitively, if I is the set of items that are valid for some viable prefix γ,
  then goto(I, X) is the set of items that are valid for the viable prefix γX.
Example: If I is the set of two items {[E' → E·], [E → E· + T]}, then
goto(I, +) consists of:
      E → E + ·T
      T → ·T * F
      T → ·F
      F → ·(E)
      F → ·id
• We computed goto(I, +) by examining I for items with + immediately to
  the right of the dot.
  – E' → E· is not such an item, but E → E· + T is. We moved the dot over the +
    to get {E → E + ·T} and then took the closure of this set.
70
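The two operations can be sketched in Python, assuming an item is represented as a (head, body, dot) triple; this sketch (not from the slides) reproduces both examples above: the seven items of closure({[E' → ·E]}) and the five items of goto(I, +).

```python
# Augmented expression grammar; a production is (head, body-tuple).
GRAMMAR = [
    ("E'", ('E',)),
    ('E', ('E', '+', 'T')), ('E', ('T',)),
    ('T', ('T', '*', 'F')), ('T', ('F',)),
    ('F', ('(', 'E', ')')), ('F', ('id',)),
]
NONTERMINALS = {"E'", 'E', 'T', 'F'}

def closure(items):
    """Closure of a set of LR(0) items (head, body, dot)."""
    result, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        # Rule 2: a nonterminal B right after the dot contributes B -> .gamma
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)

def goto(items, X):
    """Move the dot over X in every item that allows it, then close."""
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == X})

I0 = closure({("E'", ('E',), 0)})
print(len(I0))                                        # → 7

I_ex = frozenset({("E'", ('E',), 1), ('E', ('E', '+', 'T'), 1)})
print(len(goto(I_ex, '+')))                           # → 5
```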
Constructing SLR Parsing Tables
The Sets-of-Items Construction
• We are now ready to give the algorithm to construct C,
  the canonical collection of sets of LR(0) items for an
  augmented grammar G'; the algorithm is:

procedure items(G');
begin
    C := {closure({[E' → ·E]})};
    repeat
        for each set of items I in C and each grammar symbol X
            such that goto(I, X) is not empty and not in C do
                add goto(I, X) to C
    until no more sets of items can be added to C
end
71
Constructing SLR Parsing Tables
The Sets-of-Items Construction
Example: The canonical collection of sets of LR(0) items for the above grammar
is shown below:
I0 = closure({[E' → ·E]}) = {[E' → ·E], [E → ·E + T], [E → ·T],
     [T → ·T * F], [T → ·F], [F → ·(E)], [F → ·id]}
I1 = goto(I0, E) = {[E' → E·], [E → E· + T]}
I2 = goto(I0, T) = {[E → T·], [T → T· * F]}
I3 = goto(I0, F) = {[T → F·]}
I4 = goto(I0, () = {[F → (·E)], [E → ·E + T], [E → ·T], [T → ·T * F],
     [T → ·F], [F → ·(E)], [F → ·id]}
I5 = goto(I0, id) = {[F → id·]}
I6 = goto(I1, +) = {[E → E + ·T], [T → ·T * F], [T → ·F], [F → ·(E)], [F → ·id]}
I7 = goto(I2, *) = {[T → T * ·F], [F → ·(E)], [F → ·id]}
72
Constructing SLR Parsing Tables
The Sets-of-Items Construction
Example: cont.…
I8 = goto(I4, E) = {[F → (E·)], [E → E· + T]}
     goto(I4, () = I4
     goto(I4, id) = I5
I9 = goto(I6, T) = {[E → E + T·], [T → T· * F]}
     goto(I6, F) = I3
     goto(I6, () = I4
     goto(I6, id) = I5
I10 = goto(I7, F) = {[T → T * F·]}
     goto(I7, () = I4
     goto(I7, id) = I5
I11 = goto(I8, )) = {[F → (E)·]}
     goto(I8, +) = I6
     goto(I9, *) = I7
• The goto function for this set of items is shown as the transition diagram of
  a deterministic finite automaton D in the next figure.
73
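The sets-of-items construction can be sketched in Python with the same item representation as before (a sketch, not from the slides); for the expression grammar it should discover exactly the twelve sets I0 through I11, though the discovery order of states may differ from the numbering above.

```python
# Augmented expression grammar; a production is (head, body-tuple).
GRAMMAR = [
    ("E'", ('E',)),
    ('E', ('E', '+', 'T')), ('E', ('T',)),
    ('T', ('T', '*', 'F')), ('T', ('F',)),
    ('F', ('(', 'E', ')')), ('F', ('id',)),
]
NONTERMINALS = {"E'", 'E', 'T', 'F'}
SYMBOLS = NONTERMINALS | {'+', '*', '(', ')', 'id'}

def closure(items):
    result, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)

def goto(items, X):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == X})

def canonical_collection():
    """Procedure items(G'): keep adding nonempty goto sets until fixpoint."""
    C = [closure({("E'", ('E',), 0)})]
    for I in C:                        # the list grows while we iterate over it
        for X in SYMBOLS:
            J = goto(I, X)
            if J and J not in C:
                C.append(J)
    return C

C = canonical_collection()
print(len(C))                          # → 12 (the sets I0 through I11)
```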
Constructing SLR Parsing Tables
The Sets-of-Items Construction
Example: cont.…
[Figure: transition diagram of the DFA D giving the goto function on the
canonical LR(0) collection I0–I11]
74
Constructing SLR Parsing Tables
Valid Items
• We say an item A → β1·β2 is valid for a viable
  prefix αβ1 if there is a derivation
      S' ⇒*rm αAw ⇒rm αβ1β2w;
  that is, S' derives αAw in zero or more rightmost
  steps, and αAw derives αβ1β2w in one more
  rightmost step.
• If A → β1·β2 is valid for αβ1, it tells us a lot about
  whether to shift or reduce when we find αβ1 on the
  stack.

Theorem: The set of valid items embodies all the useful
information that can be gleaned from the stack.
75
Constructing SLR Parsing Tables
SLR Table Construction Method
Input: An augmented grammar G'.
Output: The SLR parsing table functions action and goto for
G'.
Method:
1. Construct C = {I0, I1, . . . , In}, the collection of the sets
   of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state
   i are determined as follows:
   a) If [A → α·aβ] is in Ii and goto(Ii, a) = Ij, then set
      action[i, a] to "shift j." Here a must be a terminal.
   b) If [A → α·] is in Ii, then set action[i, a] to "reduce A → α" for
      all a in FOLLOW(A); here A may not be S'.
   c) If [S' → S·] is in Ii, then set action[i, $] to "accept."
76
Constructing SLR Parsing Tables
SLR Table Construction Method
Input: An augmented grammar G'.
Output: The SLR parsing table functions action and goto for G'.
Method: cont.…
   If any conflicting actions are created by rules 1 and 2, the grammar is
   not SLR(1), and the algorithm fails.
3. The goto transitions for state i are constructed for all
   nonterminals A using the rule:
   If goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries of the parsing table not defined by rules (2) and (3)
   are made "error."
5. The initial state of the parser is the one constructed from the
   set of items containing [S' → ·S].
77
Constructing SLR Parsing Tables
Example 1: Construct the SLR parsing table for the grammar G':
    E' → E
    E → E + T
    E → T
    T → T * F
    T → F
    F → (E)
    F → id
• We need FOLLOW(A) for every nonterminal A in the given
  grammar G.
• The FOLLOW sets for E, T and F are:
    FOLLOW(E) = { +, ), $ }
    FOLLOW(T) = { +, *, ), $ }
    FOLLOW(F) = { +, *, ), $ }
• By following the above method to construct the parsing table, we
  obtain the parsing table we used earlier.
78
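These FOLLOW sets can be recomputed with the usual fixpoint algorithm. A sketch in Python (not from the slides; it exploits the fact that no nonterminal of this grammar derives ε, so the FIRST of a string is simply the FIRST of its first symbol):

```python
# Augmented expression grammar; a production is (head, body-tuple).
GRAMMAR = [
    ("E'", ('E',)),
    ('E', ('E', '+', 'T')), ('E', ('T',)),
    ('T', ('T', '*', 'F')), ('T', ('F',)),
    ('F', ('(', 'E', ')')), ('F', ('id',)),
]
NONTERMINALS = {"E'", 'E', 'T', 'F'}

def compute_first():
    # Fixpoint over FIRST; since no nonterminal here derives the empty
    # string, only the first symbol of each body contributes.
    first = {A: set() for A in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            s = first[body[0]] if body[0] in NONTERMINALS else {body[0]}
            if not s <= first[head]:
                first[head] |= s
                changed = True
    return first

def compute_follow(first):
    follow = {A: set() for A in NONTERMINALS}
    follow["E'"].add('$')                  # endmarker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, X in enumerate(body):
                if X not in NONTERMINALS:
                    continue
                if i + 1 < len(body):      # something follows X in the body
                    nxt = body[i + 1]
                    s = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                      # X ends the body: FOLLOW(head)
                    s = follow[head]
                if not s <= follow[X]:
                    follow[X] |= s
                    changed = True
    return follow

follow = compute_follow(compute_first())
print(sorted(follow['E']), sorted(follow['T']), sorted(follow['F']))
# → ['$', ')', '+'] ['$', ')', '*', '+'] ['$', ')', '*', '+']
```

Note that the fixpoint confirms that FOLLOW(T) and FOLLOW(F) both contain *, since T → T * F puts * after T and F inherits FOLLOW(T).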
Constructing SLR Parsing Tables
Example 2: Construct the SLR parsing table for the following
grammar G:
    (1) S → L = R
    (2) S → R
    (3) L → *R
    (4) L → id
    (5) R → L
• The canonical collection of sets of LR(0) items for the
  grammar is shown below:
I0 = closure({[S' → ·S]}) = {[S' → ·S], [S → ·L = R], [S → ·R],
     [L → ·*R], [L → ·id], [R → ·L]}
I1 = goto(I0, S) = {[S' → S·]}
I2 = goto(I0, L) = {[S → L· = R], [R → L·]}
I3 = goto(I0, R) = {[S → R·]}
79
Constructing SLR Parsing Tables
Example 2: cont.…
I4 = goto(I0, *) = {[L → *·R], [R → ·L], [L → ·*R], [L → ·id]}
I5 = goto(I0, id) = {[L → id·]}
I6 = goto(I2, =) = {[S → L = ·R], [R → ·L], [L → ·*R], [L → ·id]}
I7 = goto(I4, R) = {[L → *R·]}
I8 = goto(I4, L) = {[R → L·]}
     goto(I4, *) = I4
     goto(I4, id) = I5
I9 = goto(I6, R) = {[S → L = R·]}
     goto(I6, L) = I8
     goto(I6, *) = I4
     goto(I6, id) = I5
• The FOLLOW(A) sets for all nonterminals A in G' are:
    FOLLOW(S) = { $ }
    FOLLOW(L) = { $, = }
    FOLLOW(R) = { $, = }
80
Constructing SLR Parsing Tables
Example 2: cont.…
• The SLR(1) parsing table constructed from the above
  canonical collection of sets of LR(0) items is as in the table below:

                  action                  goto
STATE    id     *     =        $        S   L   R
  0      s5    s4                       1   2   3
  1                            acc
  2                  s6,r5     r5
  3                            r2
  4      s5    s4                           8   7
  5                  r4        r4
  6      s5    s4                           8   9
  7                  r3        r3
  8                  r5        r5
  9                            r1
81
Constructing SLR Parsing Tables
Example 2: cont.…
• We have a shift/reduce conflict, since = is in FOLLOW(R),
  [R → L·] is in I2, and goto(I2, =) = I6.
• G is not an ambiguous grammar; however, it is not
  SLR(1).
  – This is because the SLR parser is not powerful enough to
    remember enough left context to decide whether to shift or
    reduce when it sees an =.
• The next two methods are capable of solving this
  problem.
• But there still are some grammars that cannot be
  parsed by any of the methods discussed here.
82
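Applying SLR rules 2(a) and 2(b) mechanically exposes exactly this conflict. A sketch in Python (not from the slides): the FOLLOW sets are hard-coded from the slide, and state numbers follow discovery order, so they may not match the slide's numbering.

```python
# Grammar G of Example 2, augmented; a production is (head, body-tuple).
GRAMMAR = [
    ("S'", ('S',)),
    ('S', ('L', '=', 'R')), ('S', ('R',)),
    ('L', ('*', 'R')), ('L', ('id',)),
    ('R', ('L',)),
]
NONTERMINALS = {"S'", 'S', 'L', 'R'}
SYMBOLS = NONTERMINALS | {'=', '*', 'id'}
FOLLOW = {'S': {'$'}, 'L': {'$', '='}, 'R': {'$', '='}}   # from the slide

def closure(items):
    result, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)

def goto(items, X):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == X})

def canonical_collection():
    C = [closure({("S'", ('S',), 0)})]
    for I in C:                        # the list grows while we iterate
        for X in SYMBOLS:
            J = goto(I, X)
            if J and J not in C:
                C.append(J)
    return C

def slr_conflicts(C):
    """Apply SLR rules 2(a)/2(b); record multiply-defined action entries."""
    conflicts = []
    for i, I in enumerate(C):
        action = {}
        for head, body, dot in sorted(I):
            if dot < len(body) and body[dot] not in NONTERMINALS:
                moves = {body[dot]: ('shift', C.index(goto(I, body[dot])))}
            elif dot == len(body) and head != "S'":
                moves = {a: ('reduce', head, body) for a in FOLLOW[head]}
            else:
                continue
            for a, entry in moves.items():
                if a in action and action[a] != entry:
                    conflicts.append((i, a))
                action[a] = entry
    return conflicts

conflicts = slr_conflicts(canonical_collection())
print(conflicts)   # exactly one shift/reduce conflict, on '='
```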
Canonical LR(1) Parsing
• The states produced using LR(0) items do not
  hold enough information.
  – It is possible to hold more information in the state to
    rule out some invalid reductions.
  – By splitting states when necessary, we can indicate exactly
    which symbols can follow a handle.
• Consider the previous example:
  FOLLOW(S) = {$}; FOLLOW(L) = {$, =}; FOLLOW(R) = {$, =}
• We have a shift/reduce conflict, since
  = is in FOLLOW(R), [R → L·] is in I2, and goto(I2, =) = I6:
  action[2, =] = shift 6 / reduce R → L
83
Canonical LR(1) Parsing
LR(1) Items
• An LR(1) item has the form [A → α·β, a], where A → αβ is a
  production and a is a terminal or the right endmarker $.
• Formally, we say LR(1) item [A → α·β, a] is valid for a viable
  prefix γ if there is a derivation S ⇒*rm δAw ⇒rm δαβw, where
  1. γ = δα, and
  2. either a is the first symbol of w, or w is ε and a is $.
Example: Let us consider the grammar S → BB, B → aB | b
  – There is a rightmost derivation S ⇒*rm aaBab ⇒rm aaaBab.
  – We see that item [B → a·B, a] is valid for the viable prefix γ = aaa by
    letting δ = aa, A = B, w = ab, α = a, and β = B in the above
    definition.
  – There is also a rightmost derivation S ⇒*rm BaB ⇒rm BaaB.
  – From this derivation we see that item [B → a·B, $] is valid for the
    viable prefix Baa. 84
Canonical LR(1) Parsing
• An LR(1) item has the form [A → α·β, a],
  where a is a terminal or $.
• The functions closure(I), goto(I, X) and
  items(G') will be slightly different from those
  used for the construction of an SLR parsing table.
86
Canonical LR(1) Parsing
The Closure Operation
• If I is a set of LR(1) items, closure(I) is found
  using the following algorithm:
    closure(I) := I
    repeat
        for each item [A → α·Bβ, a] in closure(I), each
            production B → γ in G', and each terminal b in
            FIRST(βa) such that [B → ·γ, b] is not in
            closure(I)
        do
            add [B → ·γ, b] to closure(I)
    until no more items can be added to closure(I)
87
Canonical LR(1) Parsing
The Closure Operation
Example: Let the grammar G' with 5 productions be: (1) S → L = R
(2) S → R (3) L → *R (4) L → id (5) R → L.
• We begin by computing the closure of the LR(1) item [S' → ·S, $].
  – To close, we match the item [S' → ·S, $] with the item [A → α·Bβ, a] in the
    procedure closure.
    • That is, A = S', B = S, α = ε, β = ε, and a = $.
  – Function closure tells us to add [B → ·γ, b] for each production B → γ and
    terminal b in FIRST(βa).
  – In terms of the present grammar, B → γ must be S → L = R or S → R; since
    β is ε and a is $, b may only be $.
    • Thus we add [S → ·L = R, $] and [S → ·R, $].
• We continue by adding all items [L → ·γ, b] for b in FIRST(= R$).
  – That is, matching [S → ·L = R, $] against [A → α·Bβ, a] we have A = S, α =
    ε, B = L, β = = R, and a = $.
  – Since = R does not derive the empty string, FIRST(= R$) = FIRST(= R).
  – Since FIRST(= R) contains the terminal =, we add the items [L → ·*R, =] and
    [L → ·id, =].
• None of the new items has a nonterminal immediately to the right of the dot. 88
Canonical LR(1) Parsing
The Closure Operation
Example: cont.…
• Continuing in a similar way for the second item [S → ·R, $],
  and for the newly obtained items, we have the initial set
  of items I0 below:
    I0 = closure({[S' → ·S, $]}) =
         {[S' → ·S, $], [S → ·L = R, $], [S → ·R, $],
          [L → ·*R, =], [L → ·id, =], [R → ·L, $],
          [L → ·*R, $], [L → ·id, $]}
• The next sets of items are computed by using the goto
  function goto(I0, X) for the various values of X.
89
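The LR(1) closure can be sketched in Python with items represented as (head, body, dot, lookahead) tuples (a sketch, not from the slides). The simplified FIRST below assumes, as holds for this grammar, that no nonterminal derives ε, so FIRST(βa) is FIRST of the first symbol of β, or {a} when β is empty.

```python
# Augmented grammar of the example; a production is (head, body-tuple).
GRAMMAR = [
    ("S'", ('S',)),
    ('S', ('L', '=', 'R')), ('S', ('R',)),
    ('L', ('*', 'R')), ('L', ('id',)),
    ('R', ('L',)),
]
NONTERMINALS = {"S'", 'S', 'L', 'R'}

def first_sym(X):
    # FIRST of a single symbol; no nonterminal here derives the empty string.
    if X not in NONTERMINALS:
        return {X}
    return set().union(*(first_sym(b[0]) for h, b in GRAMMAR
                         if h == X and b[0] != X))

def closure(items):
    """Closure of a set of LR(1) items (head, body, dot, lookahead)."""
    result, work = set(items), list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            beta = body[dot + 1:]                          # tail after B
            lookaheads = first_sym(beta[0]) if beta else {la}   # FIRST(beta a)
            for h, b in GRAMMAR:
                if h == body[dot]:
                    for x in lookaheads:
                        if (h, b, 0, x) not in result:
                            result.add((h, b, 0, x))
                            work.append((h, b, 0, x))
    return frozenset(result)

I0 = closure({("S'", ('S',), 0, '$')})
print(len(I0))   # → 8, the eight items of I0 above
```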
Canonical LR(1) Parsing
The Goto Operation
• goto(I, X) is defined as the closure of the set of all items
  [A → αX·β, a] such that [A → α·Xβ, a] is in I.
• Below is the algorithm to compute goto(I, X).
    function goto(I, X)
    begin
        let J be the set of items [A → αX·β, a]
            such that [A → α·Xβ, a] is in I;
        return closure(J)
    end

Example: goto(I0, S) = {[S' → S·, $]}. 90


Canonical LR(1) Parsing
The Set of LR(1) Items Construction
• The algorithm to construct C, the canonical collection of sets of
  LR(1) items for an augmented grammar G', is given below:

procedure items(G');
begin
    C := {closure({[S' → ·S, $]})};
    repeat
        for each set of items I in C and each grammar symbol
            X such that goto(I, X) is not empty and not in C do
                add goto(I, X) to C
    until no more sets of items can be added to C
end

91
Canonical LR(1) Parsing
The Set of LR(1) Items Construction
Example:
• Continuing from the above example, we have the following sets of LR(1)
  items computed by using the above procedure.
    I0 = closure({[S' → ·S, $]}) = {[S' → ·S, $], [S → ·L = R, $],
         [S → ·R, $], [L → ·*R, =], [L → ·id, =], [R → ·L, $],
         [L → ·*R, $], [L → ·id, $]}.
• Now we compute goto(I0, X) for the various values of X.
  – For X = S we must close the item [S' → S·, $]. No additional closure is
    possible, since the dot is at the right end.
  – Thus we have the next set of items:
    I1 = goto(I0, S) = {[S' → S·, $]}
  – For X = L we close [S → L· = R, $] and [R → L·, $]. No additional closure
    is possible, since no nonterminal is immediately to the right of the dot.
  – Then we have the next set of items:
    I2 = goto(I0, L) = {[S → L· = R, $], [R → L·, $]}
92
Canonical LR(1) Parsing
The Set of LR(1) Items Construction
Example: cont.…
• Continuing in a similar fashion, we have:
    I3 = goto(I0, R) = {[S → R·, $]}
    I4 = goto(I0, *) = {[L → *·R, =|$], [L → ·*R, =|$], [L → ·id, =|$],
         [R → ·L, =|$]}
    I5 = goto(I0, id) = {[L → id·, =|$]}
    I6 = goto(I2, =) = {[S → L = ·R, $], [R → ·L, $], [L → ·*R, $],
         [L → ·id, $]}
    I7 = goto(I4, R) = {[L → *R·, =|$]}
    I8 = goto(I4, L) = {[R → L·, =|$]},  goto(I4, *) = I4,
         goto(I4, id) = I5
    I9 = goto(I6, R) = {[S → L = R·, $]}
    I10 = goto(I6, L) = {[R → L·, $]}
    I11 = goto(I6, *) = {[L → *·R, $], [L → ·*R, $], [L → ·id, $],
         [R → ·L, $]}
    I12 = goto(I6, id) = {[L → id·, $]},  goto(I11, *) = I11,
         goto(I11, id) = I12,  goto(I11, L) = I10
    I13 = goto(I11, R) = {[L → *R·, $]} 93
Canonical LR(1) Parsing
Construction of the canonical LR parsing table
Input: An augmented grammar G'.
Output: The canonical LR parsing table functions action and goto for G'.
Method:
1. Construct C = {I0, I1, . . . , In}, the collection of the sets of LR(1) items
   for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined
   as follows:
   a) If [A → α·aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j."
      Here a must be a terminal.
   b) If [A → α·, a] is in Ii, A ≠ S', then set action[i, a] to "reduce A → α."
   c) If [S' → S·, $] is in Ii, then set action[i, $] to "accept."
   If a conflict results from the above rules, the grammar is said not to be LR(1),
   and the algorithm is said to fail.
3. The goto transitions for state i are determined as follows:
   If goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries of the parsing table not defined by rules (2) and (3) are made
   "error."
5. The initial state of the parser is the one constructed from the set containing
   the item [S' → ·S, $]. 94
Canonical LR(1) Parsing
Construction of the canonical LR parsing table
• The table constructed using this method is called the
  canonical LR(1) parsing table.
• An LR parser using this table is called a
  canonical LR(1) parser.
• If the parsing action table has no multiply-defined entries,
  then the given grammar is called an LR(1) grammar.

Example: Construct a canonical LR(1) parsing table for the
augmented grammar G' with productions:
    S' → S
    S → CC
    C → cC | d
95
Canonical LR(1) Parsing
Construction of the canonical LR parsing table
Example: G': (1) S' → S (2) S → CC (3) C → cC | d
• The initial set of items is:
    I0 = closure({[S' → ·S, $]}) = {[S' → ·S, $], [S → ·CC, $],
         [C → ·cC, c|d], [C → ·d, c|d]}
• Here we use the shortened notation [C → ·cC, c|d] for the two items
  [C → ·cC, c] and [C → ·cC, d].
• Now we compute goto(I0, X) for the various values of X.
  For X = S we must close the item [S' → S·, $].
  – No additional closure is possible, since the dot is at the right end.
  – Thus we have the next set of items:
    I1 = goto(I0, S) = {[S' → S·, $]}
  For X = C we close [S → C·C, $].
  – We add the C-productions with second component $ and then can add
    no more, yielding:
    I2 = goto(I0, C) = {[S → C·C, $], [C → ·cC, $], [C → ·d, $]}
96
Canonical LR(1) Parsing
Construction of the canonical LR parsing table
Example: cont.…
• Now, let X = c. We must close {[C → c·C, c|d]}.
  – We add the C-productions with the second component c|d, yielding:
    I3 = goto(I0, c) = {[C → c·C, c|d], [C → ·cC, c|d], [C → ·d, c|d]}
• Finally, we let X = d, and we wind up with the set of items:
    I4 = goto(I0, d) = {[C → d·, c|d]}
• We have finished considering goto on I0.
• We get no new sets from I1, but I2 has goto's on C, c, and d.
  – On C we get:
    I5 = goto(I2, C) = {[S → CC·, $]}
    no closure being needed.
  – On c we take the closure of {[C → c·C, $]}, to obtain:
    I6 = goto(I2, c) = {[C → c·C, $], [C → ·cC, $], [C → ·d, $]}
• Note that I6 differs from I3 only in second components.
  – We shall see that it is common for several sets of LR(1) items for a grammar
    to have the same first component and differ in their second components.
97
Canonical LR(1) Parsing
Construction of the canonical LR parsing table
Example: cont.…
• Continuing with the goto function for I2, goto(I2, d) is
  seen to be:
    I7 = goto(I2, d) = {[C → d·, $]}
• Turning now to I3, the goto's of I3 on c and d are I3 and I4,
  respectively, and goto(I3, C) is:
    I8 = goto(I3, C) = {[C → cC·, c|d]}
• Similarly, the goto's of I6 on c and d are I6 and I7,
  respectively, and goto(I6, C) is:
    I9 = goto(I6, C) = {[C → cC·, $]}
• The remaining sets of items yield no goto's, so we are done.
• The following figure shows the ten sets of items with their
  goto's.
98
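The whole construction for this grammar can be sketched in Python (a sketch, not from the slides); it should discover exactly the ten sets I0 through I9, though state numbering follows discovery order and may differ from the slides.

```python
# Augmented grammar S' -> S, S -> CC, C -> cC | d.
GRAMMAR = [("S'", ('S',)), ('S', ('C', 'C')), ('C', ('c', 'C')), ('C', ('d',))]
NONTERMINALS = {"S'", 'S', 'C'}
SYMBOLS = NONTERMINALS | {'c', 'd'}

def first_sym(X):
    # FIRST of a single symbol; no nonterminal here derives the empty string.
    if X not in NONTERMINALS:
        return {X}
    return set().union(*(first_sym(b[0]) for h, b in GRAMMAR
                         if h == X and b[0] != X))

def closure(items):
    """Closure of a set of LR(1) items (head, body, dot, lookahead)."""
    result, work = set(items), list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            beta = body[dot + 1:]
            lookaheads = first_sym(beta[0]) if beta else {la}   # FIRST(beta a)
            for h, b in GRAMMAR:
                if h == body[dot]:
                    for x in lookaheads:
                        if (h, b, 0, x) not in result:
                            result.add((h, b, 0, x))
                            work.append((h, b, 0, x))
    return frozenset(result)

def goto(items, X):
    return closure({(h, b, d + 1, la) for h, b, d, la in items
                    if d < len(b) and b[d] == X})

def lr1_collection():
    C = [closure({("S'", ('S',), 0, '$')})]
    for I in C:                        # the list grows while we iterate
        for X in SYMBOLS:
            J = goto(I, X)
            if J and J not in C:
                C.append(J)
    return C

states = lr1_collection()
print(len(states))   # → 10, the sets I0 through I9
```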
Canonical LR(1) Parsing
Example: cont.…
[Figure: the ten sets of LR(1) items I0–I9 with their goto transitions]
99
Canonical LR(1) Parsing
Construction of the canonical LR parsing table
Example: cont.…
• Now we can construct the canonical parsing table for the grammar G, with
  its productions numbered as below, by using the above algorithm:
    (1) S → CC   (2) C → cC   (3) C → d
action goto
STATE
c d $ S C
0 s3 s4 1 2
1 accept
2 s6 s7 5
3 s3 s4 8
4 r3 r3
5 r1
6 s6 s7 9
7 r3
8 r2 r2
9 r2
100
LALR Parsing
• The last parser construction method that we are going to
  cover is the LALR (lookahead LR) technique.
• It is the most widely used method, since the table obtained
  using it is much smaller than that obtained by the canonical
  LR method.
• For example, for a language such as Pascal, the LALR table will
  have hundreds of states, while the canonical LR table will have
  thousands of states.
• Many more grammars are LALR than SLR, but there are only a
  few grammars that are LR but not LALR, and these grammars
  can usually be easily avoided.
• Thus, it is much easier and more economical to construct
  SLR and LALR tables than the canonical LR tables.

101
LALR Parsing
• We now give the LALR table construction
algorithms.
• The general idea is to construct the sets of LR(1)
  items, and if no conflicts arise, merge sets with
  common cores.
• We then construct the parsing table from the
collection of merged sets of items.
• The method we are about to describe serves
primarily as a definition of LALR(1) grammars.
• Constructing the entire collection of LR(1) sets of
items requires too much space and time to be
useful in practice.
102
LALR Parsing
Algorithm: An easy, but space-consuming LALR table construction.
Input: An augmented grammar G'.
Output: The LALR parsing table functions action and goto for G'.
Method:
– Construct C = {I0, I1, . . . , In}, the collection of sets of LR(1) items.
– For each core present among the sets of LR(1) items, find all sets having that
  core, and replace these sets by their union.
– Let C' = {J0, J1, . . . , Jm} be the resulting sets of LR(1) items.
– The parsing actions for state i are constructed from Ji in the same manner as
  in the canonical LR algorithm.
If there is a parsing action conflict, the algorithm fails to produce a parser, and
the grammar is said not to be LALR(1).
• The goto table is constructed as follows.
  – If J is the union of one or more sets of LR(1) items, that is, J = I1 ∪ I2 ∪ . . .
    ∪ Ik, then the cores of goto(I1, X), goto(I2, X), . . . , goto(Ik, X) are the same,
    since I1, I2, . . . , Ik all have the same core.
  – Let K be the union of all sets of items having the same core as goto(I1, X).
  – Then goto(J, X) = K.
103
LALR Parsing
• The table produced by the above algorithm is
  called the LALR parsing table for G.
• If there are no parsing action conflicts, then the
  given grammar is said to be an LALR(1)
  grammar.
• The collection of merged sets of items constructed
  above is called the LALR(1) collection.

104
LALR Parsing
Example: Again consider the grammar in the previous
example:
    S' → S
    S → CC
    C → cC | d
• As we can see, there are three pairs of sets of items that
  can be merged.
  – I3 and I6 are replaced by their union:
    I36: {[C → c·C, c|d|$], [C → ·cC, c|d|$], [C → ·d, c|d|$]}
  – I4 and I7 are replaced by their union:
    I47: {[C → d·, c|d|$]}
  – And I8 and I9 are replaced by their union:
    I89: {[C → cC·, c|d|$]}

105
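The core-merging step can be sketched in Python by recomputing the LR(1) collection for this grammar and unioning sets with equal cores (a sketch, not from the slides); the ten LR(1) sets collapse to seven LALR sets, matching the three merges above.

```python
# Augmented grammar S' -> S, S -> CC, C -> cC | d.
GRAMMAR = [("S'", ('S',)), ('S', ('C', 'C')), ('C', ('c', 'C')), ('C', ('d',))]
NONTERMINALS = {"S'", 'S', 'C'}
SYMBOLS = NONTERMINALS | {'c', 'd'}

def first_sym(X):
    # FIRST of a single symbol; no nonterminal here derives the empty string.
    if X not in NONTERMINALS:
        return {X}
    return set().union(*(first_sym(b[0]) for h, b in GRAMMAR
                         if h == X and b[0] != X))

def closure(items):
    result, work = set(items), list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            beta = body[dot + 1:]
            lookaheads = first_sym(beta[0]) if beta else {la}
            for h, b in GRAMMAR:
                if h == body[dot]:
                    for x in lookaheads:
                        if (h, b, 0, x) not in result:
                            result.add((h, b, 0, x))
                            work.append((h, b, 0, x))
    return frozenset(result)

def goto(items, X):
    return closure({(h, b, d + 1, la) for h, b, d, la in items
                    if d < len(b) and b[d] == X})

def lr1_collection():
    C = [closure({("S'", ('S',), 0, '$')})]
    for I in C:
        for X in SYMBOLS:
            J = goto(I, X)
            if J and J not in C:
                C.append(J)
    return C

def core(I):
    """The core of an LR(1) set: its items with the lookaheads stripped."""
    return frozenset((h, b, d) for h, b, d, la in I)

C = lr1_collection()
merged = {}                            # core -> union of all sets with that core
for I in C:
    merged.setdefault(core(I), set()).update(I)
print(len(C), len(merged))             # → 10 7
```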
LALR Parsing
Example: cont.…
• The LALR action and goto functions for the condensed
sets of items are shown in the table below:
action goto
STATE
c d $ S C

0 s36 s47 1 2

1 accept

2 s36 s47 5

36 s36 s47 89

47 r3 r3 r3

5 r1

89 r2 r2 r2

106
LALR Parsing
Example: cont.…
• To see how the goto's are computed, consider goto(I36, C).
  – In the original set of LR(1) items, goto(I3, C) = I8, and I8 is now part
    of I89, so we make goto(I36, C) be I89.
  – We could have arrived at the same conclusion if we considered I6, the
    other part of I36.
  – That is, goto(I6, C) = I9, and I9 is now part of I89.
  – For another example, consider goto(I2, c), an entry that is exercised
    after the shift action of I2 on input c.
  – In the original sets of LR(1) items, goto(I2, c) = I6.
  – Since I6 is now part of I36, goto(I2, c) becomes I36.
• Thus, the entry in the above table for state 2 and input c is made s36,
  meaning shift and push state 36 on the stack.
• There is also a more efficient construction of LALR parsing tables.
  – An interested reader can look up that algorithm.
  – For now we conclude our discussion here.
107
Limitations of Syntax Analyzers
• Syntax analyzers receive their inputs, in the form of tokens,
  from lexical analyzers.
• The lexical analyzer is responsible for the validity of each
  token it supplies to the syntax analyzer.
• Syntax analyzers have the following drawbacks:
  – they cannot determine if a token is valid,
  – they cannot determine if a token is declared before it is used,
  – they cannot determine if a token is initialized before it is used,
  – they cannot determine if an operation performed on a token type
    is valid or not.
• These tasks are accomplished by the semantic analyzer,
  which we shall study in Semantic Analysis.

108
End of CH3!
• Thank You!!!

109
