Lecture 3: Top-Down
Parsing (Part-1)
Dr. Raheel Siddiqi
Compiler Construction
Parsing is the process of determining how a
string of terminals can be generated by a
grammar.
Most parsing methods fall into one of two
classes, called the top-down and bottom-up
methods.
These terms refer to the order in which nodes in
the parse tree are constructed.
In top-down parsers, construction starts at the
root and proceeds towards the leaves, while in
bottom-up parsers, construction starts at the leaves
and proceeds towards the root.
The popularity of top-down parsers is due
to the fact that efficient parsers can be
constructed more easily by hand using top-
down methods.
Bottom-up parsing, however, can handle a
larger class of grammars, so software tools
for generating parsers directly from
grammars often use bottom-up methods.
• A grammar (we call it “Grammar 1”) for some statements in C and
Java
Input String: for ( ; expr ; expr ) other
A parse tree according to the grammar
Top-down parsing
while scanning the
input from left to
right
Recursive Predictive Parsing
• In predictive parsing, the lookahead symbol
unambiguously determines the flow of control
through the procedure body for each nonterminal.
• The recursive predictive parser consists of procedures
for the non-terminals (e.g. stmt and optexpr) of the
grammar and an additional procedure match, used
to simplify the code for stmt and optexpr.
Pseudocode for a recursive predictive parser for “Grammar 1”
Procedure match(t) compares its argument t
with the lookahead symbol and advances to the
next input terminal if they match.
Parsing begins with a call of the procedure for the
starting nonterminal stmt.
• Predictive parsing relies on information about the first
symbols that can be generated by a production body.
• The FIRST sets must be considered if there are two
productions A -> α and A -> β.
• Predictive parsing requires FIRST(α) and FIRST(β) to
be disjoint.
Left-Recursion
It is possible for a recursive predictive parser to loop forever.
A problem arises with "left-recursive" productions like:
expr -> expr + term
where the leftmost symbol of the body is the same as the nonterminal at the
head of the production.
Suppose the procedure for expr decides to apply this production.
The body begins with expr so the procedure for expr is called
recursively.
Since the lookahead symbol changes only when a terminal in the body
is matched, no change to the input took place between recursive calls
of expr.
As a result, the code execution is stuck in an infinite sequence of
recursive calls.
Consider a nonterminal A with two productions
A -> Aα | β
where α and β are sequences of terminals and non-terminals that do
not start with A.
For example, in
expr -> expr + term | term
nonterminal A = expr, string α = + term, and string β = term.
The nonterminal A and its production are said to be left recursive,
because the production A -> Aα has A itself as the leftmost symbol on
the right side.
Repeated application of this production builds up a sequence of α's
to the right of A
When A is finally replaced by β, we have a β followed by a sequence
of zero or more α's.
The same effect can be achieved by rewriting the productions for A in
the following manner, using a new nonterminal R:
A -> β R
R -> α R | ϵ
Nonterminal R and its production R -> αR are right recursive because
this production for R has R itself as the last symbol on the right side.
Right-recursive productions lead to trees that grow down towards
the right.
Exercises
Construct recursive predictive parsers, starting with the following
grammars:
i.S -> + S S ǀ - S S ǀ a
ii.S -> 0 S 1 | 0 1