0% found this document useful (0 votes)
13 views83 pages

Chapter 4 - Syntax Analysis-1

Uploaded by

fozia tariq
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views83 pages

Chapter 4 - Syntax Analysis-1

Uploaded by

fozia tariq
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

Compiler course

Chapter 4
Syntax Analysis
Outline
 Role of parser
 Context free grammars
 Top down parsing
 Bottom up parsing
 Parser generators
The role of parser

Lexical token Rest of


Source Parse tree Intermediate
program Analyze Parser Front representation
r getNext End
Token

Checks whether the


tokens are syntactically
Symbol
true.
table Checks code is according
to rules or not.
The Parser
 A parsing is a process of deriving string from the given
grammar. (CFG)
 It check the proper grammar of the token generated by
lexical analyzer (Scanner).
 Generate the parse tree (Derivation etc.)
 A parser implements a C-F grammar as a recognizer of strings
 The role of the parser in a compiler is two-fold:
1. To check syntax (= string recognizer)
 And to report syntax errors accurately
2. To invoke semantic actions
 For static semantics checking, e.g. type checking of
expressions, functions, etc.
 For syntax-directed translation of the source code to an
intermediate representation

4
Uses of grammars

E -> E + T | T s->T+S|T2
T -> T * F | F T->id*T|id|(s)
F -> (E) | id w= id*id*id
Error handling
 Common programming errors
◦ Lexical errors
◦ Syntactic errors
◦ Semantic errors

 Error handler goals


◦ Report the presence of errors clearly and
accurately
◦ Recover from each error quickly enough to detect
subsequent errors
◦ Add minimal overhead to the processing of
correct programs
Error-recover strategies
 Error recovery strategies in syntax analysis aim to
enable a parser to continue processing input even after
encountering a syntax error, rather than halting
immediately. This allows for the detection of multiple
errors and provides more comprehensive feedback to
the user.
 Panic mode recovery
◦ Discard input symbol one at a time until one of
designated set of synchronization tokens (semicolons,
braces, keywords like end) is found
◦ Resume parsing from that point.
◦ Easy to implement and prevents infinite loops.
 Phrase level recovery
◦ When an error is detected, the parser tries to replace,
insert, or delete a small number of tokens to transform
the erroneous phrase into a valid one, allowing parsing
to continue.
◦ More effective than panic mode.
Error-recover strategies
 Error productions
◦ Augmenting the grammar with special "error
productions" that explicitly describe common syntactic
errors. When the parser encounters an input sequence
matching an error production, it can recognize the error
and potentially provide specific, helpful error messages
or even attempt a predefined correction.
 Global correction
◦ Choosing minimal sequence of changes to obtain a
globally least-cost correction by using complex
algorithms.
 Statement Mode Recovery:
◦ Similar to phrase-level recovery, but focused on
recovering at the statement level. When an error occurs
within a statement, the parser attempts to make local
corrections (e.g., inserting missing semicolons,
correcting misplaced commas) to allow the parsing of
the rest of the statement to proceed.
Context free grammars
(Components)
 Terminals
◦ They are the concrete "words" that the lexical analyzer
(scanner) feeds to the parser.
◦ The parser cannot break down terminals any further.
◦ They appear as leaf nodes in a parse tree.
◦ They are the actual tokens in the input stream, such as
keywords, operators, identifiers, and literals.
 Non terminals
◦ They represent the abstract syntactic categories
like expression, statement, or variable.
◦ They are often represented by uppercase letters in grammar
examples, such as E for expression or S for statement.
Context free grammars
(Components)
 Start symbol
 Productions

◦ These are formal rules of the grammar that specify


how terminals and non-terminals can be combined
to form strings.
◦ Each production consists of a non-terminal on the
left side and an arrow (->) pointing to a sequence of
terminals and/or non-terminals on the right side.
Context free grammars
expression -> expression + term
expression -> expression – term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
In this example, Expression, Term, and Factor are non-terminals,
while +, *, (, ), and Number are terminals.
Each line represents a production rule, defining how an
Expression can be formed from Terms, how Terms can
be formed from Factors, and so on.
Derivations
 Productions are treated as rewriting rules to
generate a string
 Rightmost and leftmost derivations

◦ E -> E + E | E * E | -E | (E) | id
◦ Derivations for –(id+id)
 E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)
Parse trees
 A parse tree is a graphical
representation of a derivation that
filters out the order in which
productions are applied to replace
non-terminals.
 Each interior node of a parse tree
represents the application of a
production.
 -(id+id)
 E => -E => -(E) => -(E+E) => -
(id+E)=>-(id+id)
Ambiguity
 For some strings
there exist more
than one parse tree
 Or more than one
leftmost derivation
 Or more than one
rightmost derivation
 Example: id+id*id
Elimination of ambiguity

Here “other” stands for any other statement. According to


this grammar, the compound conditional statement
if E1 then S1 else if E2 then S2 else S3
if E1 then if E2 then S1 else S2
Elimination of ambiguity
(cont.)
 Idea:
◦ A statement appearing between a then and an
else must be matched
Top Down Parsing
Introduction

 A Top-down parser tries to create a parse tree from


the root towards the leafs scanning input from left to
right
 It can be also viewed as finding a leftmost derivation
for an input string
 Example: id+id*id

E -> TE’ E E E E E E
lm lm lm lm lm
E’ -> +TE’ | Ɛ T E’ T E’ T E’ T E’ T E’
T -> FT’
T’ -> *FT’ | Ɛ F T’ F T’ F T’ F T’ + T E’

F -> (E) | id id id Ɛ id Ɛ
Recursive descent parsing

 Consists of a set of procedures, one for each


nonterminal
 Execution begins with the procedure for start
symbol
 A typical procedure for a non-terminal
void A() {
choose an A-production, A->X1X2..Xk
for (i=1 to k) {
if (Xi is a nonterminal
call procedure Xi();
else if (Xi equals the current input symbol a)
advance the input to the next symbol;
else /* an error has occurred */
}
}
Recursive descent parsing
(cont)
 General recursive descent may require

backtracking
 The previous code needs to be modified to

allow backtracking
 In general form it cant choose an A-production

easily.
 So we need to try all alternatives
 If one failed the input pointer needs to be reset

and another alternative should be tried


 Recursive descent parsers cant be used for left-

recursive grammars
ExamplePointer moves toward next leaf
and next terminal in input
S->cAd Step:2 Buffer c a d
A->ab | a S

Input: cad c A d

Step:1 Buffer c a d a b Pointer will


backtrack in
S
parse tree and
Step:3 Buffer c a d input and take
c A d
S next alternate
for production
At start pointer will be at and not
c A d
left leaf node of parse tree. consider
a b current
Example

Step:4 Buffer c a d Step:5 Buffer c a d


S
S Pointer in
parse tree
c A d
c A d and input
matches
a
a terminal so
move next.
First and Follow
 Provide aid to both top down and bottom-up approaches.
 During top-down parsing, FIRST and FOLLOW allow us to
choose which production to apply, based on the next
input symbol.
 First(A): contains all terminals present in first place
of every string derived by A.
 If α=>ɛ then ɛ is also in First(α)
 A=>cy
Computing First
 To compute First(X) for all grammar symbols
X, apply following rules until no more
terminals
* or ɛ can be added to any First set:
1. If X is a terminal then First(X) = {X}.
2. If X is a non terminal and X->Y1Y2…Yk is a
production for some k>=1, then place a in First(X)
if for some i a is in First(Yi) and ɛ is in all of
First(Y1),…,First(Yi-1) that is Y1…Yi-1 => ɛ. if ɛ is
in First(Yj)* for j=1,…,k then add ɛ to First(X).
3. If X-> ɛ is a production then add ɛ to First(X)
 Example!
Computing follow
 To compute First(A) for all nonterminals A,
apply following rules until nothing can be
added to any follow set:
1. Place $ in Follow(S) where S is the start symbol
2. If there is a production A-> αBβ then everything
in First(β) except ɛ is in Follow(B).
3. If there is a production A->B or a production
A->αBβ where First(β) contains ɛ, then
everything in Follow(A) is in Follow(B)
 Example!
LL(1)

*
Home Work
Bottom-up Parsing
in order to reach the start symbol.
The image given below depicts the bottom-up parsers available.
Shift-Reduce Parsing
Shift-reduce parsing uses two unique steps for bottom-up parsing.
These steps are known as shift-step and reduce-step.
Shift step: The shift step refers to the advancement of the input
pointer to the next input symbol, which is called the shifted
symbol. This symbol is pushed onto the stack. The shifted symbol
is treated as a single node of the parse tree.
Reduce step : When the parser finds a complete grammar rule
(RHS) and replaces it to (LHS), it is known as reduce-step. This
occurs when the top of the stack contains a handle. To reduce, a
POP function is performed on the stack which pops off the handle
and replaces it with LHS non-terminal symbol.
LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser.
It uses a wide class of context-free grammar which makes it the
most efficient syntax analysis technique. LR parsers are also
known as LR(k) parsers, where L stands for left-to-right scanning
of the input stream; R stands for the construction of right-most
derivation in reverse, and k denotes the number of lookahead
symbols to make decisions.
LALR(1) – Look-Ahead LR Parser:
1. LALR Parser is lookahead LR parser.
2. It can handle large classes of grammar.
3. Size of CLR parsing table is quite large as compared to
other parsing table.
4. LALR reduces the size of this table produces by CLR.
5. LALR works similar to CLR. The only difference is , it
combines the similar states of CLR parsing table into one
single state.
Error Handling in Compiler Design

 The tasks of the Error Handling process


are to detect each error, report it to the
user, and then make some recover strategy
and implement them to handle error. During
this whole process processing time of
program should not be slow. An Error is the
blank entries in the symbol table.
Types or Sources of Error –
There are two types of error: run-time and compile-
time error:
 A run-time error: is an error which takes place

during the execution of a program, and usually


happens because of adverse system parameters
or invalid input data. The lack of sufficient
memory to run an application or a memory
conflict with another program and logical error
are example of this. Logic errors, occur when
executed code does not produce the expected
result. Logic errors are best handled by
meticulous program debugging.
Types or Sources of Error –
 Compile-time errors: rises at compile
time, before execution of the program.
Syntax error or missing file reference that
prevents the program from successfully
compiling is the example of this.
Classification of Compile-time
error –
 Lexical : This includes misspellings of
identifiers, keywords or operators
 Syntactical : missing semicolon or

unbalanced parenthesis
 Semantical : incompatible value

assignment or type mismatches between


operator and operand
 Logical : code not reachable, infinite loop.
Finding error or reporting an error

Viable-prefix is the property of a parser which
allows early detection of syntax errors.
 Goal: detection of an error as soon as possible

without further consuming unnecessary input


 How: detect an error as soon as the prefix of the

input does not match a prefix of any string in the


language.
 Example: for(;), this will report an error as for

have two semicolons inside braces.


Error Recovery
basic requirement for the compiler is to simply stop
and issue a message, and cease compilation. There
are some common recovery methods that are
follows.
Error Recovery
 Panic mode recovery: This is the easiest way of
error-recovery and also, it prevents the parser from
developing infinite loops while recovering error. The
parser discards the input symbol one at a time until
one of the designated (like end, semicolon) set of
synchronizing tokens (are typically the statement or
expression terminators) is found. This is adequate
when the presence of multiple errors in same
statement is rare. Example: Consider the erroneous
expression- (1 + + 2) + 3. Panic-mode recovery: Skip
ahead to next integer and then continue. Bison: use
the special terminal error to describe how much input
to skip.
 E->int|E+E|(E)|error int|(error)
Error Recovery
 Phase level recovery: Perform local
correction on the input to repair the error.
But error correction is difficult in this
strategy.
 Error productions: Some common errors

are known to the compiler designers that


may occur in the code. Augmented
grammars can also be used, as productions
that generate erroneous constructs when
these errors are encountered. Example:
write 5x instead of 5*x
Error Recovery
 Global correction: Its aim is to make as
few changes as possible while converting an
incorrect input string to a valid string. This
strategy is costly to implement.
Error detection and Recovery in Compiler

 In this phase of compilation, all possible


errors made by the user are detected and
reported to the user in form of error
messages. This process of locating errors
and reporting it to user is called Error
Handling process.
Functions of Error handler
 Detection
 Reporting
 Recovery
 Classification of Errors
Lexical phase errors

These errors are detected during the lexical


analysis phase. Typical lexical errors are
 Exceeding length of identifier or numeric

constants.
 Appearance of illegal characters
 Unmatched string

Example 1 : printf("Geeksforgeeks");$
This is a lexical error since an illegal character $
appears at the end of statement.
Example 2 : This is a comment */
This is an lexical error since end of comment is
present but beginning is not present.
Errorrecovery:
 Panic Mode Recovery
In this method, successive characters from
the input are removed one at a time until a
designated set of synchronizing tokens is
found. Synchronizing tokens are delimiters
such as; or }
 Advantage is that it is easy to implement

and guarantees not to go to infinite loop


 Disadvantage is that a considerable amount

of input is skipped without checking it for


additional errors
Syntactic phase errors
These errors are detected during syntax analysis phase.
Typical syntax errors are
 Errors in structure

 Missing operator

 Misspelled keywords

 Unbalanced parenthesis

 Example : swicth(ch)

{
.......
.......
}
The keyword switch is incorrectly written as
swicth. Hence, “Unidentified
keyword/identifier” error occurs.
Errorrecovery:
 Panic Mode Recovery
◦ In this method, successive characters from
input are removed one at a time until a
designated set of synchronizing tokens is found.
Synchronizing tokens are deli-meters such as ;
or }
◦ Advantage is that its easy to implement and
guarantees not to go to infinite loop
◦ Disadvantage is that a considerable amount of
input is skipped without checking it for
additional errors
Errorrecovery:
 Statement Mode recovery
◦ In this method, when a parser encounters an
error, it performs necessary correction on
remaining input so that the rest of input
statement allow the parser to parse ahead.
◦ The correction can be deletion of extra
semicolons, replacing comma by semicolon or
inserting missing semicolon.
◦ While performing correction, atmost care should
be taken for not going in infinite loop.
◦ Disadvantage is that it finds difficult to handle
situations where actual error occured before
point of detection.
Errorrecovery:
 Error production
◦ If user has knowledge of common errors that
can be encountered then, these errors can be
incorporated by augmenting the grammar with
error productions that generate erroneous
constructs.
◦ If this is used then, during parsing appropriate
error messages can be generated and parsing
can be continued.
◦ Disadvantage is that its difficult to maintain.
Errorrecovery:
 Global Correction
◦ The parser examines the whole program and
tries to find out the closest match for it which is
error free.
◦ The closest match program has less number of
insertions, deletions and changes of tokens to
recover from erroneous input.
◦ Due to high time and space complexity, this
method is not implemented practically.
Semantic errors
These errors are detected during semantic analysis phase.
Typical semantic errors are
 Incompatible type of operands

 Undeclared variables

 Not matching of actual arguments with formal one

 Example : int a[10], b;


.......
.......
a = b;
It generates a semantic error because of an
incompatible type of a and b.
Error recovery
 If error “Undeclared Identifier” is encountered then, to
recover from this a symbol table entry for corresponding
identifier is made.
 If data types of two operands are incompatible then,
automatic type conversion is done by the compiler.

You might also like