Day 5 - Syntax Analysis

Nlp ml

Uploaded by

Yeabsira
Copyright: © All Rights Reserved

Syntax Analysis

Yared Y.
Syntax Analysis
(Parsing)
• After lexical analysis (scanning), we have a series of
tokens.
• In syntax analysis (or parsing), we want to interpret
what those tokens mean.
• Goals:
  • Recover the structure described by that series of tokens.
  • Report errors if those tokens do not properly encode a structure.
Syntax Analysis at a glance
Syntax Analysis
• The second phase of a compiler is called syntax
analysis. The input to this phase consists of a stream of
tokens put out by the lexical analysis phase. They are
then checked for proper syntax, i.e. the compiler checks
to make sure the statements and expressions are
correctly formed.
• Some examples of syntax errors in Java are:

  x = (2+3) * 9);        // mismatched parentheses
  if x>y x = 2;          // missing parentheses
  while (x==3) do f1();  // invalid keyword do
• Syntax analysis is all about discovering structure in
code.
• It determines whether or not a text follows the expected
format. The main aim of this phase is to make sure that the
source code written by the programmer is syntactically correct.
• Tasks performed in this phase
• Obtain tokens from the lexical analyzer
• Checks if the expression is syntactically correct or not
• Report all syntax errors
• Construct a hierarchical structure which is known as a parse
tree
• The output of the syntax analysis phase (if there are no
syntax errors) could be a stream of atoms or syntax
trees.
• An atom is a primitive operation which is found in most
computer architectures, or which can be implemented
using only a few machine language instructions.
Purpose of Syntax analysis
• Validation:
Syntax analysis ensures that the input program's syntax
is correct according to the programming language's
grammar rules.
If the syntax is incorrect, the parser generates error
messages.
• Structure Representation:
It constructs a data structure, usually a parse tree or
abstract syntax tree (AST), which represents the
syntactical structure of the program.
Key concepts of a Parse Tree
• Nodes
• Root Node: Represents the start symbol of the grammar (usually the
whole expression or statement being parsed).
• Leaf Nodes: Represent terminal symbols, which are the actual tokens
or literals in the input string.
• Internal Nodes: Represent non-terminal (uppercase, set of
variables) symbols of the grammar, which are derived using the
production rules.
• Edges
• The edges in the parse tree connect nodes and represent the application
of production rules.
• Each edge corresponds to a derivation step in the grammar.
• Ambiguity
Parse tree for: (a + b)* c
• each interior node represents an operation
• the children of the node represent the arguments of the
operation
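The idea that interior nodes are operations and their children are arguments can be sketched in code. This is an illustrative sketch, not from the slides: the Node class and evaluate() helper are hypothetical names.

```python
# A minimal parse-tree sketch for the expression (a + b) * c.
# Interior nodes represent operations; children are their arguments.
class Node:
    def __init__(self, label, children=None):
        self.label = label              # operator or operand name
        self.children = children or []

def evaluate(node, env):
    """Evaluate children (the arguments) first, then apply the operator."""
    if not node.children:
        return env[node.label]          # leaf: look up the operand's value
    left = evaluate(node.children[0], env)
    right = evaluate(node.children[1], env)
    return left + right if node.label == "+" else left * right

# Tree for (a + b) * c: "*" at the root, "+" in the deeper sub-tree.
tree = Node("*", [Node("+", [Node("a"), Node("b")]), Node("c")])
print(evaluate(tree, {"a": 2, "b": 3, "c": 4}))  # (2 + 3) * 4 = 20
```

Note how the parenthesized sub-expression sits deeper in the tree, so it is evaluated first.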
Formal Grammars

• A formal grammar is a system of rules (called productions)
that control the order in which words may occur in a sentence.

• Take an English sentence:
Generation process for our example
<S> → <Noun> <Verb Phrase>
<Verb Phrase> → <Verb> <Adjective>
<Noun> → Learners | You | Alex
<Verb> → are | is
<Adjective> → awesome | good

So,
our grammar is (N, T, P, S), where N – nonterminals, T –
terminals, P – production rules, S – start symbol.

"Learners are awesome" can be generated, as can:

  "You are good"
  "Alex is good"

{<Noun>, <Verb>, <Adjective>, <Verb Phrase>} are variables, or
nonterminals.

{You, are, is, Alex, good} are terminals: fixed, specific constants.
• Our grammar is (N, T, P, S), where:

  N is a finite, non-empty set of Nonterminals
  T is a finite, non-empty set of Terminals
  S is a special non-terminal called the Start symbol
  P is a finite set whose elements are of the form a → B
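The (N, T, P, S) four-tuple for the English example above can be sketched directly as data. The dictionaries and the expand() helper below are illustrative, not part of the slides:

```python
import random

# The example grammar as a four-tuple (N, T, P, S).
N = {"S", "Noun", "VerbPhrase", "Verb", "Adjective"}
T = {"Learners", "You", "Alex", "are", "is", "awesome", "good"}
P = {
    "S": [["Noun", "VerbPhrase"]],
    "VerbPhrase": [["Verb", "Adjective"]],
    "Noun": [["Learners"], ["You"], ["Alex"]],
    "Verb": [["are"], ["is"]],
    "Adjective": [["awesome"], ["good"]],
}
S = "S"

def expand(symbol):
    """Rewrite nonterminals (using P) until only terminals remain."""
    if symbol in T:
        return [symbol]
    rhs = random.choice(P[symbol])      # pick one production for this nonterminal
    return [word for sym in rhs for word in expand(sym)]

print(" ".join(expand(S)))  # e.g. "Learners are awesome"
```

Every sentence this generator produces is in the language of the grammar; "Learners are awesome", "You are good", and "Alex is good" are all possible outputs.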
Parse Trees
• A parse tree pictorially shows how the start symbol of a
grammar derives a string in the language.
If nonterminal A has a production A → XYZ, then a parse
tree may have an interior node labeled A with three
children labeled X, Y, and Z, from left to right:
Parse tree for: A * B + C * D
Parse Tree for: -(id + id)
• A syntax analyzer or parser takes the input from a
lexical analyzer in the form of token streams. The parser
analyzes the source code (token stream) against the
production rules to detect any errors in the code. The
output of this phase is a parse tree.
Example: Construct a parse tree given the
following production rule and an input string
(i.e., check if the input is part of the rule)

Suppose the production rules for the Grammar of a


language are:

S  cAd
A  bc | a
Input string is “cad”
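The check above can be done with a small hand-written recursive-descent parser, one function per nonterminal. This is a sketch; the function names are illustrative:

```python
# Recursive-descent check for the grammar  S -> cAd,  A -> bc | a.
def parse_S(s, i):
    """Match S -> c A d starting at position i; return the next position or -1."""
    if i < len(s) and s[i] == "c":
        j = parse_A(s, i + 1)
        if j != -1 and j < len(s) and s[j] == "d":
            return j + 1
    return -1

def parse_A(s, i):
    # Try the alternative "bc" first, then fall back to "a".
    if s[i:i+2] == "bc":
        return i + 2
    if s[i:i+1] == "a":
        return i + 1
    return -1

def accepts(s):
    """The string is in the language iff S consumes all of it."""
    return parse_S(s, 0) == len(s)

print(accepts("cad"))   # True  -- derived as S => cAd => cad
print(accepts("cbcd"))  # True  -- uses A -> bc
print(accepts("cbd"))   # False -- no rule produces lone "b"
```

So "cad" is part of the language defined by the rule, via the derivation S ⇒ cAd ⇒ cad.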
• As you can see the parser can be used not only to check
for proper syntax, but to produce output as well. This
process is called syntax directed translation.

• Syntax-Directed Translation (SDT) is a method used in compiler design where the translation
of a source program is driven by the syntax of the program as defined by a context-free grammar.
The translation process is directed by the syntax, meaning that as the parser recognizes the
structure of the source code (using grammar rules), it simultaneously generates an output, often in
the form of intermediate code, abstract syntax trees (AST), or other representations.
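A tiny example of syntax-directed translation: semantic actions (here, arithmetic) fire as the parser recognizes each rule, so the output (a value) is produced during parsing. The grammar and helper names are illustrative, assuming E → T { + T }, T → F { * F }, F → digit:

```python
# Syntax-directed translation sketch: each grammar rule carries an action
# that computes a value as soon as the rule is recognized.
def sdt_eval(tokens):
    pos = [0]
    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None
    def eat():
        pos[0] += 1
    def F():
        v = int(peek()); eat()      # action for F -> digit:   F.val = digit
        return v
    def T():
        v = F()
        while peek() == "*":        # action for T -> T * F:   T.val = T.val * F.val
            eat(); v *= F()
        return v
    def E():
        v = T()
        while peek() == "+":        # action for E -> E + T:   E.val = E.val + T.val
            eat(); v += T()
        return v
    return E()

print(sdt_eval(list("2+3*4")))  # 14 -- the T rule makes * bind tighter than +
```

No separate tree is built here; the translation is driven entirely by the order in which the parser applies the grammar rules.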
Why do you need a Syntax analyzer
• Check if the code is grammatically valid
• The syntax analyser helps you to apply rules to the code
• Helps you to make sure that each opening brace has a corresponding
  closing brace
• Each declaration has a type, and that type must exist
• Helps you to detect all types of syntax errors
• Find the position at which the error occurred
• Clear & accurate description of the error
• Recovery from an error to continue and find further errors in the code
• Should not affect compilation of “correct” programs
• The parser must reject invalid texts by reporting syntax errors
Grammar, Languages, and
Pushdown machines
• Languages can be specified using regular expressions,
finite state machines, grammars
• A grammar is a list of rules which can be used to
produce or generate all the strings of a language, and
which does not generate any strings which are not in
the language.
• A grammar is a set of structural rules which describe a
language. Grammars assign structure to any sentence
• A grammar consists of:
• A finite set of characters, called the input alphabet, the input
symbols, or terminal symbols.
• A finite set of symbols, distinct from the terminal symbols, called
nonterminal symbols, exactly one of which is designated the
starting nonterminal (if no nonterminal is explicitly designated
as the starting nonterminal, it is assumed to be the nonterminal
defined in the first rule).
• A finite list of rewriting rules, also called productions, which
define how strings in the language may be generated. Each of
these rewriting rules is of the form a → b, where a and b are
arbitrary strings of terminals and nonterminals, and a is not null.
Terminal vs Nonterminal symbols
• Terminal symbols (in short: terminals) are the symbols from
which the programs of the programming language are built.
• Examples for such symbols are reserved keywords of the language, or
• symbol classes such as identifiers which comprise a set of symbols.
• Nonterminals of the grammar denote sets of words that can
be produced from them by means of the production rules of
the grammar.
• Nonterminal symbols are terms that have to be defined further
somewhere in the grammar (they have to appear on a left-hand side),
and terminal symbols such as tokens which need no further definition.
The nonterminal nodes form interior nodes in the parse tree; the
terminal nodes are all at the ends of branches.
• The grammar specifies a language in the following way:
beginning with the starting nonterminal, any of the
rewriting rules are applied repeatedly to produce a
sentential form, which may contain a mix of terminals
and nonterminals.
If at any point, the sentential form contains no
nonterminal symbols, then it is in the language of this
grammar.

• If G is a grammar, then we designate the language
specified by this grammar as L(G).
Derivation in Compilers
• The sequence of the production rules which are used
for obtaining the input string is known as derivation.
• A derivation is a sequence of rewriting rules, applied to
the starting nonterminal, ending with a string of
terminals. A derivation thus serves to demonstrate that
a particular string is a member of the language
Example
Grammar G1 consists of 4 rules, the terminal symbols
{0,1}, and the starting nonterminal, S, see the figure

Now, an example of a derivation using G1 is:

  S ⇒ 0S0 ⇒ 00S00 ⇒ 001S100 ⇒ 0010100

Thus 0010100 is in L(G1); it is one of the strings in the language
of the grammar G1.
L(G1) = {0, 1, 000, 010, 101, 111, 00000, …}
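The figure with G1's rules is not reproduced in this text, but the derivation and the listing of L(G1) are consistent with the rules S → 0S0 | 1S1 | 0 | 1 (odd-length binary palindromes), which this sketch assumes:

```python
# Membership check for L(G1), ASSUMING G1 is  S -> 0S0 | 1S1 | 0 | 1,
# i.e. odd-length binary palindromes (consistent with the derivation
# S => 0S0 => 00S00 => 001S100 => 0010100 shown above).
def in_L_G1(s):
    if s in ("0", "1"):                  # base rules S -> 0 and S -> 1
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "01":
        return in_L_G1(s[1:-1])          # undo S -> 0S0 or S -> 1S1
    return False

print(in_L_G1("0010100"))  # True, matching the derivation above
print(in_L_G1("0011"))     # False
```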
• An input sentence can be recognized using this
grammar, with a series of replacements, as follows:
  (1) Start out with the topmost symbol in the grammar, the
      goal symbol.
  (2) Replace that symbol with one of its right-hand sides.
  (3) Continue replacing nonterminals, always replacing the
      leftmost nonterminal with its right-hand side, until there
      are no more nonterminals.
• Let the terminal symbols be {a,b} (Ꜫ represents the null string
and is not a terminal symbol).
Given G2 as shown in the figure, do the derivation:

  S ⇒ ASB ⇒ AASBB ⇒ AaSBB ⇒ AaBB ⇒ AaBb ⇒ Aabb ⇒ aabb

Thus aabb is in L(G2).

G2 specifies the set of all strings of a’s and b’s
which contain the same number of a’s as b’s and in which all the
a’s precede all the b’s. Note that the null string is permitted in a
rewriting rule.
L(G2) = {Ꜫ, ab, aabb, aaabbb, aaaabbbb, …} = {aⁿbⁿ : n ≥ 0}
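The G2 figure is likewise not reproduced here; assuming (consistently with the derivation shown) the rules S → ASB | Ꜫ, A → a, B → b, membership in L(G2) = {aⁿbⁿ : n ≥ 0} can be checked by peeling matched a…b pairs:

```python
# Membership check for L(G2) = { a^n b^n : n >= 0 }, ASSUMING the rules
# S -> ASB | e,  A -> a,  B -> b  (consistent with S => ASB => ... => aabb).
def in_L_G2(s):
    if s == "":                  # S -> e : the null string is in the language
        return True
    if s.startswith("a") and s.endswith("b"):
        return in_L_G2(s[1:-1])  # undo one S -> ASB step (one a, one b)
    return False

for w in ["", "ab", "aabb", "aab", "ba"]:
    print(repr(w), in_L_G2(w))
```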
• Two grammars, g1 and g2, are said to be equivalent if
L(g1) = L(g2); i.e., they specify the same language.

• Convention (can be changed):
all lower case letters and numbers are terminal symbols,
and all upper case letters (or words which begin with an
upper case letter) are nonterminal symbols.
The starting nonterminal is always S unless otherwise
specified.
Each of the grammars will be numbered (G1, G2, G3, ...)
for reference purposes.
Example:
Show three different derivations
using the grammar shown below:
Solution
S⇒aSA⇒aBAA⇒aBabA⇒aBabab⇒
abAabab⇒abababab

S⇒aSA⇒aSab⇒aBAab⇒abAAab⇒
ababAab⇒abababab

S⇒BA⇒bAA⇒babA⇒babab
Derivation (revisited)
• A derivation is basically a sequence of production rules,
in order to get the input string.
• During parsing, we take two decisions for some
sentential form of input:
• Deciding the non-terminal which is to be replaced.
• Deciding the production rule, by which, the non-terminal will
be replaced.
• To decide which non-terminal to be replaced with
production rule, we can have two options.
• Left-most derivation
• Right-most derivation
Left-most derivation & right-most
derivation
• If the sentential form of an input is scanned and
replaced from left to right, it is called left-most
derivation. The sentential form derived by the left-most
derivation is called the left-sentential form.

• If we scan and replace the input with production rules,


from right to left, it is known as right-most derivation.
The sentential form derived from the right-most
derivation is called the right-sentential form.
• Left-most Derivation
• When the sentential form of input is scanned and replaced in
left to right sequence, it is known as left-most derivation. The
sentential form which is derived by the left-most derivation is
called the left-sentential form.
• In left-most derivation, we always replace the non-terminal to
the far left first.
• Right-most Derivation
  • When the sentential form of input is scanned and replaced in
    right-to-left sequence, it is known as right-most derivation. The
    sentential form which is derived from the right-most derivation is
    known as the right-sentential form.
Example: given the production rules
and the input string in the fig, find
out the derivations
• Left-most derivation
  E ⇒ E * E
    ⇒ E + E * E
    ⇒ id + E * E
    ⇒ id + id * E
    ⇒ id + id * id

• Notice that the left-most non-terminal is always processed first.
• Right-most derivation
  E ⇒ E + E
    ⇒ E + E * E
    ⇒ E + E * id
    ⇒ E + id * id
    ⇒ id + id * id
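The two derivations above can be reproduced mechanically: at each step, replace the left-most (or right-most) occurrence of E with a chosen right-hand side. The grammar assumed is E → E + E | E * E | id; the step() helper is illustrative:

```python
# Replace one occurrence of the nonterminal "E" in a sentential form.
def step(form, rhs, leftmost=True):
    if leftmost:
        i = form.index("E")                           # left-most E
    else:
        i = len(form) - 1 - form[::-1].index("E")     # right-most E
    return form[:i] + rhs + form[i+1:]

# Left-most derivation of id+id*id
form = "E"
for rhs in ["E*E", "E+E", "id", "id", "id"]:
    form = step(form, rhs, leftmost=True)
    print(form)   # E*E, E+E*E, id+E*E, id+id*E, id+id*id

# Right-most derivation of the same string
form = "E"
for rhs in ["E+E", "E*E", "id", "id", "id"]:
    form = step(form, rhs, leftmost=False)
    print(form)   # E+E, E+E*E, E+E*id, E+id*id, id+id*id
```

Both derivations end in the same terminal string; only the order of replacements differs.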
Imagine that we have the following
grammar
Parse Tree (revisited)
• A parse tree is a graphical depiction of a derivation. It is
convenient to see how strings are derived from the start
symbol. The start symbol of the derivation becomes the
root of the parse tree
• We take the left-most derivation of a + b * c
• The left-most derivation is:
  E ⇒ E * E
    ⇒ E + E * E
    ⇒ id + E * E
    ⇒ id + id * E
    ⇒ id + id * id
Steps to draw parse tree
E ⇒ E * E
  ⇒ E + E * E
  ⇒ id + id * id
• In a parse tree:
• All leaf nodes are terminals.
• All interior nodes are non-terminals.
• In-order traversal gives original input string.
• A parse tree depicts associativity and precedence of
operators. The deepest sub-tree is traversed first,
therefore the operator in that sub-tree gets precedence
over the operator which is in the parent nodes.
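The two traversal claims above can be demonstrated with a small sketch: in-order traversal reproduces the input string, and post-order visits the deepest sub-tree (the one holding *) before the + at the root. The Node class and traversal helpers are illustrative:

```python
# Traversals of a parse tree for  id + id * id.
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def inorder(n):
    """Left child, node, right child: reproduces the original input."""
    if not n.children:
        return [n.label]
    return inorder(n.children[0]) + [n.label] + inorder(n.children[1])

def postorder(n):
    """Children first, then the node: deepest operator comes out first."""
    out = []
    for c in n.children:
        out += postorder(c)
    return out + [n.label]

# "+" at the root, "*" in the deeper sub-tree, so * takes precedence.
tree = Node("+", [Node("id"), Node("*", [Node("id"), Node("id")])])
print(" ".join(inorder(tree)))    # id + id * id   (original input)
print(" ".join(postorder(tree)))  # id id id * +   (* before +)
```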
• Terminals (constants) are leaf nodes.
• Terminals are the basic symbols from which strings are
formed.
• Terminals are the elementary symbols of the language defined
by the grammar
• Non-terminals (variables) are interior nodes.
• Non-terminals are syntactic variables that denote sets of
strings. The sets of strings denoted by nonterminals help
define the language generated by the grammar.
• Nonterminals impose a hierarchical structure on the language
that is key to syntax analysis and translation
List out the terminals, nonterminals
and start symbol
• G:
  E → E + T
  E → E – T
  E → T        (read this as: E derives T)
  T → T * F
  T → T / F
  T → F
  F → ( E )
  F → id
Ambiguity
• A grammar G is said to be ambiguous if it has more than one
parse tree (left or right derivation) for at least one string.
Example:
  E → E + E
  E → E – E
  E → id

• For the string id + id - id, the above grammar generates two
parse trees
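The ambiguity can be shown concretely: the same string id + id - id has two different left-most derivations (hence two parse trees) under E → E + E | E - E | id. The helpers below are illustrative:

```python
# Two distinct left-most derivations of the SAME string under the
# ambiguous grammar  E -> E + E | E - E | id.
def step(form, rhs):
    i = form.index("E")                    # always rewrite the left-most E
    return form[:i] + rhs + form[i+1:]

def derive(choices):
    form = "E"
    seq = [form]
    for rhs in choices:
        form = step(form, rhs)
        seq.append(form)
    return seq

one = derive(["E+E", "id", "E-E", "id", "id"])  # "+" at the root of the tree
two = derive(["E-E", "E+E", "id", "id", "id"])  # "-" at the root of the tree
print(" => ".join(one))
print(" => ".join(two))
assert one[-1] == two[-1] == "id+id-id"   # same string, different derivations
```

Because one string admits more than one left-most derivation, the grammar is ambiguous.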
Classes of Grammars
• Unrestricted Grammar
• Context-sensitive Grammar (productions may depend on the
  context surrounding a nonterminal)
• Context-free Grammar (our focus here)
• Right Linear Grammar