Language Processors
Language Processors
Semantic Gap
Application Execution
Domain Domain
Application PL Execution
Domain Domain Domain
C++ Program
C++ C Program
preprocessor
Errors
C++ Program
C++ Machine Language
translator Program
Interpreters
An interpreter is a language processor which bridges an
execution gap without generating a machine language program.
An interpreter is a language translator according to classification.
Interpreter Domain
Application PL Execution
Domain Domain Domain
Language Processing Activities
Program Generation Activities
Program Execution Activities
Program Generation
Errors
Specification Gap
m/c
Source Translator language Target
Program Program
program
A program must be translated before it can be executed.
The translated program may be saved in a file. The saved program may be
executed repeatedly.
A program must be retranslated following modifications.
Program Execution
Program interpretation
PC PC Machine
Source
Language
Program
Program
+
Errors +
Data
Data
Language Processor
Source
Analysis Synthesis Target
Program Phase Phase Program
Errors Errors
real :=
a b a +
a, b : real b i
a := b + i
Semantic Analysis
It identifies the sequence of actions necessary to implement the
meaning of a source statement.
It determines the meaning of a sub tree in the IC, it adds information
to a table or adds an action to the sequence of actions. The analysis
ends when the tree has been completely processed.
:= := :=
Lexical
Errors Scanning
Tokens
Symbol Table
Syntax Parsing
Errors
Constants Table
Trees Other tables
Semantic Semantic
Errors
Analysis
IC
IR
Synthesis Phase (Back end)
It performs memory allocation and code generation.
Memory Allocation
The memory requirement of an identifier is computed from its type,
length and dimensionality and memory is allocated to it.
The address of the memory area is entered in the symbol table.
IC Memory
Allocation
Symbol Table
Constants Table
Other tables
Code
Generation
Target
Program
Fundamentals of Language Specification
PL Grammars
The lexical and syntactic features of a programming
language are specified by its grammar.
A language L can be considered to be a collection of
valid sentences.
Each sentence can be looked upon as a sequence of
words, and each word as a sequence of letters or
graphic symbols acceptable in L.
A language specified in this manner is known as a
formal language.
Alphabet
The alphabet of L, denoted by the Greek symbol ∑
is the collection of symbols in its character set.
We use lower case letters a, b, c, etc. to denote
symbols in ∑
A symbol in the alphabet is known as a terminal
symbol (T) of L.
The alphabet can be represented using mathematical
notation of a set, e.g.
∑ = { a, b, …, z, 0, 1, …, 9 }
where {, “,”, } are called meta symbols.
String
A string is a finite sequence of symbols.
We represent strings by Greek symbols α, β, γ, etc.
Thus α= axy is a string over ∑
The length of a string is the number of symbols in it.
Absence of any symbol is also a string, null string ε.
Example
α= ab, β=axy
αβ = α.β = abaxy [concatenation]
Nonterminal symbols
A Nonterminal symbol (NT) is the name of a syntax
category of a language, e.g. noun, verb, etc.
An NT is written as a single capital letter, or as a
name enclosed between <…>, e.g. A or <Noun>.
It is a set of symbols not in ∑ that represents
intermediate states in the generation process.
Productions
A production, also called a rewriting rule, is a rule
of the grammar.
It has the form
A nonterminal symbol ::= String of Ts and NTs
L.H.S. R.H.S.
e.g. <article> ::= a | an | the
<Noun> ::= boy | apple
<Noun Phrase> ::= <article> <Noun>
Derivation, Reduction and Parse Trees
<Article> 4 <Noun> 5
TT*F|F P + P * P P + P * P
FF^P|P F + P * P F + F * P
P id T + F * F T + T * F
E + T * T E + T
id a | b | c
E * T (?? Ambiguous) E (Unambiguous)
E E
E E T
T T T T T
F F F F F F
P P P P P P
id + id * id id + id * id
GTU Examples
List out the unambiguous production rules (grammar)
for arithmetic expression containing +, –, *, / and ^
(exponent).
EE+T|E–T|T
TT*F|T/F|F
FF^P|P
P (E) | <id>
id id
Another Example
Consider the following grammar:
SaSbS|bSaS|ε