0% found this document useful (0 votes)

2 views

Language processing

The document outlines the fundamentals of language processing, which involves analyzing a source program and synthesizing a target program through a two-phase model. It details the analysis phase, which includes lexical, syntax, and semantic analysis, and the synthesis phase, which focuses on memory allocation and code generation. Additionally, it discusses the multi-pass organization of language processors, the use of intermediate representations, and the operations performed on symbol tables.

Uploaded by

ravindramhaske38

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views

Language processing

Uploaded by

ravindramhaske38

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 59

Language processing

fundamentals
Language processing
fundamentals
• Language processing =
– Analysis of the source program +
Synthesis of the target program

– This definition motivates a generic two-phase

model of language processing activities – the
source program analyzed during the analysis
phase and target program is synthesized
during the synthesis phase.
1. Analysis Phase
• This model partially applies to interpreters
in that an interpreter does not have a
synthesis phase.

• Figure below shows a schematic of a

language processor that uses this model.
The specification consist of three components

1. Lexical Rules: govern formation of valid lexical

units in the source language, such as operators,
identifiers and constants.

2. Syntax rules: Govern formation of valid statements

in the source language.

3. Semantic rules: Associate meanings with valid

statements of the source language.
• Example: percent_profit:=(profit*100)/cost_price;

• Lexical analysis identifies :=,*and / as operators,100 as

constant and remaining string as identifiers.

• Syntax analysis identifies the statement as an

assignment statement with percent_profit as the varible
on the left size and other as the expression on the right
side.

• Semantic analysis determine meaning of statement to be

assignment of
profit x 100
cost_price
to percent_profit
2. Synthesis phase
Is concerned with construction of target language
statements which would implement the meaning of a
source statement.

it consists of two main tasks as follows:

a. Creation of Data structure in the target

program(memory allocation)

b. Generation of target code( code generation)

example
• A language processor generates the following target
program for the source statement.

• This program is in the assembly language of a computer,

which we call the target machine of the language
processor.

• The DW statements in the program reserve words in

memory for holding data.

• DC statement reserves a word in memory and stores the

constant 100 in it.

• The statement MOVER and MOVEM move a value from

a memory location to a CPU register and vice versa.
Multi pass organization of LP
• We have impression that language processing
can be performed on a statement by statement
basis

• That is , analysis of a statement in the source

program can be immediately followed by
synthesis of target statements.

• But statement –by- statement may not feasible

due to two reasons.
1) A source program may contain forward reference
– A forward reference of a program entity is reference to the
entity in some statement of the program that occurs
before the statement containing the definition or
declaration of the entity.

2) may require more memory than is available for its

operation
• Forward reference causes problem in statement-by-
statement processing because

– The language processor would not know the attributes of the

entity when it encounters the forward reference
– Consequently, it would not be able to generate the correct code
for it
– Percent_profit = (profit* 100) / cost_price;
– ….
– Long profit;
– Reference to profit in the assignment statement constitutes a
forward reference because the statement occurs before
statement long profit;
• If program is processed on a statement-by-
statement basis, the language processor would not
know the type of variable profit while processing the
assignment statement, hence it would not be able to
generate the correct code.

These problems are addressed by using a multi-

pass organization of language processors
Multi-pass
• The language processor splits the task of
language processing into sequence of stages

• Problem of forward reference is solved by

analysis of program in one stage and generating
target program in a later stage.

• So that now in our example language processor

would have deduced type of profit before
generating code.
Language processor pass
is the processing of every statement in a source
program, or in its equivalent representation, to
perform a language processing function

• Pass I : perform analysis of the source program and

note the relevant information

• Pass II : Perform synthesis of the target program

Ex: Information regarding type of profit noted in Pass-1.
Information used in Pass-2 to perform code generation.
Intermediate Representation
• In pass I – it would analyze the source program
to find the type of each variable.

• In pass II – it would again analyze the source

program to generate target code by using type
information noted in pass I.

• This duplication can be avoided by using an

intermediate representation of the source
program
Intermediate Representation (IR)
• An IR reflects the effect of some but not all,
analysis and synthesis tasks performed during
language processing.

Source Front End Back End Target

Program Program

Intermediate
Representation
(IR)
• Second pass now would read the intermediate
representation generated by previous pass, instead of
reading source program

• ‘but not all’ simply differentiate target program and

intermediate representation.

• To be effective, an intermediate representation should

have following properties:
1. Ease of use
2. Processing Efficiency
3. Memory Efficiency
• Ease of use : should be easy to construct and
analyze the intermediate representation

• Processing Efficiency : Efficient algorithm should be

available for accessing the data structures used in
intermediate representation

• Memory Efficiency: The intermediate

representation should be compact so that it does
not occupy much memory
Front end and back end
• The first pass is concerned exclusively with
source language issues, it is called front end of
language processor.

• The second pass is concerned with synthesis of

the target program, it is called back end of a
language processor.
• Front end and back end need not coexist in
memory. So it will reduce the memory
requirements
• The code for the front end is first loaded in
memory, LP analyzes the source program,
generates the intermediate
representation , and writes it to disk
• Now the code of front end is removed from
the memory and code of the back end is
loaded.
• LP reads the intermediate representation
and synthesizes the target program
:= :=

+ a, real +
a, real

b, real i, int b, real i*, real

(a) (b)
:=

a, real temp, real

(c)

Steps in semantic analysis of an assignment statement

Toy Compiler
• Front End
Performs lexical, syntax and semantic
analysis of SP.
Analysis involves
1. Determine validity of source stmt.
2. Determine the content of source stmt.
3. Construct a suitable representation.
Output of Front End
1. Tables of information
2. Intermediate code (IC)

IC is a sequence of IC units, represents meaning

of one action in SP

Example: i: integer;
a,b: real;
a := b+i;
Symbol Table:
No. Symbol Type Length Address
1 i int
2 a real
3 b real
4 i* real
5 temp real

Intermediate code
1. Convert (id, #1) to real, giving (id, #4)
2. Add (id, #4) to (id, #3) giving (id, #5)
3. Store (id, #5) in (Id, #2)
Lexical Analysis (Scanning)
• Identifies lexical units in a source statement.
• Classifies units into different classes e.g id’s,
constants, reserved id’s etc and enters them into
different tables
• Token contains
– Class code and number in class

Code #no e.g. Id #10

Example: Statement a := b + i;
a := b + i ;

Id #2 Op #5 Id #3 Op #3 Id #1 Op #10
Syntax Analysis (Parsing)
• Processes tokens for a statement to determine its
grammatical structure and build intermediate code that
represents structure.
• A tree is chosen for the intermediate code because it can
represent the hierarchical structure.
• Determines the statement class such as assignment
statement, if stmt etc.
e.g.:- a , b : real; and a = b +i ;

real :=

a b a +

b i
Semantic Analysis
• Determines the meaning of the SP

• Results in addition of info such as type, length

etc.

• Determines the meaning of the subtree in IC and

adds info to the IC tree.
Stmt a := b +i; proceeds as

1. Type is added to the IC tree

2. Rules of assignment indicate expression on

RHS should be evaluated first.

3. Rules of addition indicate i should be converted

before addition
i. Convert i to real giving i*;
ii. Add i* to b giving temp.
iii. Store temp in a.
Intermediate code after analysis phase
No. Symbol Type Length Address
1 i int
2 a real
3 b real
4 i* real
5 temp real

1. Convert (Id,#1) to real, giving (Id,#4)

2. Add (Id, #4) to (Id, #3), giving (Id, #5)
3. Store (Id, #5) in (Id, #2)
Source Program

Lexical
errors Scanning

tokens
Symbol table
Syntax Constant table
errors Parsing
Other table
Parse tree

Semantic Semantic analysis

errors

IR
Figure: Front end of a Toy Compiler
Back End
1. Memory Allocation
Calculated from its type, length.
No. Symbol Type Length Address
1 i int 2000
2 a real 2001
3 b real 2002
• Back end computes the memory requirement of
a variable from its type, length found in the
symbol table, and allocates memory to it.

• The address of allocated memory area is

entered in the symbol table.

• Temporary results i* and temp are both

computed and used in the assignment statement
a = b + i;
Symbol table after memory
allocation
Code generation

• Two key decisions involved in generating

good quality target code.

• 1. What instructions should be used for

each of the actions in the intermediate
code?

• 2. What CPU registers should be used for

evaluating expressions?
1. Convert (Id,#1) to real, giving (Id,#4)
2. Add (Id, #4) to (Id, #3), giving (Id, #5)
3. Store (Id, #5) in (Id, #2)
(IC Code)

If back end decides to hold the values of i* and temp in the

CPU register AREG, it would generate assembly code

CONV_R AREG, I a=b+i

ADD_R AREG, B
MOVEM AREG, A
(Target Code)
IR

IC
Memory allocation

Symbol table

Code generation

Target
program

Figure: Back End of the toy compiler

Symbol Tables
• Identifier used in source program called symbol

• Names of variables, functions and procedures are

symbols

• A Language processor uses the symbol table to

maintain information about attributes of symbols
used in a source program
Four kinds of operations
are performed on symbol table by LP

1. Add a symbol and its attributes : Make a new entry in the

symbol table

2. Locate a symbol’s entry : Find a symbol’s entry in the

symbol table

3. Delete a symbol’s entry : Remove the symbol’s

information from the table

4. Access a symbol’s entry : Access the entry and set,

modify or copy its attribute information.
Add and Locate
• Add and Locate operations take a symbol as a
parameter e.g. ADD(s1)
• The Add operation indicates an error if an entry
already exists for the symbol. Otherwise , it makes
a new entry

• The Locate operation returns a pointer to the

symbol’s entry in the table if one exists,
• Otherwise, it returns a null pointer
• int *p; p = Locate(s1);
Delete and Access
• The delete and access operations may either
take a symbol or the pointer to a symbol’s entry
returned by the locate operation as parameters.
Delete (s1) or Delete (p), access(a)

• Thus, the add and locate operations necessarily

involve a search in the symbol table, whereas
the delete and access operation may involve a
search
Design goals of symbol table
• Table’s organization should facilitate efficient
search

• And table should be compact ; it should not

occupy much memory

• Not possible to meet both these goals

simultaneously because of the principle
called time-space-trade-off
Data Structures for Language
Processing
Data structure used in Language processor can be
classified on the basis of following criteria:

1. Nature of DS: Whether linear or nonlinear ds.

2. Purpose of DS: whether search or allocation ds.

3. Lifetime of DS: whether used during language

processing or during target program execution.
Linear Data Structure
• Entries in the symbol table occupy
adjoining areas of memory.
• This property used to facilitate search

Non Linear Data Structure

• Entries in the symbol table do not occupy
contiguous areas of memory.
• The entries are searched and accessed
using pointers
• Use of linear – single contiguous area of
memory that can accommodate all entries
• This requirement forces designer to allocate
memory for the largest number of entries the
symbol table may contain

• However, it is wasteful because some part of the

allocate memory may not be used.

• Nonlinear avoids the problem because it allows

memory to be allocated in entry- by – entry
manner
Diagram
• Figure (a) Memory allocation for four linear data
structures named A-D used in a program

• Parts of data structure shown shaded with

dotted lines are not in current use

• These parts may remain unused throughout

execution of program;

• However memory allocated to them, cannot be

used for other data structures.
• Figure (b) shows allocation to four non linear
data structures named E – H.

• The program created different parts of these

data structures at different times, so these parts
do not occupy contiguous area of memory
• Two memory area to ds E, three memory area to
ds F and so on.
• No parts of the allocated memory areas are
currently unused.
Symbol table entry formats
• Entry in ST comprised of fields that
accommodate the attributes of one symbol
• One of these field is symbol field
• The field which forms basis for search is
called key field
• Hence symbol field is the key field

• An entry may be declared as record

• A programming language may specify
large number of attributes of symbols.
• These attributes applicable to specific
class is smaller
• Table below shows four class
– Variable – 3 (no.of attributes)
– Procedure - 2
– Function - 4
– Label - 1
– Total number of attributes is 8; see the diagra
Entry formats for
accommodating attribute
• Fixed length entries
– Each entry in ST has fields for all attributes in
the Programming language
• Variable-length entries
– Field only for the attributes specified for
symbols of its class
• Hybrid entries
– It has a fixed-length part and a variable-length
part
• Here in (a) each entry has 8 attribute fields
• When class=label, all attribute fields
except the field for storing its statement
number are redundant.(useless)

• (b) shows variable length entry format for

a symbol that is used as a label.
• It has only one attribute field
Fixed length formats
• When fixed entry formats are used all
entries in ST has identical format
• So it will use linear data structures like
array
• It will enables the use of an efficient
search procedure
• This organization makes inefficient use of
memory since many entries may contain
redundant fields
Variable length formats
• Use of variable length format leads to
compact organization in which memory
wastage does not occur
• Search method would have to know the
length of an entry
• So each entry would have to include a
length field
• Format for such an entry
Hybrid format
• Combines search efficiency of fixed length
and memory efficiency of variable length
• Each entry is split into two halves , fixed
part and variable part
• Pointer field is added to fixed part
• It points to variable part of an entry
• Fixed part of all entries are organized into
linear data structures
• As it has pointer to a variable part,
• Variable part does not need to be located
through search
• Hence it could be put into a linear or non
linear data structure
End of the Chapter..!!
Thank you.!!

Python Basic Version 1 (Myanmar) PDF
77% (13)
Python Basic Version 1 (Myanmar) PDF
145 pages
Assembly Programming:Simple, Short, And Straightforward Way Of Learning Assembly Language
From Everand
Assembly Programming:Simple, Short, And Straightforward Way Of Learning Assembly Language
Sherwyn Allibang
5/5 (2)
Caret
100% (2)
Caret
30 pages
Lec-2.4 & 2.5 Language Processing Activity Part-III & IV
No ratings yet
Lec-2.4 & 2.5 Language Processing Activity Part-III & IV
18 pages
Fundamental Language Processing
No ratings yet
Fundamental Language Processing
25 pages
Language Processors
No ratings yet
Language Processors
41 pages
Compiler Interpreter
No ratings yet
Compiler Interpreter
98 pages
CD - Unit I
No ratings yet
CD - Unit I
30 pages
Language Processing
50% (8)
Language Processing
29 pages
Introduction System Software new (1) (1)
No ratings yet
Introduction System Software new (1) (1)
38 pages
What Is Translators
No ratings yet
What Is Translators
95 pages
CD Unit I
No ratings yet
CD Unit I
27 pages
Compiler Design Material
No ratings yet
Compiler Design Material
107 pages
Compiler Lecture 2
No ratings yet
Compiler Lecture 2
19 pages
Pure Impure Interpreters PDF
No ratings yet
Pure Impure Interpreters PDF
98 pages
Introduction To Compilation
No ratings yet
Introduction To Compilation
33 pages
Compiler Construction Week 2
No ratings yet
Compiler Construction Week 2
29 pages
Phases of Compiler
No ratings yet
Phases of Compiler
17 pages
Lec 2
No ratings yet
Lec 2
21 pages
Compiler Lecture 2
No ratings yet
Compiler Lecture 2
16 pages
CD Unit 1
No ratings yet
CD Unit 1
54 pages
BSIT 1 1 Programming
No ratings yet
BSIT 1 1 Programming
33 pages
Lecture#1 2
No ratings yet
Lecture#1 2
54 pages
UNIT-I Compiler Design - SCS1303: School of Computing Department of Computer Science and Engineering
No ratings yet
UNIT-I Compiler Design - SCS1303: School of Computing Department of Computer Science and Engineering
27 pages
SCS13033
No ratings yet
SCS13033
121 pages
Compiler 1
No ratings yet
Compiler 1
40 pages
Compiler Lecture 2
No ratings yet
Compiler Lecture 2
16 pages
1 Compiler Phases
No ratings yet
1 Compiler Phases
30 pages
Compiler Construction Iii B.E. - Vi Sem: Unit - I
No ratings yet
Compiler Construction Iii B.E. - Vi Sem: Unit - I
77 pages
Lecture1 - Compiler Design
No ratings yet
Lecture1 - Compiler Design
52 pages
Compiler Design
No ratings yet
Compiler Design
47 pages
AT_Module6_Compiler and its phases_PS
No ratings yet
AT_Module6_Compiler and its phases_PS
32 pages
Compiler Design
100% (2)
Compiler Design
52 pages
Chapter 1
No ratings yet
Chapter 1
43 pages
CD FINALIZED NOTES
No ratings yet
CD FINALIZED NOTES
6 pages
Unit-1 PCD
No ratings yet
Unit-1 PCD
28 pages
Compiler Design and Construction Note
No ratings yet
Compiler Design and Construction Note
97 pages
Chapter 1 Overview of Compilation
No ratings yet
Chapter 1 Overview of Compilation
17 pages
Compiler Design Notes
No ratings yet
Compiler Design Notes
148 pages
CD_ UNIT-1
No ratings yet
CD_ UNIT-1
10 pages
CS4031 Compiler Construction Lecture 1
No ratings yet
CS4031 Compiler Construction Lecture 1
42 pages
System Software
No ratings yet
System Software
20 pages
compiler design unit I 2025
No ratings yet
compiler design unit I 2025
75 pages
C.K.Pithawala College of Engg. & Tech: Topic: Language Processors
No ratings yet
C.K.Pithawala College of Engg. & Tech: Topic: Language Processors
42 pages
Chapter 1.1
No ratings yet
Chapter 1.1
37 pages
Basic Concepts of C Language
No ratings yet
Basic Concepts of C Language
127 pages
Cousins of Compiler
100% (1)
Cousins of Compiler
25 pages
Compiler Unit5notes
No ratings yet
Compiler Unit5notes
20 pages
CC 1
No ratings yet
CC 1
41 pages
Module 1-1
No ratings yet
Module 1-1
22 pages
Compiler Desining Complete Notes
No ratings yet
Compiler Desining Complete Notes
175 pages
Assembler Pass1a
No ratings yet
Assembler Pass1a
2 pages
Compiler Notes 1
No ratings yet
Compiler Notes 1
21 pages
C SparkNotes
No ratings yet
C SparkNotes
45 pages
CD Unit 1
No ratings yet
CD Unit 1
42 pages
CD Full Material
No ratings yet
CD Full Material
74 pages
System Software Notes
No ratings yet
System Software Notes
81 pages
Compiler Design Assignment
100% (1)
Compiler Design Assignment
16 pages
Compiler Design
From Everand
Compiler Design
Knowledge Flow
No ratings yet
The 1 Page Python Book
From Everand
The 1 Page Python Book
Barani Kumar
2/5 (1)
C Programmin Language
From Everand
C Programmin Language
Knowledge Flow
No ratings yet
Problem Solving in C and Python: Programming Exercises and Solutions, Part 1
From Everand
Problem Solving in C and Python: Programming Exercises and Solutions, Part 1
Yana Kortsarts
4.5/5 (2)
Computer Organization With Assembly Language
100% (1)
Computer Organization With Assembly Language
35 pages
Tutorial Orthogonal Matching Pursuit Mod2
No ratings yet
Tutorial Orthogonal Matching Pursuit Mod2
522 pages
Femres
No ratings yet
Femres
16 pages
PArt 1 of Organisation of Programming Languages1
No ratings yet
PArt 1 of Organisation of Programming Languages1
14 pages
Kanji 1 To 2042-Vol 2
No ratings yet
Kanji 1 To 2042-Vol 2
26 pages
CMP3201 Embedded Systems
No ratings yet
CMP3201 Embedded Systems
4 pages
CS36 C Programming Homework 5 20 Points (Lesson 1 To Lesson 61)
No ratings yet
CS36 C Programming Homework 5 20 Points (Lesson 1 To Lesson 61)
5 pages
Modern Compiler Design Worldwide Series in Computer Science 1st Edition by Dick Grune, Henri Bal, Ceriel Jacobs, Koen Langendoen ISBN 0471976970 9780471976974 - The ebook in PDF format is available for download
No ratings yet
Modern Compiler Design Worldwide Series in Computer Science 1st Edition by Dick Grune, Henri Bal, Ceriel Jacobs, Koen Langendoen ISBN 0471976970 9780471976974 - The ebook in PDF format is available for download
42 pages
TungDT BH02089 AssignmentII Programming st2
No ratings yet
TungDT BH02089 AssignmentII Programming st2
62 pages
Week 1
No ratings yet
Week 1
4 pages
Unit 1 CS3353 - C Programming and Data Structures
100% (2)
Unit 1 CS3353 - C Programming and Data Structures
6 pages
Optimiziongpython
No ratings yet
Optimiziongpython
22 pages
What Is A Language Processor
No ratings yet
What Is A Language Processor
4 pages
What Is Java Lecture1
No ratings yet
What Is Java Lecture1
14 pages
C Lib Clibabi
No ratings yet
C Lib Clibabi
35 pages
Flat Assembler 1.69: Tomasz Grysztar
No ratings yet
Flat Assembler 1.69: Tomasz Grysztar
125 pages
CCCCCCCCCCCCCCCCC C C CC C
No ratings yet
CCCCCCCCCCCCCCCCC C C CC C
17 pages
Week 1
No ratings yet
Week 1
25 pages
FUZE A Methodology For The Investigation of DNS
No ratings yet
FUZE A Methodology For The Investigation of DNS
3 pages
Codewarrior™ Development Studio 8/16-Bit Ide User'S Guide: Revised: August 28, 2007
No ratings yet
Codewarrior™ Development Studio 8/16-Bit Ide User'S Guide: Revised: August 28, 2007
348 pages
Platform Based Behavior Level and System
No ratings yet
Platform Based Behavior Level and System
4 pages
ME
No ratings yet
ME
37 pages
Payment Billing System Document
No ratings yet
Payment Billing System Document
71 pages
Unit 1-Lecture Note-1, 2
No ratings yet
Unit 1-Lecture Note-1, 2
24 pages
MATH49111/69111 Scientific Computing: Dr. Paul Johnson
No ratings yet
MATH49111/69111 Scientific Computing: Dr. Paul Johnson
23 pages
Compiler Lab
No ratings yet
Compiler Lab
2 pages
WM - W800 - SDK User Manual V1.1: Beijing Lianshengde Microelectronics Co., Ltd. (Winner Micro)
No ratings yet
WM - W800 - SDK User Manual V1.1: Beijing Lianshengde Microelectronics Co., Ltd. (Winner Micro)
20 pages
4 LanguageProcessors
No ratings yet
4 LanguageProcessors
40 pages