Unit-1 Notes Compiler Design
Unit-1 Notes Compiler Design
Introduction to Compilers
Source Target
Program Compiler Program
Execution of the target program: During execution, the target program is first loaded into the
main memory and then the user interacts with the target program to generate the output. The exe-
cution phase is shown in Figure 1.2.
2 Principles of Compiler Design
Source Program
Interpreter Output
Inputs
Machine
Source New Source Language
Program Program Code
Preprocessor Compiler
Assemblers: In some cases, compiler generates the target program in assembly language. In that
case, the assembly language program is given to the assembler as input. An assembler then trans-
lates the assembly language program into machine language program which is relocatable machine
code. An assembly language program is in mnemonics.
Assembly Machine
Source Language Language
Program Program Code
Compiler Assembler
(Mnemonics)
Loaders and link editors: The larger source programs are compiled in small pieces by the com-
piler. To run the target machine code of any source program successfully, there is a need to link the
relocated machine language code with library files and other relocatable object files. So, loader and
link editor programs are used for the link editing and loading of the relocated codes. Link editors
create a single program from several files of relocated machine code. Loaders read the relocated
machine code and alter the relocatable addresses. To run the machine language program, the code
with altered data and commands is placed at the correct location in the memory.
5. D
iscuss the steps involved in the analysis of a source program with the help of a block
diagram.
Ans: The steps involved in the analysis of
source program are given below.
Source program acts as an input to the prepro- Source Program
cessor. Preprocessor modifies the source code
by replacing the header files with the suitable
content. Output (modified source program)
Preprocessor
of the preprocessor acts as an input for the
compiler.
Modified Source Program
Compiler translates the modified source pro-
gram of high-level language into the target
program. If the target program is in machine Compiler
language, then it can be executed directly. If Target Program in Assembly
the target program is in assembly language, Language
then that code is given to the assembler for
translation. Assembler translates the assembly Assembler
language code into the relocatable machine
language code. Relocatable Machine Code
Relocatable machine language code acts as an
Library Files and
input for the linker and loader. Linker links the Linker/Loader Relocatable Object
relocatable code with the library files and the
Files
relocatable objects, and loader loads the inte-
grated code into memory for the execution. The
Target Machine Code
output of the linker and loader is the equivalent
machine language code for the source code. Figure 1.6 Block Diagram of Source Program Analysis
4 Principles of Compiler Design
Source Program
Character Stream
Token
Stream
Syntax Analysis
Semantic Analysis
Parse Tree
Intermediate
Code
Code Optimization
Phase
Intermediate
Code
Lexical analysis phase: Lexical analysis (also known as scanning) is the first phase of a compiler.
Lexical analyzer or scanner reads the source program in the form of character stream and groups
the logically related characters together that are known as lexemes. For each lexeme, a token is
generated by the lexical analyzer. A stream of tokens is generated as the output of the lexical analy-
sis phase, which acts as an input for the syntax analysis phase. Tokens can be of different types,
namely, keywords, identifiers, constants, punctuation symbols, operator symbols, etc. The syntax
for any token is:
(token_name, value)
here token_name is the name or symbol which is used during the syntax analysis phase and
w
value is the location of that token in the symbol table.
Syntax analysis phase: Syntax analysis phase is also known as parsing. Syntax analysis phase
can be further divided into two parts, namely, syntax analysis and semantic analysis.
Syntax analysis: Parser uses the token_name token from the token stream to generate the
output in the form of a tree-like structure known as syntax tree or parse tree. The parse tree
illustrates the grammatical structure of the token stream.
Semantic analysis: Semantic analyzer uses the parse tree and symbol table for checking the
semantic consistency of the language definition of the source program. The main function of
the semantic analysis is type checking in which semantic analyzer checks whether the oper-
ator has the operands of matching type. Semantic analyzer gathers the type information and
saves it either in the symbol table or in the parse tree.
Intermediate code generation phase: In intermediate code generation phase, the parse tree rep-
resentation of the source code is converted into low-level or machine-like intermediate representa-
tion. The intermediate code should be easy to generate and easy to translate into machine language.
There are several forms for representing the intermediate code. Three address code is the most
popular form for representing intermediate code. An example of three address code language is
given below.
x1 = x2 + id
id1 = x3
Code optimization phase: Code optimization phase, which is an optional phase, performs the
optimization of the intermediate code. Optimization means making the code shorter and less com-
plex, so that it can execute faster and takes lesser space. The output of the code generation phase is
also an intermediate code, which performs the same task as the input code, but requires lesser time
and space.
Code generation phase: Code generation phase translates the intermediate code representation of
the source program into the target language program. If the target program is in machine language,
the code generator produces the target code by assigning registers or memory locations to store
variables defined in the program and to hold the intermediate computation results. The machine
code produced by the code generation phase can be executed directly on the machine.
Symbol table management: A symbol table is a data structure that is used by the compiler to
record and collect information about source program constructs like variable names and all of its
attributes, which provide information about the storage space occupied by a variable (name, type,
and scope of the variables). A symbol table should be designed in an efficient way so that it permits
the compiler to locate the record for each token name quickly and to allow rapid transfer of data
from the records.
6 Principles of Compiler Design
Error handler: Error handler is invoked whenever any fault occurs in the compilation process of
source program.
Both the symbol table management and error handling mechanisms are associated with all phases of
the compiler.
7. D
iscuss the action taken by every phase of compiler on the following instruction of
source program while compilation.
Total = number1 + number2 * 5 Total = number 1 + number 2 * 5
Ans: Consider the source program as a stream
of characters.
Total = number1 + number2 * 5 Lexical Analyzer
Lexical analysis phase: Stream of charac-
ters (source program) acts as an input for the
lexical analyzer, which produces the token
stream as output (see Figure 1.8). <id, 1> < = > <id, 2> <+> <id, 3> <*> <5>
Syntax analysis phase: The token stream Figure 1.8 Lexical Analysis Phase
acts as the input for the syntax analyzer.
Output of the syntax analyzer is a parse tree (see Figure 1.9(a)) that acts as the input for the
semantic analyzer; the output of the semantic analyzer is also a parse tree after type checking
(see Figure 1.9(b)).
<id, 1>
+
<id, 2> *
<id, 1> < = > <id, 2> <+> <id, 3> <*> <5>
<id, 3> 5
Syntax Analyzer
Semantic Analyzer
=
=
<id, 1> +
<id, 1> +
<id, 2>
* <id, 2>
*
<id, 3> 5
<id, 3>
(a) Syntax Analyzer inttofloat
5
(b) Semantic Analyzer
t3 = id3 * 5.0
t3 = inttofloat (5)
id1 = id2 + t3
t2 = id3 * t3
t1 = id2 + t2
id1 = t1
Code Generator
Code Optimizer
Figure 1.11 Code Optimization Phase Figure 1.12 Code Generation Phase
8. W hat is a pass in the compilation process? Compare and contrast the features of a
single-pass compiler with multi-pass compiler.
Ans: In an implementation of a compiler, the activities of one or more phases are combined into a
single module known as a pass. A pass reads the input, either as a source program file or as the output of
the previous pass, transforms the input and writes the output into an intermediate file. The intermediate
file acts as either the input for the next pass or the final machine code.
When all the phases of a compiler are grouped together into a single pass, then that compiler is known
as single-pass compiler. On the other hand, when different phases of a compiler are grouped together
into two or more passes, then that compiler is known as multi-pass compiler.
A single-pass compiler is faster than the multi-pass compiler because in multi-pass compiler each
pass reads and writes an intermediate file, which makes the compilation process time consuming.
Hence, time required for compilation increases with the increase in the number of passes in a compiler.
8 Principles of Compiler Design
A single-pass compiler takes more space than the multi-pass compiler because in multi-pass compiler
the space used by the compiler during one pass can be reused by the subsequent pass. So, for comput-
ers having small memory, multi-pass compilers are preferred. On the other hand, for computers having
large memory, single-pass compiler or compiler with fewer number of passes can be used.
In a single-pass compiler, the complicated optimizations required for high quality code generation are
not possible. To count the exact number of passes for an optimizing compiler is a difficult task.
Source Target
Language S T
Language
Implementation
I Language
S T A M
A M
S T S T
A M
A M
Bootstrapping: Bootstrapping is an important concept for building a new compiler. This concept
uses a simple language to translate complicated programs which can further handle more complicated
programs. The process of bootstrapping can be better understood with the help of an example given
here.
Suppose we want to create a cross compiler for the new source language S that generates a target code
in language T, and the implementation language of this compiler is A. We can represent this compiler as
CST
A (see Figure 1.14(a)). Further, suppose we already have a compiler written for language A with both
target and implementation language as M. This compiler can be represented as CAM M (see Figure 1.14(b)).
Now, if we run CSTA with the help of C AM
M , then we get a compiler C ST
M (see Figure 1.14(c)). This com-
piler compiles a source program written in language S and generates the target code in T, which runs on
machine M (that is, the implementation language for this compiler is M).
11. Explain error handling in compiler.
Ans: Error detection and reporting of errors are important functions of the compiler. Whenever
an error is encountered during the compilation of the source program, an error handler is invoked.
Error handler generates a suitable error reporting message regarding the error encountered. The error
reporting message allows the programmer to find out the exact location of the error. Errors can be
encountered at any phase of the compiler during compilation of the source program for several rea-
sons such as:
In lexical analysis phase, errors can occur due to misspelled tokens, unrecognized characters, etc.
These errors are mostly the typing errors.
In syntax analysis phase, errors can occur due to the syntactic violation of the language.
In intermediate code generation phase, errors can occur due to incompatibility of operands type for
an operator.
10 Principles of Compiler Design
In code optimization phase, errors can occur during the control flow analysis due to some unreach-
able statements.
In code generation phase, errors can occurs due to the incompatibility with the computer architec-
ture during the generation of machine code. For example, a constant created by compiler may be
too large to fit in the word of the target machine.
In symbol table, errors can occur during the bookkeeping routine, due to the multiple declaration
of an identifier with ambiguous attributes.
Multiple-Choice Questions
1. A translator that takes as input a high-level language program and translates into machine language
in one step is known as —————.
(a) Compiler (b) Interpreter
(c) Preprocessor (d) Assembler
2. ————— create a single program from several files of relocated machine code.
(a) Loaders (b) Assemblers
(c) Link editors (d) Preprocessors
3. A group of logically related characters in the source program is known as —————.
(a) Token (b) Lexeme
(c) Parse tree (d) Buffer
4. The ————— uses the parse tree and symbol table checking the semantic consistency of the
source program.
(a) Lexical analyzer (b) Intermediate code generator
(c) Syntax translator (d) Semantic analyzer
5. The ————— phase converts an intermediate code into an optimized code that takes lesser space
and lesser time to execute.
(a) Code optimization (b) Syntax directed translation
(c) Code generation (d) Intermediate code generation
6. ————— is invoked whenever any fault occurs in the compilation process of source program.
(a) Syntax analyzer (b) Code generator
(c) Error handler (d) Lexical analyzer
7. In compiler, the activities of one or more phases are combined into a single module known as a
—————.
(a) Phase (b) Pass
(c) Token (d) Macro
8. For the construction of a compiler, the compiler writer uses different types of software tools that are
known as —————.
(a) Compiler writer tools (c) Programming tools
(c) Compiler construction tools (d) None of these
Introduction to Compilers 11
9. A compiler that runs on one machine and produces the target code for another machine is known
as —————.
(a) Cross compiler (b) Linker
(c) Preprocessor (d) Assembler
10. If we run a compiler CST AM
A with the help of another compiler C M , then we get a new compiler that
is —————.
(a) CSM
M (b) CST
A
(c) CST
M (d) CAM
M
Answers
1. (a) 2. (c) 3. (b) 4. (d) 5. (a) 6. (c) 7. (b) 8. (c) 9. (a) 10. (c)