0% found this document useful (0 votes)
0 views11 pages

Unit-1 Notes Compiler Design

Uploaded by

vivekkumars
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
0 views11 pages

Unit-1 Notes Compiler Design

Uploaded by

vivekkumars
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

1

Introduction to Compilers

1. What do you understand by the terms translator and compiler?


Ans: A translator or language processor is a program that translates an input program written in a
programming language into an equivalent program in another language. Compiler is a type of transla-
tor, which takes a program written in a high-level programming language as input and translates into
an equivalent program in low-level language such as machine language or assembly language. The
program written in high-level language is known as source program, and the program converted into
low-level language is known as object (or target) program. Moreover, compiler traces the errors in the
source program and generates the error report. Without compilation, no program written in a high-level
language can be executed. After compilation only the program in machine language is loaded into the
memory for execution. For every programming language, we have a different compiler; however, the
basic tasks performed by every compiler are same.
2. E
 xplain the steps required for the execution of a high-level language program with the
help of compiler.
Ans: The execution of a high-level language program is performed basically in two steps:
 Compilation or translation: During compilation the source program is translated into the target
program. The target program can either be a machine code or an assembly language code. If the
target program is executable machine language code, then it can be executed directly to generate
the output. Figure 1.1 shows the compilation phase.

Source Target
Program Compiler Program

Figure 1.1 Compilation of Source Program

 Execution of the target program: During execution, the target program is first loaded into the
main memory and then the user interacts with the target program to generate the output. The exe-
cution phase is shown in Figure 1.2.
2 Principles of Compiler Design

Input supplied by Output produced


the user Target Program after execution

Figure 1.2 Executing Target Program

3. What are the difference between compiler and interpreter?


Ans: Compiler translates the whole source program into the target program in one step (see
Figure 1.1). That is, it first scans the entire input program and then translates it into the target program.
The target program is then executed separately for generating the output according to the given inputs.
Interpreter, on the other hand, directly executes the source program line by line according to the
given inputs. That is, translation and execution of each statement are carried out side by side. Thus, sep-
arate execution of the program is not required. The line by line execution of the program provides better
debugging environment than a compiler. The main drawback of an interpreter is that the execution time
of an interpreted program is generally slower than that of a compiled program because the program
needs to be translated every time it is executed. The interpretation process is shown in Figure 1.3.

Source Program
Interpreter Output
Inputs

Figure 1.3 Working of an Interpreter

4. What do you understand by the term cousins of compiler?


Ans: The term ‘cousins of compiler’ refers to the type of programs which are required for the exe-
cution of the source program. These are the programs along with which compiler operates. The cousins
of compilers are preprocessors, assemblers, and loaders and link editors.
 Preprocessors: Before compilation, the source program is processed by the preprocessor to pre-
pare it for the compilation. The preprocessor program creates modified source program from the
original source program by replacing the preprocessor directives with the suitable content. The
new source program acts as an input to the compiler (see Figure 1.4). Preprocessor performs vari-
ous tasks as given here.
 It permits the user to include the header files in the program and user can make use of the
functions defined in these header files.
 It permits the user to include macros in the program. Macros are the small set of instructions
that are used in a program repetitively. Macros have two attributes, namely, macro name and
macro definition. Whenever the macro name is encountered in the program then it is replaced
by the macro definition (set of statements correspond to the macro).

Machine
Source New Source Language
Program Program Code
Preprocessor Compiler

Figure 1.4 Preprocessor’s Role


Introduction to Compilers 3

 Assemblers: In some cases, compiler generates the target program in assembly language. In that
case, the assembly language program is given to the assembler as input. An assembler then trans-
lates the assembly language program into machine language program which is relocatable machine
code. An assembly language program is in mnemonics.

Assembly Machine
Source Language Language
Program Program Code
Compiler Assembler
(Mnemonics)

Figure 1.5 Assembler’s Role

 Loaders and link editors: The larger source programs are compiled in small pieces by the com-
piler. To run the target machine code of any source program successfully, there is a need to link the
relocated machine language code with library files and other relocatable object files. So, loader and
link editor programs are used for the link editing and loading of the relocated codes. Link editors
create a single program from several files of relocated machine code. Loaders read the relocated
machine code and alter the relocatable addresses. To run the machine language program, the code
with altered data and commands is placed at the correct location in the memory.
5. D
 iscuss the steps involved in the analysis of a source program with the help of a block
diagram.
Ans: The steps involved in the analysis of
source program are given below.
 Source program acts as an input to the prepro- Source Program
cessor. Preprocessor modifies the source code
by replacing the header files with the suitable
content. Output (modified source program)
Preprocessor
of the preprocessor acts as an input for the
­compiler.
Modified Source Program
 Compiler translates the modified source pro-
gram of high-level language into the target
program. If the target program is in machine Compiler
language, then it can be executed directly. If Target Program in Assembly
the target program is in assembly language, Language
then that code is given to the assembler for
translation. Assembler translates the assembly Assembler
language code into the relocatable machine
language code. Relocatable Machine Code
 Relocatable machine language code acts as an
Library Files and
input for the linker and loader. Linker links the Linker/Loader Relocatable Object
relocatable code with the library files and the
Files
relocatable objects, and loader loads the inte-
grated code into memory for the execution. The
Target Machine Code
output of the linker and loader is the equivalent
machine language code for the source code. Figure 1.6 Block Diagram of Source Program Analysis
4 Principles of Compiler Design

6. Explain the different phases of compiler with diagram.


Or
Explain the structure of compiler.
Ans: Compiler translates an input source program written in any high-level programming language
into an equivalent target program in machine language. As compilation is a complex process, it is
divided into several phases. A phase is a reasonably interrelated procedure that takes input in one
representation and produces the output in another representation. The structure of compiler comprises
various phases as shown in Figure 1.7.

Source Program

Character Stream

Lexical Analysis Phase

Token
Stream

Syntax Analysis

Syntax Analysis Phase

Semantic Analysis

Parse Tree

Symbol Table Intermediate Code Error


Management Generation Phase Handler

Intermediate
Code

Code Optimization
Phase

Intermediate
Code

Code Generation Phase

Target Program in Machine Code

Figure 1.7 Phases of a Compiler


Introduction to Compilers 5

 Lexical analysis phase: Lexical analysis (also known as scanning) is the first phase of a compiler.
Lexical analyzer or scanner reads the source program in the form of character stream and groups
the logically related characters together that are known as lexemes. For each lexeme, a token is
generated by the lexical analyzer. A stream of tokens is generated as the output of the lexical analy-
sis phase, which acts as an input for the syntax analysis phase. Tokens can be of different types,
namely, keywords, identifiers, constants, punctuation symbols, operator symbols, etc. The syntax
for any token is:
(token_name, value)
 here token_name is the name or symbol which is used during the syntax analysis phase and
w
value is the location of that token in the symbol table.
 Syntax analysis phase: Syntax analysis phase is also known as parsing. Syntax analysis phase
can be further divided into two parts, namely, syntax analysis and semantic analysis.
 Syntax analysis: Parser uses the token_name token from the token stream to generate the
output in the form of a tree-like structure known as syntax tree or parse tree. The parse tree
illustrates the grammatical structure of the token stream.
 Semantic analysis: Semantic analyzer uses the parse tree and symbol table for checking the
semantic consistency of the language definition of the source program. The main function of
the semantic analysis is type checking in which semantic analyzer checks whether the oper-
ator has the operands of matching type. Semantic analyzer gathers the type information and
saves it either in the symbol table or in the parse tree.
 Intermediate code generation phase: In intermediate code generation phase, the parse tree rep-
resentation of the source code is converted into low-level or machine-like intermediate representa-
tion. The intermediate code should be easy to generate and easy to translate into machine language.
There are several forms for representing the intermediate code. Three address code is the most
popular form for representing intermediate code. An example of three address code language is
given below.
x1 = x2 + id
id1 = x3
 Code optimization phase: Code optimization phase, which is an optional phase, performs the
optimization of the intermediate code. Optimization means making the code shorter and less com-
plex, so that it can execute faster and takes lesser space. The output of the code generation phase is
also an intermediate code, which performs the same task as the input code, but requires lesser time
and space.
 Code generation phase: Code generation phase translates the intermediate code representation of
the source program into the target language program. If the target program is in machine language,
the code generator produces the target code by assigning registers or memory locations to store
variables defined in the program and to hold the intermediate computation results. The machine
code produced by the code generation phase can be executed directly on the machine.
Symbol table management: A symbol table is a data structure that is used by the compiler to
record and collect information about source program constructs like variable names and all of its
attributes, which provide information about the storage space occupied by a variable (name, type,
and scope of the variables). A symbol table should be designed in an efficient way so that it permits
the compiler to locate the record for each token name quickly and to allow rapid transfer of data
from the records.
6 Principles of Compiler Design

Error handler: Error handler is invoked whenever any fault occurs in the compilation process of
source program.
Both the symbol table management and error handling mechanisms are associated with all phases of
the compiler.
7. D
 iscuss the action taken by every phase of compiler on the following instruction of
source program while compilation.
Total = number1 + number2 * 5 Total = number 1 + number 2 * 5
Ans: Consider the source program as a stream
of characters.
Total = number1 + number2 * 5 Lexical Analyzer
 Lexical analysis phase: Stream of charac-
ters (source program) acts as an input for the
lexical analyzer, which produces the token
stream as output (see Figure 1.8). <id, 1> < = > <id, 2> <+> <id, 3> <*> <5>
 Syntax analysis phase: The token stream Figure 1.8 Lexical Analysis Phase
acts as the input for the syntax analyzer.
Output of the syntax analyzer is a parse tree (see Figure 1.9(a)) that acts as the input for the
semantic analyzer; the output of the semantic analyzer is also a parse tree after type checking
(see Figure 1.9(b)).

<id, 1>
+

<id, 2> *
<id, 1> < = > <id, 2> <+> <id, 3> <*> <5>
<id, 3> 5

Syntax Analyzer
Semantic Analyzer

=
=
<id, 1> +
<id, 1> +
<id, 2>
* <id, 2>
*
<id, 3> 5
<id, 3>
(a) Syntax Analyzer inttofloat

5
(b) Semantic Analyzer

Figure 1.9 Syntax Analysis Phase


Introduction to Compilers 7

=  Intermediate code generation phase: The parse tree


acts as the input for the intermediate code generator,
<id, 1> which produces an intermediate code as output (see
+
Figure 1.10).
<id, 2>  Code optimization phase: The intermediate code of
*
the source program acts as the input for the code opti-
<id, 3> inttofloat mizer. The output of the code optimizer is also an inter-
mediate code (see Figure 1.11), that takes lesser space
5 and lesser time to execute, and does the same task as
the input intermediate code.
Intermediate Code Generator
 Code generation phase: The optimized code acts
as the input for the code generator. The output of the
code generator is the machine language code (see
t3 = inttofloat (5) Figure 1.12), known as the target program, which can
t2 = id3 * t3 be directly executed.
t1 = id2 + t2 Note that the first operand in each instruction specifies a
id1 = t1
destination, and F in each instruction indicates that it deals
Figure 1.10 Intermediate Code Generation Phase with floating-point numbers.

t3 = id3 * 5.0
t3 = inttofloat (5)
id1 = id2 + t3
t2 = id3 * t3
t1 = id2 + t2
id1 = t1

Code Generator

Code Optimizer

LDF R2, id3


MULF R2, R2, #5.0
LDF R1,id2
t3 = id3 * 5.0 ADDF R1, R1, R2
id1 = id2 + t3 STF id1, R1

Figure 1.11 Code Optimization Phase Figure 1.12 Code Generation Phase

8. W hat is a pass in the compilation process? Compare and contrast the features of a
single-pass compiler with multi-pass compiler.
Ans: In an implementation of a compiler, the activities of one or more phases are combined into a
single module known as a pass. A pass reads the input, either as a source program file or as the output of
the previous pass, transforms the input and writes the output into an intermediate file. The intermediate
file acts as either the input for the next pass or the final machine code.
When all the phases of a compiler are grouped together into a single pass, then that compiler is known
as single-pass compiler. On the other hand, when different phases of a compiler are grouped together
into two or more passes, then that compiler is known as multi-pass compiler.
A single-pass compiler is faster than the multi-pass compiler because in multi-pass compiler each
pass reads and writes an intermediate file, which makes the compilation process time ­consuming.
Hence, time required for compilation increases with the increase in the number of passes in a compiler.
8 Principles of Compiler Design

A single-pass compiler takes more space than the multi-pass compiler because in multi-pass compiler
the space used by the compiler during one pass can be reused by the subsequent pass. So, for comput-
ers having small memory, multi-pass compilers are preferred. On the other hand, for computers having
large memory, single-pass compiler or compiler with fewer number of passes can be used.
In a single-pass compiler, the complicated optimizations required for high quality code generation are
not possible. To count the exact number of passes for an optimizing compiler is a difficult task.

9. What are the various compiler construction tools?


Ans: For the construction of a compiler, the compiler writer uses different types of software tools
that are known as compiler construction tools. These tools make use of specialized languages for
specifying and implementing specific components, and most of them use sophisticated algorithms. The
tools should hide the details of the algorithm used and produce component in such a way that they can
be easily integrated into the rest of the compiler. Some of the most commonly used compiler construc-
tion tools are:
 Scanner generators: They automatically produce lexical analyzers or scanners.
 Parser generators: They produce syntax analyzers or parsers.
 Syntax-directed translation engines: They produce a collection of routines, which traverses the
parse tree and generates the intermediate code.
 Code generators: They produce a code generator from a set of rules that translates the inter-
mediate language instructions into the equivalent machine language instructions for the target
machine.
 Data-flow analysis engines: They gather the information about how the data is transmitted from
one part of the program to another. For code optimization, data-flow analysis is a key part.
 Compiler-construction toolkits: They provide an integrated set of routines for construction of the
different phases of a compiler.

10. What is a cross compiler? Explain the concept of bootstrapping.


Ans: A compiler which may run on one machine and produce the target code for another machine is
known as cross compiler. For example, a number of minicomputer and microprocessor compilers are
implemented in such a way that they run on bigger machines and the output produced by them acts as
an object code for smaller machines. Thus, the cross compilation technique facilitates platform inde-
pendence. A cross compiler can be represented with the help of a T diagram as shown in Figure 1.13. It
consists of three symbols S, T and I, where:
 S is the source language in which the source program is written,
 T is the target language in which the compiler produces its output or target program, and
 I is the implementation language in which compiler is written.

Source Target
Language S T
Language

Implementation
I Language

Figure 1.13 T Diagram Representation


Introduction to Compilers 9

S T A M

A M

(a) Compiler CAST (b) Compiler CMAM

S T S T

A M
A M

These two These two


languages languages
must be same must be same

(c) Compiler CMST

Figure 1.14 Bootstrapping

Bootstrapping: Bootstrapping is an important concept for building a new compiler. This concept
uses a simple language to translate complicated programs which can further handle more complicated
programs. The process of bootstrapping can be better understood with the help of an example given
here.
Suppose we want to create a cross compiler for the new source language S that generates a target code
in language T, and the implementation language of this compiler is A. We can represent this compiler as
CST
A (see Figure 1.14(a)). Further, suppose we already have a compiler written for language A with both
target and implementation language as M. This compiler can be represented as CAM M (see Figure 1.14(b)).
Now, if we run CSTA with the help of C AM
M , then we get a compiler C ST
M (see Figure 1.14(c)). This com-
piler compiles a source program written in language S and generates the target code in T, which runs on
machine M (that is, the implementation language for this compiler is M).
11. Explain error handling in compiler.
Ans: Error detection and reporting of errors are important functions of the compiler. Whenever
an error is encountered during the compilation of the source program, an error handler is invoked.
Error handler generates a suitable error reporting message regarding the error encountered. The error
reporting message allows the programmer to find out the exact location of the error. Errors can be
encountered at any phase of the compiler during compilation of the source program for several rea-
sons such as:
 In lexical analysis phase, errors can occur due to misspelled tokens, unrecognized characters, etc.
These errors are mostly the typing errors.
 In syntax analysis phase, errors can occur due to the syntactic violation of the language.
 In intermediate code generation phase, errors can occur due to incompatibility of operands type for
an operator.
10 Principles of Compiler Design

 In code optimization phase, errors can occur during the control flow analysis due to some unreach-
able statements.
 In code generation phase, errors can occurs due to the incompatibility with the computer architec-
ture during the generation of machine code. For example, a constant created by compiler may be
too large to fit in the word of the target machine.
 In symbol table, errors can occur during the bookkeeping routine, due to the multiple declaration
of an identifier with ambiguous attributes.

Multiple-Choice Questions
1. A translator that takes as input a high-level language program and translates into machine language
in one step is known as —————.
(a) Compiler (b) Interpreter
(c) Preprocessor (d) Assembler
2. ————— create a single program from several files of relocated machine code.
(a) Loaders (b) Assemblers
(c) Link editors (d) Preprocessors
3. A group of logically related characters in the source program is known as —————.
(a) Token (b) Lexeme
(c) Parse tree (d) Buffer
4. The ————— uses the parse tree and symbol table checking the semantic consistency of the
source program.
(a) Lexical analyzer (b) Intermediate code generator
(c) Syntax translator (d) Semantic analyzer
5. The ————— phase converts an intermediate code into an optimized code that takes lesser space
and lesser time to execute.
(a) Code optimization (b) Syntax directed translation
(c) Code generation (d) Intermediate code generation
6. ————— is invoked whenever any fault occurs in the compilation process of source program.
(a) Syntax analyzer (b) Code generator
(c) Error handler (d) Lexical analyzer
7. In compiler, the activities of one or more phases are combined into a single module known as a
—————.
(a) Phase (b) Pass
(c) Token (d) Macro
8. For the construction of a compiler, the compiler writer uses different types of software tools that are
known as —————.
(a) Compiler writer tools (c) Programming tools
(c) Compiler construction tools (d) None of these
Introduction to Compilers 11

9. A compiler that runs on one machine and produces the target code for another machine is known
as —————.
(a) Cross compiler (b) Linker
(c) Preprocessor (d) Assembler
10. If we run a compiler CST AM
A with the help of another compiler C M , then we get a new compiler that
is —————.
(a) CSM
M (b) CST
A

(c) CST
M (d) CAM
M

Answers
1. (a) 2. (c) 3. (b) 4. (d) 5. (a) 6. (c) 7. (b) 8. (c) 9. (a) 10. (c)

You might also like