Chapter 1 - Introduction To Comp
Chapter 1 - Introduction To Comp
(CoSc-3112)
Chapter One
Introduction
Mar 2024
Objective
At the end of this session students will be able to:
Understand the basic concepts and principles of Compiler Design
• Introduction
• Analysis and Synthesis in compilation
• Various phases in a compilation
• Compiler construction tools
Introduction to language Processors
Compiler is an executable program that can read a program in one high-level language
During analysis, the operations implied by the source program are determined and
Analysis
End
Synthesis
End
5. Optimization
Takes the tree structure and translates the operations into the target program
Various phases in a compilation
Analysis
1. Linear/Lexical analysis (Scanning)
The stream of characters is read from left to right and grouped into tokens.
A token is a sequence of characters having a collective meaning.
Token is the basic lexical unit.
Examples:
• Identifiers are variables
• Keywords
• Symbols (+, -, …)
• Numbers
• Etc…
Blanks, new lines, tabulation marks will be removed during lexical analysis.
Example
DIST1 = DIST2 + 5 * 4
<IDENT,1> <ASSIGN> <IDENT,2> <PLUS> <NUMB,3> <MULT> <NUMB,4>
2. Hierarchical/Syntax analysis (Parsing)
Tokens are grouped hierarchically into nested collections with collective meaning.
The result is generally a parse tree.
Most syntactic errors in the source program are caught in this phase.
Syntactic rules of the source language are given via a Grammar.
Consider the previous example: DIST1
= DIST2 + 5 * 4
IDENT ASSIGN IDENT PLUS NUMB MULT NUMB
3. Semantic analysis
Certain checks are performed to make sure that the components of the program fit
together meaningfully.
Unlike parsing, this phase checks for semantic errors in the source program (e.g.
type mismatch)
Semantic analysis uses the symbol table.
Symbol table:- is a data structure with a record for each identifier and its attributes
• Attributes include storage allocation, type, scope, etc
• All the compiler phases insert and modify the symbol table
The result of semantic analysis is Intermediate Code (IC).
The IC can be represented using either abstract tree or Three address code
The previous example in three address code:
TEMP1 = 5 * 4
TEMP2 = inttoreal( TEMP1 )
TEMP3 = IDENT2 + TEMP2
IDENT1 = TEMP3
Synthesis
Synthesis is composed of two phases:
1. Code optimization
2. Code generation
3. Code optimization
This phase changes the IC so that the code generator produces a faster and less memory
consuming program.
The optimized code does the same thing as the original (non-optimized) code but with less cost
in terms of CPU time and memory space.
There are several techniques of optimizing code and they will be discussed in the last chapter.
Example
Unnecessary lines of code in loops (i.e. code that could be executed outside of the loop) are moved out
of the loop.
for(i=1; i<10; i++){ x = y+1;
x = y+1; for(i=1; i<10; i++)
z = x+i; z = x+i;
}
2. Code generation
The final phase of the compiler.
Generates the target code in the target language (e.g. Assembly)
The instructions in the IC are translated into a sequence of machine instructions that
perform the same task.
Phase I: Lexical Analysis
Typically, spaces, tabs, end-of-line characters and comments are ignored by the lexical analyzer.
To design a lexical analyzer: input a description (regular expressions) of the tokens in the language, and
output a lexical analyzer (a program).
Phase II: Parsing (Syntax Analysis)
A parser gets a stream of tokens from the scanner, and determines if the syntax (structure) of the
program is correct according to the (context-free) grammar of the source language.
Then, it produces a data structure, called a parse tree or an abstract syntax tree, which describes the
if if (x > 3) then x = x + 1
Context-free grammars will be used (as the input) by the parser generator to describe the syntax of
the compiling language
Most compilers do not generate a parse tree explicitly but rather go to intermediate code directly as
syntax analysis takes place.
Parse Tree
Is output of parsing that shows the Top-down description of program syntax
Root node is entire program and leaves are tokens that were identified during lexical
analysis
It gets the parse tree from the parser together with information about some syntactic elements
It determines if the semantics (meanings) of the program is correct.
It detects errors of the program, such as using variables before they are declared, assign an
integer value to a Boolean variable, …
semantic of programs that can be checked by reading off from the program only.
syntax of the language which cannot be described in context-free grammar.
Mostly, a semantic analyzer does type checking (i.e. Gathers type information for subsequent code
generation.)
It modifies the parse tree in order to get that (static) semantically correct code
In this phase, the abstract syntax tree that is produced by the parser is traversed, looking for
semantic errors
Contd.
The main tool used by the semantic analyzer is a symbol table
Symbol table:- is a data structure with a record for each identifier and its attributes
Attributes include storage allocation, type, scope, etc
All the compiler phases insert and modify the symbol table
Discovery of meaning in a program using the symbol table
Do static semantics check
Simplify the structure of the parse tree ( from parse tree to abstract syntax tree (AST) )
Static semantics check
Making sure identifiers are declared before use
Type checking for assignments and operators
Checking types and number of parameters to subroutines
Making sure functions contain return statements
Making sure there are no repeats among switch statement labels
Phase IV: Intermediate Code Generation
In some compilers, a source program is translated into an intermediate code first and then the
In other compilers, a source program is translated directly into the target language.
Compiler makes a second pass over the parse tree to produce the translated code
If there are no compile-time errors, the semantic analyzer translates the abstract syntax tree into the
The abstract assembly tree will be passed to the code optimization and assembly code generation
phase
Contd.
Using intermediate code is beneficial when compilers which translates a single source
Different back-ends:- code optimizer and code generator is required for each
target language.
Code generator coverts the abstract assembly tree into the actual assembly code
To do code generation
The generator covers the abstract assembly tree with tiles (each tile represents a small portion of
Output the actual assembly code associated with the tiles that we used to cover the tree
The final phase of compilation coverts the assembly code into machine code and links (by a linker) in
1. Local
2. Global
Local Optimization
The compiler looks at a very small block of instructions and tries to determine how it
can improve the efficiency of this local code block
1. Constant evaluation
2. Strength reduction
The compiler looks at large segments of the program to decide how to improve
performance
Much more difficult; usually omitted from all but the most sophisticated and
expensive production- level “optimizing compilers”
Optimization cannot make an inefficient algorithm efficient
Compiler construction tools
More specialized tools have been created to help implement various phases of a compiler.