0% found this document useful (0 votes)
45 views28 pages

BCS 324 Lesson 1

The document provides an overview of compiler design and construction, defining key concepts such as programs, translators, and compilers. It outlines the compilation process, including analysis and synthesis phases, and discusses the importance of understanding compilers for programming efficiency and language design. Additionally, it covers various aspects of compiler design, such as optimization, error handling, and the phases of compilation including lexical, syntax, and semantic analysis.

Uploaded by

kipchumbaian80
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
45 views28 pages

BCS 324 Lesson 1

The document provides an overview of compiler design and construction, defining key concepts such as programs, translators, and compilers. It outlines the compilation process, including analysis and synthesis phases, and discusses the importance of understanding compilers for programming efficiency and language design. Additionally, it covers various aspects of compiler design, such as optimization, error handling, and the phases of compilation including lexical, syntax, and semantic analysis.

Uploaded by

kipchumbaian80
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

BCS 324: COMPILER DESIGN &

CONSTRUCTION

1. Introduction to Compiling

Dr. D. K. Muyobo
Introduction
 Definitions:
◦ A program: is a set of instructions written in a
programming language to perform a specific task.
◦ Programs can be written in high-level languages
(e.g., Python, Java) or low-level languages (e.g.,
Assembly, Machine Code).
◦ A translator: is any program that converts code
from one language to another.
◦ They include:
1. Assemblers
2. compilers, and
3. interpreters.
 Definition:
◦ A compiler is a program that reads a program
written in one language (the source language) and
translates into an equivalent program in another
language (the target language).
 Why study compilers?
◦ Compiler writing is useful to programming
languages, machine architecture, language theory,
algorithms and software engineering
 Understanding compilers helps in:
 Writing efficient programs.
 Learning about programming language design.
 Optimizing performance.
 Developing new programming languages.

 Issues in Compiler Design


 Optimization: How to make code run faster and
consume fewer resources.
 Error Handling: Detecting and reporting
syntax/semantic errors.
 Portability: Making the compiler work on different
architectures.
 Speed: Efficient translation of code without long
compilation times.
 In the process of compiling
◦ The compiler reports errors in the source
program

Source Target
Compiler
Program Program

Error messages

Fig.1: An oversimplified view of a compiler

5
 In contrast, an interpreter is a program that
◦ Performs the operations implied by the source
program
◦ This course will focus on compilers.

Source
Interpreter Output
Program

Error messages

Fig.2: An oversimplified view of an interpreter

6
The analysis-synthesis model of compilation

 There are two parts to compilation:


◦ Part 1:Analysis
 The analysis part is also known as the front end of a
compiler
 It breaks up source program into pieces and
imposes a grammatical structure on them
 Creates intermediate representation of source
program
 Determines the operations and records them in a
tree structure, syntax tree

7
◦ Part2: Synthesis
 Synthesis part is also known as back end of a compiler
 It constructs a target program from intermediate
representation
 Takes the tree structure and translates the operations
into the target program

Intermediate
Language

Source Target
Language Front End Back End Language

Fig.3:The analysis-synthesis model


8
 During analysis
◦ Source program statements are recorded in a
syntax tree
◦ Each node of the tree represents an operation
◦ The children of the node represents the
arguments of the operation e.g.
 Stmt: Totalmarks = test1 + test2 * 3

=
Totalmarks +
test1 *
test2 3
Fig.4: Syntax tree for Totalmarks = test1 + test2 * 3

9
Other tools that use the analysis-synthesis
model
 Editors
◦ Can perform text highlighting like an ordinary text
editor
◦ Can analyze text into hierarchical structures
 Pretty printers
◦ Makes structure of program visible e.g. may indent
nested statements
 Static checkers
◦ Analyzes a program and attempts to find errors in it
without running it.

10
 Interpreters
◦ Translates one statement at a time instead of
producing a complete target program
 Text formatters
◦ Takes a stream of characters as input
◦ Recognizes paragraphs, figures, tables etc.
 Silicon compilers
◦ Used in circuit design
◦ Its variables are logical signals of “0” and “1” and not
memory locations
 Query interpreters
◦ Used in query translation in databases.

11
A language-processing system
Skeletal source program

Preprocessor
Source program
Compiler
Target assembly program
Assembler
Relocatable machine code
Linker Libraries and
relocatable object files

Absolute machine code


Fig.5: A language processing system
12
Analysis
 Analysis has three phases:
1. Lexical analysis (linear analysis/scanning):
 Reads a stream of characters from left-to-right and
groups them into tokens
 Eliminates white space
2. Syntax analysis (hierarchical analysis/parsing):
 Groups the tokens hierarchically
3. Semantic analysis:
 Checks if the program components fit together
meaningfully e.g. if float and integer are not being
computed together.

13
1. Lexical analysis
 The stream of characters in source program is read
from left to right and grouped into tokens (sequences
of characters having a collective meaning)
 Tokens: smallest units like keywords, operators,
identifiers [KEYWORD int] [IDENTIFIER x] [ASSIGN =] [NUMBER 5]
[SEMICOLON ;]

Example: Totalmarks = test1 + test2 * 3 is broken into


seven tokens as follows:
1. The identifier Totalmarks
2. The assignment =
3. The identifier test1
4. The plus sign
5. The identifier test2
6. The multiplication sign
7. The number 3 14
2. Syntax analysis (parsing)
 The tokens are grouped into grammatical phrases
needed by the compiler to synthesize the output
 The phrases are represented by a parse tree e.g.
asignment
stmt
id = expr

Totalmarks expr + expr

id expr * expr

test1 id number

test2 3

Fig.6: Parse tree for Totalmarks = test1 + test2 * 3


15
 Syntax analysis is usually expressed by
recursive rules e.g. rules for definition of
expression:
1. Any identifier is an expression
2. Any number is an expression
3. If expression1 and expression2 are expressions, then so
are
 expression1 + expression2
 expression1 * expression2
 (expression)

16
 This means that
◦ Rules (1) and (2) are non-recursive
◦ Rule (3) defines expressions in terms of
operators applied to other expressions.
 Thus
◦ By rule (1) – test1 and test2 are expressions
◦ By rule (2) – 3 is an expression
◦ By rule (3) – ‘test2 * 3’ is first inferred as an
expression then ‘test1 + test2 * 3’ is an
expression.

17
 NB:
◦ The difference between lexical and syntax analysis
is that lexical doesn’t need recursion
◦ A syntax tree is a compressed form of a parse
tree. Thus, in a syntax tree, operators are interior
nodes.
 Task:
◦ Compare and contrast Figs. 4 and 6 above

18
3. Semantic analysis
 Checks source program for semantic errors
 Gathers type information for subsequent code
generation (also called type checking)
 Identifies operators and operands of
expressions and statements
 Thus a semantic analyzer reports errors
◦ when two different types are involved in an operation
e.g. in Fig.6, the number 3 raises an error, and is
therefore converted to float i.e. to 3.0.

19
=
Totalmarks +
test1 *
test2 inttofloat

Fig.7: Semantic analysis converts int to float

20
Phases of a compiler

Fig.8: Phases of a compiler


21
Symbol-Table Management
 Symbol table – data structure with a
record for each identifier and its
attributes
 Attributes include storage allocation, type,
scope, etc.
 All the compiler phases insert and modify
the symbol table e.g.
◦ Type information is detected late at semantic
phase, hence it is added to the symbol table
immediately after being seen.
22
Error detection and reporting
 Each phase can detect an error
 After detecting it, the phase must get a way of dealing
with the error
 Syntax and semantic phases handle a large portion of
the errors
 Lexical
◦ Detects errors where a character does not form a token of the
language e.g. := is an assignment statement in Pascal language
but meaningless in Java
 Syntax
◦ Detects errors where structure (syntax) rules are violated
 Semantic
◦ Detects constructs that may have the right syntax but no
meaning

23
 The running example is given using real
characters and variables
 However, internal representation is usually
done using id1,id2 and id3 instead of
actual token names
 Therefore, Totalmarks = test1 + test2 * 3
is represented internally as
◦ id1 = id2 + id3 * 3.

24
Intermediate code generation
 Comes after semantic analysis
 It is like a program for an abstract
machine with two properties:
◦ Should be easy to produce
◦ Easy to translate into target program
 It is a sequence of instructions, each with
at most three operands.

25
 The intermediate code for our example is
written as follows:
temp1 = inttofloat(3)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

26
Code generation
 Code generation phase is used to
◦ Generate target code, which is either
assembly code or machine code
◦ The following assembly code translates each
statement in the intermediate code program
using registers as memory locations
MOVF id3, R2
MULF #3.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

27
Code optimization
 The code optimization phase
◦ Attempts to condense the number of
statements so as to increase speed in
executing the code i.e. Improve intermediate
code by producing code that runs faster e.g.
temp1 = id3 * 3.0
id1 = id2 * id2 + temp1

NB: The inttofloat statement is eliminated, and


temp3 is used only once.

28

You might also like