Lecture01 Introduction 01
Lecture01 Introduction 01
Kenneth C. Louden
Content
1. INTRODUCTION
2. SCANNING
3. CONTEXT-FREE GRMMARS AND PARSING
4. TOP-DOWN PARSING
5. BOTTOM-UP PARSING
6. SEMANTIC ANALYSIS
7. RUNTIME ENVIRONMENT
8. CODE GENERATION
Main Reference
Book:
BACK
1.2 Programs related to Compiler
Interpreters
• Execute the source program immediately
rather than generating object code
• Examples: BASIC, LISP, used often in
educational or development situations
• Speed of execution is slower than compiled
code by a factor of 10 or more
• Share many of their operations with
compilers
Assemblers
• A translator for the assembly language of a
particular computer
• Assembly language is a symbolic form of
one machine language
• A compiler may generate assembly
language as its target language and an
assembler finished the translation into
object code
Linkers
• Collect separate object files into a directly
executable file
• Connect an object program to the code for
standard library functions and to resource
supplied by OS
• Becoming one of the principle activities of a
compiler, depends on OS and processor
Loaders
• Resolve all re-locatable address relative to a
given base
• Make executable code more flexible
• Often as part of the operating environment,
rarely as an actual separate program
Preprocessors
• Delete comments, include other files, and
perform macro substitutions
• Required by a language (as in C) or can be
later add-ons that provide additional
facilities
Editors
• Compiler have been bundled together with
editor and other programs into an
interactive development environment (IDE)
• Oriented toward the format or structure of
the programming language, called structure-
based
• May include some operations of a compiler,
informing some errors
Debuggers
• Used to determine execution error in a
compiled program
• Keep tracks of most or all of the source
code information
• Halt execution at pre-specified locations
called breakpoints
• Must be supplied with appropriate symbolic
information by the compiler
Profilers
• Collect statistics on the behavior of an
object program during execution
– Called Times for each procedures
– Percentage of execution time
• Used to improve the execution speed of the
program
Project Managers
• Coordinate the files being worked on by different
people, maintain coherent version of a program
• Language-independent or bundled together with a
compiler
• Two popular project manager programs on Unix
system
– Sccs (Source code control system)
– Rcs (revision control system)
BACK
1.3 The Translation Process
The Phases of a Compiler
• Six phases • Three auxiliary
– Scanner components
– Parser – Literal table
– Semantic Analyzer – Symbol table
– Source code optimizer – Error Handler
– Code generator
– Target Code Optimizer
The Phases of a Compiler
Source code
Scanner
Tokens
Literal
Parser
Table
Syntax Tree
Semantics Analyzer
Symbol
Annotated Tree
Table
Source Code Optimizer
Intermediate code
Error
Code Generator Handler
Target code
RETURN
The Parser
• Syntax analysis: it determines the structure
of the program
• The results of syntax analysis are a parse
tree or a syntax tree
• An example: a[index]=4+2
– Parse tree
– Syntax tree ( abstract syntax tree)
The Parse Tree
expression
Assign-expression
expression = expression
subscript-expression additive-expression
Assign-expression
subscript-expression additive-expression
RETURN
The Semantic Analyzer
• The semantics of a program are its “meaning”, as
opposed to its syntax, or structure, that
– determines some of its running time behaviors prior to
execution.
• Static semantics: declarations and type checking
• Attributes: The extra pieces of information
computed by semantic analyzer
• An example: a[index]=4+2
– The syntax tree annotated with attributes
The Annotated Syntax Tree
Assign-expression
subscript-expression additive-expression
integer integer
RETURN
The Source Code Optimizer
• The earliest point of most optimization steps is
just after semantic analysis
• The code improvement depends only on the
source code, and as a separate phase
• Individual compilers exhibit a wide variation in
optimization kinds as well as placement
• An example: a[index]=4+2
– Constant folding performed directly on annotated tree
– Using intermediate code: three-address code, p-code
Optimizations on Annotated Tree
Assign-expression
subscript-expression additive-expression
integer integer
subscript-expression
integer
t = 4 + 2
a[index]=t
t= 6
a[index]=t
a[index]=6
RETURN
The Code Generate
• It takes the intermediate code or IR and generates
code for target machine
• The properties of the target machine become the
major factor:
– Using instructions and representation of data
• An example: a[index]=4+2
– Code sequence in a hypothetical assembly
language
A possible code sequence
RETURN
The Target Code Optimizer
• It improves the target code generated by the
code generator:
– Address modes choosing
– Instructions replacing
– As well as redundant eliminating
MOV R0, index
MUL R0,2 MOV R0, index
MOV R1,&a SHL R0
ADD R1,R0 MOV &a[R0],6
MOV *R1,6
BACK
1.4 Major Data Structure in a
Compiler
Principle Data Structure for
Communication among Phases
• TOKENS
– A scanner collects characters into a token, as a value of an
enumerated data type for tokens
– May also preserve the string of characters or other derived
information, such as name of identifier, value of a number token
– A single global variable or an array of tokens
• THE SYNTAX TREE
– A standard pointer-based structure generated by parser
– Each node represents information collect by parser or later, which
maybe dynamically allocated or stored in symbol table
– The node requires different attributes depending on kind of
language structure, which may be represented as variable record.
Principle Data Structure for
Communication among Phases
• THE SYMBOL TABLE
– Keeps information associated with identifiers: function,
variable, constants, and data types
– Interacts with almost every phase of compiler.
– Access operation need to be constant-time
– One or several hash tables are often used,
• THE LITERAL TABLE
– Stores constants and strings, reducing size of program
– Quick insertion and lookup are essential
Principle Data Structure for
Communication among Phases
• INTERMEDIATE CODE
– Kept as an array of text string, a temporary text, or a
linked list of structures, depending on kind of
intermediate code (e.g. three-address code and p-code)
– Should be easy for reorganization
• TEMPORARY FILES
– Holds the product of intermediate steps during
compiling
– Solve the problem of memory constraints or back-patch
addressed during code generation
BACK
1.5 Other Issues in Compiler
Structure
The Structure of Compiler
• Multiple views from different angles
– Logical Structure
– Physical Structure
– Sequencing of the operations
• A major impact of the structure
– Reliability, efficiency
– Usefulness, maintainability
Analysis and Synthesis
• The analysis part of the compiler analyzes the
source program to compute its properties
– Lexical analysis, syntax analysis and semantics analysis,
as well as optimization
– More mathematical and better understood
• The synthesis part of the compiler produces the
translated codes
– Code generation, as well as optimization
– More specialized
• The two parts can be changed independently of the
other
Front End and Back End
• The operations of the front end depend on the
source language
– The scanner, parser, and semantic analyzer, as well as
intermediate code synthesis
• The operations of the back end depend on the
target language
– Code generation, as well as some optimization analysis
• The intermediate representation is the medium of
communication between them
• This structure is important for compiler portability
Passes
• The repetitions to process the entire source program before
generating code are referred as passes.
• Passes may or may not correspond to phases
– A pass often consists of several phases
– A compiler can be one pass, which results in efficient compilation
but less efficient target code
– Most compilers with optimization use more than one pass
• One Pass for scanning and parsing
• One Pass for semantic analysis and source-level optimization
• The third Pass for code generation and target-level optimization
Language Definition and compilers
BACK
1.6 Bootstrapping and Porting
Third Language for Compiler
Construction
• Machine language
– compiler to execute immediately;
• Another language with existed compiler on the
same target machine : (First Scenario)
– Compile the new compiler with existing compiler
• Another language with existed compiler on
different machine : (Second Scenario)
– Compilation produce a cross compiler
T-Diagram Describing Complex
Situation
• A compiler written in language H that
translates language S into language T.
S T
H
• T-Diagram can be combined in two basic
ways.
The First T-diagram Combination
A B B C A C
H H H
• Original compiler
• Compiler source code retargeted to K
• Result in Cross Compiler
The step 2 in porting
A K A K
A A K K
H
• Cross compiler
• Compiler source code retargeted to K
• Result in Retargeted Compiler
BACK
1.7 The TINY Sample Language
and Compiler
Reading this part of text as homework
1.8 C-Minus: A Language for A
Compiler Project
Reading this part of text as homework)
End of Chapter One
Thanks