
Compiler Phases

The document discusses the different phases of a compiler: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, target code generation, and code optimization. It explains that the compiler front-end analyzes the source code for correctness, while the back-end translates the program into an equivalent target program through various intermediate representations and optimizations. The phases work together to first understand then rewrite the program into an efficient low-level form.
Copyright
© Attribution Non-Commercial (BY-NC)

COP4020 Programming Languages

Compiler phases
Prof. Xin Yuan

Overview

Compiler phases

Lexical analysis
Syntax analysis
Semantic analysis
Intermediate (machine-independent) code generation
Intermediate code optimization
Target (machine-dependent) code generation
Target code optimization

11/10/2013

COP4020 Spring 2013

A typical compilation process

Source program with macros
-> Preprocessor -> Source program
-> Compiler -> Target assembly program
-> Assembler -> Relocatable machine code
-> Linker -> Absolute machine code

Try g++ with the -v, -E, and -S flags on linprog.


What is a compiler?

A program that reads a program written in one language (source language) and translates it into an equivalent program in another language (target language).

Two components

Understand the program (make sure it is correct)
Rewrite the program in the target language

Traditionally, the source language is a high level language and the target language is a low level language (machine code).

Source program -> compiler -> Target program
The compiler also produces error messages.

Compilation Phases and Passes

Compilation of a program proceeds through a fixed series of phases


Each phase uses an (intermediate) form of the program produced by an earlier phase
Subsequent phases operate on lower-level code representations

Each phase may consist of a number of passes over the program representation

Pascal, FORTRAN, and C were designed for one-pass compilation, which explains the need for function prototypes
Single-pass compilers need less memory to operate
Java and Ada are multi-pass

Compiler Front- and Back-end


Front end (analysis)

Source program (character stream)
-> Scanner (lexical analysis) -> Tokens
-> Parser (syntax analysis) -> Parse tree
-> Semantic Analysis and Intermediate Code Generation -> Abstract syntax tree or other intermediate form

Back end (synthesis)

Abstract syntax tree or other intermediate form
-> Machine-Independent Code Improvement -> Modified intermediate form
-> Target Code Generation -> Assembly or object code
-> Machine-Specific Code Improvement -> Modified assembly or object code

Scanner: Lexical Analysis

Lexical analysis breaks up a program into tokens

Grouping characters into inseparable units (tokens)
Changing a stream of characters into a stream of tokens

program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j
    else j := j - i;
  writeln (i)
end.

Token stream (read left to right, top to bottom):

program  gcd  (   input  ,   output   )     ;
var      i    ,   j      :   integer  ;     begin
read     (    i   ,      j   )        ;     while
i        <>   j   do     if  i        >     j
then     i    :=  i      -   j        else  j
:=       j    -   i      ;   writeln  (     i
)        end  .

Scanner: Lexical Analysis

What kinds of errors can be reported by the lexical analyzer?


A = b + @3;
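
The scanner described on these slides can be sketched in a few lines. The following is a minimal illustration, not the course's implementation: the token classes, the `tokenize` function, and the `LexicalError` exception are names assumed here, and the keyword set covers only the words used in the gcd example. It groups characters into tokens and reports any character that fits no token class, such as the `@` above, as a lexical error.

```python
import re

# Token classes for a small Pascal-like subset (assumed for illustration).
KEYWORDS = {"program", "var", "begin", "end", "while", "do", "if", "then", "else"}
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r":=|<>|<=|>=|[-+*/<>=]"),
    ("PUNCT",  r"[();,.:]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),            # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


class LexicalError(Exception):
    """Raised when a character cannot start any token."""


def tokenize(source):
    """Turn a character stream into a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":       # whitespace separates tokens but is not one
            continue
        if kind == "ERROR":
            raise LexicalError(f"illegal character {text!r} at offset {match.start()}")
        if kind == "ID" and text.lower() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens
```

Calling `tokenize("while i <> j do")` yields two keywords, two identifiers, and the `<>` operator, while `tokenize("A = b + @3;")` raises `LexicalError` at the `@`. Note that the scanner alone cannot tell that `b + / 3` is wrong; that is left to the parser.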


Parser: Syntax Analysis

Checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.

A syntax error is produced by the compiler when the program does not meet the grammatical specification.
For a grammatically correct program, this phase generates an internal representation that is easy to manipulate in later phases

Typically a syntax tree (also called a parse tree).

The grammar of a programming language is typically described by a context-free grammar, which also defines the structure of the parse tree.

Context-Free Grammars

A context-free grammar defines the syntax of a programming language
The syntax defines the syntactic categories for language constructs

Statements
Expressions
Declarations

Categories are subdivided into more detailed categories

A Statement is a

For-statement
If-statement
Assignment

<statement> ::= <for-statement> | <if-statement> | <assignment>
<for-statement> ::= for ( <expression> ; <expression> ; <expression> ) <statement>
<assignment> ::= <identifier> := <expression>

Example: Micro Pascal


<Program> ::= program <id> ( <id> <More_ids> ) ; <Block> .
<Block> ::= <Variables> begin <Stmt> <More_Stmts> end
<More_ids> ::= , <id> <More_ids> | ε
<Variables> ::= var <id> <More_ids> : <Type> ; <More_Variables> | ε
<More_Variables> ::= <id> <More_ids> : <Type> ; <More_Variables> | ε
<Stmt> ::= <id> := <Exp> | if <Exp> then <Stmt> else <Stmt> | while <Exp> do <Stmt> | begin <Stmt> <More_Stmts> end
<Exp> ::= <num> | <id> | <Exp> + <Exp> | <Exp> - <Exp>

Parsing examples

Pos = init + / rate * 60
id1 = id2 + / id3 * const
syntax error (exp ::= exp + exp cannot be reduced).

Pos = init + rate * 60
id1 = id2 + id3 * const

      :=
     /  \
  id1    +
        / \
     id2   *
          / \
       id3   60
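
The two parsing examples can be reproduced with a small recursive-descent parser. This is a sketch under stated assumptions: the function name `parse_assignment`, the `ParseError` exception, and the nested-tuple tree format are invented here, and the grammar is layered into `expr` and `term` so that `*` binds tighter than `+`, which the ambiguous rule `exp ::= exp + exp` on the slides does not itself specify. Input is a list of token strings, and only integer literals are recognized as numbers.

```python
class ParseError(Exception):
    """Raised when the token stream does not match the grammar."""


def parse_assignment(tokens):
    """Parse `id = expr` and return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def operand():
        tok = advance()
        if tok is None or not (tok.isidentifier() or tok.isdigit()):
            raise ParseError(f"expected an operand, found {tok!r}")
        return tok

    def term():          # term ::= operand { (* | /) operand }
        node = operand()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, operand())
        return node

    def expr():          # expr ::= term { (+ | -) term }
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    target = advance()
    if target is None or not target.isidentifier():
        raise ParseError(f"expected an identifier, found {target!r}")
    if advance() != "=":
        raise ParseError("expected '='")
    tree = ("=", target, expr())
    if peek() is not None:
        raise ParseError(f"unexpected token {peek()!r}")
    return tree
```

For the token list of `pos = init + rate * 60` the result is `("=", "pos", ("+", "init", ("*", "rate", "60")))`, the same shape as the tree above. For `pos = init + / rate * 60` the parser raises `ParseError`, because an operand is expected after `+` and `/` is found instead.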


Semantic Analysis

Semantic analysis is applied by a compiler to discover the meaning of a program by analyzing its parse tree or abstract syntax tree.
A program without grammatical errors is not necessarily a correct program.

pos = init + rate * 60
What if pos is a class while init and rate are integers?
This kind of error cannot be found by the parser.
Semantic analysis finds this type of error and ensures that the program has a meaning.

Semantic Analysis

Static semantic checks (done by the compiler) are performed at compile time

Type checking
Every variable is declared before it is used
Identifiers are used in appropriate contexts
Check subroutine call arguments
Check labels
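
Two of the static checks listed above, type checking and declared-before-use, can be sketched as a walk over a syntax tree with a symbol table. Everything named here is an assumption for illustration: the tree is a nested tuple `(operator, left, right)`, the symbol table is a plain dictionary from identifier to type name, and the rule that an integer may be assigned to a real is one plausible convention, not something the slides state.

```python
class SemanticError(Exception):
    """Raised when a static semantic check fails."""


def type_of(node, symbols):
    """Return the type of an expression tree, or raise SemanticError."""
    if isinstance(node, tuple):                 # (operator, left, right)
        op, left, right = node
        left_type, right_type = type_of(left, symbols), type_of(right, symbols)
        for t in (left_type, right_type):
            if t not in ("integer", "real"):
                raise SemanticError(f"operator {op!r} cannot be applied to {t}")
        return "real" if "real" in (left_type, right_type) else "integer"
    if node.isdigit():                          # integer literal
        return "integer"
    if node not in symbols:                     # declared-before-use check
        raise SemanticError(f"undeclared identifier {node!r}")
    return symbols[node]


def check_assignment(tree, symbols):
    """Check `target = expr`: target declared, and types compatible."""
    _, target, expr = tree
    if target not in symbols:
        raise SemanticError(f"undeclared identifier {target!r}")
    expr_type = type_of(expr, symbols)
    target_type = symbols[target]
    # Assumed rule: integer may widen to real; everything else must match.
    if target_type != expr_type and (target_type, expr_type) != ("real", "integer"):
        raise SemanticError(
            f"cannot assign {expr_type} to {target_type} variable {target!r}"
        )
    return target_type
```

With `pos`, `init`, and `rate` all declared as `real`, the tree for `pos = init + rate * 60` passes. If `pos` is instead declared with a class type while `init` and `rate` are integers, `check_assignment` raises `SemanticError`, even though the statement is grammatically correct.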

Dynamic semantic checks are performed at run time, and the compiler produces code that performs these checks

Array subscript values are within bounds
Arithmetic errors, e.g. division by zero
Pointers are not dereferenced unless pointing to a valid object
Variables are not used before they have been initialized
When a check fails at run time, an exception is raised
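
To make "the compiler produces code that performs these checks" concrete, here is a rough sketch of what such generated checks amount to, written as helper functions. The names `checked_load`, `checked_div`, and `RuntimeCheckError` are hypothetical, and a real compiler would typically emit the comparisons inline rather than call helpers. The array is assumed to be declared with Pascal-style bounds `low..high`.

```python
class RuntimeCheckError(Exception):
    """Raised when a dynamic semantic check fails."""


def checked_load(array, index, low, high):
    """Load array[index] for an array declared with bounds low..high."""
    if not (low <= index <= high):
        raise RuntimeCheckError(f"subscript {index} outside {low}..{high}")
    return array[index - low]


def checked_div(numerator, denominator):
    """Integer division with an explicit division-by-zero check."""
    if denominator == 0:
        raise RuntimeCheckError("division by zero")
    return numerator // denominator
```

Each check costs a comparison and a branch on every access or division, which is the run-time overhead that dynamic semantic checks introduce.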

Semantic Analysis and Strong Typing

A language is strongly typed "if (type) errors are always detected"


Errors are detected either at compile time or at run time
Examples of such errors are listed on the previous slide
Languages that are strongly typed include Ada, Java, ML, and Haskell
Languages that are not strongly typed include Fortran, Pascal, C/C++, and Lisp

Strong typing makes a language safer and easier to use, but potentially slower because of dynamic semantic checks
In some languages, most (type) errors are detected late, at run time, which is detrimental to reliability, e.g. early Basic, Lisp, Prolog, and some scripting languages

Code Generation and Intermediate Code Forms

A typical intermediate form of code produced by the semantic analyzer is an abstract syntax tree (AST)
The AST is annotated with useful information such as pointers to the symbol table entries of identifiers

Example AST for the gcd program in Pascal



Code Generation and Intermediate Code Forms

Other intermediate code forms

Intermediate code is something that is both close to the final machine code and easy to manipulate (for optimization).
One example is three-address code: dst = op1 op op2
The three-address code for the assignment statement id1 = id2 + id3 * 60:

temp1 = 60
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3
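
Three-address code like this can be produced by a post-order walk of the syntax tree, creating one temporary per operator. The sketch below assumes a nested-tuple tree `("=", target, expr)` and a hypothetical function name `three_address_code`. Unlike the slide's version, it does not give the constant 60 its own temporary, so it emits one instruction fewer.

```python
def three_address_code(tree):
    """Flatten ('=', target, expr) into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def emit(node):
        """Emit code for an expression and return the name holding its value."""
        if not isinstance(node, tuple):
            return node                      # identifier or constant: no code needed
        op, left, right = node
        left_name = emit(left)               # operands first (post-order)
        right_name = emit(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    code.append(f"{target} = {emit(expr)}")
    return code
```

For the tree `("=", "id1", ("+", "id2", ("*", "id3", "60")))` this returns `temp1 = id3 * 60`, `temp2 = id2 + temp1`, and `id1 = temp2`, in that order.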

Machine-independent intermediate code improvement

temp1 = id3 * 60.0
id1 = id2 + temp1
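
One way to arrive at the improved form is copy propagation plus removal of the final redundant copy. The following is a simplified sketch, not a general optimizer: instructions are assumed to be tuples `(dst, op, arg1, arg2)` with `op` set to `None` for a plain copy, temporaries are recognized by the `temp` prefix, and each temporary is assumed to be used only once. The function name `improve` is hypothetical, and the conversion of 60 to 60.0 is taken as already done in the input.

```python
def improve(code):
    """Apply copy propagation and remove the final redundant copy.

    Each instruction is (dst, op, arg1, arg2); a plain copy has op None.
    """
    values = {}          # temp name -> operand that can replace it
    result = []
    for dst, op, a, b in code:
        a = values.get(a, a)
        b = values.get(b, b)
        if op is None and dst.startswith("temp"):
            values[dst] = a               # temp = operand: propagate, emit nothing
        elif op is None and result and result[-1][0] == a and a.startswith("temp"):
            # x = tempN right after tempN = ...: write straight into x
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dst, prev_op, prev_a, prev_b))
        else:
            result.append((dst, op, a, b))
    return result
```

Applied to the four-instruction sequence for `id1 = id2 + id3 * 60.0`, this leaves two instructions: `temp2 = id3 * 60.0` and `id1 = id2 + temp2`. That matches the improved code above except that the surviving temporary keeps its original name `temp2` instead of being renumbered to `temp1`.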


Target Code Generation and Optimization

From the machine-independent form, assembly or object code is generated by the compiler

MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

This machine-specific code is optimized to exploit specific hardware features


Summary

Compiler front-end: lexical analysis, syntax analysis, semantic analysis

Tasks: understanding the source code, making sure the source code is written correctly

Compiler back-end: Intermediate code generation/improvement, and Machine code generation/improvement

Tasks: translating the program into a semantically equivalent program (in a different language).
