Introduction To Compiling
Introduction To Compiling
What is a compiler?
Programming problems are easier to solve in high-level languages
Languages closer to the level of the problem domain, e.g., SmallTalk: OO programming JavaScript: Web pages
Solutions are usually more efficient (faster, smaller) when written in machine language
Language that reflects to the cycle-by-cycle working of a processor
What is a compiler? A program that reads a program written in one language and translates it into another language. As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.
Source language
Compiler
Target language
Error messages
Target Language
Compilation Steps/Phases
Source Program 1
Lexical Analyzer
Syntax Analyzer
3 Symbol-table Manager
Code Optimizer
Code Generator
Target Program
Compilation Steps/Phases
Lexical Analysis Phase: Generates the tokens in the source program
Syntax Analysis Phase: Recognizes sentences" in the program using the syntax of the language
Semantic Analysis Phase: Infers information about the program using the semantics of the language Intermediate Code Generation Phase: Generates abstract code based on the syntactic structure of the program and the semantic information Optimization Phase: Refines the generated code using a series of optimizing transformations Final Code Generation Phase: Translates the abstract intermediate code into specific machine instructions
Lexical Analysis Convert the stream of characters representing input program into a sequence of tokens Tokens are the words" of the programming language Lexeme
The characters comprising a token
For instance, the sequence of characters static int" is recognized as two tokens, representing the two words static" and int" The sequence of characters *x++" is recognized as three tokens, representing *", x" and ++ Removes the white spaces Removes the comments
Lexical Analysis
Input: result = a + b * 10
Tokens:
result, =, a, +, b, *, 10
identifiers
operators
| | |
Syntax Tree
Input: result = a + b * 10
Assign
result
a
b
*
10
It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements
Performs type checking
Operator operand compatibility
Intermediate Code Generation Translate each hierarchical structure decorated as tree into intermediate code A program translated for an abstract machine Properties of intermediate codes
Should be easy to produce Should be easy to translate into the target program
Intermediate code hides many machine-level details, but has instruction-level mapping to many assembly languages Main motivation: portability One commonly used form is Three-address Code
Code Optimization Apply a series of transformations to improve the time and space efficiency of the generated code. Peephole optimizations: generate new instructions by combining/expanding on a small number of consecutive instructions. Global optimizations: reorder, remove or add instructions to change the structure of generated code Consumes a significant fraction of the compilation time Optimization capability varies widely Simple optimization techniques can be vary valuable
Code Generation Map instructions in the intermediate code to specific machine instructions. Memory management, register allocation, instruction selection, instruction scheduling, Generates sufficient information to enable symbolic debugging.
When an identifier in the source program is detected by the lexical analyzer, the identifier is entered into the symbol table
Error Detection, Recovery and Reporting Each phase can encounter error Specific types of error can be detected by specific phases
Lexical Error: int abc, 1num; Syntax Error: total = capital + rate year; Semantic Error: value = myarray [realIndex];
Should be able to proceed and process the rest of the program after an error detected Should be able to link the error with the source program
:= id1 id2 + *
id3
semantic analyzer
10
:=
Symbol Table position .... initial . rate. intermediate code generator
id1 id2l
+ *
id3
inttoreal
10
E r r o r s
Symbol Table
position ....
initial . rate. intermediate code generator temp1 := inttoreal(10) temp2 := id3 * temp1 temp3 := id2 + temp2 id1 := temp3 code optimizer temp1 := id3 * 10.0 id1 := id2 + temp1
E r r o r s
3 address code
final code generator MOVF id3, R2 MULF #10.0, R2 MOVF id2, R1 ADDF R2, R1 MOVF R1, id1
File inclusion
Include header files
Rational Preprocessors
Augment older languages with modern flow-of-control or data-structures
Language Extension
Add capabilities to a language Equel: query language embedded in C
Assemblers
Source program
Compiler
Assembly program
Assembler
Loader/link-editor
Second pass
Translates each operand code in the machine language
Instruction Register code Addressing Mode Address Relocation bit
Loaders and Link-Editors Convert the relocatable machine code into absolute machine code
Map the relocatable address