18CSC304J-Compiler Design | PDF | Parsing | Compiler
0% found this document useful (0 votes)
26 views

18CSC304J-Compiler Design

Uploaded by

vorava2846
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views

18CSC304J-Compiler Design

Uploaded by

vorava2846
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 41

18CSC304J-Compiler Design

CS416 Compiler Design 1


Preliminaries Required
• Basic knowledge of programming languages.
• Basic knowledge of FSA and CFG.
• Knowledge of a high programming language for the programming
assignments.

Textbook:
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
“Compilers: Principles, Techniques, and Tools”
Addison-Wesley, 1986.

CS416 Compiler Design 2


Course Outline
• Introduction to Compiling
• Lexical Analysis
• Syntax Analysis
– Context Free Grammars
– Top-Down Parsing, LL Parsing
– Bottom-Up Parsing, LR Parsing
• Syntax-Directed Translation
– Attribute Definitions
– Evaluation of Attribute Definitions
• Semantic Analysis, Type Checking
• Run-Time Organization
• Intermediate Code Generation

CS416 Compiler Design 3


COMPILERS
• A compiler is a program takes a program written in a source language
and translates it into an equivalent program in a target language.

source program COMPILER target program


( Normally a program written in ( Normally the equivalent program in
a high-level programming language) machine code – relocatable object file)

error messages

CS416 Compiler Design 4


CS416 Compiler Design 5
Translators

CS416 Compiler Design 6


CS416 Compiler Design 7
Compiler vs Interpreter

CS416 Compiler Design 8


Compilers vs Assemblers

CS416 Compiler Design 9


Other Applications
• In addition to the development of a compiler, the techniques used in
compiler design can be applicable to many problems in computer
science.
– Techniques used in a lexical analyzer can be used in text editors, information retrieval
system, and pattern recognition programs.
– Techniques used in a parser can be used in a query processing system such as SQL.
– Many software having a complex front-end may need techniques used in compiler design.
• A symbolic equation solver which takes an equation as input. That program should parse the
given input equation.
– Most of the techniques used in compiler design can be used in Natural Language
Processing (NLP) systems.

CS416 Compiler Design 10


Analysis-Synthesis model of compilation
• There are two major parts of a compiler: Analysis and Synthesis

• In analysis phase, an intermediate representation is created from the


given source program.
– Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of this phase.
• In synthesis phase, the equivalent target program is created from this
intermediate representation.
– Intermediate Code Generator, Code Generator, and Code Optimizer are the parts of this
phase.

CS416 Compiler Design 11


Analysis of source program
Analysis is done in 3 phases:
1.Linear Analysis:
Stream of characters are read from left to right and grouped into
tokens
2. Hierarchical Analysis (Syntax):
Tokens are grouped hierarchically into nested collections
3. Semantic Analysis:
Checks inherent meaning of code.

CS416 Compiler Design 12


Language Processing System

CS416 Compiler Design 13


1) Preprocessor
converts the HLL (high level language) into pure high level language. It
includes all the header files and also evaluates if any macro is included. It
is the optional because if any language which does not
support #include and macro preprocessor is not required.
2) Compiler: takes pure high level language as a input and convert into
assembly code.
3) Assembler: takes assembly code as an input and converts it into
assembly code.

CS416 Compiler Design 14


4) Linking and loading:
It has four functions
1. Allocation: get the memory portions from operating system and storing
the object data.
2. Relocation: maps the relative address to the physical address and
relocating the object code.
3. Linker:
It combines all the executable object module to pre single executable file.
4. Loader:
It loads the executable file into permanent storage.

CS416 Compiler Design 15


Phases of A Compiler

Grouping of Phases:
•Analysis Phase:
Lexical, Syntax, Semantic

•Synthesis Phase:
Intermediate Code
generation, Code
optimizer, Code generator

CS416 Compiler Design 16


Lexical Analyzer
• Lexical Analyzer reads the source program character by character and
returns the tokens of the source program.
• A token describes a pattern of characters having same meaning in the
source program. (such as identifiers, operators, keywords, numbers,
delimeters and so on)
Ex: newval := oldval + 12 => tokens: newval identifier
:= assignment operator
oldvalidentifier
+ add operator
12 a number

• Puts information about identifiers into the symbol table.


• Regular expressions are used to describe tokens (lexical constructs).
• A (Deterministic) Finite State Automaton can be used in the
implementation of a lexical analyzer.

CS416 Compiler Design 17


Syntax Analyzer
• A Syntax Analyzer creates the syntactic structure (generally a parse
tree) of the given program.
• A syntax analyzer is also called as a parser.
• A parse tree describes a syntactic structure.
assgstmt

identifier := expression
• In a parse tree, all terminals are at
leaves.
newval expression + expression
• All inner nodes are non-terminals in
identifier number a context free grammar.

oldval 12

CS416 Compiler Design 18


Syntax Analyzer (CFG)
• The syntax of a language is specified by a context free grammar
(CFG).
• The rules in a CFG are mostly recursive.
• A syntax analyzer checks whether a given program satisfies the rules
implied by a CFG or not.
– If it satisfies, the syntax analyzer creates a parse tree for the given program.

• Ex: We use BNF (Backus Naur Form) to specify a CFG


assgstmt -> identifier := expression
expression -> identifier
expression -> number
expression -> expression + expression

CS416 Compiler Design 19


Syntax Analyzer versus Lexical Analyzer
• Which constructs of a program should be recognized by the lexical
analyzer, and which ones by the syntax analyzer?
– Both of them do similar things; But the lexical analyzer deals with simple non-recursive
constructs of the language.
– The syntax analyzer deals with recursive constructs of the language.
– The lexical analyzer simplifies the job of the syntax analyzer.
– The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program.
– The syntax analyzer works on the smallest meaningful units (tokens) in a source program to
recognize meaningful structures in our programming language.

CS416 Compiler Design 20


Parsing Techniques
• Depending on how the parse tree is created, there are different parsing
techniques.
• These parsing techniques are categorized into two groups:
– Top-Down Parsing,
– Bottom-Up Parsing
• Top-Down Parsing:
– Construction of the parse tree starts at the root, and proceeds towards the leaves.
– Efficient top-down parsers can be easily constructed by hand.
– Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
• Bottom-Up Parsing:
– Construction of the parse tree starts at the leaves, and proceeds towards the root.
– Normally efficient bottom-up parsers are created with the help of some software tools.
– Bottom-up parsing is also known as shift-reduce parsing.
– Operator-Precedence Parsing – simple, restrictive, easy to implement
– LR Parsing – much general form of shift-reduce parsing, LR, SLR, LALR

CS416 Compiler Design 21


Semantic Analyzer
• A semantic analyzer checks the source program for semantic errors and
collects the type information for the code generation.
• Type-checking is an important part of semantic analyzer.
• Normally semantic information cannot be represented by a context-free
language used in syntax analyzers.
• Context-free grammars used in the syntax analysis are integrated with
attributes (semantic rules)
– the result is a syntax-directed translation,
– Attribute grammars
• Ex:
newval := oldval + 12

• The type of the identifier newval must match with type of the expression (oldval+12)

CS416 Compiler Design 22


Intermediate Code Generation
• A compiler may produce an explicit intermediate codes representing
the source program.
• These intermediate codes are generally machine (architecture
independent). But the level of intermediate codes is close to the level
of machine codes.
• Ex:
newval := oldval * fact + 1

id1 := id2 * id3 + 1

MULT id2,id3,temp1 Intermediates Codes (Quadraples)


ADD temp1,#1,temp2
MOV temp2,,id1

CS416 Compiler Design 23


Code Optimizer (for Intermediate Code Generator)
• The code optimizer optimizes the code produced by the intermediate
code generator in the terms of time and space.

• Ex:

MULT id2,id3,temp1
ADD temp1,#1,id1

CS416 Compiler Design 24


Code Generator
• Produces the target language in a specific architecture.
• The target program is normally is a relocatable object file containing
the machine codes.

• Ex:
( assume that we have an architecture with instructions whose at least one of its operands is
a machine register)

MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1

CS416 Compiler Design 25


Symbol table
• It is a data structure created and maintained by compilers in order to
store information about the occurrence of various entities such as
variable names, function names, objects, classes, interfaces, etc.
• Symbol table is used by both the analysis and the synthesis parts of a
compiler.
Uses:
• To store the names of all entities in a structured form at one place.
• To verify if a variable has been declared.
• To implement type checking, by verifying assignments and expressions
in the source code are semantically correct.
• To determine the scope of a name (scope resolution).

CS416 Compiler Design 26


Error handler
• The tasks of the Error Handling process are to detect each error, report
it to the user, and then make some recover strategy and implement them
to handle error.
• An Error is the blank entries in the symbol table.
Types of Error –
Run-time error:
✔ error which takes place during the execution of a program, and usually
happens because of adverse system parameters or invalid input data.
✔ Logic errors, occur when executed code does not produce the expected
result. Logic errors are best handled by meticulous program debugging.
Compile-time errors:
✔ rises at compile time, before execution of the program.

CS416 Compiler Design 27


Error handler (…)
Classification of Compile-time error –
•Lexical : This includes misspellings of identifiers, keywords or operators
•Syntax : missing semicolon or unbalanced parenthesis
•Semantic : incompatible value assignment or type mismatches between
operator and operand
•Logical : code not reachable, infinite loop.

CS416 Compiler Design 28


CS416 Compiler Design 29
Error Recovery
Four common error-recovery strategies:
Panic mode
•When a parser encounters an error anywhere in the statement, it ignores
the rest of the statement by not processing input from erroneous input to
delimiter, such as semi-colon.
•This is the easiest way of error-recovery and also, it prevents the parser
from developing infinite loops.
Statement mode
•When a parser encounters an error, it tries to take corrective measures so
that the rest of inputs of statement allow the parser to parse ahead.
•For example, inserting a missing semicolon, replacing comma with a
semicolon etc.
•Parser designers have to be careful here because one wrong correction
may lead to an infinite loop.
CS416 Compiler Design 30
Error Recovery (…)
Error productions
•Some common errors are known to the compiler designers that may occur
in the code.
•In addition, the designers can create augmented grammar to be used, as
productions that generate erroneous constructs when these errors are
encountered.
Global correction
•The parser considers the program in hand as a whole and tries to figure
out what the program is intended to do and tries to find out a closest match
for it, which is error-free.
•When an erroneous input (statement) X is fed, it creates a parse tree for
some closest error-free statement Y.
•This may allow the parser to make minimal changes in the source code,
but due to the complexity (time and space) of this strategy, it has not been
implemented in practice yet. CS416 Compiler Design 31
Cousins of complier
1. Preprocessor:
•A preprocessor is a program that processes its input data to produce
output that is used as input to another program.
•The output is said to be a preprocessed form of the input data, which is
often used by some subsequent programs like compilers.
•They may perform the following functions : Macro processing, Rational
Preprocessors, File Inclusion, Language extension
•Macro processing:
✔A macro is a rule or pattern that specifies how a certain input sequence
should be mapped to an output sequence according to a defined procedure.
✔The mapping process that instantiates a macro into a specific output
sequence is known as macro expansion.

CS416 Compiler Design 32


Cousins of complier (…)
• File Inclusion:
✔ Preprocessor includes header files into the program text.
✔ When the preprocessor finds an #include directive it replaces it by the
entire content of the specified file.
• Rational Preprocessors:
✔ These processors change older languages with more modern
flow-of-control and data-structuring facilities.
• Language extension :
✔ These processors attempt to add capabilities to the language by what
amounts to built-in macros.
✔ For example, the language Equel is a database query language
embedded in C.

CS416 Compiler Design 33


Cousins of complier (…)
2. Assembler
•Assembler creates object code by translating assembly instruction
mnemonics into machine code.
•There are two types of assemblers:
✔One-pass assemblers go through the source code once and assume that
all symbols will be defined before any instruction that references them. ·
✔Two-pass assemblers create a table with all symbols and their values in
the first pass, and then use the table in a second pass to generate code

CS416 Compiler Design 34


Cousins of complier (…)
3. Linker and Loader
•A linker or link editor is a program that takes one or more objects
generated by a compiler and combines them into a single executable
program.
•Three tasks of the linker are
1.Searches the program to find library routines used by program, e.g.
printf(), math routines.
2. Determines the memory locations that code from each module will
occupy and relocates its instructions by adjusting absolute references
3. Resolves references among files.
•A loader is the part of an operating system that is responsible for loading
programs in memory, one of the essential stages in the process of starting a
program.

CS416 Compiler Design 35


Grouping of Phases
Front end
Phases: Lexical analysis, Syntax analysis, Semantic analysis,
Intermediate code generation.
• Front end comprises of phases which are dependent on the input
(source language) and independent on the target machine (target
language).
• It includes lexical and syntactic analysis, symbol table management,
semantic analysis and the generation of intermediate code.
• Code optimization can also be done by the front end.
• It also includes error handling at the phases concerned.

CS416 Compiler Design 36


Grouping of Phases (…)

Back End
•Phases: Code optimizer and code generator
•Back end comprises of those phases of the compiler that are dependent
on the target machine and independent on the source language.
•This includes code optimization, code generation.
•In addition to this, it also encompasses error handling and symbol table
management operations.

CS416 Compiler Design 37


Grouping of Phases (…)

Passes
•The phases of compiler can be implemented in a single pass by marking
the primary actions viz. reading of input file and writing to the output
file.
•Several phases of compiler are grouped into one pass in such a way that
the operations in each and every phase are incorporated during the pass.
•Lexical analysis, syntax analysis, semantic analysis and intermediate
code generation might be grouped into one pass. If so, the token stream
after lexical analysis may be translated directly into intermediate code.

CS416 Compiler Design 38


Grouping of Phases (…)
Reducing the Number of Passes
•Minimizing the number of passes improves the time efficiency as
reading from and writing to intermediate files can be reduced.
•When grouping phases into one pass, the entire program has to be kept
in memory to ensure proper information flow to each phase because one
phase may need information in a different order than the information
produced in previous phase.
•The source program or target program differs from its internal
representation. So, the memory for internal form may be larger than that
of input and output.

CS416 Compiler Design 39


Compiler construction tools
Parser Generators
Input: Grammatical description of a programming language
Output: Syntax analyzers.
Parser generator takes the grammatical description of a programming
language and produces a syntax analyzer.
Scanner Generators
Input: Regular expression description of the tokens of a language
Output: Lexical analyzers.
Scanner generator generates lexical analyzers from a regular expression
description of the tokens of a language.
Syntax-directed Translation Engines
Input: Parse tree.
Output: Intermediate code.
Syntax-directed translation engines produce collections of routines that
walk a parse tree and generates intermediate code.
CS416 Compiler Design 40
Compiler construction tools (…)
Automatic Code Generators
Input: Intermediate language.
Output: Machine language.
Code-generator takes a collection of rules that define the translation of
each operation of the intermediate language into the machine language for
a target machine.
Data-flow Analysis Engines
Data-flow analysis engine gathers the information that is, the values
transmitted from one part of a program to each of the other parts.
Data-flow analysis is a key part of code optimization.
Compiler Construction Toolkits
The toolkits provide integrated set of routines for various phases of
compiler. Compiler construction toolkits provide an integrated set of
routines for construction of phases of compiler.
CS416 Compiler Design 41

You might also like