M2 Session2

The document discusses several topics related to compiler construction tools and programming language evolution: 1. It lists common compiler construction tools like scanner and parser generators, syntax-directed translation engines, and code generators. 2. It describes the evolution of programming languages from machine languages to modern imperative and declarative languages, and classifications like generation, paradigms, and lexical/syntactic analysis. 3. Lexical analysis is discussed as the first phase of compilation involving reading input and producing tokens, with techniques like input buffering and sentinels to improve efficiency.

Uploaded by

Amarnath Kambale
Copyright
© All Rights Reserved

Compiler Construction Tools

• Scanner generators
• Parser generators
• Syntax-directed translation engines
• Code-generator generators
• Data-flow analysis engines
• Compiler-construction toolkits
1.3 The Evolution of Programming Languages
• The first electronic computers appeared in the 1940s
and were programmed in machine language, by sequences of 0s
and 1s.
The move to higher-level languages:
There are thousands of programming languages. They can be
classified in a variety of ways.
1. Classification by Generation
2. Classification by imperative and declarative
3. Object-oriented languages
4. Scripting languages
5. Von Neumann languages
Classification by Generation
• First-generation languages: machine languages.
• Second-generation languages: assembly languages.
• Third-generation languages: higher-level languages such as C,
C++, C#, Java.
• Fourth-generation languages: languages designed for specific
applications, such as SQL for database queries and PostScript
for text formatting.
• Fifth-generation languages: the term has been applied to logic-
and constraint-based languages such as Prolog and OPS5.
Classification by imperative and declarative

• Imperative languages: the program specifies how a computation
is to be done.
• Examples: languages such as C, C++, C#, and Java.
• Declarative languages: the program specifies what
computation is to be done.
• Examples: functional languages such as ML and Haskell, and
constraint logic languages such as Prolog.
Lexical Analysis
• As the first phase of a compiler, the main task
of the lexical analyzer is to read the input
characters of the source program and group them
into lexemes.
• It produces as output a sequence of tokens, one
for each lexeme in the source program.
• The stream of tokens is sent to the parser for
syntax analysis.
The Role of the Lexical Analyzer
• Stripping out comments and white space (blanks,
newlines, tabs)
• Correlating error messages with the source program
• Keeping track of line numbers
Lexical Analysis Versus Parsing
The separation of lexical and syntactic analysis often
allows us to simplify at least one of these tasks.
• Simplicity of design is the most important consideration.
• Compiler efficiency is improved.
- A separate lexical analyzer allows us to apply specialized
techniques that serve only the lexical task, not the job of
parsing.
- In addition, specialized buffering techniques for reading
input characters can speed up the compiler significantly.
• Compiler portability is enhanced.
Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an
optional attribute value.
• The token name is an abstract symbol representing a
kind of lexical unit.
• e.g., a particular keyword, or a sequence of input
characters denoting an identifier.
Tokens, Patterns and Lexemes
• A pattern is a description of the form that the
lexemes of a token may take.
• In the case of a keyword as a token, the pattern
is just the sequence of characters that form the
keyword.
Tokens, Patterns and Lexemes
• A lexeme is a sequence of characters in the
source program that matches the pattern for a
token and is identified by the lexical analyzer
as an instance of that token.
Example

Token        Informal description                       Sample lexemes
if           characters i, f                            if
else         characters e, l, s, e                      else
comparison   < or > or <= or >= or == or !=             <=, !=
id           letter followed by letters and digits      pi, score, D2
number       any numeric constant                       3.14159, 0, 6.02e23
literal      anything but ", surrounded by "s           "core dumped"

In the statement
    printf("total = %d\n", score);
both printf and score are lexemes matching the pattern for token id,
and "total = %d\n" is a lexeme matching the pattern for literal.


Input Buffering - increases the speed of
reading the source program
• Specialized buffering techniques have been
developed to reduce the amount of overhead required
to process a single input character.
• A two-buffer scheme that handles large lookaheads
safely is called buffer pairs.
Buffer Pairs
- Each buffer is of the same size N, usually the size of a
disk block, e.g., 4096 bytes.
- Using one system read command, we can read N characters
into a buffer.
Two pointers to the input are maintained:
1. Pointer lexemeBegin marks the beginning of the
current lexeme, whose extent we are attempting to
determine.
2. Pointer forward scans ahead until a pattern match is
found.
Sentinels
• Without sentinels, for each character read we must
make two tests: one for the end of the buffer and one
to determine what character was read.
• We can combine the two tests if each buffer holds a
sentinel character at the end.
• The sentinel is a special character that cannot
be part of the source program; a natural
choice is the character eof.
Sentinels at the end of each buffer
Lookahead code with sentinels

switch ( *forward++ ) {
    case eof:
        if ( forward is at end of first buffer ) {
            reload second buffer;
            forward = beginning of second buffer;
        }
        else if ( forward is at end of second buffer ) {
            reload first buffer;
            forward = beginning of first buffer;
        }
        else /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    /* cases for the other characters */
}
