M2 Session2
The document discusses several topics related to compiler construction tools and programming language evolution:
1. It lists common compiler construction tools like scanner and parser generators, syntax-directed translation engines, and code generators.
2. It describes the evolution of programming languages from machine languages to modern imperative and declarative languages, and the ways languages can be classified, for example by generation and by paradigm.
3. Lexical analysis is discussed as the first phase of compilation involving reading input and producing tokens, with techniques like input buffering and sentinels to improve efficiency.
Compiler Construction Tools
• Scanner generators
• Parser generators
• Syntax-directed translation engines
• Code-generator generators
• Data-flow analysis engines
• Compiler-construction toolkits

1.3 The Evolution of Programming Languages
• The first electronic computers appeared in the 1940s and were programmed in machine language, as sequences of 0's and 1's.

The Move to Higher-Level Languages
There are thousands of programming languages. They can be classified in a variety of ways:
1. Classification by generation
2. Classification as imperative or declarative
3. Object-oriented languages
4. Scripting languages
5. Von Neumann languages

Classification by Generation
• First-generation languages: machine languages.
• Second-generation languages: assembly languages.
• Third-generation languages: higher-level languages such as C, C++, C#, and Java.
• Fourth-generation languages: languages designed for specific applications, such as SQL for database queries and PostScript for text formatting.
• Fifth-generation languages: a term applied to logic- and constraint-based languages such as Prolog and OPS5.

Classification as Imperative or Declarative
• Imperative languages: the program specifies how a computation is to be done. Examples: C, C++, C#, and Java.
• Declarative languages: the program specifies what computation is to be done. Examples: functional languages such as ML and Haskell, and constraint logic languages such as Prolog.

Lexical Analysis
• As the first phase of a compiler, the main task of the lexical analyzer is to read the input characters of the source program and group them into lexemes.
• It produces as output a sequence of tokens, one for each lexeme in the source program.
• The stream of tokens is sent to the parser for syntax analysis.

The Role of the Lexical Analyzer
• Stripping out comments and whitespace (blanks, newlines, tabs).
• Correlating error messages with the source program.
• Keeping track of line numbers.

Lexical Analysis Versus Parsing
Separating lexical and syntactic analysis often allows us to simplify at least one of these tasks.
• Simplicity of design is the most important consideration.
• Compiler efficiency is improved.
  - A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing.
  - In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly.
• Compiler portability is enhanced.

Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier.
• A pattern is a description of the form that the lexemes of a token may take. In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword.
• A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

Example

Token        Informal description                       Sample lexemes
if           characters i, f                            if
else         characters e, l, s, e                      else
comparison   < or > or <= or >= or == or !=             <=, !=
id           letter followed by letters and digits      pi, score, D2
number       any numeric constant                       3.14159, 0, 6.02e23
literal      anything but ", surrounded by "'s          "core dumped"

For example, in the C statement below, printf and score are lexemes matching the pattern for token id, and "total = %d\n" is a lexeme matching the pattern for token literal:
printf("total = %d\n", score);
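The following sketch (not part of the original slides) illustrates the token-as-pair idea in C: each token carries a name and an optional attribute, here simply the lexeme text. The enum values, the Token struct, and the hard-coded stream are hypothetical names chosen for illustration; a real lexer would discover these tokens by scanning the input.

Illustrative token stream (sketch)

#include <stdio.h>

/* Hypothetical token names; a real compiler derives these from its grammar. */
enum TokenName { TOK_ID, TOK_LITERAL, TOK_LPAREN, TOK_RPAREN, TOK_COMMA, TOK_SEMI };

/* A token is a pair: a token name and an optional attribute value
   (here the lexeme text; NULL when no attribute is needed). */
struct Token {
    enum TokenName name;
    const char *attribute;
};

int main(void) {
    /* Token stream a lexer could produce for: printf("total = %d\n", score); */
    struct Token stream[] = {
        { TOK_ID,      "printf" },             /* lexeme matches pattern for id      */
        { TOK_LPAREN,  NULL },
        { TOK_LITERAL, "\"total = %d\\n\"" },  /* lexeme matches pattern for literal */
        { TOK_COMMA,   NULL },
        { TOK_ID,      "score" },              /* lexeme matches pattern for id      */
        { TOK_RPAREN,  NULL },
        { TOK_SEMI,    NULL },
    };

    for (size_t i = 0; i < sizeof stream / sizeof stream[0]; i++)
        printf("<%d%s%s>\n", (int)stream[i].name,
               stream[i].attribute ? ", " : "",
               stream[i].attribute ? stream[i].attribute : "");
    return 0;
}

Printed this way, each id token appears with its lexeme as the attribute, mirroring the <token name, attribute> notation used in most compiler texts.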
Input Buffering
• Input buffering increases the speed of reading the source program.
• A two-buffer scheme that handles large lookaheads safely is called buffer pairs.
• Specialized buffering techniques have been developed to reduce the amount of overhead required to process a single input character.

Buffer Pairs
• Each buffer has the same size N, typically the size of a disk block, e.g., 4096 bytes.
• Using one system read command, N characters can be read into a buffer.
• Two pointers to the input are maintained:
  1. Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.
  2. Pointer forward scans ahead until a pattern match is found.

Sentinels
• Without sentinels, for each character read we must make two tests: one for the end of the buffer and one to determine what character was read.
• These tests can be combined if each buffer holds a sentinel character at its end.
• The sentinel is a special character that cannot be part of the source program; a natural choice is the character eof.

Lookahead code with sentinels
switch ( *forward++ ) {
case eof:
    if ( forward is at end of first buffer ) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if ( forward is at end of second buffer ) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
/* cases for the other characters */
}
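To make the buffer-pair scheme concrete, here is a minimal, self-contained C sketch, not taken from the slides. It assumes '\0' can serve as the sentinel (i.e., the source text never contains a NUL byte), uses hypothetical names N, SENTINEL, reload(), and next_char() in place of the pseudocode's wording, and only counts characters; a real lexer would also maintain lexemeBegin and form lexemes between the two pointers.

Buffer pairs with sentinels in C (illustrative sketch)

#include <stdio.h>

#define N 4096           /* size of each buffer, e.g., one disk block               */
#define SENTINEL '\0'    /* stand-in for eof; assumes the source never contains it  */

static char buf[2 * N + 2];  /* two buffers, each followed by one sentinel slot */
static char *forward;        /* scanning pointer                                */
static FILE *src;

/* Fill one buffer half starting at 'start' with a single read, then place
   the sentinel immediately after the last character actually read. */
static void reload(char *start) {
    size_t n = fread(start, 1, N, src);
    start[n] = SENTINEL;
}

/* Return the next input character, or EOF. One test per character in the
   common case: only when the sentinel is seen do we check which buffer ended. */
static int next_char(void) {
    char c = *forward++;
    if (c != SENTINEL)
        return (unsigned char)c;
    if (forward == buf + N + 1) {            /* end of first buffer                  */
        reload(buf + N + 1);                 /* forward already points at 2nd buffer */
    } else if (forward == buf + 2 * N + 2) { /* end of second buffer                 */
        reload(buf);
        forward = buf;
    } else {
        return EOF;                          /* sentinel inside a buffer: real end of input */
    }
    c = *forward++;
    return c == SENTINEL ? EOF : (unsigned char)c;
}

int main(int argc, char **argv) {
    src = argc > 1 ? fopen(argv[1], "r") : stdin;
    if (!src) return 1;
    reload(buf);             /* prime the first buffer */
    forward = buf;
    long count = 0;
    while (next_char() != EOF)
        count++;             /* a real lexer would group characters into lexemes here */
    printf("read %ld characters\n", count);
    return 0;
}

Note how the common path makes a single comparison against the sentinel; the end-of-buffer checks run only when a sentinel is actually seen, which is the efficiency argument made on the slides.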