Compiler Design Unit 1
The development of compilers is closely tied to the evolution of programming languages and
computer science itself. Here's an overview of how compilers came into existence:
History of Compilers
In the early 1950s, Grace Hopper developed the A-0 system, often credited as the first compiler.
High-level languages followed: FORTRAN (1957), LISP (1958), and COBOL (1959). The 1960s saw
innovations like ALGOL, and the 1970s introduced C and Pascal. Modern compilers focus on
optimization, support for object-oriented features, and Just-in-Time compilation. Compilers have
revolutionized programming, enabling complex systems and improving software efficiency.
Pre-Processor: The pre-processor handles directives before compilation proper begins. It replaces
each #include directive with the contents of the named file (file inclusion) and expands each
#define directive (macro expansion). It may also perform other text-level tasks such as language
augmentation. For example, if the source program contains #include "stdio.h", the pre-processor
replaces that directive with the contents of stdio.h in the output it produces.
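The two tasks described above, file inclusion and macro expansion, can be sketched in a few lines. This is a minimal illustration, not how a real C pre-processor is implemented: the HEADERS dictionary is a hypothetical stand-in for files on disk, only quoted #include forms and object-like #define macros are handled, and an unknown header simply raises a KeyError.

```python
import re

# Hypothetical in-memory "file system" standing in for real header files.
HEADERS = {
    "limits.h": "#define MAX_LEN 80\n",
}

def preprocess(source, headers=HEADERS):
    """Expand #include "file" and object-like #define directives."""
    macros = {}
    output = []
    lines = source.splitlines()
    while lines:
        line = lines.pop(0)
        stripped = line.strip()
        include = re.match(r'#include\s+"([^"]+)"', stripped)
        define = re.match(r'#define\s+(\w+)\s+(.*)', stripped)
        if include:
            # File inclusion: splice the header's lines in place of the directive.
            lines = headers[include.group(1)].splitlines() + lines
        elif define:
            # Remember the macro; the directive itself produces no output.
            macros[define.group(1)] = define.group(2)
        else:
            # Macro expansion: replace whole-word occurrences of each macro name.
            for name, body in macros.items():
                pattern = r'\b' + re.escape(name) + r'\b'
                line = re.sub(pattern, lambda _m, b=body: b, line)
            output.append(line)
    return "\n".join(output)
```

Calling preprocess on a source that includes "limits.h" and then declares char buf[MAX_LEN]; yields a declaration with MAX_LEN replaced by 80 and no directives left in the output.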
Assembly Language: It's neither in binary form nor high level. It is an intermediate state that
is a combination of machine instructions and some other useful data needed for execution.
Assembler: An assembler translates assembly language into machine code. Assemblers are not
universal: each platform (hardware + OS) has its own. The output of the assembler is called an
object file.
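A minimal sketch of what an assembler does, assuming a hypothetical two-byte instruction format (one opcode byte followed by one operand byte) and an invented opcode table. Neither corresponds to a real machine; a different platform would need a different table, which is why assemblers are platform-specific.

```python
# Hypothetical 8-bit instruction set, invented for this illustration.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

def assemble(text):
    """Translate assembly text into machine code (opcode, operand byte pairs)."""
    code = bytearray()
    for raw in text.splitlines():
        line = raw.split(";")[0].strip()   # drop comments and surrounding spaces
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        # Operand may be decimal or 0x-prefixed hex; instructions without one get 0.
        operand = int(parts[1], 0) if len(parts) > 1 else 0
        code += bytes([OPCODES[mnemonic], operand])
    return bytes(code)
```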
Interpreter: Like a compiler, an interpreter processes a program written in a high-level language,
but the two differ in how they handle the input. A compiler scans the entire program and
translates it as a whole into machine code, which is executed later. An interpreter translates
and executes the program one statement at a time. Interpreted programs are usually slower than
compiled ones.
Relocatable Machine Code: It can be loaded at any memory address and run. The addresses within
the program are expressed relative to its start, so they can be adjusted to wherever the program
is placed.
Loader/Linker: Together, the linker and loader convert relocatable code into absolute code and
try to run the program, resulting in a running program or an error message (or sometimes both).
The linker combines a variety of object files into a single executable file. The loader then
loads it into memory and executes it.
Linker: The basic work of a linker is to merge separately produced pieces of object code (which
have not yet been connected to one another), such as the output of the compiler and
assembler, standard library functions, and operating system resources.
Loader: The code generated by the compiler, assembler, and linker is generally relocatable,
which means its starting location is not fixed in advance and it can be placed anywhere
in the computer's memory. The basic task of the loader is therefore to calculate the exact
addresses of these memory locations.
Overall, compiler design is a complex process that involves multiple stages and requires a deep
understanding of both the programming language and the target platform. A well-designed
compiler can greatly improve the efficiency and performance of software programs, making
them more useful and valuable for users.
Phases of a Compiler
There are two major phases of compilation, which in turn have many parts. Each of them takes
input from the output of the previous level and works in a coordinated way.
Analysis Phase
An intermediate representation is created from the given source code:
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
Intermediate Code Generator
Synthesis Phase
An equivalent target program is created from the intermediate representation. It has two parts:
Code Optimizer
Code Generator
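To make the hand-off between phases concrete, here is a small sketch of three analysis-phase steps for arithmetic expressions: lexical analysis, syntax analysis, and intermediate code generation in three-address form. The token kinds, the tuple-based tree, and the temporary names t1, t2, ... are choices made for this illustration. Semantic analysis and the synthesis phase are left out.

```python
import re

def lex(src):
    """Lexical analysis: characters -> list of (kind, text) tokens."""
    tokens = []
    for text in re.findall(r"\d+|[A-Za-z_]\w*|\S", src):
        if text.isdigit():
            tokens.append(("NUM", text))
        elif text[0].isalpha() or text[0] == "_":
            tokens.append(("ID", text))
        else:
            tokens.append(("OP", text))
    return tokens

def parse(tokens):
    """Syntax analysis: tokens -> tree of nested (op, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():                      # factor := NUM | ID | '(' expr ')'
        kind, text = advance()
        if kind in ("NUM", "ID"):
            return text
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():                        # term := factor (('*' | '/') factor)*
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():                        # expr := term (('+' | '-') term)*
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree

def to_three_address(tree):
    """Intermediate code generation: tree -> list of three-address instructions."""
    code = []

    def walk(node):
        if isinstance(node, str):      # leaf: a number or an identifier
            return node
        op, left, right = node
        l, r = walk(left), walk(right)
        temp = f"t{len(code) + 1}"     # fresh temporary for this result
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    walk(tree)
    return code
```

Each function consumes exactly what the previous one produced, for example to_three_address(parse(lex("a + b * 2"))), which mirrors the way each phase takes its input from the output of the one before it.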
Types of Compiler
Self Compiler: When the compiler runs on the same machine for which it produces machine
code, it is called a self compiler or resident compiler.
Cross Compiler: When the compiler runs on one machine and produces machine code for a
different one, it is called a cross-compiler. It is capable of creating code for a platform
other than the one on which the compiler is running.
Source-to-Source Compiler: A Source-to-Source Compiler or transcompiler or
transpiler is a compiler that translates source code written in one programming
language into the source code of another programming language.
Single Pass Compiler: When all the phases of the compiler are combined in a single
module that reads the source code only once, it is called a single-pass compiler. It
converts source code to machine code in that one pass.
Multi-Pass Compiler: When the source code or its intermediate representation (such as a
syntax tree) is processed several times, it is called a multi-pass compiler. It breaks the
work of compilation into smaller stages, each producing intermediate code for the next.
Just-in-Time (JIT) Compiler: It is a type of compiler that converts code into machine
language during program execution, rather than before it runs. It combines the benefits
of interpretation (real-time execution) and traditional compilation (faster execution).
Ahead-of-Time (AOT) Compiler: It converts the entire source code into machine code
before the program runs. This means the code is fully compiled at build time, which
typically gives faster startup than compiling during execution.
Incremental Compiler: It compiles only the parts of the code that have changed, rather
than recompiling the entire program. This makes the compilation process faster and
more efficient, especially during development.
Operations of Compiler
These are some operations that are done by the compiler:
It breaks source programs into smaller parts.
It enables the creation of symbol tables and intermediate representations.
It helps in code compilation and error detection.
It saves all codes and variables.
It analyses the full program and translates it.
It converts source code to machine code.
Advantages of Compiler
Efficiency: Compiled programs are generally more efficient than interpreted programs
because the machine code produced by the compiler is optimized for the specific
hardware platform on which it will run.
Portability: Once a program is compiled, the resulting machine code can be run on any
computer or device that has the appropriate hardware and operating system, making it
highly portable.
Error Checking: Compilers perform extensive error checking during the compilation
process, which can help catch syntax errors and many semantic errors in the code before
it is run. Logical errors generally cannot be detected at compile time.
Compiler Construction Tools
1. Parser Generator - It produces syntax analyzers (parsers) from a grammatical description of
a programming language, typically a context-free grammar. It is useful because the syntax
analysis phase is highly complex and writing a parser by hand takes considerable time and
effort. Example: Yacc, Bison
2. Scanner Generator - It generates lexical analyzers from regular-expression descriptions of
the tokens of a language. It builds a finite automaton to recognize the regular
expressions. Example: Lex
3. Syntax-directed translation engines - They generate intermediate code in three-address
format from a parse tree. These engines have routines that traverse the parse tree and
produce the intermediate code. Each node of the parse tree is associated with one or more
translations.
4. Automatic code generators - They generate the machine language for a target machine. Each
operation of the intermediate language is translated using a collection of rules that the
code generator takes as input. A template-matching process is used: an intermediate
language statement is replaced by its equivalent machine language statement using
templates.
5. Data-flow analysis engines - They are used in code optimization. Data-flow analysis is a key
part of code optimization that gathers information about how values flow from one part of
a program to another.
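As one concrete instance of the data-flow analysis in item 5, the sketch below computes live variables for straight-line code by walking the instructions backwards. The (target, operands) instruction format is an assumption made for this example, and real engines also handle branches and loops by iterating to a fixed point, which is omitted here.

```python
def live_variables(instructions, live_out):
    """Backward liveness analysis over straight-line code.

    Each instruction is a (target, operands) pair. Returns, for every
    instruction, the set of variables that are live just before it.
    """
    live = set(live_out)
    live_before = []
    for target, operands in reversed(instructions):
        # A variable is live before an instruction if the instruction reads it,
        # or if it is live afterwards and the instruction does not overwrite it.
        live = (live - {target}) | set(operands)
        live_before.append(set(live))
    live_before.reverse()
    return live_before
```

An optimizer can use the result to discard an instruction whose target is not live afterwards, or to decide which values must stay in registers.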
Lexical Analyzer Generator: This tool helps in generating the lexical analyzer or
scanner of the compiler. It takes as input a set of regular expressions that define the
tokens of the language being compiled and produces a program that reads the input
source code and tokenizes it based on these regular expressions.
Parser Generator: This tool helps in generating the parser of the compiler. It takes as
input a context-free grammar that defines the syntax of the language being compiled
and produces a program that parses the input tokens and builds an abstract syntax tree.
Code Generation Tools: These tools help in generating the target code for the
compiler. They take as input the abstract syntax tree produced by the parser and
produce code that can be executed on the target machine.
Optimization Tools: These tools help in optimizing the generated code for efficiency
and performance. They can perform various optimizations such as dead code
elimination, loop optimization, and register allocation.
Debugging Tools: These tools help in debugging the compiler itself or the programs
that are being compiled. They can provide debugging information such as symbol
tables, call stacks, and runtime errors.
Profiling Tools: These tools help in profiling the compiler or the compiled code to
identify performance bottlenecks and optimize the code accordingly.
Documentation Tools: These tools help in generating documentation for the compiler
and the programming language being compiled. They can generate documentation for
the syntax, semantics, and usage of the language.
Language Support: Compiler construction tools are designed to support a wide range
of programming languages, including high-level languages such as C++, Java, and
Python, as well as low-level languages such as assembly language.
Cross-Platform Support: Compiler construction tools may be designed to work on
multiple platforms, such as Windows, Mac, and Linux.
User Interface: Some compiler construction tools come with a user interface that
makes it easier for developers to work with the compiler and its associated tools.
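Two of the optimizations named under Optimization Tools above can be sketched on straight-line three-address code: constant folding (with simple constant propagation) and dead code elimination. The (target, op, left, right) tuple format and the "const" pseudo-operation are assumptions of this illustration, and the instructions are assumed to have no side effects.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Evaluate operations at compile time when both operands are known constants."""
    known = {}                         # variable name -> constant value
    result = []
    for target, op, left, right in code:
        left = known.get(left, left)   # substitute constants found earlier
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            known[target] = OPS[op](left, right)
            result.append((target, "const", known[target], None))
        else:
            known.pop(target, None)    # target no longer holds a known constant
            result.append((target, op, left, right))
    return result

def eliminate_dead_code(code, live_out):
    """Drop instructions whose result is never used afterwards."""
    live = set(live_out)
    kept = []
    for target, op, left, right in reversed(code):
        if target in live:
            live.discard(target)
            live.update(x for x in (left, right) if isinstance(x, str))
            kept.append((target, op, left, right))
    kept.reverse()
    return kept
```

Running fold_constants first often exposes more dead code, because an instruction that only fed a folded constant is no longer read by anything.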
Text Editor
A text editor plays a fundamental role in compiler design as the primary interface for users to
create and modify source code, which is then processed by a compiler. While not directly part
of the compilation process itself, the text editor is the initial stage in the software development
lifecycle where the program's logic is expressed in a human-readable format.
Text editors are essential tools for writing source code in various programming
languages (e.g., C, Java), which the compiler will process.
Text editors allow quick writing and editing of source files without the overhead of full
IDEs, which is especially useful in educational and low-resource environments.
3. Syntax Formatting
Many modern text editors highlight syntax errors and support language-specific
formatting, which helps catch basic mistakes before compilation.
Text editors can integrate with Git, useful when tracking changes in a compiler’s source
code or testing changes in the language being compiled.
5. Multi-Language Support
Useful when a compiler project is written in multiple languages (e.g., C++ for the
compiler core, Python for tooling).
Unlike IDEs, basic text editors don’t integrate tightly with compilers, so the
compilation and debugging process must be done manually.
Unlike IDEs that show compiler errors in real-time, text editors often require you to
compile externally to see errors.
Text editors usually lack built-in debuggers, profilers, or memory inspection tools
needed for compiler development.
Navigating large codebases (like compilers) is easier in IDEs with features like code
navigation, class hierarchies, and refactoring tools.
Interpreter
All high-level language programs must be converted to machine code before the computer can
execute them. An interpreter is the software, distinct from a compiler or an assembler, that
performs this conversion line by line, translating and executing each high-level instruction
in turn.
The interpreter checks the source code line by line, and if an error is found on any line, it
stops execution until the error is resolved. Error correction is relatively easy with an
interpreter because errors are reported line by line, but the program takes more time to
complete execution. Interpreters were first used in 1952 to ease programming within the
limitations of computers at the time. Some interpreters translate source code into an efficient
intermediate representation and execute it immediately.
In other systems, source programs are compiled ahead of time and stored as machine-independent
code, which is then linked at run-time and executed by an interpreter. An interpreter is
generally used in micro-computers. It helps the programmer find and correct errors before
control moves to the next statement. The interpreter system performs the actions described by
the high-level program. For interpreted programs, the source code is needed every time the
program runs. Interpreted programs run slower than compiled programs.
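The line-by-line behaviour described above can be seen in a toy interpreter. The sketch below assumes a made-up language with only two statement forms, `name = expression` and `print expression`, with expressions evaluated left to right without operator precedence. Note that every statement before the faulty line has already run by the time the error is reported.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, env):
    """Evaluate 'operand op operand ...' left to right (no precedence)."""
    tokens = expr.split()

    def value(tok):
        if tok.lstrip("-").isdigit():
            return int(tok)
        if tok in env:
            return env[tok]
        raise NameError(f"undefined variable {tok!r}")

    result = value(tokens[0])
    for op, tok in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, value(tok))
    return result

def interpret(source):
    """Translate and execute one statement at a time, stopping at the first error."""
    env = {}        # variable name -> current value
    output = []     # what the program printed, plus any error report
    for number, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            if line.startswith("print "):
                output.append(evaluate(line[6:], env))
            else:
                name, expr = line.split("=", 1)
                env[name.strip()] = evaluate(expr, env)
        except Exception as exc:
            # Report the line where execution stopped; later lines never run.
            output.append(f"line {number}: error: {exc}")
            break
    return output
```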
A self-interpreter is a programming language interpreter written in the same language that it
interprets.
For example: a BASIC (Beginner's All-purpose Symbolic Instruction Code) interpreter written in
BASIC. Self-interpreters are related to self-hosting compilers. Some languages, such as Lisp
and Prolog, have elegant self-interpreters.
The advantage of the interpreter is that it executes the program line by line, which helps users
find errors easily.
The disadvantage of the interpreter is that an interpreted program takes more time to execute
than a compiled one.
Applications of Interpreters
Debug monitor
A debug monitor is a tool or software component used in software development and debugging
processes. Its primary function is to monitor the execution of a program, allowing developers to
track its behavior, identify issues, and analyze performance. Here are some key aspects of debug
monitors:
3. Logging: Debug monitors often include logging capabilities to record events, errors,
and warnings during execution, which can be reviewed later.
4. Performance Analysis: Some debug monitors also offer profiling features to help
identify performance bottlenecks or resource usage issues.
5. Integration: They can be integrated into development environments (IDEs) or used as
standalone tools, depending on the programming language and platform.
A Debug Monitor is a software or hardware tool used to observe, control, and debug the
execution of a program, especially at the low level (machine or assembly language). It helps
developers track down bugs, inspect memory, and monitor CPU registers or I/O operations
during program execution.
Embedded systems
Low-level programming
Operating system development
Compiler and interpreter debugging
Advantages:
Disadvantages:
Architecture of Loader
Features of Loaders
Relocation: Loaders can relocate the program to different memory locations to avoid
memory conflicts with other programs.
Linking: Loaders can link different parts of the program to resolve external references and
create a single executable program.
Error Detection: Loaders can detect and report errors that occur during the loading process.
Memory Allocation: Loaders can allocate memory space to the program and its data,
ensuring that the program has enough memory to execute efficiently.
Execution Preparation: Loaders can prepare the program for execution by setting up the
initial values of the program counter and stack pointer.
Dynamic Loading: Loaders can load program segments dynamically, allowing the program
to only load the necessary segments into memory as they are needed.
Advantages of Loader
Applications of Loader
1. Memory Management: The loader is responsible for allocating memory for the program's
instructions and data. It ensures that the program has enough memory to run and that the
memory is used efficiently.
2. Symbol Resolution: The loader resolves any external references to other programs or
libraries that the program may use. It finds and loads the required libraries and links them to
the program, so the program can access them during execution.
3. Error Handling: The loader checks the program for compatibility with the system and
handles any errors that may occur during the loading process. For example, if the program
requires a library that is not present on the system, the loader will display an error message.
4. Relocation: The loader relocates the program if necessary, adjusting all memory references
in the program to reflect the new location. This allows the same program to be loaded into
different memory locations without having to modify the program's object code.
5. Protection: The loader provides protection mechanisms to prevent unauthorized access to
the program. It prevents the program from overwriting or corrupting other programs or
system memory.
6. Dynamic Linking: The loader can dynamically link libraries and other shared object codes
to the program while it is being loaded, eliminating the need for them to be included in the
program's object code.
7. Overlays: The loader can load programs in chunks called overlays, which allows programs
to run in a smaller memory space by only loading the needed parts of the program at a time.
8. Microcode loading: Some processors use microcode as a low-level intermediate
representation of the instruction set, which is loaded by the loader into the microcode
memory before execution.
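The relocation step in item 4 can be illustrated with a short sketch. It assumes a simplified object format invented for this example: the program is a list of integer words assembled as if it started at address 0, accompanied by one relocation flag per word that marks the words holding addresses.

```python
def relocate(code, relocation_flags, load_address):
    """Return a copy of the code adjusted to run at load_address.

    code: list of integer words, assembled relative to address 0.
    relocation_flags: one boolean per word; True marks an address word.
    """
    if len(code) != len(relocation_flags):
        raise ValueError("need exactly one relocation flag per word")
    # Only address-dependent words change; opcodes and constants are copied as-is.
    return [word + load_address if flag else word
            for word, flag in zip(code, relocation_flags)]
```

Because only the flagged words change, the same object code can be placed at any load address without being reassembled.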
Loader Schemes in Compiler Design
A Loader is a system program that takes the object code generated by the compiler/assembler
and places it into main memory so it can be executed. It also performs tasks like address
relocation, linking, and program loading.
Compile-and-Go Loader
• The simplest loader scheme.
• Here, the assembler/compiler places the translated program directly into memory and starts
execution.
• Disadvantages: The program must be retranslated for every run, and memory is used
inefficiently because the translator occupies memory alongside the program.
Absolute Loader
• The loader simply reads the program into memory at the addresses fixed in the object code and
starts execution.
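A minimal sketch of an absolute loader, assuming a hypothetical object format made of (start address, words) records plus an entry address. Memory is modelled as a Python list, and "starting execution" is reduced to returning the entry address.

```python
def absolute_load(records, entry, memory_size=32):
    """Copy each record to its fixed address and return (memory, entry address)."""
    memory = [0] * memory_size
    for start, words in records:
        if start < 0 or start + len(words) > memory_size:
            raise MemoryError("record does not fit in memory")
        # No relocation or linking: the addresses in the object code are final.
        memory[start:start + len(words)] = words
    return memory, entry               # execution would begin at `entry`
```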
Subroutine Linkage
Relocating Loader
• Can modify object code so it can be loaded into any memory location.
• Uses relocation bits to identify address-dependent instructions.
• Reads multiple object modules, resolves external references, and links them together.
• Binder: Combines different object modules and libraries into one unit before loading.
• Linking Loader: Performs both loading and linking at the same time, resolves symbolic
references, allocates memory, produces executable program.
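The linking behaviour in the last bullets can be sketched as a two-pass procedure. The module format is an assumption of this example: each module is a dictionary with "code" (a list of words), "defs" (symbol name to offset within the module), and "refs" (offset of a word to patch, mapped to the external symbol it refers to).

```python
def link_and_load(modules, base=0):
    """Combine object modules into one memory image, resolving external symbols."""
    # Pass 1: assign each module a load address and build the global symbol table.
    symbols = {}
    address = base
    for mod in modules:
        for name, offset in mod["defs"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol {name}")
            symbols[name] = address + offset
        address += len(mod["code"])

    # Pass 2: copy the code and patch each external reference with its address.
    image = []
    for mod in modules:
        code = list(mod["code"])
        for offset, name in mod["refs"].items():
            if name not in symbols:
                raise ValueError(f"undefined symbol {name}")
            code[offset] = symbols[name]
        image.extend(code)
    return image, symbols
```

Two passes are needed because a module may refer to a symbol defined in a module that appears later in the list.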
Overlays
Summary Table