Compiler Design Assignment – Model Answer
Course: Computer Science
Topic: Compiler Design – Basics to Implementation
Total Marks: 20
Due Date: [Insert Date]
Objective
To understand the basic phases of a compiler, analyze its role in programming, and implement a
simple lexical analysis task.
Part A – Theory (10 Marks)
1. Definitions (5 Marks)
• Compiler – A program that translates high-level source code into machine code or intermediate code.
• Interpreter – A program that executes source code directly, typically statement by statement, without producing a separate executable.
• Token – A category of the smallest meaningful units in source code, e.g., keyword, identifier, literal.
• Lexeme – The actual sequence of characters in the source code that matches the pattern of a token, e.g., "count" is a lexeme of the identifier token.
• Syntax – The rules defining the correct structure of statements in a programming language.
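To make the token/lexeme distinction concrete, here is a minimal sketch in Python. The category names (KEYWORD, IDENTIFIER, NUMBER, OPERATOR) and the helper classify are assumptions chosen for illustration; they are not prescribed by the assignment.

```python
# Illustrative only: pairs each lexeme with an assumed token category.
KEYWORDS = {"if", "while", "for", "else", "return", "int", "float"}

def classify(lexeme):
    """Return a (token_category, lexeme) pair for one whitespace-separated lexeme."""
    if lexeme in KEYWORDS:
        return ("KEYWORD", lexeme)
    if lexeme.isidentifier():
        return ("IDENTIFIER", lexeme)
    if lexeme.isdigit():
        return ("NUMBER", lexeme)
    return ("OPERATOR", lexeme)  # everything else is lumped together in this sketch

pairs = [classify(lexeme) for lexeme in "int count = 42".split()]
for token, lexeme in pairs:
    print(f"{token:<10} {lexeme}")
```

In this sketch "count" is the lexeme and IDENTIFIER is its token; a different name such as "total" would be a different lexeme of the same token.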
2. Short Answer Questions (5 Marks)
a) Difference between Lexical Analysis and Syntax Analysis:
- Lexical Analysis: Scans the source characters and groups them into tokens, discarding whitespace and comments.
- Syntax Analysis: Checks that the token sequence follows the language grammar and builds a parse tree (parsing).
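The difference can be seen in a small sketch: two inputs that produce the same kinds of tokens, where only one is grammatically valid. The token names and the single grammar rule (IDENTIFIER ASSIGN NUMBER SEMI) are assumptions for a toy language, not part of the assignment.

```python
import re

# Assumed token patterns for a toy language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Lexical analysis: turn characters into a list of token kinds."""
    return [m.lastgroup for m in TOKEN_RE.finditer(source) if m.lastgroup != "SKIP"]

def is_assignment(kinds):
    """Syntax analysis for one assumed rule: IDENTIFIER ASSIGN NUMBER SEMI."""
    return kinds == ["IDENTIFIER", "ASSIGN", "NUMBER", "SEMI"]

good = tokenize("x = 10;")
bad = tokenize("= x 10;")
print(good, is_assignment(good))
print(bad, is_assignment(bad))
```

Both inputs pass lexical analysis, since every character belongs to some token. Only the first passes the syntax check, because the order of the tokens matters to the parser but not to the lexer.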
b) Phases of a Compiler:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
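7. Error Handling (not a separate sequential step; it runs alongside all six phases above, as does symbol table management)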
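As a rough illustration of how the phases hand results to one another, the sketch below pushes a small arithmetic expression through a toy pipeline. Everything here is an assumption made for illustration: the grammar (numbers, variables, + and *), the nested-tuple tree, and the stack-machine instruction names PUSH, LOAD, ADD, MUL. The tree doubles as the intermediate representation, so intermediate code generation is not a separate step in this sketch.

```python
import re

def lex(source):
    """Lexical analysis: split the character stream into lexemes."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[+*()]", source)

def parse(tokens):
    """Syntax analysis: build a nested-tuple tree by recursive descent."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return ("num", int(tok))
        if tok.isidentifier():
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def check(tree, symbols):
    """Semantic analysis: every variable must be in the symbol table."""
    if tree[0] == "var" and tree[1] not in symbols:
        raise NameError(f"undeclared variable {tree[1]!r}")
    if tree[0] in ("+", "*"):
        check(tree[1], symbols)
        check(tree[2], symbols)

def fold(tree):
    """Code optimization: evaluate constant subexpressions at compile time."""
    if tree[0] in ("+", "*"):
        left, right = fold(tree[1]), fold(tree[2])
        if left[0] == "num" and right[0] == "num":
            value = left[1] + right[1] if tree[0] == "+" else left[1] * right[1]
            return ("num", value)
        return (tree[0], left, right)
    return tree

def generate(tree):
    """Code generation: emit instructions for an assumed stack machine."""
    if tree[0] == "num":
        return [f"PUSH {tree[1]}"]
    if tree[0] == "var":
        return [f"LOAD {tree[1]}"]
    op = "ADD" if tree[0] == "+" else "MUL"
    return generate(tree[1]) + generate(tree[2]) + [op]

tokens = lex("x + 2 * 3")
tree = parse(tokens)
check(tree, symbols={"x"})
optimized = fold(tree)
code = generate(optimized)
print(code)  # ['LOAD x', 'PUSH 6', 'ADD']
```

The constant subexpression 2 * 3 is folded to 6 before code generation, which is why the output contains a single PUSH 6 instead of two pushes and a MUL.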
c) Importance of Error Handling:
- Identifies location and cause of errors.
- Provides meaningful messages for corrections.
- Allows compilation to continue when possible.
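These three points can be sketched in a scanner that reports the line and column of each invalid character and then keeps going. The function name scan, the set of accepted operator characters, and the message format are assumptions for illustration; real compilers use more elaborate recovery strategies.

```python
import re

# Whitespace, integers, identifiers, and a small assumed set of operators.
TOKEN_RE = re.compile(r"\s+|\d+|[A-Za-z_]\w*|[=+\-*/<>;(){}]")

def scan(source):
    """Return (tokens, errors); scanning continues after each bad character."""
    tokens, errors = [], []
    for line_no, line in enumerate(source.splitlines(), start=1):
        col = 0
        while col < len(line):
            match = TOKEN_RE.match(line, col)
            if match is None:
                # Record where and what went wrong, then skip one character.
                errors.append(
                    f"line {line_no}, column {col + 1}: "
                    f"unexpected character {line[col]!r}"
                )
                col += 1
                continue
            if not match.group().isspace():
                tokens.append(match.group())
            col = match.end()
    return tokens, errors

tokens, errors = scan("int x = 10;\nx = x @ 5;\ntotal = x $ 2;")
for message in errors:
    print(message)
```

Because the scanner skips the offending character instead of stopping, both errors are reported in a single run, and the valid tokens on the same lines are still collected.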
Part B – Practical (10 Marks)
Write a program in C / C++ / Python / Java that reads a source code file, identifies keywords, identifiers,
and numbers, counts the occurrences of each, and displays the results.
Python Implementation:
# Keywords recognized by this simple scanner
keywords = {"if", "while", "for", "else", "return", "int", "float"}
keyword_counts = {}
identifiers = {}
numbers = {}

with open("source_code.txt", "r") as file:
    for line in file:
        # Tokens are assumed to be separated by whitespace
        tokens = line.split()
        for token in tokens:
            if token in keywords:
                keyword_counts[token] = keyword_counts.get(token, 0) + 1
            elif token.isidentifier():
                identifiers[token] = identifiers.get(token, 0) + 1
            elif token.isdigit():
                numbers[token] = numbers.get(token, 0) + 1

print("Keyword\tCount")
for k, count in keyword_counts.items():
    print(f"{k}\t{count}")

print("\nIdentifier\tCount")
for iden, count in identifiers.items():
    print(f"{iden}\t{count}")

print("\nNumber\tCount")
for num, count in numbers.items():
    print(f"{num}\t{count}")
C++ Implementation:
#include <cctype>
#include <fstream>
#include <iostream>
#include <set>
#include <string>
#include <unordered_map>
using namespace std;

// True if s is a non-empty string of decimal digits.
bool isNumber(const string &s) {
    for (unsigned char c : s)
        if (!isdigit(c)) return false;
    return !s.empty();
}

// True if s starts with a letter or underscore and contains only
// letters, digits, and underscores.
bool isIdentifier(const string &s) {
    if (s.empty() || isdigit(static_cast<unsigned char>(s[0]))) return false;
    for (unsigned char c : s)
        if (!isalnum(c) && c != '_') return false;
    return true;
}

int main() {
    set<string> keywords = {"if", "while", "for", "else", "return", "int", "float"};
    unordered_map<string, int> keywordCount, identifierCount, numberCount;

    ifstream file("source_code.txt");
    if (!file) {
        cerr << "Could not open source_code.txt\n";
        return 1;
    }

    // Classify each whitespace-separated word.
    string word;
    while (file >> word) {
        if (keywords.find(word) != keywords.end()) keywordCount[word]++;
        else if (isNumber(word)) numberCount[word]++;
        else if (isIdentifier(word)) identifierCount[word]++;
    }

    cout << "Keyword\tCount\n";
    for (auto &k : keywordCount) cout << k.first << "\t" << k.second << "\n";
    cout << "\nIdentifier\tCount\n";
    for (auto &id : identifierCount) cout << id.first << "\t" << id.second << "\n";
    cout << "\nNumber\tCount\n";
    for (auto &num : numberCount) cout << num.first << "\t" << num.second << "\n";
    return 0;
}
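Both programs split the input on whitespace, so a word such as x=10; matches none of the three categories and is silently skipped. One possible refinement, sketched below in Python, is to extract words and digit runs with a regular expression instead. The function name count_tokens and the use of collections.Counter are choices made for this sketch, and it reads from a string rather than a file to stay self-contained.

```python
import re
from collections import Counter

KEYWORDS = {"if", "while", "for", "else", "return", "int", "float"}

# A word (letter or underscore first) or a run of digits.
WORD_OR_NUMBER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+")

def count_tokens(source):
    """Count keywords, identifiers and integer literals in a source string."""
    counts = {"keyword": Counter(), "identifier": Counter(), "number": Counter()}
    for lexeme in WORD_OR_NUMBER.findall(source):
        if lexeme in KEYWORDS:
            counts["keyword"][lexeme] += 1
        elif lexeme[0].isdigit():
            counts["number"][lexeme] += 1
        else:
            counts["identifier"][lexeme] += 1
    return counts

counts = count_tokens("int x=10;\nif(x<20) x=x+1;")
for category, counter in counts.items():
    for lexeme, count in counter.items():
        print(f"{category}\t{lexeme}\t{count}")
```

This is still a simplification: it does not skip comments or string literals, it treats 3.14 as the two numbers 3 and 14, and it splits a malformed word such as 10abc into a number and an identifier instead of reporting an error.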
Expected Output (for sample source_code.txt)
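The programs print these counts in three separate sections (Keyword, Identifier, Number), and the order of rows within a section may vary; the results are combined here into one table.
Category     Token    Count
Keyword      int      2
Keyword      if       1
Keyword      while    1
Identifier   x        2
Identifier   total    4
Number       10       2
Number       0        1
Number       5        1
Number       50       1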
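The sample source_code.txt itself is not shown in this answer. The sketch below uses a hypothetical input, invented here purely so the counts can be checked, that yields the counts listed above under the same whitespace-splitting rules as the programs. The name SAMPLE and the helper count_categories are likewise assumptions.

```python
from collections import Counter

KEYWORDS = {"if", "while", "for", "else", "return", "int", "float"}

# Hypothetical sample input; the real source_code.txt is not given.
SAMPLE = """\
int x = 10
int total = 0
while total < 50
total = total + 5
if x > 10
"""

def count_categories(source):
    """Classify whitespace-separated tokens the same way as the programs above."""
    counts = Counter()
    for token in source.split():
        if token in KEYWORDS:
            counts[("Keyword", token)] += 1
        elif token.isidentifier():
            counts[("Identifier", token)] += 1
        elif token.isdigit():
            counts[("Number", token)] += 1
    return counts

counts = count_categories(SAMPLE)
print("Category\tToken\tCount")
for (category, token), count in counts.items():
    print(f"{category}\t{token}\t{count}")
```

Operators such as =, < and + match none of the three categories and are therefore ignored, which is why they do not appear in the table.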