Object-oriented Compiler Construction

Axel-Tobias Schreiner, Bernd Kühl
University of Osnabrück, Germany
{axel,bekuehl}@uos.de, http://www.inf.uos.de/talks/hc2

Extended Abstract
A compiler takes a program in a source language, creates some internal representation while
checking the syntax of the program, performs semantic checks, and finally generates some-
thing that can be executed to produce the intended effect of the program.
The obvious candidate for object technology in a compiler is the symbol table: a mapping
from user-defined names to their properties as expressed in the program. It turns out,
however, that compiler implementation can benefit from object technology in many more areas.
If the internal representation is a tree of objects, semantic checking and generation can be
accomplished by sending a message to these objects or by visiting each object. If the result of
generation is a set of persistent objects, program execution can consist of sending a message
to a distinguished object in this set.
Compilers are usually made with tools such as parser and lexical analysis generators. A
parser-generator takes a grammar, specified in a language such as BNF or EBNF, checks it,
and constructs a representation (the parser) which will execute semantic actions as phrases
over the grammar are recognized. If the parser consists of a set of persistent objects, check-
ing the grammar is accomplished by sending a message to the objects. Similarly, recognizing
a program amounts to a message to the start symbol of the grammar. Goal objects are then
created to represent the phrases to be recognized and the user actions are defined as meth-
ods for the goals which are called from the parsing objects.
The presentation will discuss existing Java implementations of these ideas: oops, a parser-
generator based on EBNF and objects; the compiler kit, a class library for implementing parse
trees and interpreters for typical programming language constructs; and jag, an object-based
tool for tree traversal, helpful in debugging and classical code generation.

Introduction
This paper summarizes our experience of how compiler construction benefits from
object-oriented programming techniques. We gained this experience in several projects and used
it to great advantage in two courses on compiler construction with Objective C and Java [1].
It turns out that OOP can be applied productively in every phase of a compiler implementation,
and it delivers the expected benefits: objects enforce information hiding and state
encapsulation; methods help to develop by divide and conquer; and all work is carried out by
messages, which can be debugged by instrumenting their methods. Most importantly, classes
encourage code reuse between projects, and inheritance allows code reuse within a project and
modifications from one project to another. As an added benefit, modern class libraries contain
many pre-fabricated algorithms and data structures.

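[Figure: components of a typical compiler. source -> lex -> symbols -> syn -> tree -> gen -> image;
symtab maintains the table, sem works on the tree, and run executes the image.]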

The diagram shows the major components of a typical compiler: lex collates source charac-
ters into words. symtab manages a table mapping these words to symbol descriptions which
are passed around during the rest of the compilation process. syn checks the symbol
sequence against a grammar and usually constructs a tree representing the source. sem
checks the tree for semantic correctness and might modify it to account for implicit operations.
Finally, gen produces an image of the source that can be executed in some run environment.
The rest of this paper tours these components and discusses how object-orientation can be
applied in their implementation and what benefits we found.

Symbol Table
symtab manages a container for descriptions of mostly user-defined symbols: lex assembles
a word from the source and hands it as a key to symtab to locate a description or create a new
one. The lookup happens once per word of the source: from this point on the description is
passed along by reference. Other parts of the compiler query the description or contribute
more information.
Symbol table and descriptions are obvious candidates for OOP within a compiler: the table is
a container object where each description object is held. The descriptions share at least the
ability to be located by key. If a base class is used to hold the key, inheritance helps to encap-
sulate and share the lookup mechanism while keeping it separate from the information consti-
tuting the actual description. Depending on the implementation language, it is highly likely
that an existing class library provides a suitable container, an efficient lookup mechanism, and
perhaps even value representations for some of the description information.
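The arrangement described above can be sketched in Java as follows. This is a minimal illustration, not the authors' code: the names Symbol, Variable, SymbolTable, and lookup are hypothetical, and java.util.HashMap stands in for "a suitable container" from an existing class library.

```java
import java.util.HashMap;
import java.util.Map;

// Base class: holds only the key, so lookup stays separate from the description.
abstract class Symbol {
    private final String name;
    Symbol(String name) { this.name = name; }
    String name() { return name; }
}

// One hypothetical kind of description; later phases contribute more information.
class Variable extends Symbol {
    String typeName;                      // filled in later, e.g. by semantic analysis
    Variable(String name) { super(name); }
}

// The table is a container object built on a library map.
class SymbolTable {
    private final Map<String, Symbol> table = new HashMap<>();

    // Locate the description for a word, or create a new one on first sight.
    Symbol lookup(String word) {
        return table.computeIfAbsent(word, Variable::new);
    }

    int size() { return table.size(); }
}

class SymtabDemo {
    public static void main(String[] args) {
        SymbolTable symtab = new SymbolTable();
        Symbol first = symtab.lookup("x");
        Symbol again = symtab.lookup("x");
        System.out.println(first == again);   // prints "true": one description per word
    }
}
```

Because lookup returns the description object itself, the rest of the compiler can pass it along by reference without consulting the table again.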

Parse Tree
Given an excerpt from a typical grammar:
sum: product | sum ’+’ product
product: term | product ’*’ term
term: identifier | literal

Then x + 10 is a sentence for which the following trees can be built:

[Figure: two trees for x + 10.
Abstract syntax tree: sum( sum( product( term( id: x ) ) ), +, product( term( int: 10 ) ) ).
Parse tree: +( id: x, int: 10 ).]

The leaves of both trees are input symbols, other nodes represent grammar phrases. An
abstract syntax tree contains a node for each recognized nonterminal symbol and the children
correspond to the symbols in a phrase for the nonterminal symbol; parser generator tools
such as JavaCC [2] often produce an abstract syntax tree automatically. A parse tree is a
pruned version of the abstract syntax tree; while it must be built more or less by hand during
syntax analysis, it can be designed to be much more suitable for the tree traversals constitut-
ing the remaining phases of the compilation process.
Once the tree nodes are objects of classes specific to the grammar phrases, the rest of the
compilation is carried out by methods in these classes, i.e., there is an automatic divide and
conquer imposed for semantic analysis, etc. Inheritance further simplifies the implementation,
e.g., there is little difference in checking all six relational operators or in generating code for
some commutative operators and different value types.
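As a hedged illustration of the relational-operator remark, the sketch below puts everything the six operators share into one base class. All class names are hypothetical and not taken from the compiler kit, and truth values are represented as 1 and 0 purely to keep the example short.

```java
// Minimal node protocol for this sketch: every node can be evaluated.
abstract class Node {
    abstract long eval();
}

class Num extends Node {
    private final long value;
    Num(long value) { this.value = value; }
    long eval() { return value; }
}

// Everything the six relational operators have in common lives here.
abstract class Relational extends Node {
    private final Node left, right;
    Relational(Node left, Node right) { this.left = left; this.right = right; }

    // Shared evaluation: compare once, let the subclass interpret the outcome.
    final long eval() {
        return holds(Long.compare(left.eval(), right.eval())) ? 1 : 0;
    }

    // comparison is negative, zero, or positive
    abstract boolean holds(int comparison);
}

// Each operator contributes a single line of its own.
class Lt extends Relational { Lt(Node l, Node r) { super(l, r); } boolean holds(int c) { return c < 0; } }
class Le extends Relational { Le(Node l, Node r) { super(l, r); } boolean holds(int c) { return c <= 0; } }
class Gt extends Relational { Gt(Node l, Node r) { super(l, r); } boolean holds(int c) { return c > 0; } }
class Ge extends Relational { Ge(Node l, Node r) { super(l, r); } boolean holds(int c) { return c >= 0; } }
class Eq extends Relational { Eq(Node l, Node r) { super(l, r); } boolean holds(int c) { return c == 0; } }
class Ne extends Relational { Ne(Node l, Node r) { super(l, r); } boolean holds(int c) { return c != 0; } }

class RelationalDemo {
    public static void main(String[] args) {
        System.out.println(new Le(new Num(3), new Num(3)).eval());   // prints 1
    }
}
```

A semantic check shared by all six operators could be placed in Relational in the same way.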
For our courses we have implemented an extensible compiler kit (class library) as a reusable
universal back end. Syntax analyzers for different languages build parse trees using these
classes and the compiler kit does everything else: it performs semantic analysis and
generates a persistent, executable tree as an image. The Java version of the kit was carefully
designed so that semantic analysis can be modified by inheriting from classes in the kit and
overriding their methods. Data type operations can be restricted or extended, new data types can be added,
and type mixing rules can be adapted. The Objective C version of the kit included code gen-
eration into Holub’s C-code [3], an assembler-like coding style for C emulating a fictitious
machine architecture.
OOP opens the possibility to implement the expression part of a language by selecting from
the compiler kit’s data types (or adding one’s own) and building a parse tree using these
classes. The drawback is that there are many classes and many, mostly small methods, but
the advantage is a very significant gain in reusability. The following sections discuss some
details.

Execution
Using OOP a compiler can transform a program source into a tree of persistent objects as an
image. Execution is accomplished by sending a message to the root node of that tree which
results in partial traversal as the message is passed along the tree.
Specifically in Java it is very simple to make objects persistent. This results in a cheap, plat-
form-independent image format and images can be executed wherever a Java machine is
available. Moreover, the execution message can be reused in the compiler, e.g., when con-
stant expressions need to be evaluated or expressions should be partially folded. This is a
significant advantage as it ensures that identical mechanisms are used for evaluation during
compilation and execution.
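A minimal sketch of such an image follows. It assumes that java.io.Serializable is the persistence mechanism, which the text does not state; the names Node, Literal, Add, Mul, and Image are hypothetical. The fold method shows the execution message being reused inside the compiler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Every node understands the same execution message.
interface Node extends Serializable {
    long eval();
}

class Literal implements Node {
    private final long value;
    Literal(long value) { this.value = value; }
    public long eval() { return value; }
}

class Add implements Node {
    private final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    public long eval() { return left.eval() + right.eval(); }   // passes the message along
}

class Mul implements Node {
    private final Node left, right;
    Mul(Node left, Node right) { this.left = left; this.right = right; }
    public long eval() { return left.eval() * right.eval(); }
}

class Image {
    // Write the tree as a persistent image (kept in memory here for brevity).
    static byte[] save(Node root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read an image back; execution then starts at the returned root node.
    static Node load(byte[] image) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return (Node) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compile-time folding of a constant expression reuses the execution message.
    static Node fold(Node constantExpression) {
        return new Literal(constantExpression.eval());
    }
}

class ImageDemo {
    public static void main(String[] args) {
        Node tree = new Add(new Literal(2), new Mul(new Literal(3), new Literal(4)));
        System.out.println(Image.load(Image.save(tree)).eval());   // prints 14
    }
}
```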

Semantic Analysis
Semantic analysis decides if a syntactically acceptable program is meaningful. For a large
part it is concerned with the interaction of various data types in expressions: during a post-
order traversal of the parse tree, result types are computed for each part of an expression and
stored in the nodes for the benefit of code generation.
Parse tree nodes are objects; their classes must implement a method to perform semantic
checking of the node. If necessary, the method can augment the parse tree with conversion
nodes. Types are modeled as unique objects. They have methods informing the semantic
analysis about available operations for the type and about permissible interaction with other
types. Other type methods generate a simplified, persistent runtime tree which can be used
as an interpreter or traversed for code generation.
Implementing semantic analysis as a method for the node classes automates a divide and
conquer approach which immediately carries over to new classes. A lot of effort in a compiler
implementation is spent on code to check expressions. OOP and a careful design of type
modeling permits reusing this code in new projects.

Types
It is very important to design the type mechanism so that it can be modified, extended, or
restricted in different projects. The key is to let operator nodes ask type objects during seman-
tic analysis whether the intended operations are in fact available:
sem:
    for children: child.sem()                  // sets children's types
    for children:
        if child.type.supports('+', otherType):
            type = child.type.result('+', otherType)
            break
    if type not set: error                     // impossible operation

This basic approach is extended to let each type decide if it is willing to convert from the other;
if so, the type is asked to add a conversion node to the tree.
Types are represented by type objects, i.e., a type class from the compiler kit can be extended
or restricted by subclassing and using a subclass object to represent a modified type. Thus,
the compiler kit can be reused to implement a language with a different set of types and
mixing rules.
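The sem pseudocode and the subclassing idea can be sketched in Java as follows. The type classes, their mixing rules, and StrictRealType are assumptions made for illustration and are not the compiler kit's actual classes; conversion nodes are left out.

```java
// Hypothetical type objects; each type is a unique object compared by identity.
abstract class Type {
    abstract boolean supports(char op, Type other);
    abstract Type result(char op, Type other);
}

class IntType extends Type {
    static final IntType INT = new IntType();
    boolean supports(char op, Type other) { return op == '+' && other == this; }
    Type result(char op, Type other) { return this; }
}

class RealType extends Type {
    static final RealType REAL = new RealType();
    // Assumed mixing rule: real is willing to combine with int.
    boolean supports(char op, Type other) {
        return op == '+' && (other == this || other == IntType.INT);
    }
    Type result(char op, Type other) { return this; }
}

// A restricted variant obtained by subclassing: it refuses to mix with int.
class StrictRealType extends RealType {
    static final StrictRealType STRICT_REAL = new StrictRealType();
    @Override boolean supports(char op, Type other) { return op == '+' && other == this; }
}

class BoolType extends Type {
    static final BoolType BOOL = new BoolType();
    boolean supports(char op, Type other) { return false; }
    Type result(char op, Type other) {
        throw new UnsupportedOperationException("bool has no operator " + op);
    }
}

abstract class Node {
    Type type;                   // set by sem()
    abstract void sem();
}

class Leaf extends Node {
    Leaf(Type type) { this.type = type; }
    void sem() { }               // type is already known from the symbol or literal
}

class Plus extends Node {
    private final Node left, right;
    Plus(Node left, Node right) { this.left = left; this.right = right; }

    void sem() {
        left.sem();              // sets children's types
        right.sem();
        // Ask each child's type in turn whether it supports the operation.
        if (left.type.supports('+', right.type)) {
            type = left.type.result('+', right.type);
        } else if (right.type.supports('+', left.type)) {
            type = right.type.result('+', left.type);
        } else {
            throw new IllegalStateException("impossible operation: +");
        }
    }
}
```

With these assumed rules, int + real is typed as real because the right operand's type accepts the mix, while replacing REAL by STRICT_REAL makes the same expression an error without touching Plus.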

Syntax Analysis
A compiler converts a sentence written in some language, i.e., a program, into an executable
image. A compiler-compiler converts a sentence written in a language like EBNF, i.e., a gram-
mar, into an image which is itself a compiler.

[Figure: two translation diagrams side by side. Left: a sentence is translated from l to m.
Right: a grammar is translated from EBNF to m.]

As discussed above, an image is a tree of objects which understand a message for semantic
analysis and another message for execution. Semantic analysis for a grammar means to
check if the grammar is suitable for parsing, e.g., because it fulfills a condition such as LL(1).
Using the image resulting from a grammar means at least to perform recognition, i.e., the exe-
cution message must implement at least a parsing algorithm.
When the parser recognizes a phrase, it must either build an abstract syntax tree or it should
execute some user action. A popular generator, yacc, augments phrases with C statements.
The phrases are specified in BNF, i.e., without an iteration syntax, to simplify how the C state-
ments get access to the symbols accepted by the phrase.
We have employed OOP to implement a parser generator oops [4] which accepts a grammar
written in EBNF, checks that it is LL(1), and creates a recursive descent parser for it. oops
compiles itself; it was bootstrapped with jay, a version of yacc which we retargeted to Java [5].
Both versions of oops share the class library for the execution trees.
Unlike yacc, oops can deal with EBNF and has no syntax for user actions. When the gener-
ated parser starts on a phrase it creates a goal object from a phrase-specific class. The goal
object receives shift and reduce messages as the phrase is recognized and completed. User
actions can be implemented in the required methods for the goal classes. We provide trivial
goal classes for checking a grammar and tracing recognition.
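The goal-object protocol can be illustrated with the sketch below. The interfaces of oops are not given in this paper, so Goal, shift, reduce, SumGoal, TraceGoal, and SumParser are hypothetical names, and the hand-written SumParser merely stands in for a generated recursive descent parser for the single EBNF rule sum: number { '+' number }.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical goal protocol; the real oops interfaces may differ.
interface Goal {
    void shift(String symbol);   // a symbol of the phrase was accepted
    Object reduce();             // the phrase is complete
}

// User action for the rule: add up the numbers as they are shifted.
class SumGoal implements Goal {
    private long total;
    public void shift(String symbol) {
        if (!symbol.equals("+")) total += Long.parseLong(symbol);
    }
    public Object reduce() { return total; }
}

// Trivial goal that only traces recognition.
class TraceGoal implements Goal {
    private final StringBuilder log = new StringBuilder();
    public void shift(String symbol) { log.append("shift ").append(symbol).append('\n'); }
    public Object reduce() { return log.append("reduce\n").toString(); }
}

// Stand-in for a generated parser for: sum: number { '+' number }
class SumParser {
    private final Supplier<Goal> goals;
    SumParser(Supplier<Goal> goals) { this.goals = goals; }

    Object parse(List<String> input) {
        Goal goal = goals.get();                 // created when the phrase starts
        Iterator<String> symbols = input.iterator();
        number(symbols, goal);
        while (symbols.hasNext()) {              // EBNF iteration
            String plus = symbols.next();
            if (!plus.equals("+")) throw new IllegalArgumentException("expected '+', got " + plus);
            goal.shift(plus);
            number(symbols, goal);
        }
        return goal.reduce();                    // phrase completed
    }

    private static void number(Iterator<String> symbols, Goal goal) {
        if (!symbols.hasNext()) throw new IllegalArgumentException("expected a number");
        String symbol = symbols.next();
        if (!symbol.matches("\\d+")) throw new IllegalArgumentException("expected a number, got " + symbol);
        goal.shift(symbol);
    }
}

class GoalDemo {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("1", "+", "2", "+", "39");
        System.out.println(new SumParser(SumGoal::new).parse(input));   // prints 42
        System.out.print(new SumParser(TraceGoal::new).parse(input));
    }
}
```

The parser itself contains no user actions; exchanging the goal class changes what recognition produces.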
Objects play a major role in oops: parser objects encapsulate lookahead and follow sets,
which are used for LL(1) checking and for steering the recursive descent during execution;
goal objects encapsulate the state of phrase recognition. We have implemented an automatic
error recovery based on the lookahead and follow sets, but the goal objects could be allowed
to participate as well.

Divide and Conquer

The OOP approach taken in oops automates a design by divide and conquer for LL(1) check-
ing and parsing. oops is simple enough for use in a compiler construction class to lead to dis-
covery of the algorithm based on syntax graph building blocks like the following:

[Figure: syntax graph building blocks Alt and Seq.]

These blocks can be represented by Alt and Seq nodes which accomplish recognition by
deferring to their subtrees to process alternatives or a sequence. Specifically, Alt needs to
know which subtree to invoke. This can be decided by considering lookahead sets.

Alt requires the lookahead sets of the subtrees to be distinct, and to be distinct from the fol-
low set if there is an empty alternative. The lookahead sets can be computed by inspection;
Alt adds the alternatives, Seq adds terms as long as there is a possibly empty term. The fol-
low sets result from back propagation of lookahead sets; this has to be iterated as long as a
nonterminal acquires a bigger follow set within a graph.
The point is that all these considerations are local within each building block for a syntax
graph and lead directly to methods for the class representing the block. They are simple
enough to be developed in class and they yield a working LL(1) parser generator.
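The local nature of these computations can be sketched as follows. This is an illustration under simplifying assumptions, not the oops implementation: the class names Graph, Token, and Empty are hypothetical, nonterminal references are left out, and follow sets are not computed, so the check covers only the requirement that the lookahead sets of alternatives be distinct.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// One building block of a syntax graph.
abstract class Graph {
    abstract Set<String> lookahead();   // symbols that can start this block
    abstract boolean canBeEmpty();
    abstract void check();              // throws if lookahead sets collide
}

class Token extends Graph {
    private final String symbol;
    Token(String symbol) { this.symbol = symbol; }
    Set<String> lookahead() { return new TreeSet<>(Collections.singleton(symbol)); }
    boolean canBeEmpty() { return false; }
    void check() { }
}

class Empty extends Graph {
    Set<String> lookahead() { return new TreeSet<>(); }
    boolean canBeEmpty() { return true; }
    void check() { }
}

class Seq extends Graph {
    private final List<Graph> terms;
    Seq(Graph... terms) { this.terms = Arrays.asList(terms); }

    // Seq adds terms as long as there is a possibly empty term.
    Set<String> lookahead() {
        Set<String> result = new TreeSet<>();
        for (Graph term : terms) {
            result.addAll(term.lookahead());
            if (!term.canBeEmpty()) break;
        }
        return result;
    }
    boolean canBeEmpty() {
        for (Graph term : terms) if (!term.canBeEmpty()) return false;
        return true;
    }
    void check() { for (Graph term : terms) term.check(); }
}

class Alt extends Graph {
    private final List<Graph> alternatives;
    Alt(Graph... alternatives) { this.alternatives = Arrays.asList(alternatives); }

    // Alt adds the alternatives.
    Set<String> lookahead() {
        Set<String> result = new TreeSet<>();
        for (Graph alternative : alternatives) result.addAll(alternative.lookahead());
        return result;
    }
    boolean canBeEmpty() {
        for (Graph alternative : alternatives) if (alternative.canBeEmpty()) return true;
        return false;
    }
    // The lookahead sets of the alternatives must be distinct (follow sets omitted here).
    void check() {
        Set<String> seen = new TreeSet<>();
        for (Graph alternative : alternatives) {
            alternative.check();
            for (String symbol : alternative.lookahead())
                if (!seen.add(symbol))
                    throw new IllegalStateException("not LL(1): ambiguous lookahead " + symbol);
        }
    }
}

class LookaheadDemo {
    public static void main(String[] args) {
        // a b | [ c ] d
        Graph g = new Alt(new Seq(new Token("a"), new Token("b")),
                          new Seq(new Alt(new Token("c"), new Empty()), new Token("d")));
        g.check();
        System.out.println(g.lookahead());   // prints [a, c, d]
    }
}
```

Each method looks only at the block's own children, which is what makes the algorithm easy to develop one class at a time.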

Tree Traversal Techniques

Many algorithms in a compiler employ tree traversal. For dealing with object trees, OOP has
several techniques to offer:

interface Visitor {
    void visit (SomeClass node);   // for each kind of node class
}
[Figure: the tree receives accept: visitor and calls the visitor back with visit: this.]

In the visitor design pattern, a Visitor object is sent to the root of the tree. A node always
calls the visitor back sending itself as an argument. The class of the argument, i.e., the class
of the visited node, is used to divide node processing among different methods in the visitor.
In Java, overloading can be employed: Visitor objects must implement a visit method for
each possible node class and the accept method is implemented in each node class so that
overloading selects the appropriate method at the visitor. To allow for tree traversal, the node
class gives a visitor access to a node’s children.
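A minimal Java sketch of the pattern as described above, using the x + 10 example from the parse tree section. The node classes Id, Int, and Plus and the Printer visitor are hypothetical names chosen for illustration.

```java
// One visit method per node class; overloading distinguishes them.
interface Visitor {
    void visit(Id node);
    void visit(Int node);
    void visit(Plus node);
}

abstract class Node {
    abstract void accept(Visitor visitor);
}

class Id extends Node {
    final String name;
    Id(String name) { this.name = name; }
    // The static type of "this" is Id, so overloading selects visit(Id).
    void accept(Visitor visitor) { visitor.visit(this); }
}

class Int extends Node {
    final long value;
    Int(long value) { this.value = value; }
    void accept(Visitor visitor) { visitor.visit(this); }
}

class Plus extends Node {
    final Node left, right;          // children are accessible to visitors
    Plus(Node left, Node right) { this.left = left; this.right = right; }
    void accept(Visitor visitor) { visitor.visit(this); }
}

// A visitor that renders the tree; the traversal is driven by the visitor itself.
class Printer implements Visitor {
    private final StringBuilder out = new StringBuilder();

    public void visit(Id node) { out.append(node.name); }
    public void visit(Int node) { out.append(node.value); }
    public void visit(Plus node) {
        out.append('(');
        node.left.accept(this);
        out.append(" + ");
        node.right.accept(this);
        out.append(')');
    }

    String text() { return out.toString(); }
}

class VisitorDemo {
    public static void main(String[] args) {
        Printer printer = new Printer();
        new Plus(new Id("x"), new Int(10)).accept(printer);
        System.out.println(printer.text());   // prints (x + 10)
    }
}
```

Note that every accept method has the same body but must be repeated in each node class, because overload resolution depends on the static type of this.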
Visitor support can be generated by a tree-building parser such as JavaCC but is fairly diffi-
cult to extend or inherit later, i.e., reuse of visitors is hard once the node class library is modi-
fied. The visitor pattern certainly encourages design by divide and conquer but it is awkward
to share the same action for different node classes.
In particular for semantic analysis we found it more convenient to implement a tree traversal
by requiring each node class to implement a suitable method directly. While this precludes dif-
ferent implementations for the same traversal job, e.g., transparently selecting code genera-
tion for different architectures, it simplifies inheritance and code reuse significantly over the
visitor pattern implemented by JavaCC. In Objective C, categories can be added to existing
classes to add new methods; this is a very useful mechanism to add new traversals to an
entire class hierarchy.
A third, more powerful technique is method selection based on a pattern of node and children
classes:
node-class child-class... { Java statements with access to node and children }
...
We implemented a simple tool, jag, to convert these pattern/action statements into Java meth-
ods which are conceptually attached to the node classes and inherited by subclasses, much
like the effect of Objective C categories. Inheritance combined with overloading permits
refinement of initially very coarse traversal rules and significant reuse between projects.

Conclusion
The popularity of Java has made it the language of choice for many university courses if not
industrial projects. While in our opinion Java is not yet quite robust, efficient, and above all
portable enough for mission-critical applications, it is likely to get there soon and it does make
sense to investigate old programming techniques using new paradigms.
Java supports and (much more than C++) encourages OOP which makes significant promises
to improve critical aspects of the software development process. We tried to show in this
paper that these promises do apply to compiler construction: for projects, OOP in compilers
permits significant code reuse; for instruction, OOP in compilers simplifies the design effort
and makes some important algorithms accessible and transparent.

References
[1] http://www.vorlesungen.uos.de/informatik/compilerbau98
[2] http://www.metamata.com
[3] A. Holub, Compiler Design in C, ISBN 0-130-255-252-5.
[4] http://www.inf.uos.de/oops
[5] http://www.inf.uos.de/jay
