Bottom-Up Parsing
(finish)
CS2210
Lecture 7
LR(1) Items
■ New item form: [A -> α.β, a], where a is a terminal (including $)
■ a represents one token of lookahead
■ Used only if β = ε, i.e., we use the lookahead to decide whether to reduce (or shift/report an error)
■ a ∈ Follow(A)
LR(1) Items Construction
function closure(I);
begin
  repeat
    for each item [A -> α.Bβ, a] in I,
        each production B -> γ in G', and
        each terminal b in First(βa) do
      add [B -> .γ, b] to I (if not present)
  until no more items can be added to I;
  return I
end;

function goto(I, X);
begin
  let J be the set of items [A -> αX.β, a] such
    that [A -> α.Xβ, a] is in I;
  return closure(J)
end;

procedure items(G');
begin
  C := {closure({[S' -> .S, $]})};
  repeat
    for each set of items I in C and
        each grammar symbol X such that
        goto(I, X) is not empty do
      add goto(I, X) to C (if not there)
  until no more items can be added to C
end
Example
■ Consider the same grammar again:
S -> L = R
S -> R
L -> *R
L -> id
R -> L
LALR Parsing
■ Advantages
■ Smaller parsing tables than canonical LR
■ Hundreds versus thousands of states for a typical language
■ Most PL constructs can be parsed with LALR
■ Algorithm of choice in practice
■ Construction idea
■ Merge different items with the same core, e.g., [L -> id., $] and [L -> id., +]
■ May reduce where an error should be reported
■ But the reduction will lead to a parse error later anyway
■ Merging does not produce new shift-reduce conflicts
■ May produce reduce-reduce conflicts; merge only if no conflicts result
LALR Table Construction
■ Simple but space- and time-consuming approach:
■ Build the canonical LR(1) item sets / tables
■ Merge item sets with identical cores (see the sketch below)
■ If a conflict results: the grammar is not LALR
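A rough sketch of the merge step (not the book's efficient algorithm): the core of an LR(1) state is its item set with the lookaheads stripped, states are grouped by core, and any reduce-reduce conflicts introduced by merging are reported. Item encoding follows the earlier sketch; all names are illustrative.

  from collections import defaultdict

  def core(state):
      # the core of an LR(1) state: its items with the lookaheads stripped
      return frozenset((head, body, dot) for (head, body, dot, _la) in state)

  def merge_states(lr1_states):
      # group LR(1) states by core; merging unions the lookaheads of matching items
      merged = defaultdict(set)
      for state in lr1_states:
          merged[core(state)] |= set(state)
      return [frozenset(items) for items in merged.values()]

  def reduce_reduce_conflicts(state):
      # lookaheads on which two different completed items both call for a reduce
      by_la = defaultdict(set)
      for (head, body, dot, la) in state:
          if dot == len(body):
              by_la[la].add((head, body))
      return {la: prods for la, prods in by_la.items() if len(prods) > 1}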
Efficient LALR Construction
■ Ideas:
■ Represent only the kernel items of each item set I, i.e., S' -> .S and items whose "." is not leftmost
■ All items in I can be regenerated from the kernel
■ Parsing actions can be generated from the kernel
■ Details are in the book, in case you are interested
■ Won't be on the exam / prelim
Parsing with more Lookahead
■ LR(k)
■ Closure: [A -> α.Bβ, a] predicts [B -> .γ, y] where y ∈ First_k(βa)
■ Reduce: [A -> α., x] reduces on lookahead x
■ Shift on lookahead x for [A -> α.aβ, y], where a is a terminal and x ∈ First_k(aβy)
■ SLR(k)
■ Reduce: A -> α. reduces if lookahead x ∈ Follow_k(A)
■ Shift on lookahead x for A -> α.aβ, where a is a terminal and x ∈ First_k(aβ Follow_k(A))
■ LALR(k): merge LR(k) machine states with the same core
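A small sketch of the k-bounded First_k used in these rules, assuming a precomputed table mapping each nonterminal to its set of terminal prefixes of length at most k (obtained by the usual fixed-point iteration) and a set of terminals; the helper names and tuple encoding are illustrative.

  def concat_k(left, right, k):
      # k-truncated concatenation: every prefix u + v, cut to length k
      return {(u + v)[:k] for u in left for v in right}

  def first_k(symbols, table, terminals, k):
      # First_k of a sentential form, e.g. First_k(beta a) in the closure rule above
      result = {()}                       # start with the empty prefix
      for X in symbols:
          per_symbol = {(X,)} if X in terminals else table[X]
          result = concat_k(result, per_symbol, k)
          if all(len(p) == k for p in result):
              break                       # every prefix already holds k tokens
      return result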
LR(k) in Practice
■ LR(k), SLR(k) are not used in practice
for k>1
■ Tables too large
■ Not necessary in practice since most
grammars can be made LR(1) and even
LALR(1)
■ Some parser generators support LALR(2)
■ Useful if one is too lazy to rewrite the grammar
Language Class Hierarchy
Parsing with Ambiguous
Grammars
■ Use non-grammatical means to resolve conflicts
■ Want to have just ONE parse tree!
■ Useful because grammar can be
■ “more natural”
■ Smaller
■ Conflict resolution:
■ Precedence
■ Associativity
■ Shift versus reduce
Precedence Example
E -> E+E | E*E | (E) | id

versus

E -> E+T | T
T -> T*F | F
F -> (E) | id

Parse for id+id*id

Item sets for the ambiguous grammar:
I0 = {E'->.E, E->.E+E, E->.E*E, E->.(E), E->.id}
I1 = {E'->E., E->E.+E, E->E.*E}
I2 = {E->(.E), E->.E+E, E->.E*E, E->.(E), E->.id}
I3 = {E->id.}
I4 = {E->E+.E, E->.E+E, E->.E*E, E->.(E), E->.id}
I5 = {E->E*.E, E->.E+E, E->.E*E, E->.(E), E->.id}
I6 = {E->(E.), E->E.+E, E->E.*E}
I7 = {E->E+E., E->E.+E, E->E.*E}
I8 = {E->E*E., E->E.+E, E->E.*E}
I9 = {E->(E).}
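States I7 and I8 have shift/reduce conflicts on + and *; a parser generator resolves them with declared precedence and associativity. A minimal sketch of that decision rule follows; the precedence table and function name are illustrative.

  PREC  = {'+': 1, '*': 2}          # * binds tighter than +
  ASSOC = {'+': 'left', '*': 'left'}

  def resolve_shift_reduce(rule_op, lookahead):
      # In a state like I7 = {E->E+E., E->E.+E, E->E.*E} with rule_op '+',
      # decide between reducing E -> E+E and shifting the lookahead operator.
      if PREC[lookahead] > PREC[rule_op]:
          return 'shift'                         # id + id . * id  -> shift *
      if PREC[lookahead] < PREC[rule_op]:
          return 'reduce'
      return 'reduce' if ASSOC[rule_op] == 'left' else 'shift'

  # For id+id*id: in I7 with lookahead '*', shift; in I8 with lookahead '+', reduce.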
Parser Generators
■ Yacc, Bison, LLGen, LRGen, etc.
■ Specify a grammar
■ Produce a table-driven parser
■ Typically support precedence & associativity rules and allow shift-reduce conflicts to be resolved with them
■ The project uses CUP, a parser generator for Java
Syntax-directed Translation &
Type checking
Background Reading
■ Chapter 5: read 5.1 - 5.4
■ The rest of the chapter is optional -- it won't be tested on the exam
■ Chapter 6: complete
Syntax-directed Definition
■ Add attribute(s) to grammar symbols
■ E.g., type, value, memory location
■ Semantic rules
■ Compute attributes from rhs of production (synthesized
attribute)
■ Pass attribute from lhs to rhs (inherited attribute)
■ Intrinsic attributes (assigned to leaves)
■ Sometimes also called synthesized (but from “external info”)
■ Decorating/annotating the parse tree = computing
attributes for all parse tree nodes
Syntax-directed definitions
and Grammars
■ Attribute Grammar = syntax-directed
definition w/o side-effects
■ S-attributed definition
■ A syntax-directed definition where all
attributes are synthesized
Example
Production      Semantic Rules
L -> E n        print(E.val)
E -> E1 + T     E.val = E1.val + T.val
E -> T          E.val = T.val
T -> T1 * F     T.val = T1.val * F.val
T -> F          T.val = F.val
F -> (E)        F.val = E.val
F -> digit      F.val = digit.lexval
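This definition is S-attributed, so a bottom-up parser can evaluate it on the fly: synthesized attributes live on a value stack, and each reduction pops the right-hand-side values and pushes the left-hand-side value. A sketch of the reduce hook, assuming one value per right-hand-side symbol (a real generator such as CUP wires this up for you):

  def reduce_action(production, values):
      # values holds one synthesized attribute per right-hand-side symbol
      if production == "L -> E n":
          print(values[0]); return None          # print(E.val)
      if production == "E -> E1 + T":
          return values[0] + values[2]           # E.val = E1.val + T.val
      if production in ("E -> T", "T -> F"):
          return values[0]                       # copy rule
      if production == "T -> T1 * F":
          return values[0] * values[2]           # T.val = T1.val * F.val
      if production == "F -> (E)":
          return values[1]                       # F.val = E.val
      if production == "F -> digit":
          return values[0]                       # F.val = digit.lexval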
Inherited Attributes
■ Value comes from a parent or sibling in the parse tree
■ Used to express dependencies of PL constructs
■ E.g., types, lvalue vs. rvalue
Example: Inherited Attributes
Production      Semantic Rules
D -> T L        L.in := T.type
T -> int        T.type := integer
T -> real       T.type := real
L -> L1, id     L1.in := L.in; addtype(id.entry, L.in)
L -> id         addtype(id.entry, L.in)
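A sketch of evaluating the inherited attribute L.in during a left-to-right walk of a parse tree for D -> T L. The tree shapes (a string for T, a list of identifiers for L) and the addtype helper are assumptions made purely for illustration.

  symbol_table = {}

  def addtype(entry, t):
      symbol_table[entry] = t

  def eval_T(T_node):
      return "integer" if T_node == "int" else "real"   # synthesized: T.type

  def eval_L(L_node, inherited):
      # L -> L1 , id : pass L.in down to L1, then record the type of id
      # L -> id      : record the type of id
      *rest, last_id = L_node
      if rest:
          eval_L(rest, inherited)                        # L1.in := L.in
      addtype(last_id, inherited)

  def eval_D(T_node, L_node):
      eval_L(L_node, eval_T(T_node))                     # L.in := T.type

  # eval_D("real", ["x", "y", "z"])  # declares x, y, z with type real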
Dependency Graphs
■ Semantic rules produce dependencies, e.g., b := f(c1, …, cn) can compute b only after computing all ci
■ Graphical representation
■ Directed graph with an edge from each ci to b
■ Evaluation order
■ Topological sort
■ Rule-based (fixed at compile time)
■ Oblivious (does not work for all syntax-directed definitions)
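A sketch of attribute evaluation by topological sort of the dependency graph: an edge c -> b means b := f(..., c, ...), so c must be evaluated before b. The graph representation (a dict mapping each attribute to the attributes that depend on it) is an assumption for illustration.

  from collections import deque

  def topo_order(nodes, edges):
      # returns an evaluation order, or raises if the graph has a cycle (circular definition)
      indegree = {n: 0 for n in nodes}
      for c in edges:
          for b in edges[c]:
              indegree[b] += 1
      ready = deque(n for n in nodes if indegree[n] == 0)
      order = []
      while ready:
          n = ready.popleft()
          order.append(n)
          for b in edges.get(n, ()):
              indegree[b] -= 1
              if indegree[b] == 0:
                  ready.append(b)
      if len(order) != len(nodes):
          raise ValueError("circular attribute dependency")
      return order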
L-attributed Definitions
■ A -> X1 X2 … Xn
■ Each inherited attribute of Xj depends only on the inherited attributes of A and on the attributes of X1 … Xj-1
■ Can be evaluated by a single left-to-right (recursive) traversal
Semantic Analysis
■ Type checks
■ Statically or dynamically
■ Control-flow checks
■ E.g., no goto into the middle of a (different) procedure
■ Uniqueness checks
■ E.g., detect redefinition of variables
■ Name (matching) checks
■ E.g., procedure name matching (Modula-2)
Type checking
■ Check
■ That operands have the expected types
■ That expressions are well-typed
■ What is a type?
■ "A collection of computational entities that share some common property" (Mitchell)
■ A type determines the range of values and the set of operations that are defined for values of that type
Type Systems
■ Rules for assigning types to
programming language constructs
■ “A type system is a syntactic method for
enforcing levels of abstraction in
programs.” (Benjamin C. Pierce)
Why Types?
■ Program organization and documentation
■ Consistent interpretation of the bits in memory (avoid unintended semantics)
■ Optimization
■ The compiler can exploit type information for better performance, for example:
  type r = record
    name: array[1..256] of char;
    year: integer;
  end;
■ Can generate efficient code to access record fields:
  address(s.year) = base_address(s) + 256 * sizeof(char)
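A small sketch of how a compiler can turn declared field types into fixed byte offsets, matching the address(s.year) calculation above; the size table and function name are assumptions chosen for illustration.

  SIZEOF = {"char": 1, "integer": 4}

  def field_offsets(fields):
      # fields: list of (name, element type, element count); returns name -> byte offset
      offsets, offset = {}, 0
      for name, ty, count in fields:
          offsets[name] = offset
          offset += SIZEOF[ty] * count
      return offsets

  # field_offsets([("name", "char", 256), ("year", "integer", 1)])
  # -> {"name": 0, "year": 256}, so address(s.year) = base_address(s) + 256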
Basic & Composite Types
■ Basic types
■ Depend on the PL; usually integers, reals, characters, booleans, etc.
■ Type constructors
■ Used to build new types from existing ones
■ Common constructors:
■ Array types
■ Product types (unnamed tuples)
■ Pointers
■ Record types (named tuples)
■ E.g.
  type row = record
    address: integer;
    lexeme: array[1..15] of char
  end;
  record((address X integer) X (lexeme X array(1..15, char)))
Type Expression Definition
■ A type expression is one of
■ Basic type
■ Type name
■ Type constructor
■ Type variable
Type System revisited
■ Set of rules to assign types
■ Implemented by a type checker
■ Type checking can be done
■ Statically (= at compile time)
■ Modula-2, C, Java -> statically typed languages
■ In practice some checks must be done at run time as well, e.g., array bounds checking: for a[i], is i within bounds?
■ Dynamically (while the program is running)
■ Lisp -> dynamically typed language
Strong vs. Weak
■ Strongly typed language
■ A program that passes the type checker
will not have a run-time (type) error
■ Example: Java
■ Weakly typed language
■ Can have type errors
■ C, C++
Declared vs. Inferred Types
■ Most programming languages require the programmer to declare types
■ E.g., int x,y,z; int foo(int x) …
■ Some languages infer all types based on inference rules
■ E.g., ML (statically typed, but the programmer does not have to declare types)
Example: TC Expressions
E -> literal  {E.type := char}
E -> num      {E.type := integer}
E -> id       {E.type := lookup(id.entry)}
E -> E1 [E2]  {E.type := if E2.type = integer and E1.type = array(s,t) then t else type_error}
E -> E1^      {E.type := if E1.type = pointer(t) then t else type_error}
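A sketch of a type checker implementing these rules over a small expression AST; the AST shapes, the tuple encoding of constructed types, and the symbol-table argument are assumptions made for illustration.

  def array(index_type, elem_type): return ("array", index_type, elem_type)
  def pointer(target_type):         return ("pointer", target_type)
  TYPE_ERROR = "type_error"

  def type_of(e, symtab):
      kind = e[0]
      if kind == "literal": return "char"
      if kind == "num":     return "integer"
      if kind == "id":      return symtab[e[1]]                 # lookup(id.entry)
      if kind == "index":                                       # E1[E2]
          t1, t2 = type_of(e[1], symtab), type_of(e[2], symtab)
          if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
              return t1[2]
          return TYPE_ERROR
      if kind == "deref":                                       # E1^
          t1 = type_of(e[1], symtab)
          if isinstance(t1, tuple) and t1[0] == "pointer":
              return t1[1]
          return TYPE_ERROR
      return TYPE_ERROR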
Type Compatibility
■ What types can legally be used in a particular context?
■ E.g., in 2.5 + x, what type may x have?
■ Type conversion ("cast")
■ 2.5 + ((float) x)
■ Widening (int to float) versus narrowing (float to int)
■ Type coercion
■ Implicit conversion dictated by the language rules
Structural Equivalence
■ Two types are structurally equivalent if they are of the same basic type, or are built by the same type constructor applied to structurally equivalent types
■ array[1..10] of integer and array[2..11] of integer are equivalent, but array[1..10] of integer and array[1..10] of char are not
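A sketch of a structural equivalence check over tuple-encoded type expressions (basic types as strings, constructed types as tuples whose first element is the constructor). Array index ranges are represented here by their length, so array[1..10] of integer and array[2..11] of integer compare equal, while integer and char element types do not; this encoding is an illustrative assumption.

  def struct_equiv(s, t):
      # base case: basic types (strings) and sizes compare by value
      if not isinstance(s, tuple) or not isinstance(t, tuple):
          return s == t
      if s[0] != t[0] or len(s) != len(t):
          return False                                    # different constructor
      return all(struct_equiv(a, b) for a, b in zip(s[1:], t[1:]))

  # struct_equiv(("array", 10, "integer"), ("array", 10, "integer"))  # True
  # struct_equiv(("array", 10, "integer"), ("array", 10, "char"))     # False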
Name Equivalence
■ Named types
■ Types are equivalent only when they have the same name
■ Unnamed types get fresh names associated with their declaration
type A = array [1..10] of integer;
type B = array [1..10] of integer;
var x,y: A;
var u,v: B;
var w: array [1..10] of integer;
Only x ≡ y and u ≡ v
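A sketch of name equivalence: every type declaration receives a fresh internal name, and two variables have equivalent types only if they share that name. The counter-based naming scheme is an assumption for illustration.

  import itertools

  _fresh = itertools.count()

  def declare_type(structure):
      # under name equivalence the structure is irrelevant for comparisons:
      # every declaration just receives a fresh internal type name
      return f"t{next(_fresh)}"

  def name_equiv(t1, t2):
      return t1 == t2

  # A = declare_type("array [1..10] of integer")   # e.g. "t0"
  # B = declare_type("array [1..10] of integer")   # e.g. "t1"
  # x = y = A; u = v = B
  # w = declare_type("array [1..10] of integer")   # anonymous type gets its own name
  # name_equiv(x, y) and name_equiv(u, v) are True; name_equiv(x, u), name_equiv(x, w) are False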
Comparison
■ Structural equivalence
■ More flexible
■ More expensive to check
■ Name equivalence
■ Efficiently checkable
■ More restrictive
Polymorphism
■ Parametric polymorphism
■ A function can be applied to any arguments whose types match a type expression with type variables
■ Ad hoc polymorphism
■ (aka overloading): two or more implementations with different types are referred to by the same name (e.g., + can denote both integer and floating-point addition)
■ Subtype polymorphism
■ A subtype relation between types allows an expression to have many possible types
■ Common in OO languages, e.g., COOL
Parametric Polymorphism
■ Characteristic: type variables
■ Often found in functional languages, e.g., ML:
  sort : ('a * 'a -> bool) * 'a list -> 'a list
■ A function may have infinitely many types
■ Implicit or explicit
■ ML is implicit: the programmer does not have to declare types
■ Type inference computes the necessary types
■ Explicit: the program text contains type variables, e.g., C++ templates
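A sketch of explicit parametric polymorphism in Python's type-hint notation, mirroring the ML signature above: one sort that works for any element type 'a as long as a comparison function on that type is supplied. The function body is a plain insertion sort chosen for brevity.

  from typing import Callable, List, TypeVar

  A = TypeVar("A")   # plays the role of ML's 'a

  def sort(less: Callable[[A, A], bool], xs: List[A]) -> List[A]:
      # insertion sort parameterized by the element type A
      result: List[A] = []
      for x in xs:
          i = 0
          while i < len(result) and less(result[i], x):
              i += 1
          result.insert(i, x)
      return result

  # sort(lambda a, b: a < b, [3, 1, 2])              # ints
  # sort(lambda a, b: len(a) < len(b), ["bb", "a"])  # strings, ordered by length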
C++ Templates
template <typename T>
void swap(T& x, T& y) {
  T tmp = x; x = y; y = tmp;
}
Implementing Parametric
Polymorphism
■ C++ (explicit)
■ Templates are instantiated at link time (why?)
■ Different copies of a polymorphic function (e.g., swap) are created
■ Why?
■ This is one of the major reasons for the complexity of C++ linkers
Implementing Parametric
Polymorphism (2)
■ ML (implicit)
■ No need for multiple copies of a function
■ Uniform data representation for all types: "boxed"
■ Advantage:
■ No code duplication
■ Disadvantage:
■ Performance degradation; can be mitigated by the "unboxing" optimization