0% found this document useful (0 votes)
70 views14 pages

Compiler Lecture6

The document discusses bottom-up parsing techniques, specifically LR(1) and LALR parsing, highlighting their item construction and advantages. It also covers semantic analysis, type checking, and the differences between strong and weak typing, as well as polymorphism in programming languages. Additionally, it introduces concepts of type compatibility, structural and name equivalence, and provides examples of syntax-directed definitions and semantic rules.

Uploaded by

ajayharwani
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
70 views14 pages

Compiler Lecture6

The document discusses bottom-up parsing techniques, specifically LR(1) and LALR parsing, highlighting their item construction and advantages. It also covers semantic analysis, type checking, and the differences between strong and weak typing, as well as polymorphism in programming languages. Additionally, it introduces concepts of type compatibility, structural and name equivalence, and provides examples of syntax-directed definitions and semantic rules.

Uploaded by

ajayharwani
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Bottom-Up Parsing

(finish)

CS2210
Lecture 7

CS2210 Compiler Design 2004/5

LR(1) Items
■ New item form: [A -> α. β, a] a
terminal (including $)
■ a represents (1 token) lookahead
■ Used only if β=ε, i.e., we use lookahead to
decide whether to reduce (or shift/error)
■ a ∈ Follow(A)

CS2210 Compiler Design 2004/5

LR(1) Items Construction


function closure(I); function goto(I,X);
begin begin
repeat let J be the set
for each item [A->α.Bβ, a] in I, of items
each production B->γ in G’, and [A->a.Xb,a] such
each terminal b in First(βa) that [A->a.Xb,a]
add [B->.γ, b] to I (if not present)
is in I;
until no more items can be added to I
end;
end;
procedure items(G’);
begin
C := {closure({S’->.S,$]})};
repeat
for each set of items I in C and
each grammar symbol X such that
goto(I,X) is not empty do
add goto(I,X) to C (if not there)
until no more items can be added to C
end

CS2210 Compiler Design 2004/5

1
Example
■ Consider same grammar again:
S -> L = R
S -> R
L -> *R
L -> id
R -> L

CS2210 Compiler Design 2004/5

LALR Parsing
■ Advantages
■ Smaller parsing tables than canonical LR
■ 100s versus 1,000s for typical language
■ Most PL constructs can be parsed with LALR
■ Algorithm of choice in practice
■ Construction idea
■ Merge different items with same core, e.g.
L->id., $ and L->id.,+)
■ May reduce where an error should be reported
■ But reduction will lead to a parse error later
■ Merging does not produce new shift-reduce conflicts
■ Possible reduce-reduce conflicts; merge only if no conflicts
CS2210 Compiler Design 2004/5

LALR Table Construction


■ Simple but space / time consuming
■ Build LR tables
■ Merge item sets with identical cores
■ If conflict results: grammar not LALR

CS2210 Compiler Design 2004/5

2
Efficient LALR Construction
■ Ideas:
■ Represent only kernels for items set I, i.e., either
S’->.S or item with “.” not leftmost
■ Can generate all items in I from kernel
■ Can generate parsing actions from kernel
■ Details in the book, in case you are interested
■ won’t be on exam / prelim

CS2210 Compiler Design 2004/5

Parsing with more Lookahead


■ LR(k)
■ Closure: A->α.Bβ, a predict B->.γ, y where
y∈Firstk(βx)
■ Reduce: A-> α.,x reduce on lookahead x
■ Shift on lookahead x for A-> α.aβ, y, where a is a
terminal and x ∈ Firstk(aβy)
■ SLR(k)
■ Reduce: A-> α. if lookahead x ∈ Followk(A)
■ Shift on lookahead x for A-> α.aβ, a terminal and
x ∈ Firstk(aβFollow k(A))
■ LALR(k): merge LR(k) machine states with
same core CS2210 Compiler Design 2004/5

LR(k) in Practice
■ LR(k), SLR(k) are not used in practice
for k>1
■ Tables too large
■ Not necessary in practice since most
grammars can be made LR(1) and even
LALR(1)
■ Some parser generators for LALR(2)
■ Useful if too lazy to rewrite the grammar

CS2210 Compiler Design 2004/5

3
Language Class Hierarchy

CS2210 Compiler Design 2004/5

Parsing with Ambiguous


Grammars
■ Use non-grammatical means to resolve
conflict
■ Want to have just ONE parse tree!
■ Useful because grammar can be
■ “more natural”
■ Smaller
■ Conflict resolution:
■ Precedence
■ Associativity
■ Shift versus reduce
CS2210 Compiler Design 2004/5

Precedence Example
E->E+E | E*E | (E) | I0={E’->.E, E->.E+E, E->.E*E, E->.(E),
E->.id}
id I1={E’->E., E->E.+E, E->E.*E}
I2={E->(.E), E->.E+E, E->.E*E, E->.(E),
versus E->.id}
I3={E->id.}
E-> E+T | T I4={E->E+.E, E->.E+E, E->.E*E, E->.(E),
E->id}
T -> T*F | F I5={E->E*.E, E->.E+E, E->.E*E, E->.(E),
E->.id}

F -> (E) | id I6={E->(E.), E->E.+E, E->E.*E}


I7={E->E+E., E->E.+E, E->E.*E}

Parse for id+id*id I8={E->E*E., E->E.+E, E->E.*E}


I9={E->(E).}

CS2210 Compiler Design 2004/5

4
Parser Generators
■ Yacc, bison, LLGen, LRGen etc
■ Specify grammar
■ Produce a table-drive parser
■ Typically can use precedence & associativity rules
and allow shift-reduce conflicts
■ Project uses Cup a parser generator for Java

CS2210 Compiler Design 2004/5

Syntax-directed Translation &


Type checking

CS2210 Compiler Design 2004/5

Background Reading
■ Chapter 5: read 5.1 - 5.4
■ Rest of chapter is optional -- won’t be
tested in exam
■ Chapter 6: complete

CS2210 Compiler Design 2004/5

5
Syntax-directed Definition
■ Add attribute(s) to grammar symbols
■ E.g., type, value, memory location
■ Semantic rules
■ Compute attributes from rhs of production (synthesized
attribute)
■ Pass attribute from lhs to rhs (inherited attribute)
■ Intrinsic attributes (assigned to leaves)
■ Sometimes also called synthesized (but from “external info”)
■ Decorating/annotating the parse tree = computing
attributes for all parse tree nodes

CS2210 Compiler Design 2004/5

Syntax-directed definitions
and Grammars
■ Attribute Grammar = syntax-directed
definition w/o side-effects
■ S-attributed definition
■ A syntax-directed definition where all
attributes are synthesized

CS2210 Compiler Design 2004/5

Example
Production Semantic Rules
L -> E n print(E.val)
L->E1 + T E.val = E1.val+T.val
E->T E.val = T.val
T->T1*F T.val = T1.val * F.val
T->F T.val = F.val
F->(E) F.val = E.val
F->digit F.val = digit.lexval

CS2210 Compiler Design 2004/5

6
Inherited Attributes
■ Value from a parent or sibling in parse
tree
■ Used to express dependences of PL
constructs
■ E.g. types, lvalue vs. rvalue

CS2210 Compiler Design 2004/5

Example: Inherited Attributes


Production Semantic Rules
D -> T L L.in := T.type
T -> int T.type := integer
T-> real T.type := real
L -> L1, id L1.in := L.in
addtype(id.entry, L.in)
L-> id addtype(id.entry, L.in)

CS2210 Compiler Design 2004/5

Dependency Graphs
■ Semantic rules produce dependencies, e. g., b :=f(c 1,
…, c n) can compute b only after computing all c i
■ Graphical representation
■ Directed graph with an edge from c i to b
■ Evaluation order
■ Topological sort
■ Rule-based (fixed at compile time)
■ Oblivious (does not work for all syntax-directed definitions)

CS2210 Compiler Design 2004/5

7
L-attributed Definitions
■ A -> X1 X2 … Xn
■ Attribute of Xj depends only on inherited
attributes of A and on X1… Xj-1
■ Can be evaluated by a left-to-right
(recursive) traversal

CS2210 Compiler Design 2004/5

Semantic Analysis
■ Type checks
■ Statically or dynamically
■ Control-flow checks
■ E.g., no goto into the middle of a (different
procedure)
■ Uniqueness checks
■ E.g., redefinition of variables
■ Name (matching) checks
■ Procedure name (e.g., Modula 2)

CS2210 Compiler Design 2004/5

Type checking
■ Check
■ That operands have expected types
■ Expressions are well-typed
■ What is a type?
■ “A Collection of computational entities that share
some common property” (Mitchell)
■ Determines the range of values and set of
operations that are defined for values of the type

CS2210 Compiler Design 2004/5

8
Type Systems
■ Rules for assigning types to
programming language constructs
■ “A type system is a syntactic method for
enforcing levels of abstraction in
programs.” (Benjamin C. Pierce)

CS2210 Compiler Design 2004/5

Why Types?
■ Program organization and documentation
■ Consistent interpretation of bits in memory (avoid
unintended semantics)
■ Optimization
■ Compiler can exploit type information for better performance,
example:
■ type r = record name: array[1..256] of char;
year: integer;
end;
■ Can generate efficient code to access record fields:
address(s.year) = base_address(s) + 256 * sizeof(char)

CS2210 Compiler Design 2004/5

Basic & Composite Types


■ Basic types ■ Record types (named
■ Depend on PL, usually tuples)
integers, reals, characters, ■ E.g.
booleans etc. type row = record
■ Type constructors address: integer:
■ Used to build new types lexeme: array[1..15]
from existing ones of char
■ Common constructors: end;
■ Array types record((address X integer)
■ Product types (unnamed X (lexeme X array(1..15,
tuples) char)))
■ Pointers

CS2210 Compiler Design 2004/5

9
Type Expression Definition
■ A type expression is one of
■ Basic type
■ Type name
■ Type constructor
■ Type variable

CS2210 Compiler Design 2004/5

Type System revisited


■ Set of rules to assign types
■ Implemented by a type checker
■ Type checking can be done
■ Statically (= at compile time)
■ Modula 2, C, Java -> statically typed language
■ In practice have to do some checks at run-time as well,
e.g. array bounds checking a[i] is i within bounds?
■ Dynamically (when the program is running)
■ Lisp -> dynamically typed language

CS2210 Compiler Design 2004/5

Strong vs. Weak


■ Strongly typed language
■ A program that passes the type checker
will not have a run-time (type) error
■ Example: Java
■ Weakly typed language
■ Can have type errors
■ C, C++

CS2210 Compiler Design 2004/5

10
Declared vs. Inferred Types
■ Most programming languages require
the programmer to declare types
■ E.g., int x,y,z; int foo(int x)…
■ Some language infer all types based on
inference rules
■ Eg. ML (statically typed but don’t have to
declare types)

CS2210 Compiler Design 2004/5

Example: TC Expressions
E -> literal {E.type := char}
E -> num {E.type := integer}
E -> id {E.type := lookup(id.entry)}
E -> E1 [E2] {E.type := if E2.type =
integer and E1.type = array(s,t) then t
else type_error}
E -> E1^ {if E1.type = pointer(t) then t
else type_error}
CS2210 Compiler Design 2004/5

Type Compatibility
■ What are the types that can legally be used in
a particular context?
■ E.g. 2.5 + x, what type should x be?
■ Type conversion (“cast”)
■ 2.5 + ((float) x)
■ Widening (int to float) versus narrowing (float to
int)
■ Type coercion
■ Implicit conversion dictated by language rules

CS2210 Compiler Design 2004/5

11
Structural Equivalence
■ Two types are compatible if either of
the same basic type or built by same
constructor applied to structurally
equivalent types
■ array[1..10] of integer and array[2..11] of
integer equivalent but not
array[1..10] of integer and array [1..10] of
char

CS2210 Compiler Design 2004/5

Name Equivalence
■ Named types
■ Types equivalent only when they have the same name
■ Unnamed types get fresh names associated with their
declaration
type A = array [1..10] of integer;
type B = array [1..10] of integer;
var x,y: A;
var u,v: B;
var w: array of [1..10] of integer;
Only x ≡ y and u ≡ v

CS2210 Compiler Design 2004/5

Comparison
■ Structural equivalence
■ More flexible
■ More expensive to check
■ Name equivalence
■ Efficiently checkable
■ More restrictive

CS2210 Compiler Design 2004/5

12
Polymorphism
■ Parametric polymorphism
■ Function can be applied to any arguments whose type
match a type expression with type variables
■ Ad hoc polymorphism
■ (aka overloading) two or more implementations with
different types referred to by same name (e.g., + can be
integer and floating point addition)
■ Subtype polymorphism
■ Subtype relation between types allows an expression to
have many possible types
■ Common in oo-languages, e.g. COOL

CS2210 Compiler Design 2004/5

Parametric Polymorphism
■ Characteristic: type variables
■ Often found in functional languages, e.g. ML:
sort : (‘a * ‘a -> bool) * ‘a list -> ‘a list
■ Function may have infinitely many types
■ Implicit or explicit
■ ML implicit: programmer does not have to declare
types
■ Type inference computes necessary types
■ Explicit: program text contains type variables, e.g.,
C++ templates

CS2210 Compiler Design 2004/5

C++ Templates
Template <T>
void swap (T& x, T& y) {
T tmp = x; x = y; y = tmp;
}

CS2210 Compiler Design 2004/5

13
Implementing Parametric
Polymorphism
■ C++ (explicit)
■ Templates are instantiated at link time (why?)
■ Different copies of polymorphic function (e.g.,
swap) are created
■ Why?
■ One of the major reasons for the complexity
of C++ linkers

CS2210 Compiler Design 2004/5

Implementing Parametric
Polymorphism (2)
■ ML (implicit)
■ No need for multiple function copies
■ Uniform data representation for all types: “boxed”
■ Advantage:
■ No code duplication
■ Disadvantage
■ Performance degradation - can be mitigated by “unboxing
optimization”

CS2210 Compiler Design 2004/5

14

You might also like