bddbddb:
Using Datalog and BDDs for
Program Analysis
John Whaley
Stanford University and moka5 Inc.
June 11, 2006
Implementing Program
Analysis
[Figure: a hand-coded analysis implementation (56 pages of code) vs. the bddbddb version]
• 2x faster
• Fewer bugs
• Extensible
June 11, 2006 · Using Datalog and BDDs for Program Analysis
Is it really that easy?
• Requires:
– A different way of thinking
– Knowledge, experience, and intuition
– Perseverance to try different techniques
– A lot of tuning and tweaking
– Luck
• Despite all this, people who use it
swear by it and could “never go back”
Tutorial Structure
Part I: Essential Background
– …
Part II: Using the Tools
– …
Part III: Developing Advanced Analyses
– …
Part IV: Profiling, Debugging, Avoiding Gotchas
– …
Short break every 30 minutes
Tutorial Structure
Part I: Essential Background
– Datalog for Program Analysis
– Binary Decision Diagrams
Part II: Using the Tools
– …
Part III: Developing Advanced Analyses
– …
Part IV: Profiling, Debugging, Avoiding Gotchas
– …
Tutorial Structure
Part I: Essential Background
– …
Part II: Using the Tools
– bddbddb
– Compiler interface (Joeq compiler)
– Datalog editor in Eclipse
– Interactive mode
Part III: Developing Advanced Analyses
– …
Part IV: Profiling, Debugging, Avoiding Gotchas
– …
Tutorial Structure
Part I: Essential Background
– …
Part II: Using the Tools
– …
Part III: Developing Advanced Analyses
– Context sensitivity
– Combining multiple analyses
– Race detection examples
– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas
– …
Tutorial Structure
Part I: Essential Background
– …
Part II: Using the Tools
– …
Part III: Developing Advanced Analyses
– …
Part IV: Profiling, Debugging, Avoiding Gotchas
– Variable ordering
– Iteration order
– Machine learning
– What it’s good for, what it isn’t good for
Try it yourself…
• Available as moka5 LivePC
– Non-intrusive installation in a VM
– Automatically kept up to date
– Easy to try, easy to share
– Complete environment on a USB stick
Part I: Essential Background
Program Analysis in Datalog
Datalog
• Declarative language for deductive
databases [Ullman 1989]
– Like Prolog, but no function symbols,
no predefined evaluation strategy
Datalog Basics
• Atom = Reach(d,x,i)
  – Predicate: Reach
  – Arguments: variables or constants
• Literal = Atom or NOT Atom
• Rule = Atom :- Literal & … & Literal
  – The head (the atom before :-) is made true for each
    assignment of values to variables that makes all the
    literals in the body true.
Datalog Example
parent(x,y) :- child(y,x).
grandparent(x,z) :- parent(x,y), parent(y,z).
ancestor(x,y) :- parent(x,y).
ancestor(x,z) :- parent(x,y), ancestor(y,z).
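As a concrete illustration (not part of the original deck), here is a minimal Python sketch of bottom-up evaluation of these four rules, using a small hypothetical child relation; the names and data are invented for the example.

```python
# Naive bottom-up evaluation of the parent/grandparent/ancestor rules.
# child(c, p) means c is a child of p, matching parent(x,y) :- child(y,x).
child = {("bob", "alice"), ("carol", "bob")}

parent = {(p, c) for (c, p) in child}                    # parent(x,y) :- child(y,x).
grandparent = {(x, z) for (x, y) in parent
                      for (y2, z) in parent if y == y2}  # join the two parent atoms on y

# ancestor: apply its two rules repeatedly until a fixed point is reached.
ancestor = set(parent)                                   # ancestor(x,y) :- parent(x,y).
while True:
    new = {(x, z) for (x, y) in parent                   # ancestor(x,z) :- parent(x,y),
                  for (y2, z) in ancestor if y == y2}    #                  ancestor(y,z).
    if new <= ancestor:                                  # nothing new: fixed point
        break
    ancestor |= new

print(sorted(ancestor))
```

The loop is the "repeat until no new facts" evaluation described later in the deck; with real engines the join is done a relation at a time rather than tuple by tuple.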
Datalog
• Intuition: subgoals in the body are
combined by “and” (strictly
speaking: “join”).
• Intuition: Multiple rules for a
predicate (head) are combined by
“or.”
Another Datalog Example
hasChild(x) :- child(_,x).
hasNoChild(x) :- !child(_,x).
“!” inverts the relation, not the atom!
hasSibling(x) :- child(x,y), child(z,y), z!=x.
onlyChild(x) :- child(x,_), !hasSibling(x).
_ means “don’t care” (at least one)
! means “Not”
Reaching Defs in Datalog
Reach(d,x,j) :- Reach(d,x,i),
StatementAt(i,s),
!Assign(s,x),
Follows(i,j).
Reach(s,x,j) :- StatementAt(i,s),
Assign(s,x),
Follows(i,j).
Definition: EDB vs. IDB
Predicates
• Some predicates come from the
program, and their tuples are
computed by inspection.
– Called EDB, or extensional database
predicates.
• Others are defined by the rules only.
– Called IDB, or intensional database
predicates.
Negation
• Negation makes things tricky.
• Semantics of negation
– No negation allowed [Ullman 1988]
– Stratified Datalog [Chandra 1985]
– Well-founded semantics [Van Gelder
1991]
Stratification
• A risk occurs if there are negated literals
involved in a recursive predicate.
– Leads to oscillation in the result.
• Requirement for stratification:
– Must be able to order the IDB predicates so
that if a rule with P in the head has NOT Q in
the body, then Q is either EDB or earlier in
the order than P.
Example: Nonstratification
P(x) :- E(x), !P(x).
• If E(1) is true, is P(1) true?
• It is after the first round.
• But not after the second.
• True after the third, not after the
fourth,…
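The oscillation is easy to see by simulating the rounds directly; a small Python sketch (added for illustration):

```python
# Round-by-round evaluation of the non-stratified rule P(x) :- E(x), !P(x).
# Because P is negated inside its own recursion, the result never stabilizes.
E = {1}

def one_round(p_prev):
    # Re-derive P from scratch, using the previous round's P for the negation.
    return {x for x in E if x not in p_prev}

rounds = []
p = set()
for _ in range(4):
    p = one_round(p)
    rounds.append(p)

print(rounds)   # alternates between {1} and set(): no fixed point exists
```

Stratification rules out exactly this situation: Q may only be negated once it is fully computed in an earlier stratum.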
Iterative Algorithm for
Datalog
• Start with the EDB predicates =
“whatever the code dictates,” and
with all IDB predicates empty.
• Repeatedly examine the bodies of
the rules, and see what new IDB
facts can be discovered from the EDB
and existing IDB facts.
Datalog evaluation strategy
• “Semi-naïve” evaluation
– Remember that a new fact can be
inferred by a rule in a given round only if
it uses in the body some fact discovered
on the previous round.
• Evaluation strategy
– Top-down (goal-directed) [Ullman 1985]
– Bottom-up (infer from base facts)
[Ullman 1989]
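A sketch of semi-naïve evaluation (added for illustration, with an invented parent chain): each round joins only against the delta, the facts discovered in the previous round, rather than the whole relation.

```python
# Semi-naive evaluation of ancestor(x,z) :- parent(x,y), ancestor(y,z).
parent = {("a", "b"), ("b", "c"), ("c", "d")}

ancestor = set(parent)        # ancestor(x,y) :- parent(x,y).
delta = set(parent)           # facts that are new as of the last round
while delta:
    new = set()
    for (x, y) in parent:
        for (y2, z) in delta:     # join only against last round's new facts
            if y == y2:
                new.add((x, z))
    delta = new - ancestor        # keep only genuinely new facts
    ancestor |= delta

print(sorted(ancestor))
```

The delta is what the BDD translation later incrementalizes: `vP'' = vP - vP'` plays exactly the role of `delta` here.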
Our Dialect of Datalog
• Totally-ordered finite domains
– Domains are of a given, finite size
– Makes all Datalog programs “safe”
– Cannot mix variables of different domains
• Constants (named/integers)
• Comparison operators:
= != < <= > >=
• Don’t-care: _    Universe: *
Why Datalog?
• Developed a tool to translate inference
rules to BDD implementation
• Later, discovered Datalog (Ullman, Reps)
• Semantics of BDDs match Datalog exactly
– Obvious implementation of relations
– Operations occur a set-at-a-time
– Fast set compare, set difference
– Wealth of literature about semantics,
optimization, etc.
Inference Rules

vPointsTo(v1, o) :- Assign(v1, v2),
                    vPointsTo(v2, o).

• Datalog rules directly correspond to
inference rules.
Flow-Insensitive
Pointer Analysis

Input tuples:
  o1: p = new Object();   vPointsTo(p, o1)
  o2: q = new Object();   vPointsTo(q, o2)
  p.f = q;                Store(p, f, q)
  r = p.f;                Load(p, f, r)

Output relations:
  hPointsTo(o1, f, o2)
  vPointsTo(r, o2)

[Diagram: p → o1, q → o2; o1.f → o2; r → o2]
Inference Rule in Datalog
Assignments:

vPointsTo(v1, o) :- Assign(v1, v2),
                    vPointsTo(v2, o).

  v1 = v2;
[Diagram: if v2 → o, then v1 → o]
Inference Rule in Datalog
Stores:

hPointsTo(o1, f, o2) :- Store(v1, f, v2),
                        vPointsTo(v1, o1),
                        vPointsTo(v2, o2).

  v1.f = v2;
[Diagram: if v1 → o1 and v2 → o2, then o1.f → o2]
Inference Rule in Datalog
Loads:

vPointsTo(v2, o2) :- Load(v1, f, v2),
                     vPointsTo(v1, o1),
                     hPointsTo(o1, f, o2).

  v2 = v1.f;
[Diagram: if v1 → o1 and o1.f → o2, then v2 → o2]
The Whole Algorithm

vPointsTo(v, o)      :- vPointsTo0(v, o).

vPointsTo(v1, o)     :- Assign(v1, v2),
                        vPointsTo(v2, o).

hPointsTo(o1, f, o2) :- Store(v1, f, v2),
                        vPointsTo(v1, o1),
                        vPointsTo(v2, o2).

vPointsTo(v2, o2)    :- Load(v1, f, v2),
                        vPointsTo(v1, o1),
                        hPointsTo(o1, f, o2).
Format of a Datalog file
• Domains
    Name Size (map file)
    V 65536 var.map
    H 32768
• Relations
    Name (<attribute list>) flags
    Store (v1 : V, f : F, v2 : V) input
    PointsTo (v : V, h : H) input, output
• Rules
    Head :- Body.
    PointsTo(v1,h) :- Assign(v1,v), PointsTo(v,h).
Key Point
• Program information is stored in a
relational database.
– Everything in the program is numbered.
• Write declarative inference rules to
infer new facts about the program.
• Negations OK if they are not in a
recursive cycle.
Take a break…
(Next up: Binary Decision Diagrams)
Part I: Essential Background
Binary Decision Diagrams
Call graph relation
• Call graph expressed as a relation.
  – Five edges:
    • Calls(A,B)
    • Calls(A,C)
    • Calls(A,D)
    • Calls(B,D)
    • Calls(C,D)
[Diagram: A calls B, C, D; B and C call D]
Call graph relation
• Relation expressed as a
binary function.
  – A=00, B=01, C=10, D=11
    Calls(A,B) → 00 01
    Calls(A,C) → 00 10
    Calls(A,D) → 00 11
    Calls(B,D) → 01 11
    Calls(C,D) → 10 11
Call graph relation
• Relation expressed as a binary function
  (x1x2 = from, x3x4 = to).
  – A=00, B=01, C=10, D=11

  x1 x2 x3 x4 | f
   0  0  0  0 | 0
   0  0  0  1 | 1
   0  0  1  0 | 1
   0  0  1  1 | 1
   0  1  0  0 | 0
   0  1  0  1 | 0
   0  1  1  0 | 0
   0  1  1  1 | 1
   1  0  0  0 | 0
   1  0  0  1 | 0
   1  0  1  0 | 0
   1  0  1  1 | 1
   1  1  0  0 | 0
   1  1  0  1 | 0
   1  1  1  0 | 0
   1  1  1  1 | 0
Binary Decision Diagrams
(Bryant 1986)
• Graphical encoding of a truth table.
[Figure: complete decision tree over x1, x2, x3, x4 with 0-edges and 1-edges; the 16 leaves hold the truth-table values 0111 0001 0001 0000]
Binary Decision Diagrams
• Collapse redundant nodes.
[Figure, four animation steps: the terminal leaves are shared, then identical x4 and x3 subtrees are merged step by step]
Binary Decision Diagrams
• Eliminate unnecessary nodes.
[Figure, two animation steps: nodes whose 0- and 1-successors are identical are removed]
Binary Decision Diagrams
• Size depends on amount of redundancy,
NOT size of relation.
– Identical subtrees share the same
representation.
– As set gets very large, more nodes have
identical zero and one successors, so the size
decreases.
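To make the sharing concrete, here is a small Python sketch (added for illustration) that builds the reduced BDD for the call-graph relation from the truth table above and counts its internal nodes; the helper names are invented for the example.

```python
from itertools import product

def robdd_size(f, order):
    """Count internal nodes of the reduced ordered BDD of f under `order`.
    f maps a dict of variable values (0/1) to 0/1."""
    unique = {}   # (var, lo, hi) -> node id: identical subtrees share one node
    memo = {}     # remaining-truth-table signature -> node id

    def mk(var, lo, hi):
        if lo == hi:                  # both successors identical: drop the node
            return lo
        return unique.setdefault((var, lo, hi), len(unique) + 2)

    def build(i, env):
        rest = order[i:]
        sig = (i, tuple(f({**env, **dict(zip(rest, bits))})
                        for bits in product((0, 1), repeat=len(rest))))
        if sig not in memo:
            if i == len(order):
                memo[sig] = f(env)    # terminal 0 or 1
            else:
                lo = build(i + 1, {**env, order[i]: 0})
                hi = build(i + 1, {**env, order[i]: 1})
                memo[sig] = mk(order[i], lo, hi)
        return memo[sig]

    build(0, {})
    return len(unique)

# Minterms for the five edges Calls(A,B), Calls(A,C), Calls(A,D),
# Calls(B,D), Calls(C,D) from the truth table.
edges = {(0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 1, 1, 1), (1, 0, 1, 1)}
f = lambda e: int((e["x1"], e["x2"], e["x3"], e["x4"]) in edges)

print(robdd_size(f, ["x1", "x2", "x3", "x4"]))  # far fewer than the 15
                                                # internal nodes of the full tree
```

The full decision tree has 15 internal nodes; after merging identical subtrees and dropping nodes with equal successors only 6 remain, which is the effect the animation frames above illustrate.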
BDD Variable Order is
Important!

x1x2 + x3x4
[Figure: under order x1<x2<x3<x4 the BDD is small; under x1<x3<x2<x4, which separates the related variables, it is larger]
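The effect of the two orders on x1x2 + x3x4 can be checked directly; a Python sketch (added for illustration, with an invented node-counting helper):

```python
from itertools import product

def robdd_size(f, order):
    """Internal-node count of the reduced ordered BDD of f under `order`."""
    unique, memo = {}, {}
    def mk(var, lo, hi):
        if lo == hi:
            return lo
        return unique.setdefault((var, lo, hi), len(unique) + 2)
    def build(i, env):
        rest = order[i:]
        sig = (i, tuple(f({**env, **dict(zip(rest, bits))})
                        for bits in product((0, 1), repeat=len(rest))))
        if sig not in memo:
            if i == len(order):
                memo[sig] = f(env)
            else:
                memo[sig] = mk(order[i],
                               build(i + 1, {**env, order[i]: 0}),
                               build(i + 1, {**env, order[i]: 1}))
        return memo[sig]
    build(0, {})
    return len(unique)

# f = x1*x2 + x3*x4, the function from the slide.
f = lambda e: (e["x1"] & e["x2"]) | (e["x3"] & e["x4"])

good = robdd_size(f, ["x1", "x2", "x3", "x4"])  # related variables adjacent
bad  = robdd_size(f, ["x1", "x3", "x2", "x4"])  # related variables separated
print(good, bad)
```

For the generalization x1x2 + x3x4 + … + x(2n-1)x(2n), the good order stays linear in n while the bad order grows exponentially, which is why ordering dominates BDD performance.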
Variable ordering is NP-hard
• No good general heuristic solutions
• Dynamic reordering heuristics don’t
work well on these problems
• We use:
– Trial and error
– Active learning
Apply Operation
• Concept
  – Basic technique for building an OBDD from a Boolean
    formula.
• Arguments: A, B, op
  – A and B: Boolean functions represented as OBDDs
  – op: a Boolean operation (e.g., ^, &, |)
• Result: an OBDD representing the composite function
  A op B
[Figure: two argument OBDDs combined by | into a result OBDD]
Apply Execution Example
[Figure: Apply on argument BDDs A (nodes A1–A6) and B (nodes B1–B5) with operation |; the recursive calls visit pairs such as (A1,B1), (A2,B2), (A6,B2), (A6,B5), (A3,B2), (A5,B2), (A3,B4), (A4,B3), (A5,B4)]
• Optimizations
  – Dynamic programming
  – Early termination rules
Apply Result Generation
[Figure: the recursive calls, the unreduced result BDD, and the reduced result]
– Recursive calling structure implicitly defines the unreduced
  BDD
– Apply reduction rules bottom-up while returning from the
  recursive calls
BDD implementation
• ‘Unique’ table
– Huge hash table
– Each entry: level, left, right, hash, next
• Operation cache
– Memoization cache for operations
• Garbage collection
– Mark and sweep, free list.
Code for BDD ‘and’
[Code figure, with four labeled parts: base case, memo cache lookup, recursive step, memo cache insert]
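The code screenshot does not survive extraction; below is a minimal Python sketch with the same four pieces (base case, memo cache lookup, recursive step, memo cache insert). It is an illustration, not the original code: nodes are plain tuples `(level, low, high)` rather than entries in a unique table.

```python
# Minimal BDD 'and', structured like the slide's lost code figure.
# A node is a terminal 0/1 or a tuple (level, low, high); lower level = nearer root.
cache = {}                                   # operation (memo) cache

def var(level):
    """BDD for a single variable at the given level."""
    return (level, 0, 1)

def bdd_and(a, b):
    # Base cases: either argument is a terminal.
    if a == 0 or b == 0:
        return 0
    if a == 1:
        return b
    if b == 1:
        return a
    # Memo cache lookup.
    key = (a, b)
    if key in cache:
        return cache[key]
    # Recursive step: cofactor both arguments on the topmost variable.
    la, lb = a[0], b[0]
    top = min(la, lb)
    a0, a1 = (a[1], a[2]) if la == top else (a, a)
    b0, b1 = (b[1], b[2]) if lb == top else (b, b)
    low, high = bdd_and(a0, b0), bdd_and(a1, b1)
    result = low if low == high else (top, low, high)
    # Memo cache insert.
    cache[key] = result
    return result

x, y = var(0), var(1)
print(bdd_and(x, y))    # (0, 0, (1, 0, 1)): true iff x=1 and y=1
```

A real implementation hash-conses `result` through the unique table described on the next slide so that equal functions get pointer-equal nodes.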
BDD Libraries
• BuDDy
– Simple, fast, memory-friendly
– Identifies BDD by index in unique table
• JavaBDD
– 100% Java, based on BuDDy
– Also native interface to BuDDy, CUDD, CAL, JDD
• CUDD
– Most popular, most feature-complete
– Not as fast as BuDDy
– Other types: ZDD, ADD
• JDD
– 100% Java, fresh implementation
– Still under development
Depth-first vs. breadth-first
• BDD algorithms have natural depth-
first recursive formulations.
• Some work on using breadth-first
evaluation for better parallelism and
locality
– CAL: breadth-first BDD package
• General idea: Assume independent,
fixup if not.
• Doesn’t perform well in practice.
Take a break…
(Next up: Using the Tools)
Tutorial Structure
Part I: Essential Background
– Datalog for Program Analysis
– Binary Decision Diagrams
Part II: Using the Tools
– bddbddb
– Compiler interface (Joeq compiler)
– Datalog editor in Eclipse
– Interactive mode
Part III: Developing Advanced Analyses
– Context sensitivity
– Combining multiple analyses
– Race detection examples
– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas
– Variable ordering
– Iteration order
– Machine learning
– What it’s good for, what it isn’t good for
Part II: Using the Tools
bddbddb
(BDD-based deductive database)
bddbddb System Overview
[Diagram: Java bytecode → Joeq frontend → input relations; the Datalog program and input relations feed bddbddb, which produces the output relations]
Compiler Frontend
• Convert IR into tuples
• Tuples format:
    # V0:16 F0:11 V1:16     ← header line
    0 0 1
    0 1 2                   ← one tuple per line
    1470 0 1464
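A small Python sketch of a reader for this format (added for illustration; the function name and the assumption that values are whitespace-separated are mine):

```python
# Parse the tuple-file format shown above: a '#' header naming each
# attribute and its bit width, then one whitespace-separated tuple per line.
def parse_tuples(text):
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()        # e.g. ["V0:16", "F0:11", "V1:16"]
    attrs = [(field.split(":")[0], int(field.split(":")[1]))
             for field in header]                # (attribute name, bits)
    tuples = [tuple(int(v) for v in ln.split()) for ln in lines[1:]]
    return attrs, tuples

text = """# V0:16 F0:11 V1:16
0 0 1
0 1 2
1470 0 1464
"""
attrs, tuples = parse_tuples(text)
print(attrs)    # [('V0', 16), ('F0', 11), ('V1', 16)]
print(tuples)   # [(0, 0, 1), (0, 1, 2), (1470, 0, 1464)]
```

The bit widths in the header tell the BDD backend how many Boolean variables each attribute needs (16 bits for a 65536-element domain, and so on).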
Compiler Frontend
• Robust frontends:
– Joeq compiler
– Soot compiler
– SUIF compiler (for C code)
• Still experimental:
– Eclipse frontend
– gcc frontend
–…
Extracting Relations
• Idea: Iterate thru compiler IR,
numbering and dumping relations of
interest.
– Types
– Methods
– Fields
– Variables
– …
joeq.Main.GenRelations
• Generate initial relations for points-to
analysis.
– Does initial pass to discover call graph.
• Options:
-fly : dump on-the-fly call graph info
-cs: dump context-sensitive info
-ssa : dump SSA representation
-partial : no call graph discovery
-Dpa.dumppath= : where to save files
-Dpa.icallgraph= : location of initial call graph
-Dpa.dumpdotgraph : dump call graph in dot
Demo of joeq GenRelations
Part II: Using the Tools
bddbddb:
From Datalog to BDDs
An Adventure in BDDs
• Context-sensitive numbering scheme
– Modify BDD library to add special operations.
– Can’t even analyze small programs. Time:
• Improved variable ordering
– Group similar BDD variables together.
– Interleave equivalence relations.
– Move common subsets to edges of variable
order. Time: 40h
• Incrementalize outermost loop
– Very tricky, many bugs. Time: 36h
• Factor away control flow, assignments
– Reduces number of variables. Time: 32h
An Adventure in BDDs
• Exhaustive search for best BDD order
– Limit search space by not considering intradomain
orderings. Time: 10h
• Eliminate expensive rename operations
– When rename changes relative order, result is not
isomorphic. Time: 7h
• Improved BDD memory layout
– Preallocate to guarantee contiguous. Time: 6h
• BDD operation cache tuning
– Too small: redo work, too big: bad locality
– Parameter sweep to find best values. Time: 2h
An Adventure in BDDs
• Simplified treatment of exceptions
– Reduce number of variables, iterations
necessary for convergence. Time: 1h
• Change iteration order
– Required redoing much of the code. Time: 48m
• Eliminate redundant operations
– Introduced subtle bugs. Time: 45m
• Specialized caches for different operations
– Different caches for and, or, etc. Time: 41m
An Adventure in BDDs
• Compacted BDD nodes
– 20 bytes → 16 bytes. Time: 38m
• Improved BDD hashing function
– Simpler hash function. Time: 37m
• Total development time: 1 year
– 1 year per analysis?!?
• Optimizations obscured the algorithm.
• Many bugs discovered, maybe still more.
bddbddb:
BDD-Based Deductive
DataBase
• Automatically generate from Datalog
– Optimizations based on my experience
with handcoded version.
– Plus traditional compiler algorithms.
• bddbddb even better than handcoded
– handcoded: 37m bddbddb: 19m
Datalog → BDDs

  Datalog                       BDDs
  Relations                     Boolean functions
  Relation ops:                 Boolean function ops:
    ⋈, ∪, select, project         ∧, ∨, −, ∼
  Relation at a time            Function at a time
  Semi-naïve evaluation         Incrementalization
  Fixed-point                   Iterate until stable
Compiling Datalog to BDDs
1. Apply Datalog source level transforms.
2. Stratify and determine iteration order.
3. Translate into relational algebra IR.
4. Optimize IR and replace relational
algebra ops with equivalent BDD ops.
5. Assign relation attributes to physical BDD
domains.
6. Perform more optimizations after domain
assignment.
7. Interpret the resulting program.
High-Level Transform:
Magic Set Transformation
• Add “magic” predicates to control
generated tuples [Bancilhon 1986,
Beeri 1987]
– Combines ideas from top-down and
bottom-up evaluation
• Doesn’t always help
– Leads to more iterations
– BDDs are good at large operations
Predicate Dependency
Graph
• Add an edge from each predicate on the RHS (body) of a
  rule to the predicate on its LHS (head).

[Graph: vPointsTo0, Assign, Load, Store → vPointsTo; Store and vPointsTo → hPointsTo; hPointsTo → vPointsTo]

vPointsTo(v, o)      :- vPointsTo0(v, o).
vPointsTo(v1, o)     :- Assign(v1, v2), vPointsTo(v2, o).
hPointsTo(o1, f, o2) :- Store(v1, f, v2), vPointsTo(v1, o1),
                        vPointsTo(v2, o2).
vPointsTo(v2, o2)    :- Load(v1, f, v2), vPointsTo(v1, o1),
                        hPointsTo(o1, f, o2).
Determining Iteration Order
• Tradeoff between faster convergence
and BDD cache locality
• Static heuristic
– Visit rules in reverse post-order
– Iterate shorter loops before longer loops
• Profile-directed feedback
• User can control iteration order
– pri=# keywords on rules/relations
Predicate Dependency
Graph
[Graph: vPointsTo0, Assign, Load, Store feed vPointsTo; Store and vPointsTo feed hPointsTo; hPointsTo feeds back into vPointsTo]
Datalog to Relational
Algebra
vPointsTo(v1, o) :- Assign(v1, v2),
                    vPointsTo(v2, o).
t1 = ρvariable→source(vPointsTo);
t2 = assign ⋈ t1;
t3 = πsource(t2);
t4 = ρdest→variable(t3);
vPointsTo = vPointsTo ∪ t4;
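The algebra above can be sketched directly in Python (added for illustration; the dict-based relation representation and the sample facts are invented):

```python
# The relational-algebra sequence above, with a relation represented
# as a list of dicts mapping attribute names to values.
def rename(rel, old, new):                         # rho: old -> new
    return [{(new if k == old else k): v for k, v in t.items()} for t in rel]

def join(r, s):                                    # natural join on shared attributes
    return [{**t, **u} for t in r for u in s
            if all(t[k] == u[k] for k in t.keys() & u.keys())]

def project(rel, attrs):                           # pi: keep only `attrs`
    return [{k: t[k] for k in attrs} for t in rel]

# vPointsTo(v1, o) :- Assign(v1, v2), vPointsTo(v2, o).
assign    = [{"dest": "p", "source": "q"}]         # p = q;
vPointsTo = [{"variable": "q", "heap": "o1"}]      # q points to o1

t1 = rename(vPointsTo, "variable", "source")       # rho variable->source
t2 = join(assign, t1)                              # assign joined with t1 on source
t3 = project(t2, ["dest", "heap"])                 # project away source
t4 = rename(t3, "dest", "variable")                # rho dest->variable
vPointsTo = vPointsTo + t4                         # union into vPointsTo
print(vPointsTo)
```

In bddbddb each of these operators becomes a single BDD operation (replace, relprod, or), as the following slides show.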
Incrementalization

Original:
  t1 = ρvariable→source(vP);
  t2 = assign ⋈ t1;
  t3 = πsource(t2);
  t4 = ρdest→variable(t3);
  vP = vP ∪ t4;

Incrementalized:
  vP’’ = vP – vP’;
  vP’ = vP;
  assign’’ = assign – assign’;
  assign’ = assign;
  t1 = ρvariable→source(vP’’);
  t2 = assign ⋈ t1;
  t5 = ρvariable→source(vP);
  t6 = assign’’ ⋈ t5;
  t7 = t2 ∪ t6;
  t3 = πsource(t7);
  t4 = ρdest→variable(t3);
  vP = vP ∪ t4;
Optimize into BDD operations

Before:
  vP’’ = vP – vP’;
  vP’ = vP;
  assign’’ = assign – assign’;
  assign’ = assign;
  t1 = ρvariable→source(vP’’);
  t2 = assign ⋈ t1;
  t5 = ρvariable→source(vP);
  t6 = assign’’ ⋈ t5;
  t7 = t2 ∪ t6;
  t3 = πsource(t7);
  t4 = ρdest→variable(t3);
  vP = vP ∪ t4;

After:
  vP’’ = diff(vP, vP’);
  vP’ = copy(vP);
  t1 = replace(vP’’, variable→source);
  t3 = relprod(t1, assign, source);
  t4 = replace(t3, dest→variable);
  vP = or(vP, t4);
Physical domain assignment

Before:
  vP’’ = diff(vP, vP’);
  vP’ = copy(vP);
  t1 = replace(vP’’, variable→source);
  t3 = relprod(t1, assign, source);
  t4 = replace(t3, dest→variable);
  vP = or(vP, t4);

After:
  vP’’ = diff(vP, vP’);
  vP’ = copy(vP);
  t3 = relprod(vP’’, assign, V0);
  t4 = replace(t3, V1→V0);
  vP = or(vP, t4);

• Minimizing renames is NP-complete
• Renames have vastly different costs
• Priority-based assignment algorithm
Other optimizations
• Dead code elimination
• Constant propagation
• Definition-use chaining
• Redundancy elimination
• Global value numbering
• Copy propagation
• Liveness analysis
Splitting rules
R(a,e) :- A(a,b), B(b,c), C(c,d), R(d,e).
Can be split into:
T1(a,c) :- A(a,b), B(b,c).
T2(a,d) :- T1(a,c), C(c,d).
R(a,e) :- T2(a,d), R(d,e).
Affects incrementalization, iteration.
Use “split” keyword to auto-split rules.
Other Tools
• Banshee (John Kodumal)
– Results are harder to use (not relational)
• Paddle/Jedd (Ondrej Lhotak)
– Imperative style: more expressive
– Not as efficient, doesn’t scale as well
Jedd code
• Jedd code is like bddbddb internal IR
before domain assignment:
vP’’ = diff(vP, vP’);
vP’ = copy(vP);
t1 = replace(vP’’,variable→source);
t3 = relprod(t1,assign,source);
t4 = replace(t3,dest→variable);
vP = or(vP, t4);
Demo of using bddbddb
Tutorial Structure
Part I: Essential Background
– Datalog for Program Analysis
– Binary Decision Diagrams
Part II: Using the Tools
– bddbddb
– Compiler interface (Joeq compiler)
– Datalog editor in Eclipse
– Interactive mode
Part III: Developing Advanced Analyses
– Context sensitivity
– Combining multiple analyses
– Race detection examples
– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas
– Variable ordering
– Iteration order
– Machine learning
– What it’s good for, what it isn’t good for
Part III: Developing Advanced Analyses
Context Sensitivity
Old Technique:
Summary-Based Analysis
• Idea: Summarize the effect of a
method on its callers.
– Sharir, Pnueli [Muchnick 1981]
– Landi, Ryder [PLDI 1992]
– Wilson, Lam [PLDI 1995]
– Whaley, Rinard [OOPSLA 1999]
Old Technique:
Summary-Based Analysis
• Problems:
– Difficult to summarize pointer analysis.
– Composed summaries can get large.
– Recursion is difficult: Must find fixpoint.
– Queries (e.g. which context points to x)
require expanding an exponential
number of contexts.
My Technique:
Cloning-Based Analysis
• Simple brute force technique.
– Clone every path through the call graph.
– Run context-insensitive algorithm on
expanded call graph.
• The catch: exponential blowup
Cloning is exponential!
Recursion
• Actually, cloning is unbounded in the
presence of recursive cycles.
• Technique: We treat all methods
within a strongly-connected
component as a single node.
Recursion
[Figure: a call graph containing a recursive E–F cycle, next to its cloned version; the E–F strongly-connected component is treated as a single node, so it (and the methods below it) is cloned once per path rather than unboundedly]
Top 20 Sourceforge Java Apps
[Chart: number of call-graph paths per application, on a log scale from 10^0 up to 10^16]
Cloning is infeasible (?)
• Typical large program has ~10^14 paths.
• If you need 1 byte to represent a clone:
– Would require 256 terabytes of storage
– >12 times size of Library of Congress
– Registered ECC 1GB DIMMs: $41.7 million
• Power: 96.4 kilowatts = Power for 128 homes
– 500 GB hard disks: 564 x $195 = $109,980
• Time to read sequential: 70.8 days
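Counting the clones is itself cheap even when the count is astronomical; a Python sketch (added for illustration) that counts root-to-node paths by memoized DFS over an assumed small acyclic call graph (A calls B, C, D; each calls E; E calls F and G; both call H):

```python
from functools import lru_cache

# Count root-to-node paths in an acyclic call graph: the number of clones
# a node needs under full cloning.
calls = {
    "A": ["B", "C", "D"],
    "B": ["E"], "C": ["E"], "D": ["E"],
    "E": ["F", "G"],
    "F": ["H"], "G": ["H"],
    "H": [],
}
ROOT = "A"

@lru_cache(maxsize=None)
def paths_to(node):
    if node == ROOT:
        return 1
    # Sum over callers; memoization makes this linear in the graph size,
    # even when the resulting count is huge (e.g. ~10^14).
    return sum(paths_to(caller) for caller, callees in calls.items()
               if node in callees)

print(paths_to("H"))   # 6: H would need clones H0..H5
```

The infeasibility above is about materializing one result per clone explicitly; the BDD encoding on the next slides avoids exactly that.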
Key Insight
• There are many similarities across
contexts.
– Many copies of nearly-identical results.
• BDDs can represent large sets of
redundant data efficiently.
– Need a BDD encoding that exploits the
similarities.
Expanded Call Graph
[Figure: original graph A → {B, C, D} → E → {F, G} → H next to its expansion: A0; B0, C0, D0; E0–E2; F0–F2, G0–G2; H0–H5]
Numbering Clones
[Figure: the same graphs with context numbers on the edges: A has context 0; B, C, D each context 0; E has contexts 0–2; F and G each have contexts 0–2; H has contexts 0–5 (0–2 via F, 3–5 via G)]
Context-sensitive Pointer
Analysis Algorithm
1. First, do context-insensitive pointer
analysis to get call graph.
2. Number clones.
3. Do context-insensitive algorithm on
the cloned graph.
• Results explicitly generated for every clone.
• Individual results retrievable with Datalog
query.
Counting rule

IEnum(i,m,vc2,vc1) :- roots(m), mI0(m,i), IE0(i,m).   ← the “number” rule

• Special rule to define numbering.
• Head: result of numbering
  – First two variables: edge you want to number
  – Second two variables: context numbering
• Subgoals: graph edges
  – Single variable: roots of graph
Demo of context-sensitive
Part III: Developing Advanced Analyses
Example: Race Detection
Object Sensitivity
• k-object-sensitivity (Milanova, Ryder, Rountev 2003)
• k=3 suffices in our experiments
• CHA/context-insensitive/k-CFA too imprecise

static main() {
h1:   C a = new A();
h2:   C b = new B();
p1:   foo(a);
p2:   foo(b);
p3:   foo(a); }
static foo(C c) {
p4:   c.bar(); }

Contexts of method bar():
  1-CFA:     { p4 }
  2-CFA:     { p1:p4, p2:p4, p3:p4 }
  1-objsens: { h1, h2 }
  2-objsens: { h1, h2 }
Open Programs
• Analyzing open programs is important
– Many “programs” are libraries
– Developers need to understand behavior w/o a client
• Standard approach
– Write a “harness” manually
– A client exercising the interface of the open program
• Our approach
– Generate the harness automatically
Race Detection
• A multi-threaded program contains a race if:
– Two threads can access a memory
location
– At least one access is a write
– No ordering between the accesses
• As a rule, races are bad
– And common …
– And hard to find …
Running Example

// Harness (Note: Single-threaded)
static public void main() {
  A a;
  a = new A();
  a.get();
  a.inc(); }

public A() {
  f = 0; }
public int get() {
  return rd(); }
public sync int inc() {
  int t = rd() + (new A()).wr(1);
  return wr(t); }
private int rd() {
  return f; }
private int wr(int x) {
  f = x;
  return x; }
Example: Two Object-Sensitive
Contexts
[The running-example code, with the two allocation sites of A highlighted: the harness’s new A() and the new A() inside inc()]
Example: 1st Context
[The running-example code, with the accesses made through the harness’s A object highlighted]
Example: 2nd Context
[The running-example code, with the accesses made through the A object allocated inside inc() highlighted]
Computing Original Pairs
All pairs of accesses such that
– Both references of one of the following forms:
• e1.f and e2.f (the same instance field)
• C.g and C.g (the same static field)
• e1[e3] and e2[e4] (any array elements)
– At least one is a write
Example: Original Pairs
[The running-example code shown for both contexts, with the accesses to field f (f = 0, return f, f = x) highlighted; pairs with at least one write are the original pairs]
Computing Reachable Pairs
• Step 1
– Access pairs with at least one write to same field
• Step 2
– Consider any access pair (e1, e2)
– To be a race e1 must be:
– Reachable from a thread-spawning call site s1
• Without “switching” threads
– Where s1 is reachable from main
– (and similarly for e2)
Example: Reachable Pairs
[The running-example code, with the pairs whose accesses are reachable from a thread root highlighted]
Computing Aliasing Pairs
• Steps 1-2
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
• Step 3
– To have a race, both must access the same memory
location
– Use alias analysis
Example: Aliasing Pairs
[The running-example code, with the remaining pairs whose accesses may touch the same memory location highlighted]
Computing Escaping Pairs
• Steps 1-3
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
– And both can access the same memory location
• Step 4
– To have a race, the memory location must also
be thread-shared
– Use thread-escape analysis
Example: Escaping Pairs
[The running-example code, with the remaining pairs on thread-shared memory locations highlighted]
Computing Unlocked Pairs
• Steps 1-4
– Access pairs with at least one write to same field
– And both are executed in a thread in some context
– And both can access the same memory location
– And the memory location is thread-shared
• Step 5
– Discard pairs where the memory location is guarded
by a common lock in both accesses
Example: Unlocked Pairs
[The running-example code, with the remaining pairs not guarded by a common lock highlighted]
Counterexamples
• Each pair of paths in the context-sensitive call graph
from a pair of roots to a pair of accesses along
which a common lock may not be held
• Different from most other systems
– Pairs of paths (instead of single interleaved path)
– At call-graph level
Example: Counterexample

// file Harness.java
static public void main() {
  A a;
  a = new A();
4:  a.get();
5:  a.inc(); }

// file A.java
public A() {
  f = 0; }
public int get() {
4:  return rd(); }
public sync int inc() {
  int t = rd() + (new A()).wr(1);
7:  return wr(t); }
private int rd() {
10: return f; }
private int wr(int x) {
12: f = x;
  return x; }

field reference A.f (A.java:10) [Rd]
  A.get(A.java:4)
  Harness.main(Harness.java:4)
field reference A.f (A.java:12) [Wr]
  A.inc(A.java:7)
  Harness.main(Harness.java:5)
Race Checker Datalog
Map Sensitivity

...
String username = request.getParameter(“user”);
map.put(“USER_NAME”, username);
...
String query = (String) map.get(“SEARCH_QUERY”);
stmt.executeQuery(query);
...
(The analysis distinguishes the constant keys: “USER_NAME” ≠ “SEARCH_QUERY”.)

• Maps with constant string keys are common
• Augment pointer analysis:
  – Model Map.put/get operations specially
Resolving Reflection
• Reflection is a dynamic language feature
• Used to query object and class information
– static Class Class.forName(String className)
• Obtain a java.lang.Class object
• E.g., Class.forName(“java.lang.String”) gets an
object corresponding to class String
– Object Class.newInstance()
• Object constructor in disguise
• Create a new object of a given class
Class c = Class.forName(“java.lang.String”);
Object o = c.newInstance();
• This makes a new empty string o
What to Do About Reflection?

1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;

Option 1: Anything goes
  + Obviously conservative
  – Call graph extremely big and imprecise
Option 2: Ask the user
  + Good results
  – A lot of work for user, difficult to find answers
Option 3: Subtypes of T
  + More precise
  – T may have many subtypes
Option 4: Analyze className
  + Better still
  – Need to know where className comes from
Analyzing Class Names
• Looking at className seems promising
String stringClass = "java.lang.String";
foo(stringClass);
...
void foo(String clazz){
bar(clazz);
}
void bar(String className){
Class c = Class.forName(className);
}
• This is interprocedural const+copy prop on strings
Pointer Analysis Can Help
[Points-to diagram: the stack variables stringClass, clazz, and className all
point to the same heap object, the string constant "java.lang.String".]
Reflection Resolution Using Points-to
1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;
• Need to know what className is
– Could be a local string constant like java.lang.String
– But could be a variable passed through many layers of calls
• Points-to analysis says what className refers to
– className --> concrete heap object
Reflection Resolution
(constants vs. specification points)

1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();   // Q: what object does this create?
4. T t = (T) o;

After resolution, line 3 is replaced by the possible concrete allocations:

1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = new T1(); | new T2(); | new T3();
4. T t = (T) o;
Resolution May Fail!
1. String className = r.readLine();
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;
• Need help figuring out what className is
• Two options
1. Can ask user for help
• Call to r.readLine on line 1 is a specification point
• User needs to specify what can be read from a file
• Analysis helps the user by listing all specification points
2. Can use cast information
• Constrain possible types instantiated on line 3 to subclasses of T
• Need additional assumptions
1. Specification Files
Format: invocation site => class
loadImpl() @ 43 InetAddress.java:1231 => java.net.Inet4AddressImpl
loadImpl() @ 43 InetAddress.java:1231 => java.net.Inet6AddressImpl
lookup() @ 86 AbstractCharsetProvider.java:126 => sun.nio.cs.ISO_8859_15
lookup() @ 86 AbstractCharsetProvider.java:126 => sun.nio.cs.MS1251
tryToLoadClass() @ 29 DataFlavor.java:64 => java.io.InputStream
2. Using Cast Information
1. String className = ...;
2. Class c = Class.forName(className);
3. Object o = c.newInstance();
4. T t = (T) o;
• Providing specification files is tedious,
time-consuming, error-prone
• Leverage cast data instead
– o instanceof T
– Can constrain type of o if
1. Cast succeeds
2. We know all subclasses of T
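A runnable Java sketch of this idea (the Encoder hierarchy and the SUBTYPES list are invented for illustration; in the real analysis the subtype set comes from the closed-world class path): under the two assumptions, a post-dominating cast bounds what newInstance may create.

```java
import java.util.List;

// Sketch of cast-based reflection resolution with an invented class hierarchy.
public class CastDemo {
    public interface Encoder {}
    public static class Utf8Encoder implements Encoder {}
    public static class AsciiEncoder implements Encoder {}

    // Closed-world assumption: these are all the subtypes of Encoder there are.
    public static final List<Class<? extends Encoder>> SUBTYPES =
            List.of(Utf8Encoder.class, AsciiEncoder.class);

    public static void main(String[] args) throws Exception {
        // className is not a constant the analysis can see...
        String className = CastDemo.class.getName() + "$Utf8Encoder";
        Object o = Class.forName(className).getDeclaredConstructor().newInstance();
        // ...but the post-dominating cast bounds it: if the cast succeeds
        // (Assumption 1), o's class must be a known subtype of Encoder.
        Encoder e = (Encoder) o;
        System.out.println(SUBTYPES.contains(e.getClass())); // true
    }
}
```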
Analysis Assumptions
1. Assumption: Correct casts.
Type cast operations that always operate
on the result of a call to
Class.newInstance are correct; they will
always succeed without throwing a
ClassCastException.
2. Assumption: Closed world.
We assume that only classes reachable
from the class path at analysis time can
be used by the application at runtime.
Casts Aren’t Always Present
• Can’t do anything if no cast post-dominates a
Class.newInstance call
Object factory(String className){
Class c = Class.forName(className);
return c.newInstance();
}
...
SunEncoder t = (SunEncoder) factory("sun.io.encoder." + enc);
SomethingElse e = (SomethingElse) factory("SomethingElse");
Call Graph Discovery Process
Program IR → Call graph construction → Call graph
→ Reflection resolution using points-to → Resolved calls → Final call graph

Specification points feed back into reflection resolution from two sources:
user-provided spec files and the cast-based approximation.
Implementation Details
• Call graph construction algorithm in the
presence of reflection is integrated with
pointer analysis
– Pointer analysis already has to deal with virtual calls:
new methods are discovered, points-to relations for
them are created
– Reflection analysis is another level of complexity
• See Datalog specification
Reflection Resolution Results
• Applied to 6 large Java apps, 190,000 LOC combined
• Call graph sizes compared [chart omitted]
Map relations
• Need to map from values in one
domain to another?
• Use special operator "=>"
• mapAtoB(a,b) :- a => b.
• Elements in A are appended to
domain of B
– A must have map file.
– B must have enough space.
Using Code Fragments
• Execute a code fragment before/after every
rule invocation.
A(x,y) :- B(y,a), C(a,z). { code goes here }
• Can access:
– Relations by name.
– Rule information.
– Solver information.
• Can also add code fragment to relations
(triggered on change).
• Special keywords: "modifies", "pre", "post"
Take a break…
(Next up: Profiling, Debugging)
Tutorial Structure
Part I: Essential Background
– Datalog for Program Analysis
– Binary Decision Diagrams
Part II: Using the Tools
– bddbddb
– Compiler interface (Joeq compiler)
– Datalog editor in Eclipse
– Interactive mode
Part III: Developing Advanced Analyses
– Context sensitivity
– Combining multiple analyses
– Race detection examples
– Using advanced bddbddb features
Part IV: Profiling, Debugging, Avoiding Gotchas
– Variable ordering
– Iteration order
– Machine learning
– What it’s good for, what it isn’t good for
Part IV: Profiling, Debugging, Avoiding Gotchas
Variable Ordering
TryDomainOrders
• Try all possible domain orders for a given
operation and inputs.
– Bounded: if an order takes longer than current
best, abort it.
• To profile slow-running operations:
Run with -Ddumpslow, -Ddumpcutoff=5000
java net.sf.bddbddb.TryDomainOrders
• If you know ordering constraints, you can add
them to rules/relations
– Constraints automatically propagated to other
rules/relations
Variable Numbering:
Active Machine Learning
• Must be determined dynamically
• Limit trials with properties of relations
• Each trial may take a long time
• Active learning:
select trials based on uncertainty
– Can build up trial database to improve
accuracy
• Several hours
• Comparable to exhaustive for small apps
Using Machine Learning
• -Dfindbestorder
– Enable machine learning
• -Dfbocutoff=#
– Minimum runtime (in ms) for an
operation to be considered
• -Dfbotrials=#
– Maximum number of trials to run
• -Dtrialfile=
– Filename to load/store trial information.
Changing Iteration Order
• bddbddb uses simple iteration order
heuristic – not always optimal
• If a rule is iterating too many times:
– Lower its priority with pri=5
– Increase other rules with pri=-5
– Can also adjust priority of relations
• Solver prints iteration order on startup
• Also try reformulating the problem or
changing input relations
Reformulate the Problem
• Change rule form:
A(a,c) :- A(a,b), A(b,c).
vs.
A(a,c) :- A(a,b), E(b,c).
(non-linear vs. linear form over the base edge relation E)
• Change input relations
– Short-circuit paths
• Filter relations as you go
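A toy Java fixpoint over a 16-node chain makes the tradeoff concrete (all names here are mine; this illustrates iteration counts only, not how bddbddb evaluates rules): the non-linear transitive-closure rule reaches the fixpoint in far fewer rounds, because path lengths double each round, but each round joins the growing A relation with itself.

```java
import java.util.HashSet;
import java.util.Set;

// Toy semi-naive-style fixpoint over the chain 0 -> 1 -> ... -> 15.
// A pair (a,b) is encoded as the int a*1000 + b.
public class IterationDemo {
    public static int key(int a, int b) { return a * 1000 + b; }

    // Linear rule A(a,c) :- A(a,b), E(b,c): each round extends paths by one edge.
    public static int linearRounds(Set<Integer> edges) {
        Set<Integer> a = new HashSet<>(edges);
        int rounds = 0;
        boolean changed = true;
        while (changed) {
            rounds++;
            Set<Integer> add = new HashSet<>();
            for (int ab : a)
                for (int bc : edges)
                    if (ab % 1000 == bc / 1000) add.add((ab / 1000) * 1000 + bc % 1000);
            changed = a.addAll(add);
        }
        return rounds;
    }

    // Non-linear rule A(a,c) :- A(a,b), A(b,c): path lengths double each round.
    public static int nonlinearRounds(Set<Integer> edges) {
        Set<Integer> a = new HashSet<>(edges);
        int rounds = 0;
        boolean changed = true;
        while (changed) {
            rounds++;
            Set<Integer> add = new HashSet<>();
            for (int ab : a)
                for (int bc : a)
                    if (ab % 1000 == bc / 1000) add.add((ab / 1000) * 1000 + bc % 1000);
            changed = a.addAll(add);
        }
        return rounds;
    }

    public static void main(String[] args) {
        Set<Integer> edges = new HashSet<>();
        for (int i = 0; i < 15; i++) edges.add(key(i, i + 1));
        System.out.println("linear rounds: " + linearRounds(edges));        // 15
        System.out.println("non-linear rounds: " + nonlinearRounds(edges)); // 5
    }
}
```

Fewer rounds is not automatically faster: each non-linear round is a larger join, and in a BDD solver the balance depends on intermediate BDD sizes, so measure both forms.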
Debugging
• Debugging can be tricky
– Relations are huge
– Declarative: not so straightforward
• Adding code fragments can help.
• Try it on a small example with full
trace information.
• Best: Interactive solver
“Comes from” query
• Special kind of query:
A(3,5) :- ?
"What contributed to (3,5) being added to A?"
• Add ‘single’ keyword to get only one path.
• Doesn’t solve the negated problem
(missing tuples)
Solver options
-Dnoisy
-Dtracesolve
-Dfulltracesolve
-Dbddvarorder=
-Dbddnodes=
-Dbddcache=
-Dbddminfree=
-Dfindbestorder
-Dbasedir=
-Dincludedirs=
-Ddumprulegraph
-Duseir
-Dprintir
-Dsplit_all_rules
-Dsplit_no_rules
Datalog directives
• .include
• .split_all_rules
• .report_stats
• .noisy
• .strict
• .singleignore
• .trace
• .basedir
• .bddvarorder
• .bddnodes
• .bddcache
• .bddminfree
• .findbestorder
• .incremental
• .dot
Relation options
• input / inputtuples
• output / outputtuples
• printtuples
• printsize
• pri=#
• { code fragment }
• x<y

Rule options
• split
• number
• single
• cacheafterrename
• findbestorder
• trace / tracefull
• pre / post { code }
• modifies R
Experimental Features
• Distributed computation: (dbddbddb?)
• Profile-directed feedback of iteration
order
• Eclipse integration
• Touchgraph integration
• Debugging interface
• Tracing information
• Include rules in come-from query
What works well
• Big sets of mostly redundant data
– Pointer analysis
– Context-sensitive analysis
• Short propagation paths
– Each iteration takes quite a bit of time, so
>1000 iterations will hurt
– Try to preprocess/reformulate problem to
shorten paths
• Natural ‘flow’ problems
• Pure analysis problems (no transformations)
What doesn’t work well
• Long propagation paths
– Traditional dataflow analysis (use sparse form
instead)
• Huge problems with little redundancy
– Too much context sensitivity
• Domains that are not easily countable
– Need to manufacture names on the fly
• Problems that have inherently complicated
formulations
• Problems optimized for particular data
structures (union-find, etc.)
Using bddbddb in a class
• bddbddb has been very useful in Stanford
advanced compiler course
– Comparing/contrasting analyses becomes easier
– Students can implement and evaluate multiple
techniques without much overhead
• Projects:
– Implement an algorithm from a paper in Datalog,
make a small change and evaluate its
effectiveness
– Experiment with different kinds of context
sensitivity on a given problem
– Improve on BDD solver efficiency
– Build a tool based on analysis results
Questions?
That’s all, folks!
Thanks for sticking around for all 157 slides!
Experimental Results
[twelve backup slides of experimental-results charts; figures omitted]
Related Work
• Datalog in Program Analysis
– Specify as Datalog query [Ullman 1989]
– Toupie system [Corsini 1993]
– Demand-driven using magic sets [Reps 1994]
– Program analysis with logic programming [Dawson 1996]
– Crocopat system [Beyer 2003]
– Modular class analysis [Besson 2003]
• BDDs in Program Analysis
– Predicate abstraction [Ball 2000]
– Shape analysis [Manevich 2002, Yavuz-Kahveci 2002]
– Pointer Analysis [Zhu 2002, Berndl 2003, Zhu 2004]
– Jedd system [Lhotak 2004]
Related Work
• BDD Variable Ordering
– Variable ordering is NP-complete [Bollig 1996]
– Interleaving [Fujii 1993]
– Sifting [Rudell 1993]
– Genetic algorithms [Drechsler 1995]
– Machine learning for BDD orders [Grumberg 2003]
• Efficient Evaluation of Datalog
– Semi-naïve evaluation [Balbin 1987]
– Bottom-up evaluation [Ullman 1989, Ceri 1990, Naughton 1991]
– Top-down evaluation with tabling [Tamaki 1986, Chen 1996]
– Rule ordering [Ramakrishnan 1990]
– Magic sets transformation [Bancilhon 1986]
– Computing with BDDs [Iwaihara 1995]
– Time and space guarantees [Liu 2003]
Program Analysis with bddbddb
• Context-sensitive Java pointer analysis
• C pointer analysis
• Escape analysis
• Type analysis
• External lock analysis
• Interprocedural def-use
• Interprocedural mod-ref
• Object-sensitive analysis
• Cartesian product algorithm
• Resolving Java reflection
• Bounds check elimination
• Finding memory leaks
• Finding race conditions
• Finding Java security vulnerabilities
• And many more…
Performance better than hand-coded!
Conclusion
• bddbddb: new paradigm in program analysis
– Datalog compiled into optimized BDD operations
– Efficiently and easily implement context-sensitive
analyses
– Easier to develop correct analyses
– Easily experiment with new ideas
– Growing library of program analyses
– Easily use and build upon work of others
• Available as open-source LGPL:
http://bddbddb.sourceforge.net
My Contribution (2)
bddbddb
(BDD-based deductive database)
– Pointer analysis in 6 lines of Datalog
(a database language)
• Hard to create & debug efficient BDD-based
algorithms (3451 lines, 1 man-year)
• Automatic optimizations in bddbddb
– Easy to create context-sensitive analyses
using pointer analysis results (a few lines)
– Created many analyses using bddbddb
Outline
• Pointer Analysis
– Problem Overview
– Brief History
– Pointer Analysis in Datalog
• Context Sensitivity
• Improving Performance
• bddbddb: BDD-based deductive database
• Experimental Results
– Analysis Time
– Analysis Memory
– Analysis Accuracy
• Conclusion
Performance is Tricky!
• Context-sensitive numbering scheme
– Modify BDD library to add special operations.
– Can’t even analyze small programs. Time:
• Improved variable ordering
– Group similar BDD variables together.
– Interleave equivalence relations.
– Move common subsets to edges of variable
order. Time: 40h
• Incrementalize outermost loop
– Very tricky, many bugs. Time: 36h
• Factor away control flow, assignments
– Reduces number of variables. Time: 32h
Java Security Vulnerabilities
Application      Classes   Reported errors                  Actual
                           ctx-insensitive  ctx-sensitive   errors
blueblog            306          1               1             1
webgoat             349         81               6             6
blojsom             428         48               2             2
personalblog        611        350               2             2
snipsnap            653       >321              27            15
road2hibernate      867         15               1             1
pebble              889        427               1             1
roller              989       >267               1             1
Total              5356      >1508              41            29
due to V. Benjamin Livshits
Vulnerabilities Found
            SQL injection  HTTP splitting  Cross-site scripting  Path traversal  Total
Header            0              6                 4                  0            10
Parameter         6              5                 0                  2            13
Cookie            1              0                 0                  0             1
Non-Web           2              0                 0                  3             5
Total             9             11                 4                  5            29
Summary of Contributions
• The first scalable context-sensitive subset-based
pointer analysis.
– Cloning-based technique using BDDs
– Clever context numbering
– Experimental results on the effects of context sensitivity
• bddbddb: new paradigm in program analysis
– Efficiently and easily implement context-sensitive analyses
– Datalog compiled into optimized BDD operations
– Library of program analyses (with many others)
– Active learning for BDD variable orders (with M. Carbin)
• Artifacts:
– Joeq compiler and virtual machine
– JavaBDD library and BuDDy library
– bddbddb tool
Conclusion
• The first scalable context-sensitive subset-
based pointer analysis.
– Accurate: results for up to 10^14 contexts.
– Scales to large programs.
• bddbddb: a new paradigm in prog analysis
– High-level spec → efficient implementation
• System is publicly available at:
http://bddbddb.sourceforge.net