Tutorial
Tutorial
Java Bytecode
Laurie Hendren, Patrick Lam, Jennifer Lhoták, Ondřej Lhoták and Feng Qian
McGill University
Special thanks to John Jorgensen and Navindra Umanee for help in preparing
Soot 2.0 and this tutorial.
Soot development has been supported, in part, by research grants from NSERC,
FCAR and IBM
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 1
Program and Cast
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 2
Introduction and Soot Basics
What is Soot?
Soot: Past and Present
Soot Overview
IRs: Baf, Jimple, Shimple, Grimp, Dava
Soot as an end-user tool and Soot as an
Eclipse plugin
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 3
What is Soot?
[Link]/soot/
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 4
What is Soot? (2)
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 5
Soot: Past and Present
[Link]/publications/
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 6
Soot: Past and Present (2)
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 7
Soot Overview
Java SML Scheme Eiffel
source source source source
Generate Bytecode
Java
source
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 8
Soot IRs
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 9
Soot as an end-user tool: Command-line
1. Install Java.
2. Download two .jar files (one for soot and one
for jasmin) and put them on your
CLASSPATH.
[Link]/software/#soot
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 10
Command-line: processing classes
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 11
Command-line: optimizing classes
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 12
Command-line: a more complex example
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 16
Soot as an end-user tool: Eclipse Plugin
1. Install Java
2. Install Eclipse [Link]
3. Download one .jar file and unjar it into your Eclipse
plugin directory
4. Start Eclipse
[Link]/software/#soot
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 18
Jimple
Jimple is:
principal Soot Intermediate Representation
3-address code in a control-flow graph
a typed intermediate representation
stackless
Core statements:
NopStmt
DefinitionStmt: IdentityStmt,
AssignStmt
Intraprocedural control-flow:
IfStmt
GotoStmt
TableSwitchStmt,LookupSwitchStmt
Interprocedural control-flow:
InvokeStmt
ReturnStmt, ReturnVoidStmt
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 20
Kinds of Jimple Stmts II
ThrowStmt
throws an exception
RetStmt
not used; returns from a JSR
MonitorStmt: EnterMonitorStmt,
ExitMonitorStmt
mutual exclusion
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 21
IdentityStmt
this.m();
Where’s the definition of this?
IdentityStmt:
Used for assigning parameter values and
this ref to locals.
Gives each local at least one definition point.
Jimple rep of IdentityStmts:
r0 := @this;
i1 := @parameter0;
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 22
Context: other Jimple Stmts
public int foo([Link]) { // locals
r0 := @this; // IdentityStmt
r1 := @parameter0;
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 24
Bytecode → Jimple
Scene.v()
Scene (singleton)
getSootClass()
getField()
SootClass SootField getSignature()
getMethod()
SootMethod getSignature()
getActiveBody()
JimpleBody
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 28
Body-centric View
SootMethod
getActiveBody()
getLocals()
JimpleBody Chain
getTraps()
getUnits()
Chain Chain
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 29
Getting a UnitGraph
SootMethod
getActiveBody()
UnitGraph getBody()
getLocals()
JimpleBody Chain
new BriefUnitGraph()
getTraps()
getUnits()
Chain Chain
getUnits()
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 30
What to do with a UnitGraph
getBody()
getHeads(), getTails()
getPredsOf(u), getSuccsOf(u)
getExtendedBasicBlockPathBetween
(from, to)
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 31
Control-flow units
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 32
Soot Philosophy on Units
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 33
What is a Box?
s: x = y op z
AssignStmt AssignStmt
x OpExpr
VB VB
y z
x OpExpr
VB VB
y z
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 34
What is a DefBox?
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 36
On UseBoxes
Opposite of defBoxes.
List useBoxes = [Link]();
method [Link]() returns a list of
ValueBoxes, corresponding to all Values
which get used in ut, a Unit.
non-empty for most Soot Units.
ut: x = y op z ;
getUseBoxes(ut) = { y , z , y op z }
(List containing 3 ValueBoxes, 2
containing Locals & 1 Expr)
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 37
Why Boxes?
Change all instances of y to 1:
AssignStmt AssignStmt
x OpExpr
VB VB
y z
x OpExpr
VB VB
y z
setValue() ??
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 38
Search & Replace
replace(b, y, IntConstant.v(1));
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 39
More Abstract Accessors: Stmt
containsArrayRef(),
getArrayRef(), getArrayRefBox()
containsInvokeExpr(),
getInvokeExpr(), getInvokeExprBox()
containsFieldRef(),
getFieldRef(), getFieldRefBox()
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 40
Program and Cast
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 41
Intraprocedural Outline
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 42
Flow Analysis in Soot
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 43
Four Steps to Flow Analysis
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 44
HOWTO: Soot Flow Analysis
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 45
HOWTO: Soot Flow Analysis II
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 46
Flow Analysis Hierarchy
AbstractFlowAnalysis
FlowAnalysis BranchedFlowAnalysis
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 47
Soot Flow Analyses
AbstractFlowAnalysis
FlowAnalysis BranchedFlowAnalysis
Forward-
Forward- Backward-
Casts Nullness
PRE analy’s PRE analy’s
Avail. Expr. Liveness
Array Bds
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 48
Backward vs. Forward Analyses
A forward analysis computes OUT from IN:
i
s
flow dir
fs (i)
fs (i)
t
ft (fs (i))
A backward analysis computes IN from OUT:
fs (ft (i))
s
flow dir
ft (i)
ft (i)
t
i
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 49
Outline: Soot Flow Analysis Examples
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 50
Running Example 1: Live Variables
A local variable v is live at s if there exists some
statement s′ using v and a control-flow path from
s to s′ free of definitions of v.
{z, x}
y=x
{z, y}
{z, y} {y}
{z, y}
x=y z=2
{z, y}
{z, y}
b=y
{z}
{z}
a=z
{}
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 51
Steps to a Flow Analysis
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 52
Step 1: Forward or Backward?
class LiveVariablesAnalysis
extends BackwardFlowAnalysis
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 53
Step 2: Abstraction domain
Domain for Live Variables: sets of Locals
e.g. {x, y, z}
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 54
Implementing an Abstraction
Need to implement copy(), merge() methods:
dest
copy dir CFG
src
merge dir
dest = src1 ⊲⊳ src2
src1 src2
Signatures:
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 56
Flow Sets and Soot
// d = c // d = d ∪ {v}
[Link](d); [Link](v);
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 57
Digression: types of FlowSets
[Link].*Set
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 58
Step 2: copy() for live variables
[Link](destSet);
}
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 59
Step 2: merge() for live variables
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 60
Step 3: Flow equations
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 61
Step 3: Casting
Start by:
casting srcValue, destValue to FlowSet.
casting u to Unit ut.
In code:
FlowSet src = (FlowSet)srcValue,
dest = (FlowSet)destValue;
Unit ut = (Unit)u;
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 62
Step 3: Copying
dest
copy dir CFG
src
[Link] (dest);
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 63
Step 3: Implementing flowThrough
IN[ut]
ut flowThrough
OUT[ut]
IN[ut] = flowThrough(OUT[ut])
= OUT[ut] \ kills[ut] ∪ gens[ut]
flowThrough is the brains of a flow analysis.
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 64
Step 3: flowThrough for live locals
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 65
Step 3: Implementing flowThrough: removing kills
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 66
Step 3: Implementing flowThrough: adding gens
// Add gen set
// for each local v used in
// this unit, add v to dest
for (ValueBox box : [Link]())
{
Value value = [Link]();
if (value instanceof Local)
[Link](value);
}
N.B. our analysis is generic, not restricted to
Jimple.
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 67
Step 4: Initial values
LiveVariablesAnalysis(UnitGraph g)
{
super(g);
doAnalysis();
}
Causes the flow sets to be computed, using
Soot’s flow analysis engine.
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 69
Enjoy: Flow Analysis Results
// return SparseArraySets
// of live variables:
[Link](s);
[Link](s);
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 70
Running Example 2: Branched Nullness
A local variable v is non-null at s if all
control-flow paths reaching s result in v being
assigned a value different from null.
{}
{}
if (y==null) {y}
T F
{} {y}
{x}
x=new foo() [Link]()
{x, y}
y=null
{x}
{x}
y=b.f {x, b}
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 71
HOWTO: Soot Flow Analysis
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 72
Step 1: Forward or Backward?
class NullnessAnalysis
extends ForwardBranchedFlowAnalysis {
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 73
Step 2: Abstraction domain
Domain: sets of Locals known to be non-null
Partial order is subset inclusion.
∗
see [Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 74
Implementing an Abstraction
For a forward analysis, copy and merge mean:
src
copy dir CFG
dest
src1 src2
merge dir
dest = src1 ⊲⊳ src2
[Link](destSet);
}
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 76
Step 2: merge() for nullness
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 77
Step 3: Branched Flow Function
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 78
Step 3: Flow equations
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 79
Step 3: Flow equations
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 79
Step 3: Flow equations
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 79
Step 3: Flow equations
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 79
Step 3: Flow equations
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 79
Step 3: Flow equations
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 79
Step 4: Initial values
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 80
Step 5: Constructor: Prologue
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 81
Step 5: Constructor: Finding All Locals
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 82
Step 5: Creating gen sets
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 83
Step 5: Constructor: Doing work
doAnalysis();
}
}
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 84
Enjoy: Branched Flow Analysis Results
To instantiate a branched analysis & collect
results:
NullnessAnalysis na=new NullnessAnalysis(b);
// another SparseArraySet
if ([Link]()) [Link](s);
// a List of SparseArraySets
if ([Link]()) [Link](s);
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 85
Adding transformations to Soot (easy way)
1. Implement a BodyTransformer or
a SceneTransformer
internalTransform method
does the transformation
2. Choose a pack for your transformation
(usually jtp)
3. Write a main method that adds the transform
to the pack, then runs Soot’s main
4. (Optional) If your transformation needs
command-line options, call
setDeclaredOptions()
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 86
On Packs
Want to run a set of Transformer objects with
one method call.
⇒ Group them in a Pack.
Soot defines default Packs which are run
automatically. To add a Transformer to the
jtp Pack:
Pack jtp = G.v().PackManager().
getPack("jtp");
[Link](new Transform("[Link]",
new NullTransformer()));
[Link](new Transform("[Link]",
new NullnessAnalysisColorer()));
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 87
Extending Soot (hard way)
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 88
Running Soot more than once
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 89
Generating Jimple
.class coffi
jb Jimple
Jimple
.jimple
parser
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 90
Intra-procedural packs
gb gop Grimp
w? (j|s|b|g)(b|t|o|a)p
w ⇒ Whole-program phase
j, s, b, g ⇒ Jimple, Shimple, Baf, Grimp
b, t, o, a ⇒
(b) Body creation
(t) User-defined transformations
(o) Optimizations with -O option
(a) Attribute generation
The p is sometimes silent.
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 92
Soot Packs (Jimple Body)
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 93
Soot Packs (Jimple)
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 94
Soot Packs (Back-end)
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 95
Program and Cast
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 96
Interprocedural Outline
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 97
Soot’s whole-program mode
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 99
Soot phases
wjap
wjtp
cg
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 100
Call Graph
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 101
Call Graph Edge
Each Edge contains
Source method
Source statement (if applicable)
Target method
Kind of edge
foo() { bar() {
[Link](); /* */
} }
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 102
Edge Kinds
/** Due to explicit invokestatic instruction. */
public static final int STATIC = 1;
/** Due to explicit invokevirtual instruction. */
public static final int VIRTUAL = 2;
/** Due to explicit invokeinterface instruction. */
public static final int INTERFACE = 3;
/** Due to explicit invokespecial instruction. */
public static final int SPECIAL = 4;
/** Implicit call to static initializer. */
public static final int CLINIT = 5;
/** Implicit call to [Link]() due to [Link]() call. */
public static final int THREAD = 6;
/** Implicit call to [Link](). */
public static final int EXIT = 7;
/** Implicit call to non-trivial finalizer from constructor. */
public static final int FINALIZE = 8;
/** Implicit call to run() through [Link](). */
public static final int PRIVILEGED = 9;
/** Implicit call to constructor from [Link](). */
public static final int NEWINSTANCE = 10;
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 103
Querying Call Graph
< [Link].{Sources,Units,Targets}
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 106
Code Example
void mayCall( SootMethod src ) {
CallGraph cg =
Scene.v().getCallGraph();
Iterator targets =
new Targets([Link](src));
while( [Link]() ) {
SootMethod tgt =
(SootMethod) [Link]();
[Link]( ""+
src+" may call "+tgt );
}
}
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 107
Reachable Methods
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 108
Code Example
ReachableMethods rm =
Scene.v().getReachableMethods();
Iterator it = [Link]();
while( [Link]() ) {
SootMethod method =
(SootMethod) [Link]();
// method is reachable
}
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 109
Transitive Targets
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 110
Implementation Big Picture
Entry Points
methods
Scene Call Graph
methods
edges edges
edges
Naive Spark
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 111
Points-to analysis
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 112
Spark settings
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 113
PointsToAnalysis interface
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 115
If I want to know...
Local o;
PointsToAnalysis pa =
Scene.v().getPointsToAnalysis();
PointsToSet ptset =
[Link]( o );
[Link] types =
[Link]()
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 116
If I want to know...
Local x, y;
PointsToSet xset =
[Link]( x );
PointsToSet yset =
[Link]( y );
if([Link](yset))
// they’re possibly aliased
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 117
SideEffectTester interface
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 118
Implementations of SideEffectTester
NaiveSideEffectTester
is conservative
does not use call graph or points-to
information
does not require whole-program mode
PASideEffectTester
uses current call graph
uses current points-to information
this may be naive points-to information
[Link]
<
[Link] Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 119
Program and Cast
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 120
Motivation of Soot Attributes
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 121
Java class file attributes
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 122
Attribute format
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 123
Attributes and Soot (overview)
Eclipse
SootClass class_info
SootField field_info
SootMethod method_info
Body Code_attribute
Unit
ValueBox
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 125
Hosts
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 126
Tags
interface Tag
byte[] getValue()
interface Attribute
void setValue(byte[])
class LineNumberTag
class GenericAttribute abstract class JasminAttribute class BytecodeOffsetTag
class ArrayNullCheckTag
byte[] decode( ...... )
class FieldRWTag
String getJasminValue(......) etc ......
class CodeAttribute
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 128
Special case: attributes of Code_attribute
[Link]()
Jimple Bytecode
x = y.f load y
getfield f
store x
Which instruction(s) should get the tags?
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 130
Choosing an Aggregator
ImportantTagAggregator
attaches tag to the “most important” instruction
(field reference, array reference, method
invocation)
Used for array bounds check, null pointer
check, side-effect attributes
FirstTagAggregator
attaches tag to the first instruction
Used for line number table attribute
Easy to make your own . . .
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 131
TagAggregator
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 132
ImportantTagAggregator
abstract class ImportantTagAggregator
extends TagAggregator {
/** Decide whether this tag
* should be aggregated by
* this aggregator. */
public abstract boolean
wantTag( Tag t );
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 134
Example: nullness attribute
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 135
Example: nullness attribute
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 136
Example: nullness attribute
class NullTagAggregator
extends ImportantTagAggregator {
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 138
Motivation of Soot Attributes in Eclipse
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 139
Visual Representations
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 140
String Tags
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 141
Color Tags
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 142
Link Tags
[Link]
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 143
Program and Cast
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 144
Conclusion
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 145
Homework
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 146
Resources
<
Soot, a Tool for Analyzing and Transforming Java Bytecode – p. 148