Building Dynamic Instrumentation Tools with DynamoRIO
Derek Bruening and Qin Zhao
Google
Tutorial Outline
1:30-1:40  Welcome + DynamoRIO History
1:40-2:40  DynamoRIO Overview
2:40-3:00  Examples, Part 1
3:00-3:15  Break
3:15-4:00  DynamoRIO API
4:00-4:45  Examples, Part 2
4:45-5:00  Feedback
DynamoRIO Tutorial August 2014
DynamoRIO History
Dynamo
HP Labs: PA-RISC late 1990s
x86 Dynamo: 2000
RIO -> DynamoRIO
MIT: 2001-2004
Prior releases
0.9.1: Jun 2002 (PLDI tutorial)
0.9.2: Oct 2002 (ASPLOS tutorial)
0.9.3: Mar 2003 (CGO tutorial)
0.9.4: Feb 2005
0.9.5: Apr 2008 (CGO tutorial)
1.0 (0.9.6): Sep 2008 ([Link] launch)
DynamoRIO History
Determina
2003-2007
Security company
VMware
Acquired Determina (and DynamoRIO) in 2007
Open-source BSD license
Feb 2009: 1.3.1 release
Dec 2013: 4.2.0 release
Aug 2014: 5.0.0 release
DynamoRIO Overview
Typical Modern Application: IIS
System Virtualization
[Diagram: several processes, each containing multiple threads, running on top of the operating system]
Process Virtualization
[Diagram: processes and their threads on top of the operating system, with code blocks (BC, D, E) shown inside individual processes]
Design Goals
Efficient
Near-native performance
Transparent
Match native behavior
Comprehensive
Control every instruction, in any application
Customizable
Adapt to satisfy disparate tool needs
Challenges of Real-World Apps
Multiple threads
Synchronization
Application introspection
Reading of return address
Transparency corner cases are the norm
Example: access beyond top of stack
Scalability
Must adapt to varying code sizes, thread counts, etc.
Dynamically generated code
Performance challenges
Overview Outline
Efficient
Software code cache overview
Thread-shared code cache
Transparent
Comprehensive
Customizable
Direct Code Modification
e9 37 6f 48 92            jmp  <callout>
Kernel32!TerminateProcess:
7d4d1028  7c 05           jl   7d4d102f
7d4d102a  33 c0           xor  %eax,%eax
7d4d102c  40              inc  %eax
7d4d102d  eb 08           jmp  7d4d1037
7d4d102f  50              push %eax
7d4d1030  e8 ed 7c 00 00  call 7d4d8d22
Debugger Trap Too Expensive
cc                        int3 (breakpoint)
Kernel32!TerminateProcess:
7d4d1028  7c 05           jl   7d4d102f
7d4d102a  33 c0           xor  %eax,%eax
7d4d102c  40              inc  %eax
7d4d102d  eb 08           jmp  7d4d1037
7d4d102f  50              push %eax
7d4d1030  e8 ed 7c 00 00  call 7d4d8d22
We Need Indirection
Avoid transparency and granularity limitations of directly
modifying application code
Allow arbitrary modifications at unrestricted points in code
stream
Allow systematic, fine-grained modifications to code stream
Guarantee that all code is observed
Basic Interpreter
[Diagram: application code (foo(), bar()) run through an interpreter loop: fetch, decode, execute]
~300x Slowdown!
Improvement #1: Interpreter + Basic Block Cache
[Diagram: DynamoRIO copies application code (foo(), bar()) one basic block at a time into a basic block cache and executes it from there]
Slowdown: 300x -> 25x
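The benefit of caching can be illustrated with a toy dispatcher. This is only a sketch, not DynamoRIO's implementation: the names `bb_cache_t`, `dispatch`, and `run_loop`, the fixed-size direct-mapped table, and the counters are all assumptions made for illustration, and "translation" is simulated by incrementing a counter.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 64  /* hypothetical size; a real cache grows dynamically */

typedef struct {
    unsigned long tag[CACHE_SLOTS];  /* application address of each cached block */
    int valid[CACHE_SLOTS];
    int translations;  /* blocks that had to be decoded and copied */
    int hits;          /* executions that reused an existing copy */
} bb_cache_t;

/* Simulates one trip through the dispatcher for the block at app address pc. */
static void dispatch(bb_cache_t *c, unsigned long pc)
{
    size_t slot = (size_t)(pc % CACHE_SLOTS);
    assert(c != NULL);
    if (c->valid[slot] && c->tag[slot] == pc) {
        c->hits++;        /* a cached copy exists: execute it directly */
        return;
    }
    c->tag[slot] = pc;    /* miss: "translate" the block once and cache it */
    c->valid[slot] = 1;
    c->translations++;
}

/* Executes a loop whose body consists of the n blocks in pcs[], iters times. */
static void run_loop(bb_cache_t *c, const unsigned long *pcs, size_t n, int iters)
{
    for (int i = 0; i < iters; i++)
        for (size_t j = 0; j < n; j++)
            dispatch(c, pcs[j]);
}
```

For a loop body of three blocks executed 100 times, only the first iteration pays the translation cost; every later execution is a cache hit.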
Example Basic Block Fragment
Original application code:
       add  %eax, %ecx
       cmp  $4, %eax
       jle  $0x40106f

Fragment in the code cache:
frag7: add  %eax, %ecx
       cmp  $4, %eax
       jle  <stub0>
       jmp  <stub1>
stub0: mov  %eax, eax-slot
       mov  &dstub0, %eax
       jmp  context_switch
stub1: mov  %eax, eax-slot
       mov  &dstub1, %eax
       jmp  context_switch

dstub0  target: 0x40106f
dstub1  target: fall-thru
Improvement #2: Linking Direct Branches
[Diagram: basic blocks in the basic block cache are linked directly to one another, so direct branches no longer return to DynamoRIO]
Slowdown: 300x -> 25x -> 3x
Direct Linking
Original application code:
       add  %eax, %ecx
       cmp  $4, %eax
       jle  $0x40106f

Fragment in the code cache, with the taken branch linked to frag8:
frag7: add  %eax, %ecx
       cmp  $4, %eax
       jle  <frag8>
       jmp  <stub1>
stub0: mov  %eax, eax-slot
       mov  &dstub0, %eax
       jmp  context_switch
stub1: mov  %eax, eax-slot
       mov  &dstub1, %eax
       jmp  context_switch

dstub0  target: 0x40106f
dstub1  target: fall-thru
Improvement #3: Linking Indirect Branches
[Diagram: an indirect branch lookup routine is added next to the basic block cache, so indirect branches are resolved without returning to DynamoRIO]
Slowdown: 300x -> 25x -> 3x -> 1.2x
Indirect Branch Transformation
Original application code:
       ret

Fragment in the code cache:
frag8: mov  %ecx, ecx-slot
       pop  %ecx
       jmp  <ib_lookup>

ib_lookup: ...
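The lookup that `ib_lookup` performs can be sketched as a hashtable from application addresses to code cache addresses. The sketch below is an illustration under stated assumptions, not DynamoRIO's actual table: the names `ibl_table_t`, `ibl_add`, and `ibl_lookup`, the table size, and the choice of open addressing with linear probing are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IBL_SIZE 256  /* hypothetical size; must be a power of two */

typedef struct {
    uintptr_t app_pc;    /* application target address (0 marks an empty slot) */
    uintptr_t cache_pc;  /* address of the matching fragment in the code cache */
} ibl_entry_t;

typedef struct { ibl_entry_t slot[IBL_SIZE]; } ibl_table_t;

static size_t ibl_hash(uintptr_t pc) { return (size_t)(pc & (IBL_SIZE - 1)); }

/* Records that app_pc has a fragment at cache_pc. Returns 0 if the table is full. */
static int ibl_add(ibl_table_t *t, uintptr_t app_pc, uintptr_t cache_pc)
{
    size_t i = ibl_hash(app_pc);
    assert(app_pc != 0);
    for (size_t probes = 0; probes < IBL_SIZE; probes++) {
        if (t->slot[i].app_pc == 0 || t->slot[i].app_pc == app_pc) {
            t->slot[i].app_pc = app_pc;
            t->slot[i].cache_pc = cache_pc;
            return 1;
        }
        i = (i + 1) & (IBL_SIZE - 1);  /* linear probing on collision */
    }
    return 0;
}

/* Returns the code cache address for app_pc, or 0 on a miss. A miss is the
 * case in which control would have to return to the dispatcher. */
static uintptr_t ibl_lookup(const ibl_table_t *t, uintptr_t app_pc)
{
    size_t i = ibl_hash(app_pc);
    for (size_t probes = 0; probes < IBL_SIZE; probes++) {
        if (t->slot[i].app_pc == app_pc)
            return t->slot[i].cache_pc;
        if (t->slot[i].app_pc == 0)
            return 0;
        i = (i + 1) & (IBL_SIZE - 1);
    }
    return 0;
}
```

Each executed indirect branch pays for the hash, the comparison, and the extra memory traffic of this table, which is where the remaining overhead of indirect branches comes from.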
Improvement #4: Trace Building
[Diagram: a control-flow graph of basic blocks (A, B, D, E, F, G, H, J, K) in the basic block cache, and a trace in the trace cache]
Traces reduce branching, improve layout and locality, and
facilitate optimizations across blocks
We avoid indirect branch lookup
Next Executing Tail (NET) trace building scheme [Duesterwald 2000]
Incremental NET Trace Building
[Diagram: starting from block G in the basic block cache, the trace in the trace cache grows incrementally: G, then G K, then G K J]
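The idea behind NET can be sketched with a single trace head and a counter: once the head has executed often enough, the blocks that execute next are recorded as the trace. This is a simplified illustration, not DynamoRIO's trace builder. The names `net_state_t` and `net_observe`, the threshold value, and the rule that the trace ends when execution returns to the head are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define HOT_THRESHOLD 3  /* hypothetical; kept small for the example */
#define MAX_TRACE 16     /* hypothetical cap on blocks per trace */

typedef struct {
    int head_count;             /* executions of the trace head so far */
    int recording;              /* nonzero while the tail is being recorded */
    int done;                   /* nonzero once the trace is complete */
    size_t len;                 /* number of blocks recorded */
    char trace[MAX_TRACE + 1];  /* recorded block labels, NUL-terminated */
} net_state_t;

/* Feeds one executed basic block (identified by a label) to the trace builder. */
static void net_observe(net_state_t *s, char head, char block)
{
    assert(s != NULL);
    if (s->done)
        return;
    if (s->recording) {
        if (block == head || s->len == MAX_TRACE) {
            /* Back at the head (a backward branch) or out of room: stop. */
            s->recording = 0;
            s->done = 1;
            return;
        }
        s->trace[s->len++] = block;
        s->trace[s->len] = '\0';
        return;
    }
    if (block == head && ++s->head_count >= HOT_THRESHOLD) {
        /* The head is hot: record whatever executes next, head first. */
        s->recording = 1;
        s->trace[s->len++] = head;
        s->trace[s->len] = '\0';
    }
}
```

Feeding the block sequence G K J G K J G K J G to `net_observe` with head `'G'` leaves the three blocks G, K, J in `trace`, mirroring the incremental growth shown on the slide.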
Improvement #4: Trace Building
[Diagram: a trace cache is added alongside the basic block cache; an inlined check tests whether an indirect branch stays on the trace before falling back to the indirect branch lookup]
Slowdown: 300x -> 25x -> 3x -> 1.2x -> 1.1x
Base Performance
[Chart: base performance on SPEC CPU2000, server, and desktop benchmarks]
Base Performance: SPEC 2006
Sources of Overhead
Extra instructions
Indirect branch target comparisons
Indirect branch hashtable lookups
Extra data cache pressure
Indirect branch hashtable
Branch mispredictions
ret becomes jmp*
Application code modification
Time Breakdown for SPEC CPU INT
[Diagram: share of execution time spent in each part of the system (DynamoRIO, basic block cache, trace cache, on-trace indirect branch check, indirect branch lookup); percentages shown on the figure: < 1%, 2%, 4%, 0%, 94%]
Not An Ordinary Application
An application executing in DynamoRIO's code cache looks
different from what the underlying hardware has been tuned
for
The hardware expects:
Little or no dynamic code modification
Writes to code are expensive
call and ret instructions
Return Stack Buffer predictor
Performance Counter Data
Overview Outline
Efficient
Software code cache overview
Thread-shared code cache
Transparent
Comprehensive
Customizable
Threading Model
[Diagram: three layers (Running Program, Code Caching Runtime System, Operating System), each with the same threads Thread1 through ThreadN]
Code Space
[Diagram: the running program's threads mapped either to thread-private code caches or to one thread-shared code cache, above the operating system threads]
Thread-Private versus Thread-Shared
Thread-private
Less synchronization needed
Absolute addressing for thread-local storage
Thread-specific optimization and instrumentation
Thread-shared
Scales to many-threaded apps
Database and Web Server Suite
Benchmark  | Server                                | Processes
ab low     | IIS low isolation                     | [Link]
ab med     | IIS medium isolation                  | [Link], [Link]
guest low  | IIS low isolation, SQL Server 2000    | [Link], [Link]
guest med  | IIS medium isolation, SQL Server 2000 | [Link], [Link], [Link]
Memory Impact
[Chart: memory impact for the ab med, guest low, and guest med benchmarks]
Performance Impact
Scalability Limit
Overview Outline
Efficient
Transparent
Transparency principles
Cache consistency
Synchronization
Comprehensive
Customizable
Unavoidably Intrusive
[Diagram: DynamoRIO located inside an application process, alongside the application's code blocks (BC, D, E) and threads, above the operating system]
Transparency
Do not want to interfere with the semantics of the program
Dangerous to make any assumptions about:
Register usage
Calling conventions
Stack layout
Memory/heap usage
I/O and other system call use
Painful, But Necessary
Difficult and costly to handle corner cases
Many applications will not notice, but some will!
Microsoft Office: Visual Basic generated code, stack convention
violations
COM, Star Office, MMC: trampolines
Adobe Premiere: self-modifying code
VirtualDub: UPX-packed executable
etc.
Transparency Principles
Principle 1: As few changes as possible
Set a high bar for value before changing the native environment
Principle 2: Hide necessary changes
Whatever is valuable enough to change must be hidden
Changes that cannot be hidden should not be made
Principle 3: Separate resources
Avoid intra-process resource conflicts
Bruening et al., "Transparent Dynamic Instrumentation", VEE '12
Principle 1: As few changes as possible
Application code
Executable on disk
Stored addresses
Threads
Application data
Including the stack!
Return Address Transparency
[Chart: return address transparency results for SPEC CPU2000, server, and desktop benchmarks]
Principle 2: Hide necessary changes
Application addresses
Address space
Error transparency
Code cache consistency
Principle 3: Separate resources
[Diagrams: resource separation on Linux and on Windows]
Arbitrary Interleaving
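[Diagram: the application's call to malloc() can be interleaved at any point with DynamoRIO's own execution (basic block cache, trace cache, indirect branch lookup); DynamoRIO code must therefore be thread-safe and must not rely on the application's routines being re-entrant]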
Transparency Landscape
            | Principle 1: As few changes as possible | Principle 2: Hide necessary changes | Principle 3: Separate resources
Code        | application code, stored addresses      | machine context, cache consistency  |
Data        | stack, heap, registers, condition flags |                                     | separate stack, heap, context, i/o
Concurrency | threads, memory ordering                |                                     | disjoint locks
Other       |                                         | preserve errors                     |
Overview Outline
Efficient
Transparent
Transparency principles
Cache consistency
Synchronization
Comprehensive
Customizable
Code Change Mechanisms
[Diagram: I-cache and D-cache contents (lines A-D) on RISC and on x86]
RISC: Store B, Flush B, Jump B
x86: Store B, Jump B
How Often Does Code Change?
Not just modification of code!
Removal of code
Shared library unloading
Replacement of code
JIT region re-use
Trampoline on stack
Code Change Events
Columns: Memory Unmappings, Generated Code Regions, Modified Code Regions
(values listed in the order they appear on the slide; empty cells are not shown)
SPECFP      112
SPECINT     29
SPECJVM     3373  4591
Excel       144  21  20
Photoshop   1168  40
Powerpoint  367  28  33
Word        345  20
Detecting Code Removal
Example: shared library being unloaded
Requires explicit request by application to operating system
Detect by monitoring system calls (munmap, NtUnmapViewOfSection)
Detecting Code Modification
On x86, no explicit app request is required: the icache is kept
consistent in hardware, so any memory write could modify code!
[Diagram: x86 I-cache and D-cache contents (lines A-D); sequence: Store B, Jump B]
Page Protection Plus Instrumentation
Invariant: application code copied to code cache must be
read-only
If writable, hide read-only status from application
Some code cannot or should not be made read-only
Self-modifying code
Windows stack
Code on a page with frequently written data
Use per-fragment instrumentation to ensure code is not stale
on entry and to catch self-modification
Adaptive Consistency Algorithm
Use page protection by default
Most code regions are always read-only
Subdivide written-to regions to reduce flushing cost of write-execute cycle
Large read-only regions, small written-to regions
Switch to instrumentation if write-execute cycle repeats too
often (or on same page)
Switch back to page protection if writes decrease
Bruening et al., "Maintaining Consistency and Bounding Capacity
of Software Code Caches", CGO '05
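The switching rule can be sketched as a small per-region state machine. This is an illustration only: the names `region_t`, `region_write_execute_cycle`, and `region_end_window`, the threshold `CYCLE_LIMIT`, and the notion of a fixed observation window are assumptions, not details taken from DynamoRIO.

```c
#include <assert.h>

#define CYCLE_LIMIT 4  /* hypothetical threshold for "repeats too often" */

typedef enum { POLICY_PAGE_PROTECT, POLICY_INSTRUMENT } policy_t;

typedef struct {
    policy_t policy;  /* current consistency mechanism for this region */
    int cycles;       /* write-execute cycles seen in the current window */
} region_t;

/* Called when code in the region executes again after having been written. */
static void region_write_execute_cycle(region_t *r)
{
    assert(r != 0);
    if (r->policy == POLICY_PAGE_PROTECT && ++r->cycles >= CYCLE_LIMIT)
        r->policy = POLICY_INSTRUMENT;  /* too many flushes: check in-line instead */
}

/* Called at the end of an observation window with the writes seen during it. */
static void region_end_window(region_t *r, int writes_in_window)
{
    assert(writes_in_window >= 0);
    if (r->policy == POLICY_INSTRUMENT && writes_in_window == 0)
        r->policy = POLICY_PAGE_PROTECT;  /* writes stopped: back to the cheap default */
    r->cycles = 0;
}
```

A region starts under page protection, moves to instrumentation after `CYCLE_LIMIT` write-execute cycles, and returns to page protection after a window with no writes.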
Overview Outline
Efficient
Transparent
Transparency principles
Cache consistency
Synchronization
Comprehensive
Customizable
Synchronization Transparency
Application thread management should not interfere with the
runtime system, and vice versa
Cannot allow the app to suspend a thread holding a runtime
system lock
Runtime system cannot use app locks
Disjoint Locks
App thread suspension requires safe spots where no runtime
system locks are held
Time spent in the code cache can be unbounded
Our invariant: no runtime system lock can be held while
executing in the code cache
Overview Outline
Efficient
Transparent
Comprehensive
Customizable
Above the Operating System
[Diagram: DynamoRIO + client located inside an application process, alongside the application's code blocks (BC, D, E) and threads, above the operating system]
Kernel-Mediated Control Transfers
[Timeline diagram, user mode versus kernel mode: message pending -> save user context -> message handler -> no message pending -> restore context. Callout: the majority of executed code in a typical Windows application]
Intercepting Linux Signals
[Timeline diagram, user mode versus kernel mode: register our own signal handler; signal pending -> save user context -> DynamoRIO handler -> signal handler -> no signal pending -> restore context]
Windows Messages
[Timeline diagram, user mode versus kernel mode: message pending -> save user context -> dispatcher -> message handler -> no message pending -> restore context]
Intercepting Windows Messages
[Timeline diagram, user mode versus kernel mode: modify shared library memory image; message pending -> save user context -> dispatcher -> dispatcher -> message handler -> no message pending -> restore context]
Must Monitor System Calls
To maintain control:
Calls that affect the flow of control: register signal handler, create
thread, set thread context, etc.
To maintain transparency:
Queries of modified state that the app should not see
To maintain cache consistency:
Calls that affect the address space
To support cache eviction:
Interruptible system calls must be redirected
Operating System Dependencies
System calls and their numbers
Monitor application's usage, as well as for our own resource
management
Windows changes the numbers each major release
Details of kernel-mediated control flow
Must emulate how kernel delivers events
Initial injection
Once in, follow child processes
Overview Outline
Efficient
Transparent
Comprehensive
Customizable
Clients
Building and Deploying Tools
DynamoRIO + Client
[Diagram: client code (the Tool) sits beside DynamoRIO, between the application code (foo(), bar()) and the basic block cache, trace cache, and indirect branch lookup]
Demo
Clients
The engine exports an API for building a client
System details abstracted away: client focuses on manipulating
the code stream
Provided Tools
In addition to numerous sample clients, DynamoRIO ships
with the following polished end-user tools:
Dr. Memory (drmemory): identifies memory errors (use-after-frees, buffer overflows, uninitialized reads, memory leaks, etc.)
Dr. Cov (drcov): code coverage tool
Dr. Strace (drstrace): system call tracer for Windows
Dr. Ltrace (drltrace): library call tracer
Run these tools via drrun -t <toolname>
E.g.: bin32/drrun -t drmemory -- <app cmdline>
Cross-Platform Clients
DynamoRIO API presents a consistent interface that works
across platforms
Windows versus Linux
32-bit versus 64-bit
Thread-private versus thread-shared
Same client source code generally works on all combinations
of platforms
Some exceptions, noted in the documentation
Building a Client
Include DR API header file
#include "dr_api.h"
Set platform defines
WINDOWS or LINUX
X86_32 or X86_64
Export a dr_init function
DR_EXPORT void dr_init (client_id_t client_id)
Build a shared library
Auto-Configure Using CMake
add_library(myclient SHARED myclient.c)
find_package(DynamoRIO)
if (NOT DynamoRIO_FOUND)
  message(FATAL_ERROR "DynamoRIO package required to build")
endif(NOT DynamoRIO_FOUND)
configure_DynamoRIO_client(myclient)
CMake
Build system converted to CMake when open-sourced
Switch from frozen toolchain to supporting range of tools
CMake generates build files for native compiler of choice
Makefiles for UNIX, nmake, etc.
Visual Studio project files
[Link]
DynamoRIO Extensions
DynamoRIO API is extended via libraries called Extensions
Both static and shared supported
Built and packaged with DynamoRIO
Easy for a client to use
use_DynamoRIO_extension(myclient extensionname)
DynamoRIO Extensions, Cont'd
Current Extensions:
drsyms: symbol lookup (currently Windows-only)
drcontainers: hashtable
drmgr: multi-instrumentation mediation
drwrap: function wrapping and replacing
drutil: memory tracing, string loop expansion
drx: multi-process management, misc utilities
drsyscall: system call names, numbers, parameter types
Coming soon:
drreg: register stealing and allocating
Umbra: shadow memory framework
Your utility library or framework contribution!
Application Configuration
File-based scheme
Per-user local files
$HOME/.dynamorio/ on Linux
$USERPROFILE/dynamorio/ on Windows
Global files
/etc/dynamorio/ on Linux
Registry-specified directory on Windows
Files are lists of var=value
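Reading one entry of such a var=value list can be sketched as follows. The function name `parse_config_line` and its in-place splitting behavior are assumptions for illustration; this is not DynamoRIO's own parser and makes no claim about which variable names the real files contain.

```c
#include <assert.h>
#include <string.h>

/* Splits a "var=value" line in place. On success, *var and *value point into
 * line and 1 is returned. Returns 0 if there is no '=' or the name is empty. */
static int parse_config_line(char *line, char **var, char **value)
{
    char *eq;
    size_t n;
    assert(line != NULL && var != NULL && value != NULL);
    eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return 0;
    *eq = '\0';        /* terminate the variable name */
    *var = line;
    *value = eq + 1;
    n = strlen(*value);
    if (n > 0 && (*value)[n - 1] == '\n')
        (*value)[n - 1] = '\0';  /* drop a trailing newline left by fgets() */
    return 1;
}
```

Only the first `=` separates name from value, so a value may itself contain `=` characters.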
Deploying Clients
One-step configure-and-run usage model:
drrun -c <client> <client options> -- <app cmdline>
Uses an invisible temporary first-priority one-time config file
Two-step usage model giving more control over children:
drconfig -reg <appname> -c <client> <client options>
drinject <app cmdline>
Systemwide injection:
drconfig -syswide_on -reg <appname> -c <client> <options>
<run app normally>
Polished tool invocation:
drrun -t <tool> <tool options> -- <app cmdline>
Uses a pre-existing tool config file
Deploying Clients On Linux
drrun and drinject scripts: LD_PRELOAD-based
Take over after statically-dependent shared libs but before exe
Suid apps ignore LD_PRELOAD
Place [Link]'s full path in /etc/[Link]
Copy [Link] to /usr/lib
In the future:
Attach
Earliest injection
Deploying Clients On Windows
drinject and drrun injection
Currently after all shared libs are initialized
From-parent injection
Early: before any shared libs are loaded
Systemwide injection via -syswide_on
Requires administrative privileges
Launch app normally: no need to run via drinject/drrun
Moderately early: during [Link] initialization
In the future:
Earliest injection for drrun/drinject and from-parent
Following Child Processes
Runtime option -follow_children
Default on: follow all children
Whitelist
-no_follow_children and configure files for whitelist
Blacklist
-follow_children and configure files with -norun for blacklist
drconfig -norun to create do-not-follow config file
Non-Standard Deployment
drdecode
Static IA-32/AMD64 decoding/encoding/instruction manipulation
library
Standalone API
Use DynamoRIO as a library of IA-32/AMD64 manipulation
routines plus cross-platform file i/o, locks, etc.
Start/Stop API
Can mark in the source code where DynamoRIO should
control the application
Runtime Options
Pass options to drconfig/drrun
A large number of options; the most relevant are:
-code_api
-c <client lib> <client options>
-thread_private
-follow_children
-opt_cleancall
-tracedump_text and -tracedump_binary
-prof_pcs
Runtime Options For Debugging
Notifications:
-stderr_mask 0xN
-msgbox_mask 0xN
Windows:
-no_hide
Debug-build-only:
-loglevel N
-ignore_assert_list *
Examples, Part 1
DynamoRIO API
DynamoRIO API Outline
Building and Deploying
Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting
DynamoRIO + Client
[Diagram: client code (the Tool) sits beside DynamoRIO, between the application code (foo(), bar()) and the basic block cache, trace cache, and indirect branch lookup]
Transformation Time vs Execution Time
[Diagram: the client sees application code at transformation time, as DynamoRIO copies it into the caches (example: computing average instruction length); code the client inserts runs at execution time inside the basic block and trace caches (example: counting call instruction executions)]
Client Events: Code Stream
Client has opportunity to inspect and potentially modify every
single application instruction, immediately before it executes
Event happens at transformation time
Modifications or inserted code will operate at execution time
Entire application code stream
Basic block creation event: can modify the block
For comprehensive instrumentation tools
Or, focus on hot code only
Trace creation event: can modify the trace
Custom trace creation: can determine trace end condition
For optimization and profiling tools
Simplifying Client View
Several optimizations disabled
Elision of unconditional branches
Indirect call to direct call conversion
Shared cache sizing
Process-shared and persistent code caches
Future release will give client control over optimizations
Basic Block Event
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag,
                  instrlist_t *bb, bool for_trace,
                  bool translating) {
    instr_t *inst;
    for (inst = instrlist_first(bb);
         inst != NULL;
         inst = instr_get_next(inst)) {
        /* ... */
    }
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_bb_event(event_basic_block);
}
Trace Event
static dr_emit_flags_t
event_trace(void *drcontext, void *tag,
            instrlist_t *trace, bool translating) {
    instr_t *inst;
    for (inst = instrlist_first(trace);
         inst != NULL;
         inst = instr_get_next(inst)) {
        /* ... */
    }
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_trace_event(event_trace);
}
Client Events: Application Actions
Application thread creation and deletion
Application library load and unload
Application exception (Windows)
Client chooses whether to deliver or suppress
Application signal (Linux)
Client chooses whether to deliver, suppress, bypass the app
handler, or redirect control
Client Events: Application System Calls
Application pre- and post- system call
Platform-independent system call parameter access
Client can modify:
Return value in post-, or set value and skip syscall in pre-
Call number
Params
Client can invoke an additional system call as the app
Client Events: Bookkeeping
Initialization and Exit
Entire process
Each thread
Child of fork (Linux-only)
Basic block and trace deletion during cache management
Nudge received
Used for communication into client
Itimer fired (Linux-only)
Multiple Clients
It is each client's responsibility to ensure compatibility with
other clients
Instruction stream modifications made by one client are visible to
other clients
At client registration each client is given a priority
dr_init() called in priority order (priority 0 called first and thus
registers its callbacks first)
Event callbacks called in reverse order of registration
Gives precedence to first registered callback, which is given the
final opportunity to modify the instruction stream or influence
DynamoRIO's operation
drmgr Extension provides mediation among multiple
components
DynamoRIO API Outline
Building and Deploying
Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting
DynamoRIO API: General Utilities
DynamoRIO provides safe utilities for transparency support
Separate stack
Separate memory allocation
Separate file I/O
Utility options
Use DynamoRIO-provided utilities directly
Use shared libraries via DynamoRIO private loader
Malloc, etc. redirected to DynamoRIO-provided utilities
Use static libraries with dependencies redirected
Risky for client to directly invoke system calls
DynamoRIO Heap
Three flavors:
Thread-private: no synchronization; thread lifetime
Global: synchronized, process lifetime
Non-heap: for generated code, etc.
No header on allocated memory: low overhead but must pass
size on free
Leak checking
Debug build complains at exit if memory was not deallocated
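The "no header, pass size on free" contract and the exit-time leak check can be sketched with a thin wrapper. This is not DynamoRIO's allocator: `sized_heap_t`, `sized_alloc`, `sized_free`, and `sized_heap_has_leaks` are hypothetical names, and the standard `malloc`/`free` stand in for the real heap.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    size_t outstanding;  /* bytes allocated and not yet freed */
} sized_heap_t;

/* Allocates size bytes. No per-allocation header is kept by this wrapper,
 * so the caller must remember the size. */
static void *sized_alloc(sized_heap_t *h, size_t size)
{
    void *p = malloc(size);  /* stand-in for the real allocator */
    if (p != NULL)
        h->outstanding += size;
    return p;
}

/* Frees p. The caller supplies the size because the heap does not store it. */
static void sized_free(sized_heap_t *h, void *p, size_t size)
{
    assert(size <= h->outstanding);
    h->outstanding -= size;
    free(p);
}

/* What a debug build could check at exit: anything still allocated is a leak. */
static int sized_heap_has_leaks(const sized_heap_t *h)
{
    return h->outstanding != 0;
}
```

The caller trades a little bookkeeping (remembering each size) for lower per-allocation overhead.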
Thread Support
Thread support
Thread-local storage
Callback-local storage
Simple mutexes
Read-write locks
Thread-private code caches, if requested
Sideline support
Create new client-only thread
Thread-private itimer (Linux-only)
Suspend and resume all other threads
Cannot hold locks while suspending
Thread-Local Storage (TLS)
Absolute addressing
Thread-private only
Application stack
Not reliable or transparent
Stolen register
Performance hit
Segment
Best solution for thread-shared
Callback-Local Storage (CLS)
[Timeline diagram, user mode versus kernel mode: message pending -> save user context -> dispatcher -> message handler -> no message pending -> restore context]
Callback-Local Storage (CLS)
Windows callbacks interrupt execution to process an event
and later resume the suspended context
TLS data from the suspended context will be overwritten
during callback execution
CLS data is saved at the interruption point and restored at the
resumption point
Whenever keeping persistent data specific to one context
rather than overall execution, use CLS instead of TLS
Usually only needed when storing data specific to a system call
in pre-syscall event and reading it back in post-syscall event
Can be used for Linux signals as well
Provided by the drmgr Extension
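The save-and-restore behavior of CLS can be modeled as a stack of copies of a field, one per nested context. This is a conceptual sketch only and does not use the drmgr API: `cls_field_t`, `cls_set`, `cls_get`, `cls_callback_enter`, `cls_callback_exit`, and the nesting limit are hypothetical.

```c
#include <assert.h>

#define MAX_CALLBACK_DEPTH 8  /* hypothetical nesting limit */

typedef struct {
    long slot[MAX_CALLBACK_DEPTH];  /* one copy of the field per nested context */
    int depth;                      /* index of the context currently running */
} cls_field_t;

static void cls_set(cls_field_t *f, long v) { f->slot[f->depth] = v; }
static long cls_get(const cls_field_t *f)   { return f->slot[f->depth]; }

/* A callback interrupts the current context: switch to a fresh copy. */
static void cls_callback_enter(cls_field_t *f)
{
    assert(f->depth + 1 < MAX_CALLBACK_DEPTH);
    f->depth++;
    f->slot[f->depth] = 0;
}

/* The callback returns: the interrupted context's value is visible again. */
static void cls_callback_exit(cls_field_t *f)
{
    assert(f->depth > 0);
    f->depth--;
}
```

A value stored before a system call survives a callback that stores its own value in the same field, which plain TLS would not guarantee.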
DynamoRIO API: General Utilities, Cont'd
Communication
Nudges: ping from external process
File creation, reading, and writing
File descriptor isolation on Linux
Safe read/write
Fault-proof read/write routines
Try/except facility
DynamoRIO API: General Utilities, Cont'd
Application inspection
Address space querying
Module iterator
Processor feature identification
Symbol lookup
Function replacing and wrapping
Symbol Table Access
The drsyms Extension provides access to symbol tables and
debug information
Currently supports the following:
Windows PDB
Linux ELF + DWARF2
Windows PECOFF + DWARF2
API includes:
Address to symbol and line information
Symbol to address
Symbol enumeration and searching
Symbol demangling
Symbol types
Function Replacing and Wrapping
drwrap Extension provides function replacing and wrapping
Use dr_get_proc_address() to find library exports or drsyms
Extension to find internal functions
Function replacing swaps in a replacement function that executes as application code
Function wrapping calls pre and post callbacks that execute
as client code around the target application function
Arguments, return value, and whether the function is executed
can all be examined and controlled
Third-Party Libraries
Private loader inside DynamoRIO will load any external
shared libraries a client imports from
Loads a duplicate copy of each library and tries to isolate from
the application's copy
On Windows, private loader does not support locating SxS
libraries, so use static libc with VS2005 or VS2008
C++ clients are built normally
C clients by default do not link with libc
Set DynamoRIO_USE_LIBC variable prior to invoking
configure_DynamoRIO_client() to use libc with a C client
Private Libraries
Private loader on Windows
Not easy to fully isolate system data structures
PEB and key TEB fields are isolated
Some libraries like [Link] are shared
To examine application state while in client code, use
dr_switch_to_app_state()
Private loader on Linux
Isolation is simpler and more complete
Optimal Transparency
For best transparency: completely self-contained client
Imports only from DynamoRIO API
-nodefaultlibs or /nodefaultlib
Alternatives to dynamic libc on Windows:
String and utility routines provided by forwards to ntdll
ntdll contains mini-libc
[Link] /MT static copy of C/C++ libraries
Alternatives to dynamic libc on Linux:
For static C/C++ lib, use ld --wrap to redirect malloc to DR's heap
Newer distributions don't ship suitable static C/C++ lib
DynamoRIO API Outline
Building and Deploying
Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting
DynamoRIO API: Instruction Representation
Full IA-32/AMD64 instruction representation
Instruction creation with auto-implicit-operands
Operand iteration
Instruction lists with iteration, insertion, removal
Decoding at various levels of detail
Encoding
Instruction Representation
raw bytes          opcode  operands                  eflags
8d 34 01           lea     (%ecx,%eax,1) -> %esi
8b 46 0c           mov     0xc(%esi) -> %eax
2b 46 1c           sub     0x1c(%esi) %eax -> %eax   WCPAZSO
0f b7 4e 08        movzx   0x8(%esi) -> %ecx
c1 e1 07           shl     $0x07 %ecx -> %ecx        WCPAZSO
3b c1              cmp     %eax %ecx                 WCPAZSO
0f 8d a2 0a 00 00  jnl     $0x77f52269               RSO
Instruction Representation
raw bytes          opcode  operands                  eflags
                   lea     (%ecx,%eax,1) -> %edi
                   mov     0xc(%edi) -> %eax
                   sub     0x1c(%edi) %eax -> %eax   WCPAZSO
                   movzx   0x8(%edi) -> %ecx
c1 e1 07           shl     $0x07 %ecx -> %ecx        WCPAZSO
3b c1              cmp     %eax %ecx                 WCPAZSO
0f 8d a2 0a 00 00  jnl     $0x77f52269               RSO
Instruction Creation
Method 1: use the INSTR_CREATE_opcode macros that fill
in implicit operands automatically:
instr_t *instr = INSTR_CREATE_dec(dcontext,
opnd_create_reg(DR_REG_EDX));
Method 2: specify opcode + all operands (including implicit
operands):
instr_t *instr = instr_create(dcontext);
instr_set_opcode(instr, OP_dec);
instr_set_num_opnds(dcontext, instr, 1, 1);
instr_set_dst(instr, 0, opnd_create_reg(DR_REG_EDX));
instr_set_src(instr, 0, opnd_create_reg(DR_REG_EDX));
Linear Control Flow
Both basic blocks and traces are
linear
Instruction sequences are all
single-entrance, multiple-exit
Greatly simplifies analysis
algorithms
64-Bit Versus 32-Bit
32-bit build of DynamoRIO only handles 32-bit code
64-bit build of DynamoRIO decodes/encodes both 32-bit and
64-bit code
Current release does not support executing applications that mix
the two
IR is universal: covers both 32-bit and 64-bit
Abstracts away underlying mode
64-Bit Thread and Instruction Modes
When going to or from the IR, the thread mode and instruction
mode determine how instrs are interpreted
When decoding, the current thread's mode is used
Default is 64-bit for 64-bit DynamoRIO
Can be changed with set_x86_mode()
When encoding, that instruction's mode is used
When created, set to mode of current thread
Can be changed with instr_set_x86_mode()
64-Bit Clients
Define X86_64 before including header files when building a
64-bit client
Convenience macros for printf formats, etc. are provided
E.g.:
printf("Pointer is " PFX "\n", p);
Use "X" macros for cross-platform registers
DR_REG_XAX is DR_REG_EAX when compiled 32-bit, and
DR_REG_RAX when compiled 64-bit
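A format macro of this kind can be sketched in standalone C. This is an illustration of the idea only, not DynamoRIO's definition of PFX: the names MY_PFX, MY_PFX_DIGITS, and format_pointer are hypothetical, and the zero-padded hex layout is an assumption.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a PFX-style macro: zero-padded hex at pointer width. */
#if UINTPTR_MAX > 0xffffffffu
# define MY_PFX "0x%016" PRIxPTR
# define MY_PFX_DIGITS 16
#else
# define MY_PFX "0x%08" PRIxPTR
# define MY_PFX_DIGITS 8
#endif

/* Formats "Pointer is <addr>" into buf and returns the character count. */
static int format_pointer(char *buf, size_t len, const void *p)
{
    assert(buf != NULL && len > 0);
    /* Adjacent string literals are concatenated, so the same call
     * compiles unchanged for 32-bit and 64-bit builds. */
    return snprintf(buf, len, "Pointer is " MY_PFX, (uintptr_t)p);
}
```

The call site never changes between builds; only the macro expansion does, which is the same property the "X" register macros give for register names.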
DynamoRIO API: Code Manipulation
Processor information
State preservation
Eflags, arith flags, floating-point state, MMX/SSE state
Spill slots, TLS, CLS
Clean calls to C code
Dynamic instrumentation
Replace code in the code cache
Branch instrumentation
Convenience routines
Processor Information
Processor type
proc_get_vendor(), proc_get_family(), proc_get_type(),
proc_get_model(), proc_get_stepping(), proc_get_brand_string()
Processor features
proc_has_feature(), proc_get_all_feature_bits()
Cache information
proc_get_cache_line_size(), proc_is_cache_aligned(),
proc_bump_to_end_of_cache_line(),
proc_get_containing_page()
proc_get_L1_icache_size(), proc_get_L1_dcache_size(),
proc_get_L2_cache_size(), proc_get_cache_size_str()
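The cache-alignment helpers come down to power-of-two arithmetic. The sketch below is a standalone model of that arithmetic, not DynamoRIO's implementation: the function names are hypothetical, and the line and page sizes are passed in explicitly, where a real client would query them (for example via proc_get_cache_line_size()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All helpers below assume their size argument is a power of two. */
static bool is_power_of_two(size_t v) { return v != 0 && (v & (v - 1)) == 0; }

/* True if addr sits on a cache-line boundary. */
static bool is_line_aligned(uintptr_t addr, size_t line_size)
{
    assert(is_power_of_two(line_size));
    return (addr & (line_size - 1)) == 0;
}

/* Rounds sz up to the next multiple of line_size (unchanged if already one). */
static size_t round_up_to_line(size_t sz, size_t line_size)
{
    assert(is_power_of_two(line_size));
    return (sz + line_size - 1) & ~(line_size - 1);
}

/* Returns the base address of the page that contains addr. */
static uintptr_t containing_page(uintptr_t addr, size_t page_size)
{
    assert(is_power_of_two(page_size));
    return addr & ~((uintptr_t)page_size - 1);
}
```

A tool might use this kind of rounding to pad per-thread counters to a full cache line and so avoid false sharing between threads.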
State Preservation
Spill slots for registers
3 fast slots, 6/14 slower slots
dr_save_reg(), dr_restore_reg(), and dr_reg_spill_slot_opnd()
From C code: dr_read_saved_reg(), dr_write_saved_reg()
Dedicated TLS field for thread-local data
dr_insert_read_tls_field(), dr_insert_write_tls_field()
From C code: dr_get_tls_field(), dr_set_tls_field()
Parallel routines for CLS fields
Arithmetic flag preservation
dr_save_arith_flags(), dr_restore_arith_flags()
Floating-point/MMX/SSE state
dr_insert_save_fpstate(), dr_insert_restore_fpstate()
Clean Calls
if (instr_is_mbr(instr)) {
    app_pc address = instr_get_app_pc(instr);
    uint opcode = instr_get_opcode(instr);
    instr_t *nxt = instr_get_next(instr);
    dr_insert_clean_call(drcontext, ilist, nxt, (void *) at_mbr,
                         false/*don't need to save fp state*/,
                         2 /* 2 parameters */,
                         /* opcode is 1st parameter */
                         OPND_CREATE_INT32(opcode),
                         /* address is 2nd parameter */
                         OPND_CREATE_INTPTR(address));
}
Saved interrupted application state can be accessed using
dr_get_mcontext() and modified using dr_set_mcontext()
Clean Call Inlining
Simple clean callees will be automatically optimized and
potentially inlined
-opt_cleancall runtime option controls aggressiveness
Current requirements for inlining:
Leaf routine (may call PIC get-pc thunk)
Zero or one argument
Relatively short
Compile the client with optimizations to improve clean call
optimization
Look in debug logfile for CLEANCALL to see results
Dynamic Instrumentation
Thread-shared: flush all code corresponding to application
address and then re-instrument when re-executed
Can flush from clean call, and use dr_redirect_execution() since
cannot return to potentially flushed cache fragment
Thread-private: can also replace particular fragment (does not
affect other potential copies of the source app code)
dr_replace_fragment()
Flushing the Cache
Immediately deleting or replacing individual code cache
fragments is available for thread-private caches
Only removes from that thread's cache
Two basic types of thread-shared flush:
Non-precise: remove all entry points but let target cache code be
invalidated and freed lazily
Precise/synchronous:
Suspend the world
Relocate threads inside the target cache code
Invalidate and free the target code immediately
Flushing the Cache
Thread-shared flush API routines:
dr_unlink_flush_region(): non-precise flush
dr_flush_region(): synchronous flush
dr_delay_flush_region():
No action until a thread exits code cache on its own
If a completion callback is provided, synchronous once triggered
Without a callback, non-precise
Multi-Instrumentation Mediation
The drmgr Extension provides mediation among multiple
agents for basic block instrumentation and TLS/CLS access
Divides instrumentation into four stages and orders the
callbacks for each stage:
Application-to-application transformations
Application analysis
Instrumentation insertion
Instrumentation optimization
Enables multi-library frameworks and modular clients
Memory Tracing
drutil Extension provides utilities for memory address tracing:
Address acquisition
String loop expansion
DynamoRIO API Outline
Building and Deploying
Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting
DynamoRIO API: Translation
Translation refers to the mapping of a code cache machine
state (program counter, registers, and memory) to its
corresponding application state
The program counter always needs to be translated
Registers and memory may also need to be translated
depending on the transformations applied when copying into the
code cache
Translation Case 1: Fault
[Figure: two panels, each showing a user context and a faulting instr.]
Exception and signal handlers are passed machine context of
the faulting instruction.
For transparency, that context must be translated from the
code cache to the original code location
Translated location should be where the application would
have had the fault or where execution should be resumed
Translation Case 2: Relocation
If one application thread suspends another, or DynamoRIO
suspends all threads for a synchronous cache flush:
Need suspended target thread in a safe spot
Not always practical to wait for it to arrive at a safe spot (if in a
system call, e.g.)
DynamoRIO forcibly relocates the thread
Must translate its state to the proper application state at which to
resume execution
Translation Approaches
Two approaches to program counter translation:
Store mappings generated during fragment building
High memory overhead (> 20% for some applications, because it
prevents internal storage optimizations) even with highly optimized
difference-based encoding. Costly for something rarely used.
Re-create mapping on-demand from original application code
Cache consistency guarantees mean the corresponding application
code is unchanged
Requires idempotent code transformations
DynamoRIO supports both approaches
The engine mostly uses the on-demand approach, but stored
mappings are occasionally needed
Instruction Translation Field
Each instruction contains a translation field
Holds the application address that the instruction corresponds
to
Set via instr_set_translation()
Context Translation Via Re-Creation
Application code:
A1: mov %ebx, %ecx
A2: add %eax, (%ecx)
A3: cmp $4, (%eax)
A4: jle 710349fb

Code cache                  Re-created, with translations
C1: mov %ebx, %ecx          D1: (A1) mov %ebx, %ecx
C2: add %eax, (%ecx)        D2: (A2) add %eax, (%ecx)
C3: cmp $4, (%eax)          D3: (A3) cmp $4, (%eax)
C4: jle <stub0>             D4: (A4) jle <stub0>
C5: jmp <stub1>             D5: (A4) jmp <stub1>
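The lookup that re-creation enables can be sketched as a walk over the re-created instructions. This is a standalone model rather than DynamoRIO code: recreated_instr_t and translate_pc are hypothetical names, and the instruction lengths used to exercise it are assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One re-created instruction: its encoded length in the cache and the
 * application address held in its translation field. */
typedef struct {
    size_t cache_len;
    uintptr_t app_pc;
} recreated_instr_t;

/* Walks the re-created fragment, summing encoded lengths, until it reaches
 * the instruction that contains cache_offset. Returns that instruction's
 * application address, or 0 if the offset lies outside the fragment. */
static uintptr_t translate_pc(const recreated_instr_t *frag, size_t count,
                              size_t cache_offset)
{
    size_t start = 0;
    assert(frag != NULL);
    for (size_t i = 0; i < count; i++) {
        if (cache_offset < start + frag[i].cache_len)
            return frag[i].app_pc;
        start += frag[i].cache_len;
    }
    return 0;
}
```

In the example above, both C4 and C5 carry A4 as their translation, so a fault in either would be reported at the application's jle.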
Application vs. Meta Instructions
By default, instructions are treated as application instructions
Must have translations: instr_set_translation(), INSTR_XL8()
Control-flow-changing app instructions are modified to retain
DynamoRIO control and result in cache populating
Meta instructions are added instrumentation code
Not treated as part of the application (e.g., calls run natively)
Usually cannot fault, so translations not needed
Created via instr_set_meta() or instrlist_meta_append()
Meta instructions can reference application memory, or
deliberately fault
A meta instruction that might fault must contain a translation
The client should handle any such fault
Client Translation Support
Instruction lists passed to clients are annotated with
translation information
Read via instr_get_translation()
Clients are free to delete instructions, change instructions and
their translations, and add new tool and app instructions (see
dr_register_bb_event() for restrictions)
An idempotent client that restricts itself to deleting app
instructions and adding non-faulting meta instructions can ignore
translation concerns
DynamoRIO takes care of instructions added by API routines
(dr_insert_clean_call(), etc.)
Clients can choose between storing or regenerating
translations on a fragment by fragment basis.
Client Regenerated Translations
Client returns DR_EMIT_DEFAULT from its bb or trace event
callback
Client bb & trace event callbacks are re-called when
translations are needed with translating==true
Client must exactly duplicate transformations performed when
the block was generated
Client must set translation field for all added app instructions
and all meta instructions that might fault
This is true even if translating==false since DynamoRIO may
decide it needs to store translations anyway
Client Stored Translations
Client returns DR_EMIT_STORE_TRANSLATIONS from its
bb or trace event callback
Client must set translation field for all added app instructions
and all meta instructions that might fault
Client bb or trace hook will not be re-called with
translating==true
Register State Translation
Translation may be needed at a point where some registers
are spilled to memory
During indirect branch or RIP-relative mangling, e.g.
DynamoRIO walks fragment up to translation point, tracking
register spills and restores
Special handling for stack pointer around indirect calls and
returns
DynamoRIO tracks client spills and restores implicitly added
by API routines
Clean calls, etc.
Explicit spill/restore (e.g., dr_save_reg()) is the client's responsibility
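The fragment walk can be modeled in a few lines of standalone C. This is a simplified sketch under stated assumptions, not DynamoRIO's internal logic: every type and name here is hypothetical, each instruction is reduced to "spill", "restore", or "other", and stack-pointer special cases are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_REGS = 8, NUM_SLOTS = 4 };

typedef enum { KIND_OTHER, KIND_SPILL, KIND_RESTORE } instr_kind_t;

/* One cache instruction, reduced to what the walk cares about. */
typedef struct {
    instr_kind_t kind;
    int reg;  /* register spilled or restored; ignored for KIND_OTHER */
    int slot; /* spill slot used; ignored for KIND_OTHER */
} cache_instr_t;

/* Walks instructions [0, stop) and rewrites regs[] so that every register
 * still spilled at the stop point gets its application value from slots[]. */
static void restore_app_regs(const cache_instr_t *frag, size_t stop,
                             uintptr_t regs[NUM_REGS],
                             const uintptr_t slots[NUM_SLOTS])
{
    int spilled_to[NUM_REGS];
    for (int r = 0; r < NUM_REGS; r++)
        spilled_to[r] = -1; /* -1: register currently holds its app value */
    for (size_t i = 0; i < stop; i++) {
        if (frag[i].kind == KIND_OTHER)
            continue;
        assert(frag[i].reg >= 0 && frag[i].reg < NUM_REGS);
        assert(frag[i].slot >= 0 && frag[i].slot < NUM_SLOTS);
        spilled_to[frag[i].reg] =
            (frag[i].kind == KIND_SPILL) ? frag[i].slot : -1;
    }
    for (int r = 0; r < NUM_REGS; r++) {
        if (spilled_to[r] >= 0)
            regs[r] = slots[spilled_to[r]];
    }
}
```

A translation point between a spill and its restore yields the slot's value; a point after the restore leaves the register as it is.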
Client Register State Translation
If a client adds its own register spilling/restoring code or
changes register mappings it must register for the restore
state event to correct the context
The same event can also be used to fix up the application's
view of memory
DynamoRIO does not internally store this kind of translation
information ahead of time when the fragment is built
The client must maintain its own data structures
DynamoRIO API Outline
Building and Deploying
Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting
DynamoRIO versus Pin
Basic interface is fundamentally different
Pin = insert callout/trampoline only
Not so different from tools that modify the original code: Dyninst,
Vulcan, Detours
Uses code cache only for transparency
DynamoRIO = arbitrary code stream modifications
Only feasible with a code cache
Takes full advantage of power of code cache
General IA-32/AMD64 decode/encode/IR support
DynamoRIO versus Pin
Pin = insert callout/trampoline only
Pin tries to inline and optimize
Client has little control or guarantee over final performance
DynamoRIO = arbitrary code stream modifications
Client has full control over all inserted instrumentation
Result can be significant performance difference
PiPA Memory Profiler + Cache Simulator:
3.27x speedup w/ DynamoRIO vs 2.6x w/ Pin
DynamoRIO also performs callout (clean call) optimization and
inlining just like Pin for less performance-focused clients
Base Performance Comparison (No Tool)
[Two chart slides; plots not preserved in this text]
Base Memory Comparison (No Tool)
[Two chart slides; plots not preserved in this text]
BBCount Pin Tool
static int bbcount;
VOID PIN_FAST_ANALYSIS_CALL docount() { bbcount++; }
VOID Trace(TRACE trace, VOID *v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        BBL_InsertCall(bbl, IPOINT_ANYWHERE, AFUNPTR(docount),
                       IARG_FAST_ANALYSIS_CALL, IARG_END);
    }
}
int main(int argc, CHAR *argv[]) {
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_StartProgram();
    return 0;
}
Simple BBCount DynamoRIO Tool
static int bbcount;
static void docount(void) { bbcount++; }
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating) {
    /* Insert a clean call to docount() before the first instruction. */
    dr_insert_clean_call(drcontext, bb, instrlist_first(bb),
                         (void *)docount, false, 0);
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_bb_event(event_basic_block);
}
BBCount Performance Comparison: Simple Tool
[Two chart slides; plots not preserved in this text]
Optimized BBCount DynamoRIO Tool
static int global_count;
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating) {
    instr_t *instr, *first = instrlist_first(bb);
    uint flags;
    /* Our inc can go anywhere, so find a spot where flags are dead.
     * Technically this can be unsafe if app reads flags on fault =>
     * stop at instr that can fault, or supply runtime op */
    for (instr = first; instr != NULL; instr = instr_get_next(instr)) {
        flags = instr_get_arith_flags(instr);
        /* OP_inc doesn't write CF but not worth distinguishing */
        if (TESTALL(EFLAGS_WRITE_6, flags) && !TESTANY(EFLAGS_READ_6, flags))
            break;
    }
    /* No dead spot found: preserve the flags around the inc instead. */
    if (instr == NULL)
        dr_save_arith_flags(drcontext, bb, first, SPILL_SLOT_1);
    instrlist_meta_preinsert(bb, (instr == NULL) ? first : instr,
        INSTR_CREATE_inc(drcontext,
            OPND_CREATE_ABSMEM((byte *)&global_count, OPSZ_4)));
    if (instr == NULL)
        dr_restore_arith_flags(drcontext, bb, first, SPILL_SLOT_1);
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_bb_event(event_basic_block);
}
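The flag test at the heart of the loop can be tried in isolation. The sketch below is a standalone model with hypothetical mask values (not DynamoRIO's EFLAGS_READ_6 and EFLAGS_WRITE_6 encodings) and a hypothetical helper, find_dead_flags_spot, that takes one flags word per instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical masks, not DynamoRIO's values: the low 6 bits mark flags
 * read, the next 6 bits mark flags written. */
#define FLAGS_READ_6  0x03fu
#define FLAGS_WRITE_6 0xfc0u
#define TESTALL(mask, var) (((mask) & (var)) == (mask))
#define TESTANY(mask, var) (((mask) & (var)) != 0)

/* Returns the index of the first instruction that overwrites all six flags
 * without reading any of them, or -1 if the block has no such instruction. */
static int find_dead_flags_spot(const unsigned *flags, size_t count)
{
    assert(flags != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (TESTALL(FLAGS_WRITE_6, flags[i]) &&
            !TESTANY(FLAGS_READ_6, flags[i]))
            return (int)i;
    }
    return -1;
}
```

Code inserted immediately before the returned instruction may clobber the flags freely, because that instruction overwrites all of them before anything reads them. A result of -1 corresponds to the save/restore path in the tool.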
BBCount Performance Comparison: Opt Tool
[Two chart slides; plots not preserved in this text]
DynamoRIO API Outline
Building and Deploying
Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting
Obtaining Help
Read the documentation
[Link]
Look at the sample clients
In the documentation
In the release package: samples/
Ask on the DynamoRIO Users discussion forum/mailing list
[Link]
Debugging Clients
Use the DynamoRIO debug build for asserts
Often point out the problem
Use logging
-loglevel N
stored in logs/ subdir of DR install dir
Attach a debugger
gdb or windbg
-msgbox_mask 0xN
-no_hide
windbg: .reload [Link]=0xN
More tips:
[Link]
Reporting Bugs
Search the Issue Tracker off [Link] first
[Link]
File a new Issue if not found
Follow conventions on wiki
[Link]
CRASH, APP CRASH, HANG, ASSERT
Example titles:
CRASH (1.3.1 [Link])
vm_area_add_fragment:vmareas.c(4466)
ASSERT (1.3.0 suite/tests/common/segfault)
study_hashtable:fragment.c:1745 ASSERT_NOT_REACHED
Examples, Part 2
Feedback
Optional Slides: Advanced Code Cache Topics
Overview Outline
Efficient
Software code cache overview
Thread-shared code cache
Cache capacity limits
Data structures
Transparent
Comprehensive
Customizable
Added Memory Breakdown
Code Expansion
original code: 66%
exit stubs: 19%
net jumps: 8%
indirect branch target handling: 7%
Cache Capacity Challenges
How to set an upper limit on the cache size
Different applications have different working sets and different
total code sizes
Which fragments to evict when that limit is reached
Without expensive profiling or extensive fragmentation
Adaptive Sizing Algorithm
Enlarge cache if warranted by percentage of new fragments that are regenerated
Target working set of application: don't enlarge for once-only code
Low-overhead, incremental, and reactive
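The ratio test can be sketched as a small standalone model. Everything specific here is an assumption made for illustration: the cache_sizer_t type, the check period, the percentage threshold, and the doubling policy are hypothetical and are not DynamoRIO's actual parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t capacity;        /* current cache limit, in arbitrary units */
    unsigned new_count;     /* fragments built since the last check */
    unsigned regen_count;   /* of those, how many had been evicted earlier */
    unsigned check_period;  /* evaluate the ratio every this many fragments */
    unsigned regen_percent; /* grow if the regenerated share reaches this */
} cache_sizer_t;

/* Records one newly built fragment. Every check_period fragments, compares
 * the regenerated share against the threshold and doubles the capacity if
 * the threshold is met. Returns true if the capacity was enlarged. */
static bool record_fragment(cache_sizer_t *s, bool was_regenerated)
{
    assert(s != NULL && s->check_period > 0);
    s->new_count++;
    if (was_regenerated)
        s->regen_count++;
    if (s->new_count < s->check_period)
        return false;
    bool grow = s->regen_count * 100 >= s->regen_percent * s->new_count;
    if (grow)
        s->capacity *= 2;
    s->new_count = 0;
    s->regen_count = 0;
    return grow;
}
```

Code that runs once never shows up as regenerated, so it does not push the cache to grow; only fragments that were evicted and then needed again do.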
Cache Capacity Settings
Thread-private:
Working set size matching is on by default
Client may see blocks or traces being deleted in the absence of
any cache consistency event
Can disable capacity management via
-no_finite_bb_cache
-no_finite_trace_cache
Thread-shared:
Set to infinite size by default
Can enable capacity management via
-finite_shared_bb_cache
-finite_shared_trace_cache
Reset triggered when hit up-front reservation
Overview Outline
Efficient
Software code cache overview
Thread-shared code cache
Cache capacity limits
Data structures
Transparent
Comprehensive
Customizable
Two Modes of Code Cache Operation
Fine-grained scheme
Supports individual code fragment unlink and removal
Separate data structure per code fragment and each of its exits,
memory regions spanned, and incoming links
Coarse-grained scheme
No individual code fragment control
Permanent intra-cache links
No per-fragment data structures at all
Treat entire cache as a unit for consistency
Data Structures
Fine-grained scheme
Data structures are highly tuned and compact
Coarse-grained scheme
There are no data structures
Savings on applications with large amounts of code are typically
15%-25% of committed memory and 5%-15% of working set
Status in Current Release
Fine-grained scheme
Current default
Coarse-grained scheme
Select with -opt_memory runtime option
Possible performance hit on certain benchmarks
In the future will be the default option
Required for persisted and process-shared caches
Adaptive Level of Granularity
Start with coarse-grain caches
Plus freezing and sharing/persisting
Switch to fine-grain for individual modules or sub-regions of
modules after significant consistency events, to avoid
expensive entire-module flushes
Support simultaneous fine-grain fragments within coarse-grain
regions for corner cases
Match amount of bookkeeping to amount of code change
Majority of application code does not need fine-grain
Many Varieties of Code Caches
Coarse-grained versus fine-grained
Thread-shared versus thread-private
Basic blocks versus traces
Optional Slides: Dr. Memory
Dr. Memory
Detects reads of uninitialized memory
Detects heap errors
Out-of-bounds accesses (underflow, overflow)
Access to freed memory
Invalid frees
Memory leaks
Detects other accesses to invalid memory
Stack tracking
Thread-local storage slot tracking
Operates at runtime on unmodified Windows & Linux binaries
Dr. Memory Instrumentation
Monitor all memory accesses, stack adjustments, and heap
allocations
Shadow each byte of app memory
Each byte's shadow stores one of 4 values:
Unaddressable
Uninitialized
Defined at byte level
Defined at bit level: escape to extra per-bit shadow values
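Four states fit in two bits, so one plausible layout packs the shadow for four application bytes into one shadow byte. The sketch below is a standalone model under that assumption. The numeric encodings and the names shadow_set and shadow_get are hypothetical and are not taken from Dr. Memory's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-bit encodings for the four per-byte states. */
enum {
    SHADOW_UNADDRESSABLE = 0,
    SHADOW_UNINITIALIZED = 1,
    SHADOW_DEFINED = 2,
    SHADOW_DEFINED_BITLEVEL = 3
};

/* Stores the 2-bit state for application byte app_index. */
static void shadow_set(uint8_t *shadow, size_t app_index, unsigned val)
{
    assert(shadow != NULL && val <= 3);
    unsigned shift = (unsigned)(app_index % 4) * 2;
    uint8_t *cell = &shadow[app_index / 4];
    *cell = (uint8_t)((*cell & ~(3u << shift)) | (val << shift));
}

/* Reads back the 2-bit state for application byte app_index. */
static unsigned shadow_get(const uint8_t *shadow, size_t app_index)
{
    assert(shadow != NULL);
    return (shadow[app_index / 4] >> ((app_index % 4) * 2)) & 3u;
}
```

With this packing, shadow memory costs a quarter of the application memory it covers, and a fresh zeroed shadow region reads as unaddressable.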
Dr. Memory
[Figure: application stack and heap shown beside their shadow memory. Panels: Stack, Shadow Stack, Heap, Shadow Heap. Heap regions: redzone, malloc, freed. Shadow states: defined, undefined, invalid.]
Partial-Word Defines But Whole-Word Transfers
Sub-dword variables are moved around as whole dwords
Cannot raise error when a move reads uninitialized bits
Must propagate on moves and thus must shadow registers
Propagate shadow values by mirroring app data flow
Check system call reads and propagate system call writes
Else, false negatives (reads) or positives (writes)
Raise errors instead of propagating at certain points
Report errors only on significant reads
Shadowing Registers
Use multiple TLS slots
dr_raw_tls_calloc()
Alternative: steal register
Can read and write w/o spilling
Bring into spilled register to combine w/ other args
Defined=0, uninitialized=1
Combine via bitwise or
Monitoring Stack Changes
As stack is extended and contracts again, must update stack
shadow as unaddressable vs uninitialized
Push, pop, or any write to stack pointer
Try to distinguish large alloc/dealloc from stack swap
Kernel-Mediated Stack Changes
Kernel places data on the stack and removes it again
Windows: APC, callback, and exception
Linux: signals
Linux signals as an example:
intercept sigaltstack changes
intercept handler registration to instrument handler code
use DR's signal event to record app xsp at interruption point
when see event followed by handler, check which stack and
mark from either interrupted xsp or altstack base to cur xsp as
defined (ignoring padding)
record cur xsp in handler, and use to undo on sigreturn
Types Of Instrumentation
Clean call
Simplest, but expensive in both time and space: full context
switch from application state to tool state with separate stack to
execute C code
Shared clean call
Saves space
Lean procedure
Shared routine with smaller context switch than full clean call
Jump-and-link rather than swapping stack
Array of routines, one per pair of dead registers
Inlined
Smallest context switch, but should limit to small sequences of
instrumentation
Non-Code-Cache Code
Use dr_nonheap_alloc() to allocate space to store code
Generate code using DR's IR and emit to target space
Mark read-only once emitted via dr_memory_protect()
Jump-and-Link
Rather than using call+return, avoid stack swap cost by using
jump-and-link
Store return address in a register or TLS slot
Direct jump to target
Indirect jump back to source
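/* Store the return point in a spill slot, then jump to the shared routine. */
PRE(bb, inst, INSTR_CREATE_mov_st(drcontext,
    spill_slot_opnd(drcontext, SPILL_SLOT_2),
    opnd_create_instr(appinst)));
PRE(bb, inst, INSTR_CREATE_jmp(drcontext,
    opnd_create_pc(shared_slowpath_region)));
...
/* In the shared routine: jump back through the same slot. */
PRE(ilist, NULL, INSTR_CREATE_jmp_ind(drcontext,
    spill_slot_opnd(drcontext, SPILL_SLOT_2)));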
Inter-Instruction Storage
Spill slots provided by DR are only guaranteed to be live
during a single app instr
In practice, live until next selfmod instr
Allocate own TLS for spill slots
dr_raw_tls_calloc()
Steal registers across whole bb
Restore before each app read
Update spill slot after each app write
Restore on fault
Using Faults For Faster Common Case Code
Instead of explicitly checking for rare cases, use faults to
handle them and keep common case code path fast
Signal and exception event and restore state extended event
all provide pre- and post-translation contexts and containing
fragment information
Client can return failure for extended restore state event
Useful when the client can support re-execution of the faulting cache instr, but not restarting translation for relocation
Address Space Iteration
Repeated calls to dr_query_memory_ex()
Check dr_memory_is_in_client() and
dr_memory_is_dr_internal()
Heap walk
API on Windows
Initial structures on Windows
TEB, TLS, etc.
PEB, ProcessParameters, etc.
Intercepting Library Routines
Common task
Dr. Memory monitors malloc, calloc, realloc, free,
malloc_usable_size, etc.
Alternative is to replace w/ own copies
Locating entry point
Module API
Pre-hooks are easy
Post-hooks are hard
Three techniques, each with its own limitations
See paper in CGO 2011
drwrap Extension now provides function wrapping
Replacing Library Routines
Dr. Memory replaces libc routines containing optimized code
that raises false positives
memcpy, strlen, strchr, etc.
Simplification: arrange for routines to always be entered in a
new bb
Do not request elision or indcall2direct from DR
Want to interpret replaced routines
DR treats native execution differently: aborts on fault, etc.
Replace entire bb with jump to replacement routine
drwrap Extension now provides function replacement
Delayed Fragment Deletion
Due to non-precise flushing we can have a flushed bb made
inaccessible but not actually freed for some time
When keeping state per bb, if a duplicate bb is seen, replace
the state and increment a counter ignore_next_delete
On a deletion event, decrement and ignore unless below 0
Can't tell apart from duplication due to thread-private copies:
but this mechanism handles that if saved info is deterministic
and identical for each copy
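The counter scheme can be sketched as a standalone model of one per-bb record. The type and function names are hypothetical; a real client would keep such records in a table keyed by the bb tag and call these helpers from its bb and fragment-deleted event callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-bb record kept by the tool. */
typedef struct {
    bool present;           /* state currently exists for this bb */
    int user_data;          /* whatever the tool saved for the bb */
    int ignore_next_delete; /* deletions still owed to older copies */
} bb_state_t;

/* Called when a bb is built. A duplicate replaces the saved data and
 * notes that one more pending deletion must be ignored. */
static void on_bb_created(bb_state_t *st, int user_data)
{
    assert(st != NULL);
    if (st->present) {
        st->ignore_next_delete++;
    } else {
        st->present = true;
        st->ignore_next_delete = 0;
    }
    st->user_data = user_data;
}

/* Called on a deletion event. Decrements the counter and frees the state
 * only once the counter drops below zero. Returns true if state was freed. */
static bool on_bb_deleted(bb_state_t *st)
{
    assert(st != NULL && st->present);
    st->ignore_next_delete--;
    if (st->ignore_next_delete >= 0)
        return false; /* deletion of an older copy: keep current state */
    st->present = false;
    return true;
}
```

The first deletion after a duplicate is absorbed by the counter, so the state saved for the newer copy survives until that copy's own deletion arrives.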
Callstack Walking
Use case: error reporting
Technique:
Start with xbp as frame pointer (fp)
Look for <fp,retaddr> pairs where retaddr = inside a module
Interesting issues:
When scanning for frame pointer (in frameless func, or at bottom
of stack), querying whether in a module dominates performance
msvcr80!malloc pushes ebx and then ebp, requiring special
handling
When displaying, use retaddr-1 for symbol lookup
More sophisticated techniques needed in presence of FPO
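The basic <fp,retaddr> walk can be modeled over a simulated stack. This is a deliberately simplified standalone sketch: the stack is an array of words indexed by position rather than real memory, the names are hypothetical, and it does not scan for a frame pointer in frameless functions or handle FPO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t start, end; } module_t; /* covers [start, end) */

static bool in_module(const module_t *mods, size_t nmods, uintptr_t pc)
{
    for (size_t i = 0; i < nmods; i++) {
        if (pc >= mods[i].start && pc < mods[i].end)
            return true;
    }
    return false;
}

/* Frame layout in this model: stack[fp] holds the caller's frame index and
 * stack[fp + 1] holds the return address. Collects return addresses into
 * out[] until one falls outside every module, the chain stops moving toward
 * the stack base, or out[] is full. Returns the number of frames found. */
static size_t walk_frames(const uintptr_t *stack, size_t words, size_t fp,
                          const module_t *mods, size_t nmods,
                          uintptr_t *out, size_t max_out)
{
    size_t n = 0;
    assert(stack != NULL && out != NULL);
    while (n < max_out && fp + 1 < words) {
        uintptr_t retaddr = stack[fp + 1];
        if (!in_module(mods, nmods, retaddr))
            break;
        out[n++] = retaddr;
        size_t next = (size_t)stack[fp];
        if (next <= fp)
            break; /* a valid chain only moves toward the stack base */
        fp = next;
    }
    return n;
}
```

When printing the collected frames, a symbol lookup on retaddr-1 lands inside the call instruction rather than on whatever follows it.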
Suspending The World
Use case: Dr. Memory leak check
GC-like memory scan
Use dr_suspend_all_other_threads() and
dr_resume_all_other_threads()
Cannot hold locks while suspending
Using Nudges
Daemon apps do not exit
Request results mid-run
Cross-platform
Signal on Linux
Remote thread on Windows
Tool Packaging
DynamoRIO is redistributable, so you can include a copy with
your tool
drrun supports the -t option via a tool configuration file
drrun -t drcov -- <app cmdline>
Custom front end to configure and launch app
We provide several libraries for building tool front ends:
drconfiglib
drinjectlib
drfrontendlib