
DynamoRIO Dynamic Instrumentation Guide

This document provides an overview of the DynamoRIO dynamic instrumentation tool. It begins with a brief history of DynamoRIO and its predecessors. The remainder of the document outlines DynamoRIO's tutorial, which includes an overview of DynamoRIO and how it works, examples of its usage, and a discussion of its API. It also touches on key aspects of DynamoRIO such as its efficient software code caching approach, support for transparency, comprehensive control over application code, and customizability.


Building Dynamic Instrumentation Tools with DynamoRIO
Derek Bruening and Qin Zhao
Google

Tutorial Outline

1:30-1:40  Welcome + DynamoRIO History
1:40-2:40  DynamoRIO Overview
2:40-3:00  Examples, Part 1
3:00-3:15  Break
3:15-4:00  DynamoRIO API
4:00-4:45  Examples, Part 2
4:45-5:00  Feedback

DynamoRIO Tutorial August 2014

DynamoRIO History
Dynamo
HP Labs: PA-RISC late 1990s
x86 Dynamo: 2000

RIO → DynamoRIO
MIT: 2001-2004

Prior releases

0.9.1: Jun 2002 (PLDI tutorial)


0.9.2: Oct 2002 (ASPLOS tutorial)
0.9.3: Mar 2003 (CGO tutorial)
0.9.4: Feb 2005
0.9.5: Apr 2008 (CGO tutorial)
1.0 (0.9.6): Sep 2008 ([Link] launch)


DynamoRIO History
Determina
2003-2007
Security company

VMware
Acquired Determina (and DynamoRIO) in 2007

Open-source BSD license

Feb 2009: 1.3.1 release

Dec 2013: 4.2.0 release


Aug 2014: 5.0.0 release


DynamoRIO Overview

Typical Modern Application: IIS


System Virtualization

[Figure: several processes, each containing multiple threads, running above the operating system]

Process Virtualization
[Figure: several processes and their threads running above the operating system, with virtualization applied inside individual processes]

Design Goals
Efficient
Near-native performance

Transparent
Match native behavior

Comprehensive
Control every instruction, in any application

Customizable
Adapt to satisfy disparate tool needs


Challenges of Real-World Apps


Multiple threads
Synchronization

Application introspection
Reading of return address

Transparency corner cases are the norm


Example: access beyond top of stack

Scalability
Must adapt to varying code sizes, thread counts, etc.

Dynamically generated code


Performance challenges


Overview Outline
Efficient
Software code cache overview
Thread-shared code cache

Transparent
Comprehensive
Customizable


Direct Code Modification


          e9 37 6f 48 92     jmp  <callout>

Kernel32!TerminateProcess:
7d4d1028  7c 05              jl   7d4d102f
7d4d102a  33 c0              xor  %eax,%eax
7d4d102c  40                 inc  %eax
7d4d102d  eb 08              jmp  7d4d1037
7d4d102f  50                 push %eax
7d4d1030  e8 ed 7c 00 00     call 7d4d8d22

Debugger Trap Too Expensive


          cc                 int3 (breakpoint)

Kernel32!TerminateProcess:
7d4d1028  7c 05              jl   7d4d102f
7d4d102a  33 c0              xor  %eax,%eax
7d4d102c  40                 inc  %eax
7d4d102d  eb 08              jmp  7d4d1037
7d4d102f  50                 push %eax
7d4d1030  e8 ed 7c 00 00     call 7d4d8d22

We Need Indirection
Avoid transparency and granularity limitations of directly modifying application code
Allow arbitrary modifications at unrestricted points in code stream
Allow systematic, fine-grained modifications to code stream
Guarantee that all code is observed

Basic Interpreter
[Figure: application code (foo(), bar()) is executed one instruction at a time by an interpreter loop: fetch, decode, execute]
~300x Slowdown!

Improvement #1: Interpreter + Basic Block Cache


[Figure: DynamoRIO copies basic blocks of the application code (foo(), bar()) into a basic block cache and executes them from there]
Slowdown: 300x → 25x
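
The basic block cache can be pictured as a table keyed by the application address at which each block starts. The following is a minimal C sketch of that idea, not DynamoRIO's implementation: the names bb_cache_t, bb_cache_init, and bb_cache_lookup_or_build are hypothetical, the table is a small fixed-size hash with linear probing, and "building" a block is reduced to bumping a counter where a real system would decode the block and copy it into the cache.

```c
#include <stddef.h>
#include <stdint.h>

#define BB_CACHE_SLOTS 64 /* illustrative size only */

/* One cached basic block, keyed by its application start address (tag).
 * Assumption for this sketch: tag 0 is never a valid block address. */
typedef struct {
    uintptr_t tag; /* 0 marks an empty slot */
} bb_fragment_t;

typedef struct {
    bb_fragment_t slots[BB_CACHE_SLOTS];
    int builds; /* misses: the block had to be decoded and copied */
    int hits;   /* the block was already in the cache */
} bb_cache_t;

void bb_cache_init(bb_cache_t *c)
{
    for (size_t i = 0; i < BB_CACHE_SLOTS; i++)
        c->slots[i].tag = 0;
    c->builds = 0;
    c->hits = 0;
}

/* Return the fragment for tag, creating it on first use.
 * Returns NULL if the table is full (a real cache would evict instead). */
bb_fragment_t *bb_cache_lookup_or_build(bb_cache_t *c, uintptr_t tag)
{
    size_t start = (size_t)(tag % BB_CACHE_SLOTS);
    for (size_t probe = 0; probe < BB_CACHE_SLOTS; probe++) {
        bb_fragment_t *f = &c->slots[(start + probe) % BB_CACHE_SLOTS];
        if (f->tag == tag) {
            c->hits++;
            return f;
        }
        if (f->tag == 0) {
            f->tag = tag; /* stand-in for decode + copy into the cache */
            c->builds++;
            return f;
        }
    }
    return NULL;
}
```

Running a loop that alternates between two blocks builds each block once and then only hits the cache, which is the effect behind the drop from interpreter-level slowdown.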



Example Basic Block Fragment


Original application code:
        add  %eax, %ecx
        cmp  $4, %eax
        jle  $0x40106f

Fragment in the code cache:
frag7:  add  %eax, %ecx
        cmp  $4, %eax
        jle  <stub0>
        jmp  <stub1>
stub0:  mov  %eax, eax-slot
        mov  &dstub0, %eax
        jmp  context_switch
stub1:  mov  %eax, eax-slot
        mov  &dstub1, %eax
        jmp  context_switch

dstub0  target: 0x40106f
dstub1  target: fall-thru

Improvement #2: Linking Direct Branches


[Figure: DynamoRIO copies basic blocks of the application code (foo(), bar()) into a basic block cache; direct branches between cached blocks are linked to each other]
Slowdown: 300x → 25x → 3x



Direct Linking
Original application code:
        add  %eax, %ecx
        cmp  $4, %eax
        jle  $0x40106f

Fragment in the code cache:
frag7:  add  %eax, %ecx
        cmp  $4, %eax
        jle  <frag8>
        jmp  <stub1>
stub0:  mov  %eax, eax-slot
        mov  &dstub0, %eax
        jmp  context_switch
stub1:  mov  %eax, eax-slot
        mov  &dstub1, %eax
        jmp  context_switch

dstub0  target: 0x40106f
dstub1  target: fall-thru
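
Direct linking replaces an exit that goes through a stub and a context switch with a jump straight to the target fragment once that target is in the cache. The sketch below models this bookkeeping in plain C under simplifying assumptions: fragment_t, NUM_EXITS, and link_incoming_exits are hypothetical names, every fragment has exactly two exits, and "patching a branch" is represented by storing a pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_EXITS 2 /* taken branch and fall-through */

typedef struct fragment {
    uintptr_t tag;                         /* application start address */
    uintptr_t exit_target[NUM_EXITS];      /* application address each exit goes to */
    struct fragment *exit_link[NUM_EXITS]; /* NULL: exit via stub + context switch */
} fragment_t;

/* After new_frag enters the cache, patch every unlinked exit that targets it.
 * Returns the number of exits that were linked. */
int link_incoming_exits(fragment_t *frags, size_t count, fragment_t *new_frag)
{
    int linked = 0;
    for (size_t i = 0; i < count; i++) {
        for (int e = 0; e < NUM_EXITS; e++) {
            if (frags[i].exit_link[e] == NULL &&
                frags[i].exit_target[e] == new_frag->tag) {
                /* corresponds to "jle <stub0>" becoming "jle <frag8>" */
                frags[i].exit_link[e] = new_frag;
                linked++;
            }
        }
    }
    return linked;
}
```

In the slide's example, frag7's taken exit targets 0x40106f; once a fragment with that tag exists, only that exit is linked, while the fall-through exit keeps using its stub until its own target is built.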

Improvement #3: Linking Indirect Branches


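[Figure: DynamoRIO copies basic blocks of the application code (foo(), bar()) into a basic block cache; indirect branches are resolved inside the cache by an indirect branch lookup routine]
Slowdown: 300x → 25x → 3x → 1.2x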



Indirect Branch Transformation

Original application code:
        ret

Fragment in the code cache:
frag8:  mov  %ecx, ecx-slot
        pop  %ecx
        jmp  <ib_lookup>

ib_lookup:
        ...
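
Because the return address on the stack remains an application address, the lookup routine has to map it to a code cache address at run time. A minimal sketch of such a table in C follows; it is an illustration only, assuming a fixed-size open-addressing hash, and the names ibl_table_t, ibl_init, ibl_add, and ibl_lookup are hypothetical rather than DynamoRIO internals.

```c
#include <stdint.h>
#include <string.h>

#define IBL_BITS 8
#define IBL_SIZE (1u << IBL_BITS)
#define IBL_MASK (IBL_SIZE - 1)

typedef struct {
    uintptr_t app_pc;   /* application target address; 0 marks an empty slot */
    uintptr_t cache_pc; /* where the matching fragment lives in the code cache */
} ibl_entry_t;

typedef struct {
    ibl_entry_t e[IBL_SIZE];
} ibl_table_t;

void ibl_init(ibl_table_t *t)
{
    memset(t, 0, sizeof(*t));
}

/* Insert or update a mapping. Returns 0 only if the table is full. */
int ibl_add(ibl_table_t *t, uintptr_t app_pc, uintptr_t cache_pc)
{
    unsigned i = (unsigned)(app_pc & IBL_MASK);
    for (unsigned n = 0; n < IBL_SIZE; n++, i = (i + 1) & IBL_MASK) {
        if (t->e[i].app_pc == 0 || t->e[i].app_pc == app_pc) {
            t->e[i].app_pc = app_pc;
            t->e[i].cache_pc = cache_pc;
            return 1;
        }
    }
    return 0;
}

/* Return the code cache address for app_pc, or 0 on a miss
 * (a miss means leaving the cache to build or find the target). */
uintptr_t ibl_lookup(const ibl_table_t *t, uintptr_t app_pc)
{
    unsigned i = (unsigned)(app_pc & IBL_MASK);
    for (unsigned n = 0; n < IBL_SIZE; n++, i = (i + 1) & IBL_MASK) {
        if (t->e[i].app_pc == app_pc)
            return t->e[i].cache_pc;
        if (t->e[i].app_pc == 0)
            return 0;
    }
    return 0;
}
```

The extra compare-and-probe work on every indirect branch is the kind of cost the later "Sources of Overhead" slide attributes to indirect branch hashtable lookups.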

Improvement #4: Trace Building


[Figure: hot sequences of basic blocks from the basic block cache are stitched together and placed in the trace cache]
Traces reduce branching, improve layout and locality, and facilitate optimizations across blocks
We avoid indirect branch lookup
Next Executing Tail (NET) trace building scheme [Duesterwald 2000]

Incremental NET Trace Building


[Figure: a trace that starts at block G in the basic block cache is extended one block at a time in the trace cache: G, then G K, then G K J]
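
The selection step of NET can be sketched as a counter attached to each candidate trace head: once a head has been entered often enough, the blocks that execute next are recorded as the trace. The C sketch below illustrates only that counting step. The names net_block_t, net_note_branch, and net_should_start_trace are hypothetical, the threshold value is an assumption chosen for illustration, and only backward-branch targets are treated as trace heads here.

```c
#include <stdint.h>

#define NET_HOT_THRESHOLD 50 /* assumed value for illustration */

typedef struct {
    uintptr_t tag;     /* application start address of the block */
    int is_trace_head; /* candidate starting point for a trace */
    int exec_count;    /* executions counted while a trace head */
} net_block_t;

/* Targets of backward branches are likely loop heads, so mark them. */
void net_note_branch(uintptr_t from, net_block_t *target)
{
    if (target->tag <= from)
        target->is_trace_head = 1;
}

/* Called each time a block is about to execute.
 * Returns 1 once a trace head is hot: the caller would then record the
 * blocks that execute next (the "next executing tail") as a trace. */
int net_should_start_trace(net_block_t *b)
{
    if (!b->is_trace_head)
        return 0;
    b->exec_count++;
    return b->exec_count >= NET_HOT_THRESHOLD;
}
```

Only trace heads carry counters, which keeps the profiling cost small compared with counting every block or every path.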

Improvement #4: Trace Building


[Figure: DynamoRIO copies the application code (foo(), bar()) into a basic block cache and a trace cache; inside a trace, an inlined check tests whether an indirect branch stays on the trace, and falls back to the indirect branch lookup otherwise]
Slowdown: 300x → 25x → 3x → 1.2x → 1.1x

Base Performance

[Figure: base performance chart covering SPEC CPU2000, server, and desktop benchmarks]

Base Performance: SPEC 2006


Sources of Overhead
Extra instructions
Indirect branch target comparisons
Indirect branch hashtable lookups

Extra data cache pressure


Indirect branch hashtable

Branch mispredictions
ret becomes jmp*

Application code modification


Time Breakdown for SPEC CPU INT


[Figure: share of execution time spent in each component (DynamoRIO, basic block cache, trace cache, trace-exit check, indirect branch lookup); the labels shown are < 1%, 2%, 4%, 0%, and 94%]

Not An Ordinary Application


An application executing in DynamoRIO's code cache looks different from what the underlying hardware has been tuned for
The hardware expects:
Little or no dynamic code modification
Writes to code are expensive
call and ret instructions
Return Stack Buffer predictor

Performance Counter Data


Overview Outline
Efficient
Software code cache overview
Thread-shared code cache

Transparent
Comprehensive
Customizable


Threading Model
[Figure: three layers. The running program's threads (Thread1 ... ThreadN) map one-to-one onto threads in the code caching runtime system, which in turn map onto operating system threads]

Code Space
[Figure: two code space layouts between the running program's threads and the operating system's threads: thread-private code caches (one per thread) versus a single thread-shared code cache]

Thread-Private versus Thread-Shared


Thread-private
Less synchronization needed
Absolute addressing for thread-local storage
Thread-specific optimization and instrumentation

Thread-shared
Scales to many-threaded apps


Database and Web Server Suite


Benchmark   Server                                      Processes
ab low      IIS low isolation                           [Link]
ab med      IIS medium isolation                        [Link], [Link]
guest low   IIS low isolation, SQL Server 2000          [Link], [Link]
guest med   IIS medium isolation, SQL Server 2000       [Link], [Link], [Link]

Memory Impact

[Figure: memory impact chart for the ab med, guest low, and guest med benchmarks]

Performance Impact


Scalability Limit


Overview Outline
Efficient
Transparent
Transparency principles
Cache consistency
Synchronization

Comprehensive
Customizable


Unavoidably Intrusive
[Figure: DynamoRIO resides inside the application's process, alongside the application's threads and above the operating system]

Transparency
Do not want to interfere with the semantics of the program
Dangerous to make any assumptions about:

Register usage
Calling conventions
Stack layout
Memory/heap usage
I/O and other system call use


Painful, But Necessary


Difficult and costly to handle corner cases
Many applications will not notice... but some will!
Microsoft Office: Visual Basic generated code, stack convention violations
COM, Star Office, MMC: trampolines
Adobe Premiere: self-modifying code
VirtualDub: UPX-packed executable
etc.

Transparency Principles
Principle 1: As few changes as possible
Set a high bar for value before changing the native environment

Principle 2: Hide necessary changes


Whatever is valuable enough to change must be hidden
Changes that cannot be hidden should not be made

Principle 3: Separate resources


Avoid intra-process resource conflicts

Bruening et al. "Transparent Dynamic Instrumentation" VEE'12

Principle 1: As few changes as possible

Application code
Executable on disk
Stored addresses
Threads
Application data
Including the stack!



Return Address Transparency

[Figure: return address transparency results for SPEC CPU2000, server, and desktop benchmarks]

Principle 2: Hide necessary changes

Application addresses
Address space
Error transparency
Code cache consistency


Principle 3: Separate resources

[Figure: separation of DynamoRIO and application resources, shown for Linux and for Windows]

Arbitrary Interleaving
[Figure: application code calls malloc() while DynamoRIO (basic block cache, trace cache, indirect branch lookup) interleaves with it at arbitrary points; the figure labels malloc() as thread-safe and flags the need for it to be re-entrant]

Transparency Landscape
              Principle 1:          Principle 2:          Principle 3:
              As few changes        Hide necessary        Separate
              as possible           changes               resources
Code          application code,     machine context,
              stored addresses      cache consistency
Data          stack, heap,                                separate stack,
              registers,                                  heap, context, i/o
              condition flags
Concurrency   threads,                                    disjoint locks
              memory ordering
Other                               preserve errors

Overview Outline
Efficient
Transparent
Transparency principles
Cache consistency
Synchronization

Comprehensive
Customizable


Code Change Mechanisms

[Figure: I-cache and D-cache contents (A, B, C, D) on RISC versus x86. On RISC, changing code takes Store B, Flush B, Jump B; on x86, Store B followed by Jump B is enough]

How Often Does Code Change?


Not just modification of code!
Removal of code
Shared library unloading

Replacement of code
JIT region re-use
Trampoline on stack


Code Change Events


Per benchmark: Memory Unmappings, Generated Code Regions, Modified Code Regions (blank cells were not recovered)
SPECFP: 112
SPECINT: 29
SPECJVM: 3373, 4591
Excel: 144, 21, 20
Photoshop: 1168, 40
Powerpoint: 367, 28, 33
Word: 345, 20

Detecting Code Removal


Example: shared library being unloaded
Requires explicit request by application to operating system
Detect by monitoring system calls (munmap, NtUnmapViewOfSection)

Detecting Code Modification


On x86, no explicit app request is required: the icache is kept consistent in hardware, so any memory write could modify code!
[Figure: x86 I-cache and D-cache contents (A, B, C, D); Store B is followed directly by Jump B]

Page Protection Plus Instrumentation


Invariant: application code copied to code cache must be read-only
If writable, hide read-only status from application
Some code cannot or should not be made read-only
Self-modifying code
Windows stack
Code on a page with frequently written data
Use per-fragment instrumentation to ensure code is not stale on entry and to catch self-modification

Adaptive Consistency Algorithm


Use page protection by default
Most code regions are always read-only
Subdivide written-to regions to reduce flushing cost of write-execute cycle
Large read-only regions, small written-to regions
Switch to instrumentation if write-execute cycle repeats too often (or on same page)
Switch back to page protection if writes decrease
Bruening et al. "Maintaining Consistency and Bounding Capacity of Software Code Caches" CGO'05
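
The switching policy above can be read as a small per-region state machine. The following C sketch shows one way to express it; it is an illustration, not the algorithm from the cited paper. The names region_t, region_on_write_execute_cycle, and region_on_window_end are hypothetical, as are the threshold of three cycles and the notion of a fixed observation window used to decide that writes have decreased.

```c
typedef enum {
    CONSISTENCY_PAGE_PROTECT, /* region kept read-only; writes fault and flush */
    CONSISTENCY_INSTRUMENT    /* per-fragment checks detect stale code */
} consistency_mode_t;

typedef struct {
    consistency_mode_t mode;
    int recent_cycles; /* write-execute cycles seen in the current window */
} region_t;

#define CYCLES_TO_INSTRUMENT 3 /* assumed threshold for illustration */

void region_init(region_t *r)
{
    r->mode = CONSISTENCY_PAGE_PROTECT; /* page protection is the default */
    r->recent_cycles = 0;
}

/* Called when code in the region is written and then executed again. */
void region_on_write_execute_cycle(region_t *r)
{
    r->recent_cycles++;
    if (r->mode == CONSISTENCY_PAGE_PROTECT &&
        r->recent_cycles >= CYCLES_TO_INSTRUMENT)
        r->mode = CONSISTENCY_INSTRUMENT;
}

/* Called at the end of each observation window. A window with no
 * write-execute cycles is treated as "writes decreased". */
void region_on_window_end(region_t *r)
{
    if (r->mode == CONSISTENCY_INSTRUMENT && r->recent_cycles == 0)
        r->mode = CONSISTENCY_PAGE_PROTECT;
    r->recent_cycles = 0;
}
```

Keeping the state per region matches the idea of subdividing written-to regions: a frequently rewritten page can pay for instrumentation while the large read-only regions stay on cheap page protection.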


Overview Outline
Efficient
Transparent
Transparency principles
Cache consistency
Synchronization

Comprehensive
Customizable


Synchronization Transparency
Application thread management should not interfere with the runtime system, and vice versa
Cannot allow the app to suspend a thread holding a runtime system lock
Runtime system cannot use app locks

Disjoint Locks
App thread suspension requires safe spots where no runtime system locks are held
Time spent in the code cache can be unbounded
→ Our invariant: no runtime system lock can be held while executing in the code cache

Overview Outline

Efficient
Transparent
Comprehensive
Customizable


Above the Operating System


[Figure: DynamoRIO + client reside inside the application's process, alongside the application's threads and above the operating system]

Kernel-Mediated Control Transfers


[Figure: timeline alternating between user mode and kernel mode. When a message is pending, the kernel saves the user context and the message handler runs in user mode; when no message is pending, the context is restored. Annotation: this is the majority of executed code in a typical Windows application]

Intercepting Linux Signals


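[Figure: timeline alternating between user mode and kernel mode. DynamoRIO registers its own signal handler; when a signal is pending, the kernel saves the user context and control reaches the DynamoRIO handler before the application's signal handler; when no signal is pending, the context is restored]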

Windows Messages
[Figure: timeline alternating between user mode and kernel mode. When a message is pending, the kernel saves the user context and enters the user-mode dispatcher, which calls the message handler; when no message is pending, the context is restored]

Intercepting Windows Messages


[Figure: the same timeline with interception. DynamoRIO modifies the shared library memory image so that the dispatcher is intercepted before the message handler runs; when no message is pending, the context is restored]

Must Monitor System Calls


To maintain control:
Calls that affect the flow of control: register signal handler, create thread, set thread context, etc.
To maintain transparency:
Queries of modified state that the app should not see
To maintain cache consistency:
Calls that affect the address space
To support cache eviction:
Interruptible system calls must be redirected

Operating System Dependencies


System calls and their numbers
Monitor the application's usage, as well as for our own resource management
Windows changes the numbers each major release
Details of kernel-mediated control flow
Must emulate how kernel delivers events
Initial injection
Once in, follow child processes

Overview Outline

Efficient
Transparent
Comprehensive
Customizable
Clients
Building and Deploying Tools


DynamoRIO + Client
[Figure: the client code (tool) sits alongside DynamoRIO, between the application code (foo(), bar()) and the basic block cache, trace cache, and indirect branch lookup]

Demo

Clients
The engine exports an API for building a client
System details abstracted away: client focuses on manipulating the code stream

Provided Tools
In addition to numerous sample clients, DynamoRIO ships with the following polished end-user tools:
Dr. Memory (drmemory): identifies memory errors (use-after-frees, buffer overflows, uninitialized reads, memory leaks, etc.)
Dr. Cov (drcov): code coverage tool
Dr. Strace (drstrace): system call tracer for Windows
Dr. Ltrace (drltrace): library call tracer
Run these tools via drrun -t <toolname>
E.g.: bin32/drrun -t drmemory -- <app cmdline>

Cross-Platform Clients
DynamoRIO API presents a consistent interface that works across platforms
Windows versus Linux
32-bit versus 64-bit
Thread-private versus thread-shared
Same client source code generally works on all combinations of platforms
Some exceptions, noted in the documentation

Building a Client

Include DR API header file
#include "dr_api.h"
Set platform defines
WINDOWS or LINUX
X86_32 or X86_64
Export a dr_init function
DR_EXPORT void dr_init(client_id_t client_id)
Build a shared library

Auto-Configure Using CMake


add_library(myclient SHARED myclient.c)
find_package(DynamoRIO)
if (NOT DynamoRIO_FOUND)
  message(FATAL_ERROR "DynamoRIO package required to build")
endif(NOT DynamoRIO_FOUND)
configure_DynamoRIO_client(myclient)

CMake
Build system converted to CMake when open-sourced
Switch from frozen toolchain to supporting range of tools

CMake generates build files for native compiler of choice


Makefiles for UNIX, nmake, etc.
Visual Studio project files

[Link]


DynamoRIO Extensions

DynamoRIO API is extended via libraries called Extensions


Both static and shared supported
Built and packaged with DynamoRIO
Easy for a client to use
use_DynamoRIO_extension(myclient extensionname)



DynamoRIO Extensions, Cont'd


Current Extensions:

drsyms: symbol lookup (currently Windows-only)


drcontainers: hashtable
drmgr: multi-instrumentation mediation
drwrap: function wrapping and replacing
drutil: memory tracing, string loop expansion
drx: multi-process management, misc utilities
drsyscall: system call names, numbers, parameter types

Coming soon:
drreg: register stealing and allocating
Umbra: shadow memory framework
Your utility library or framework contribution!


Application Configuration
File-based scheme
Per-user local files
$HOME/.dynamorio/ on Linux
$USERPROFILE/dynamorio/ on Windows

Global files
/etc/dynamorio/ on Linux
Registry-specified directory on Windows

Files are lists of var=value
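
Since each configuration file is a list of var=value lines, reading one comes down to splitting every line at its first '='. The helper below is a small C sketch of that step; parse_config_line is a hypothetical name and EXAMPLE_OPTIONS in the usage note is a made-up variable, neither is taken from DynamoRIO. It assumes the value may itself contain '=' characters and strips a trailing newline.

```c
#include <string.h>

/* Split "var=value" in place at the first '='.
 * On success, returns 1 and points *var and *value into the line buffer.
 * Returns 0 if there is no '=' or the variable name is empty. */
int parse_config_line(char *line, char **var, char **value)
{
    char *eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return 0;
    *eq = '\0';
    *var = line;
    *value = eq + 1;

    /* Drop a trailing newline (and carriage return) from the value. */
    size_t n = strlen(*value);
    while (n > 0 && ((*value)[n - 1] == '\n' || (*value)[n - 1] == '\r'))
        (*value)[--n] = '\0';
    return 1;
}
```

For example, the line "EXAMPLE_OPTIONS=-flag a=b" yields the variable EXAMPLE_OPTIONS and the value "-flag a=b", because only the first '=' acts as the separator.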


Deploying Clients
One-step configure-and-run usage model:
drrun -c <client> <client options> -- <app cmdline>
Uses an invisible temporary first-priority one-time config file
Two-step usage model giving more control over children:
drconfig -reg <appname> -c <client> <client options>
drinject <app cmdline>
Systemwide injection:
drconfig -syswide_on -reg <appname> -c <client> <options>
<run app normally>
Polished tool invocation:
drrun -t <tool> <tool options> -- <app cmdline>
Uses a pre-existing tool config file

Deploying Clients On Linux


drrun and drinject scripts: LD_PRELOAD-based
Take over after statically-dependent shared libs but before exe

Suid apps ignore LD_PRELOAD


Place [Link]'s full path in /etc/[Link]
Copy [Link] to /usr/lib

In the future:
Attach
Earliest injection


Deploying Clients On Windows


drinject and drrun injection
Currently after all shared libs are initialized

From-parent injection
Early: before any shared libs are loaded

Systemwide injection via syswide_on


Requires administrative privileges
Launch app normally: no need to run via drinject/drrun
Moderately early: during [Link] initialization

In the future:
Earliest injection for drrun/drinject and from-parent


Following Child Processes


Runtime option -follow_children
Default on: follow all children
Whitelist
-no_follow_children and configure files for whitelist
Blacklist
-follow_children and configure files -norun for blacklist
drconfig -norun to create do-not-follow config file

Non-Standard Deployment
drdecode
Static IA-32/AMD64 decoding/encoding/instruction manipulation library
Standalone API
Use DynamoRIO as a library of IA-32/AMD64 manipulation routines plus cross-platform file i/o, locks, etc.
Start/Stop API
Can mark in the source code where DynamoRIO should control the application

Runtime Options
Pass options to drconfig/drrun
A large number of options; the most relevant are:

-code_api
-c <client lib> <client options>
-thread_private
-follow_children
-opt_cleancall
-tracedump_text and -tracedump_binary
-prof_pcs


Runtime Options For Debugging


Notifications:
-stderr_mask 0xN
-msgbox_mask 0xN

Windows:
-no_hide

Debug-build-only:
-loglevel N
-ignore_assert_list *


Examples, Part 1

DynamoRIO API

DynamoRIO API Outline

Building and Deploying


Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting


DynamoRIO + Client
[Figure: the client code (tool) sits alongside DynamoRIO, between the application code (foo(), bar()) and the basic block cache, trace cache, and indirect branch lookup]

Transformation Time vs Execution Time

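[Figure: client code runs at transformation time, when DynamoRIO copies application code (foo(), bar()) into the basic block cache and trace cache; code it inserts, such as a call, runs at execution time inside the caches. Examples shown: average instruction length is computed at transformation time, instruction execution count at execution time]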

Client Events: Code Stream


Client has opportunity to inspect and potentially modify every single application instruction, immediately before it executes
Event happens at transformation time
Modifications or inserted code will operate at execution time
Entire application code stream
Basic block creation event: can modify the block
For comprehensive instrumentation tools
Or, focus on hot code only
Trace creation event: can modify the trace
Custom trace creation: can determine trace end condition
For optimization and profiling tools

Simplifying Client View


Several optimizations disabled

Elision of unconditional branches


Indirect call to direct call conversion
Shared cache sizing
Process-shared and persistent code caches

Future release will give client control over optimizations


Basic Block Event


static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag,
                  instrlist_t *bb, bool for_trace,
                  bool translating) {
    instr_t *inst;
    for (inst = instrlist_first(bb);
         inst != NULL;
         inst = instr_get_next(inst)) {
        /* ... */
    }
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_bb_event(event_basic_block);
}

Trace Event
static dr_emit_flags_t
event_trace(void *drcontext, void *tag,
            instrlist_t *trace, bool translating) {
    instr_t *inst;
    for (inst = instrlist_first(trace);
         inst != NULL;
         inst = instr_get_next(inst)) {
        /* ... */
    }
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_trace_event(event_trace);
}

Client Events: Application Actions


Application thread creation and deletion
Application library load and unload
Application exception (Windows)
Client chooses whether to deliver or suppress

Application signal (Linux)


Client chooses whether to deliver, suppress, bypass the app handler, or redirect control

Client Events: Application System Calls


Application pre- and post- system call
Platform-independent system call parameter access
Client can modify:
Return value in post-, or set value and skip syscall in pre-
Call number
Params
Client can invoke an additional system call as the app

Client Events: Bookkeeping


Initialization and Exit
Entire process
Each thread
Child of fork (Linux-only)

Basic block and trace deletion during cache management


Nudge received
Used for communication into client

Itimer fired (Linux-only)


Multiple Clients
It is each client's responsibility to ensure compatibility with other clients
Instruction stream modifications made by one client are visible to other clients
At client registration each client is given a priority
dr_init() called in priority order (priority 0 called first and thus registers its callbacks first)
Event callbacks called in reverse order of registration
Gives precedence to first registered callback, which is given the final opportunity to modify the instruction stream or influence DynamoRIO's operation
drmgr Extension provides mediation among multiple components
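
The ordering rule (callbacks run in reverse order of registration, so the first one registered gets the final say) can be illustrated with a small registry. This C sketch is independent of the DynamoRIO API: cb_list_t, cb_register, cb_dispatch, event_log_t, and the two example callbacks are all hypothetical names used only to make the order visible.

```c
#define MAX_CALLBACKS 8
#define MAX_LOG 8

/* Hypothetical event payload: records which callbacks ran, in order. */
typedef struct {
    int order[MAX_LOG];
    int count;
} event_log_t;

typedef void (*event_cb_t)(event_log_t *log);

typedef struct {
    event_cb_t cbs[MAX_CALLBACKS];
    int count;
} cb_list_t;

/* Append a callback; returns 0 if the list is full. */
int cb_register(cb_list_t *list, event_cb_t cb)
{
    if (list->count >= MAX_CALLBACKS)
        return 0;
    list->cbs[list->count++] = cb;
    return 1;
}

/* Invoke callbacks last-registered first, so the first-registered
 * callback runs last and sees every earlier modification. */
void cb_dispatch(const cb_list_t *list, event_log_t *log)
{
    for (int i = list->count - 1; i >= 0; i--)
        list->cbs[i](log);
}

static void log_id(event_log_t *log, int id)
{
    if (log->count < MAX_LOG)
        log->order[log->count++] = id;
}

/* Two example callbacks that only record their identity. */
void example_cb_registered_first(event_log_t *log)  { log_id(log, 1); }
void example_cb_registered_second(event_log_t *log) { log_id(log, 2); }
```

Registering the first callback and then the second, and dispatching one event, runs the second callback before the first, which is why the highest-priority client (whose dr_init() runs first) has the last opportunity to change the instruction stream.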

DynamoRIO API Outline

Building and Deploying


Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting


DynamoRIO API: General Utilities


DynamoRIO provides safe utilities for transparency support
Separate stack
Separate memory allocation
Separate file I/O

Utility options
Use DynamoRIO-provided utilities directly
Use shared libraries via DynamoRIO private loader
Malloc, etc. redirected to DynamoRIO-provided utilities

Use static libraries with dependencies redirected

Risky for client to directly invoke system calls


DynamoRIO Heap
Three flavors:

Thread-private: no synchronization; thread lifetime


Global: synchronized, process lifetime
Non-heap: for generated code, etc.
No header on allocated memory: low overhead but must pass size on free

Leak checking
Debug build complains at exit if memory was not deallocated
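
The "no header, pass the size on free" design can be illustrated with a toy allocator that keeps one free list per size class. This is a sketch of the general technique only, not DynamoRIO's heap: sized_heap_t, sized_alloc, and sized_free are hypothetical names, the arena is a fixed 4 KB buffer, and requests are limited to 128 bytes.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_BYTES 4096
#define GRANULE 16    /* every allocation is rounded up to a multiple of this */
#define NUM_CLASSES 8 /* supports requests of up to GRANULE * NUM_CLASSES bytes */

typedef struct free_node {
    struct free_node *next;
} free_node_t;

typedef struct {
    union {
        unsigned char bytes[ARENA_BYTES];
        long double align_ld; /* forces suitable alignment of the arena */
        void *align_ptr;
    } arena;
    size_t used;                          /* bytes handed out from the arena */
    free_node_t *free_lists[NUM_CLASSES]; /* one list per size class */
} sized_heap_t;

void sized_heap_init(sized_heap_t *h)
{
    h->used = 0;
    for (int i = 0; i < NUM_CLASSES; i++)
        h->free_lists[i] = NULL;
}

/* Map a request size to a class in 1..NUM_CLASSES, or 0 if unsupported. */
static size_t size_class(size_t size)
{
    size_t cls = (size + GRANULE - 1) / GRANULE;
    return (size == 0 || cls > NUM_CLASSES) ? 0 : cls;
}

void *sized_alloc(sized_heap_t *h, size_t size)
{
    size_t cls = size_class(size);
    if (cls == 0)
        return NULL;
    free_node_t *n = h->free_lists[cls - 1];
    if (n != NULL) { /* reuse a freed block of the same class */
        h->free_lists[cls - 1] = n->next;
        return n;
    }
    if (h->used + cls * GRANULE > ARENA_BYTES)
        return NULL;
    void *p = h->arena.bytes + h->used;
    h->used += cls * GRANULE;
    return p;
}

/* No header was stored, so the caller must say how big the block is. */
void sized_free(sized_heap_t *h, void *p, size_t size)
{
    size_t cls = size_class(size);
    assert(p != NULL && cls != 0);
    free_node_t *n = (free_node_t *)p;
    n->next = h->free_lists[cls - 1];
    h->free_lists[cls - 1] = n;
}
```

The saving is that no per-block metadata is stored; the cost is that a caller who passes the wrong size on free silently corrupts the free lists, which is one reason a debug-build leak and consistency check is valuable.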


Thread Support
Thread support

Thread-local storage
Callback-local storage
Simple mutexes
Read-write locks
Thread-private code caches, if requested

Sideline support
Create new client-only thread
Thread-private itimer (Linux-only)

Suspend and resume all other threads


Cannot hold locks while suspending


Thread-Local Storage (TLS)


Absolute addressing
Thread-private only

Application stack
Not reliable or transparent

Stolen register
Performance hit

Segment
Best solution for thread-shared


Callback-Local Storage (CLS)


[Figure: timeline alternating between user mode and kernel mode. When a message is pending, the kernel saves the user context and enters the user-mode dispatcher, which calls the message handler; when no message is pending, the context is restored]

Callback-Local Storage (CLS)


Windows callbacks interrupt execution to process an event and later resume the suspended context
TLS data from the suspended context will be overwritten during callback execution
CLS data is saved at the interruption point and restored at the resumption point
Whenever keeping persistent data specific to one context rather than overall execution, use CLS instead of TLS
Usually only needed when storing data specific to a system call in pre-syscall event and reading it back in post-syscall event

Can be used for Linux signals as well


Provided by the drmgr Extension
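
The save-and-restore behavior of CLS can be modeled as a stack of per-context slots: entering a callback pushes a fresh slot, and returning from it pops back to the interrupted context's data. The C sketch below shows that model only. It does not use the drmgr API, and cls_t, cls_current, cls_on_callback_entry, and cls_on_callback_exit are hypothetical names.

```c
#define CLS_MAX_DEPTH 16

typedef struct {
    long slot[CLS_MAX_DEPTH]; /* one value per nested context */
    int depth;                /* index of the context that is running now */
} cls_t;

void cls_init(cls_t *c)
{
    c->depth = 0;
    c->slot[0] = 0;
}

/* Storage for the context that is currently executing. */
long *cls_current(cls_t *c)
{
    return &c->slot[c->depth];
}

/* Interruption point: the callback gets its own fresh slot, so it cannot
 * overwrite the suspended context's data. Returns 0 if nesting is too deep. */
int cls_on_callback_entry(cls_t *c)
{
    if (c->depth + 1 >= CLS_MAX_DEPTH)
        return 0;
    c->depth++;
    c->slot[c->depth] = 0;
    return 1;
}

/* Resumption point: the suspended context's data becomes current again.
 * Returns 0 if there is no callback to return from. */
int cls_on_callback_exit(cls_t *c)
{
    if (c->depth == 0)
        return 0;
    c->depth--;
    return 1;
}
```

A value stored before a system call (say 42 in a pre-syscall event) survives a callback that stores its own value in the meantime, and is read back unchanged once the callback returns, which is the pre-syscall to post-syscall case the slide describes.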


DynamoRIO API: General Utilities, Cont'd


Communication
Nudges: ping from external process
File creation, reading, and writing
File descriptor isolation on Linux

Safe read/write
Fault-proof read/write routines
Try/except facility


DynamoRIO API: General Utilities, Cont'd


Application inspection

Address space querying


Module iterator
Processor feature identification
Symbol lookup
Function replacing and wrapping


Symbol Table Access


The drsyms Extension provides access to symbol tables and debug information
Currently supports the following:
Windows PDB
Linux ELF + DWARF2
Windows PECOFF + DWARF2

API includes:

Address to symbol and line information


Symbol to address
Symbol enumeration and searching
Symbol demangling
Symbol types


Function Replacing and Wrapping


drwrap Extension provides function replacing and wrapping
Use dr_get_proc_address() to find library exports or drsyms Extension to find internal functions
Function replacing replaces with application code
Function wrapping calls pre and post callbacks that execute as client code around the target application function
Arguments, return value, and whether the function is executed can all be examined and controlled

Third-Party Libraries
Private loader inside DynamoRIO will load any external shared libraries a client imports from
Loads a duplicate copy of each library and tries to isolate it from the application's copy
On Windows, private loader does not support locating SxS libraries, so use static libc with VS2005 or VS2008
C++ clients are built normally
C clients by default do not link with libc
Set DynamoRIO_USE_LIBC variable prior to invoking configure_DynamoRIO_client() to use libc with a C client

Private Libraries
Private loader on Windows
Not easy to fully isolate system data structures
PEB and key TEB fields are isolated
Some libraries like [Link] are shared

To examine application state while in client code, use dr_switch_to_app_state()

Private loader on Linux


Isolation is simpler and more complete


Optimal Transparency
For best transparency: completely self-contained client
Imports only from DynamoRIO API
-nodefaultlibs or /nodefaultlib

Alternatives to dynamic libc on Windows:
String and utility routines are provided as forwards to ntdll
ntdll contains a mini-libc
[Link] /MT static copy of C/C++ libraries

Alternatives to dynamic libc on Linux:
For a static C/C++ lib, use ld --wrap to redirect malloc to DR's heap
Newer distributions don't ship a suitable static C/C++ lib


DynamoRIO API Outline

Building and Deploying


Events
Utilities
Instruction Manipulation
State Translation
Comparison with Pin
Troubleshooting


DynamoRIO API: Instruction Representation

Full IA-32/AMD64 instruction representation


Instruction creation with auto-implicit-operands
Operand iteration
Instruction lists with iteration, insertion, removal
Decoding at various levels of detail
Encoding


Instruction Representation
raw bytes           opcode  operands                    eflags
8d 34 01            lea     (%ecx,%eax,1) -> %esi
8b 46 0c            mov     0xc(%esi) -> %eax
2b 46 1c            sub     0x1c(%esi) %eax -> %eax     WCPAZSO
0f b7 4e 08         movzx   0x8(%esi) -> %ecx
c1 e1 07            shl     $0x07 %ecx -> %ecx          WCPAZSO
3b c1               cmp     %eax %ecx                   WCPAZSO
0f 8d a2 0a 00 00   jnl     $0x77f52269                 RSO

Instruction Representation
raw bytes           opcode  operands                    eflags
                    lea     (%ecx,%eax,1) -> %edi
                    mov     0xc(%edi) -> %eax
                    sub     0x1c(%edi) %eax -> %eax     WCPAZSO
                    movzx   0x8(%edi) -> %ecx
c1 e1 07            shl     $0x07 %ecx -> %ecx          WCPAZSO
3b c1               cmp     %eax %ecx                   WCPAZSO
0f 8d a2 0a 00 00   jnl     $0x77f52269                 RSO

Instruction Creation

Method 1: use the INSTR_CREATE_opcode macros that fill in implicit
operands automatically:
    instr_t *instr = INSTR_CREATE_dec(dcontext,
                                      opnd_create_reg(DR_REG_EDX));

Method 2: specify opcode + all operands (including implicit operands):
    instr_t *instr = instr_create(dcontext);
    instr_set_opcode(instr, OP_dec);
    instr_set_num_opnds(dcontext, instr, 1, 1);
    instr_set_dst(instr, 0, opnd_create_reg(DR_REG_EDX));
    instr_set_src(instr, 0, opnd_create_reg(DR_REG_EDX));


Linear Control Flow


Both basic blocks and traces are
linear
Instruction sequences are all
single-entrance, multiple-exit
Greatly simplifies analysis
algorithms


64-Bit Versus 32-Bit


32-bit build of DynamoRIO only handles 32-bit code
64-bit build of DynamoRIO decodes/encodes both 32-bit and
64-bit code
Current release does not support executing applications that mix
the two

IR is universal: covers both 32-bit and 64-bit


Abstracts away underlying mode


64-Bit Thread and Instruction Modes


When going to or from the IR, the thread mode and instruction
mode determine how instrs are interpreted
When decoding, the current thread's mode is used
Default is 64-bit for 64-bit DynamoRIO
Can be changed with set_x86_mode()

When encoding, that instruction's mode is used
When created, set to the mode of the current thread
Can be changed with instr_set_x86_mode()


64-Bit Clients
Define X86_64 before including header files when building a
64-bit client
Convenience macros for printf formats, etc. are provided
E.g.:
    printf("Pointer is " PFX "\n", p);

Use "X" macros for cross-platform registers
DR_REG_XAX is DR_REG_EAX when compiled 32-bit, and
DR_REG_RAX when compiled 64-bit


DynamoRIO API: Code Manipulation


Processor information
State preservation
Eflags, arith flags, floating-point state, MMX/SSE state
Spill slots, TLS, CLS

Clean calls to C code


Dynamic instrumentation
Replace code in the code cache

Branch instrumentation
Convenience routines


Processor Information
Processor type
proc_get_vendor(), proc_get_family(), proc_get_type(),
proc_get_model(), proc_get_stepping(), proc_get_brand_string()

Processor features
proc_has_feature(), proc_get_all_feature_bits()

Cache information
proc_get_cache_line_size(), proc_is_cache_aligned(),
proc_bump_to_end_of_cache_line(),
proc_get_containing_page()
proc_get_L1_icache_size(), proc_get_L1_dcache_size(),
proc_get_L2_cache_size(), proc_get_cache_size_str()


State Preservation
Spill slots for registers
3 fast slots, 6/14 slower slots
dr_save_reg(), dr_restore_reg(), and dr_reg_spill_slot_opnd()
From C code: dr_read_saved_reg(), dr_write_saved_reg()

Dedicated TLS field for thread-local data


dr_insert_read_tls_field(), dr_insert_write_tls_field()
From C code: dr_get_tls_field(), dr_set_tls_field()
Parallel routines for CLS fields

Arithmetic flag preservation


dr_save_arith_flags(), dr_restore_arith_flags()

Floating-point/MMX/SSE state
dr_insert_save_fpstate(), dr_insert_restore_fpstate()

Clean Calls
if (instr_is_mbr(instr)) {
    app_pc address = instr_get_app_pc(instr);
    uint opcode = instr_get_opcode(instr);
    instr_t *nxt = instr_get_next(instr);
    dr_insert_clean_call(drcontext, ilist, nxt, (void *) at_mbr,
                         false /* don't need to save fp state */,
                         2 /* 2 parameters */,
                         /* opcode is 1st parameter */
                         OPND_CREATE_INT32(opcode),
                         /* address is 2nd parameter */
                         OPND_CREATE_INTPTR(address));
}

Saved interrupted application state can be accessed using
dr_get_mcontext() and modified using dr_set_mcontext()


Clean Call Inlining


Simple clean callees will be automatically optimized and
potentially inlined
-opt_cleancall runtime option controls aggressiveness
Current requirements for inlining:
Leaf routine (may call PIC get-pc thunk)
Zero or one argument
Relatively short

Compile the client with optimizations to improve clean call
optimization
Look in debug logfile for CLEANCALL to see results


Dynamic Instrumentation
Thread-shared: flush all code corresponding to application
address and then re-instrument when re-executed
Can flush from clean call, and use dr_redirect_execution() since
cannot return to potentially flushed cache fragment

Thread-private: can also replace a particular fragment (does not
affect other potential copies of the source app code)
dr_replace_fragment()


Flushing the Cache


Immediately deleting or replacing individual code cache
fragments is available for thread-private caches
Only removes from that thread's cache

Two basic types of thread-shared flush:


Non-precise: remove all entry points but let target cache code be
invalidated and freed lazily
Precise/synchronous:
Suspend the world
Relocate threads inside the target cache code
Invalidate and free the target code immediately


Flushing the Cache


Thread-shared flush API routines:
dr_unlink_flush_region(): non-precise flush
dr_flush_region(): synchronous flush
dr_delay_flush_region():
No action until a thread exits code cache on its own
If a completion callback is provided, synchronous once triggered
Without a callback, non-precise


Multi-Instrumentation Mediation
The drmgr Extension provides mediation among multiple
agents for basic block instrumentation and TLS/CLS access
Divides instrumentation into four stages and orders the
callbacks for each stage:

Application-to-application transformations
Application analysis
Instrumentation insertion
Instrumentation optimization

Enables multi-library frameworks and modular clients


Memory Tracing
drutil Extension provides utilities for memory address tracing:
Address acquisition
String loop expansion



DynamoRIO API: Translation


Translation refers to the mapping of a code cache machine
state (program counter, registers, and memory) to its
corresponding application state
The program counter always needs to be translated
Registers and memory may also need to be translated
depending on the transformations applied when copying into the
code cache


Translation Case 1: Fault


[Figure: user context and faulting instr., shown in two panels]

Exception and signal handlers are passed the machine context of
the faulting instruction.
For transparency, that context must be translated from the
code cache to the original code location
Translated location should be where the application would
have had the fault or where execution should be resumed


Translation Case 2: Relocation


If one application thread suspends another, or DynamoRIO
suspends all threads for a synchronous cache flush:
Need suspended target thread in a safe spot
Not always practical to wait for it to arrive at a safe spot (if in a
system call, e.g.)

DynamoRIO forcibly relocates the thread


Must translate its state to the proper application state at which to
resume execution


Translation Approaches
Two approaches to program counter translation:
Store mappings generated during fragment building
High memory overhead (> 20% for some applications, because it
prevents internal storage optimizations) even with highly optimized
difference-based encoding. Costly for something rarely used.

Re-create mapping on-demand from original application code


Cache consistency guarantees mean the corresponding application
code is unchanged
Requires idempotent code transformations

DynamoRIO supports both approaches


The engine mostly uses the on-demand approach, but stored
mappings are occasionally needed


Instruction Translation Field


Each instruction contains a translation field
Holds the application address that the instruction corresponds
to
Set via instr_set_translation()


Context Translation Via Re-Creation


Application code:
A1: mov %ebx, %ecx
A2: add %eax, (%ecx)
A3: cmp $4, (%eax)
A4: jle 710349fb

Code cache code:          Re-created code with translations:
C1: mov %ebx, %ecx        D1: (A1) mov %ebx, %ecx
C2: add %eax, (%ecx)      D2: (A2) add %eax, (%ecx)
C3: cmp $4, (%eax)        D3: (A3) cmp $4, (%eax)
C4: jle <stub0>           D4: (A4) jle <stub0>
C5: jmp <stub1>           D5: (A4) jmp <stub1>
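
The re-creation idea on this slide can be sketched in plain C, independent of the DynamoRIO API: rebuild the fragment's instruction list, walk it while accumulating encoded lengths, and return the translation field of the instruction that contains the cache pc. The type recreated_instr_t, the function translate_cache_pc(), and the use of 0 as a "not found" value are hypothetical names and conventions chosen for illustration; they are not DynamoRIO's internal implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One re-created cache instruction (hypothetical layout): its encoded
 * length and the application address held in its translation field. */
typedef struct {
    uint8_t length;        /* bytes the instruction occupies in the cache */
    uintptr_t translation; /* application address it corresponds to */
} recreated_instr_t;

/* Walks the re-created fragment from fragment_start and returns the
 * translation of the instruction containing cache_pc, or 0 if cache_pc
 * lies outside the fragment. */
uintptr_t
translate_cache_pc(const recreated_instr_t *instrs, size_t count,
                   uintptr_t fragment_start, uintptr_t cache_pc)
{
    uintptr_t cur = fragment_start;
    assert(instrs != NULL || count == 0);
    if (cache_pc < fragment_start)
        return 0;
    for (size_t i = 0; i < count; i++) {
        if (cache_pc < cur + instrs[i].length)
            return instrs[i].translation;
        cur += instrs[i].length;
    }
    return 0;
}
```

With the five instructions D1 through D5, both the jle and the trailing jmp carry the translation A4, so a cache pc inside either one maps back to the application's jle.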


Application vs. Meta Instructions


By default, instructions are treated as application instructions
Must have translations: instr_set_translation(), INSTR_XL8()
Control-flow-changing app instructions are modified to retain
DynamoRIO control and result in cache populating

Meta instructions are added instrumentation code


Not treated as part of the application (e.g., calls run natively)
Usually cannot fault, so translations not needed
Created via instr_set_meta() or instrlist_meta_append()

Meta instructions can reference application memory, or
deliberately fault
A meta instruction that might fault must contain a translation
The client should handle any such fault


Client Translation Support


Instruction lists passed to clients are annotated with
translation information
Read via instr_get_translation()
Clients are free to delete instructions, change instructions and
their translations, and add new tool and app instructions (see
dr_register_bb_event() for restrictions)
An idempotent client that restricts itself to deleting app
instructions and adding non-faulting meta instructions can ignore
translation concerns
DynamoRIO takes care of instructions added by API routines
(dr_insert_clean_call(), etc.)

Clients can choose between storing or regenerating
translations on a fragment-by-fragment basis.

Client Regenerated Translations


Client returns DR_EMIT_DEFAULT from its bb or trace event
callback
Client bb & trace event callbacks are re-called when
translations are needed with translating==true
Client must exactly duplicate transformations performed when
the block was generated
Client must set translation field for all added app instructions
and all meta instructions that might fault
This is true even if translating==false since DynamoRIO may
decide it needs to store translations anyway


Client Stored Translations


Client returns DR_EMIT_STORE_TRANSLATIONS from its
bb or trace event callback
Client must set translation field for all added app instructions
and all meta instructions that might fault
Client bb or trace hook will not be re-called with
translating==true


Register State Translation


Translation may be needed at a point where some registers
are spilled to memory
During indirect branch or RIP-relative mangling, e.g.

DynamoRIO walks the fragment up to the translation point, tracking
register spills and restores
Special handling for stack pointer around indirect calls and
returns

DynamoRIO tracks client spills and restores implicitly added
by API routines
Clean calls, etc.
Explicit spill/restore (e.g., dr_save_reg()) is the client's responsibility


Client Register State Translation


If a client adds its own register spilling/restoring code or
changes register mappings it must register for the restore
state event to correct the context
The same event can also be used to fix up the application's
view of memory
DynamoRIO does not internally store this kind of translation
information ahead of time when the fragment is built
The client must maintain its own data structures



DynamoRIO versus Pin


Basic interface is fundamentally different
Pin = insert callout/trampoline only
Not so different from tools that modify the original code: Dyninst,
Vulcan, Detours
Uses code cache only for transparency

DynamoRIO = arbitrary code stream modifications


Only feasible with a code cache
Takes full advantage of power of code cache
General IA-32/AMD64 decode/encode/IR support


DynamoRIO versus Pin


Pin = insert callout/trampoline only
Pin tries to inline and optimize
Client has little control or guarantee over final performance

DynamoRIO = arbitrary code stream modifications


Client has full control over all inserted instrumentation
Result can be significant performance difference
PiPA Memory Profiler + Cache Simulator:
3.27x speedup w/ DynamoRIO vs 2.6x w/ Pin

DynamoRIO also performs callout (clean call) optimization and
inlining just like Pin for less performance-focused clients


Base Performance Comparison (No Tool)


Base Memory Comparison (No Tool)


BBCount Pin Tool


static int bbcount;
VOID PIN_FAST_ANALYSIS_CALL docount() { bbcount++; }
VOID Trace(TRACE trace, VOID *v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        BBL_InsertCall(bbl, IPOINT_ANYWHERE, AFUNPTR(docount),
                       IARG_FAST_ANALYSIS_CALL, IARG_END);
    }
}
int main(int argc, CHAR *argv[]) {
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_StartProgram();
    return 0;
}


Simple BBCount DynamoRIO Tool


static int bbcount;
static void docount() { bbcount++; }
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating) {
    /* Insert a clean call to docount() at the top of every basic block. */
    dr_insert_clean_call(drcontext, bb, instrlist_first(bb),
                         (void *)docount, false, 0);
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_bb_event(event_basic_block);
}


BBCount Performance Comparison: Simple Tool


Optimized BBCount DynamoRIO Tool


static int global_count;
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating) {
    instr_t *instr, *first = instrlist_first(bb);
    uint flags;
    /* Our inc can go anywhere, so find a spot where flags are dead.
     * Technically this can be unsafe if app reads flags on fault =>
     * stop at instr that can fault, or supply runtime op */
    for (instr = first; instr != NULL; instr = instr_get_next(instr)) {
        flags = instr_get_arith_flags(instr);
        /* OP_inc doesn't write CF but not worth distinguishing */
        if (TESTALL(EFLAGS_WRITE_6, flags) && !TESTANY(EFLAGS_READ_6, flags))
            break;
    }
    /* No dead-flags spot found: save and restore flags around the inc. */
    if (instr == NULL)
        dr_save_arith_flags(drcontext, bb, first, SPILL_SLOT_1);
    instrlist_meta_preinsert(bb, (instr == NULL) ? first : instr,
        INSTR_CREATE_inc(drcontext,
            OPND_CREATE_ABSMEM((byte *)&global_count, OPSZ_4)));
    if (instr == NULL)
        dr_restore_arith_flags(drcontext, bb, first, SPILL_SLOT_1);
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_bb_event(event_basic_block);
}


BBCount Performance Comparison: Opt Tool



Obtaining Help
Read the documentation
[Link]

Look at the sample clients


In the documentation
In the release package: samples/

Ask on the DynamoRIO Users discussion forum/mailing list


[Link]


Debugging Clients
Use the DynamoRIO debug build for asserts
Often point out the problem

Use logging
-loglevel N
stored in logs/ subdir of DR install dir

Attach a debugger

gdb or windbg
-msgbox_mask 0xN
-no_hide
windbg: .reload [Link]=0xN

More tips:
[Link]

Reporting Bugs
Search the Issue Tracker off [Link] first
[Link]

File a new Issue if not found


Follow conventions on wiki
[Link]
CRASH, APP CRASH, HANG, ASSERT

Example titles:
CRASH (1.3.1 [Link])
vm_area_add_fragment:vmareas.c(4466)
ASSERT (1.3.0 suite/tests/common/segfault)
study_hashtable:fragment.c:1745 ASSERT_NOT_REACHED


Examples, Part 2

Feedback

Optional Slides:
Advanced Code
Cache Topics

Overview Outline
Efficient

Software code cache overview


Thread-shared code cache
Cache capacity limits
Data structures

Transparent
Comprehensive
Customizable


Added Memory Breakdown


Code Expansion
original code: 66%
exit stubs: 19%
net jumps: 8%
indirect branch target handling: 7%

Cache Capacity Challenges


How to set an upper limit on the cache size
Different applications have different working sets and different
total code sizes

Which fragments to evict when that limit is reached


Without expensive profiling or extensive fragmentation


Adaptive Sizing Algorithm


Enlarge cache if warranted by
percentage of new fragments that are
regenerated
Target the working set of the application: don't
enlarge for once-only code
Low-overhead, incremental, and
reactive
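
A minimal sketch of this kind of sizing decision, in plain C: count how many newly added fragments are regenerations of previously evicted ones, and enlarge the cache only when that share crosses a threshold. The struct cache_sizer_t, the percentage threshold, and the choice to double the capacity are all assumptions made for illustration; the slide does not give DynamoRIO's actual parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one cache unit. */
typedef struct {
    size_t capacity;        /* current size limit in bytes */
    unsigned new_count;     /* fragments added since the last decision */
    unsigned regen_count;   /* of those, fragments that were evicted earlier */
    unsigned regen_percent; /* assumed threshold, 0..100 */
} cache_sizer_t;

/* Records one fragment creation; regenerated means it was in the cache
 * before and had been evicted. */
void
sizer_note_fragment(cache_sizer_t *s, bool regenerated)
{
    assert(s != NULL);
    s->new_count++;
    if (regenerated)
        s->regen_count++;
}

/* Called when the cache is full. Enlarges (here: doubles) the cache only
 * if enough of the recent fragments were regenerated, which suggests the
 * working set does not fit. Returns true if the cache was enlarged. */
bool
sizer_on_full(cache_sizer_t *s)
{
    bool enlarge;
    assert(s != NULL);
    enlarge = s->new_count > 0 &&
        s->regen_count * 100 >= s->regen_percent * s->new_count;
    if (enlarge)
        s->capacity *= 2;
    s->new_count = 0;
    s->regen_count = 0;
    return enlarge;
}
```

Once-only code produces new fragments that are never regenerated, so it keeps the ratio low and does not trigger growth.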


Cache Capacity Settings


Thread-private:
Working set size matching is on by default
Client may see blocks or traces being deleted in the absence of
any cache consistency event
Can disable capacity management via
-no_finite_bb_cache
-no_finite_trace_cache

Thread-shared:
Set to infinite size by default
Can enable capacity management via
-finite_shared_bb_cache
-finite_shared_trace_cache

Reset triggered when hit up-front reservation




Two Modes of Code Cache Operation


Fine-grained scheme
Supports individual code fragment unlink and removal
Separate data structure per code fragment and each of its exits,
memory regions spanned, and incoming links

Coarse-grained scheme

No individual code fragment control


Permanent intra-cache links
No per-fragment data structures at all
Treat entire cache as a unit for consistency


Data Structures
Fine-grained scheme
Data structures are highly tuned and compact

Coarse-grained scheme
There are no data structures
Savings on applications with large amounts of code are typically
15%-25% of committed memory and 5%-15% of working set


Status in Current Release


Fine-grained scheme
Current default

Coarse-grained scheme

Select with the -opt_memory runtime option


Possible performance hit on certain benchmarks
In the future will be the default option
Required for persisted and process-shared caches


Adaptive Level of Granularity


Start with coarse-grain caches
Plus freezing and sharing/persisting

Switch to fine-grain for individual modules or sub-regions of
modules after significant consistency events, to avoid
expensive entire-module flushes
Support simultaneous fine-grain fragments within coarse-grain
regions for corner cases

Match amount of bookkeeping to amount of code change


Majority of application code does not need fine-grain


Many Varieties of Code Caches


Coarse-grained versus fine-grained
Thread-shared versus thread-private
Basic blocks versus traces


Optional Slides:
Dr. Memory

Dr. Memory
Detects reads of uninitialized memory
Detects heap errors

Out-of-bounds accesses (underflow, overflow)


Access to freed memory
Invalid frees
Memory leaks

Detects other accesses to invalid memory


Stack tracking
Thread-local storage slot tracking

Operates at runtime on unmodified Windows & Linux binaries


Dr. Memory Instrumentation


Monitor all memory accesses, stack adjustments, and heap
allocations
Shadow each byte of app memory
Each byte's shadow stores one of 4 values:

Unaddressable
Uninitialized
Defined at byte level
Defined at bit level: escape to extra per-bit shadow values
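
Four states fit in two bits, so one way to picture such a shadow table is four application bytes per shadow byte. The sketch below is plain C and makes several assumptions: the numeric encoding of the states, the packing order, and the names shadow_set() and shadow_get() are hypothetical and are not taken from Dr. Memory's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four per-byte shadow states; the numeric values are assumed. */
typedef enum {
    SHADOW_DEFINED = 0,
    SHADOW_UNINITIALIZED = 1,
    SHADOW_UNADDRESSABLE = 2,
    SHADOW_DEFINED_BITLEVEL = 3 /* escape to separate per-bit shadow values */
} shadow_val_t;

/* Stores the 2-bit state for application byte number app_byte. */
void
shadow_set(uint8_t *table, size_t app_byte, shadow_val_t v)
{
    size_t idx = app_byte / 4;
    unsigned shift = (unsigned)(app_byte % 4) * 2;
    assert(table != NULL);
    table[idx] = (uint8_t)((table[idx] & ~(3u << shift)) |
                           ((unsigned)v << shift));
}

/* Reads back the 2-bit state for application byte number app_byte. */
shadow_val_t
shadow_get(const uint8_t *table, size_t app_byte)
{
    size_t idx = app_byte / 4;
    unsigned shift = (unsigned)(app_byte % 4) * 2;
    assert(table != NULL);
    return (shadow_val_t)((table[idx] >> shift) & 3u);
}
```

With this packing, an aligned 4-byte application word maps to exactly one shadow byte, so a whole-word check is a single byte compare.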


Dr. Memory

[Figure: the application stack and heap beside the shadow stack and shadow heap. Heap regions are labeled redzone, malloc, redzone, and freed; shadow regions are labeled defined, undefined, or invalid.]

Partial-Word Defines But Whole-Word Transfers


Sub-dword variables are moved around as whole dwords
Cannot raise error when a move reads uninitialized bits
Must propagate on moves and thus must shadow registers
Propagate shadow values by mirroring app data flow

Check system call reads and propagate system call writes


Else, false negatives (reads) or positives (writes)

Raise errors instead of propagating at certain points


Report errors only on significant reads


Shadowing Registers
Use multiple TLS slots
dr_raw_tls_calloc()
Alternative: steal register

Can read and write w/o spilling


Bring into spilled register to combine w/ other args
Defined=0, uninitialized=1
Combine via bitwise or
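
The encoding choice above (defined=0, uninitialized=1) is what makes propagation cheap: the shadow of a result is the bitwise OR of the shadows of its sources. A small plain-C sketch, assuming one shadow bit per byte of a 32-bit register; the type reg_shadow_t and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register shadow: bit i covers byte i of a 32-bit register.
 * 0 = defined, 1 = uninitialized. */
typedef uint8_t reg_shadow_t;

#define REG_SHADOW_DEFINED ((reg_shadow_t)0x0)

/* Shadow for a register whose low nbytes bytes (0..4) are uninitialized. */
reg_shadow_t
shadow_uninit_low_bytes(unsigned nbytes)
{
    assert(nbytes <= 4);
    return (reg_shadow_t)((1u << nbytes) - 1u);
}

/* Result shadow for an operation with two sources: any byte that is
 * uninitialized in either source is uninitialized in the result. */
reg_shadow_t
shadow_combine(reg_shadow_t a, reg_shadow_t b)
{
    return (reg_shadow_t)(a | b);
}

/* True only if every byte is defined. */
bool
shadow_is_defined(reg_shadow_t s)
{
    return s == REG_SHADOW_DEFINED;
}
```

Because defined is zero, combining two fully defined operands costs one OR and yields zero, so the common case needs no branch.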


Monitoring Stack Changes


As stack is extended and contracts again, must update stack
shadow as unaddressable vs uninitialized
Push, pop, or any write to stack pointer
Try to distinguish large alloc/dealloc from stack swap
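
The stack-pointer handling described here can be sketched as follows, in plain C over a simulated shadow array. Addresses are modeled as indices into that array, the stack grows downward as on x86, and a change larger than STACK_SWAP_THRESHOLD is treated as a swap to a different stack. The threshold value, the state encoding, and the function name handle_sp_change() are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ST_UNADDRESSABLE = 0, ST_UNINITIALIZED = 1, ST_DEFINED = 2 };

/* Assumed heuristic: larger jumps are treated as a stack swap. */
#define STACK_SWAP_THRESHOLD 0x1000

typedef enum { SP_NOCHANGE, SP_EXTENDED, SP_CONTRACTED, SP_SWAPPED } sp_change_t;

/* Updates the per-byte stack shadow after the stack pointer moves from
 * old_sp to new_sp (both are indices into shadow). */
sp_change_t
handle_sp_change(uint8_t *shadow, size_t shadow_size, size_t old_sp, size_t new_sp)
{
    size_t delta = old_sp > new_sp ? old_sp - new_sp : new_sp - old_sp;
    assert(shadow != NULL && old_sp <= shadow_size && new_sp <= shadow_size);
    if (delta == 0)
        return SP_NOCHANGE;
    if (delta > STACK_SWAP_THRESHOLD)
        return SP_SWAPPED; /* different stack: leave both shadows alone */
    if (new_sp < old_sp) {
        /* Stack extended: newly exposed bytes are addressable but unwritten. */
        for (size_t i = new_sp; i < old_sp; i++)
            shadow[i] = ST_UNINITIALIZED;
        return SP_EXTENDED;
    }
    /* Stack contracted: bytes beyond the top of stack are off limits. */
    for (size_t i = old_sp; i < new_sp; i++)
        shadow[i] = ST_UNADDRESSABLE;
    return SP_CONTRACTED;
}
```

A fixed threshold is only a heuristic: a very large local allocation looks like a swap, which is why the slide says "try to distinguish".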


Kernel-Mediated Stack Changes


Kernel places data on the stack and removes it again
Windows: APC, callback, and exception
Linux: signals

Linux signals as an example:

intercept sigaltstack changes


intercept handler registration to instrument handler code
use DR's signal event to record app xsp at interruption point
when see event followed by handler, check which stack and
mark from either interrupted xsp or altstack base to cur xsp as
defined (ignoring padding)
record cur xsp in handler, and use to undo on sigreturn


Types Of Instrumentation
Clean call
Simplest, but expensive in both time and space: full context
switch from application state to tool state with separate stack to
execute C code

Shared clean call


Saves space

Lean procedure
Shared routine with smaller context switch than full clean call
Jump-and-link rather than swapping stack
Array of routines, one per pair of dead registers

Inlined
Smallest context switch, but should limit to small sequences of
instrumentation

Non-Code-Cache Code
Use dr_nonheap_alloc() to allocate space to store code
Generate code using DR's IR and emit to target space
Mark read-only once emitted via dr_memory_protect()


Jump-and-Link
Rather than using call+return, avoid stack swap cost by using
jump-and-link
Store return address in a register or TLS slot
Direct jump to target
Indirect jump back to source
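/* Store the return point, then jump directly to the shared routine */
PRE(bb, inst, INSTR_CREATE_mov_st(drcontext,
                                  spill_slot_opnd(drcontext, SPILL_SLOT_2),
                                  opnd_create_instr(appinst)));
PRE(bb, inst, INSTR_CREATE_jmp(drcontext,
                               opnd_create_pc(shared_slowpath_region)));
...
/* At the end of the shared routine: indirect jump back to the source */
PRE(ilist, NULL, INSTR_CREATE_jmp_ind(drcontext,
                                      spill_slot_opnd(drcontext, SPILL_SLOT_2)));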


Inter-Instruction Storage
Spill slots provided by DR are only guaranteed to be live
during a single app instr
In practice, live until next selfmod instr

Allocate own TLS for spill slots


dr_raw_tls_calloc()

Steal registers across whole bb


Restore before each app read
Update spill slot after each app write
Restore on fault


Using Faults For Faster Common Case Code


Instead of explicitly checking for rare cases, use faults to
handle them and keep common case code path fast
Signal and exception event and restore state extended event
all provide pre- and post-translation contexts and containing
fragment information
Client can return failure for extended restore state event
For cases where the client can support re-execution of the faulting cache instr, but not restart translation for relocation


Address Space Iteration


Repeated calls to dr_query_memory_ex()
Check dr_memory_is_in_client() and
dr_memory_is_dr_internal()
Heap walk
API on Windows

Initial structures on Windows


TEB, TLS, etc.
PEB, ProcessParameters, etc.


Intercepting Library Routines


Common task
Dr. Memory monitors malloc, calloc, realloc, free,
malloc_usable_size, etc.
Alternative is to replace w/ own copies

Locating entry point


Module API

Pre-hooks are easy


Post-hooks are hard
Three techniques, each with its own limitations
See paper in CGO 2011
drwrap Extension now provides function wrapping


Replacing Library Routines


Dr. Memory replaces libc routines containing optimized code
that raises false positives
memcpy, strlen, strchr, etc.

Simplification: arrange for routines to always be entered in a
new bb
Do not request elision or indcall2direct from DR

Want to interpret replaced routines


DR treats native execution differently: aborts on fault, etc.

Replace entire bb with jump to replacement routine


drwrap Extension now provides function replacement


Delayed Fragment Deletion


Due to non-precise flushing we can have a flushed bb made
inaccessible but not actually freed for some time
When keeping state per bb, if a duplicate bb is seen, replace
the state and increment a counter ignore_next_delete
On a deletion event, decrement and ignore unless below 0
Can't tell apart from duplication due to thread-private copies:
but this mechanism handles that if saved info is deterministic
and identical for each copy


Callstack Walking
Use case: error reporting
Technique:
Start with xbp as frame pointer (fp)
Look for <fp,retaddr> pairs where retaddr = inside a module

Interesting issues:
When scanning for frame pointer (in frameless func, or at bottom
of stack), querying whether in a module dominates performance
msvcr80!malloc pushes ebx and then ebp, requiring special
handling
When displaying, use retaddr-1 for symbol lookup
More sophisticated techniques needed in presence of FPO
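
The basic frame-pointer walk can be sketched in plain C over a simulated stack. Here the stack is an array of words, a frame pointer is an index into it, the saved frame pointer sits at stack[fp], and the return address sits at stack[fp + 1]. The types and the function walk_callstack() are hypothetical, only one module is checked, and the scanning fallback for frameless functions and FPO handling mentioned above are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical module bounds: [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} module_t;

static bool
in_module(const module_t *m, uintptr_t pc)
{
    return pc >= m->start && pc < m->end;
}

/* Follows <fp,retaddr> pairs starting at index fp. For each frame whose
 * retaddr lies inside the module, stores retaddr-1 (the address to use
 * for symbol lookup) in lookup_pcs. Returns the number of frames found. */
size_t
walk_callstack(const uintptr_t *stack, size_t stack_words, size_t fp,
               const module_t *mod, uintptr_t *lookup_pcs, size_t max_frames)
{
    size_t n = 0;
    assert(stack != NULL && mod != NULL && lookup_pcs != NULL);
    while (n < max_frames && fp + 1 < stack_words) {
        uintptr_t retaddr = stack[fp + 1];
        size_t next_fp = (size_t)stack[fp];
        if (!in_module(mod, retaddr))
            break;
        lookup_pcs[n++] = retaddr - 1;
        if (next_fp <= fp)
            break; /* caller frames must lie at higher addresses */
        fp = next_fp;
    }
    return n;
}
```

Using retaddr-1 keeps the lookup inside the calling instruction, which matters when the call is the last instruction of a function or source line.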


Suspending The World


Use case: Dr. Memory leak check
GC-like memory scan

Use dr_suspend_all_other_threads() and
dr_resume_all_other_threads()
Cannot hold locks while suspending


Using Nudges
Daemon apps do not exit
Request results mid-run
Cross-platform
Signal on Linux
Remote thread on Windows


Tool Packaging
DynamoRIO is redistributable, so you can include a copy with
your tool
drrun supports the -t option via a tool configuration file
drrun -t drcov -- <app cmdline>

Custom front end to configure and launch app


We provide several libraries for building tool front ends:
drconfiglib
drinjectlib
drfrontendlib
