Compiling DynamoRIO with VS2019

Here are some quick tips for building DynamoRIO with VS2019.

I tried to install VS2017 on a freshly installed VM, but only the VS2019 installer was available, so I tried a couple of combinations:

VS2019 with VS2017 (v141 | v14.16):

  1. Start a command prompt that uses cl.exe from v14.16 rather than the VS2019 default (v142 | v14.24):
%comspec% /k "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars32.bat" -vcvars_ver=14.16

Check that the reported versions contain xx.16:

cl.exe
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27034 for x86
ml.exe
Microsoft (R) Macro Assembler Version 14.16.27034.0
  2. Run cmake to use the v141 toolset:

I could not change the generator, so it always uses “Visual Studio 16 2019”.

cmake -T v141,host=x86,version=14.16 -A Win32 ..
  3. Build as the wiki describes:
cmake --build . --config RelWithDebInfo

VS2019

  1. Start the “Developer Command Prompt for VS 2019”, or run:
%comspec% /k "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\Tools\VsDevCmd.bat"
  2. Run cmake for Win32 (from within the build directory):
cmake -A Win32 ..
  3. Build as the wiki describes:
cmake --build . --config RelWithDebInfo

Install

This command installs a copy into the export directory.

cmake --build . --config RelWithDebInfo --target install

Running

Go to the export directory and try:

bin32\drrun.exe -c samples\bin32\bbcount.dll -- notepad.exe

TODO

The build fails with an error if we enforce -A x64 (the default), so cpp2asm_support.cmake needs to be updated.

Record the Code Flow with DynamoRIO – Part 2

Module Load Events

When a module is loaded, we iterate over all functions it exports and wrap each one, so that we know when an exported function is called.

To iterate over the exported functions, we start with dr_symbol_export_iterator_start, check whether another entry exists with dr_symbol_export_iterator_hasnext, fetch it with dr_symbol_export_iterator_next, and when we reach the end we call dr_symbol_export_iterator_stop.

On module load we wrap each function with drwrap_wrap and store the function address in the hash table. Each hash table entry is allocated with dr_global_alloc and stored with hashtable_add.

On module unload we unwrap each function with drwrap_unwrap. We also remove the hash table entry with hashtable_remove, which invokes the callback that frees the entry with dr_global_free.
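
The ownership rule described here (the table frees an entry through a callback when the entry is removed) can be modelled in plain C without DynamoRIO. This is only a sketch: the sim_ names are hypothetical, malloc/free stand in for dr_global_alloc/dr_global_free, and the real drcontainers hash table is more capable than this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIM_BUCKETS 16

typedef struct sim_entry {
    uintptr_t key;              /* address of the wrapped function */
    void *payload;              /* per-function data owned by the table */
    struct sim_entry *next;
} sim_entry_t;

typedef struct {
    sim_entry_t *buckets[SIM_BUCKETS];
    void (*free_payload)(void *);   /* called whenever an entry is removed */
} sim_table_t;

static void sim_table_init(sim_table_t *t, void (*free_payload)(void *))
{
    assert(t != NULL);
    for (int i = 0; i < SIM_BUCKETS; i++)
        t->buckets[i] = NULL;
    t->free_payload = free_payload;
}

static size_t sim_bucket(uintptr_t key)
{
    return (size_t)(key >> 2) % SIM_BUCKETS;
}

/* Returns 1 if the entry was added, 0 if the key exists or allocation failed. */
static int sim_table_add(sim_table_t *t, uintptr_t key, void *payload)
{
    size_t b = sim_bucket(key);
    for (sim_entry_t *e = t->buckets[b]; e != NULL; e = e->next)
        if (e->key == key)
            return 0;
    sim_entry_t *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return 0;
    fresh->key = key;
    fresh->payload = payload;
    fresh->next = t->buckets[b];
    t->buckets[b] = fresh;
    return 1;
}

static void *sim_table_lookup(const sim_table_t *t, uintptr_t key)
{
    for (const sim_entry_t *e = t->buckets[sim_bucket(key)]; e != NULL; e = e->next)
        if (e->key == key)
            return e->payload;
    return NULL;
}

/* Unlinks the entry and hands its payload to the free callback. */
static int sim_table_remove(sim_table_t *t, uintptr_t key)
{
    for (sim_entry_t **pp = &t->buckets[sim_bucket(key)]; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->key == key) {
            sim_entry_t *dead = *pp;
            *pp = dead->next;
            if (t->free_payload != NULL)
                t->free_payload(dead->payload);
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

Because the free callback is registered once at init time, the unload path only has to call remove and never touches the payload memory itself.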

We can also iterate over all functions imported by the main module: start with dr_symbol_import_iterator_start, check for another entry with dr_symbol_import_iterator_hasnext, fetch it with dr_symbol_import_iterator_next, and stop with dr_symbol_import_iterator_stop.

We can also get the module name with dr_module_preferred_name.

Thread Init Events

When a thread is initialized and ready to run, we allocate its trace list buffer with dr_thread_alloc and store the pointer in a TLS field with drmgr_set_tls_field.

When a thread exits and we are done with the trace list buffer, we fetch the pointer with drmgr_get_tls_field and free the memory with dr_thread_free.

To get the current thread id we use dr_get_thread_id.

Exported Function

Since every exported function is already wrapped, our callback runs first whenever an exported function is called. The callback gets the address of the current function with drwrap_get_func and the context with drwrap_get_drcontext.

We only trace functions that are called directly by the main module (the executable). We get the function's return address with drwrap_get_retaddr and find the module that owns that address with dr_lookup_module. When we are done with the module data we must release it with dr_free_module_data.
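
The "called directly by the main module" check boils down to an address-range lookup on the return address. The sketch below models it in plain C. sim_module_t, lookup_module and called_from_main are hypothetical names, and the module table is an assumption that stands in for whatever dr_lookup_module consults internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical module record: the address range is [start, end). */
typedef struct {
    const char *name;
    uintptr_t start;
    uintptr_t end;
} sim_module_t;

/* Returns the module that contains pc, or NULL if no module owns it. */
static const sim_module_t *lookup_module(const sim_module_t *mods, size_t n, uintptr_t pc)
{
    assert(mods != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pc >= mods[i].start && pc < mods[i].end)
            return &mods[i];
    return NULL;
}

/* True when the return address lies inside the main module, i.e. the
 * wrapped function was called directly by the executable. */
static int called_from_main(const sim_module_t *mods, size_t n,
                            uintptr_t main_start, uintptr_t retaddr)
{
    const sim_module_t *owner = lookup_module(mods, n, retaddr);
    return owner != NULL && owner->start == main_start;
}
```

A call that comes from inside another DLL has a return address outside the main module's range, so it is skipped.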

Basic Block Manipulation

During the second phase (analysis) we get the first instruction of the current basic block with instrlist_first, then its address with instr_get_app_pc and the owning module with dr_lookup_module.

If the owner is the main module, we trace the block by setting the user data to the address of the first instruction. This user data is passed on to the third phase (insertion), where it becomes useful.

To get the basic block size (the total length of its instructions in bytes) we iterate over all instructions and sum instr_length for each one. We start with instrlist_first_app and use instr_get_next_app to get the next instruction.
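
The size computation is a plain walk over the instruction list. The following sketch uses a hypothetical sim_instr_t in place of DynamoRIO's instr_t, and an is_app flag to mimic how the _app iterators skip meta instructions. It is a model of the loop, not the actual client code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for instr_t with only the fields this sketch needs. */
typedef struct sim_instr {
    int length;                 /* encoded size in bytes */
    int is_app;                 /* 1 = application instruction, 0 = meta */
    struct sim_instr *next;
} sim_instr_t;

/* Sums the encoded length of every application instruction in the block. */
static size_t bb_size_bytes(const sim_instr_t *first)
{
    size_t total = 0;
    for (const sim_instr_t *in = first; in != NULL; in = in->next) {
        assert(in->length >= 0);
        if (in->is_app)
            total += (size_t)in->length;
    }
    return total;
}
```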

During the third phase (insertion) we insert the instructions that write into the trace list buffer. The insertion callback is called not once per basic block but once per instruction within the block, so we add our code only when the callback is called for the first instruction. We detect this by comparing the instruction address with the value of the user data.
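
The hand-off between the two phases can be modelled as two small functions. In this sketch the names, and the convention that a user data value of 0 means "do not trace", are assumptions made for illustration. In the real client the callbacks are the ones registered through drmgr_register_bb_instrumentation_event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range [start, end) of the main module. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} sim_range_t;

/* Analysis phase: decide once per block whether it should be traced.
 * Returns the user data for the insertion phase (0 = do not trace). */
static uintptr_t analysis_phase(uintptr_t first_pc, const sim_range_t *main_mod)
{
    assert(main_mod != NULL);
    if (first_pc >= main_mod->start && first_pc < main_mod->end)
        return first_pc;
    return 0;
}

/* Insertion phase: called for every instruction of the block.
 * Returns 1 only for the first instruction of a traced block. */
static int should_insert(uintptr_t instr_pc, uintptr_t user_data)
{
    return user_data != 0 && instr_pc == user_data;
}
```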

To get the address of the basic block we can use instr_get_app_pc on the first instruction. We insert our code before the application instruction using instrlist_meta_preinsert.

Because our instructions use some CPU registers, we must save the current register state beforehand and restore it afterwards, so that the application's behaviour does not change. We use dr_save_arith_flags and dr_save_reg before, and dr_restore_reg and dr_restore_arith_flags after.

To get the value of the TLS field (the address of our buffer) from within our inserted instructions, we use drmgr_insert_read_tls_field to load it into a register. This function is the inserted-code counterpart of drmgr_get_tls_field.

When our inserted code needs to call into a C function, we can use dr_insert_clean_call to insert a clean call to our wrapper function.

Meta Instruction to Inject

Here is the pseudocode:

LABEL start;
LABEL skip_rdtsc;
LABEL skip_dump;

slot1 = AFLAGS;
slot2 = XCX;
slot3 = XBX;
XBX = tls_fields(idx);
XCX = XBX->pos;
XBX->pc_data[XCX] = start;
if (XCX != 0) goto skip_rdtsc;

slot4 = XAX;
slot5 = XDX;
XDX:XAX = rdtsc();
XBX->ts = XDX:XAX;
XDX = slot5;
XAX = slot4;

skip_rdtsc:
XCX++;
XBX->pos = XCX;
if (XCX < buf_total) goto skip_dump;

clean_call(XCX);

skip_dump:
XBX = slot3;
XCX = slot2;
AFLAGS = slot1;

start:
...

The structure stored in the TLS field is:

typedef struct {
    uint pos;
    uint64 ts;
    thread_id_t thread;
    uint32 pc_data[BUF_SIZE];
} per_thread_t;

For the full implementation, please visit the repository: https://github.com/firodj/bbtrace/

Record the Code Flow with DynamoRIO – Part 1

Intro

To understand what an executable program does when no source code is available (with or without debug symbols), we need a disassembler. Furthermore, we may need a decompiler to turn it back into a high-level language like C that humans can understand.

These tools statically analyze the binary along all possible branches. Working through every branch takes more time and effort. Besides, some parts may never be executed, or may be obfuscated, which makes the analysis go wrong.

Usually the next step is debugging: setting breakpoints and stepping through the trace. This is also tedious and takes more time. Some paths may not be taken at all, because they only occur when the user provides certain input. So a tool that can record the trace at run time is needed, such as an instrumentation tool.

DynamoRIO

DynamoRIO is a tool for program analysis, instrumentation, optimization, and profiling. It manipulates code while it executes. A similar tool is Intel PIN, which is proprietary. DynamoRIO, which was started at Hewlett-Packard in 2001, has been open source since 2009.

We will use Windows and analyze an .exe file. When an executable runs normally, the OS loads its code into RAM and the CPU executes it. But when an executable is run under DynamoRIO (drrun.exe), DynamoRIO loads the code into memory, splits it into basic blocks, manipulates the basic blocks, and then arranges them in a trace cache to be executed by the CPU. The execution speed is almost the same.

The executable contains instructions in opcode form that the CPU understands. Some instructions serve as control flow and change the course of the executable. For example, in C an if branch, a function call, and a return are compiled into JNZ, CALL, and RET respectively.

A basic block is a sequence of instructions that are executed in order and terminated by a single control-flow instruction. A control-flow instruction changes the contents of the EIP register, so the code that follows the control-flow instruction is not necessarily executed next.
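
As a rough illustration of this definition, the sketch below cuts a linear instruction stream into basic blocks at every control-flow instruction. The op_kind_t enum is a hypothetical simplification of real x86 opcodes. The sketch also ignores that DynamoRIO discovers blocks dynamically, starting from each address that execution actually reaches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction kinds. */
typedef enum { OP_OTHER, OP_JNZ, OP_CALL, OP_RET } op_kind_t;

/* A control-flow instruction may change EIP, so it terminates a basic block. */
static int is_control_flow(op_kind_t op)
{
    return op == OP_JNZ || op == OP_CALL || op == OP_RET;
}

/* Counts the basic blocks in a linear instruction stream. */
static size_t count_basic_blocks(const op_kind_t *ops, size_t n)
{
    size_t blocks = 0;
    int in_block = 0;
    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!in_block) {        /* the first instruction after a terminator opens a block */
            blocks++;
            in_block = 1;
        }
        if (is_control_flow(ops[i]))
            in_block = 0;       /* the terminator closes the current block */
    }
    return blocks;
}
```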

If the jump target is far enough away, the CPU has to load more code from RAM into the CPU cache, which slightly hurts performance. DynamoRIO optimizes this by reconstructing some of the basic blocks in a certain order inside the trace cache. This requires translating destination addresses from the original code addresses to their current locations in the trace cache. It keeps jumps short, so the target is likely to be in the CPU cache already.

API

DynamoRIO provides several APIs so that we can build our own tools that hook in at certain points while DynamoRIO analyzes and builds the basic blocks and trace cache, before the code is executed by the CPU.

To learn how to use these APIs and their extensions, DynamoRIO comes with several clients and samples.

Recording

We will record the code flow of an executable into a file called the trace file. Each basic block that is executed by the CPU appends its address to this trace file. To do so, we add our own code to each basic block as so-called meta instructions, which do not change the behavior of the executable.

We will also record which modules (.dll files) are loaded, and when and which exported functions are executed. The module addresses are needed later to determine the owner of a basic block.

We also record the basic blocks per thread, along with a simple time-stamp at certain points.

Clients

We create our client with dr_client_main as the entry point. In this entry point we set the client name with dr_set_client_name, initialize some extensions, and register some event hooks.

DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[])
{
...
}

Events

We use dr_register_exit_event to clean up when the executable exits.

We also use the drmgr functions drmgr_register_thread_init_event and drmgr_register_thread_exit_event, whose callbacks are executed when a thread is initialized and when it exits.

We also use the drmgr functions drmgr_register_module_load_event and drmgr_register_module_unload_event, whose callbacks are executed when a module (.dll file) is loaded or unloaded.

Finally, we use the drmgr function drmgr_register_bb_instrumentation_event, whose callbacks analyze and then modify each basic block.

Extensions

Extensions are helper libraries that are linked statically with our client tool. We use the drmgr extension, which provides more advanced functions and is also required by the drwrap extension.

The drwrap extension is needed to instrument calls from the executable into functions of a loaded module (.dll file).

Another extension is drcontainers, which provides data structures such as hash tables.

configure_DynamoRIO_client(bbtrace)
use_DynamoRIO_extension(bbtrace drmgr)
use_DynamoRIO_extension(bbtrace drwrap)
use_DynamoRIO_extension(bbtrace drcontainers)

Initializing

We initialize the extensions with drmgr_init and drwrap_init, and must shut them down at exit with drwrap_exit and drmgr_exit.

To open a file we use dr_open_file with the DR_FILE_WRITE_OVERWRITE | DR_FILE_ALLOW_LARGE flags. To write to the file we can use dr_fprintf or dr_write_file, and we close it with dr_close_file.

For safety, we should format strings into a buffer with DynamoRIO's dr_snprintf rather than calling libc's snprintf directly.

drwrap is configured through drwrap_set_global_flags with the flags DRWRAP_NO_FRILLS | DRWRAP_FAST_CLEANCALLS.

We create a mutex with dr_mutex_create to avoid collisions when writing to the trace file. When we are done with it, we destroy the mutex with dr_mutex_destroy.

Since writing each block directly to the file would hurt performance, we need a trace list buffer that resides in memory and is flushed to the file when it reaches its threshold. We set up one trace list buffer per thread in TLS, for which we need drmgr_register_tls_field.

We save the address of the current executable. We use dr_get_main_module to get the main module and read its start field, and afterwards release the module data with dr_free_module_data. This address is used to check the owner of a basic block.

Since we also need to track the exported functions, we use a hash table initialized with hashtable_init_ex, providing a callback that is invoked when an item is destroyed. When we are done, we use hashtable_delete to free the memory. To add an item to the hash table we use hashtable_add, to remove an item hashtable_remove, and to look up an item hashtable_lookup.

Continue to Part 2