Intro
To understand what an executable program does when no source code is available, with or without debug symbols, we need a disassembler. We may also need a decompiler to turn the machine code back into a high-level language such as C that a human can read.
These tools analyze the binary statically and show every possible branch. Working through all of those branches takes time and effort. The binary may also contain parts that are never executed, or that are obfuscated, which can mislead the analysis.
The usual next step is to debug the program: set breakpoints and follow the execution step by step. This is also tedious and time-consuming. Some paths are taken only when the user supplies particular input, so a debugging session may never reach them. What we need is a tool that can record the execution trace while the program runs, which is what instrumentation provides.
DynamoRIO
DynamoRIO is a tool for program analysis, instrumentation, optimization, and profiling. It manipulates code while the code executes. A similar tool is Intel PIN, which is proprietary. DynamoRIO was started by Hewlett-Packard in 2001 and has been open source since 2009.
We will use Windows and analyze an .exe file. When an executable runs normally, the OS loads its code into RAM and the CPU executes it. When an executable runs under DynamoRIO (drrun.exe), DynamoRIO loads the code into memory, splits it into basic blocks, manipulates those blocks, and arranges them in a trace cache from which the CPU executes them. The execution speed is almost the same.
The executable contains instructions encoded as opcodes that the CPU understands. Some of them are control-flow instructions, which change the path of execution. For example, in C an if branch, a function call, and a return are compiled into instructions such as JNZ, CALL, and RET respectively.
A basic block is a sequence of instructions that execute in order and end with a single control-flow instruction. A control-flow instruction changes the contents of the EIP register, so the code that follows it in memory is not necessarily executed next.
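To make the definition concrete, here is a minimal standalone sketch in plain C that cuts a linear instruction stream into basic blocks at every control-flow instruction. It does not use DynamoRIO at all: the insn_kind_t enum and both function names are made up for this illustration, and the sketch ignores jump targets, which a static analyzer would also treat as block starts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction kinds (not a DynamoRIO type). */
typedef enum { INSN_OTHER, INSN_JCC, INSN_JMP, INSN_CALL, INSN_RET } insn_kind_t;

/* Returns nonzero when the instruction may change EIP non-sequentially. */
int is_control_flow(insn_kind_t kind)
{
    return kind != INSN_OTHER;
}

/*
 * Splits insns[0..count) into basic blocks. The index of the first
 * instruction of each block is stored in starts[]. Returns the number
 * of blocks found, which never exceeds max_blocks.
 */
size_t split_basic_blocks(const insn_kind_t *insns, size_t count,
                          size_t *starts, size_t max_blocks)
{
    size_t nblocks = 0;
    int at_block_start = 1;

    assert(insns != NULL && starts != NULL);
    for (size_t i = 0; i < count; i++) {
        if (at_block_start) {
            if (nblocks == max_blocks)
                break;              /* no room left to record more blocks */
            starts[nblocks++] = i;
            at_block_start = 0;
        }
        /* A control-flow instruction terminates the current block. */
        if (is_control_flow(insns[i]))
            at_block_start = 1;
    }
    return nblocks;
}
```

For a stream such as OTHER, OTHER, JCC, OTHER, CALL, OTHER, RET the function reports three blocks, starting at indexes 0, 3, and 5.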
If the jump target is far away, the CPU has to load more code from RAM into the CPU cache, which slightly hurts performance. DynamoRIO optimizes this by rebuilding some basic blocks in a certain order inside the trace cache. To do so it has to translate destination addresses from the original code into their current locations in the trace cache. This keeps jumps short, so the target is likely to be in the CPU cache already.
API
DynamoRIO provides several APIs so that we can build our own tools that hook in at certain points while DynamoRIO analyzes the code and builds the basic blocks and the trace cache, before the CPU executes the code.
To show how to use these APIs and their extensions, DynamoRIO ships with several clients and samples.
Recording
We will record the code flow of an executable into a file called the trace file. Each basic block that the CPU is about to execute appends its address to this trace file. To do that we add our own code to each basic block as meta instructions, which do not change the behavior of the executable.
We will also record which modules (.dll files) are loaded, and when and which exported functions are executed. The module addresses are needed later to determine which module owns a basic block.
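The ownership check can be sketched in plain C as a range lookup over the recorded modules. This is only an illustration under assumptions: module_range_t and find_owner are hypothetical names, not DynamoRIO types or calls, and a real client would fill the table from its module load events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one loaded module; the field names are illustrative. */
typedef struct {
    const char *name;   /* for example "kernel32.dll" */
    uintptr_t   start;  /* first address of the mapped image */
    uintptr_t   end;    /* one past the last address of the image */
} module_range_t;

/* Returns the module that contains addr, or NULL when no recorded module does. */
const module_range_t *find_owner(const module_range_t *mods, size_t count,
                                 uintptr_t addr)
{
    assert(mods != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= mods[i].start && addr < mods[i].end)
            return &mods[i];
    }
    return NULL;
}
```

A linear scan is enough for a few dozen modules. With many modules, or with a lookup on every basic block, a sorted array with binary search would probably be a better fit.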
We also record basic blocks per thread and, at some points, a simple timestamp.
Clients
We create our client with dr_client_main as the entry point. In this entry point we set the client name with dr_set_client_name, initialize the extensions, and register the event hooks.
DR_EXPORT void dr_client_main(client_id_t id, int argc, const char *argv[])
{
...
}
Events
We use dr_register_exit_event to clean up when the executable exits.
We use the drmgr versions drmgr_register_thread_init_event and drmgr_register_thread_exit_event, whose callbacks run when a thread is initialized and when it exits.
We use the drmgr versions drmgr_register_module_load_event and drmgr_register_module_unload_event, whose callbacks run when a module (.dll file) is loaded or unloaded.
Finally, we use the drmgr version drmgr_register_bb_instrumentation_event, whose callbacks analyze and then modify each basic block.
Extensions
Extensions are helper libraries that are linked statically with our client tool. We use the drmgr extension, which provides more advanced functions and is also required by the drwrap extension.
The drwrap extension is needed to instrument the calls that the executable makes into functions of a loaded module (.dll file).
Another extension is drcontainers, which provides data structures such as hash tables.
configure_DynamoRIO_client(bbtrace)
use_DynamoRIO_extension(bbtrace drmgr)
use_DynamoRIO_extension(bbtrace drwrap)
use_DynamoRIO_extension(bbtrace drcontainers)
Initializing
We initialize the extensions with drmgr_init and drwrap_init, and release them at exit with drmgr_exit and drwrap_exit.
To open a file we use dr_open_file with the DR_FILE_WRITE_OVERWRITE | DR_FILE_ALLOW_LARGE flags. To write to the file we can use dr_fprintf or dr_write_file, and then close it with dr_close_file.
For safety, we format strings into a buffer with DynamoRIO's dr_snprintf instead of calling libc's snprintf directly.
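The pattern for bounded formatting looks roughly like the sketch below. It uses the standard snprintf so that it compiles without DynamoRIO; in a client the call would be dr_snprintf instead. The function name format_trace_line and the "bb thread address" line layout are assumptions made for this example, not the format used later in this series. The explicit terminator matters because, as far as I know, dr_snprintf does not guarantee null termination when the output is truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats one hypothetical trace line ("bb <thread> <address>") into buf.
 * Returns the number of characters stored, not counting the terminator,
 * or -1 when the line did not fit. buf is always null-terminated.
 */
int format_trace_line(char *buf, size_t size, unsigned thread_id,
                      unsigned long addr)
{
    int n;

    assert(buf != NULL);
    if (size == 0)
        return -1;
    n = snprintf(buf, size, "bb %u 0x%08lx\n", thread_id, addr);
    /* Defensive: some snprintf variants skip the terminator on truncation. */
    buf[size - 1] = '\0';
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

Checking the return value lets the caller decide whether to drop a truncated line or to retry with a larger buffer.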
drwrap is configured through drwrap_set_global_flags with the flags DRWRAP_NO_FRILLS | DRWRAP_FAST_CLEANCALLS.
We create a mutex with dr_mutex_create to avoid collisions when several threads write to the trace file. When we are done with it we destroy the mutex with dr_mutex_destroy.
Writing each block straight to the file would hurt performance, so we keep a trace list buffer in memory and flush it to the file when it reaches its threshold. The trace list buffer is set up per thread in thread-local storage (TLS), so we need drmgr_register_tls_field.
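The buffer-and-flush idea can be sketched in plain C as follows. Everything here is hypothetical and independent of DynamoRIO: trace_buf_t, the trace_sink_t callback, and the tiny capacity are chosen only for illustration. In a real client there would be one such buffer per thread, stored in the TLS field, and the sink would take the mutex and write to the trace file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_BUF_CAPACITY 4   /* tiny on purpose; a real buffer would be far larger */

/* Hypothetical callback that receives the buffered addresses on a flush. */
typedef void (*trace_sink_t)(const uintptr_t *entries, size_t count, void *user);

/* Hypothetical per-thread trace list buffer. */
typedef struct {
    uintptr_t    entries[TRACE_BUF_CAPACITY];
    size_t       used;
    trace_sink_t sink;
    void        *user;
} trace_buf_t;

void trace_buf_init(trace_buf_t *tb, trace_sink_t sink, void *user)
{
    assert(tb != NULL && sink != NULL);
    tb->used = 0;
    tb->sink = sink;
    tb->user = user;
}

/* Hands all buffered addresses to the sink and empties the buffer. */
void trace_buf_flush(trace_buf_t *tb)
{
    if (tb->used == 0)
        return;
    tb->sink(tb->entries, tb->used, tb->user);
    tb->used = 0;
}

/* Appends one basic block address and flushes when the threshold is reached. */
void trace_buf_add(trace_buf_t *tb, uintptr_t bb_addr)
{
    tb->entries[tb->used++] = bb_addr;
    if (tb->used == TRACE_BUF_CAPACITY)
        trace_buf_flush(tb);
}

/* Example sink that only counts how many addresses were flushed. */
void counting_sink(const uintptr_t *entries, size_t count, void *user)
{
    (void)entries;
    *(size_t *)user += count;
}
```

The buffer must also be flushed one last time when the thread exits, otherwise the entries below the threshold are lost.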
We also save the address of the current executable. dr_get_main_module returns the main module, and we read its start field. Afterwards we release the module data with dr_free_module_data. This address is used later to check which module owns a basic block.
Since we also need to track exported functions, we use a hash table. It is initialized with hashtable_init_ex, which takes a callback that is invoked when an item is destroyed. When we are done, hashtable_delete frees the memory. We add an item with hashtable_add, remove it with hashtable_remove, and look it up with hashtable_lookup.
Continue to Part 2