Journey to Understand RetDec – Part 1

This post is just my personal notes, to remember what I achieved during a short period between “flip-flopping” on my actual projects.

The repository: https://github.com/avast/retdec.git

To build and install it, just follow the README.md.

The entry point of this decompiler is bin/retdec-decompiler.py, which calls several tools, one per stage.
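
As the log below suggests, much of the driver's job is running one tool after another and printing each command line. Here is a minimal Python sketch of that pattern. run_tool and run_stages are hypothetical helpers, not functions from retdec-decompiler.py. The real script is also more nuanced: a failed unpacking attempt is not fatal, for instance.

```python
import shlex
import subprocess
import sys


def run_tool(argv, dry_run=False):
    """Echo a command in the same "RUN:" style as the log, then run it.

    Returns the tool's exit code (0 when dry_run is set)."""
    print("RUN: " + shlex.join(argv))
    if dry_run:
        return 0
    completed = subprocess.run(argv)
    return completed.returncode


def run_stages(stages, dry_run=False):
    """Run (name, argv) stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully."""
    done = []
    for name, argv in stages:
        print(f"##### {name}...")
        if run_tool(argv, dry_run=dry_run) != 0:
            break
        done.append(name)
    return done


if __name__ == "__main__":
    # A trivial stage that uses the current Python interpreter as the "tool".
    print(run_stages([("Say hello", [sys.executable, "-c", "print('hello')"])]))
```

Stopping at the first failure keeps the sketch simple. A faithful driver would need per-stage rules for which failures are acceptable.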

Here is a trimmed log that I captured for app.exe:

##### Checking if file is a Mach-O Universal static library...

##### Checking if file is an archive...
RUN: bin/retdec-ar-extractor app.exe --arch-magic
Not an archive, going to the next step.

##### Gathering file information...
RUN: bin/retdec-fileinfo -c app.exe.json --similarity app.exe --no-hashes=all --crypto share/retdec/support/generic/yara_patterns/signsrch/signsrch.yara --crypto share/retdec/support/generic/yara_patterns/signsrch/signsrch.yarac --max-memory-half-ram

Input file : app.exe
File format : PE
File class : 32-bit
File type : Executable file
Architecture : x86
Endianness : Little endian
Image base address : 0x400000
Entry point address : 0x75bcc7
Entry point offset : 0x35bcc7
Entry point section name : .text
Entry point section index: 0
Bytes on entry point : 6a186898397d00e8cd500000bf940000008bc7e8f10700008965e88bf4893e56ff15683178008b4e10890d30848b008b4604
Detected tool : Microsoft Linker (7.0) (linker), combined heuristic
Detected tool : MSVC (7.0) Visual Studio .NET 2002 (compiler), combined heuristic
Original language : C++
Rich header offset : 0x80
Rich header key : 0x6e18e0d5
Rich header signature : 002723ca00000003004024fa00000033001c24fa000000b2005f0c0500000057000f0c050000000c00600c05
000000fb001c23da00000005001923fa0000001c005f0fc300000001005f08130000000c001220fc00000006
00600fc300000079005d0fc3000000030001000000000108001d24fa0000006b005e0bec00000001003d24fa
00000001

##### Trying to unpack app.exe into app.exe-unpacked.tmp by using generic unpacker...
RUN: bin/retdec-unpacker app.exe -o app.exe-unpacked.tmp --max-memory-half-ram

No matching plugins found for 'Microsoft Linker 7.0'
No matching plugins found for 'MSVC 7.0'
##### Unpacking by using generic unpacker: nothing to do

##### Trying to unpack app.exe into app.exe-unpacked.tmp by using UPX...
RUN: upx -d app.exe -o app.exe-unpacked.tmp

upx: app.exe: NotPackedException: not packed by UPX
##### Unpacking by using UPX: nothing to do

##### Decompiling app.exe into app.exe.bc...
RUN: bin/retdec-bin2llvmir -provider-init -decoder -verify -x87-fpu -main-detection -idioms-libgcc -inst-opt -cond-branch-opt -syscalls -stack -constants -param-return -local-vars -inst-opt -simple-types -generate-dsm -remove-asm-instrs -class-hierarchy -select-fncs -unreachable-funcs -inst-opt -x86-addr-spaces -value-protect -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -instcombine -tbaa -targetlibinfo -basicaa -domtree -simplifycfg -domtree -early-cse -lower-expect -targetlibinfo -tbaa -basicaa -globalopt -mem2reg -instcombine -simplifycfg -basiccg -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -lcssa -instcombine -scalar-evolution -loop-simplifycfg -loop-simplify -aa -loop-accesses -loop-load-elim -lcssa -indvars -loop-idiom -loop-deletion -memdep -gvn -memdep -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -dce -bdce -adce -die -simplifycfg -instcombine -strip-dead-prototypes -globaldce -constmerge -constprop -instnamer -domtree -instcombine -inst-opt -simple-types -stack-ptr-op-remove -idioms -global-to-local -dead-global-assign -instcombine -inst-opt 
-idioms -phi2seq -value-protect -disable-inlining -disable-simplify-libcalls -config-path app.exe.json -max-memory-half-ram -o app.exe.bc

Running phase: Initialization ( 0.01s )
Running phase: LLVM ( 0.02s )
Running phase: Providers initialization ( 0.02s )
Running phase: Input binary to LLVM IR decoding ( 2.66s )
Running phase: LLVM ( 177.20s )
Running phase: x87 fpu register analysis ( 178.84s )
Running phase: Main function identification optimization ( 185.27s )
Running phase: Libgcc idioms optimization ( 185.36s )
Running phase: LLVM instruction optimization ( 185.36s )
Running phase: Conditional branch optimization ( 199.31s )
Running phase: Syscalls optimization ( 223.52s )
Running phase: Stack optimization ( 223.52s )
Running phase: Constants optimization ( 332.93s )
Running phase: Function parameters and returns optimization ( 705.10s )
Running phase: Register localization optimization ( 758.65s )
Running phase: LLVM instruction optimization ( 786.78s )
Running phase: Simple types recovery optimization ( 798.37s )
Running phase: Disassembly generation ( 834.99s )
Running phase: Assembly mapping instruction removal ( 875.46s )
Running phase: C++ class hierarchy optimization ( 899.23s )
Running phase: Selected functions optimization ( 901.67s )
Running phase: Unreachable functions optimization ( 901.67s )
Running phase: LLVM instruction optimization ( 902.15s )
Running phase: x86 address spaces optimization ( 912.43s )
Running phase: Value protection optimization ( 914.20s )
Running phase: LLVM ( 918.83s )
Running phase: LLVM instruction optimization ( 1835.37s )
Running phase: Simple types recovery optimization ( 1843.00s )
Running phase: Stack pointer operations optimization ( 1843.08s )
Running phase: Instruction idioms optimization ( 1845.14s )
Running phase: Global to local optimization ( 1856.68s )
Running phase: Dead global assign optimization ( 1966.07s )
Running phase: LLVM ( 2055.81s )
Running phase: LLVM instruction optimization ( 2074.47s )
Running phase: Instruction idioms optimization ( 2079.34s )
Running phase: Phi2Seq optimization ( 2086.92s )
Running phase: Value protection optimization ( 2088.04s )
Running phase: LLVM ( 2089.44s )
Running phase: Bitcode Writer ( 2090.13s )
Running phase: Assembly Writer ( 2092.58s )
Running phase: Cleanup ( 2102.04s )

##### Decompiling app.exe.bc into app.exe.c...
RUN: bin/retdec-llvmir2hll -target-hll=c -var-renamer=readable -var-name-gen=fruit -var-name-gen-prefix= -call-info-obtainer=optim -arithm-expr-evaluator=c -validate-module -o app.exe.c app.exe.bc -enable-debug -emit-debug-comments -config-path=app.exe.json -max-memory-half-ram
Running phase: initialization ( 2.34s )
-> creating the used HLL writer [c] ( 2.34s )
-> creating the used alias analysis [simple] ( 2.34s )
-> creating the used call info obtainer [optim] ( 2.34s )
-> creating the used evaluator of arithmetical expressions [c] ( 2.34s )
-> creating the used variable names generator [fruit] ( 2.34s )
-> creating the used variable renamer [readable] ( 2.34s )
-> creating the used semantics [libc,gcc-general,win-api] ( 2.34s )
-> loading the input config ( 2.34s )
Running phase: conversion of LLVM IR into BIR ( 8.56s )
-> converting global variables ( 11.07s )
-> converting function function_401000 ( 14.93s )
....
Running phase: removing functions prefixed with [__decompiler_undefined_function_] ( 93.96s )
Running phase: removing functions from standard libraries ( 210.15s )
Running phase: removing code that is not reachable in a CFG ( 210.18s )
Warning: [NonRecursiveCFGBuilder] there is no node for an edge to
...
Running phase: signed/unsigned types fixing ( 232.33s )
Running phase: converting LLVM intrinsic functions to standard functions ( 498.39s )
Running phase: obtaining debug information ( 504.04s )
Running phase: alias analysis [simple] ( 506.98s )
Running phase: optimizations [normal] ( 513.46s )
-> running GotoStmtOptimizer ( 513.46s )
-> running RemoveUselessCastsOptimizer ( 516.92s )
-> running UnusedGlobalVarOptimizer ( 524.39s )
-> running DeadLocalAssignOptimizer ( 531.70s )
-> running SimpleCopyPropagationOptimizer ( 1510.30s )
Warning: [NonRecursiveCFGBuilder] there is no node for an edge to
...
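
Here is a quick sanity check on the fileinfo output above. The entry point offset is the entry point RVA (0x75bcc7 - 0x400000 = 0x35bcc7) translated through the section table. Since the reported offset equals the RVA, the .text section must have the same virtual address and raw file offset. The bytes on the entry point can also be decoded by hand: 6a 18 is push 0x18, 68 98 39 7d 00 is push 0x7d3998, and e8 cd 50 00 00 is a relative call. The Python sketch below does both calculations. The .text values 0x1000/0x1000 are assumptions; any equal pair reproduces the reported offset.

```python
import struct


def va_to_file_offset(va, image_base, sec_va, sec_raw):
    """Translate a virtual address to a file offset through one PE section.

    sec_va is the section's VirtualAddress (an RVA), sec_raw its PointerToRawData."""
    rva = va - image_base
    return rva - sec_va + sec_raw


def rel32_call_target(call_va, code):
    """Target of an x86 'call rel32' (opcode E8) located at call_va."""
    if len(code) < 5 or code[0] != 0xE8:
        raise ValueError("not an E8 rel32 call")
    (disp,) = struct.unpack("<i", code[1:5])  # signed little-endian displacement
    return call_va + 5 + disp                 # relative to the next instruction


if __name__ == "__main__":
    ep, image_base = 0x75BCC7, 0x400000
    # Assumed .text values: VirtualAddress == PointerToRawData == 0x1000.
    print(hex(va_to_file_offset(ep, image_base, 0x1000, 0x1000)))  # 0x35bcc7
    ep_bytes = bytes.fromhex("6a186898397d00e8cd500000")
    # push imm8 (2 bytes) + push imm32 (5 bytes) puts the call at ep + 7.
    print(hex(rel32_call_target(ep + 7, ep_bytes[7:12])))  # 0x760da0
```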

I still don’t have the app.exe.c output: the process takes too long (bin2llvmir alone took about 35 minutes, 2102 s according to the log), and after several hours I aborted it.
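
To see where the time goes, the phase timestamps can be turned into per-phase durations. Reading the log by hand, two phases each took roughly a quarter of an hour: the second "LLVM" phase of bin2llvmir (918.83 s to 1835.37 s) and DeadLocalAssignOptimizer in llvmir2hll (531.70 s to 1510.30 s). The small Python script below automates this. It assumes that each timestamp is the elapsed time at which the phase started, and that a clock going backwards means a new tool has started.

```python
import re
import sys

# Matches "Running phase: NAME ( 12.34s )" and llvmir2hll's "-> running NAME ( 12.34s )".
PHASE_RE = re.compile(r"(?:Running phase:|-> running) (.+?) \( ([0-9.]+)s \)")


def phase_durations(log_text):
    """Return (name, seconds) pairs, one per phase that has a successor.

    A phase's duration is the gap to the next timestamp. When the clock goes
    backwards (a new tool started), that pair is skipped. A parent phase that is
    immediately followed by its own sub-steps shows up as ~0 s."""
    starts = [(m.group(1), float(m.group(2))) for m in PHASE_RE.finditer(log_text)]
    durations = []
    for (name, t), (_, t_next) in zip(starts, starts[1:]):
        if t_next >= t:
            durations.append((name, round(t_next - t, 2)))
    return durations


if __name__ == "__main__":
    # Usage: python phase_times.py decompile.log   (the file name is just an example)
    with open(sys.argv[1], encoding="utf-8") as f:
        ranked = sorted(phase_durations(f.read()), key=lambda p: p[1], reverse=True)
    for name, seconds in ranked[:10]:
        print(f"{seconds:10.2f}s  {name}")
```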

The decompilation steps are quite straightforward:

  1. File information gathering, unpacking, and function detection.
  2. retdec-bin2llvmir: app.exe -> app.exe.bc (LLVM bitcode; the machine code is decoded with Capstone).
  3. retdec-llvmir2hll: app.exe.bc -> app.exe.c.
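
The file names in these steps are all derived from the input name. A small Python sketch that builds the argument lists for the three steps follows. pipeline_commands is a hypothetical helper, and it keeps only a handful of the flags and passes from the log. A real bin2llvmir run needs the full pass list shown above.

```python
def pipeline_commands(input_path, tools_dir="bin"):
    """Return (stage, argv) pairs for the core stages of a decompilation.

    Output names follow the log: app.exe -> app.exe.json, app.exe.bc, app.exe.c.
    Only a few bin2llvmir passes are listed here (an abbreviation)."""
    src = str(input_path)
    config = src + ".json"
    bitcode = src + ".bc"
    c_out = src + ".c"
    return [
        # Step 1 (partial): gather file information into the JSON config.
        ("fileinfo", [f"{tools_dir}/retdec-fileinfo", "-c", config, src]),
        # Step 2: machine code -> LLVM bitcode, driven by the config.
        ("bin2llvmir", [f"{tools_dir}/retdec-bin2llvmir",
                        "-provider-init", "-decoder", "-main-detection", "-stack",
                        "-config-path", config, "-o", bitcode]),
        # Step 3: LLVM bitcode -> C source.
        ("llvmir2hll", [f"{tools_dir}/retdec-llvmir2hll", "-target-hll=c",
                        "-o", c_out, bitcode, f"-config-path={config}"]),
    ]


if __name__ == "__main__":
    for stage, argv in pipeline_commands("app.exe"):
        print(stage, " ".join(argv))
```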

The thing that gets mutated across the passes is the config file app.exe.json. It holds information about the architecture and the compiler used to build app.exe, and it is also filled with functions detected by patterns.
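
To make the idea of a shared, mutated config concrete, here is a hypothetical Python sketch of one stage adding a detected function to the JSON file and writing it back. The key names ("functions", "name", "startAddr") are assumptions for illustration. I have not checked them against RetDec's actual config schema.

```python
import json


def add_detected_function(config, name, address):
    """Add a detected function to an in-memory config dict, skipping duplicates.

    NOTE: "functions", "name" and "startAddr" are assumed key names."""
    functions = config.setdefault("functions", [])
    if not any(fn.get("name") == name for fn in functions):
        functions.append({"name": name, "startAddr": hex(address)})
    return config


def update_config_file(path, name, address):
    """Load the JSON config, apply one update, and write it back in place."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    add_detected_function(config, name, address)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config
```

A call such as update_config_file("app.exe.json", "_main", 0x401000) would then leave the existing keys untouched and append one entry, which is the "each pass enriches the same file" pattern in miniature.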

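The tools have their own argument parsers to accommodate two needs:

  1. The order of the arguments determines the order in which the passes run inside the tool.
  2. The same argument may appear several times, at different positions, depending on the processing the pipeline needs (for example, -inst-opt and -instcombine appear repeatedly in the bin2llvmir command above).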
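
A minimal Python sketch of such an order-preserving parser follows. Pass names go into a list in command-line order, with repeats kept, while options with values and plain flags go into a dict. The two option sets are an assumed subset taken from the log, not the tools' real option tables. LLVM's own opt tool also keeps passes in command-line order, which is presumably where this behaviour comes from.

```python
# Options that take the next token as their value (an assumed subset).
VALUE_OPTIONS = {"config-path", "o"}
# Flags that tune the tool rather than name a pass (an assumed subset).
FLAG_OPTIONS = {"max-memory-half-ram", "disable-inlining", "disable-simplify-libcalls"}


def parse_pass_args(argv):
    """Split argv into (passes, options, inputs).

    passes keeps command-line order and duplicates, so "-inst-opt ... -inst-opt"
    schedules that pass twice."""
    passes, options, inputs = [], {}, []
    i = 0
    while i < len(argv):
        token = argv[i]
        i += 1
        if not token.startswith("-"):
            inputs.append(token)          # e.g. the input bitcode file
            continue
        name = token.lstrip("-")
        key, sep, value = name.partition("=")
        if sep:                           # "-key=value" style, as llvmir2hll uses
            options[key] = value
        elif name in VALUE_OPTIONS:       # "-key value" style
            if i >= len(argv):
                raise ValueError(f"missing value for {token}")
            options[name] = argv[i]
            i += 1
        elif name in FLAG_OPTIONS:
            options[name] = True
        else:
            passes.append(name)           # order preserved, repeats allowed
    return passes, options, inputs


if __name__ == "__main__":
    print(parse_pass_args("-decoder -inst-opt -stack -inst-opt -o app.exe.bc".split()))
```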

TODO: Try to decompile only the few functions of interest identified by bbtrace.
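
For this TODO, one possible first step is to turn the basic blocks recorded by bbtrace into a compact list of address ranges. bbtrace's real output format is not covered in these notes, so the "start size" input lines below are hypothetical. The way the ranges would be handed to RetDec is also hypothetical. The log does show that bin2llvmir already runs a -select-fncs pass ("Selected functions optimization"), which hints that restricting decompilation is supported. The decompiler script's --help should show the actual option.

```python
def parse_trace(lines):
    """Parse hypothetical trace lines "0x401000 0x20" (block start, block size).

    Returns (start, end) pairs with an exclusive end; blank and '#' lines are skipped."""
    blocks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        start_s, size_s = line.split()
        start = int(start_s, 16)
        blocks.append((start, start + int(size_s, 16)))
    return blocks


def merge_ranges(blocks, max_gap=0):
    """Merge (start, end) pairs into sorted, non-overlapping ranges.

    Blocks separated by at most max_gap bytes are merged as well."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def format_ranges(ranges):
    """Render ranges as a comma-separated "0xSTART-0xEND" list (end exclusive)."""
    return ",".join(f"{s:#x}-{e:#x}" for s, e in ranges)


if __name__ == "__main__":
    trace = ["0x401000 0x20", "0x401020 0x10", "0x402000 0x8"]
    print(format_ranges(merge_ranges(parse_trace(trace))))
```

Whether the target option expects inclusive or exclusive end addresses still has to be checked before using this output.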