Data Encoding
Chapter 13
Goals
Defeat signature detection by obfuscating malicious content
Disguise internal workings
Encrypt network communication
Hide command and control location
Hide staging file before transmission
Hide from “strings” analysis
Simple Ciphers
Low overhead, simple, lightweight, and less obvious: enough to prevent
basic analysis
Caesar Cipher
Shift/Rotate characters (e.g. shifting letters three characters to
the right)
XOR (e.g. XOR with 0x3C)
Bit-wise XOR of data with a fixed byte or generated byte stream
Brute-force XOR Encoding
For a fixed-byte (single-byte) XOR, brute-force all 256 key values until the
decoded header makes sense
Example: a PE file starts with the MZ header (0x4D, 0x5A)
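The brute-force idea can be sketched in a few lines of Python. The helper name `find_single_byte_key` is hypothetical; it assumes the encoded data begins with an XOR-encoded MZ header.

```python
from typing import Optional


def find_single_byte_key(encoded: bytes, magic: bytes = b"MZ") -> Optional[int]:
    """Try all 256 keys and return the one that turns the first bytes into `magic`."""
    for key in range(256):
        decoded_start = bytes(b ^ key for b in encoded[:len(magic)])
        if decoded_start == magic:
            return key
    return None


# Build a small sample: the start of a PE file XOR-encoded with 0x12.
sample = bytes(b ^ 0x12 for b in b"MZ\x90\x00\x03\x00")
print(hex(find_single_byte_key(sample)))  # 0x12
```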
Brute-Forcing Many Files
Known: a PE file contains a string such as “This program must be
run under Win32” or “This program cannot be run in DOS mode.”
Enumerate through all possible keys to find a match
Easy to break for single-byte XOR cipher
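When the encoded file may start anywhere inside a larger blob, one option is to XOR the known string with each candidate key and search for the result. This is a sketch under that assumption; `find_xored_string` is an illustrative name, and key 0 simply reports an unencoded copy.

```python
STUB_TEXT = b"This program cannot be run in DOS mode"


def find_xored_string(blob: bytes, needle: bytes = STUB_TEXT):
    """Return (key, offset) pairs where `needle` appears XOR-encoded with one byte."""
    hits = []
    for key in range(256):
        encoded_needle = bytes(b ^ key for b in needle)
        offset = blob.find(encoded_needle)
        if offset != -1:
            hits.append((key, offset))
    return hits


# Sample blob: four junk bytes followed by the stub text encoded with 0x5A.
blob = b"junk" + bytes(b ^ 0x5A for b in STUB_TEXT)
print(find_xored_string(blob))  # [(90, 4)]
```

To scan many files, read each file's bytes and pass them to the same function.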
Null-preserving XOR encoding
Plain single-byte XOR is easy to spot by glancing through the hex dump: NULL
bytes encode to the key (0x00 xor 0x12 -> 0x12), and bytes equal to the key
encode to NULL (0x12 xor 0x12 -> 0x00)
Some malware uses null-preserving XOR to make detection less obvious
Skip the byte if the original is NULL or equal to the key
Otherwise, XOR it with the key
Key is less obvious this way
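The skip rule above, sketched in Python. The function name and the key 0x12 are illustrative. The same routine also decodes, because a byte that is neither NULL nor the key can never encode to NULL or to the key.

```python
def null_preserving_xor(data: bytes, key: int = 0x12) -> bytes:
    """XOR each byte with `key`, but leave NULL bytes and bytes equal to `key` alone."""
    out = bytearray()
    for b in data:
        if b == 0x00 or b == key:
            out.append(b)          # skipped: would otherwise reveal the key or a NULL
        else:
            out.append(b ^ key)
    return bytes(out)


original = b"MZ\x00\x00\x12\x90"
encoded = null_preserving_xor(original)
print(encoded.hex())                             # 5f4800001282
print(null_preserving_xor(encoded) == original)  # True
```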
Identify XOR Loops
In IDA Pro, use Search -> Text to find all XOR instructions
Three cases where XOR is used:
XOR of a register with itself
XOR of a register with a constant
XOR of one register with a different register
Encoding usually shows up as an XOR with a constant inside a loop; use IDA Pro's
graph view to identify the loop
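The three cases can be triaged automatically over a text disassembly listing. The sketch below assumes lines in an Intel-syntax form such as `xor edx, 12h`; the regular expressions and category names are illustrative, not part of IDA Pro.

```python
import re

# Matches "xor <dst>, <src>" in an Intel-syntax listing line (assumed format).
XOR_RE = re.compile(r"\bxor\s+([^,]+),\s*(\S+)", re.IGNORECASE)
# Numeric immediates such as 0x3c, 12h, 0FFh, or 5.
CONST_RE = re.compile(r"^(0x[0-9a-f]+|[0-9][0-9a-f]*h?)$", re.IGNORECASE)


def classify_xor(line: str):
    """Return 'zeroing', 'constant', 'register_or_memory', or None if no XOR."""
    match = XOR_RE.search(line)
    if not match:
        return None
    dst = match.group(1).strip().lower()
    src = match.group(2).strip().lower()
    if dst == src:
        return "zeroing"             # xor eax, eax just clears the register
    if CONST_RE.match(src):
        return "constant"            # the most likely sign of encoding
    return "register_or_memory"      # different register or a memory operand


for text in ("xor eax, eax", "xor     edx, 12h", "xor al, bl", "mov eax, 1"):
    print(text, "->", classify_xor(text))
```

Lines classified as `constant` are the ones worth opening in the graph view to check for an enclosing loop.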
Base 64
From MIME standard
Represents binary data in an ASCII string format
Binary data is converted into a limited set of 64 characters
Every 3 bytes of binary data are encoded as 4 characters of Base64
Example: ATT (3 bytes = 24 bits) -> regrouped into 4 groups of 6 bits each
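The 3-byte to 4-character regrouping can be written out by hand. This sketch uses the standard index table; `encode_triplet` is an illustrative helper that handles exactly one 3-byte group and no padding.

```python
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_triplet(chunk: bytes, alphabet: str = B64_ALPHABET) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 characters of 6 bits each."""
    if len(chunk) != 3:
        raise ValueError("expected exactly 3 bytes")
    value = int.from_bytes(chunk, "big")          # 24-bit integer
    # Take the four 6-bit groups from most to least significant.
    return "".join(alphabet[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0))


print(encode_triplet(b"ATT"))  # QVRU
```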
Decode Base 64 (Padding)
Decoding reverses the same process (watch out for padding)
A length of 11 is invalid: the encoded length should be divisible by 4
Add a padding character (=)
Result: Bot54164 -> the attacker is managing the bots through this ID
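A small sketch of the padding fix using Python's standard `base64` module. The helper name is hypothetical, and the 11-character input is produced here by encoding the ID from this slide and stripping its padding.

```python
import base64


def decode_with_padding(text: str) -> bytes:
    """Append '=' until the length is a multiple of 4, then decode."""
    missing = -len(text) % 4
    return base64.b64decode(text + "=" * missing)


# Recreate an unpadded 11-character string from the ID on the slide.
stripped = base64.b64encode(b"Bot54164").decode("ascii").rstrip("=")
print(len(stripped))                   # 11
print(decode_with_padding(stripped))   # b'Bot54164'
```

Passing the 11-character string directly to `base64.b64decode` raises an "Incorrect padding" error, which is the symptom to watch for.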
Look for a string used as an index table
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
Try on-line conversion tools
Caution: Malware can easily modify the index table to create a custom
substitution cipher (see book example)
Decode Base 64
Malware can implement its own substitution cipher by changing the index table.
Unsuccessful decoding suggests the table is not standard
Example: “a” is moved to the front of the table, which still looks like standard Base64 at a glance.
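One way to decode a modified index table is to translate each character back to the standard table and then use the normal decoder. The `CUSTOM` table below is an assumption built from the description above (“a” moved to the front); real samples need the table recovered from the binary.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Assumed custom table: "a" moved to the front, everything else in order.
CUSTOM = "aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_custom_base64(text: str, custom: str = CUSTOM) -> bytes:
    """Map custom-table characters to the standard table, then decode normally."""
    standard_text = text.translate(str.maketrans(custom, STANDARD))
    padding = "=" * (-len(standard_text) % 4)   # '=' is not in the table, so it is untouched
    return base64.b64decode(standard_text + padding)


# Build a test string by encoding with the custom table.
encoded = base64.b64encode(b"bot54164").decode("ascii").translate(
    str.maketrans(STANDARD, CUSTOM))
print(decode_custom_base64(encoded))  # b'bot54164'
```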
Cryptographic Algorithm
Simple ciphers are vulnerable to brute force
Drawbacks of standard crypto:
Crypto libraries are large and easily detected
Must hide the key for symmetric encryption algorithms
Reduced portability
Recognizing encrypted code
Imports include well-known OpenSSL or Microsoft functions
Searching cryptographic constants
FindCrypt2 plugin in IDA Pro (search program for crypto)
Or
Krypto ANALyzer plugin for PEiD
Most crypto employs some magic constant (fixed string
of bits)
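Constant searching can be approximated with a simple byte scan. This is a sketch with only three well-known patterns, stored little-endian on the assumption of an x86 target. Tools such as FindCrypt2 and Krypto ANALyzer use much larger signature sets, and short patterns can produce false positives.

```python
import struct

# A few widely documented constants (little-endian 32-bit words assumed).
KNOWN_CONSTANTS = {
    "MD5/SHA-1 initial state word 0x67452301": struct.pack("<I", 0x67452301),
    "SHA-256 round constant 0x428a2f98": struct.pack("<I", 0x428A2F98),
    "AES S-box (first 8 bytes)": bytes.fromhex("637c777bf26b6fc5"),
}


def find_crypto_constants(blob: bytes):
    """Return (name, offset) for each known constant found in `blob`."""
    hits = []
    for name, pattern in KNOWN_CONSTANTS.items():
        offset = blob.find(pattern)
        if offset != -1:
            hits.append((name, offset))
    return hits


sample = b"\x90" * 5 + bytes.fromhex("637c777bf26b6fc5") + b"\x00"
print(find_crypto_constants(sample))  # [('AES S-box (first 8 bytes)', 5)]
```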
Recognizing encrypted data
Some malware employs crypto algorithms that have no fixed constants
(e.g. RC4 and IDEA build their structures at run time) or that do not rely on
libraries
Krypto ANALyzer
Identify a wide range of constants (some false positives)
High-Entropy Content
In case magic constants are not found – search for high-entropy content
Entropy: the expected information content of each symbol a source outputs (a
measure of randomness)
IDA Entropy Plugin (graphical views)
[Figure: IDA entropy plugin graph. Labels: DES encryption; hide command-and-control; normal code; about 5.6 peak]
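Entropy can be computed directly with the Shannon formula. The sketch below measures bits per byte (0 to 8) and adds a sliding-window variant similar in spirit to a graphical entropy view. The window and step sizes are assumptions, and a 64-byte window can never exceed 6 bits, since it holds at most 64 distinct values.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def windowed_entropy(data: bytes, window: int = 64, step: int = 32):
    """Return (offset, entropy) for each window; high values hint at encrypted data."""
    last_start = max(len(data) - window + 1, 1)
    return [(off, shannon_entropy(data[off:off + window]))
            for off in range(0, last_start, step)]


print(shannon_entropy(b"\x00" * 100))       # 0.0
print(shannon_entropy(bytes(range(256))))   # 8.0
```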
Custom Encoding
Malware uses homegrown encoding – e.g. XOR + Base64
Trace execution to see suspicious activity in a tight loop
Example: pseudo-random number generation followed by
xor (Figure 13-14, 13-15, p. 287)
Reverse engineering to break custom encoding is more
difficult
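A sketch of the layered scheme mentioned above (XOR followed by Base64), with an assumed single-byte key of 0x3C. It shows why a standard Base64 decode alone yields garbage: the XOR layer still has to be undone.

```python
import base64


def custom_encode(data: bytes, key: int = 0x3C) -> str:
    """XOR each byte with `key`, then Base64-encode the result."""
    xored = bytes(b ^ key for b in data)
    return base64.b64encode(xored).decode("ascii")


def custom_decode(text: str, key: int = 0x3C) -> bytes:
    """Reverse the layers in the opposite order: Base64-decode, then XOR."""
    xored = base64.b64decode(text)
    return bytes(b ^ key for b in xored)


token = custom_encode(b"bot54164")
print(base64.b64decode(token))   # still XOR-encoded, not readable
print(custom_decode(token))      # b'bot54164'
```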
Decoding
Self-decoding malware
Malware packaged with decoding routine
Indication: strings that do not appear in the binary file on disk but do appear
in the debugger
Decrypt by setting a breakpoint directly after decryption routine
finishes execution
Drawback: the malware may never decrypt the information you want (you do not control what it decodes)
Malware employing decoding functions
Can sometimes use standard libraries to decode
Python's [Link]() or PyCrypto's functions
(see examples Listing 13-8 to Listing 13-10)
Script the debugger to re-run the malware’s decoding code
with chosen parameters (use the malware's own routine against itself)
ImmDbg (Immunity Debugger) allows Python scripts to control the debugger
In Class Homeworks