Funky File Formats
Ange Albertini
2014/12 - 31C3
Ange Albertini
reverse engineering &
visual documentations
@angealbertini
ange@[Link]
[Link]
So, this talk is about files. What are the usual file categories?
It depends on whether you're a newbie, a user, a dev, a hacker...
...but in general, valid files aren't very sexy!
However, the frontier between valid and corrupted is neither straight nor clear!
Here is a valid file
f76f5dafdcf0818c457e6ffb50ea61a67196dcd4 *[Link]
(ok, maybe not a standard file)
This is a JPEG picture...
...that's also a Java file.
AES(
If you encrypt it with AES...
you get a PNG picture.
3DES(
If you decrypt it with Triple DES...
...you get a PDF document.
AES_K2(
If you encrypt the original file with AES again, but with a different key...
...you get a Flash Video
...that... oh well, never mind, I could go on for hours...
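[Diagram: a single file that is both a JPG and a JAR (ZIP + CLASS); AES encryption gives a PNG, AES encryption with a second key gives an FLV, 3DES decryption gives a PDF]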
So, as you can see, I'm just a normal guy (who likes to play with binary).
I also like to explain binary [Link] / [Link]
Let's talk about...
Identification
How do you identify a cow?
By its head?
By its body?
By sound?
in practice...
early filetype
identifier
Obvious
PE\0\0 \x7FELF BPG\xFB
\x89PNG\x0D\x0A\x1A\x0A
dex\n035\0 RAR\x1a\7\0 BZ
GIF89a BM RIFF
Not obvious
GZip 1F 8B
JPG FF D8
Not obvious, but l33tsp34k ^_^
CAFEBABE Java / universal (old) Mach-O
D0CF11E0 Office
FEEDFACE Mach-O
FEEDFACF Mach-O (64b)
Egocentric
MZ (DOS header): Mark Zbikowski
PK\3\4 (ZIP): Philip Katz
BPG\xFB: Fabrice Bellard
Specific logic
TIFF:
II Intel (little) endianness
MM Motorola (big) endianness
Flash:
FWS ShockWave Flash (Flat)
CWS (zlib) compressed
ZWS LZMA compressed
Magic signatures, enforced at offset 0
not enforcing signature at offset 0: ZIP, 7z, RAR, HTML
actually enforcing signature at offset 0: bzip2, GZip
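A minimal sketch of the difference between checking a magic at offset 0 and finding it anywhere, using a few of the signatures listed above. The table and both helper names are illustrative assumptions, not how any particular tool works:

```python
# Hypothetical signature table (magic bytes, label), taken from the lists above.
MAGICS = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\x7fELF", "ELF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP"),
    (b"\x1f\x8b", "GZip"),
    (b"\xff\xd8", "JPG"),
]


def identify(data: bytes):
    """Return the label of the first magic found at offset 0, else None."""
    for magic, label in MAGICS:
        if data.startswith(magic):
            return label
    return None


def magic_anywhere(data: bytes):
    """Return (offset, label) for the first occurrence of each known magic."""
    hits = []
    for magic, label in MAGICS:
        offset = data.find(magic)
        if offset >= 0:
            hits.append((offset, label))
    return sorted(hits)


png = b"\x89PNG\r\n\x1a\n" + bytes(8)
late_zip = b"junk" + b"PK\x03\x04"   # ZIP magic, but not at offset 0
```

An offset-0 check rejects `late_zip`, while a format that does not enforce its signature at offset 0 would still accept it. Short magics (2 bytes) are left out of the search on purpose: they would match by accident almost anywhere.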
File formats not enforcing signature at offset 0
(ZIP is used in many formats: APK, ODT, DOCX, JAR)
ZIP actually enforces its closing structure near the end of the file, not its start.
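A rough sketch of how a ZIP reader can locate the archive from the end: it searches backwards for the End Of Central Directory signature (PK\5\6), which is 22 bytes long plus an optional comment of up to 65535 bytes. The function name is an assumption, and real parsers do more validation than this:

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # End Of Central Directory signature
EOCD_MIN = 22              # size of the record without its comment
MAX_COMMENT = 0xFFFF       # the comment length is a 16-bit field


def find_eocd(data: bytes) -> int:
    """Return the offset of the last EOCD signature in the allowed tail, or -1."""
    start = max(0, len(data) - (EOCD_MIN + MAX_COMMENT))
    return data.rfind(EOCD_SIG, start)


# Build a small archive in memory, then prepend unrelated bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("hello.txt", b"hi")
data = bytes(100) + buf.getvalue()

pos = find_eocd(data)
(comment_len,) = struct.unpack("<H", data[pos + 20:pos + 22])
```

Nothing here looks at offset 0: whatever sits in front of the archive is simply never read by this search.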
TAR: Tape Archive
Disk images: ISO, Master Boot Record
TGA (image)
(Console) roms
Hardware-bound formats: code/data at offset 0
header often (optionally) later in the memory space
a good magic signature:
enforced at offset 0
unique
no magic no excuse
Standard tool: checks magic,
chooses path, never returns...
Another common
yet important property
(useful for abuses)
It's a complete cow (you can see its whole body), with something next to it:
appending something doesn't invalidate the start.
Remember:
there's nothing to parse
after the terminator.
PE
PDF
HTML
formats not enforced at offset 0
+ tolerating appended data
= polyglots
by concatenation
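A small sketch of a polyglot by concatenation: a host that tolerates appended data, followed by a ZIP, which does not care what comes before it. The host bytes are a placeholder (not a complete PDF) and the helper name is hypothetical:

```python
import io
import zipfile


def make_zip(files: dict) -> bytes:
    """Build a ZIP archive in memory from {name: content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, content in files.items():
            z.writestr(name, content)
    return buf.getvalue()


# Placeholder host: only the start of a PDF, standing in for a real document.
host = b"%PDF-1.5\n% (a real PDF body would go here)\n"
polyglot = host + make_zip({"hello.txt": b"hi from the ZIP"})

# Python's zipfile module still opens it: it locates the archive from the end.
with zipfile.ZipFile(io.BytesIO(polyglot)) as z:
    names = z.namelist()
    content = z.read("hello.txt")
```

The same bytes start with the host's magic and end with a readable archive, which is all the concatenation trick needs.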
ZIP
a JAR(JAR) || BINK polyglot
JAR = ZIP(CLASS)
host/parasite polyglots
If a cow keeps a frog in its mouth, it can also speak 2 languages!
(the outer leaves space for an inner)
OK, I know... here is a more realistic analogy...
...if our cow swallows a microSD, it's still a valid cow!
Even if it contains foreign data, that is tolerated by the system.
2 infection chains in one file:
the PDF part is stored in a Java buffer
a JavaScript || GIF polyglot (useful for pwning - also in BMP flavor)
Such parasites exist already in the wild
(they just use unallocated space)
PoC||GTFO 0x2: MBR || PDF || ZIP
by Travis Goodspeed
PoC||GTFO 0x3: JPG || AFSK || AES(PNG) || PDF || ZIP
PoC||GTFO 0x4: TrueCrypt || PDF || ZIP
by Alex Inführ
PoC||GTFO 0x5: Flash || ISO || PDF || ZIP
PoC||GTFO 0x6: TAR || PDF || ZIP
$ tar -tvf [Link]
-rw-r--r-- Manul/Laphroaig      0 2014-10-06 21:33 %PDF-1.5
-rw-r--r-- Manul/Laphroaig 525849 2014-10-06 21:33 [Link]
-rw-r--r-- Manul/Laphroaig 273658 2014-10-06 21:33 [Link]
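The listing above shows a first TAR member named %PDF-1.5. A minimal sketch of why that works: a TAR header starts with the member name, so the first bytes of the archive are whatever that name is. The helper name is an assumption, and the output below is only a TAR that begins with a PDF signature, not a complete PDF:

```python
import io
import tarfile


def tar_starting_with(first_name: str, files: dict) -> bytes:
    """Build a ustar archive whose first (empty) member has the given name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        tar.addfile(tarfile.TarInfo(name=first_name))   # zero-length member
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


archive = tar_starting_with("%PDF-1.5", {"hello.txt": b"hi"})

with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    member_names = tar.getnames()
```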
$ unzip -l [Link]
Archive:  [Link]
warning [[Link]]:  10672929 extra bytes at...
  (attempting to process anyway)
  Length      Date    Time    Name
---------  ---------- -----   ----
     4095  11/24/2014 23:44   [Link]
   818941  08/18/2014 23:28   acsac13_zaddach.pdf
     4564  10/05/2014 00:06   [Link]
   342232  11/24/2014 23:44   [Link]
     3785  11/24/2014 23:44   [Link]
     5111  09/28/2014 21:05   [Link]
        0  08/23/2014 19:21   ecb2/
unicode //
a Java || JavaScript polyglot (at source level)
a Java || JavaScript polyglot (at binary level)
Java = JavaScript
Yes, your management was right all along ;)
Extreme files bypass filters
Farmer got denied permit to build a horse shelter.
So he builds a giant table & chairs which dont need a permit.
a mini PDF (Adobe-only, 36 bytes): skipped by scanners, yet valid!
a PE with 64K sections (all executed): crashes a lot of software, evades scanning
Parsing
This is how a user sees a cow.
This is how a dev sees a cow.
This is how another dev sees a cow!
(this one: Brazilian beef cuts - previous: French beef cuts)
Same data, different parsers
it would have been too easy ;)
commented line
missing trailer keyword
a schizophrenic PDF: 3 different trailers, seen by 3 different readers
a schizophrenic PDF (screen vs. printer)
PDF viewer
PDF slides
a (generated) PDF || PE || JAR [JAVA+ZIP] || HTML polyglot...
...which is also a schizophrenic PDF
$ du -h stringme
141 stringme
$ strings stringme
Segmentation fault (core dumped)
Extra problem: parsers can be present in unexpected places
[Link] (CVE-2014-8485)
metadata
Who's the owner?
A hidden cow just looks like another cow...
so cattle is branded.
But brandings can be faked!
or patched into another symbol
attribution is hard
and in a pure PoC||GTFO fashion,
@munin forged a branding iron !
an encrypted file is not always encrypted
encrypt(file) is not always random
encrypt(file) can be valid
.D.A.T.A.[.1.2.3.4.5.6.7.8.9.A.B
.C.D.E.F.].E.N.D
?
.T.E.X.T0A.t.h.i.s. .i.s. .a. .t
.e.x.t0A
We want to encrypt a DATA file to a TEXT file.
DATA tolerates appended data after its END marker
TEXT accepts /* */ comment chunks (think parasite in a host)
.D.A.T.A.[.1.2.3.4.5.6.7.8.9.A.B
.C.D.E.F.].E.N.D
<random>
If we encrypt, we get a random result: we can't control AES output & input together.
AES works with blocks
File encryption applies AES via a mode of operation
Electronic Code Book:
penguin = bad
with CBC, choose the IV to control
both first blocks (P1 & C1)
.D.A.T.A.[.1.2.3.4.5.6.7.8.9.A.B
.C.D.E.F.].E.N.D
+IV1
.T.E.X.T <something we control>
<random rest>
Encrypt with pure AES, then determine IV to control the output block
.D.A.T.A.[.1.2.3.4.5.6.7.8.9.A.B
.C.D.E.F.].E.N.D
+IV2
.T.E.X.T./.*
<ignored random rest>
We can't control the rest of the garbage, so let's put a comment start in the first block.
.D.A.T.A.[.1.2.3.4.5.6.7.8.9.A.B
.C.D.E.F.].E.N.D
.T.E.X.T./.*
<ignored random rest>
.*./0A.t.h.i.s. .i.s. .a. .t
.e.x.t0A
If we close the comment and append the target file's data to the encrypted file,
then this file is valid and equivalent to our initial target.
.D.A.T.A.[.1.2.3.4.5.6.7.8.9.A.B
.C.D.E.F.].E.N.D
<pre-decrypted ignored random>
+IV2
.T.E.X.T./.*
<ignored random rest>
.*./0A.t.h.i.s. .i.s. .a. .t
.e.x.t0A
...then we decrypt that file: we get the original source file,
with some random data that will be ignored, since it's appended data.
.D.A.T.A.[.1.2.3.4.5.6.7.8.9.A.B
.C.D.E.F.].E.N.D
<pre-decrypted ignored random>
+IV2
.T.E.X.T./.*
<ignored random rest>
.*./0A.t.h.i.s. .i.s. .a. .t
.e.x.t0A
Since AES CBC only depends on previous blocks,
this DATA file will indeed encrypt to a TEXT file.
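The steps above can be sketched end to end. Loud assumption: to stay runnable with only the standard library, this uses a toy 16-byte Feistel cipher built on SHA-256 as a stand-in for AES. It is not AES and not secure. Only the CBC logic matters here: C1 = E(P1 xor IV), so IV = D(C1) xor P1. All function names are hypothetical.

```python
import hashlib

BLOCK = 16


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, i: int, half: bytes) -> bytes:
    """Feistel round function (toy stand-in, NOT AES)."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def enc_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _f(key, i, right))
    return left + right


def dec_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _f(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = [], iv
    for off in range(0, len(data), BLOCK):
        prev = enc_block(key, _xor(data[off:off + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = [], iv
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK]
        out.append(_xor(dec_block(key, block), prev))
        prev = block
    return b"".join(out)


def pad(data: bytes) -> bytes:
    """Zero-pad to a whole number of blocks."""
    return data + bytes(-len(data) % BLOCK)


key = b"OurEncryptionKey"
source = pad(b"DATA[123456789ABCDEF]END")       # the DATA file, 2 blocks
c1 = pad(b"TEXT/*")                             # wanted first ciphertext block
iv = _xor(dec_block(key, c1), source[:BLOCK])   # IV = D(C1) xor P1
head = cbc_encrypt(key, iv, source)             # TEXT/* then uncontrolled bytes
target = head + pad(b"*/\nthis is a text\n")    # close the comment, add payload
crafted = cbc_decrypt(key, iv, target)          # DATA file + ignored appended data
```

`crafted` still begins with the untouched DATA file, and encrypting it with the same key and IV gives back `target`, which starts with TEXT/* and hides the uncontrolled block inside the comment.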
AngeCryption PoC layout
00: 4441 5441 5b31 3233 3435 3637 3839 4142  DATA[123456789AB
10: 4344 4546 5d45 4e44 0000 0000 0000 0000  CDEF]END........
20: f6fe 17cf 0802 7449 58de cdf2 f9c4 45ce  ......tIX.....E.
30: 2e8e 6996 5854 824c c09c 1b7d 4898 a29e  ..i.XT.L...}H...
openssl enc -aes-128-cbc -nopad
-K `echo OurEncryptionKey|xxd -p`
-iv A37A69F13417F5AB3CC4A1546B97FD76
00: 5445 5854 2f2a 0000 0000 0000 0000 0000  TEXT/*..........
10: 3f81 11a9 2540 ded5 096a 83c9 f191 d8bb  ?...%@...j......
20: 2a2f 0a74 6869 7320 6973 2061 2074 6578  */.this is a tex
30: 740a 454e 4400 0000 0000 0000 0000 0000  t.END...........
You can even try it at home :)
Chimera
(if you skip identified bodies, you'll miss other files)
a JPEG || ZIP || PDF Chimera
image data
a chimera defeats sequential parsing with optimization
a Picture of Cat
(BMP ! uncompressed ! OMG)
BMP lets us define bit masks for each color:
32 bits: 0000000000000000rrrrrggggggbbbbb (no alpha)
16 bits of free space!
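A small sketch of what those masks imply: the viewer only reads the 5/6/5 bits selected by the masks, so the other 16 bits of each 32-bit pixel can carry anything. The mask values follow the layout shown above; the function names are assumptions, and which half holds which payload is taken from that layout only:

```python
import struct

# Masks matching 0000000000000000rrrrrggggggbbbbb
R_MASK = 0b1111100000000000
G_MASK = 0b0000011111100000
B_MASK = 0b0000000000011111


def pack_pixel(r5: int, g6: int, b5: int, extra16: int) -> int:
    """Combine a 5/6/5 color with 16 bits of unrelated data in one dword."""
    return (extra16 << 16) | (r5 << 11) | (g6 << 5) | b5


def pixel_of(dword: int):
    """What a mask-aware BMP viewer extracts: only the masked bits."""
    return ((dword & R_MASK) >> 11, (dword & G_MASK) >> 5, dword & B_MASK)


def extra_of(dword: int) -> int:
    """The 16 bits the masks leave untouched."""
    return dword >> 16


dword = pack_pixel(31, 0, 17, 0xBEEF)
raw = struct.pack("<I", dword)   # 4 bytes, as stored in the pixel array
```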
let's play the picture!
no, seriously :)
Consider the BMP
as RAW 32b PCM
1. store sound in the lower 16 bits:
sound ignored by BMP
image data too low to be audible
2. store a picture encoded as sound
viewable as spectrogram
[Link]
an RGB BMP || raw (3-channel spectrogram) polyglot by @doegox
Cerbero
same type of heads, one body
an RGB picture...
RGB picture data = bytes triplets for R, G, B colors
...with an unused palette
palette picture data = each byte is an index in the palette
in theory, it could be used:
How to make a pic-ception
adjust each RGB value to the closest palette index
store a second picture with the same data.
(original idea by @reversity)
We get another picture of
the same type from the
same data!
BTW, that's a barcode inception:
a DataMatrix barcode inside a QRCode, both valid
[Link]
Hash collisions
This is the actual SHA-1 with only 4 of its 5 constants modified
This doesn't give a collision in the actual SHA-1
2 colliding blocks: mostly random and unpredictable
At most three consecutive bytes without a difference.
Typically, in every dword, only the middle two bytes have no differences.
Abusing JPEG's multiple unused APPx (FF Ex) markers
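A hedged sketch of why APPx segments are handy storage: each one is a marker (FF E0 to FF EF) followed by a 2-byte big-endian length that counts itself, so a decoder that does not know the segment can skip it. The helper names and the choice of APP15 are assumptions for illustration; this is not the collision construction itself:

```python
import struct


def insert_app_segment(jpeg: bytes, payload: bytes, n: int = 15) -> bytes:
    """Insert an APPn segment carrying payload right after the SOI marker."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    if not 0 <= n <= 15 or len(payload) > 0xFFFF - 2:
        raise ValueError("bad APPn index or payload too large")
    segment = bytes([0xFF, 0xE0 + n]) + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:2] + segment + jpeg[2:]


def list_segments(jpeg: bytes):
    """Walk the marker segments after SOI, stopping at SOS or EOI."""
    pos, found = 2, []
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xD9, 0xDA):      # EOI or SOS: no more header segments
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        found.append((marker, length - 2))
        pos += 2 + length
    return found


tiny = b"\xff\xd8\xff\xd9"              # SOI + EOI only, a placeholder image
stuffed = insert_app_segment(tiny, b"hidden")
```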
Much better! (images chosen at random)
a polyglot collision (multiple use for a single backdoor)
Pwnie award for the best song! Err... what is it pwning exactly?
Even songs should also have a nice PoC
(never forget to load your PDFs in your favorite NES emulator)
Do you remember this?
A Super NES & Megadrive rom
(and PDF at the same time)
Conclusion
Ange's recipes :)
Never forget to:
open your PDFs in a hex editor
open your pictures in a sound player
run your documents in a console emulator
encrypt/decrypt with any cipher
double-check what you printed
Security advice:
DON'T *
It's easy to blame others - new insecure paths appear every day
Research advice:
DO *
PoC||GTFO ! stop the marketing! cheap blamers blatant marketers?
F.F.F. conclusion
many abuses of the specs
specs often are wrong or misleading
few parsers, even fewer dissectors
standard tools evolve the wrong way
try to repair corrupted file outside the specs
standard and recovery mode
For technical details, check my previous talks.
ACK
@doegox @pdfkungfoo @veorq @reversity
@travisgoodspeed @sergeybratus qkumba
@internot @gynvael @munin
@solardiz @0xabadidea @ashutoshmehra
lytron @JacobTorrey @thicenl
and anybody who gave me feedback!
Bonus
after the talk, we tried some PoCs on professional
(very expensive!) forensic software:
polyglot files
a single file format found + no warning whatsoever
schizophrenic files:
no warning yet different tabs of the same software showing
different content :D
BIG FAIL - yet we trust them for court cases?
**
*this is a valid..
**
Albertini
...TAR & Adobe PDF:
PoC or
____ _____ _____ ___
_
/ ___|_
_| ___/ _ \ | |
| | _ | | | |_ | | | ||_|
| |_| | | | | _|| |_| | _
\____| |_| |_|
\___/ |_|
%PDF-1.
trailer<</Root<</Pages<<>>>>>>
The initial abstract of this talk:
ASCII-only, PDF/TAR polyglot
Solar Designer made a great keynote - that's actually a real game to play!
But one has to load and play through the game - not so accessible!
[Link]
a PDF:
containing the game as ZIP
hand-written
with walkthrough screenshots
(in original resolution)
a lightweight title
while maintaining compatibility
a good way to distribute as a single file!
$ unzip -t [Link]
Archive:  [Link]
warning [[Link]]:  6381506 extra bytes
  (attempting to process anyway)
    testing: ZN14GAME/           OK
    testing: ZN14GAME/COMMON/    OK
...
Quine
prints its own source
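For readers who have never seen one, here is a minimal sketch of a quine in Python (a scripting language, so very much the cheap kind). The two-line program is built from a template, run, and its output is captured for comparison with its own text. The variable names are arbitrary:

```python
import contextlib
import io

# The quine is two lines: a string holding a template of the program,
# and a print that fills the template with the string's own repr.
template = 's = %r\nprint(s %% s)'
program = template % template

# Run it and capture what it prints.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(program, {})
output = buf.getvalue()
```

`output` is the program text followed by the newline that print adds, so saving `program` to a file and running it reproduces that file.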
a PE quine (in assembler, no linker)
Most quines aren't very sexy
Using a compiler is cheap :p
Quine Relay
A prints B's source
B prints A's source
a PE ELF quine relay
(no linker)
a 50-languages quine relay
[Link]
other AngeCryption PoCs (PDF, PNG, JPG)
A bit of everything
@angealbertini
[Link]
Damn, that's the second time those alien bastards shot up my ride!