How does
the SLUB allocator work
Joonsoo Kim
LGE CTO SWP Lab.
[Link]@[Link]
js1304@[Link]
Contents
• Memory allocation hierarchy
• Implementation – the SLUB
• Difference between the SLAB and the SLUB
• Current status
MEMORY ALLOCATION
HIERARCHY
Page allocator
• page allocator
– fundamental memory allocator
– manage all physical memory in system
– page size = 4096 bytes
– allocate 2order pages at once
• limit
– size less than page size
What is the SLAB allocator?
• The SLAB allocator
– in-kernel library like in-userspace library malloc()
– kmalloc() = malloc()
– kmem_cache_create(), kmem_cache_alloc(), …
• The object allocator providing same API
– The SLAB allocator: traditional allocator
– The SLOB allocator: for embedded system
– The SLUB allocator: default, for large system
Allocation hierarchy
generic kernel code
object allocator
page allocator (buddy allocator)
physical page frame
Warning: term
• “the SLAB allocator” vs “the slab”
– the SLAB allocator
• one of the object allocator
– the slab
• just data structure
• used by the slab allocator and the slub allocator
IMPLEMENTATION
- SLUB
Overall structure
UMA, SMP
per cpu cpu0 cpu1
name
size
align cpu2 cpu3
…
kmem_cache kmem_cache_cpu
per node
kmem_cache kmem_cache
node0
kmem_cache_node
kmem_cache
slab caches slab slab slab slab
slab
object
Slab
1 page 1 page 1 page
slab fragmentation slab
index freelist
inuse
_mapcount objects
frozen
overload
lru lru
… …
struct page struct page
Per cpu structure
free objects from slab1
freelist
tid
cpu slab slab1
partial slabs
kmem_cache_cpu
slab2
slab3 free object
slab
Allocation: fast-path
return this object
freelist
tid
cpu slab slab
partial slabs
kmem_cache_cpu
slab
slab free object
slab
Allocation: slow-path
freelist
tid 1
cpu slab slab try
partial slabs on node
kmem_cache_cpu partial
slabs
slab
4
2
slab free object
3 slab
Allocation: very slow-path
cpu0 cpu1
per cpu
get a new
cpu2 freelist
slab from
name tid page
size
cpu slab allocator
align
… partial slabs
1 kmem_cache_cpu
lock 6
per node
kmem_cache
node0
kmem_cache_node
slab slab slab slab
2 3 4 5
Free: fast-path
object from slab1
freelist
tid
cpu slab slab1
partial slabs
kmem_cache_cpu
slab2
slab3 free object
slab
Free: slow-path
case 1.
cpu0 cpu1 remote cpu slab
cpu2 cpu3 slab
kmem_cache
slab
case 4.
add a slab to first free object
cpu partial
list
node0
slab
kmem_cache_node case 2.
normal case
slab
case 3.
discard inuse = 0
Performance optimization
• this_cpu_cmpxchg_double
– avoid disabling interrupt
• cmpxchg_double
– avoid taking a lock
Performance optimization:
freelist of kmem_cache_cpu
allocation request
new allocation request
read freelist of interrupt
kmem_cache_cpu read freelist of
kmem_cache_cpu
detach first object(A)
return from
detach first object(A) interrupt assign following object(B)
to freelist
assign following object(B)
to freelist return object(A)
return object(A) crash need
irq disable
Performance optimization:
freelist of kmem_cache_cpu
allocation request
new allocation request
read freelist, tid of interrupt
kmem_cache_cpu read freelist, tid of
kmem_cache_cpu
detach first object(A)
return from
detach first object(A) interrupt this_cpu_cmpxchg_double:
new freelist(B),new tid
this_cpu_cmpxchg_double:
new freelist(B),new tid return object(A)
fail no need
retry “irq disable”
Performance optimization:
freelist of a slab
acquire freelist of slab insert object into
freelist of slab
read read
freelist, counter of slab freelist, counter of slab
attach one object(A)
acquire freelist
cmpxchg_double_slab:
cmpxchg_double_slab: new freelist(A),
new freelist(NULL), new counter
new counter
success
fail no need
retry “lock”
allocation path free path
A DIFFERENCE BETWEEN
THE SLAB AND THE SLUB
Caching policy
cpu0 In case of
the SLAB
cpu1
kmem_cache cpu2
cpu3
node0 In case of
the SLUB
slab
kmem_cache_node
slab
Free object management
cpu object cache The SLAB The SLUB
data structure array list
max number of objects 120 don’t care
size (64bits) 120 * 8 bytes 8 byte
slab The SLAB The SLUB
data structure array list
max number of objects 202 don’t care
size (64bits) 202 * 4 bytes 8 byte
(overload “struct page”)
Miscellaneous
• kmalloc alignment
• fallback order slab
• kmem_cache alignment
• debugging feature
• NUMA
CURRENT STATUS
Trends
• per cpu partial lists
• common sl[aou]b
• slab accounting for memcg
Any questions?