
Linux Physical Memory Page Allocation

The document discusses Linux physical memory page allocation. It describes kernel page allocation and recycling APIs, how free pages are managed through the buddy system, and details of the page allocation process including UMA, NUMA, and page reclaim functions.


Translated from Chinese (Simplified) to English - www.onlinedoctranslator.com


http://www.ilinuxkernel.com

Original link: http://ilinuxkernel.com/?p=1371

Comments, corrections, and questions are welcome.


Contents

1 Overview
2 Kernel Page Allocation and Reclaim API
3 Management of Free Pages
3.1 Describing the Physical Memory Space
3.2 Management of Free Pages
4 The Buddy Algorithm
4.1 The Buddy System
4.2 Buddy Algorithm Examples
4.2.1 Page Allocation
4.2.2 Page Reclaim
4.3 Viewing Buddy System Information
5 Page Allocation
5.1 UMA Page Allocation
5.2 NUMA Page Allocation
5.2.1 NUMA Policies and the cpuset Feature
5.2.2 alloc_pages_current()
5.3 __alloc_pages_nodemask()
5.3.1 Memory Migration Types and lockdep
5.3.2 __alloc_pages_nodemask()
5.4 get_page_from_freelist()
5.4.1 Zone Watermarks
5.4.2 Hot-N-Cold Pages
5.4.3 get_page_from_freelist()
5.4.4 __rmqueue()
5.5 __alloc_pages_slowpath()
5.5.1 __alloc_pages_direct_compact()
5.5.2 __alloc_pages_direct_reclaim()
6 GFP Flags That Affect Page Allocation Behavior


1 Overview

In user-mode C programs we are all familiar with the memory allocation functions malloc() and calloc(); on success they return the starting address of the requested memory. These functions obviously cannot be used in kernel mode, which has its own dedicated memory allocation/free functions.

How does the Linux kernel allocate and reclaim memory? How is free memory managed? Based on the linux 2.6.32-220.el6 kernel source, this article explains how physical memory pages are allocated in the Linux kernel.

2 Kernel Page Allocation and Reclaim API

Let us first look at the APIs that allocate and reclaim memory pages in the kernel. The common APIs are listed in the table below:

Table 1: Physical memory page allocation and reclaim API

Function                           Description

alloc_page(gfp_mask)               Allocates one page; returns the page's struct page
alloc_pages(gfp_mask, order)       Allocates 2^order contiguous pages; returns the struct page of the first page
__get_free_page(gfp_mask)          Allocates one page; returns the page's logical address
__get_free_pages(gfp_mask, order)  Allocates 2^order pages; returns the logical address of the first page
get_zeroed_page(gfp_mask)          Allocates one page, zeroes its contents, and returns its logical address
get_dma_pages(gfp_mask, order)     Allocates pages suitable for DMA operations
__free_pages(page, order)          Frees 2^order pages; the first parameter is the struct page of the first page
free_pages(addr, order)            Frees 2^order pages; the first parameter is the logical address
free_page(addr)                    Frees a single page, given its logical address


Note: kernel code usually does not call these functions directly; it more often uses kmalloc(), vmalloc(), kmem_cache_alloc(), and similar functions. Here we cover only page allocation; the slab mechanism will be introduced together with kmalloc().

The relationship among the page allocation functions is shown below:

Figure 2: Relationship among the page allocation functions


The relationship among the page reclaim functions is shown below:

Figure 3: Relationship among the page reclaim functions

3 Management of Free Pages

3.1 Describing the Physical Memory Space

In "Linux Physical Memory Description" (http://ilinuxkernel.com/?p=1332) we described in detail how the kernel divides physical memory into three levels: nodes (Node), zones (Zone), and pages (Page).

At the top level the physical memory space is described by nodes; each node contains several zones, and each zone in turn contains many pages. The relationship among the three is shown in the figure below.

Figure 4: Relationship among nodes, zones, and pages


3.2 Management of Free Pages

Allocating and reclaiming pages necessarily involves how free pages are managed.

Think of how a supermarket cashier organizes the cash drawer: the 5-jiao coins go together, the 1-yuan notes go together, ..., the 100-yuan notes go together. The Linux kernel's free memory management follows a similar idea.

In the Linux kernel, the basic unit of free memory management is the page (as defined by the x86/x86-64 CPU); physical memory is managed in units of pages (kmalloc() and the slab/slub mechanism manage chunks smaller than a page). Just like the cashier, the kernel keeps every free memory block at a size of 2 raised to some power of pages; that power is the order. Single free pages are kept together, 2-page blocks (with contiguous physical addresses) together, 4-page blocks (contiguous) together, ..., up to blocks of 2^(MAX_ORDER-1) contiguous pages. Free pages are organized as shown in the figure below.

Figure 5: Management of free page blocks

In the 2.6.32-220.el6 kernel, MAX_ORDER is defined as 11, so the largest contiguous free physical memory block the kernel manages is 2^(MAX_ORDER-1) = 2^10 pages, i.e. 4MB.

00022: /* Free memory management - zoned buddy allocator. */
00023: #ifndef CONFIG_FORCE_MAX_ZONEORDER
00024: #define MAX_ORDER 11
00025: #else
00026: #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
00027: #endif
00028: #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))


Zones and free pages

Within the zone data structure there is an array, free_area[MAX_ORDER], that holds the linked lists of free memory blocks:

00287: struct zone {
00288:     /* Fields commonly accessed by the page allocator */
00289:
.......
00331:     struct free_area free_area[MAX_ORDER];
.......
00445:     unsigned long padding[16];
00446: } ____cacheline_internodealigned_in_smp; /* end zone */

The 1st element of the free_area[MAX_ORDER] array points to the list of free blocks of size 2^0, i.e. 1 page; the 2nd element points to the list of free blocks of size 2^1, i.e. 2 pages; the last element points to the list of large free blocks of size 2^(MAX_ORDER-1) pages. MAX_ORDER is currently defined as 11.

Each zone has its own free_area[MAX_ORDER] array; its element type, struct free_area, is defined as follows:

00057: struct free_area {
00058:     struct list_head free_list[MIGRATE_TYPES];
00059:     unsigned long nr_free;
00060: };

The member variables mean the following:

free_list: doubly linked lists of free page blocks, one per migration type;

nr_free: the number of free page blocks of this order in the zone;

Each element on a free-page list (a block of physically contiguous pages of equal size) is linked in via the doubly-linked-list member of struct page, as shown in the figure below.


Figure 6: Management of the free page lists

As we know, the Linux kernel describes physical memory at three levels: nodes, zones, and pages. Free pages are managed only at the zone (Zone) level; within a node, each zone manages its own free physical pages. The relationship between free page management, nodes, and zones is shown below.

Figure 7: Relationship of free page management to nodes and zones

4 The Buddy Algorithm

4.1 The Buddy System

The buddy system (Buddy System) is, in theory, a very simple memory allocation algorithm. Its main goals are to create as little external fragmentation as possible while allowing physical pages to be allocated and reclaimed quickly. To reduce external fragmentation, contiguous free pages are organized into different linked lists (i.e. orders) according to the size of the free block (which consists of contiguous free pages). The free physical page management introduced in the previous section is part of the buddy system.

Thus all 2-page free blocks sit on one list, all 4-page free blocks on another, and so on. Note that blocks of different sizes never overlap in physical space. The figure below shows how free pages are laid out.


Figure 8: Organization of free pages

When a request for 4 contiguous pages arrives, the allocator first checks whether a free block of size 2^2 = 4 pages exists, so the request can be satisfied quickly. If the corresponding list (each node a 4-page block) has a free block, it is handed to the requester; otherwise the search moves up to the next order's list. If an 8-page free block exists there (on the next-higher-order list), that block is split into two 4-page blocks: one is given to the requester and the other is inserted into the 4-page list. This scheme avoids splitting large free blocks when a smaller block can satisfy the request, thereby reducing external fragmentation.

4.2 Buddy Algorithm Examples

The description above may seem abstract, so let us illustrate the buddy algorithm with a concrete example. Assume the system has only 32 pages of RAM; the physical page usage is shown in the figure below.

f = free, u = used

Figure 9: Buddy algorithm example

At this point the free memory pages are organized as shown below: the order=0 list (block size 2^0 = 1 page) has 5 nodes; the order=1 list (block size 2^1 = 2 pages) has 3 nodes; the order=2 list is empty; the order=3 list (block size 2^3 = 8 pages) has 1 node; the order=4 and order=5 lists are empty.

order=0: 1 page
order=1: 2 pages
order=2: 4 pages
order=3: 8 pages
order=4: 16 pages
order=5: 32 pages

Figure 10: Free page organization (buddy algorithm example)

4.2.1 Page Allocation

Now the upper layer requests a block of 4 physically contiguous free pages.

The steps to allocate the page block are as follows:

(1) 4 = 2^2, so the search starts on the order=2 free-block list;

(2) the order=2 list has no free block, so the next-higher order must be searched;

(3) the order=3 list has a free node, but each block on that list is 8 pages; 4 of those pages are allocated to the upper layer and marked as used;

(4) 4 pages remain; the remaining 4-page block is placed on the order=2 list;

(5) the relevant statistics are updated.

Note: after page allocation completes, other data structures must be updated as well; this is covered in the detailed code analysis. Here we focus on the allocation flow.

After the 4-page block is given to the upper layer, the free pages are organized as shown below.

f = free, u = used

f f f f u u f u f u f u u u u u
u u u u f f f f f u f u f f u u

Figure 11: Buddy algorithm page-block allocation example


freearea[ ]

order=0: 1 page
order=1: 2 pages
order=2: 4 pages
order=3: 8 pages
order=4: 16 pages
order=5: 32 pages

Figure 12: Free page organization after allocating a block (buddy algorithm example)

Question: can a request be for a number of pages that is not a power of two, say 6 pages at once? In the Linux kernel no such request exists; the API functions guarantee that a request is always for 2^order pages. If an upper layer really needs 6 pages, it can request 8 pages at once, or request 4 (order=2) plus 2 (order=1) pages separately (but then the 6 pages are not guaranteed to be physically contiguous).

4.2.2 Page Reclaim

Now the upper layer releases one physical page, marked in purple in the figure.

f = free, u = used

f f f f u u f u f u f u u u u u
u u u u u u f f f u f u f f u u

Figure 13: Buddy algorithm page-block reclaim example

The steps to reclaim the page block are as follows:

(1) mark the page block as free;

(2) check whether the adjacent (buddy) physical pages are free; if so, try to merge them into a larger contiguous physical page block (this counters memory fragmentation);

(3) if a merge took place, update the corresponding list elements in free_area[];

(4) update the relevant statistics.

From the example above we can see that the pages before and after the released page are free, so they can be merged into a block of 4 contiguous free pages (order=2, 2^2 pages). After the page is reclaimed, the merged free pages are organized as follows:

f = free, u = used

f f f f u u f u f u f u u u u u
u u u u u u f f f u f f f f u u

Figure 14: System physical memory page state after merging

order=0: 1 page
order=1: 2 pages
order=2: 4 pages
order=3: 8 pages
order=4: 16 pages
order=5: 32 pages

Figure 15: Free page organization after reclaiming a block (buddy algorithm example)

4.3 Viewing Buddy System Information

We can read /proc/buddyinfo to see the current state of the buddy system. Alternatively, echo m > /proc/sysrq-trigger dumps the buddy system state; its output is consistent with /proc/buddyinfo.

[ 134.154722] Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
[ 134.154747] Node 0 DMA32: 10*4kB 7*8kB 5*16kB 8*32kB 8*64kB 7*128kB 6*256kB 7*512kB 3*1024kB 3*2048kB 731*4096kB = 3010352kB
[ 134.154770] Node 0 Normal: 202*4kB 227*8kB 51*16kB 120*32kB 90*64kB 44*128kB 19*256kB 8*512kB 5*1024kB 3*2048kB 18*4096kB = 112624kB


5 Page Allocation

Linux provides a series of APIs for allocating physical pages, all built on the function alloc_pages(). Some of these API functions take a gfp_mask parameter, which determines the allocator's behavior. The memory allocation GFP (Get Free Page) flags, introduced in detail later, govern how the memory allocator and kswapd behave when allocating and reclaiming pages.

alloc_pages() and related functions are defined in include/linux/gfp.h.

00304: #ifdef CONFIG_NUMA
00305: extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
00306:
00307: static inline struct page *
00308: alloc_pages(gfp_t gfp_mask, unsigned int order)
00309: {
00310:     return alloc_pages_current(gfp_mask, order);
00311: }
00312: extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
00313:     struct vm_area_struct *vma, unsigned long addr,
00314:     int node);
00315: #else
00316: #define alloc_pages(gfp_mask, order) \
00317:     alloc_pages_node(numa_node_id(), gfp_mask, order)
00318: #define alloc_pages_vma(gfp_mask, order, vma, addr, node) \
00319:     alloc_pages(gfp_mask, order)
00320: #endif
00321: #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
00322: #define alloc_page_vma(gfp_mask, vma, addr) \
00323:     alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id())
00324: #define alloc_page_vma_node(gfp_mask, vma, addr, node) \
00325:     alloc_pages_vma(gfp_mask, 0, vma, addr, node)
00326:
00327: extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
00328: extern unsigned long get_zeroed_page(gfp_t gfp_mask);

5.1 UMA Page Allocation

Under the UMA architecture, alloc_pages() ultimately calls __alloc_pages_nodemask(), the same function used under the NUMA architecture.


00316: #define alloc_pages(gfp_mask, order) \
00317:     alloc_pages_node(numa_node_id(), gfp_mask, order)

00286: static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
00287:     unsigned int order)
00288: {
00289:     /* Unknown node is current node */
00290:     if (nid < 0)
00291:         nid = numa_node_id();
00292:
00293:     return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
00294: }

00279: static inline struct page *
00280: __alloc_pages(gfp_t gfp_mask, unsigned int order,
00281:     struct zonelist *zonelist)
00282: {
00283:     return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
00284: }

5.2 NUMA Page Allocation

Here we analyze page allocation under the NUMA architecture, i.e. alloc_pages() calling alloc_pages_current().

When a process needs several physically contiguous pages, it can obtain them through alloc_pages(). Because the largest page block is 2^(MAX_ORDER-1) pages, an order greater than or equal to MAX_ORDER is obviously out of range. MAX_ORDER is defined as 11, so each request to the kernel can allocate at most 2^(MAX_ORDER-1), i.e. 2^10, pages. On a system with 4K pages, that means at most 4MB of contiguous physical memory per request. What if the kernel needs more than 4MB? Then the only option is to request several 4MB blocks and check whether their physical addresses happen to be contiguous.

Under normal circumstances a driver should request no more than 256KB of contiguous memory, because as the system runs, page usage becomes fragmented and it gets hard to find a contiguous physical memory region larger than 256KB.

00307: static inline struct page *
00308: alloc_pages(gfp_t gfp_mask, unsigned int order)
00309: {
00310:     return alloc_pages_current(gfp_mask, order);
00311: }

5.2.1 NUMA Policies and the cpuset Feature

A memory policy controls the degree to which an application directs its own memory allocation. Policies are set through the two system calls mbind and set_mempolicy, either for the whole process or for a range of its address space, and are inherited from the parent process. The libnuma library and its numactl tool make NUMA memory easy to use.

The system supports four allocation policies, MPOL_DEFAULT, MPOL_PREFERRED, MPOL_INTERLEAVE, and MPOL_BIND, implemented in mm/mempolicy.c.

Table 2: NUMA memory allocation policies

Policy           Meaning

MPOL_DEFAULT     The default policy: allocate memory from the current node; when the current node has no free memory, allocate from the nearest node that does.
MPOL_PREFERRED   Allocate memory from the specified node; if that node has no free memory, any other node will do.
MPOL_INTERLEAVE  Spread allocations across all nodes. Usually used for shared memory regions: the allocated memory covers all nodes, so no node is overloaded and each node contributes the same amount of memory.
MPOL_BIND        Allocate only from a specific set of nodes; when those nodes cannot provide the required memory, the allocation fails.

cpuset is a feature introduced in the 2.6 kernel series that lets users partition a multi-CPU system into regions, each comprising CPUs and segments of physical memory. A process can be confined to a particular region, and it will not use computing resources outside that region. Typical server applications, such as web servers or high-performance servers on NUMA architectures, can use cpuset to improve performance. Whether CPUSET support is enabled can be checked via CONFIG_CPUSET in the kernel config file.

cpuset has the following characteristics:

- it limits the memory nodes and cpu resources available to a set of tasks;

- cpuset takes priority over the memory policy (node-related) and CPU binding (cpu-related);

5.2.2 alloc_pages_current()

The function alloc_pages_current() is implemented in mm/mempolicy.c.

01862: struct page *alloc_pages_current(gfp_t gfp, unsigned order)
01863: {
01864:     struct mempolicy *pol = current->mempolicy;
01865:     struct page *page;
01866:
01867:     if (!pol || in_interrupt() || (gfp & __GFP_THISNODE))
01868:         pol = &default_policy;
01869:
01870:     get_mems_allowed();
01871:     /*
01872:      * No reference counting needed for current->mempolicy
01873:      * nor system default_policy
01874:      */
01875:     if (pol->mode == MPOL_INTERLEAVE)
01876:         page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
01877:     else
01878:         page = __alloc_pages_nodemask(gfp, order,
01879:             policy_zonelist(gfp, pol, numa_node_id()),
01880:             policy_nodemask(gfp, pol));
01881:     put_mems_allowed();
01882:     return page;
01883: } /* end alloc_pages_current */
01884: EXPORT_SYMBOL(alloc_pages_current);

The four NUMA memory allocation policies were introduced earlier. When the allocation flag __GFP_THISNODE is set (indicating the allocation must be made on the current node), or the code is running in interrupt context, or the current process's memory policy is empty (line 1867), the system default allocation policy is used. The default policy is MPOL_PREFERRED:

00115: struct mempolicy default_policy = {
00116:     .refcnt = ATOMIC_INIT(1), /* never free it */
00117:     .mode = MPOL_PREFERRED,
00118:     .flags = MPOL_F_LOCAL,
00119: };

On lines 1870 and 1881 the functions get_mems_allowed() and put_mems_allowed() are called; both relate to cpuset.

If the kernel has cpuset enabled, get_mems_allowed() and put_mems_allowed() respectively increment and decrement the current process's mems_allowed_change_disable count; if cpuset is disabled, both functions are empty. The purpose of the mems_allowed_change_disable count is to keep the upper layer from changing the memory allocation policy while an allocation is in progress.

00094: static inline void get_mems_allowed(void)
00095: {
00096:     current->mems_allowed_change_disable++;
00097:
.......
00106:     smp_mb();
00107: }

00109: static inline void put_mems_allowed(void)
00110: {
.......
00119:     smp_mb();
00120:     --ACCESS_ONCE(current->mems_allowed_change_disable);
00121: }

Here we analyze the case of the default NUMA memory allocation policy, i.e. execution reaches line 1878. The two parameters policy_zonelist(gfp, pol, numa_node_id()) and policy_nodemask(gfp, pol) determine, according to the NUMA policy, on which nodes and in which zones memory is allocated.

Next we analyze the implementation of __alloc_pages_nodemask().

5.3 __alloc_pages_nodemask()

5.3.1 Memory Migration Types and lockdep

Under the NUMA architecture, memory can be migrated between nodes to improve page locality for the processes using it. When several processes run on a set of nodes and one of them terminates, leaving memory usage unbalanced, part of the memory must be moved from one node to another to restore balance and reduce NUMA latency. There are 5 page migration types, defined in include/linux/mmzone.h.

00038: #define MIGRATE_UNMOVABLE     0
00039: #define MIGRATE_RECLAIMABLE   1
00040: #define MIGRATE_MOVABLE       2
00041: #define MIGRATE_PCPTYPES      3 /* the number of types on the pcp lists */
00042: #define MIGRATE_RESERVE       3
00043: #define MIGRATE_ISOLATE       4 /* can't allocate from here */
00044: #define MIGRATE_TYPES         5

lockdep is a Linux kernel debugging module that checks the kernel's mutual-exclusion mechanisms, in particular potential spinlock deadlocks. Because a spinlock (spin lock) busy-waits without yielding the processor, it is more prone to deadlock than other mutual-exclusion mechanisms, so lockdep was introduced to check for possible deadlocks in the following situations:

- the same process recursively acquires the same lock;

- a lock that has been acquired with interrupts (or the bottom half) enabled is also acquired in interrupt context (or in the bottom half); an interrupt may then attempt to acquire a lock already held on the same processor;

- the lock dependency graph forms a closed loop, the classic deadlock pattern.

lockdep support requires enabling CONFIG_LOCKDEP_SUPPORT in the kernel configuration.

5.3.2 __alloc_pages_nodemask()

__alloc_pages_nodemask() is the core function of the Linux kernel's buddy allocator; its source is in mm/page_alloc.c.

02190: /*
02191:  * This is the 'heart' of the zoned buddy allocator.
02192:  */
02193: struct page *
02194: __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
02195:     struct zonelist *zonelist, nodemask_t *nodemask)
02196: {
02197:     enum zone_type high_zoneidx = gfp_zone(gfp_mask);
02198:     struct zone *preferred_zone;
02199:     struct page *page;
02200:     int migratetype = allocflags_to_migratetype(gfp_mask);
02201:
02202:     gfp_mask &= gfp_allowed_mask;
02203:
02204:     lockdep_trace_alloc(gfp_mask);
02205:
02206:     might_sleep_if(gfp_mask & __GFP_WAIT);
02207:

Line 2197: based on gfp_mask, determine the highest suitable zone (ZONE_DMA, ZONE_DMA32, ZONE_NORMAL or ZONE_HIGHMEM); the value is used as an index into the node's node_zonelists[] array;

Line 2200: convert the page allocation flags in gfp_mask into the corresponding memory migration type (migratetype);

Line 2202: mask gfp_mask against the allowed GFP (Get Free Page) flags;

Line 2204: perform the lockdep check for the current process's memory request;

Line 2206: when __GFP_WAIT is set in gfp_mask, the current function may sleep;

02208: if(should_fail_alloc_page(gfp_mask,order))
02209: returnNULL;
02210:


02211:     /*
02212:      * Check the zones suitable for the gfp_mask contain at least one
02213:      * valid zone. It's possible to have an empty zonelist as a result
02214:      * of GFP_THISNODE and a memoryless node
02215:      */
02216:     if (unlikely(!zonelist->_zonerefs->zone))
02217:         return NULL;
02218:
02219:     get_mems_allowed();
02220:     /* The preferred zone is used for statistics later */
02221:     first_zones_zonelist(zonelist, high_zoneidx, nodemask, &preferred_zone);
02222:     if (!preferred_zone) {
02223:         put_mems_allowed();
02224:         return NULL;
02225:     }
02226:

Lines 2208–2209: before actually attempting page allocation, check against gfp_mask and order whether the request can be satisfied at all; if not, return NULL immediately;

Lines 2216–2217: verify that, for the given gfp_mask, the zonelist contains at least one valid zone; if there is none, the request cannot be satisfied and NULL is returned;

Lines 2219 and 2223: increment and decrement the current process's mems_allowed_change_disable count, respectively;

Line 2221: in the zonelist, find the first suitable zone whose index is less than or equal to high_zoneidx and matches nodemask, and store it in the preferred_zone variable.

Assuming a zone satisfying the request is found, we continue the analysis.

02227:     /* First allocation attempt */
02228:     page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
02229:             zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET,
02230:             preferred_zone, migratetype);
02231:     if (unlikely(!page))
02232:         page = __alloc_pages_slowpath(gfp_mask, order,
02233:                 zonelist, high_zoneidx, nodemask,
02234:                 preferred_zone, migratetype);
02235:     put_mems_allowed();
02236:
02237:     trace_mm_page_alloc(page, order, gfp_mask, migratetype);
02238:     return page;
02239: }
02240: EXPORT_SYMBOL(__alloc_pages_nodemask);


The code so far performs only basic checks. Once they pass, get_page_from_freelist() (line 2228) tries to allocate 2^order physically contiguous pages from the zonelist — that is, allocation is first attempted from the free page lists. As the system runs, free pages inevitably become scarce (the page cache, i.e. memory used as block-device I/O cache, occupies memory), so get_page_from_freelist() may fail. In that case __alloc_pages_slowpath() must be called to allocate pages from the global memory pool; its work includes reclaiming physical memory pages.

Below we analyze the implementations of get_page_from_freelist() and __alloc_pages_slowpath() in turn.

5.4 get_page_from_freelist()

5.4.1 Zone watermarks

When free memory in the system runs low, the kswapd daemon is woken to free pages. If free memory is very low, memory is released synchronously; this is sometimes called the direct-reclaim path.

Each zone has three watermarks — pages_low, pages_min and pages_high — which reflect the memory pressure on the zone. A zone watermark is much like the water level of a reservoir: during a drought the reservoir raises a low-water alarm, and different low water levels can map to different alarm levels.

00159: enum zone_watermarks {
00160:     WMARK_MIN,
00161:     WMARK_LOW,
00162:     WMARK_HIGH,
00163:     NR_WMARK
00164: };

00166: #define min_wmark_pages(z)  (z->watermark[WMARK_MIN])
00167: #define low_wmark_pages(z)  (z->watermark[WMARK_LOW])
00168: #define high_wmark_pages(z) (z->watermark[WMARK_HIGH])
00169:

The relationship among the three watermarks is shown in the figure below.


Figure 16: Zone watermark diagram

The pages_min and pages_low values are set at initialization time in the function setup_per_zone_wmarks().

04972: void setup_per_zone_wmarks(void)
04973: {
04974:     unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
04975:     unsigned long pages_low = extra_free_kbytes >> (PAGE_SHIFT - 10);

Different actions are taken at the different free-page watermarks:

pages_low: when the number of free pages falls to pages_low, the buddy allocator wakes the kswapd daemon to reclaim pages. The default value is twice pages_min;

pages_min: when the number of free pages falls to pages_min, the allocator makes kswapd work synchronously; this is sometimes called the direct-reclaim path;

pages_high: once kswapd has been woken and the number of free pages reaches pages_high, the zone is no longer considered in need of balancing and kswapd goes back to sleep. The default value of pages_high is three times pages_min.


5.4.2 Hot-N-Cold pages

When the data of a physical memory page is resident in the CPU cache, the CPU can read it quickly and directly from the cache; such a page is called a hot page. A page whose data is not in the CPU cache is called a cold page. In a multi-CPU system each CPU has its own cache, so hot and cold pages are managed per CPU: every CPU maintains a memory pool of cold/hot pages.

The pageset[NR_CPUS] member of struct zone implements this hot/cold page management. NR_CPUS is not the number of CPUs in the current system, but the maximum number of CPUs the current kernel supports.

00287: struct zone {

...

00312: #ifdef CONFIG_NUMA
...
00319:     struct per_cpu_pageset *pageset[NR_CPUS];
00320: #else
00321:     struct per_cpu_pageset pageset[NR_CPUS];
00322: #endif
...

00446: }

The data structure per_cpu_pageset is defined in include/linux/mmzone.h.

00179: struct per_cpu_pageset {
00180:     struct per_cpu_pages pcp;
00181: #ifdef CONFIG_NUMA
00182:     s8 expire;
00183: #endif
00184: #ifdef CONFIG_SMP
00185:     s8 stat_threshold;
00186:     s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS];
00187: #endif
00188: } ____cacheline_aligned_in_smp;

The related data structure per_cpu_pages is also defined in mmzone.h.

00041: #define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */

00170: struct per_cpu_pages {
00171:     int count;   /* number of pages in the list */
00172:     int high;    /* high watermark, emptying needed */
00173:     int batch;   /* chunk size for buddy add/remove */
00174:
00175:     /* Lists of pages, one per migrate type stored on the pcp-lists */
00176:     struct list_head lists[MIGRATE_PCPTYPES];
00177: };

count is the number of pages on the list; high is a watermark — if count > high, the list holds too many pages and some must be drained; batch is the chunk size by which pages are added to or removed from the buddy lists; lists holds the page list heads, one per migrate type.

The figure below shows the hot/cold page statistics and data structures in a two-CPU system.

Figure 17: Per-CPU cache in a two-CPU system

We can run echo m > /proc/sysrq-trigger to observe the cold/hot page situation on the current system. Below is an example.

Node 1 DMA per-cpu:


CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
Node 1 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 212
CPU 1: hi: 186, btch: 31 usd: 0
Node 1 Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 138
CPU 1: hi: 186, btch: 31 usd: 52

5.4.3 get_page_from_freelist()

The get_page_from_freelist() function is also in mm/page_alloc.c. The body of the function is the for_each_zone_zonelist_nodemask() loop at line 1666, which walks all zones of the nodes selected by nodemask, looking for enough free pages to satisfy the request.

01643: /*
01644:  * get_page_from_freelist goes through the zonelist trying to allocate
01645:  * a page.
01646:  */
01647: static struct page *
01648: get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
01649:         struct zonelist *zonelist, int high_zoneidx, int alloc_flags,
01650:         struct zone *preferred_zone, int migratetype)
01651: {
01652:     struct zoneref *z;
01653:     struct page *page = NULL;
01654:     int classzone_idx;
01655:     struct zone *zone;
01656:     nodemask_t *allowednodes = NULL; /* zonelist_cache approximation */
01657:     int zlc_active = 0;      /* set if using zonelist_cache */
01658:     int did_zlc_setup = 0;   /* just call zlc_setup() one time */
01659:
01660:     classzone_idx = zone_idx(preferred_zone);
01661: zonelist_scan:
01662:     /*
01663:      * Scan zonelist, looking for a zone with enough free.
01664:      * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
01665:      */
01666:     for_each_zone_zonelist_nodemask(zone, z, zonelist,
01667:                         high_zoneidx, nodemask) {
01668:         if (NUMA_BUILD && zlc_active &&
01669:             !zlc_zone_worth_trying(zonelist, z, allowednodes))
01670:                 continue;
01671:         if ((alloc_flags & ALLOC_CPUSET) &&
01672:             !cpuset_zone_allowed_softwall(zone, gfp_mask))
01673:                 goto try_next_zone;
01674:
01675:         BUILD_BUG_ON(ALLOC_NO_WATERMARKS < NR_WMARK);
01676:         if (!(alloc_flags & ALLOC_NO_WATERMARKS)) {
01677:             unsigned long mark;
01678:             int ret;
01680:             mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
01681:             if (zone_watermark_ok(zone, order, mark,
01682:                     classzone_idx, alloc_flags))
01683:                 goto try_this_zone;
01684:
01685:             if (zone_reclaim_mode == 0)
01686:                 goto this_zone_full;
01687:
01688:             ret = zone_reclaim(zone, gfp_mask, order);
01689:             switch (ret) {
01690:             case ZONE_RECLAIM_NOSCAN:
01691:                 /* did not scan */
01692:                 goto try_next_zone;
01693:             case ZONE_RECLAIM_FULL:
01694:                 /* scanned but unreclaimable */
01695:                 goto this_zone_full;
01696:             default:
01697:                 /* did we reclaim enough */
01698:                 if (!zone_watermark_ok(zone, order, mark,
01699:                         classzone_idx, alloc_flags))
01700:                     goto this_zone_full;
01701:             }
01702:         }
01703:

Before actually looking for free pages, some basic checks are performed. If the kernel is built with NUMA support, zlc_zone_worth_trying() quickly checks whether a zone is worth searching for free memory at all; if the zone is not worth searching, the loop moves on to the next zone and repeats the same check (lines 1669–1670).

Next the zone watermark is checked to see whether the current zone meets the watermark requirement (lines 1676–1702):

(1) if the zone meets the watermark, allocation is attempted directly from this zone (lines 1681–1683);

(2) if the zone does not meet the watermark and the zone's zone_reclaim_mode value is 0 (lines 1685–1686), jump to this_zone_full;

Note: zone_reclaim_mode can be changed via /proc/sys/vm/zone_reclaim_mode; the default value is 0.

(3) if neither of the above holds, zone_reclaim() must be called to try to reclaim memory in this zone (lines 1688–1701). If the return value is ZONE_RECLAIM_NOSCAN, the zone was not scanned, so it is skipped and the next zone is tried (line 1690); if the return value is ZONE_RECLAIM_FULL, nothing could be reclaimed, so the zone is marked full — next time nobody should waste time scanning it (line 1693); if some memory was successfully reclaimed, the watermark is checked again.

We continue with the code. The watermark check above has concluded that the current zone can supply enough free pages to satisfy the request, so buffered_rmqueue() is called to allocate pages from the zone.

01704: try_this_zone:
01705:         page = buffered_rmqueue(preferred_zone, zone, order,
01706:                         gfp_mask, migratetype);
01707:         if (page)
01708:             break;
01709: this_zone_full:
01710:         if (NUMA_BUILD)
01711:             zlc_mark_zone_full(zonelist, z);
01712: try_next_zone:
01713:         if (NUMA_BUILD && !did_zlc_setup && nr_online_nodes > 1) {
01714:             /*
01715:              * we do zlc_setup after the first zone is tried but only
01716:              * if there are multiple nodes make it worthwhile
01717:              */
01718:             allowednodes = zlc_setup(zonelist, alloc_flags);
01719:             zlc_active = 1;
01720:             did_zlc_setup = 1;
01721:         }
01722:     }
01723:
01724:     if (unlikely(NUMA_BUILD && page == NULL && zlc_active)) {
01725:         /* Disable zlc cache for second zonelist scan */
01726:         zlc_active = 0;
01727:         goto zonelist_scan;
01728:     }
01729:     return page;
01730: }

The next section analyzes the implementation of buffered_rmqueue() in detail.

1. buffered_rmqueue()

The source of buffered_rmqueue() is in mm/page_alloc.c.

Figure 18: buffered_rmqueue() call relationships

01292: static inline
01293: struct page *buffered_rmqueue(struct zone *preferred_zone,
01294:             struct zone *zone, int order, gfp_t gfp_flags,
01295:             int migratetype)
01296: {
01297:     unsigned long flags;
01298:     struct page *page;
01299:     int cold = !!(gfp_flags & __GFP_COLD);
01300:     int cpu;
01301:
01302: again:
01303:     cpu = get_cpu();
01304:     if (likely(order == 0)) {
01305:         struct per_cpu_pages *pcp;
01306:         struct list_head *list;
01307:
01308:         pcp = &zone_pcp(zone, cpu)->pcp;
01309:         list = &pcp->lists[migratetype];
01310:         local_irq_save(flags);
01311:         if (list_empty(list)) {
01312:             pcp->count += rmqueue_bulk(zone, 0,
01313:                     pcp->batch, list,
01314:                     migratetype, cold);
01315:             if (unlikely(list_empty(list)))
01316:                 goto failed;
01317:         }
01318:
01319:         if (cold)
01320:             page = list_entry(list->prev, struct page, lru);
01321:         else
01322:             page = list_entry(list->next, struct page, lru);
01323:
01324:         list_del(&page->lru);
01325:         pcp->count--;

The function first checks whether only a single page is requested (line 1304). Each CPU maintains per-migratetype lists of pages; there are three such types:

00038: #define MIGRATE_UNMOVABLE     0
00039: #define MIGRATE_RECLAIMABLE   1
00040: #define MIGRATE_MOVABLE       2

The list for the specified migratetype is checked for a free page; if the list is empty it is refilled from the buddy lists first (lines 1308–1316), then a page is allocated and the list's count is updated. In addition, depending on whether a cold or a hot page was requested, a page of the corresponding kind is returned (lines 1319–1322).

If more than one page is requested, execution continues at line 1326. The rarely used __GFP_NOFAIL flag indicates that the allocation is not allowed to fail.

01326:     } else {
01327:         if (unlikely(gfp_flags & __GFP_NOFAIL)) {
01338:             WARN_ON_ONCE(order > 1);
01339:         }
01340:         spin_lock_irqsave(&zone->lock, flags);
01341:         page = __rmqueue(zone, order, migratetype);
01342:         spin_unlock(&zone->lock);
01343:         if (!page)
01344:             goto failed;
01345:         __mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
01346:     }


Next, __rmqueue() allocates 2^order physically contiguous pages from the zone. Before the allocation, the zone must be locked so that no other CPU operates on it at the same time (line 1340); once the allocation completes, the zone->lock is released. At this point in the analysis we find that, after all this effort, the physical pages still have not actually been allocated.

01348:     __count_zone_vm_events(PGALLOC, zone, 1 << order);
01349:     zone_statistics(preferred_zone, zone, gfp_flags);
01350:     local_irq_restore(flags);
01351:     put_cpu();
01352:
01353:     VM_BUG_ON(bad_range(zone, page));
01354:     if (prep_new_page(page, order, gfp_flags))
01355:         goto again;
01356:     return page;
01357:
01358: failed:
01359:     local_irq_restore(flags);
01360:     put_cpu();
01361:     return NULL;
01362: }

Lines 1348–1349 update some per-zone statistics. Once the page request succeeds, the address of the first page of the block is returned directly (line 1356).

5.4.4 __rmqueue()

The __rmqueue() function performs the operation of taking multiple pages from the corresponding zone; it is the core code of page allocation. Its source is also in mm/page_alloc.c.

On a NUMA system the kernel always tries first to allocate memory on the node of the CPU the process is running on, which gives the best memory performance. This allocation strategy cannot succeed every time, however, so for the case where memory cannot be allocated from the local node, every node provides a fallback list. This list names the other nodes and zones that can serve as alternative sources for the allocation.

First, __rmqueue_smallest() (line 982) tries to find a page block that exactly satisfies the given order and migratetype.

If no page block of the given order and migratetype can be found on the zone's lists, __rmqueue_fallback() must be called to allocate a block of the requested order and migrate type from the fallback lists.


00972: /*
00973:  * Do the hard work of removing an element from the buddy allocator.
00974:  * Call me with the zone->lock already held.
00975:  */
00976: static struct page *__rmqueue(struct zone *zone, unsigned int order,
00977:                         int migratetype)
00978: {
00979:     struct page *page;
00980:
00981: retry_reserve:
00982:     page = __rmqueue_smallest(zone, order, migratetype);
00983:
00984:     if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
00985:         page = __rmqueue_fallback(zone, order, migratetype);
00986:
00987:         /*
00988:          * Use MIGRATE_RESERVE rather than fail an allocation. goto
00989:          * is used because __rmqueue_smallest is an inline function
00990:          * and we want just one call site
00991:          */
00992:         if (!page) {
00993:             migratetype = MIGRATE_RESERVE;
00994:             goto retry_reserve;
00995:         }
00996:     }
00997:
00998:     trace_mm_page_alloc_zone_locked(page, order, migratetype);
00999:     return page;
01000: }

Below we analyze __rmqueue_smallest() and __rmqueue_fallback() in turn.

1. __rmqueue_smallest()

The source of __rmqueue_smallest() is also in mm/page_alloc.c.

The steps to allocate a page block are:

(1) start searching the free-block lists at the requested order;

(2) if the current order's list has a free block, remove it and jump to step (4);

(3) if the current order's list has no free block, set order = order + 1 and go one level up to look for a free block there, i.e. jump back to step (2); if order > MAX_ORDER - 1, jump to step (5);

(4) if the order of the allocated block is greater than requested, the surplus part of the block is put back onto lower-order lists (the expand() function at line 802), and the page allocation returns successfully;

(5) the page allocation fails and returns.

00779: /*
00780:  * Go through the free lists for the given migratetype and remove
00781:  * the smallest available page from the freelists
00782:  */
00783: static inline
00784: struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
00785:                         int migratetype)
00786: {
00787:     unsigned int current_order;
00788:     struct free_area *area;
00789:     struct page *page;
00790:
00791:     /* Find a page of the appropriate size in the preferred list */
00792:     for (current_order = order; current_order < MAX_ORDER; ++current_order) {
00793:         area = &(zone->free_area[current_order]);
00794:         if (list_empty(&area->free_list[migratetype]))
00795:             continue;
00796:
00797:         page = list_entry(area->free_list[migratetype].next,
00798:                     struct page, lru);
00799:         list_del(&page->lru);
00800:         rmv_page_order(page);
00801:         area->nr_free--;
00802:         expand(zone, page, order, current_order, area, migratetype);
00803:         return page;
00804:     }
00805:
00806:     return NULL;
00807: }

2. __rmqueue_fallback()

The page-block allocation flow of __rmqueue_fallback() is similar to that of __rmqueue_smallest(); the difference is that the search is performed on the fallback lists rather than on the node's own zone lists. It is not analyzed in detail here.

00902: /* Remove an element from the buddy allocator from the fallback list */
00903: static inline struct page *
00904: __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
00905: {
00906:     struct free_area *area;
00907:     int current_order;
00908:     struct page *page;
00909:     int migratetype, i;
00910:
00911:     /* Find the largest possible block of pages in the other list */
00912:     for (current_order = MAX_ORDER-1; current_order >= order;
00913:                         --current_order) {
00914:         for (i = 0; i < MIGRATE_TYPES - 1; i++) {
00915:             migratetype = fallbacks[start_migratetype][i];
00916:
00917:             /* MIGRATE_RESERVE handled later if necessary */
00918:             if (migratetype == MIGRATE_RESERVE)
00919:                 continue;
00920:
00921:             area = &(zone->free_area[current_order]);
00922:             if (list_empty(&area->free_list[migratetype]))
00923:                 continue;
00924:
00925:             page = list_entry(area->free_list[migratetype].next,
00926:                     struct page, lru);
00927:             area->nr_free--;
00928:
00929:             /*
00930:              * If breaking a large block of pages, move all free
00931:              * pages to the preferred allocation list. If falling
00932:              * back for a reclaimable kernel allocation, be more
00933:              * agressive about taking ownership of free pages
00934:              */
00935:             if (unlikely(current_order >= (pageblock_order >> 1)) ||
00936:                     start_migratetype == MIGRATE_RECLAIMABLE ||
00937:                     page_group_by_mobility_disabled) {
00938:                 unsigned long pages;
00939:                 pages = move_freepages_block(zone, page,
00940:                                 start_migratetype);
00941:
00942:                 /* Claim the whole block if over half of it is free */
00943:                 if (pages >= (1 << (pageblock_order-1)) ||
00944:                         page_group_by_mobility_disabled)
00945:                     set_pageblock_migratetype(page,
00946:                                 start_migratetype);
00947:
00948:                 migratetype = start_migratetype;
00949:             }
00950:
00951:             /* Remove the page from the freelists */
00952:             list_del(&page->lru);
00953:             rmv_page_order(page);
00954:
00955:             /* Take ownership for orders >= pageblock_order */
00956:             if (current_order >= pageblock_order)
00957:                 change_pageblock_range(page, current_order,
00958:                                 start_migratetype);
00959:
00960:             expand(zone, page, order, current_order, area, migratetype);
00961:
00962:             trace_mm_page_alloc_extfrag(page, order, current_order,
00963:                     start_migratetype, migratetype);
00964:
00965:             return page;
00966:         }
00967:     }
00968:
00969:     return NULL;
00970: }

3. expand()

The main job of expand() is, after part of a page block taken from a free list has been allocated, to put the unused remainder back onto the zone's lower-order free-block lists.

00722: static inline void expand(struct zone *zone, struct page *page,
00723:         int low, int high, struct free_area *area,
00724:         int migratetype)
00725: {
00726:     unsigned long size = 1 << high;
00727:
00728:     while (high > low) {
00729:         area--;
00730:         high--;
00731:         size >>= 1;
00732:         VM_BUG_ON(bad_range(zone, &page[size]));
00733:         list_add(&page[size].lru, &area->free_list[migratetype]);
00734:         area->nr_free++;
00735:         set_page_order(&page[size], high);
00736:     }
00737: }

In the parameter list, low is the order of the block size actually requested, while high corresponds to curr_order, the order of the free-area queue from which the satisfying block was taken. When the two are equal, the while loop starting at line 728 is skipped entirely. If the allocated block is larger than required (it cannot be smaller), the surplus is linked into the next lower block list (smaller order), i.e. the free queue whose block size is half as large: the block is cut in half, the second half becomes the new working block, and the next iteration processes the next lower free-block list in the same way. Eventually high and low become equal — the remaining block is exactly the requested size — and the loop ends.


5.5 __alloc_pages_slowpath()

As the system runs, free pages inevitably become scarce (the page cache, i.e. memory used as block-device I/O cache, occupies memory), so get_page_from_freelist() may well fail to allocate memory. At that point __alloc_pages_slowpath() must be called; its source is still in mm/page_alloc.c. As the function name suggests, allocating memory pages along this path is comparatively slow.

02024: static inline struct page *
02025: __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
02026:     struct zonelist *zonelist, enum zone_type high_zoneidx,
02027:     nodemask_t *nodemask, struct zone *preferred_zone,
02028:     int migratetype)
02029: {
02030:     const gfp_t wait = gfp_mask & __GFP_WAIT;
02031:     struct page *page = NULL;
02032:     int alloc_flags;
02033:     unsigned long pages_reclaimed = 0;
02034:     unsigned long did_some_progress;
02035:     struct task_struct *p = current;
02036:     bool sync_migration = false;
02037:
02038:     /*
02039:      * In the slowpath, we sanity check order to avoid ever trying to
02040:      * reclaim >= MAX_ORDER areas which will never succeed. Callers may
02041:      * be using allocators in order of preference for an area that is
02042:      * too large.
02043:      */
02044:     if (order >= MAX_ORDER) {
02045:         WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
02046:         return NULL;
02047:     }
02048:
02057:     if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
02058:         goto nopage;
02059:
02060: restart:
02061:     if (!(gfp_mask & __GFP_NO_KSWAPD))
02062:         wake_all_kswapd(order, zonelist, high_zoneidx);
02063:
02064:     /*
02065:      * OK, we're below the kswapd watermark and have kicked background
02066:      * reclaim. Now things get more complex, so set up alloc_flags according
02067:      * to how we want to proceed.
02068:      */
02069:     alloc_flags = gfp_to_alloc_flags(gfp_mask);
02070:
02071:     /* This is the last chance, in general, before the goto nopage. */
02072:     page = get_page_from_freelist(gfp_mask, nodemask, order,
02073:             zonelist, high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
02074:             preferred_zone, migratetype);
02075:     if (page)
02076:         goto got_pg;
02077:

Some necessary checks are still performed before attempting to allocate memory (lines 2044–2058). Unless gfp_mask explicitly forbids it, the kswapd kernel threads are woken to reclaim memory in the background. Then get_page_from_freelist() is tried once more for a fast allocation. Only if memory still cannot be allocated does execution continue downwards.

02078: rebalance:
02079:     /* Allocate without watermarks if the context allows */
02080:     if (alloc_flags & ALLOC_NO_WATERMARKS) {
02081:         page = __alloc_pages_high_priority(gfp_mask, order,
02082:                 zonelist, high_zoneidx, nodemask,
02083:                 preferred_zone, migratetype);
02084:         if (page)
02085:             goto got_pg;
02086:     }
02087:
02088:     /* Atomic allocations - we can't balance anything */
02089:     if (!wait)
02090:         goto nopage;
02091:
02092:     /* Avoid recursion of direct reclaim */
02093:     if (p->flags & PF_MEMALLOC)
02094:         goto nopage;
02095:
02096:     /* Avoid allocations with no watermarks from looping endlessly */
02097:     if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
02098:         goto nopage;
02099:

The gfp_mask flags are converted into the corresponding alloc_flags (line 2069). If alloc_flags contains the ALLOC_NO_WATERMARKS flag, this is a high-priority memory request and __alloc_pages_high_priority() is called directly. __alloc_pages_high_priority() is just a thin wrapper around get_page_from_freelist() with the alloc_flags parameter set to ALLOC_NO_WATERMARKS.

If the many attempts so far have all failed, memory is genuinely hard to come by — much like raising money gets harder and harder once the money is running out :-( .

02100:     /*
02101:      * Try direct compaction. The first pass is asynchronous. Subsequent
02102:      * attempts after direct reclaim are synchronous
02103:      */
02104:     page = __alloc_pages_direct_compact(gfp_mask, order,
02105:                     zonelist, high_zoneidx,
02106:                     nodemask,
02107:                     alloc_flags, preferred_zone,
02108:                     migratetype, &did_some_progress,
02109:                     sync_migration);
02110:     if (page)
02111:         goto got_pg;
02112:     sync_migration = true;
02113:

Things are not hopeless yet: __alloc_pages_direct_compact() is called to allocate memory (lines 2104–2109). This function is executed asynchronously on the first pass; its main job is to merge small page blocks on the free lists into larger ones (for example, two order-2 page blocks merged into one order-3 block) and then retry the block allocation.

If merging small page blocks fails, the only option is to try harder and reclaim memory synchronously (lines 2115–2119). __alloc_pages_direct_reclaim() works by first calling try_to_free_pages() to reclaim rarely used pages and page-cache pages, freeing up more physical memory. The kernel then calls get_page_from_freelist() again to attempt the allocation.

02114:     /* Try direct reclaim and then allocating */
02115:     page = __alloc_pages_direct_reclaim(gfp_mask, order,
02116:                     zonelist, high_zoneidx,
02117:                     nodemask,
02118:                     alloc_flags, preferred_zone,
02119:                     migratetype, &did_some_progress);
02120:     if (page)
02121:         goto got_pg;
02122:

If the kernel still fails to allocate after the reclaim attempt above — that is, did_some_progress is 0 — then it has to consider whether an OOM (Out of Memory) situation has occurred. If gfp_mask allows VFS I/O operations and allows retries (line 2128), the attempt can continue. The kernel checks whether oom_killer_disabled is set; if killing processes is not allowed, it jumps straight to nopage. Otherwise it calls __alloc_pages_may_oom() to allocate memory; if that fails as well, out_of_memory() is invoked to kill the process(es) consuming large amounts of memory (possibly more than one). It then jumps to restart and begins the allocation over again.

02123: /*
02124:  * If we failed to make any progress reclaiming, then we are
02125:  * running out of options and have to consider going OOM
02126:  */
02127: if (!did_some_progress) {
02128:         if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
02129:                 if (oom_killer_disabled)
02130:                         goto nopage;
02131:                 page = __alloc_pages_may_oom(gfp_mask, order,
02132:                                              zonelist, high_zoneidx,
02133:                                              nodemask, preferred_zone,
02134:                                              migratetype);
02135:                 if (page)
02136:                         goto got_pg;
02137:
02144:                 if (order > PAGE_ALLOC_COSTLY_ORDER &&
02145:                     !(gfp_mask & __GFP_NOFAIL))
02146:                         goto nopage;
02147:
02148:                 goto restart;
02149:         } /* end if (gfp_mask & __GFP_FS) && ... */
02150: } /* end if !did_some_progress */
02151:
02151:

At this point the kernel decides whether the allocation should be retried. If so, it backs off to let pending writeback make progress, then jumps back to rebalance to try again (lines 2154~2157).

02152: /* Check if we should retry the allocation */
02153: pages_reclaimed += did_some_progress;
02154: if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
02155:         /* Too much pressure, back off a bit and let reclaimers do work */
02156:         wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
02157:         goto rebalance;
02158: }
02159:
02159:


There are two ways the page block allocation function can end. In the first case, allocation fails and the requested page block is not obtained, so some information about the allocation failure is printed along with the current stack trace (lines 2161~2181). In the other case, the page block is allocated successfully, and the page structure of the first page of the block is returned directly (lines 2183~2186).

02160: nopage:
02161: if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
02162:         unsigned int filter = SHOW_MEM_FILTER_NODES;
02163:
02164:         /*
02165:          * This documents exceptions given to allocations in certain
02166:          * contexts that are allowed to allocate outside current's set
02167:          * of allowed nodes.
02168:          */
02169:         if (!(gfp_mask & __GFP_NOMEMALLOC))
02170:                 if (test_thread_flag(TIF_MEMDIE) ||
02171:                     (current->flags & (PF_MEMALLOC | PF_EXITING)))
02172:                         filter &= ~SHOW_MEM_FILTER_NODES;
02173:         if (in_interrupt() || !wait)
02174:                 filter &= ~SHOW_MEM_FILTER_NODES;
02175:
02176:         pr_warning("%s: page allocation failure. order:%d, mode:0x%x\n",
02177:                    p->comm, order, gfp_mask);
02178:         dump_stack();
02179:         if (!should_suppress_show_mem())
02180:                 show_mem(filter);
02181: }
02182: return page;
02183: got_pg:
02184: if (kmemcheck_enabled)
02185:         kmemcheck_pagealloc_alloc(page, order, gfp_mask);
02186: return page;
02187:
02188: } /* end __alloc_pages_slowpath() */
02189:

5.5.1 __alloc_pages_direct_compact()

The main job of __alloc_pages_direct_compact() is to merge small page blocks into larger ones and then allocate a page block. The kernel enables the CONFIG_COMPACTION option by default.


There are two main steps: try_to_compact_pages() merges small page blocks (lines 1846~1847), and then get_page_from_freelist() is called again to allocate the page block (lines 1855~1858). We will not analyze the implementation of these functions in detail here.

01830: #ifdef CONFIG_COMPACTION
01831: /* Try memory compaction for high-order allocations before reclaim */
01832: static struct page *
01833: __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
01834:         struct zonelist *zonelist, enum zone_type high_zoneidx,
01835:         nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
01836:         int migratetype, unsigned long *did_some_progress,
01837:         bool sync_migration)
01838: {
01839:         struct page *page;
01840:         struct task_struct *p = current;
01841:
01842:         if (!order || compaction_deferred(preferred_zone))
01843:                 return NULL;
01844:
01845:         p->flags |= PF_MEMALLOC;
01846:         *did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
01847:                                                   nodemask, sync_migration);
01848:         p->flags &= ~PF_MEMALLOC;
01849:         if (*did_some_progress != COMPACT_SKIPPED) {
01850:
01851:                 /* Page migration frees to the PCP lists but we want merging */
01852:                 drain_pages(get_cpu());
01853:                 put_cpu();
01854:
01855:                 page = get_page_from_freelist(gfp_mask, nodemask,
01856:                                 order, zonelist, high_zoneidx,
01857:                                 alloc_flags, preferred_zone,
01858:                                 migratetype);
01859:                 if (page) {
01860:                         preferred_zone->compact_considered = 0;
01861:                         preferred_zone->compact_defer_shift = 0;
01862:                         count_vm_event(COMPACTSUCCESS);
01863:                         return page;
01864:                 }
01865:
01866:                 /*
01867:                  * It's bad if compaction run occurs and fails.
01868:                  * The most likely reason is that pages exist,
01869:                  * but not enough to satisfy watermarks.
01870:                  */
01871:                 count_vm_event(COMPACTFAIL);
01872:                 defer_compaction(preferred_zone);
01873:
01874:                 cond_resched();
01875:         } /* end if (*did_some_progress != COMPACT_SKIPPED) */
01876:
01877:         return NULL;
01878: } /* end __alloc_pages_direct_compact() */

5.5.2 __alloc_pages_direct_reclaim()

The function __alloc_pages_direct_reclaim() first calls try_to_free_pages() to reclaim pages and free some memory (line 1912). The kernel then calls get_page_from_freelist() again to attempt the allocation (lines 1927~1930).

01891: /* The really slow allocator path where we enter direct reclaim */
01892: static inline struct page *
01893: __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
01894:         struct zonelist *zonelist, enum zone_type high_zoneidx,
01895:         nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
01896:         int migratetype, unsigned long *did_some_progress)
01897: {
01898:         struct page *page = NULL;
01899:         struct reclaim_state reclaim_state;
01900:         struct task_struct *p = current;
01901:         bool drained = false;
01902:
01903:         cond_resched();
01904:
01905:         /* We now go into synchronous reclaim */
01906:         cpuset_memory_pressure_bump();
01907:         p->flags |= PF_MEMALLOC;
01908:         lockdep_set_current_reclaim_state(gfp_mask);
01909:         reclaim_state.reclaimed_slab = 0;
01910:         p->reclaim_state = &reclaim_state;
01911:
01912:         *did_some_progress = try_to_free_pages(zonelist, order, gfp_mask,
01912:                                                nodemask);
01913:
01914:         p->reclaim_state = NULL;
01915:         lockdep_clear_current_reclaim_state();
01916:         p->flags &= ~PF_MEMALLOC;
01917:
01918:         cond_resched();
01919:
01920:         if (order != 0)
01921:                 drain_all_pages();
01922:
01923:         if (unlikely(!(*did_some_progress)))
01924:                 return NULL;
01925:
01926: retry:
01927:         page = get_page_from_freelist(gfp_mask, nodemask, order,
01928:                                       zonelist, high_zoneidx,
01929:                                       alloc_flags, preferred_zone,
01930:                                       migratetype);
01931:
01932:         /*
01933:          * If an allocation failed after direct reclaim, it could be because
01934:          * pages are pinned on the per-cpu lists. Drain them and try again
01935:          */
01936:         if (!page && !drained) {
01937:                 drain_all_pages();
01938:                 drained = true;
01939:                 goto retry;
01940:         }
01941:
01942:         return page;
01943: } /* end __alloc_pages_direct_reclaim() */

6 GFP Flags That Affect Page Allocation Behavior

Throughout the page allocation code analyzed above, the gfp_mask parameter appears everywhere; it embodies a concept that runs through the whole virtual memory subsystem. The GFP (Get Free Page) flags determine the behavior of the memory allocator and kswapd when allocating and reclaiming pages — that is, they specify which zones to allocate from, what kind of memory to request, what it will be used for, and so on. For example, an interrupt handler cannot go to sleep, so it must not set the __GFP_WAIT flag, because that flag indicates the caller may sleep.

All GFP flags are defined in the file include/linux/gfp.h.

00040: #define __GFP_WAIT        ((__force gfp_t)0x10u)  /* Can wait and reschedule? */
00041: #define __GFP_HIGH        ((__force gfp_t)0x20u)  /* Should access emergency pools? */
00042: #define __GFP_IO          ((__force gfp_t)0x40u)  /* Can start physical IO? */
00043: #define __GFP_FS          ((__force gfp_t)0x80u)  /* Can call down to low-level FS? */
00044: #define __GFP_COLD        ((__force gfp_t)0x100u) /* Cache-cold page required */
00045: #define __GFP_NOWARN      ((__force gfp_t)0x200u)
00046: #define __GFP_REPEAT      ((__force gfp_t)0x400u)  /* See above */
00047: #define __GFP_NOFAIL      ((__force gfp_t)0x800u)  /* See above */
00048: #define __GFP_NORETRY     ((__force gfp_t)0x1000u) /* See above */
00049: #define __GFP_COMP        ((__force gfp_t)0x4000u) /* Add compound page metadata */
00050: #define __GFP_ZERO        ((__force gfp_t)0x8000u) /* Return zeroed page on success */
00051: #define __GFP_NOMEMALLOC  ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
00052: #define __GFP_HARDWALL    ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
00053: #define __GFP_THISNODE    ((__force gfp_t)0x40000u) /* No fallback, no policies */
00054: #define __GFP_RECLAIMABLE ((__force gfp_t)0x80000u) /* Page is reclaimable */
00062: #define __GFP_NO_KSWAPD   ((__force gfp_t)0x400000u)
00063: #define __GFP_OTHER_NODE  ((__force gfp_t)0x800000u)

00021: #define __GFP_DMA         ((__force gfp_t)0x01u)
00022: #define __GFP_HIGHMEM     ((__force gfp_t)0x02u)
00023: #define __GFP_DMA32       ((__force gfp_t)0x04u)
00024: #define __GFP_MOVABLE     ((__force gfp_t)0x08u)  /* Page is movable */
00025: #define GFP_ZONEMASK      (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32 | __GFP_MOVABLE)

00069: #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
00070:
00071: #define __GFP_BITS_SHIFT 23  /* Room for 23 GFP_FOO bits */
00072: #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

The meanings of the main GFP flags are as follows:

Table 2: Basic GFP flag meanings

Flag                  Meaning

__GFP_WAIT            The request may be interrupted: other processes can be
                      scheduled to run, the request can be preempted by more
                      important events, and the process may block.
__GFP_HIGH            The request is very important, i.e. the kernel urgently
                      needs memory. Usually used when an allocation failure would
                      seriously affect or even crash the kernel; failure is not
                      acceptable and sleeping is not allowed. (This flag has
                      nothing to do with HIGHMEM.)
__GFP_IO              Disk I/O may be performed while looking for free memory.
__GFP_FS              File system I/O operations are allowed. Allocations made
                      from within the VFS layer clear this flag, because
                      re-entering the VFS could cause endless recursion.
__GFP_COLD            Request cache-cold pages.
__GFP_NOWARN          Do not emit a warning when allocation fails.
__GFP_REPEAT          Retry the allocation a few times after failure, then stop.
__GFP_NOFAIL          Retry again and again until the allocation succeeds.
__GFP_NORETRY         Do not retry after a failure.
__GFP_NO_GROW         Used internally by the slab allocator.
__GFP_COMP            Used for compound (large) pages.
__GFP_ZERO            The allocated pages are zero-filled.
__GFP_NOMEMALLOC      Do not use the pages reserved for emergencies.
__GFP_HARDWALL        Only meaningful on NUMA; restricts allocation to the nodes
                      of the CPUs the process is bound to. Has no effect if the
                      process may run on all CPUs (the default).
__GFP_THISNODE        Only meaningful on NUMA; memory may come only from the
                      current (or specified) node, with no fallback.
__GFP_RECLAIMABLE     The allocated memory is reclaimable.
__GFP_MOVABLE         The allocated memory is movable.

In addition to the basic GFP flags, the kernel also defines some combined flags.

00073:
00074: /* This equals 0, but use constants in case they ever change */
00075: #define GFP_NOWAIT      (GFP_ATOMIC & ~__GFP_HIGH)
00076: /* GFP_ATOMIC means both !wait (__GFP_WAIT not set) and use emergency pool */
00077: #define GFP_ATOMIC      (__GFP_HIGH)
00078: #define GFP_NOIO        (__GFP_WAIT)
00079: #define GFP_NOFS        (__GFP_WAIT | __GFP_IO)
00080: #define GFP_KERNEL      (__GFP_WAIT | __GFP_IO | __GFP_FS)
00081: #define GFP_TEMPORARY   (__GFP_WAIT | __GFP_IO | __GFP_FS | \
00082:                          __GFP_RECLAIMABLE)
00083: #define GFP_USER        (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
00084: #define GFP_HIGHUSER    (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
00085:                          __GFP_HIGHMEM)
00086: #define GFP_HIGHUSER_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
00087:                          __GFP_HARDWALL | __GFP_HIGHMEM | \
00088:                          __GFP_MOVABLE)
00089: #define GFP_IOFS        (__GFP_IO | __GFP_FS)
00090: #define GFP_TRANSHUGE   (GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
00091:                          __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
00092:                          __GFP_NO_KSWAPD)
00093:
00094: #ifdef CONFIG_NUMA
00095: #define GFP_THISNODE    (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
00096: #else
00097: #define GFP_THISNODE    ((__force gfp_t)0)
00098: #endif
00099:

Table 3: Combined GFP flag meanings

Flag              Meaning

GFP_ATOMIC        Atomic allocation: may not sleep under any circumstances, and
                  may dip into the "emergency reserve" memory.
GFP_NOIO          I/O operations are not allowed, but the caller may block.
GFP_NOFS          VFS operations are not allowed, but the caller may block.
GFP_KERNEL        For kernel-space allocations; the most common flag in kernel
                  code.
GFP_USER          For user-space allocations.
GFP_HIGHUSER      An extension of GFP_USER that also allows the high-memory
                  zone.
GFP_IOFS          For VFS I/O operations.
GFP_THISNODE      __GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY
GFP_DMA           For DMA operations.
GFP_DMA32         For DMA32 operations (on x86_64).