Linux Physical Memory Page Allocation

http://www.ilinuxkernel.com

1 Overview
In user-mode C programs we are all familiar with the memory-allocation functions malloc() and calloc(): on success they return the starting address of the requested memory. Obviously these functions cannot be used in kernel mode, which has dedicated functions of its own. How does the Linux kernel allocate and reclaim memory, and how does it manage free memory? This article uses the linux 2.6.32-220.el6 kernel source to introduce how physical memory pages are allocated in the Linux kernel.
Let's first look at the kernel APIs for allocating and reclaiming memory pages. The common APIs are shown in the following table:
Function                           Description
alloc_pages(gfp_mask, order)       Allocates 2^order pages and returns the struct page of the first page
__get_free_page(gfp_mask)          Allocates one page and returns the page's logical address
__get_free_pages(gfp_mask, order)  Allocates 2^order pages and returns the logical address of the first page
get_zeroed_page(gfp_mask)          Allocates one page, zeroes its contents, and returns the logical address

There are also functions such as kmalloc() and kmem_cache_alloc(); those are based on the slab mechanism, which will be introduced together with kmalloc() and related functions. Here we only cover page allocation.
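As a quick illustration (a minimal sketch, not from the original text; the helper names and the order-2 size are made up for the example), these APIs might be used in a kernel module like this:

#include <linux/kernel.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* Hypothetical helper: grab 2^2 = 4 physically contiguous pages. */
static unsigned long demo_alloc_4pages(void)
{
    /* __get_free_pages() returns the logical address of the first page,
     * or 0 on failure; allocations can always fail, so check. */
    unsigned long addr = __get_free_pages(GFP_KERNEL, 2);

    if (!addr)
        pr_err("demo: order-2 allocation failed\n");
    return addr;
}

static void demo_free_4pages(unsigned long addr)
{
    if (addr)
        free_pages(addr, 2);  /* the order must match the allocation */
}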
2 Nodes, zones, and pages

The kernel describes physical memory at three levels: nodes (Node), zones (Zone), and pages (Page). At the top level the physical memory space is described as nodes; each node contains several zones, and each zone in turn contains many pages. The relationship among the three is shown in the figure below.
3 Free page management

Allocating and reclaiming pages inevitably raises the question of how free pages are managed. When shopping in a supermarket you may notice how the cashier manages (sorts and stores) money: the 5-jiao coins are all kept together, the 1-yuan notes are all kept together, ... and the 100-yuan notes are all kept together. The Linux kernel manages free memory the same way. In the Linux kernel the basic unit of free memory management is the page (the page defined by the x86/x86-64 CPU); that is, physical memory is managed in units of pages (kmalloc() and the slab/slub mechanism manage segments smaller than a page). Just like the supermarket cashier, each free memory block the kernel manages is 2 to some power of pages, and that power is the order: single free pages are kept together, blocks of 2 physically contiguous free pages are kept together, blocks of 4 physically contiguous free pages are kept together, ... up to blocks of 2^(MAX_ORDER-1) physically contiguous pages. In the 2.6.32-220.el6 kernel MAX_ORDER is usually defined as 11, so the largest contiguous free physical block the kernel manages is 2^10 pages:

00026: #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
00027: #endif
00028: #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
00287: struct zone {
00288:     /* Fields commonly accessed by the page allocator */
00289:
       .......

The zone structure contains the array free_area[MAX_ORDER] of free-block lists: the first element of the array points to the free list of blocks of size 2^0, i.e. single free pages; the second element points to the free list of blocks of 2^1 pages; and the last element points to the free list of large blocks of 2^(MAX_ORDER-1) pages, MAX_ORDER currently being defined as 11.

00057: struct free_area {
00058:     struct list_head free_list[MIGRATE_TYPES];
00059:     unsigned long nr_free;
00060: };

Each element on a free list (a block of physically contiguous pages of the same size) is linked into the doubly linked list through the lru field of its first struct page.
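To make the structure concrete, here is a small sketch (illustrative only; it assumes kernel context with zone->lock held) that walks free_area[] much the way /proc/buddyinfo does:

#include <linux/kernel.h>
#include <linux/mmzone.h>

/* Print the number of free blocks on each order's list of one zone,
 * roughly one line of /proc/buddyinfo. Call with zone->lock held. */
static void demo_dump_free_areas(struct zone *zone)
{
    unsigned int order;

    for (order = 0; order < MAX_ORDER; order++)
        printk(KERN_INFO "order %2u: %lu free blocks of %lu pages\n",
               order, zone->free_area[order].nr_free, 1UL << order);
}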
We know that the Linux kernel describes physical memory at three levels: nodes, zones, and pages. Free pages, however, are managed only at the zone (Zone) level, each zone managing its own free physical pages. The relationship between free-page management, nodes, and zones is shown in the figure below.

Figure 7: The relationship between free-page management, nodes, and zones
4 The buddy algorithm

The buddy system (Buddy System) is, in theory, a very simple memory-allocation algorithm. Its main goals are to produce as little external fragmentation as possible while still allowing fast allocation and reclamation of physical pages. To reduce external fragmentation, contiguous free pages are organized, according to the size of the free block (itself made up of contiguous free pages), into different linked lists (orders). The free physical page management introduced in the previous section is part of the buddy system: one list per block size, and so on. Note that blocks of different sizes never overlap in space. The figure below illustrates free-page allocation.
When a request asks for 4 contiguous pages, the allocator first checks whether there is a free block of 4 pages (order = 2, since 4 = 2^2) that can satisfy the request immediately. If that list (each node of which is a block of 4 pages) has a free block, it is handed to the caller; otherwise the search moves up to the next level (order). If a free block exists there (an 8-page block, on the next level's list), that block is split into two 4-page blocks: one is given to the requester and the other is added to the 4-page list. This avoids splitting large free blocks while smaller blocks can satisfy the demand, thereby reducing external fragmentation.
The description above may seem abstract, so let's illustrate the buddy algorithm with a concrete example. Assume the system has only 32 pages of RAM, with physical page usage as shown in the figure below.
f = free, u = used

At this point the free memory pages are organized as shown below; for example, the order = 0 list (blocks of 2^0 = 1 page) has 5 nodes.
(figure: freearea[] free lists before the allocation; orders 0-5 hold blocks of 1, 2, 4, 8, 16, and 32 pages)
Now the upper layer requests a block of 4 physically contiguous free pages:
(1) 4 = 2^2, so the search starts on the order = 2 free-block list;
(2) the order = 2 list has no free block, so the search moves one order up;
(3) the order = 3 list has a free node, but each block on that list is 8 pages; the block is split, 4 pages are allocated to the upper layer and marked used;
(4) the remaining 4 pages are placed on the order = 2 list.
Note: after the allocation completes, other data structures must be updated as well; this will be covered in the detailed code analysis. Here we focus on the allocation flow.
After the 4-page block has been handed to the upper layer, the free pages are organized as shown below.
f f f f u u f u f u f u u u u u
u u u u f f f f f u f u f f u u
(figure: freearea[] free lists after the 4-page allocation; orders 0-5 hold blocks of 1, 2, 4, 8, 16, and 32 pages)
Question: are there requests for a non-power-of-two number of pages, for example 6 pages at once? In the Linux kernel such a request does not exist: the API functions guarantee that every request is for 2^order pages. If the upper layer really wants 6 pages, it can either request 8 pages at once, or request 4 pages (order = 2) plus 2 pages (order = 1) separately (though there is then no guarantee that the 6 pages are physically contiguous).
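In practice callers convert a byte count to the smallest sufficient order with get_order(); a small sketch continuing the 6-page example (the helper name is invented):

#include <linux/gfp.h>
#include <asm/page.h>   /* PAGE_SIZE, get_order() */

/* get_order() rounds a byte count up to the smallest sufficient order:
 * 6 pages do not fit a power of two, so the request becomes order 3,
 * i.e. 8 pages. */
static unsigned long demo_alloc_6pages_worth(void)
{
    return __get_free_pages(GFP_KERNEL, get_order(6 * PAGE_SIZE));
}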
Releasing pages works in the reverse direction. Suppose one of the used pages below is now freed; the page usage before the release is:

f f f f u u f u f u f u u u u u
u u u u u u f f f u f u f f u u

(1) The released page is marked free;
(2) the adjacent (buddy) physical pages are checked; if they are free, the kernel tries to merge them into a larger contiguous free block. In this example the pages before and after the released page are free, so they can be merged into a block of 4 contiguous free pages (order = 2, 2^2 = 4). After the page is reclaimed, the free pages left after merging are organized as follows:
f f f f u u f u f u f u u u u u
u u u u u u f f f u f f f f u u
(figure: freearea[] free lists after the release and merge; orders 0-5 hold blocks of 1, 2, 4, 8, 16, and 32 pages)
The state of the buddy system can also be observed with echo m > /proc/sysrq-trigger, or through the contents of /proc/buddyinfo. Example output:

[ 134.154722] Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
[ 134.154747] Node 0 DMA32: 10*4kB 7*8kB 5*16kB 8*32kB 8*64kB 7*128kB 6*256kB 7*512kB 3*1024kB 3*2048kB 731*4096kB = 3010352kB
[ 134.154770] Node 0 Normal: 202*4kB 227*8kB 51*16kB 120*32kB 90*64kB 44*128kB 19*256kB 8*512kB 5*1024kB 3*2048kB 18*4096kB = 112624kB
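Each column is the count of free blocks of one order times the block size (with 4K pages: order 0 = 4kB up to order 10 = 4096kB). As a quick consistency check on the DMA line: 1*4 + 1*32 + 2*64 + 1*128 + 1*256 + 1*1024 + 1*2048 + 3*4096 = 15908kB, which matches the printed total.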
5 Page allocation

Linux provides a series of APIs for allocating physical pages, all of which are based on the function alloc_pages(). These API functions use the gfp_mask parameter, which determines the allocator's behavior. The memory-allocation GFP (Get Free Page) flags, which determine how the memory allocator and kswapd behave when allocating and reclaiming pages, are introduced in detail later.
00304: #ifdef CONFIG_NUMA
00305: extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
00306:
00307: static inline struct page *
00308: alloc_pages(gfp_t gfp_mask, unsigned int order)
00309: {
00310:     return alloc_pages_current(gfp_mask, order);
00311: }
00312: extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
00313:     struct vm_area_struct *vma, unsigned long addr,
00314:     int node);
00315: #else
00316: #define alloc_pages(gfp_mask, order) \
00317:     alloc_pages_node(numa_node_id(), gfp_mask, order)
00318: #define alloc_pages_vma(gfp_mask, order, vma, addr, node) \
00319:     alloc_pages(gfp_mask, order)
00320: #endif
00321: #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
00322: #define alloc_page_vma(gfp_mask, vma, addr) \
00323:     alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id())
00324: #define alloc_page_vma_node(gfp_mask, vma, addr, node) \
00325:     alloc_pages_vma(gfp_mask, 0, vma, addr, node)
00326:
00327: extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
00328: extern unsigned long get_zeroed_page(gfp_t gfp_mask);
On non-NUMA systems alloc_pages() simply becomes alloc_pages_node() on the current node:

00316: #define alloc_pages(gfp_mask, order) \
00317:     alloc_pages_node(numa_node_id(), gfp_mask, order)

and __alloc_pages() is a thin wrapper that passes a NULL nodemask:

00279: static inline struct page *
00280: __alloc_pages(gfp_t gfp_mask, unsigned int order,
00281:     struct zonelist *zonelist)
00282: {
00283:     return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
00284: }

In either case, page allocation ultimately goes through __alloc_pages_nodemask().
When a process needs several physically contiguous pages, it can obtain them through alloc_pages(). Because the largest block is 2^(MAX_ORDER-1) pages, an order greater than or equal to MAX_ORDER is obviously out of range. MAX_ORDER is defined as 11, so each request to the kernel can return at most 2^(MAX_ORDER-1), i.e. 2^10 pages. On a system with 4K pages that means at most 4MB of contiguous physical memory per allocation. One cannot help asking: what if the kernel needs more than 4MB? In that case the only option is to issue several 4MB requests (the resulting blocks are not guaranteed to be contiguous with one another). Under normal circumstances a driver had better not request more than 256KB of contiguous memory, because as the system runs, page usage becomes fragmented and it grows hard to find more than 256KB of physically contiguous free space.
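A hedged sketch of a driver following that advice (the 128KB size and the function names are invented for illustration): allocate with alloc_pages(), map the struct page to a kernel address, and free with the same order:

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* 128KB = 32 pages of 4K = order 5, comfortably below the suggested
 * 256KB ceiling for long-running systems. */
#define DEMO_ORDER 5

static void *demo_buf;

static int demo_setup(void)
{
    struct page *page = alloc_pages(GFP_KERNEL, DEMO_ORDER);

    if (!page)
        return -ENOMEM;
    demo_buf = page_address(page);  /* logical address of the first page */
    return 0;
}

static void demo_teardown(void)
{
    if (demo_buf)
        __free_pages(virt_to_page(demo_buf), DEMO_ORDER);
}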
On NUMA systems alloc_pages() calls alloc_pages_current():

00307: static inline struct page *
00308: alloc_pages(gfp_t gfp_mask, unsigned int order)
00309: {
00310:     return alloc_pages_current(gfp_mask, order);
00311: }
Memory policy refers to the degree of control an application has over its own memory allocation. Policies can be set through the two system calls mbind and set_mempolicy, either for the whole process or for a range of its address space, and a process inherits its memory policy from its parent. The system supports MPOL_DEFAULT, MPOL_PREFERRED, MPOL_INTERLEAVE, and MPOL_BIND:
Policy           Meaning
MPOL_DEFAULT     The default policy: memory is allocated from the current node; when the current node has no free memory, it is allocated from the nearest node that has free memory.
MPOL_PREFERRED   Memory is allocated from the specified node; if that node has no free memory, any other node will do.
MPOL_INTERLEAVE  Allocations are spread over all nodes. This policy is usually used for shared memory regions: it ensures that no node is overloaded and that roughly the same amount of memory is used on each node.
MPOL_BIND        Memory is allocated from a specific set of nodes; when those nodes cannot provide the required memory, the allocation fails.
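From user space these policies are set through the system calls just mentioned; a minimal sketch using the libnuma wrapper from <numaif.h> (choosing node 0 is an arbitrary example):

#include <numaif.h>   /* set_mempolicy(); link with -lnuma */
#include <stdio.h>

int main(void)
{
    /* Prefer node 0 for this process's future allocations. */
    unsigned long nodemask = 1UL << 0;

    if (set_mempolicy(MPOL_PREFERRED, &nodemask,
                      8 * sizeof(nodemask)) != 0) {
        perror("set_mempolicy");
        return 1;
    }
    return 0;
}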
cpuset is a module introduced in the 2.6 kernel series that lets users partition a multi-CPU system into regions, each containing CPUs and segments of physical memory. A process can be confined to a particular region, and it will then not use computing resources outside that region. General applications such as web servers, or high-performance servers on NUMA architectures, can use cpuset to improve performance. Whether cpuset support is enabled can be checked in the kernel config file. Note:
- cpuset takes a higher priority than both the memory policy (node-related) and CPU binding (CPU-related).
5.2.2 alloc_pages_current()
01863: {
01864:     struct mempolicy *pol = current->mempolicy;
01865:     struct page *page;
01866:
01867:     if (!pol || in_interrupt() || (gfp & __GFP_THISNODE))
01868:         pol = &default_policy;
01869:
01870:     get_mems_allowed();
01871:     /*
01872:      * No reference counting needed for current->mempolicy
01873:      * nor system default_policy
01874:      */
01875:     if (pol->mode == MPOL_INTERLEAVE)
01876:         page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
01877:     else
01878:         page = __alloc_pages_nodemask(gfp, order,
01879:                 policy_zonelist(gfp, pol, numa_node_id()),
01880:                 policy_nodemask(gfp, pol));
01881:     put_mems_allowed();
01882:     return page;
01883: } /* end alloc_pages_current */
01884: EXPORT_SYMBOL(alloc_pages_current);
The four NUMA memory-allocation policies were introduced above. When the allocation flag __GFP_THISNODE is set (stating that the allocation must come from the current node), or the request is made from interrupt context, or the current process's memory policy is empty (line 1867), the system default allocation policy is used. The default memory-allocation policy is MPOL_PREFERRED:
00115: struct mempolicy default_policy = {
00116:     .refcnt = ATOMIC_INIT(1), /* never free it */
00117:     .mode = MPOL_PREFERRED,
00118:     .flags = MPOL_F_LOCAL,
00119: };
get_mems_allowed() and put_mems_allowed() respectively increase and decrease the current process's mems_allowed_change_disable count; while an allocation is in progress this prevents upper layers from changing the set of memory nodes the process may allocate from.
00097: ...
00106:     smp_mb();
00107: }
Here we analyze the NUMA case under the default memory-allocation policy, i.e. the path that executes line 1878. Its two extra parameters, the zonelist and the nodemask, are derived from the policy via policy_zonelist() and policy_nodemask().
5.3 __alloc_pages_nodemask()
In a NUMA architecture, memory can be moved between nodes to improve its locality to the process using it. When several processes have been running on a set of nodes and one of them terminates, leaving memory usage unbalanced, part of the memory has to be moved from one node to another to restore a balanced distribution and reduce NUMA latency. Pages are therefore classified by migrate type:

00038: #define MIGRATE_UNMOVABLE    0
00039: #define MIGRATE_RECLAIMABLE  1
00040: #define MIGRATE_MOVABLE      2
00041: #define MIGRATE_PCPTYPES     3 /* the number of types on the pcp lists */
00042: #define MIGRATE_RESERVE      3
00043: #define MIGRATE_ISOLATE      4 /* can't allocate from here */
00044: #define MIGRATE_TYPES        5
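For reference, the helper that performs this classification at allocation time looks roughly like the following in kernels of this era (shown as a sketch; compare allocflags_to_migratetype() in the real source):

/* Map the mobility bits of gfp_flags to a migrate type:
 * no bit            -> MIGRATE_UNMOVABLE   (0)
 * __GFP_RECLAIMABLE -> MIGRATE_RECLAIMABLE (1)
 * __GFP_MOVABLE     -> MIGRATE_MOVABLE     (2)  */
static inline int allocflags_to_migratetype(gfp_t gfp_flags)
{
    if (unlikely(page_group_by_mobility_disabled))
        return MIGRATE_UNMOVABLE;

    /* Group based on mobility */
    return (((gfp_flags & __GFP_MOVABLE) != 0) << 1) |
           ((gfp_flags & __GFP_RECLAIMABLE) != 0);
}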
lockdep is a debugging module of the Linux kernel that checks the kernel's mutual-exclusion mechanisms, especially for potential deadlocks involving spinlocks. Because a spinlock (spin lock) waits by polling and does not release the processor, it is more prone to deadlock than the general mutual-exclusion mechanisms. Typical problems lockdep catches include:
- a lock that has been acquired both with interrupts (or bottom halves) enabled and from interrupt (or bottom-half) context, so that an interrupt may try to take the lock while it is already held;
- locking whose dependency graph forms a closed loop, the classic deadlock pattern.
5.3.2 __alloc_pages_nodemask()
__alloc_pages_nodemask() is the core function of the Linux kernel's buddy allocator; its source is in the file mm/page_alloc.c.
02190: /*
02191:  * This is the 'heart' of the zoned buddy allocator.
02192:  */
02193: struct page *
02194: __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
02195:     struct zonelist *zonelist, nodemask_t *nodemask)
02196: {
02197:     enum zone_type high_zoneidx = gfp_zone(gfp_mask);
02198:     struct zone *preferred_zone; struct page *page;
02199:
02200:     int migratetype = allocflags_to_migratetype(gfp_mask);
02201:
02202:     gfp_mask &= gfp_allowed_mask;
02203:
02204:     lockdep_trace_alloc(gfp_mask);
02205:
02206:     might_sleep_if(gfp_mask & __GFP_WAIT);
02207:
Line 2200: the page-allocation flags in gfp_mask are converted to the corresponding memory migrate type (migratetype).
Line 2206: might_sleep_if() records that the allocation may sleep when gfp_mask carries the __GFP_WAIT flag.
02208:     if (should_fail_alloc_page(gfp_mask, order))
02209:         return NULL;
02210:
02211:     /*
02212:      * Check the zones suitable for the gfp_mask contain at least one
02213:      * valid zone. It's possible to have an empty zonelist as a result
02214:      * of GFP_THISNODE and a memoryless node
02215:      */
02216:     if (unlikely(!zonelist->_zonerefs->zone))
02217:         return NULL;
02218:
02219:     get_mems_allowed();
02220:     /* The preferred zone is used for statistics later */
02221:     first_zones_zonelist(zonelist, high_zoneidx, nodemask, &preferred_zone);
02222:     if (!preferred_zone) {
02223:         put_mems_allowed();
02224:         return NULL;
02225:     }
02226:
Lines 2208~2209: before actually attempting the allocation, should_fail_alloc_page() decides, from gfp_mask and order, whether this allocation should be forced to fail (a fault-injection hook for debugging).
Lines 2216~2217: check that the zonelist suitable for gfp_mask contains at least one valid zone; if there is no valid zone, return NULL.
Line 2221: in the zonelist, and subject to nodemask, find the first suitable zone whose index is not greater than high_zoneidx, storing it in preferred_zone.
Assuming a zone satisfying the request has been found, we continue the analysis.
The code so far performs only basic checks. Once they pass, the kernel tries get_page_from_freelist() (line 2228) to allocate 2^order physically contiguous pages from the zonelist, i.e. it first tries to allocate straight from the free-page lists. Obviously, as the system runs there are fewer and fewer free pages (for instance the page cache, used as the block-device I/O cache, occupies memory), so get_page_from_freelist() may fail to allocate. In that case __alloc_pages_slowpath() must be called; its work includes reclaiming physical memory pages before allocating.
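Before diving in, it helps to keep the overall call chain in mind (a rough map of the NUMA path, not an exhaustive one):

alloc_pages(gfp_mask, order)
  -> alloc_pages_current()          (apply the process's memory policy)
     -> __alloc_pages_nodemask()    (heart of the zoned buddy allocator)
        -> get_page_from_freelist() (fast path: take pages off the free lists)
        -> __alloc_pages_slowpath() (slow path: wake kswapd, compact, reclaim, maybe OOM)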
5.4 get_page_from_freelist()

5.4.1 Zone watermarks

When free memory in the system runs low, the kswapd daemon is woken to release pages. If the free ratio is very low, memory is reclaimed synchronously, on what is sometimes called the direct-reclaim path. Watermarks describe the memory pressure of a zone. A zone's watermark levels are like the water levels of a reservoir: during a drought the reservoir raises a low-level alarm, and different low levels can be mapped to different alarm grades.
00159: enum zone_watermarks {
00160:     WMARK_MIN,
00161:     WMARK_LOW,
00162:     WMARK_HIGH,
00163:     NR_WMARK
00164: };

00166: #define min_wmark_pages(z)  (z->watermark[WMARK_MIN])
00167: #define low_wmark_pages(z)  (z->watermark[WMARK_LOW])
00168: #define high_wmark_pages(z) (z->watermark[WMARK_HIGH])
00169:
04972: void setup_per_zone_wmarks(void)
04973: {
04974:     unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
04975:     unsigned long pages_low = extra_free_kbytes >> (PAGE_SHIFT - 10);
pages_min: when the number of free pages falls to pages_min, the allocator wakes kswapd and reclaims memory synchronously.
pages_high: after kswapd has been woken, once the number of free pages reaches pages_high the zone is no longer considered in need of balancing; kswapd then goes back to sleep. pages_high defaults to three times pages_min.
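As a worked example of line 4974, assume 4K pages (PAGE_SHIFT = 12) and min_free_kbytes = 16384 (the 16384 figure is only an assumed setting for the arithmetic): pages_min = 16384 >> (12 - 10) = 4096 pages, i.e. a 16MB minimum free reserve.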
5.4.2 Hot-N-Cold pages

When the data of a physical memory page is in the CPU cache, the CPU can read it directly and quickly; such a page is called a hot page. Otherwise, a page whose data is not in the CPU cache is called a cold page. In multi-CPU systems each CPU has its own cache, so hot and cold pages are managed per CPU. Note that NR_CPUS used below is not the number of CPUs in the current system, but the maximum number of CPUs the kernel build supports.
00287: struct zone {
...
00312: #ifdef CONFIG_NUMA
...
00319:     struct per_cpu_pageset *pageset[NR_CPUS];
00320: #else
00321:     struct per_cpu_pageset pageset[NR_CPUS];
00322: #endif
...

00179: struct per_cpu_pageset {
00180:     struct per_cpu_pages pcp;
00181: #ifdef CONFIG_NUMA
00182:     s8 expire;
00183: #endif
00184: #ifdef CONFIG_SMP
00185:     s8 stat_threshold;
00186:     s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS];
00187: #endif
00188: } ____cacheline_aligned_in_smp;

00170: struct per_cpu_pages {
00171:     int count;  /* number of pages in the list */
00172:     int high;   /* high watermark, emptying needed */
00173:     int batch;  /* chunk size for buddy add/remove */
00174:
00175:     /* Lists of pages, one per migrate type stored on the pcp-lists */
00176:     struct list_head lists[MIGRATE_PCPTYPES];
00177: };
count is the number of pages on the lists; high is a watermark: if count > high, there are too many pages on the lists and some of them must be drained; batch is the chunk size in which the buddy algorithm adds or removes pages; lists are the page list heads, one per migrate type. The figure below shows the hot/cold page statistics and data structures in a two-CPU system. The cold/hot page situation of the current system can be observed with echo m > /proc/sysrq-trigger.
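Callers influence which end of the per-CPU list they draw from through the __GFP_COLD flag; a small sketch (the DMA-read scenario and the helper name are assumed for illustration):

#include <linux/gfp.h>

/* For data the CPU will not touch soon (e.g. a buffer about to be
 * filled by DMA), request a cache-cold page; a plain GFP_KERNEL
 * request prefers a cache-hot one from the per-CPU list. */
static struct page *demo_alloc_cold_page(void)
{
    return alloc_pages(GFP_KERNEL | __GFP_COLD, 0);
}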
5.4.3 get_page_from_freelist()

01643: /*
01644:  * get_page_from_freelist goes through the zonelist trying to allocate
01645:  * a page.
01646:  */
01647: static struct page *
01648: get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, ...)
...

The function scans the zones of the zonelist in turn; the excerpt below picks up at the watermark check.
01698:             if (!zone_watermark_ok(zone, order, mark,
01699:                     classzone_idx, alloc_flags))
01700:                 goto this_zone_full;
01701:         }
01702:     } /* end if !(alloc_flags & ALLOC_N... */
01703:
Before actually looking for free pages, some basic checks are made. If the kernel is built with NUMA, the zlc_zone_worth_trying() function quickly checks whether a zone is worth searching for free memory at all; if it is not, the scan moves on to the next zone and repeats the same check (lines 1669~1670).
Next comes the zone watermark check, to see whether the current zone satisfies the watermark requirements (lines 1676~1702):
(1) if the zone's watermark is satisfactory, try to allocate pages from this zone directly (lines 1681~1683);
(2) if the zone's watermark is not satisfactory and the zone's zone_reclaim_mode value is 0 (lines 1685~1686), skip the zone;
(3) if neither of the above holds, zone_reclaim() must be called to try to reclaim memory within this zone (lines 1687~1701). If the return value is ZONE_RECLAIM_NOSCAN, the zone was not scanned, so skip it and try the next zone (line 1690); if the return value is ZONE_RECLAIM_FULL, nothing could be reclaimed, so the zone is marked full so that others do not waste time scanning it next time (line 1693); if some memory was successfully reclaimed, the watermark is checked again.
Continuing with the code: the watermark check above has established that the current zone should have enough free pages to satisfy the request.
01704: try_this_zone:
01705:         page = buffered_rmqueue(preferred_zone, zone, order,
01706:                 gfp_mask, migratetype);
01707:         if (page)
01708:             break;
01709: this_zone_full:
01710:         if (NUMA_BUILD)
01711:             zlc_mark_zone_full(zonelist, z);
01712: try_next_zone:
01713:         if (NUMA_BUILD && !did_zlc_setup && nr_online_nodes > 1) {
01714:             /*
01715:              * we do zlc_setup after the first zone is tried but only
01716:              * if there are multiple nodes make it worthwhile
01717:              */
01718:             allowednodes = zlc_setup(zonelist, alloc_flags);
01719:             zlc_active = 1;
01720:             did_zlc_setup = 1;
01721:         }
01722:     }
01723:
01724:     if (unlikely(NUMA_BUILD && page == NULL && zlc_active)) {
01725:         /* Disable zlc cache for second zonelist scan */
01726:         zlc_active = 0;
01727:         goto zonelist_scan;
01728:     }
01729:     return page;
01730: } /* end get_page_from_freelist */
1. buffered_rmqueue()
01292: static inline
01293: struct page *buffered_rmqueue(struct zone *preferred_zone,
01294:     struct zone *zone, int order, gfp_t gfp_flags,
01295:     int migratetype)
01296: {
01297:     unsigned long flags;
01298:     struct page *page;
01299:     int cold = !!(gfp_flags & __GFP_COLD);
01300:     int cpu;
01301:
01302: again:
01303:     cpu = get_cpu();
01304:     if (likely(order == 0)) {
01305:         struct per_cpu_pages *pcp;
First it determines whether only a single page is requested (line 1304). Each CPU maintains its own per-migrate-type page lists (recall the migrate types):

00038: #define MIGRATE_UNMOVABLE    0
00039: #define MIGRATE_RECLAIMABLE  1
00040: #define MIGRATE_MOVABLE      2

The list for the specified migratetype is checked for a free page. If there is one (lines 1308~1311), a page is taken and the count on the migrate list is updated; whether it is taken from the hot or the cold end depends on whether a cold page was requested. If more than one page is requested, the code from line 1326 onward is executed. In practice, multi-page requests through this path are comparatively rare.
01326:     } else {
01327:         if (unlikely(gfp_flags & __GFP_NOFAIL)) {
               ...
01338:             WARN_ON_ONCE(order > 1);
01339:         }
01340:         spin_lock_irqsave(&zone->lock, flags);
01341:         page = __rmqueue(zone, order, migratetype);
01342:         spin_unlock(&zone->lock);
01343:         if (!page)
01344:             goto failed;
01345:         __mod_zone_page_state(zone, NR_FREE_PAGES, -(1 << order));
01346:     } /* end else */
Next, __rmqueue() allocates 2^order physically contiguous pages from the zone. Before the allocation is performed the zone must be locked, so that no other CPU operates on this zone at the same time (line 1340); when the allocation is done, zone->lock is released. Note that even after all this work we still have not reached the code that actually takes pages off the free lists.
01348:     __count_zone_vm_events(PGALLOC, zone, 1 << order);
01349:     zone_statistics(preferred_zone, zone, gfp_flags);
01350:     local_irq_restore(flags);
01351:     put_cpu();
01352:
01353:     VM_BUG_ON(bad_range(zone, page));
01354:     if (prep_new_page(page, order, gfp_flags))
01355:         goto again;
01356:     return page;
01357:
01358: failed:
01359:     local_irq_restore(flags);
01360:     put_cpu();
01361:     return NULL;
01362: } /* end buffered_rmqueue */
Lines 1348~1349 update some statistics about the zone. After the request has succeeded, the struct page of the first page of the block is returned directly.
5.4.4 __rmqueue()

__rmqueue() is where pages are finally taken off the free lists; its source is also in the file mm/page_alloc.c.
In a NUMA system the kernel always tries first to allocate memory on the node where the process is running, which gives the best memory performance. This allocation strategy cannot always succeed, however. For the cases where memory cannot be allocated from that node, every node provides a fallback list naming the other nodes and zones that can serve as alternative sources of memory.
If no page block of the requested order and migratetype can be found on the zone's free lists, __rmqueue_fallback() must be called:
00972: /*
00973:  * Do the hard work of removing an element from the buddy allocator.
00974:  * Call me with the zone->lock already held.
00975:  */
00976: static struct page *__rmqueue(struct zone *zone, unsigned int order,
00977:     int migratetype)
00978: {
00979:     struct page *page;
00980:
00981: retry_reserve:
00982:     page = __rmqueue_smallest(zone, order, migratetype);
00983:
00984:     if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
00985:         page = __rmqueue_fallback(zone, order, migratetype);
00986:
00987:         /*
00988:          * Use MIGRATE_RESERVE rather than fail an allocation. goto
00989:          * is used because __rmqueue_smallest is an inline function
00990:          * and we want just one call site
00991:          */
00992:         if (!page) {
00993:             migratetype = MIGRATE_RESERVE;
00994:             goto retry_reserve;
00995:         }
00996:     }
00997:
00998:     trace_mm_page_alloc_zone_locked(page, order, migratetype);
00999:     return page;
01000: } /* end __rmqueue */
1. __rmqueue_smallest()

The allocation procedure of __rmqueue_smallest() is:
(1) start searching the free-block lists at the requested order;
(2) if the current order's list has a free page block, remove it and jump to step (4);
(3) if the current order's list has no free page block, set order = order + 1 and look one level up, jumping back to step (2); if order > MAX_ORDER - 1, jump to step (5);
(4) if the order of the allocated block is greater than the requested one, the surplus pages are put back onto lower-order lists (via expand(), described below) and the block is returned;
(5) no block was found: return NULL.
00779: /*
00780:  * Go through the free lists for the given migratetype and remove
00781:  * the smallest available page from the freelists
00782:  */
00783: static inline
00784: struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
00785:     int migratetype)
00786: {
00787:     unsigned int current_order;
00788:     struct free_area *area;
00789:     struct page *page;
00790:
00791:     /* Find a page of the appropriate size in the preferred list */
00792:     for (current_order = order; current_order < MAX_ORDER; ++current_order) {
00793:         area = &(zone->free_area[current_order]);
00794:         if (list_empty(&area->free_list[migratetype]))
00795:             continue;
00796:
00797:         page = list_entry(area->free_list[migratetype].next,
00798:                 struct page, lru);
00799:         list_del(&page->lru);
00800:         rmv_page_order(page);
00801:         area->nr_free--;
00802:         expand(zone, page, order, current_order, area, migratetype);
00803:         return page;
00804:     }
00805:
00806:     return NULL;
00807: } /* end __rmqueue_smallest */
2. __rmqueue_fallback()

The page-block allocation procedure of __rmqueue_fallback() is similar to that of __rmqueue_smallest(); the difference is that it searches the fallback list rather than only this node's zones. It is not described in detail here.
00902: /* Remove an element from the buddy allocator from the fallback list */
00903: static inline struct page *
00904: __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
00905: {
00906:     struct free_area *area;
00907:     int current_order;
...
00958:                     start_migratetype);
00959:
00960:             expand(zone, page, order, current_order, area, migratetype);
00961:
00962:             trace_mm_page_alloc_extfrag(page, order, current_order,
00963:                     start_migratetype, migratetype);
00964:
00965:             return page;
00966:         } /* end for i=0; i<MIGRATE_TYPES-1... */
00967:     } /* end for current_order=MAX_ORD... */
00968:
00969:     return NULL;
00970: } /* end __rmqueue_fallback */
3. expand()

The main job of expand() is, after part of a page block on a free list has been allocated, to put the unused part back onto the free lists. Among its parameters, low is the order of the block size actually requested, and high corresponds to the order curr_order of the free queue from which the satisfying block was obtained. When the two are equal, the while loop starting at line 782 is skipped entirely. If the allocated block is larger than the required size (it cannot be smaller), the surplus is chained into a lower block (smaller order), i.e. into the free queue whose block size is half as large; then the block is cut in half, the second half becomes the new working block, and the next iteration begins, processing the next lower free list in turn. Eventually high equals low, i.e. the remaining block exactly matches the request, and the loop ends.
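A simplified rendering of that loop (parameter names follow the kernel source; checks such as VM_BUG_ON are omitted, and set_page_order() is a helper internal to mm/page_alloc.c):

/* Peel halves off a 2^high block until only 2^low pages remain for the
 * caller; each peeled half goes onto the free list one order below. */
static void demo_expand(struct zone *zone, struct page *page,
                        int low, int high, struct free_area *area,
                        int migratetype)
{
    unsigned long size = 1UL << high;

    while (high > low) {
        area--;                /* free_area of the next lower order */
        high--;
        size >>= 1;            /* second half of the current block */
        list_add(&page[size].lru, &area->free_list[migratetype]);
        area->nr_free++;
        set_page_order(&page[size], high);  /* record the buddy order */
    }
}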
5.5 __alloc_pages_slowpath()

Obviously, as the system runs there are fewer and fewer free pages (for example the page cache, used as the block-device I/O cache, occupies memory), so get_page_from_freelist() may well fail to allocate. At that point __alloc_pages_slowpath() must be called; its source is still in the file mm/page_alloc.c. As the name suggests, allocating memory pages through this path is comparatively slow.
02024: static inline struct page *
02025: __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
02026:     struct zonelist *zonelist, enum zone_type high_zoneidx,
02027:     nodemask_t *nodemask, struct zone *preferred_zone,
02028:     int migratetype)
02029: {
02030:     const gfp_t wait = gfp_mask & __GFP_WAIT;
02031:     struct page *page = NULL;
02032:     int alloc_flags;
02033:     unsigned long pages_reclaimed = 0;
02034:     unsigned long did_some_progress;
02035:     struct task_struct *p = current;
02036:     bool sync_migration = false;
02037:
02038:     /*
02039:      * In the slowpath, we sanity check order to avoid ever trying to
02040:      * reclaim >= MAX_ORDER areas which will never succeed. Callers may
02041:      * be using allocators in order of preference for an area that is
02042:      * too large.
02043:      */
02044:     if (order >= MAX_ORDER) {
02045:         WARN_ON_ONCE(!(gfp_mask & __GFP_NOWARN));
02046:         return NULL;
02047:     }
02048:
02057:     if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
02058:         goto nopage;
02059:
02060: restart:
02061:     if (!(gfp_mask & __GFP_NO_KSWAPD))
02062:         wake_all_kswapd(order, zonelist, high_zoneidx);
02063:
02064:     /*
02065:      * OK, we're below the kswapd watermark and have kicked background
02066:      * reclaim. Now things get more complex, so set up alloc_flags according
02067:      * to how we want to proceed.
02068:      */
02069:     alloc_flags = gfp_to_alloc_flags(gfp_mask);
02070:
02071:     /* This is the last chance, in general, before the goto nopage. */
02072:     page = get_page_from_freelist(gfp_mask, nodemask,
02073:             order, zonelist, high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
02074:             preferred_zone, migratetype);
02075:     if (page)
02076:         goto got_pg;
02077:
There are still some necessary checks before trying to allocate (lines 2044~2058). Unless gfp_mask specifies that the kswapd kernel threads must not be called upon to reclaim memory (__GFP_NO_KSWAPD), kswapd is woken to reclaim memory in the background. At this point get_page_from_freelist() is tried once more for a quick allocation. If memory still cannot be allocated:
02078: rebalance:
02079:     /* Allocate without watermarks if the context allows */
02080:     if (alloc_flags & ALLOC_NO_WATERMARKS) {
02081:         page = __alloc_pages_high_priority(gfp_mask, order,
02082:                 zonelist, high_zoneidx, nodemask,
02083:                 preferred_zone, migratetype);
02084:         if (page)
02085:             goto got_pg;
02086:     }
02087:
02088:     /* Atomic allocations - we can't balance anything */
02089:     if (!wait)
02090:         goto nopage;
02091:
02092:     /* Avoid recursion of direct reclaim */
02093:     if (p->flags & PF_MEMALLOC)
02094:         goto nopage;
02095:
02096:     /* Avoid allocations with no watermarks from looping endlessly */
02097:     if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
02098:         goto nopage;
02099:
If the context permits ignoring the watermarks (alloc_flags contains ALLOC_NO_WATERMARKS), __alloc_pages_high_priority() is tried first, disregarding the zone watermarks. If all the attempts so far have failed, memory is genuinely hard to come by, rather like fundraising getting harder and harder.
02100:     /*
02101:      * Try direct compaction. The first pass is asynchronous. Subsequent
02102:      * attempts after direct reclaim are synchronous
02103:      */
02104:     page = __alloc_pages_direct_compact(gfp_mask, order,
02105:             zonelist, high_zoneidx,
02106:             nodemask,
02107:             alloc_flags, preferred_zone,
02108:             migratetype, &did_some_progress,
02109:             sync_migration);
02110:     if (page)
02111:         goto got_pg;
02112:     sync_migration = true;
02113:
If that was still not enough, __alloc_pages_direct_compact() is called to compact memory (lines 2104~2109). The first pass is asynchronous; its main job is to merge small page blocks on the free lists into larger blocks (for instance, two page blocks of order 2 merged into one block of order 3) and then to retry the block allocation.
If merging small page blocks fails, the only option left is to try harder and reclaim memory synchronously: __alloc_pages_direct_reclaim() reclaims rarely used pages and page-cache pages to free up more physical memory.
If, after the reclaim-and-reallocate sequence above, the kernel still fails to allocate, i.e. did_some_progress is 0, the kernel then has to consider whether an OOM (Out of Memory) condition has occurred. It checks whether oom_killer_disabled is set; if killing processes is not allowed, it jumps straight to nopage. Otherwise __alloc_pages_may_oom() kills the process(es) that hold large amounts of memory (possibly more than one) and jumps back to restart to begin the allocation again.
02123:     /*
02124:      * If we failed to make any progress reclaiming, then we are
02125:      * running out of options and have to consider going OOM
02126:      */
02127:     if (!did_some_progress) {
02128:         if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
02129:             if (oom_killer_disabled)
02130:                 goto nopage;
02131:             page = __alloc_pages_may_oom(gfp_mask, order,
02132:                     zonelist, high_zoneidx,
02133:                     nodemask, preferred_zone,
02134:                     migratetype);
02135:             if (page)
02136:                 goto got_pg;
02137:
02144:             if (order > PAGE_ALLOC_COSTLY_ORDER &&
02145:                     !(gfp_mask & __GFP_NOFAIL))
02146:                 goto nopage;
02147:
02148:             goto restart;
02149:         } /* end if (gfp_mask & __GFP_FS) && ... */
02150:     } /* end if !did_some_progress */
02151:
After that it is decided once more whether the allocation should be retried; if so, the kernel may first wait for pending write operations to complete before jumping back to rebalance.
The page-block allocation function ends in one of two ways. In the first case the allocation has failed and no page block was obtained, so some information about the allocation failure is printed together with the current stack (lines 2161~2181). In the other case the page block was allocated successfully, and the struct page of the first page of the block is returned directly (lines 2183~2186).
02160: nopage:
02161:     if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
02162:         unsigned int filter = SHOW_MEM_FILTER_NODES;
02163:
02164:         /*
02165:          * This documents exceptions given to allocations in certain
02166:          * contexts that are allowed to allocate outside current's set
02167:          * of allowed nodes.
02168:          */
02169:         if (!(gfp_mask & __GFP_NOMEMALLOC))
02170:             if (test_thread_flag(TIF_MEMDIE) ||
02171:                     (current->flags & (PF_MEMALLOC | PF_EXITING)))
02172:                 filter &= ~SHOW_MEM_FILTER_NODES;
02173:         if (in_interrupt() || !wait)
02174:             filter &= ~SHOW_MEM_FILTER_NODES;
02175:
02176:         pr_warning("%s: page allocation failure. order:%d, mode:0x%x\n",
02177:                 p->comm, order, gfp_mask);
02178:         dump_stack();
02179:         if (!should_suppress_show_mem())
02180:             show_mem(filter);
02181:     }
02182:     return page;
02183: got_pg:
02184:     if (kmemcheck_enabled)
02185:         kmemcheck_pagealloc_alloc(page, order, gfp_mask);
02186:     return page;
02187:
02188: } /* end __alloc_pages_slowpath */
02189:
5.5.1 __alloc_pages_direct_compact()

The main job of the __alloc_pages_direct_compact() function is to merge small page blocks into large page blocks and then allocate pages. Its two main steps are try_to_compact_pages(), which merges small page blocks (lines 1846~1847), and then get_page_from_freelist(), which allocates the page block (lines 1855~1858). The implementation of these functions is not analyzed in detail here.
01830: #ifdef CONFIG_COMPACTION
01831: /* Try memory compaction for high-order allocations before reclaim */
01832: static struct page *
01833: __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
01834:     struct zonelist *zonelist, enum zone_type high_zoneidx,
01835:     nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
01836:     int migratetype, unsigned long *did_some_progress,
01837:     bool sync_migration)
01838: {
01839:     struct page *page;
01840:     struct task_struct *p = current;
01841:
01842:     if (!order || compaction_deferred(preferred_zone))
01843:         return NULL;
01844:
01845:     p->flags |= PF_MEMALLOC;
01846:     *did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
01847:             nodemask, sync_migration);
01848:     p->flags &= ~PF_MEMALLOC;
01849:     if (*did_some_progress != COMPACT_SKIPPED) {
01850:
01851:         /* Page migration frees to the PCP lists but we want merging */
01852:         drain_pages(get_cpu());
01853:         put_cpu();
01854:
01855:         page = get_page_from_freelist(gfp_mask, nodemask,
01856:                 order, zonelist, high_zoneidx,
01857:                 alloc_flags, preferred_zone,
01858:                 migratetype);
01859:         if (page) {
01860:             preferred_zone->compact_considered = 0;
01861:             preferred_zone->compact_defer_shift = 0;
01862:             count_vm_event(COMPACTSUCCESS);
01863:             return page;
01864:         }
01865:
01866:         /*
01867:          * It's bad if compaction run occurs and fails.
01868:          * The most likely reason is that pages exist,
01869:          * but not enough to satisfy watermarks.
01870:          */
01871:         count_vm_event(COMPACTFAIL);
01872:         defer_compaction(preferred_zone);
01873:
01874:         cond_resched();
01875:     } /* end if *did_some_progress != C... */
01876:
01877:     return NULL;
01878: } /* end __alloc_pages_direct_compact */
5.5.2 __alloc_pages_direct_reclaim()

__alloc_pages_direct_reclaim() calls try_to_free_pages() to reclaim pages, freeing up some memory (line 1912). The kernel then calls get_page_from_freelist() again to try the allocation (lines 1927~1930).
01917:
01918:     cond_resched();
01919:
01920:     if (order != 0)
01921:         drain_all_pages();
01922:
01923:     if (unlikely(!(*did_some_progress)))
01924:         return NULL;
01925:
01926: retry:
01927:     page = get_page_from_freelist(gfp_mask, nodemask, order,
01928:             zonelist, high_zoneidx,
01929:             alloc_flags, preferred_zone,
01930:             migratetype);
01931:
01932:     /*
01933:      * If an allocation failed after direct reclaim, it could be because
01934:      * pages are pinned on the per-cpu lists. Drain them and try again
01935:      */
01936:     if (!page && !drained) {
01937:         drain_all_pages();
01938:         drained = true;
01939:         goto retry;
01940:     }
01941:
01942:     return page;
01943: } /* end __alloc_pages_direct_reclaim */
6 GFP flags

Throughout the preceding analysis of the page-allocation code the gfp_mask parameter kept appearing; it runs through the whole of the kernel's memory management. The GFP (Get Free Page) flags determine how the memory allocator and kswapd behave when allocating and reclaiming pages: they specify which zones to allocate from, what kind of memory to request, what it will be used for, and so on. For example, an interrupt handler cannot go to sleep, so it must not set the __GFP_WAIT flag, which would allow the allocation to block.
00045: #define __GFP_NOWARN   ((__force gfp_t)0x200u)

00021: #define __GFP_DMA      ((__force gfp_t)0x01u)
00022: #define __GFP_HIGHMEM  ((__force gfp_t)0x02u)
00023: #define __GFP_DMA32    ((__force gfp_t)0x04u)
00024: #define __GFP_MOVABLE  ((__force gfp_t)0x08u) /* Page is movable */
00025: #define GFP_ZONEMASK   (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32 | __GFP_MOVABLE)

00062: #define __GFP_NO_KSWAPD   ((__force gfp_t)0x400000u)
00063: #define __GFP_OTHER_NODE  ((__force gfp_t)0x800000u)
00064:
00069: #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
00070:
00071: #define __GFP_BITS_SHIFT 23 /* Room for 23 GFP_FOO bits */
00072: #define __GFP_BITS_MASK  ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
Table 2: Basic GFP flag meanings

__GFP_WAIT      The memory request may be interrupted, i.e. the scheduler may run other processes during the request, or the request may be preempted by more important events; the process may also block.
__GFP_HIGH      The request is very important, i.e. the kernel urgently needs the memory. This flag is typically used when an allocation failure would seriously affect, or even crash, the kernel.
__GFP_FS        Allows the allocator to perform file-system I/O operations. Allocations issued from the VFS layer must avoid setting this flag, since it could lead to endless recursive calls.
__GFP_REPEAT    After a failed allocation, retry a few more times, then stop if it still fails.
__GFP_NOFAIL    On failure, retry again and again until the allocation succeeds.
__GFP_HARDWALL  Only meaningful on NUMA: limits the allocation to the nodes of the CPUs the process is bound to. If the process may run on all CPUs this flag has no effect.
__GFP_THISNODE  Only meaningful on NUMA: memory may only come from the current node or from a specified node.
In addition to the basic GFP flags, the kernel also defines some combination flags:
00073:
00074: /* This equals 0, but use constants in case they ever change */
00075: #define GFP_NOWAIT    (GFP_ATOMIC & ~__GFP_HIGH)
00076: /* GFP_ATOMIC means both !wait (__GFP_WAIT not set) and use emergency pool */
00077: #define GFP_ATOMIC    (__GFP_HIGH)
00078: #define GFP_NOIO      (__GFP_WAIT)
00079: #define GFP_NOFS      (__GFP_WAIT | __GFP_IO)
00080: #define GFP_KERNEL    (__GFP_WAIT | __GFP_IO | __GFP_FS)
00081: #define GFP_TEMPORARY (__GFP_WAIT | __GFP_IO | __GFP_FS | \
00082:         __GFP_RECLAIMABLE)
00083: #define GFP_USER      (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
00084: #define GFP_HIGHUSER  (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
00085:         __GFP_HIGHMEM)
00086: #define GFP_HIGHUSER_MOVABLE (__GFP_WAIT | __GFP_IO | __GFP_FS | \
00087:         __GFP_HARDWALL | __GFP_HIGHMEM | \
00088:         __GFP_MOVABLE)
00089: #define GFP_IOFS      (__GFP_IO | __GFP_FS)
00090: #define GFP_TRANSHUGE (GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
00091:         __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
00092:         __GFP_NO_KSWAPD)
00093:
00094: #ifdef CONFIG_NUMA
00095: #define GFP_THISNODE (__GFP_THISNODE | __GFP_NOWARN | __GFP_NORETRY)
00096: #else
00097: #define GFP_THISNODE ((__force gfp_t)0)
00098: #endif
00099:
Table 3: Combination GFP flag meanings

GFP_ATOMIC    For atomic memory allocations, i.e. allocations that must not sleep under any circumstances; the "emergency reserve" memory pool may be used.
GFP_KERNEL    For kernel-space memory requests; this is the most commonly used flag in kernel code.
GFP_HIGHUSER  An extension of GFP_USER indicating that the high-memory zone may be used.
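Putting the two most common flags side by side (a hedged sketch; the surrounding helper functions are invented for illustration):

#include <linux/gfp.h>
#include <linux/mm.h>

/* Process context: sleeping is allowed, so GFP_KERNEL may wait,
 * perform I/O, and enter the filesystem to reclaim memory. */
static unsigned long demo_process_ctx_alloc(void)
{
    return __get_free_page(GFP_KERNEL);
}

/* Interrupt context: must never sleep, so use GFP_ATOMIC, which may
 * dip into the emergency reserves instead of waiting. */
static unsigned long demo_irq_ctx_alloc(void)
{
    return __get_free_page(GFP_ATOMIC);
}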