Unit-4
Memory Management
Shyam B. Verma (Assistant
Professor)
Objectives
To provide a detailed description of various
ways of organizing memory hardware
To discuss various memory-management
techniques, including paging and
segmentation
To provide a detailed description of the Intel
Pentium, which supports both pure
segmentation and segmentation with paging
Background
Memory is an essential part of the computer, used to store data and instructions. Its management is critical because the amount of main memory available in a computer system is limited, and at any time many processes compete for it. Moreover, to increase performance, several processes are kept in memory simultaneously, so it is even more important to manage memory effectively.
Main memory and registers are the only storage the CPU can access directly. A register can be accessed in one CPU clock cycle (or less), but a main-memory access can take many cycles, causing a stall.
A cache sits between main memory and the CPU registers.
Memory protection is required to ensure correct operation.
Role of Memory management
The memory manager keeps track of the status of memory locations, i.e., whether each location is free or allocated.
Memory manager permits computers with a small
amount of main memory to execute programs larger
than the size or amount of available memory. It does
this by moving information back and forth between
primary memory and secondary memory by using
the concept of swapping.
The memory manager is responsible for protecting
the memory allocated to each process from being
corrupted by another process. If this is not ensured,
then the system may exhibit unpredictable behavior.
The memory manager should also enable processes to share memory space.
Basic Bare Machine
It is the bare hardware of the computer system, which can execute programs on the processor without using an O/S.
In the early days, before operating systems were developed, instructions were executed directly on the hardware without any intervening software. The drawback was that the Bare Machine accepted programs and instructions only in machine language.
Because of this, only trained people who could understand and instruct the computer in machine language were able to operate such computers. For this reason, the Bare Machine came to be regarded as inefficient and cumbersome after the development of operating systems.
Resident Monitor
The Resident Monitor is code that runs on the Bare Machine. It acts like an operating system, controlling everything inside the processor and performing all its functions.
The Resident Monitor is thus also known as the Job Sequencer because, like an O/S, it sequences jobs and sends them to the processor for execution.
After the jobs are scheduled, the Resident Monitor loads the programs one by one into main memory according to their sequence. The advantage of a Resident Monitor over an Operating System is that there is no gap or lag between program executions, so processing is faster.
Parts of Resident Monitor
Control Language Interpreter
The job of the Control Language Interpreter is to read the control-language instructions line by line and carry them out.
Loader
The Loader is the main part of the Resident Monitor. As
the name suggests, it Loads all the required system and
application programs into the main memory.
Device Driver
The Device Driver takes care of all the input-output devices connected to the system. All communication between the user and the system is handled by the Device Drivers. A driver acts as an intermediary between the requests that the user makes to the system and the responses that the system produces to fulfill those requests.
Contiguous Memory Management
Schemes
Each program occupies a single contiguous block
of storage locations, i.e., a set of memory locations
with consecutive addresses.
Single contiguous memory management
schemes:
It is the simplest memory management scheme
used in the earliest systems. Main memory is
divided into two contiguous areas or partitions.
The O/S resides permanently in the lower portion, and a single user process is loaded into the other partition.
Advantages:
Simple to implement; easy to manage and design.
Once a process is loaded, it is given the full processor time, and no other process will interrupt it.
Disadvantages of Single Contiguous Memory Management Schemes:
Wastage of memory space due to unused
memory as the process is unlikely to use all the
available memory space.
The CPU remains idle, waiting for the disk to load
the binary image into the main memory.
A program cannot be executed if it is too large to fit in the entire available main-memory space.
It does not support multiprogramming, i.e., it
cannot handle multiple programs simultaneously.
Multiple Partitioning
The problem of inefficient CPU use can be overcome
using multiprogramming that allows more than one
program to run concurrently.
To switch between two processes, the operating system needs to have both processes loaded into main memory.
The operating system needs to divide the available
main memory into multiple parts to load multiple
processes into the main memory. Thus multiple
processes can reside in the main memory
simultaneously.
The multiple partitioning schemes can be of
two types:
Fixed Partitioning
Dynamic Partitioning
Fixed-Sized Partitioning
Memory is divided into several fixed-sized partitions.
Each partition may contain exactly one process.
So, the degree of multiprogramming is bounded by the
no. of partitions.
The O/S keeps a table indicating free and occupied
partitions.
When a process arrives, the system searches for a partition large enough to accommodate it.
To search for a partition we can use three techniques:
First fit
Best fit
Worst fit
Disadvantages:
Internal fragmentation: Free space available inside a partition.
How to satisfy a request of size n from a list of free holes?
First-fit: Search from the beginning,
allocate the first block that is big enough
Best-fit: Search every free block and
allocate the smallest block that is big
enough
Worst-fit: Allocate the largest block; must
also search entire list
Example: Allocate the following processes: P1=357 KB, P2=210 KB, P3=468 KB, P4=491 KB into partitions of sizes:
200 KB | 400 KB | 600 KB | 500 KB | 300 KB | 250 KB
First-fit and best-fit are better than worst-fit in terms of speed and storage utilization.
First-fit: P4 will not get space
200 KB | 400 KB (P1) | 600 KB (P2) | 500 KB (P3) | 300 KB | 250 KB
Best-fit: internal fragmentation = 43 + 40 + 32 + 109 = 224 KB
200 KB | 400 KB (P1) | 600 KB (P4) | 500 KB (P3) | 300 KB | 250 KB (P2)
Worst-fit: P3 and P4 will not get space
200 KB | 400 KB | 600 KB (P1) | 500 KB (P2) | 300 KB | 250 KB
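As a sketch of how the three placement strategies differ, here is a small C program (not from the slides; the partition sizes come from the example above, while the function and variable names are illustrative assumptions). Run as written it reproduces the best-fit placement shown above; changing the strategy argument gives first fit or worst fit.

/* Minimal sketch of first-fit / best-fit / worst-fit partition selection. */
#include <stdio.h>

#define NPART 6

static int size[NPART]    = {200, 400, 600, 500, 300, 250}; /* KB, from the example */
static int is_free[NPART] = {1, 1, 1, 1, 1, 1};

/* Return the index of the chosen free partition for a request of n KB,
 * or -1 if none fits. strategy: 0 = first fit, 1 = best fit, 2 = worst fit. */
int choose(int n, int strategy)
{
    int chosen = -1;
    for (int i = 0; i < NPART; i++) {
        if (!is_free[i] || size[i] < n)
            continue;                                      /* occupied or too small */
        if (strategy == 0)
            return i;                                      /* first fit: stop at once */
        if (chosen == -1 ||
            (strategy == 1 && size[i] < size[chosen]) ||   /* best fit: smallest adequate */
            (strategy == 2 && size[i] > size[chosen]))     /* worst fit: largest */
            chosen = i;
    }
    return chosen;
}

int main(void)
{
    int procs[4] = {357, 210, 468, 491};                   /* P1..P4 from the example */
    for (int p = 0; p < 4; p++) {
        int i = choose(procs[p], 1);                       /* best fit */
        if (i >= 0) {
            printf("P%d (%d KB) -> partition of %d KB\n", p + 1, procs[p], size[i]);
            is_free[i] = 0;
        } else {
            printf("P%d (%d KB) will not get space\n", p + 1, procs[p]);
        }
    }
    return 0;
}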
Variable-Sized Partitioning
Initially, all memory is available as one block.
When a process arrives, we search for a free block large enough and, if one is found, allocate only as much memory as is needed.
When a process terminates, it releases its
memory.
Disadvantage: External fragmentation
Example: Allocate the following processes in memory: P1=300 KB, P2=25 KB, P3=125 KB, P4=50 KB.
An instance of memory is given below:
50 KB (Not available) | 150 KB (Available) | 300 KB (Not available) | 350 KB (Available) | 600 KB (Not available)
First-fit:
50 KB | 150 KB (P2=25 KB and P3=125 KB) | 300 KB | 350 KB (P1=300 KB and P4=50 KB) | 600 KB
Best-fit: P4 will not get space
50 KB | 150 KB (P3=125 KB) | 300 KB | 350 KB (P1=300 KB and P2=25 KB) | 600 KB
Worst-fit:
50 KB | 150 KB (P2=25 KB and P3=125 KB) | 300 KB | 350 KB (P1=300 KB and P4=50 KB) | 600 KB
Fragmentation
External Fragmentation – total memory space
exists to satisfy a request, but it is not contiguous
Internal Fragmentation – allocated memory
may be slightly larger than requested memory;
this size difference is memory internal to a
partition, but not being used
Depending on the total memory and the average process size, external fragmentation may be a minor or a major problem. Statistical analysis of first fit reveals that, given N allocated blocks, another 0.5N blocks will be lost to fragmentation, i.e., one-third of memory may be unusable. This property is called the 50-percent rule.
Fragmentation (Cont.)
Reduce external fragmentation by
compaction
Shuffle memory contents to place all free
memory together in one large block
Compaction is possible only if relocation is
dynamic, and is done at execution time
Another method is to move all processes
towards one end.
Base and Limit Registers
A pair of base and limit registers define the
logical address space
CPU must check every memory access
generated in user mode to be sure it is
between base and limit for that user
Hardware Address Protection
Address Binding
A program resides on disk as a binary executable file; it must be loaded into memory and placed within a process for it to be executed. Depending on the memory-management scheme, the process may be moved between disk and memory during its execution.
Addresses are represented in different ways at different
stages of a program’s life cycle:
Source code addresses usually symbolic
The compiler will bind these addresses to relocatable
addresses.
i.e. “14 bytes from beginning of this module”
Linker or loader will bind relocatable addresses to
absolute addresses
i.e. 74014
Bindings are done at:
Compile time
Load time
Execution time
Logical vs. Physical Address
Space
Logical address – address generated by the CPU;
also referred to as virtual address
Physical address – the address that is loaded into the memory-address register (MAR), i.e., the address seen by the memory unit
Logical and physical addresses are the same in
compile-time and load-time address-binding
schemes; logical (virtual) and physical addresses
differ in execution-time address-binding scheme
Logical addresses generated by the CPU are mapped to physical addresses, so a physical address is produced before each memory access is performed.
Logical address space is the set of all logical
addresses generated by a program
Physical address space is the set of all physical
addresses generated by a program
Memory-Management Unit
(MMU)
Hardware device that maps virtual addresses to physical addresses at run time
Many methods possible
To start, consider simple scheme where the
value in the relocation register is added to
every address generated by a user process at
the time it is sent to memory
Base register now called relocation register
MS-DOS on Intel 80x86 used 4 relocation
registers
The user program deals with logical addresses;
it never sees the real physical addresses
Execution-time binding occurs when reference is
made to location in memory
Hardware Support for Relocation and Limit Registers
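A minimal C sketch of the check-then-relocate step the hardware performs with the limit and relocation registers; the register values (14000 and 3000) are illustrative assumptions, not from the slides.

#include <stdio.h>
#include <stdlib.h>

#define RELOCATION 14000u   /* base: where the process starts in physical memory */
#define LIMIT       3000u   /* size of the process's logical address space */

/* Map a logical address to a physical one, trapping on a violation. */
unsigned translate(unsigned logical)
{
    if (logical >= LIMIT) {                 /* hardware check on every user-mode access */
        fprintf(stderr, "trap: addressing error at %u\n", logical);
        exit(EXIT_FAILURE);
    }
    return logical + RELOCATION;            /* dynamic relocation */
}

int main(void)
{
    printf("logical 346 -> physical %u\n", translate(346));   /* 14346 */
    translate(4000);                        /* beyond the limit: trap */
    return 0;
}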
Dynamic Linking
Static linking – system libraries and program
code are combined by the loader into the binary
program image
Dynamic linking –linking postponed until execution
time
A small piece of code, called stub, used to locate
the appropriate memory-resident library routine
Stub replaces itself with the address of the routine,
and executes the routine
The operating system checks if the routine is in the process's memory address space
If not in address space, add to address space
Dynamic linking is particularly useful for libraries
System also known as shared libraries
Swapping
A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution. This is done by the medium-term scheduler (MTS). A process is swapped out when:
A high-priority process arrives and there is no space in the ready queue.
The time quantum of a process expires and there is no space in the ready queue.
The waiting queue is full.
Backing store – fast disk large enough to
accommodate copies of all memory images
for all users
Roll out, roll in – a lower-priority process is swapped out so a higher-priority process can be loaded and executed
System maintains a ready queue of ready-to-run
processes which have memory images on disk
Whenever the scheduler wants to execute a
process, it calls the dispatcher.
The dispatcher checks whether the process is in memory.
If it is not, and no space is available in memory, the dispatcher swaps out a process from memory, swaps in the desired process, and transfers control to the selected process.
Swap time is time taken to transfer the data
from memory to backing store or vice-versa.
For effective utilization of CPU, the execution time
must be greater than the total swap time.
Swapping (Cont.)
Does the swapped out process need to
swap back in to the same physical
addresses?
It depends on address binding methods.
If the binding is done at load-time then
the swapped out process will be kept in
the same memory space previously
allocated.
If the binding is done at execution time,
then a process can be swapped into
different memory locations.
Schematic View of Swapping
Process size = 1 MB, transfer rate = 5 MB/sec, latency time = 8 ms.
Swap time = process size / transfer rate + latency time = 200 ms + 8 ms = 208 ms
Total swap time = swap-in time + swap-out time
User’s View of a Program
A program is a collection of
segments. A segment is a
logical unit such as:
• main program
• procedure
• function
• method
• object
• local variables, global variables
• common block
• stack
• symbol table
• arrays
Logical View of Segmentation
[Figure: numbered segments (1–4) in user space mapped onto non-contiguous regions of physical memory space]
Segmentation
Segmentation divides a process into smaller subparts known as modules or segments. The segments need not be placed in contiguous memory. It supports the user's view of memory: users prefer to view memory as a collection of variable-sized segments. Since each segment is allocated exactly as much memory as it needs, internal fragmentation does not take place. The length of each segment is determined by the purpose of the segment in the user program.
The user’s view is mapped onto physical memory. A
logical address space is collection of segments and
each segment has a number and a length.
The address specifies both the segment number and
the offset within a segment.
So, a logical address consists of <Segment-
number, Offset>
Segmentation Hardware
Segmentation Architecture
Logical address consists of a two tuple:
<segment-number, offset>
Segment table – stores the base address of a
segment and the limit.
base – contains the starting physical address
where the segments reside in memory
limit – specifies the length of the segment
Segment-table base register (STBR) points
to the segment table’s location in memory
Segment-table length register (STLR)
indicates number of segments used by a
program;
segment number s is legal if s <
STLR
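A minimal sketch of the translation a segmentation unit performs, assuming an in-memory segment table; the base/limit values here are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>

struct segment { unsigned base, limit; };

static struct segment seg_table[] = {
    {1400, 1000},   /* segment 0 */
    {6300,  400},   /* segment 1 */
    {4300,  400},   /* segment 2 */
};
static unsigned stlr = 3;       /* number of segments used by the program */

unsigned translate(unsigned s, unsigned d)
{
    if (s >= stlr) {                        /* segment number must be < STLR */
        fprintf(stderr, "trap: illegal segment %u\n", s);
        exit(EXIT_FAILURE);
    }
    if (d >= seg_table[s].limit) {          /* offset must be within the segment */
        fprintf(stderr, "trap: offset %u beyond segment limit\n", d);
        exit(EXIT_FAILURE);
    }
    return seg_table[s].base + d;           /* physical address = base + offset */
}

int main(void)
{
    printf("<2, 53> -> %u\n", translate(2, 53));    /* 4353 */
    return 0;
}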
Segmentation Architecture
(Cont.)
Protection & Sharing
With each entry in segment table associate:
validation bit = 0 ⇒ illegal segment
read/write/execute privileges
Two different processes may share the same
segment. Protection bits associated with
segments; code sharing occurs at segment level.
For all processes, a shared segment will have the
same segment number and only one copy in
physical memory.
Segments local to a process are not sharable.
Since segments vary in length, memory
allocation is a dynamic storage-allocation
problem
Disadvantages of
Segmentation
Since segments are of variable length, segmentation may cause external fragmentation.
In that case a process may wait until a large enough block of memory becomes available, or until compaction creates one.
Because segmentation is dynamic by nature, we can
compact memory whenever we need it.
The problem of fragmentation depends on the
average segment size:
If the average segment size is small, external
fragmentation will also be small.
Solution?
Fixed-sized small segment: Paging
Paging
Paging is a memory-management scheme that permits the physical address space of a process to be non-contiguous.
Memory is broken into fixed-sized blocks called frames.
Size is power of 2, between 512 bytes and 16 Mbytes
Divide logical memory into blocks of same size called pages
Process is allocated physical memory whenever the frames
are available
Avoids external fragmentation
Avoids problem of varying sized memory chunks
Keep track of all free frames
To run a program of size N pages, need to find N free frames
and load program
Set up a page table to translate logical to physical
addresses
Still have Internal fragmentation
Paging Hardware
Address Translation Scheme
Address generated by CPU is divided into:
Page number (p) – used as an index into a
page table which contains base address of
each page in physical memory
Page offset (d) – combined with base address
to define the physical memory address that is
sent to the memory unit
Page size is equal to frame size.
For a logical address space of size 2^m and a page size of 2^n addressing units (bytes or words), the address divides as:
| page number p (m − n higher-order bits) | page offset d (n lower-order bits) |
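A minimal sketch of this bit-level split in C, using the m = 4, n = 2 values from the example that follows; the shift and mask mirror what the paging hardware does.

#include <stdio.h>

#define N 2                     /* page size = 2^n = 4 bytes */

int main(void)
{
    unsigned logical = 13;                   /* binary 1101 */
    unsigned p = logical >> N;               /* page number: upper m - n bits */
    unsigned d = logical & ((1u << N) - 1);  /* offset: lower n bits */
    printf("page %u, offset %u\n", p, d);    /* page 3, offset 1 */
    return 0;
}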
Example
Page size = 4 bytes, physical memory = 32 bytes, logical address space = 2^4 = 16 bytes.
The CPU generates a 4-bit logical address: 2 bits to locate a page and 2 bits to locate a word within the page.
No. of frames = 32/4 = 8 (so 3 bits are required to represent a frame number)
Paging Example
n = 2 and m = 4: 32-byte memory and 4-byte pages
Paging (Cont.)
Calculating internal fragmentation
Page size = 2,048 bytes
Process size = 72,766 bytes
35 pages + 1,086 bytes
Internal fragmentation of 2,048 - 1,086 = 962 bytes
Worst case fragmentation = 1 frame – 1 byte
On average fragmentation = 1/2 frame size
Small frame sizes are therefore desirable, but each page-table entry takes memory to track, so smaller pages increase page-table overhead
Page sizes growing over time
Solaris supports two page sizes – 8 KB and 4 MB
The process's view of memory and physical memory are now very different
By implementation, a process can access only its own memory
Free Frames
[Figure: free-frame list before and after allocation]
Implementation of Page Table
Each O/S has its own method for storing page tables. Most allocate a page table for each process, and a pointer to the page table is stored in the PCB of the process. Two registers are used:
Page-table base register (PTBR) points to the page table, which is kept in main memory. Changing page tables during a context switch requires changing only the PTBR.
Page-table length register (PTLR) indicates the size of the page table.
In this scheme every data/instruction access requires two memory
accesses. One for the page table and one for the data / instruction.
The two memory access problem can be solved by the
use of a special fast-lookup hardware cache called
associative memory or translation look-aside buffers
(TLBs). When an item is searched all keys are compared
simultaneously. Hardware is expensive but fast.
Implementation of Page Table
(Cont.)
The TLB contains entries for pages in main memory; each entry stores a <page no., frame no.> pair.
When a logical address is generated, its page no. is searched in the TLB. On a TLB hit, the corresponding frame no. is used immediately; on a miss, the page no. is looked up in the page table and an entry is also made in the TLB.
If the TLB is full, the O/S replaces an entry (depending on the replacement policy) with the new one.
Some TLBs store address-space identifiers
(ASIDs) in each TLB entry – uniquely identifies each
process to provide address-space protection for that
process
Otherwise need to flush at every context switch
TLBs typically small (64 to 1,024 entries)
Paging Hardware With TLB
Associative Memory
Associative memory – parallel search
Page # | Frame #
Address translation for (p, d):
If p is in an associative register, get the frame # out
Otherwise get the frame # from the page table in memory
Effective Access Time
Associative lookup = ε time units
Can be < 10% of memory access time
Hit ratio = α
Hit ratio – percentage of times that a page number is found in the TLB; the ratio is related to the number of associative registers
Effective Access Time (EAT):
EAT = α(MAT + ε) + (1 − α)(2·MAT + ε)
Consider α = 80%, ε = 20 ns for TLB search, and memory access time (MAT) = 100 ns:
EAT = 0.80 × (100 + 20) + 0.20 × (2 × 100 + 20) = 140 ns
Consider a more realistic hit ratio α = 98%:
EAT = 0.98 × (100 + 20) + 0.02 × (200 + 20) = 122 ns
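As a quick check of the two figures above, a small C helper (hypothetical, for verification only) evaluating the EAT formula:

#include <stdio.h>

double eat(double alpha, double mat, double eps)
{
    /* hit: one memory access; miss: page-table access + memory access */
    return alpha * (mat + eps) + (1.0 - alpha) * (2.0 * mat + eps);
}

int main(void)
{
    printf("alpha=0.80: %.0f ns\n", eat(0.80, 100, 20));  /* 140 ns */
    printf("alpha=0.98: %.0f ns\n", eat(0.98, 100, 20));  /* 122 ns */
    return 0;
}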
Memory Protection
Memory protection implemented by associating
protection bit with each frame to indicate if
read-only or read-write access is allowed
Can also add more bits to indicate page execute-
only, and so on
Valid-invalid bit attached to each entry in the
page table:
“valid” indicates that the associated page is in
the process’ logical address space, and is thus a
legal page
“invalid” indicates that the page is not in the
process’ logical address space
Or use page-table length register (PTLR)
Any violations result in a trap to the kernel
Valid (v) or Invalid (i) Bit In A Page
Table
Shared Pages
Shared code
One copy of read-only (reentrant) code
shared among processes (i.e., text editors,
compilers, window systems)
Similar to multiple threads sharing the same
process space
Also useful for interprocess communication if
sharing of read-write pages is allowed
Private code and data
Each process keeps a separate copy of the
code and data
The pages for the private code and data can
appear anywhere in the logical address space
Shared Pages Example
Structure of the Page Table
Structuring of the page table is important for
efficient use of paging.
Consider a 32-bit logical address space, as on modern computers
Page size of 4 KB (2^12)
No. of pages = 2^20
The page table would have 1 million entries (2^32 / 2^12)
With 4 MB of physical address space, no. of frames = 2^10
If each entry in the page table is 4 bytes, the page table alone needs 4 MB of memory. This page table will itself be divided into pages and stored in memory. There are different ways to store the page table in memory:
Hierarchical Paging
Hashed Page Tables
Inverted Page Tables
Hierarchical Page Tables
Break up the logical address space into
multiple page tables
A simple technique is a two-level page table
In this method the page table itself is also paged. As per the previous example, the size of the page table = 4 MB
No. of pages required to store the page table = 4 MB / 4 KB = 2^10
The 32-bit logical address generated by the CPU:
| p (20 bits) | d (12 bits) |
is divided further into:
| p1 (10 bits) | p2 (10 bits) | d (12 bits) |
Two-Level Page-Table Scheme
Two-Level Paging Example
A logical address (on 32-bit machine with 1K page size)
is divided into:
a page number consisting of 22 bits
a page offset consisting of 10 bits
Since the page table is paged, the page number is
further divided into:
a 12-bit page number
a 10-bit page offset
Thus, a logical address is as follows:
| p1 (12 bits) | p2 (10 bits) | d (10 bits) |
where p1 is an index into the outer page table, and p2 is the displacement within the page of the inner page table
Known as a forward-mapped page table
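A minimal sketch of decomposing such a 32-bit address into p1, p2, and d with shifts and masks; the sample address is an arbitrary assumption.

#include <stdio.h>

int main(void)
{
    unsigned addr = 0x00ABCDEFu;            /* arbitrary example address */
    unsigned d  = addr & 0x3FFu;            /* low 10 bits: offset within page */
    unsigned p2 = (addr >> 10) & 0x3FFu;    /* next 10 bits: inner-table index */
    unsigned p1 = addr >> 20;               /* top 12 bits: outer-table index */
    printf("p1=%u p2=%u d=%u\n", p1, p2, d);
    /* conceptually: frame = outer_table[p1][p2]; physical = (frame << 10) | d; */
    return 0;
}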
Address-Translation Scheme
64-bit Logical Address Space
Even a two-level paging scheme is not sufficient
If page size is 4 KB (2^12)
Then the page table has 2^52 entries
With a two-level scheme, the inner page tables could hold 2^10 4-byte entries
The address would look like:
| outer page (42 bits) | inner page (10 bits) | offset (12 bits) |
The outer page table has 2^42 entries, or 2^44 bytes
One solution is to add a 2nd outer page table
But in the following example the 2nd outer page table is still 2^34 bytes in size
And possibly 4 memory accesses are needed to get to one physical memory location
Three-level Paging Scheme
Hashed Page Tables
If the system uses address spaces > 32 bits
The virtual page number is hashed into a
page table
This page table contains a chain of elements
hashing to the same location
Each element contains:
the virtual page number
the value of the mapped page frame
a pointer to the next element
Virtual page numbers are compared in this
chain searching for a match
If a match is found, the corresponding physical
frame is extracted
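A minimal sketch of such a chained hashed page table in C; the bucket count, hash function, and names are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS 1024

struct entry {
    unsigned long vpn;      /* virtual page number */
    unsigned long frame;    /* value of the mapped page frame */
    struct entry *next;     /* pointer to the next element in the chain */
};

static struct entry *table[NBUCKETS];

static unsigned hash(unsigned long vpn) { return vpn % NBUCKETS; }

void insert(unsigned long vpn, unsigned long frame)
{
    struct entry *e = malloc(sizeof *e);
    e->vpn = vpn;
    e->frame = frame;
    e->next = table[hash(vpn)];     /* push onto the bucket's chain */
    table[hash(vpn)] = e;
}

/* Walk the chain comparing virtual page numbers; -1 means not mapped. */
long lookup(unsigned long vpn)
{
    for (struct entry *e = table[hash(vpn)]; e; e = e->next)
        if (e->vpn == vpn)
            return (long)e->frame;
    return -1;
}

int main(void)
{
    insert(123456, 42);
    printf("vpn 123456 -> frame %ld\n", lookup(123456));
    return 0;
}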
Hashed Page Table
Inverted Page Table
Rather than each process having its own page table tracking all of its possible logical pages, the system keeps one table with exactly one entry per physical frame.
The lookup key consists of <process-id, page no., offset>
On a search that matches at entry i, the frame number is i (the address of the ith frame)
It decreases memory needed to store each page
table, but increases time needed to search the
table when a page reference occurs
Use hash table to limit the search to one — or at
most a few — page-table entries
TLB can accelerate access
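A minimal sketch of the linear search through an inverted page table; the table contents and sizes are illustrative assumptions.

#include <stdio.h>

#define NFRAMES 8

struct ipt_entry { int pid; long page; };   /* pid -1 = free frame */

static struct ipt_entry ipt[NFRAMES] = {
    {-1, 0}, {4, 17}, {-1, 0}, {7, 3}, {-1, 0}, {-1, 0}, {4, 2}, {-1, 0},
};

/* Return frame number i whose entry matches <pid, page>, or -1 on a fault. */
int search(int pid, long page)
{
    for (int i = 0; i < NFRAMES; i++)
        if (ipt[i].pid == pid && ipt[i].page == page)
            return i;
    return -1;
}

int main(void)
{
    printf("<pid 7, page 3> -> frame %d\n", search(7, 3));   /* frame 3 */
    printf("<pid 7, page 9> -> frame %d\n", search(7, 9));   /* -1: page fault */
    return 0;
}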
Inverted Page Table Architecture
Chapter 9: Virtual Memory
Background
Demand Paging
Copy-on-Write
Page Replacement
Allocation of Frames
Thrashing
Memory-Mapped Files
Allocating Kernel Memory
Other Considerations
Operating-System Examples
Objectives
To describe the benefits of a virtual memory
system
To explain the concepts of demand paging,
page-replacement algorithms, and allocation
of page frames
To discuss the principle of the working-set
model
To examine the relationship between shared
memory and memory-mapped files
To explore how kernel memory is managed
Background
Code needs to be in memory to execute, but
entire program rarely used
Error code, unusual routines, large data structures
Entire program code not needed at same time
Consider ability to execute partially-loaded
program
Program no longer constrained by limits of
physical memory
Each program takes less memory while running ->
more programs run at the same time
Increased CPU utilization and throughput with no increase
in response time or turnaround time
Less I/O needed to load or swap programs into
memory -> each user program runs faster
Background (Cont.)
Virtual memory is a technique that allows the execution of processes that may not be completely in memory.
Virtual memory – separation of user logical
memory from physical memory
Only part of the program needs to be in memory
for execution
Logical address space can therefore be much
larger than physical address space
Allows address spaces to be shared by several
processes
Allows for more efficient process creation
More programs running concurrently
Less I/O needed to load or swap processes
Background (Cont.)
Virtual address space – logical view of
how process is stored in memory
Usually start at address 0, contiguous
addresses until end of space
Meanwhile, physical memory organized in
page frames
MMU must map logical address to physical
address
Virtual memory can be implemented via:
Demand paging
Demand segmentation
Virtual Memory That is Larger Than Physical
Memory
Demand Paging
It is similar to a paging system with swapping.
We could bring an entire process into memory at load time, or bring a page into memory only when it is needed.
Lazy swapper – never swaps a page into memory unless that page will be needed by the process.
A swapper that deals with pages is called a pager
Less I/O needed, no unnecessary I/O
When a page is needed ⇒ reference it:
invalid reference ⇒ abort
not in memory ⇒ bring to memory
Basic Concepts
When a process is to be swapped in, the pager
guesses which pages will be used before the
process is swapped out again. The pager brings
in only those pages into memory.
How to determine that set of pages?
Need new MMU functionality to implement demand
paging
Valid-Invalid Bit
With each page-table entry a valid–invalid bit is associated
(v ⇒ in memory (memory resident), i ⇒ not in memory)
Initially the valid–invalid bit is set to i on all entries
Example of a page table snapshot:
During MMU address translation, if the valid–invalid bit in the page-table entry is i ⇒ page fault
Page Table When Some Pages Are Not in Main Memory
Page Fault
If a reference to a page is the first reference to that page, it will trap to the operating system: a page fault occurs
1. The procedure for handling the page fault looks at the page reference made. If it is:
an invalid reference ⇒ abort the process
not in memory ⇒
a. Find a free frame in physical memory
b. Swap the page into the frame via a scheduled disk operation
c. Reset tables to indicate the page is now in memory; set validation bit = v
2. Restart the instruction that caused the page fault
Steps in Handling a Page Fault
Aspects of Demand Paging
In extreme case – start process with no pages in
memory
The O/S will bring the first page in memory and start
executing.
Page faults may occur until all pages needed by the process are in memory.
This scheme is called Pure demand paging
Locality of reference
To reduce the page faults, execute process only when
required pages are loaded into memory.
To find required pages needed, a locality model used.
It states that when a process executes it moves from locality
to locality.
A locality is a set of pages that are actively used together.
Hardware support needed for demand paging
Page table with valid / invalid bit
Secondary memory (swap device with swap space)
Instruction restart
Instruction Restart
It is very important to restart the instruction
after the page fault.
The page fault can occur at any stage.
If it occurs during instruction fetch, restart the instruction fetch after the page fault.
If it occurs during operand fetch, fetch and decode the instruction again and then fetch the operand again.
Performance of Demand Paging
Stages in Demand Paging (worst case)
1. Trap to the operating system
2. Save the user registers and process state
3. Determine that the interrupt was a page fault
4. Check that the page reference was legal and determine the location of
the page on the disk
5. Issue a read from the disk to a free frame:
1. Wait in a queue for this device until the read request is serviced
2. Wait for the device seek and/or latency time
3. Begin the transfer of the page to a free frame
6. While waiting, allocate the CPU to some other user
7. Receive an interrupt from the disk I/O subsystem (I/O completed)
8. Save the registers and process state for the other user
9. Determine that the interrupt was from the disk
10. Correct the page table and other tables to show the page is now in memory
11. Wait for the CPU to be allocated to this process again
12. Restore the user registers, process state, and new page table, and then resume the interrupted instruction
Performance of Demand Paging (Cont.)
Three major activities
Service the interrupt – careful coding means just
several hundred instructions needed
Read the page – lots of time
Restart the process – again just a small amount of
time
Page Fault Rate: 0 ≤ p ≤ 1
if p = 0, no page faults
if p = 1, every reference is a fault
Effective Access Time (EAT)
EAT = (1 – p) x memory access
+ p (page fault overhead
+ swap page out
+ swap page in )
Demand Paging Example
Memory access time = 200 nanoseconds
Average page-fault service time = 8 milliseconds
EAT = (1 – p) * 200 + p (8 milliseconds)
= (1 – p) * 200 + p * 8,000,000
= 200 + p * 7,999,800
If one access out of 1,000 causes a page fault, then
EAT = 8.2 microseconds.
This is a slowdown by a factor of 40!!
If want performance degradation < 10 percent
220 > 200 + 7,999,800 x p
20 > 7,999,800 x p
p < .0000025
< one page fault in every 400,000 memory accesses
What Happens if There is no Free
Frame?
Suppose all of physical memory is in use by process pages.
In this case the O/S has several options:
It could terminate the user process.
The O/S could swap out a process by freeing all its
frames. It will reduce the degree of
multiprogramming.
Replace an old page by a new page. This process is
called Page Replacement.
Page replacement – find some page in memory, but
not really in use, page it out.
Performance – want an algorithm which will result in
minimum number of page faults
Same page may be brought into memory several
times
Page Replacement
Prevent over-allocation of memory by
modifying page-fault service routine to include
page replacement
Use modify (dirty) bit to reduce overhead of
page transfers – only modified pages are
written to disk
Page replacement completes separation
between logical memory and physical memory
– large virtual memory can be provided on a
smaller physical memory
Need For Page Replacement
Basic Page Replacement
1. Find the location of the desired page on disk
2. Find a free frame:
- If there is a free frame, use it
- If there is no free frame, use a page
replacement algorithm to select a victim frame
- Write victim frame to disk if dirty
3. Bring the desired page into the (newly) free
frame; update the page and frame tables
4. Continue the process by restarting the instruction
that caused the trap
Note now potentially 2 page transfers for page fault
– increasing EAT
Page Replacement
Page and Frame Replacement
Algorithms
Frame-allocation algorithm determines
How many frames to be given to each process
Which frames to replace
Page-replacement algorithm
Want lowest page-fault rate on both first access and re-access
Evaluate algorithm by running it on a particular string
of memory references (reference string) and
computing the number of page faults on that string
String is just page numbers, not full addresses
Repeated access to the same page does not cause a page fault
Results depend on number of frames available
In all our examples, the reference string of
referenced page numbers is
7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
Graph of Page Faults Versus The Number of
Frames
First-In-First-Out (FIFO) Algorithm
Reference string:
7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
3 frames (3 pages can be in memory at a time per
process)
15 page faults
Can vary by reference string: consider
1,2,3,4,1,2,5,1,2,3,4,5
Adding more frames can cause more page faults!
Belady’s Anomaly: the page fault rate may increase as the no.
of allocated frames increases.
How to track ages of pages?
Just use a FIFO queue
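A minimal FIFO simulator in C over the reference string above; with 3 frames it reports the 15 faults quoted earlier. The circular next index realizes the FIFO queue.

#include <stdio.h>

#define NFRAMES 3

int main(void)
{
    int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof refs / sizeof refs[0];
    int frames[NFRAMES] = {-1, -1, -1};     /* -1 = empty frame */
    int next = 0, faults = 0;               /* next = index of oldest (FIFO victim) */

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < NFRAMES; f++)
            if (frames[f] == refs[i]) { hit = 1; break; }
        if (!hit) {
            frames[next] = refs[i];         /* replace the oldest page */
            next = (next + 1) % NFRAMES;
            faults++;
        }
    }
    printf("page faults: %d\n", faults);    /* prints 15 */
    return 0;
}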
FIFO Illustrating Belady's Anomaly
Optimal (OPT) Algorithm
Replace the page that will not be used for the longest period of time
This algorithm guarantees the lowest page-fault rate for a fixed number of frames.
It never suffers from Belady's anomaly.
There are 9 page faults for the example, and this is optimal
How do you know which page will not be required in the future?
Can't read the future
Used for measuring and comparing, to show how well other algorithms perform
Least Recently Used (LRU)
Algorithm
Use past knowledge rather than future
Replace page that has not been used in the most
amount of time
Associate time of last use with each page
12 faults – better than FIFO but worse than OPT
Generally good algorithm and frequently used
But how to implement?
LRU Algorithm (Cont.)
Counter implementation
Every page entry has a counter; every time page is
referenced through this entry, copy the clock into the
counter
When a page needs to be changed, look at the counters
to find smallest value
Search through table needed
Stack implementation
Keep a stack of page numbers in a double link form:
Page referenced:
move it to the top
requires 6 pointers to be changed
But each update more expensive
No search for replacement
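A minimal sketch of the counter implementation described above: every reference stamps the page with a logical clock, and a replacement scans for the smallest stamp. Names and sizes are illustrative assumptions.

#include <stdio.h>

#define NFRAMES 3

static int  page[NFRAMES]  = {-1, -1, -1};  /* -1 = empty frame */
static long stamp[NFRAMES];                 /* last-use time per frame */
static long clk = 0;                        /* logical clock */

void reference(int p)
{
    int victim = 0;
    for (int f = 0; f < NFRAMES; f++) {
        if (page[f] == p) {                 /* hit: refresh the counter */
            stamp[f] = ++clk;
            return;
        }
        if (page[f] == -1 || stamp[f] < stamp[victim])
            victim = f;                     /* least recently used so far */
    }
    page[victim] = p;                       /* fault: replace the LRU page */
    stamp[victim] = ++clk;
}

int main(void)
{
    int refs[] = {7, 0, 1, 2, 0, 3};
    for (unsigned i = 0; i < sizeof refs / sizeof refs[0]; i++)
        reference(refs[i]);
    for (int f = 0; f < NFRAMES; f++)
        printf("frame %d: page %d\n", f, page[f]);   /* 2, 0, 3 */
    return 0;
}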
LRU and OPT are cases of stack algorithms
that don’t have Belady’s Anomaly
Use Of A Stack to Record Most Recent Page References
Thrashing
If a process does not have “enough” pages,
the page-fault rate is very high
Page fault to get page
Replace existing frame
But quickly need replaced frame back
This leads to:
Low CPU utilization
Operating system thinking that it needs to increase
the degree of multiprogramming
Another process added to the system
Thrashing ≡ a process is busy swapping pages in and out
Thrashing (Cont.)
Working-Set Model
It is based on the assumption of locality.
Δ ≡ working-set window ≡ a fixed number of page references. Examine the most recent Δ page references and put those pages into the set.
If a page is in active use, it will be in the working set. If it is no longer in use, it is dropped from the working set Δ time units after its last use (reference). Thus, the working set is an approximation of the program's locality.
WSSi (working-set size of process Pi) = total number of pages referenced in the most recent Δ (varies in time)
if Δ is too small, it will not encompass the entire locality
if Δ is too large, it will encompass several localities
if Δ = ∞, it will encompass the entire program
Total demand for frames D = Σ WSSi (an approximation of the total locality)
if D > m (the number of available frames) ⇒ thrashing
Policy: if D > m, then suspend or swap out one of the processes
Keeping Track of the Working Set
Approximate with an interval timer + a reference bit
Example: Δ = 10,000 references
Timer interrupts after every 5,000 time units
Keep in memory 2 bits for each page
Whenever the timer interrupts, copy the reference bits and then set them all to 0
If one of the bits in memory = 1 ⇒ page is in the working set
Why is this not completely accurate?
Improvement: 10 bits and an interrupt every 1,000 time units
Page-Fault Frequency
More direct approach than WSS
Establish “acceptable” page-fault frequency
(PFF) rate and use local replacement policy
If actual rate too low, process loses frame
If actual rate too high, process gains frame
Working Sets and Page-Fault Rates
Direct relationship between working set of a process
and its page-fault rate
Working set changes over time
Peaks and valleys over time
End of Chapter 9