IO Performance Patterns

By Mohit Kumar
IO:File System:Interfaces
IO:File System:Cache
● A read returns data either from the cache (a cache hit) or from disk
(a cache miss).
● Data read on a cache miss is stored in the cache, populating
(warming) it.
● The file system cache may also buffer writes to be written
(flushed) later. (See the sketch below.)
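As a rough illustration of cache warming, here is a minimal sketch (not from the slides; the file path is a hypothetical command-line argument) that reads the same file twice and times both passes. On a typical Linux system the second pass is served largely from the page cache and is much faster.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheWarmDemo {
    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]);                                // any large-ish file
        System.out.println("cold read: " + readAll(file) + " ms");   // likely cache misses
        System.out.println("warm read: " + readAll(file) + " ms");   // likely page-cache hits
    }

    // Read the whole file in 64 KiB chunks and return the elapsed time in ms.
    static long readAll(Path file) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long start = System.nanoTime();
        try (InputStream in = Files.newInputStream(file)) {
            while (in.read(buf) != -1) {
                // discard the data; only the I/O time matters here
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}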
IO:File System:Second Level Cache
● A second-level cache may be any memory type.
IO:File System:Latency
● File system latency is the primary metric of file system
performance, measured as the time from a logical file
system request to its completion.
● It includes time spent in the file system and disk I/O subsystem,
and waiting on disk devices—the physical I/O.
– Application threads often block during an application request
in order to wait for file system requests to complete—in this
way, file system latency directly and proportionally affects
application performance.
IO:File System:Latency
● Cases where applications may not be directly affected include
the use of non-blocking I/O, prefetch, and I/O issued
from an asynchronous thread (e.g., a background flush
thread).
● Operating systems have not historically made file system
latency readily observable, instead providing disk device-level
statistics.
● But there are many cases where such statistics are unrelated to
application performance, and where they are also misleading.
– An example is file systems performing background
flushing of written data, which may appear as
bursts of high-latency disk I/O. From the disk device-level
statistics this looks alarming; however, no application is
waiting on these writes to complete. (Logs are a great
example of this.)
IO:File System:Sequential vs Random
● Due to the performance characteristics of certain storage
devices (disks), file systems have historically attempted
to reduce random I/O by placing file data on disk sequentially
and contiguously.
– File systems may measure logical I/O access patterns so
that they can identify sequential workloads, and then improve
their performance using prefetch or read-ahead. (A sketch
comparing the two access patterns follows.)
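To make the contrast concrete, here is an illustrative sketch (file path and sizes are hypothetical) that reads the same number of 4 KiB blocks once sequentially and once at random offsets using RandomAccessFile. On rotational disks the random pass is typically far slower, provided the file is not already cached (drop the page cache between runs for a fair comparison).

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

public class SeqVsRandomRead {
    static final int BLOCK = 4096;    // bytes per read
    static final int COUNT = 10_000;  // reads per pass

    public static void main(String[] args) throws IOException {
        String path = args[0];        // a file larger than COUNT * BLOCK bytes (~40 MiB)
        System.out.println("sequential: " + sequential(path) + " ms");
        System.out.println("random:     " + random(path) + " ms");
    }

    static long sequential(String path) throws IOException {
        byte[] buf = new byte[BLOCK];
        long start = System.nanoTime();
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            for (int i = 0; i < COUNT; i++) {
                f.readFully(buf);     // contiguous, ascending offsets
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    static long random(String path) throws IOException {
        byte[] buf = new byte[BLOCK];
        Random rnd = new Random(42);
        long start = System.nanoTime();
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long blocks = f.length() / BLOCK;
            for (int i = 0; i < COUNT; i++) {
                f.seek((long) (rnd.nextDouble() * blocks) * BLOCK); // scattered offsets
                f.readFully(buf);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}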
IO:File System:Sequential vs Random

DEMO
Disk: Just what are we measuring?
IO:File System:Sequential:Prefetch
● Prefetch, or readahead, can detect a sequential read workload based on
the current and previous file I/O offsets, and then predict and issue disk
reads before the application has requested them.
– This populates the file system cache, so that if the application does
perform the expected read, it results in a cache hit (the data needed
is already in the cache).
– Sequence:
● An application issues a file read(2), passing execution to the kernel.
● The data is not cached, so the file system issues the read to disk.
● The previous file offset pointer is compared to the current location,
and if they are sequential, the file system issues additional reads
(prefetch).
● The first read completes, and the kernel passes the data and
execution back to the application.
● Any prefetch reads complete, populating the cache for future
application reads.
● Future sequential application reads complete quickly via the cache
in RAM.
IO:File System:Sequential:Writeback
● Write-back caching is commonly used by file systems to improve
write performance. It works by treating writes as completed
after the transfer to main memory, and writing them to disk
sometime later, asynchronously.
● The file system process for writing this “dirty” data to disk is
called flushing. (A sketch of the application-visible effect follows
the sequence below.)
● Sequence:
– An application issues a file write(2), passing execution to the
kernel.
– Data from the application address space is copied into the
kernel.
– The kernel treats the write(2) syscall as completed, passing
execution back to the application.
– Sometime later, an asynchronous kernel task finds the
written data and issues disk writes.
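A minimal sketch of what this means from the application's point of view (the file name and sizes are made up): the write() calls return as soon as the data has been copied into the kernel's page cache, and durability is only guaranteed once the dirty data is explicitly flushed, here via FileChannel.force(), which issues fsync(2).

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteBackDemo {
    public static void main(String[] args) throws IOException {
        ByteBuffer block = ByteBuffer.allocate(1024 * 1024); // 1 MiB of zeros

        try (FileChannel ch = FileChannel.open(Path.of("writeback-demo.dat"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

            long t0 = System.nanoTime();
            for (int i = 0; i < 256; i++) {       // 256 MiB of writes
                block.rewind();
                ch.write(block);                  // completes after the copy into the page cache
            }
            long afterWrites = System.nanoTime();

            ch.force(false);                      // flush dirty pages to the device (fsync)
            long afterFlush = System.nanoTime();

            System.out.printf("writes: %d ms, flush: %d ms%n",
                    (afterWrites - t0) / 1_000_000,
                    (afterFlush - afterWrites) / 1_000_000);
        }
    }
}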
IO:File System:Sequential vs Random:biopattern
IO:File System:Sequential:cache-hit vs cache-miss
IO:File System:Random:cache-hit vs cache-miss
IO:File System:Sequential:Read-ahead(Prefetch)
● Analyze file system readahead in the kernel.
– By counting calls to all kernel functions containing “readahead”
using funccount(8) (from perf-tools), while generating a workload
that is expected to trigger it.
IO:File System:Sequential:Read-ahead(Prefetch)
● Ftrace can collect stack traces on events, which show why a
function was called—its parent functions.
– And which application triggered it, and from what point in
the code.
IO:File System:Sequential:Read-ahead(Prefetch)
● A count of read-ahead efficacy.
IO:File System:Non blocking IO
● Normally, file system I/O will either complete
immediately (e.g., from cache) or after waiting (e.g., for disk
device I/O).
● If waiting is required, the application thread will block and leave
the CPU, allowing other threads to execute while it waits. While the
blocked thread cannot perform other work, this typically isn’t a
problem, since multithreaded applications can create additional
threads to execute while some are blocked.
● In some cases, non-blocking I/O is desirable, such as when
avoiding the performance or resource overhead of thread
creation.
– Non-blocking I/O may be performed by passing the
O_NONBLOCK or O_NDELAY flags to the open(2)
syscall, which cause reads and writes to return an EAGAIN
error instead of blocking, telling the application to try
again later.
IO:File System:Non blocking IO
● The OS may also provide a separate asynchronous I/O
interface, such as aio_read(3) and aio_write(3).
● Linux 5.1 added a new asynchronous I/O interface called
io_uring, with improved ease of use, efficiency, and
performance.
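In Java the closest analogue is AsynchronousFileChannel. The sketch below (the input file name is hypothetical) issues a read that completes on a background thread while the calling thread is free to do other work.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncReadDemo {
    public static void main(String[] args) throws Exception {
        Path file = Path.of("some-input.dat"); // hypothetical input file
        ByteBuffer buf = ByteBuffer.allocate(4096);

        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {

            // Issue the read; the call returns immediately with a Future.
            Future<Integer> pending = ch.read(buf, 0);

            // ... the application thread can do other work here ...

            int n = pending.get(); // block only when the result is actually needed
            System.out.println("read " + n + " bytes asynchronously");
        }
    }
}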
IO:File System:VMM
IO:File System:VMM:Page Oriented IO
IO:File System:MemoryMappedFile
● For some applications and workloads, file system I/O
performance can be improved by mapping files into the process
address space and accessing memory offsets directly.
● This avoids the syscall execution and context switch overheads
incurred when calling read(2) and write(2) syscalls to access file
data.
● It can also avoid double copying of data, if the kernel supports
directly mapping the file data buffer into the process address
space.
● Memory mappings are created using the mmap(2) syscall and
removed using munmap(2). Mappings can be tuned using
madvise(2). (A Java sketch follows.)
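In Java this is exposed through FileChannel.map(), which returns a MappedByteBuffer backed by the mapping. A minimal sketch (hypothetical input file; note that a single mapping through this API is limited to 2 GB):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadDemo {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("mapped-input.dat"); // hypothetical input file

        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the whole file into the process address space (read-only).
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

            // Access file contents as plain memory; no read(2) syscall per access.
            long checksum = 0;
            while (map.hasRemaining()) {
                checksum += map.get() & 0xFF;
            }
            System.out.println("checksum = " + checksum);
        }
    }
}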
IO:File System:IO Stack
IO:File System:IO Stack:VFS “Interface”
● VFS (the virtual file system interface) provides a common
interface for different file system types.
● The terminology used by the Linux VFS interface can be a little
confusing, since it reuses the terms inodes and superblocks to
refer to VFS objects—terms that originated from Unix file
system on-disk data structures.
● The terms used for Linux on-disk data structures are usually
prefixed with their file system type, for example, ext4_inode and
ext4_super_block.
● The VFS inodes and VFS superblocks are in memory only.
IO:File System:IO Stack:File System cache
● Unix originally had only the buffer cache to improve the
performance of block device access. Nowadays, Linux has
multiple different cache types.
IO:File System:IO Stack:Buffer cache
● Buffer Cache: Unix used a buffer cache at the block device
interface to cache disk device blocks. This was a separate,
fixed-size cache and, with the later addition of the page cache,
presented tuning problems when balancing different workloads
between them, as well as the overheads of double caching and
synchronization.
– These problems have largely been addressed by using the
page cache to store the buffer cache, an approach
introduced by SunOS and called the unified buffer cache.
IO:File System:IO Stack:Page cache
Page Cache: The page cache was first introduced to SunOS
during a virtual memory rewrite in 1985 and added to SVR4 Unix.
– It cached virtual memory pages, including mapped file
system pages, improving the performance of file and
directory I/O.
– It was more efficient for file access than the buffer cache,
which required translation from file offset to disk offset for each
lookup.
– Multiple file system types could use the page cache, including
the original consumers UFS and NFS.
– The size was dynamic: the page cache would grow to use
available memory, freeing it again when applications needed
it.
– Linux has a page cache with the same attributes. The size of
the Linux page cache is also dynamic, with a tunable to set the
balance between evicting from the page cache and swapping.
IO:File System:IO Stack:Page cache
Page Cache: Prior to Linux 2.6.32, there was a pool of page
dirty flush (pdflush) threads, between two and eight as needed.
– These have since been replaced by the flusher threads
(named flush), which are created per device to better
balance the per-device workload and improve throughput.
– Pages are flushed to disk for the following reasons:
● After an interval (30 seconds)
● The sync(2), fsync(2), msync(2) system calls
● Too many dirty pages (the dirty_ratio and dirty_bytes
tunables)
● No available pages in the page cache
IO:File System:IO Stack:Dentry cache
Dentry Cache: The dentry cache (Dcache) remembers
mappings from directory entry (struct dentry) to VFS inode,
similar to the earlier Unix directory name lookup cache (DNLC).
– The Dcache improves the performance of path name
lookups (e.g., via open(2)): when a path name is traversed,
each name lookup can check the Dcache for a direct inode
mapping, instead of stepping through the directory contents.
– The Dcache entries are stored in a hash table for fast and
scalable lookup (hashed by the parent dentry and directory
entry name).
IO:File System:IO Stack:Inode cache
Inode Cache: This cache contains VFS inodes (struct inode),
each describing properties of a file system object, many of
which are returned via the stat(2) system call.
– These properties are frequently accessed for file system
workloads, such as checking permissions when opening
files, or updating timestamps during modification.
– These VFS inodes are stored in a hash table for fast and
scalable lookup (hashed by inode number and file system
superblock), although most of the lookups will be done via
the dentry cache.
IO:File System:IO Stack:BIO
JBOD: Modern disks include an on-disk queue for I/O requests.
– I/O accepted by the disk may be either waiting on the queue
or being serviced. This simple model is similar to a grocery
store checkout, where customers queue to be serviced. It is
also well suited for analysis using queueing theory.
– While this may imply a first-come, first-served queue, the on-
disk controller can apply other algorithms to optimize
performance.
– These algorithms could include elevator seeking for
rotational disks.
IO:File System:IO Stack:BIO:disk cache
JBOD: Caching Disk: The addition of an on-disk cache allows
some read requests to be satisfied from a faster memory type.
– While cache hits return with very low (good) latency, cache
misses are still frequent, returning with high disk-device
latency.
IO:File System:IO Stack:BIO:disk cache
JBOD: Caching Disk: The addition of an on-disk cache allows
some read requests to be satisfied from a faster memory type.
– The on-disk cache may also be used to improve write
performance, by using it as a write-back cache.
– This signals writes as having completed after the data
transfer to cache and before the slower transfer to persistent
disk storage.
– The counter-term is the write-through cache, which
completes writes only after the full transfer to the next level.
IO:File System:IO Stack:BIO:Time
Measuring Time: I/O time can be measured as:
– I/O request time (also called I/O response time): The
entire time from issuing an I/O to its completion
– I/O wait time: The time spent waiting on a queue
– I/O service time: The time during which the I/O was
processed (not waiting)
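For illustration (the numbers are made up): if an I/O spends 2 ms waiting on a queue and 6 ms being actively processed, its I/O wait time is 2 ms, its I/O service time is 6 ms, and its I/O request (response) time is 2 + 6 = 8 ms.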
IO:File System:IO Stack:BIO:Time
IO:File System:IO Stack:BIO:Time
From the kernel:
– Block I/O wait time (also called OS wait time) is the time
spent from when a new I/O was created and inserted into a
kernel I/O queue to when it left the final kernel queue and
was issued to the disk device. This may span multiple kernel-
level queues, including a block I/O layer queue and a disk
device queue.
– Block I/O service time is the time from issuing the request
to the device to its completion interrupt from the device.
– Block I/O request time is both block I/O wait time and block
I/O service time: the full time from creating an I/O to its
completion.
IO:File System:IO Stack:BIO:Time
From the disk:
– Disk wait time is the time spent on an on-disk queue.
– Disk service time is the time after the on-disk queue
needed for an I/O to be actively processed.
– Disk request time (also called disk response time and disk
I/O latency) is both the disk wait time and disk service time,
and is equal to the block I/O service time.
IO:StreamWrite
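A minimal sketch of an unbuffered stream write (file name and loop count are hypothetical): every write() call here goes straight to the OutputStream, so each one pays the full syscall cost.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamWrite {
    public static void main(String[] args) throws IOException {
        try (OutputStream out = new FileOutputStream("stream-write.dat")) {
            for (int i = 0; i < 1_000_000; i++) {
                out.write(i & 0xFF); // one byte per call: one write(2) syscall each time
            }
        }
    }
}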
IO:ManualBufferStreamWrite
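One way to buffer by hand, sketched with hypothetical sizes: bytes are accumulated in an application-side array and handed to the stream in large chunks, cutting the number of syscalls.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ManualBufferStreamWrite {
    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[64 * 1024]; // manual application-side buffer
        int used = 0;

        try (OutputStream out = new FileOutputStream("manual-buffer.dat")) {
            for (int i = 0; i < 1_000_000; i++) {
                buf[used++] = (byte) i;
                if (used == buf.length) {     // buffer full: one big write instead of many small ones
                    out.write(buf, 0, used);
                    used = 0;
                }
            }
            if (used > 0) {                   // flush the tail
                out.write(buf, 0, used);
            }
        }
    }
}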
IO:RandomAccessFile
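A sketch of positioned access with RandomAccessFile (the file name and fixed-size record layout are hypothetical): seek() moves the file pointer so individual records can be written and read in place.

import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessDemo {
    static final int RECORD_SIZE = 16; // hypothetical fixed-size records: two longs

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile("records.dat", "rw")) {
            // Write record #42 in place.
            f.seek(42L * RECORD_SIZE);
            f.writeLong(System.currentTimeMillis());
            f.writeLong(42L);

            // Read it back from the same offset.
            f.seek(42L * RECORD_SIZE);
            long timestamp = f.readLong();
            long id = f.readLong();
            System.out.println("record 42: id=" + id + " ts=" + timestamp);
        }
    }
}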
IO:BufferedStreamWrite
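The same buffering effect via the standard library: wrapping the stream in a BufferedOutputStream (the 64 KiB buffer size is an arbitrary choice) so small writes are coalesced before reaching the OS.

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedStreamWrite {
    public static void main(String[] args) throws IOException {
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream("buffered-stream.dat"), 64 * 1024)) {
            for (int i = 0; i < 1_000_000; i++) {
                out.write(i & 0xFF); // buffered in user space; flushed in 64 KiB chunks
            }
        } // close() flushes the remaining buffered bytes
    }
}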
IO:BufferedChannelWrite
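A sketch of the same pattern with NIO (names and sizes are illustrative): data is staged in a reusable ByteBuffer and written through a FileChannel whenever the buffer fills.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BufferedChannelWrite {
    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // reused staging buffer

        try (FileChannel ch = FileChannel.open(Path.of("buffered-channel.dat"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < 1_000_000; i++) {
                if (!buf.hasRemaining()) {   // buffer full: write it out and reuse it
                    buf.flip();
                    while (buf.hasRemaining()) {
                        ch.write(buf);
                    }
                    buf.clear();
                }
                buf.put((byte) i);
            }
            buf.flip();                      // write whatever is left
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }
}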
IO:MemoryMappedFile
IO:MemoryMappedFile:Working Around Limitations
IO:MemoryMappedFile:sharedMemory
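One way to get shared memory between processes is for each process to map the same file. The sketch below (hypothetical file name and layout: a single long counter at offset 0) can be run as a writer in one JVM and as a reader in another; real code would add explicit synchronization around the shared region.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedCounter {
    public static void main(String[] args) throws IOException, InterruptedException {
        boolean writer = args.length > 0 && args[0].equals("writer");

        try (FileChannel ch = FileChannel.open(Path.of("shared-counter.dat"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Both processes map the same 8-byte region; stores by one are visible to the other.
            MappedByteBuffer shared = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8);

            for (int i = 0; i < 10; i++) {
                if (writer) {
                    shared.putLong(0, i);                  // publish a new value
                } else {
                    System.out.println(shared.getLong(0)); // observe the writer's value
                }
                Thread.sleep(1000);
            }
        }
    }
}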
IO:NIO:Buffer:Limitations
