Unit - 4: File Systems and Its Implementation
The File System Interface
The File System Interface is a set of system calls and commands that an operating system provides to
allow users and programs to interact with the file system. It is an essential component of an
operating system, as it provides an abstraction over the physical storage medium, allowing users and
applications to work with files and directories in a consistent and user-friendly way. The File System
Interface typically provides the following functionalit
1. File Creation and Deletion: These operations allow users to create new files and delete existing
cones. When a file is created, the file system allocates space for it and creates a directory entry.
When a file is deleted, the file system releases the space and removes the directory entry.
2. File Open and Close: These operations allow users to open a file for reading or writing and close
it when done. When a file is opened, the file system creates a file descriptor or handle that the
program can use to refer to the file. When a file is closed, the file system releases the file
descriptor.
3. File Read and Write: These operations allow users to read data from a file or write data to a file.
The read operation returns a specified number of bytes from the file, starting at the current file
position. The write operation writes a specified number of bytes to the file, starting at the
current file position.
4. File Positioning: These operations allow users to change the current position within a file. The
file position is used by the read and write operations to determine where to start reading or
writing. Common file positioning operations include seeking to a specific position, moving
forward or backward by a specified number of bytes, or moving to the beginning or end of the
file.
5. File Truncation: This operation allows users to change the size of a file. It can be used to extend
or shorten a file. If a file is extended, the new space is typically filled with zeros. If a file is
shortened, the data beyond the new size is lost.

Device Drivers
Device drivers are software programs that communicate between the operating system and the
hardware devices. They provide a set of functions that the operating system can use to control the
device, such as reading, writing, initializing, and managing device-specific operations. Device drivers
serve as an abstraction layer between the hardware and the higher-level software, making it easier
to write applications without worrying about the specifics of how each device works.
Device Handling
Device handling refers to how the operating system manages devices, allocates resources to them,
and schedules their use. The operating system uses a device management subsystem to keep track
of all the devices connected to the system, their types, status, and the drivers that control them. It
handles requests from applications to access devices, translates them into device-specific
commands, and manages any contention for resources.
Disk Scheduling Algorithms
Disk scheduling algorithms are used by the operating system to decide the order in which disk I/O
requests should be served to optimize performance. Some common disk scheduling algorithms
include:
a) First-Come, First-Served (FCFS): Requests are processed in the order they arrive.
b) Shortest Seek Time First (SSTF): The request with the shortest seek time (the smallest move for
the disk head) is processed next.
c) SCAN (or Elevator): The disk arm moves in one direction and services requests until it reaches
the end, then it reverses direction.
d) C-SCAN: Similar to SCAN, but the disk arm moves in only one direction and jumps to the other
end when it reaches the end of the disk.
e) LOOK and C-LOOK: Variations of SCAN and C-SCAN that stop moving in a direction if there are no
more requests in that direction.
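To make the difference concrete, here is a small Python sketch (the request queue and starting head position are illustrative, textbook-style values) comparing total head movement under FCFS and SSTF:

```python
def fcfs(requests, head):
    """Total head movement when servicing requests in arrival order."""
    total = 0
    for r in requests:
        total += abs(r - head)
        head = r
    return total

def sstf(requests, head):
    """Total head movement when always servicing the nearest request next."""
    pending = list(requests)
    total = 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]  # illustrative cylinder numbers
print(fcfs(queue, head=53))  # 640
print(sstf(queue, head=53))  # 236
```

SCAN, C-SCAN, and LOOK could be simulated the same way by sorting the pending requests in the current direction of travel rather than by distance.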
Swap Space Management
Swap space is a space on disk that is used to temporarily hold data that doesn't fit into RAM. The
operating system swaps data between RAM and the swap space as needed. Swap space
management involves deciding which data to swap out to disk, and which data to bring back into
RAM. This is closely related to virtual memory management and page replacement algorithms. Some
common page replacement algorithms used in swap space management include:
a) Least Recently Used (LRU): The page that hasn't been used for the longest time is replaced.
b) First-In, First-Out (FIFO): The oldest page is replaced.
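The LRU policy is easy to sketch in Python. The snippet below (the reference string is illustrative) counts page faults, using an OrderedDict whose insertion order tracks recency:

```python
from collections import OrderedDict

def lru_faults(pages, frames):
    """Count page faults for an LRU-managed set of frames (a sketch)."""
    memory = OrderedDict()          # insertion order tracks recency
    faults = 0
    for p in pages:
        if p in memory:
            memory.move_to_end(p)   # p is now the most recently used page
        else:
            faults += 1
            if len(memory) == frames:
                memory.popitem(last=False)  # evict the least recently used
            memory[p] = True
    return faults

ref = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2]  # illustrative reference string
print(lru_faults(ref, frames=3))  # 9
```

FIFO would be the same loop without the `move_to_end` call, so hits never refresh a page's position.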
6. Directory Operations: These operations allow users to create and delete directories, list the
contents of a directory, and navigate the directory hierarchy. Common directory operations
include creating a new directory, removing a directory, listing the files and subdirectories in a
directory, and changing the current working directory.
7. File Attributes: These operations allow users to query and modify the attributes of a file or
directory. Common attributes include the file's name, size, creation date, modification date, and
permissions. The file system interface provides operations to read and set these attributes.
8. File Locking: These operations allow users to lock a file or a portion of a file to prevent other
processes from accessing it concurrently. File locking is essential in multi-user and multi-process
environments where multiple processes may try to access the same file simultaneously.
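As a sketch of file locking on Unix-like systems (the file path is illustrative), Python's `fcntl.flock` can take an exclusive lock for a writer and a shared lock for readers:

```python
import fcntl
import os
import tempfile

# Advisory whole-file locking with fcntl.flock (Unix-only sketch).
path = os.path.join(tempfile.mkdtemp(), "shared.dat")  # illustrative file name
with open(path, "w") as f:
    fcntl.flock(f, fcntl.LOCK_EX)      # exclusive lock: writers exclude everyone
    f.write("critical update\n")       # only one process may be here at a time
    fcntl.flock(f, fcntl.LOCK_UN)      # release so other processes can proceed

with open(path) as f:
    fcntl.flock(f, fcntl.LOCK_SH)      # shared lock: many readers may hold it
    print(f.read(), end="")
    fcntl.flock(f, fcntl.LOCK_UN)
```

These locks are advisory: they coordinate only among processes that also call `flock`, which is the usual Unix convention.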
File Concepts
File concepts refer to the fundamental principles and characteristics that define the nature of files
and the manner in which they are managed by the file system. Here are some key concepts related
to files:
1. File: A file is a collection of data stored on a storage device. A file can contain text, binary data,
multimedia content, program instructions, or any other type of data. Files are the primary units
of data storage and management in a file system.
2. File Name: Each file has a unique name within a directory that identifies it. The file name is used
to reference the file and perform operations on it. File names can include alphanumeric
characters and special characters, depending on the file system.
3. File Path: The file path specifies the location of a file within the file system's directory structure.
It consists of the sequence of directories leading to the file, separated by a path separator (e.g., /
on Unix-based systems and \ on Windows).
4. File Extension: The file extension is a suffix appended to the file name, indicating the file type or
format. It is usually separated from the file name by a dot (e.g., [Link]). File extensions
help the operating system and applications recognize the file's content and determine how to
handle it.
5. File Attributes: File attributes are metadata that provide information about the file. Common
attributes include the file's size, creation date, modification date, access date, and permissions
(e.g., read, write, execute).
6. File Content: The file content refers to the actual data stored in the file. This data can be text,
binary, multimedia, or any other type of information.
7. File Type: Files can be categorized into various types based on their content and purpose.
Common file types include regular files (which contain user data), directories (which contain
references to other files and directories), and special files (such as device files and symbolic
links).
8. File Size: The file size is the amount of storage space occupied by the file's content. It is usually
measured in bytes, kilobytes (KB), megabytes (MB), or gigabytes (GB).
9. File Permissions: File permissions define the level of access that users and processes have to a
file. Permissions can include read, write, and execute permissions. They determine who can read
the file's content, who can modify it, and who can execute it as a program.
10. File Descriptor: A file descriptor is a low-level handle that a process uses to interact with a file. It
is a numerical value returned by the file system when a file is opened, and it is used in
subsequent read, write, and close operations.
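A short Python sketch (the file path is illustrative) ties several of these concepts together: `os.open` returns a file descriptor, and subsequent reads, writes, and seeks all go through it:

```python
import os
import tempfile

# Low-level file-descriptor I/O using POSIX-style os calls (a sketch).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # illustrative file name
fd = os.open(path, os.O_CREAT | os.O_RDWR)  # open returns a small integer fd
os.write(fd, b"hello, file system")         # write advances the file position
os.lseek(fd, 7, os.SEEK_SET)                # reposition to byte offset 7
data = os.read(fd, 4)                       # read 4 bytes from that position
os.close(fd)                                # release the descriptor
print(data)  # b'file'
```

The higher-level `open()` in most languages wraps exactly this descriptor-based interface.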
Access methods
Access methods are the techniques or mechanisms through which data can be read from or written
to files. These methods determine how data is retrieved and stored, and they play a crucial role in
file system design and performance. The three main types of access methods are:
1. Sequential Access:
In sequential access, data is read or written in a linear sequence, one record or block at a time. This
method is similar to reading a book or a tape from start to end. It's simple and efficient for tasks that
involve processing data in a specific order, such as reading log files or streaming media.
When using sequential access, the system keeps track of the position within the file, and each read
or write operation advances the position to the next record or block. If you need to access a specific
record, you may have to read through all the preceding records to get to it, which can be time-
consuming.
2. Direct (Random) Access:
Direct access, also known as random access, allows data to be accessed non-sequentially. This
method enables programs to jump directly to any position within the file and read or write data. This
is especially useful for tasks that require frequent and rapid access to various locations within a file,
such as database operations.
In direct access, the file is divided into fixed-size blocks or records, and each block can be accessed
directly by specifying its block number. This method provides fast access times, but it can be more
complex to implement and manage than sequential access.
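A minimal Python sketch of direct access over fixed-size records (the record size and file name are illustrative): because every record has the same size, the byte offset of record n is just n times the record size.

```python
import os
import tempfile

RECORD_SIZE = 16  # fixed-size records make direct access a simple multiplication

path = os.path.join(tempfile.mkdtemp(), "records.bin")  # illustrative file name
with open(path, "wb") as f:
    for i in range(5):
        # pad each record to exactly RECORD_SIZE bytes
        f.write(f"record-{i}".encode().ljust(RECORD_SIZE, b"\x00"))

def read_record(f, n):
    """Jump straight to record n without reading its predecessors."""
    f.seek(n * RECORD_SIZE)           # direct access: offset = n * record size
    return f.read(RECORD_SIZE).rstrip(b"\x00")

with open(path, "rb") as f:
    print(read_record(f, 3))  # b'record-3'
```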
3. Indexed Access:
Indexed access combines aspects of both sequential and direct access methods. In indexed access,
an index is maintained that maps logical records to physical locations on the storage device. The
index allows for efficient random access to the data.
This method provides fast access to individual records, similar to direct access, but it also maintains
an ordered structure like sequential access. Indexed access is often used in databases and file
systems where performance and efficiency are crucial.
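A minimal in-memory sketch of indexed access (the record format is illustrative): the index maps each key to the byte offset of its record, so a lookup seeks directly to the right place even though the records themselves are variable-length.

```python
import io

# Indexed access sketch: an index maps record keys to byte offsets.
log = io.BytesIO()
index = {}                              # key -> offset of the record in the file
for key, payload in [("a", b"alpha"), ("b", b"bravo"), ("c", b"charlie")]:
    index[key] = log.tell()             # remember where this record starts
    # each record = 2-byte big-endian length prefix, then the payload
    log.write(len(payload).to_bytes(2, "big") + payload)

def lookup(key):
    """Use the index to seek directly to one record."""
    log.seek(index[key])
    size = int.from_bytes(log.read(2), "big")
    return log.read(size)

print(lookup("b"))  # b'bravo'
```

Real systems keep the index itself on disk (often as a B-tree) so it survives restarts.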
Directory Structure
Directory structures are essential components of file systems that help organize and manage files.
They provide a hierarchical or tree-like structure to organize files and directories (folders) within
the file system. Here are the common types of directory structures:
1. Single-Level Directory: In a single-level directory, all files are placed in the same directory, and
there are no subdirectories. This is the simplest type of directory structure, but it has significant
limitations. For example, all file names must be unique, and it becomes difficult to manage a
large number of files.
2. Two-Level Directory: In a two-level directory structure, the file system has a separate directory
for each user. Each user directory contains the files owned by that user. This structure
introduces a basic level of organization and separation, but it still lacks the flexibility to organize
files into subdirectories.
3. Hierarchical (Tree) Directory Structure: The hierarchical or tree directory structure is the most
common and flexible type of directory structure. It allows the creation of directories and
subdirectories, forming a tree-like structure. The root of the tree is the top-level directory, and
branches represent subdirectories. This structure provides an organized and efficient way to
manage files and directories.
4. Graph Directory Structure: The graph directory structure allows directories to have multiple
parent directories, forming a graph structure. This structure introduces the concept of shared
directories or files. It's more flexible than the tree structure but introduces complexities like the
possibility of cycles and the need for garbage collection.
5. Acyclic Graph Directory Structure: The acyclic graph directory structure is similar to the graph
directory structure but without cycles. It uses links or pointers to share directories or files, but it
ensures that there are no cycles in the structure. This provides the benefits of sharing without
the complexities associated with cycles.
6. General Graph Directory Structure: The general graph directory structure allows directories to
have multiple parent directories and can contain cycles. This is the most flexible type of directory
structure, but it is also the most complex to manage. It requires sophisticated algorithms to
handle cycles and garbage collection.
File System Mounting
File system mounting is the process of attaching a file system to a specific directory in the existing
file system hierarchy. This allows the file system to become accessible to the user and applications
through the specified directory, known as the mount point.
The concept of mounting is particularly important in operating systems that support multiple file
systems or remote file systems. For instance, Unix and Unix-like operating systems (including Linux
and macOS) use a single hierarchical file system tree, and additional file systems can be mounted
into this tree at various points.
Here's how the mounting process typically works:
1. Mount Point: The mount point is the directory in the existing file system hierarchy where the
new file system will be attached. This directory acts as the root of the mounted file system. Any
existing data in this directory becomes temporarily inaccessible while the file system is mounted.
2. Mount Command: The operating system provides a command (such as mount in Unix-like
systems) to attach a file system. The command typically requires the source (the file system or
device to be mounted) and the target (the mount point). Additional options may be specified,
such as the file system type, read-only or read-write access, etc.
3. File System Integration: Once mounted, the new file system becomes an integral part of the file
system hierarchy. Files and directories within the mounted file system can be accessed and
manipulated just like any other files on the system, relative to the mount point.
4. Unmount Command: The operating system also provides a command (such as umount in Unix-
like systems) to detach a mounted file system. This process is called unmounting. After
unmounting, the mount point directory returns to its previous state, and any data it originally
contained becomes accessible again.
5. Remote Mounting: File system mounting is not limited to local storage devices. Remote file
systems (e.g., NFS, SMB) can also be mounted, allowing a system to access files and directories
located on another computer over a network.
6. Automounting: Some operating systems provide a feature called automounting, where file
systems are automatically mounted when accessed and unmounted when not in use. This is
often used for removable media (e.g., USB drives, CD-ROMs) or network drives.
Directory implementation
Directory implementation refers to how the file system organizes and manages directories and their
entries internally. The choice of data structure for directory implementation can significantly affect
the efficiency of file system operations such as file lookup, creation, and deletion. Here are some
common methods for directory implementation:
1. Linear List: In this simple method, a directory is implemented as a linear list of file entries. Each
entry contains the file name and a pointer to the file's data (or an inode, which we'll discuss
shortly). This method is easy to implement but can be inefficient for large directories, as
searching for a file requires linearly traversing the list.
2. Hash Table: To improve the efficiency of lookup operations, a hash table can be used. In this
method, a hash function is applied to the file name to compute an index into an array of
directory entries. This allows for faster lookup times, but it introduces complexity and can lead
to hash collisions, which need to be handled.
3. B-Trees: B-trees are balanced tree data structures that provide efficient searching, insertion, and
deletion operations. They can be used to implement directories, where the directory entries are
stored in the tree nodes. B-trees are especially useful for large directories and file systems that
need to maintain a sorted order of file names.
4. Inodes: In Unix-like file systems, an inode (index node) is a data structure that stores metadata
about a file or directory, such as its size, permissions, and pointers to its data blocks. Each
directory entry contains the file name and the corresponding inode number. Inodes provide a
level of indirection that decouples the file name from the actual data and allows for efficient file
management.
5. Symbolic Links: Symbolic links (or symlinks) are special types of entries in a directory that point
to another file or directory. They provide a way to create aliases or shortcuts, allowing a file or
directory to be accessed through multiple paths. Symbolic links are resolved at runtime, and the
file system needs to handle them appropriately.
6. Hard Links: Hard links are another type of directory entry that allows multiple file names to refer
to the same inode (and hence, the same file data). Unlike symbolic links, hard links have no
distinction between the original and the link, and they all directly point to the inode. The file
system must manage the reference count of inodes to determine when the data can be freed.
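On a Unix-like system the behavior of hard links can be observed with Python's `os.link` (the file names are illustrative):

```python
import os
import tempfile

# Hard links in action: two names, one inode (Unix-only sketch).
d = tempfile.mkdtemp()
original = os.path.join(d, "original.txt")   # illustrative file names
alias = os.path.join(d, "alias.txt")

with open(original, "w") as f:
    f.write("shared data")

os.link(original, alias)                     # add a second directory entry
st = os.stat(original)
print(st.st_nlink)                           # 2: the inode's reference count
print(os.stat(alias).st_ino == st.st_ino)    # True: both names share one inode

os.remove(original)                          # data survives while a link remains
with open(alias) as f:
    print(f.read())                          # shared data
```

Removing `original` only decrements the inode's link count; the data is freed when the count reaches zero and no process holds the file open.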
Allocation Methods
Allocation methods in file systems refer to the techniques used to allocate storage space to files on a
storage device. These methods determine how the file system distributes, tracks, and manages disk
space to store file data efficiently. Here are the three primary allocation methods:
1. Contiguous Allocation
* Description: In contiguous allocation, each file is stored in a set of contiguous blocks on the disk.
This means that all the data for a single file is stored in adjacent storage locations.
* Advantages: Contiguous allocation provides fast and straightforward access to files, especially for
sequential access. The file can be read or written in a single disk operation.
* Disadvantages: The main drawback is fragmentation, both external (unallocated space scattered
throughout the disk) and internal (unused space within a block). Also, it can be challenging to
predict the file size initially, leading to wasted space or file truncation.
* Use Cases: It's suitable for small file systems or systems where files are typically accessed
sequentially.
2. Linked Allocation
* Description: In linked allocation, each file is divided into blocks, and each block contains a pointer
to the next block in the file. The blocks can be scattered anywhere on the disk.
* Advantages: This method eliminates external fragmentation since blocks can be allocated
anywhere. It's easy to extend or shrink files as needed.
* Disadvantages: It introduces overhead due to the storage of pointers in each block. Access times
can be slow for direct access, since the system may need to traverse multiple blocks to reach the
desired location.
* Use Cases: Suitable for files that grow dynamically and for file systems where dynamic allocation
and deallocation of space are common.
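A toy Python model of linked allocation (the block count and file contents are illustrative): each disk block holds a data chunk plus a pointer to the next block, so the directory needs to remember only the start block.

```python
# A toy linked-allocation disk: each block stores [data, pointer-to-next-block].
DISK_BLOCKS = 16
disk = [None] * DISK_BLOCKS        # None marks a free block

def allocate(chunks):
    """Store a file's chunks in any free blocks, chaining them with pointers."""
    start = prev = None
    for chunk in chunks:
        b = disk.index(None)       # first free block, wherever it is
        disk[b] = [chunk, None]
        if prev is None:
            start = b
        else:
            disk[prev][1] = b      # link the previous block to this one
        prev = b
    return start                   # the directory entry keeps only this

def read_file(start):
    """Follow the pointer chain to reassemble the file."""
    out, b = [], start
    while b is not None:
        chunk, b = disk[b]
        out.append(chunk)
    return "".join(out)

f1 = allocate(["he", "ll", "o!"])
print(read_file(f1))  # hello!
```

Notice that reading block n requires walking the chain from the start, which is exactly why direct access is slow under this scheme.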
3. Indexed Allocation
* Description: In indexed allocation, each file has an associated index block that contains pointers to
the blocks of the file. The index block provides a level of indirection, allowing for efficient random
access.
* Advantages: It provides fast direct access to any block and supports both sequential and random
access patterns. It eliminates external fragmentation.

Free Space Management — Efficiency and Performance
Free space management refers to how a file system keeps track of and manages the unallocated
space on a storage device. Efficient free space management is crucial for the performance and
usability of the file system.
* Bit Vector (Bit Map): This method uses a bit vector or bit map to represent the free and occupied
blocks on the disk. Each bit represents a block, and its value indicates whether the block is free (0)
or allocated (1). This method provides fast allocation and deallocation, but it can consume a
significant amount of memory for large disks.
* Free Block List: In this method, the file system maintains a linked list of free blocks. When a block
is allocated, it's removed from the list, and when it's deallocated, it's added back to the list. This
method is simple but can be slower than a bit vector for allocation and deallocation operations.
* Grouped Blocks: Some file systems, such as ext4, group blocks into block groups or clusters. Free
space is managed within each group, reducing the overhead of managing free space across the
entire disk.
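A minimal sketch of bit-map free-space management (the block count is illustrative), using a Python integer as the bit vector — bit i is 1 when block i is allocated:

```python
# A free-space bit map: bit i is 1 when block i is allocated, 0 when free.
NUM_BLOCKS = 32
bitmap = 0                    # a Python int serves as an arbitrary-width bit vector

def allocate_block():
    """Find the first 0 bit, set it to 1, and return the block number."""
    global bitmap
    for i in range(NUM_BLOCKS):
        if not (bitmap >> i) & 1:
            bitmap |= 1 << i
            return i
    raise MemoryError("disk full")

def free_block(i):
    """Clear block i's bit, marking it free again."""
    global bitmap
    bitmap &= ~(1 << i)

a = allocate_block()     # 0
b = allocate_block()     # 1
free_block(a)            # block 0 is free again
print(allocate_block())  # 0 (first-fit reuses the lowest free block)
```

Real file systems speed up the scan by checking a whole word at a time, skipping words whose bits are all 1.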
The efficiency and performance of free space management depend on factors such as the size of the
disk, the average file size, and the frequency of file operations. Choosing the right method for free
space management can impact the overall performance and usability of the file system.
Recovery
Recovery refers to the process of restoring a file system to a consistent state after a crash or error.
It's a crucial aspect of file system design, ensuring data integrity and consistency. Common recovery
techniques include:
* Journaling: In journaling file systems, changes to the file system are first recorded in a journal
before being applied. If the system crashes, it can replay the journal to recover the file system to a
consistent state.
* Checkpoints: Some file systems periodically create checkpoints, which are snapshots of the file
system's state. In the event of a crash, the system can roll back to the last checkpoint and recover
from there.
* Transactional File Systems: Transactional file systems use transactions to ensure atomic updates
to the file system. Each transaction either completes entirely or is rolled back, ensuring
consistency.
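The journaling idea can be reduced to a toy sketch (the keys and values are illustrative): every change is appended to a log before being applied, so replaying the log reconstructs a consistent state even if the in-memory state is lost.

```python
# A toy write-ahead journal: record the change first, then apply it (a sketch).
journal = []           # a durable log in a real system; a plain list here
store = {}             # stands in for the file system's state

def write(key, value):
    journal.append(("set", key, value))  # 1. record the intent in the journal
    store[key] = value                   # 2. apply it to the real state

def recover():
    """After a crash, replay the journal to rebuild a consistent state."""
    rebuilt = {}
    for op, key, value in journal:
        if op == "set":
            rebuilt[key] = value
    return rebuilt

write("/etc/motd", "hello")
write("/etc/motd", "hello v2")
store.clear()                            # simulate losing all in-memory state
print(recover())  # {'/etc/motd': 'hello v2'}
```

Real journals also record transaction begin/commit markers so that a half-written update can be discarded rather than replayed.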
The choice of recovery technique depends on factors such as the file system's architecture,
performance requirements, and the importance of data consistency and integrity.
Log-Structured File Systems
Log-structured file systems (LFS) are a type of file system that stores data sequentially in a log-like
structure. They are designed to optimize write performance and take advantage of the sequential
write capabilities of modern storage devices.
* Design: In an LFS, all writes are appended to the end of the log. The log is divided into segments,
and each segment contains a sequence of write operations. Segments are periodically flushed to
disk, and the file system maintains an in-memory index to locate data.
* Advantages: LFS provides high write performance and efficient use of storage space. It eliminates
the need for random writes, reducing the wear on storage devices. LFS also simplifies the recovery
process, as the log can be replayed to restore the file system.
‘* Disadvantages: LFS can be less efficient for random reads, as data may be scattered throughout
the log. It also requires regular garbage collection to reclaim space occupied by obsolete data.
Log-structured file systems are suitable for write-intensive workloads and systems where write
performance is a critical requirement. They are often used in storage systems such as SSDs and flash
storage.
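The core LFS idea can be sketched in a few lines of Python (the record format is illustrative): writes always append to the log, and an in-memory index points at the latest copy of each record.

```python
import io

# A log-structured store sketch: every write appends to the log; an in-memory
# index maps each key to the offset of its latest copy.
log = io.BytesIO()
index = {}

def put(key, value):
    index[key] = log.tell()              # the newest version wins in the index
    log.write(f"{key}={value}\n".encode())  # sequential append, never in-place

def get(key):
    log.seek(index[key])
    line = log.readline().decode().rstrip("\n")
    return line.split("=", 1)[1]

put("inode7", "v1")
put("inode9", "data")
put("inode7", "v2")    # obsoletes v1; the stale copy waits for garbage collection
print(get("inode7"))   # v2
```

The stale `v1` record still occupies log space, which is exactly why LFS needs the segment-cleaning (garbage collection) pass described above.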