
Operating System Mod 5 File Management

The document outlines the responsibilities of an operating system in file management, including creating, deleting, and manipulating files and directories. It explains the concept of files, their attributes, and the operations that can be performed on them, such as reading, writing, and deleting. Additionally, it discusses file locking mechanisms, file structures, and the implementation of file systems, including directory organization and path naming conventions.

Uploaded by

valechaamisha

File Management

Mod 5
File Management
The operating system is responsible for the following activities in connection
with file management:
● Creating and deleting files
● Creating and deleting directories to organize files
● Supporting primitives for manipulating files and directories
● Mapping files onto secondary storage
● Backing up files on stable (nonvolatile) storage media
System programs for file management create, delete, copy, rename, print, dump, list, and generally
manipulate files and directories.
File Concept
● The operating system provides a uniform logical view of information storage.
● The operating system abstracts from the physical properties of its storage
devices to define a logical storage unit, the file.
● Files are mapped by the operating system onto physical devices. These
storage devices are usually nonvolatile, so the contents are persistent
through power failures and system reboots.
● A file is a named collection of related information that is recorded on
secondary storage.
● In general, a file is a sequence of bits, bytes, lines, or records, the meaning of
which is defined by the file's creator and user. The concept of a file is thus
extremely general.
File Concept
Many different types of information may be stored in a file: source programs, object
programs, executable programs, numeric data, text, payroll records, graphic
images, sound recordings, and so on.
A file has a certain defined structure, which depends on its type:
● A text file is a sequence of characters organized into lines (and possibly
pages).
● A source file is a sequence of subroutines and functions, each of which is
further organized as declarations followed by executable statements.
● An object file is a sequence of bytes organized into blocks understandable
by the system's linker.
● An executable file is a series of code sections that the loader can bring into
memory and execute.
File Attributes
A file's attributes vary from one operating system to another but typically consist of these:
● Name. The symbolic file name is the only information kept in human-readable form.
● Identifier. This unique tag, usually a number, identifies the file within the file system;
it is the non-human-readable name for the file.
● Type. This information is needed for systems that support different types of files.
● Location. This information is a pointer to a device and to the location of the file on
that device.
● Size. The current size of the file (in bytes, words, or blocks) and possibly the
maximum allowed size are included in this attribute.
● Protection. Access-control information determines who can do reading, writing,
executing, and so on.
● Time, date, and user identification. This information may be kept for creation, last
modification, and last use. These data can be useful for protection, security, and
usage monitoring.
File System
The information about all files is kept in the directory structure, which also resides
on secondary storage.

Typically, a directory entry consists of the file's name and its unique identifier.

The identifier in turn locates the other file attributes.

It may take more than a kilobyte to record this information for each file. In a
system with many files, the size of the directory itself may be megabytes.

Because directories, like files, must be nonvolatile, they must be stored on the
device and brought into memory piecemeal, as needed.
File Operations
A file is an abstract data type. To define a file properly, we need to consider the operations that can be
performed on files.

1. Creating a file. Two steps are necessary to create a file:


● First, space in the file system must be found for the file.
● Second, an entry for the new file must be made in the directory.

2. Writing a file. To write a file, we make a system call specifying both the name
of the file and the information to be written to the file. Given the name of the
file, the system searches the directory to find the file's location. The system
must keep a write pointer to the location in the file where the next write is to
take place. The write pointer must be updated whenever a write occurs.
File Operations
3. Reading a file. To read from a file, we use a system call that specifies the name of
the file and where (in memory) the next block of the file should be put.
● Again, the directory is searched for the associated entry, and the system needs to
keep a read pointer to the location in the file where the next read is to take place.
● Once the read has taken place, the read pointer is updated.
● Because a process is usually either reading from or writing to a file, the current
operation location can be kept as a per-process current-file-position pointer. Both the read and write operations
use this same pointer, saving space and reducing system complexity.

4. Repositioning within a file. The directory is searched for the appropriate entry, and
the current-file-position pointer is repositioned to a given value.
● Repositioning within a file need not involve any actual I/O.
● This file operation is also known as a file seek.
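The read, write, and seek operations above map closely onto POSIX calls that Python exposes in its `os` module. A minimal sketch, assuming a POSIX system and a throwaway file with a hypothetical name, shows one current-file-position pointer serving both writes and reads, and `os.lseek` moving that pointer without transferring any data. (POSIX names the file only once, in `open()`; later calls take the returned descriptor.)

```python
import os
import tempfile

def demo_shared_pointer() -> bytes:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "notes.txt")            # hypothetical file name
        # Create: find space and make a directory entry; open for reading and writing.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            os.write(fd, b"hello, file system")         # pointer advances to offset 18
            assert os.lseek(fd, 0, os.SEEK_CUR) == 18   # query the current position
            os.lseek(fd, 7, os.SEEK_SET)                # seek: moves the pointer, no data I/O
            data = os.read(fd, 4)                       # same pointer serves reads: bytes 7..10
            assert os.lseek(fd, 0, os.SEEK_CUR) == 11   # and it advances after the read
            return data
        finally:
            os.close(fd)

print(demo_shared_pointer())  # b'file'
```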
File Operations
5. Deleting a file. To delete a file, we search the directory for the named file.
Having found the associated directory entry, we release all file space, so that
it can be reused by other files, and erase the directory entry.

6. Truncating a file. The user may want to erase the contents of a file but keep
its attributes. Rather than forcing the user to delete the file and then recreate
it, this function allows all attributes to remain unchanged, except for file
length, but lets the file be reset to length zero and its file space released.
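Truncation and deletion can be contrasted with the POSIX-style calls `os.truncate` and `os.remove`. This sketch assumes a POSIX system; the file name and the `0o600` permission bits are hypothetical, chosen only to show that an attribute survives truncation while the length drops to zero, and that deletion removes the directory entry.

```python
import os
import tempfile

def truncate_then_delete() -> tuple[int, int, bool]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.txt")      # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"x" * 1000)
        os.chmod(path, 0o600)                  # an attribute we expect truncation to keep
        os.truncate(path, 0)                   # length reset to zero, data space released
        st = os.stat(path)
        size_after, mode_after = st.st_size, st.st_mode & 0o777
        os.remove(path)                        # space released and directory entry erased
        return size_after, mode_after, os.path.exists(path)

print(truncate_then_delete())  # (0, 384, False)  -- 384 == 0o600
```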
File Operations
Other common operations include appending new information to the end of an existing
file and renaming an existing file.
Most of the file operations mentioned involve searching the directory for the entry
associated with the named file. To avoid this constant searching, many systems require
that an open() system call be made before a file is first used actively.
The operating system keeps a small table, called the open-file table, containing
information about all open files.
When a file operation is requested, the file is specified via an index into this table, so
no searching is required.
When the file is no longer being actively used, it is closed by the process, and the
operating system removes its entry from the open-file table. (create and delete are
system calls that work with closed rather than open files.)
File Operations
Some systems implicitly open a file when the first reference to it is made.
The file is automatically closed when the job or program that opened the file terminates. Most
systems, however, require that the programmer open a file explicitly with the open() system call
before that file can be used.
The open() call can also accept access-mode information: create, read-only, read-write,
append-only, and so on.
This mode is checked against the file's permissions. If the requested mode is allowed, the file is
opened for the process.
The open() system call typically returns a pointer to the entry in the open-file table. This
pointer, not the actual file name, is used in all I/O operations, avoiding any further searching and
simplifying the system-call interface.
File Operations
Several pieces of information are associated with an open file:

File pointer. On systems that do not include a file offset as part of the read() and write() system calls, the
system must track the last read/write location as a current-file-position pointer. This pointer is unique to
each process operating on the file and therefore must be kept separate from the on-disk file attributes.

File-open count. As files are closed, the operating system must reuse its open-file table entries, or it could
run out of space in the table. Because multiple processes may have opened a file, the system must wait
for the last process to close the file before removing the open-file table entry. The file-open counter tracks the number
of opens and closes and reaches zero on the last close. The system can then remove the entry.

Disk location of the file. Most file operations require the system to modify data within the file. The
information needed to locate the file on disk is kept in memory so that the system does not have to read it
from disk for each operation.

Access rights. Each process opens a file in an access mode. This information is stored in the
per-process table so the operating system can allow or deny subsequent I/O requests.
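The split between per-process and system-wide information can be modelled in a few lines. This is a hypothetical in-memory sketch, not any real kernel's layout: the class names, the `directory` dictionary, and the block number are illustrative. Each process keeps its own file pointer and access mode, while the shared entry caches the disk location and the open count, and is reclaimed only on the last close.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SystemEntry:
    """System-wide open-file table entry (hypothetical layout)."""
    name: str
    disk_location: int            # cached so later operations need no directory search
    open_count: int = 0           # how many opens are still outstanding

@dataclass
class ProcessEntry:
    """Per-process entry: each process keeps its own pointer and access mode."""
    sys_entry: SystemEntry
    mode: str                     # e.g. "r" or "rw"
    position: int = 0             # current-file-position pointer

@dataclass
class OpenFileTables:
    directory: dict[str, int]                                   # name -> disk location
    system: dict[str, SystemEntry] = field(default_factory=dict)

    def open(self, proc_table: list[Optional[ProcessEntry]], name: str, mode: str) -> int:
        entry = self.system.get(name)
        if entry is None:                        # first open anywhere: search the directory once
            entry = SystemEntry(name, self.directory[name])
            self.system[name] = entry
        entry.open_count += 1
        proc_table.append(ProcessEntry(entry, mode))
        return len(proc_table) - 1               # index the process uses in later calls

    def close(self, proc_table: list[Optional[ProcessEntry]], fd: int) -> None:
        proc_entry = proc_table[fd]
        if proc_entry is None:
            raise ValueError("descriptor already closed")
        proc_table[fd] = None
        proc_entry.sys_entry.open_count -= 1
        if proc_entry.sys_entry.open_count == 0:  # last close: reclaim the system-wide entry
            del self.system[proc_entry.sys_entry.name]

tables = OpenFileTables(directory={"log": 4096})  # hypothetical file at block 4096
p1: list[Optional[ProcessEntry]] = []
p2: list[Optional[ProcessEntry]] = []
fd1 = tables.open(p1, "log", "rw")
fd2 = tables.open(p2, "log", "r")
print(tables.system["log"].open_count)            # 2
tables.close(p1, fd1)
print("log" in tables.system)                     # True: p2 still has it open
```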
File Locking
● Some operating systems provide facilities for locking an open file (or sections
of a file).
● File locks allow one process to lock a file and prevent other processes from
gaining access to it.
● File locks are useful for files that are shared by several processes, for
example, a system log file that can be accessed and modified by a number of
processes in the system.
● A shared lock is akin to a reader lock in that several processes can acquire
the lock concurrently.
● An exclusive lock behaves like a writer lock; only one process at a time can
acquire such a lock.
File Locking
If the locking scheme is mandatory, the operating system ensures locking
integrity.

For advisory locking, it is up to software developers to ensure that locks are
appropriately acquired and released.

As a general rule, Windows operating systems adopt mandatory locking, and
UNIX systems employ advisory locks.
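Shared, exclusive, and advisory behaviour can be observed with the UNIX `flock()` call, available in Python as `fcntl.flock`. This sketch assumes Linux or another UNIX (the `fcntl` module does not exist on Windows); `flock` is only one of several UNIX locking interfaces, and the log-file name is hypothetical. Two separate `open()` calls are used so that the locks conflict even within one process.

```python
import fcntl
import os
import tempfile

def shared_locks_coexist() -> bool:
    """Several readers may hold a shared lock at the same time."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "system.log")            # hypothetical shared log file
        with open(path, "w") as a, open(path, "r") as b:
            fcntl.flock(a, fcntl.LOCK_SH)
            fcntl.flock(b, fcntl.LOCK_SH | fcntl.LOCK_NB)  # would raise if refused
            return True

def exclusive_lock_blocks_second_open() -> bool:
    """A second exclusive (writer) lock is refused while the first is held."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "system.log")
        with open(path, "w") as first, open(path, "a") as second:
            fcntl.flock(first, fcntl.LOCK_EX)
            try:
                # LOCK_NB: report failure immediately instead of waiting.
                fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return False
            except BlockingIOError:
                return True

def lock_is_only_advisory() -> bool:
    """A process that never asks for the lock can still write the file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "system.log")
        with open(path, "w") as holder, open(path, "a") as rude:
            fcntl.flock(holder, fcntl.LOCK_EX)
            rude.write("written without taking the lock\n")  # nothing enforces flock here
            rude.flush()
        with open(path) as f:
            return "without" in f.read()

print(shared_locks_coexist(), exclusive_lock_blocks_second_open(), lock_is_only_advisory())
```

The last function is the practical meaning of "advisory": cooperation is up to each program.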


File Types
File Structure
File types also can be used to indicate the internal structure of the file. Source and
object files have structures that match the expectations of the programs that read
them.
For example, the operating system requires that an executable file have a
specific structure so that it can determine where in memory to load the file and
what the location of the first instruction is.
Some operating systems extend this idea into a set of system-supported file
structures, with sets of special operations for manipulating files with those
structures.
File Structure
One disadvantage of having the operating system support multiple file structures is that the resulting size of
the operating system is cumbersome.
If the operating system defines five different file structures, it needs to contain the
code to support these file structures.
In addition, it may be necessary to define every file as one of the file types
supported by the operating system.
When new applications require information structured in ways not supported by
the operating system, severe problems may result.
File Structure
The Macintosh operating system also supports a minimal number of file structures.
It expects files to contain two parts: a resource fork and a data fork.
The resource fork contains information of interest to the user. For instance, it
holds the labels of any buttons displayed by the program.
A foreign user may want to re-label these buttons in his own language, and the
Macintosh operating system provides tools to allow modification of the data in the
resource fork.
The data fork contains program code or data, the traditional file contents.
Too few structures make programming inconvenient, whereas too many cause
operating-system bloat and programmer confusion.
Internal File Structure
Internally, locating an offset within a file can be complicated for the operating
system. Disk systems typically have a well-defined block size determined by
the size of a sector. All disk I/O is performed in units of one block (physical
record), and all blocks are the same size. It is unlikely that the physical record
size will exactly match the length of the desired logical record. Logical records
may even vary in length. Packing a number of logical records into physical
blocks is a common solution to this problem.
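For the simplest case, packing reduces to integer arithmetic. This sketch assumes fixed-length logical records that never span a block boundary; the 512-byte block and 100-byte record sizes are hypothetical. Variable-length records would need extra bookkeeping that is not shown here.

```python
def locate_record(record_no: int, record_size: int, block_size: int) -> tuple[int, int]:
    """Return (physical block number, byte offset inside that block) for a
    fixed-size logical record, assuming records do not span blocks."""
    records_per_block = block_size // record_size   # the blocking factor
    if records_per_block == 0:
        raise ValueError("record larger than a block")
    block = record_no // records_per_block
    offset = (record_no % records_per_block) * record_size
    return block, offset

# 512-byte blocks holding 100-byte records: 5 records per block, 12 bytes unused per block.
print(locate_record(0, 100, 512))   # (0, 0)
print(locate_record(7, 100, 512))   # (1, 200)
```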


File Structure
Files can be structured in any of several ways. Three common possibilities are:
(a) Unstructured Sequence of Bytes
The operating system does not know or care what is in the file. All it sees are
bytes.
Any meaning must be imposed by user-level programs.
All versions of UNIX (including Linux and OS X) and Windows use this file model.
This model provides the maximum amount of flexibility.
The operating system does not help, but it also does not get in the way, which matters for users
who want to do unusual things.
(b) Sequence of fixed-length records
Each record has some internal structure.

File is treated as a collection of records, rather than just a stream of bytes.

This means that when you perform a read operation, you don’t retrieve arbitrary
bytes but instead one complete record at a time.

Similarly, when you write, you don’t just insert raw data anywhere—you either
overwrite an existing record or append a new one in a structured way.

This ensures that data remains organized and can be accessed efficiently.
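Record-at-a-time access can be sketched with Python's `struct` module: record `i` always starts at byte `i * RECORD.size`, so a read returns exactly one record and a write either overwrites a record in place or appends one. The 20-byte layout (16-byte name plus a 4-byte integer) is a hypothetical example, and an in-memory `io.BytesIO` stands in for the file.

```python
import io
import struct

# Hypothetical record layout: 16-byte name, 4-byte little-endian signed count.
RECORD = struct.Struct("<16si")   # 20 bytes per record

def write_record(f, index: int, name: str, count: int) -> None:
    f.seek(index * RECORD.size)                  # records start at multiples of RECORD.size
    f.write(RECORD.pack(name.encode(), count))   # "16s" pads the name with zero bytes

def read_record(f, index: int) -> tuple[str, int]:
    f.seek(index * RECORD.size)
    raw = f.read(RECORD.size)                    # one whole record, never a fragment
    if len(raw) != RECORD.size:
        raise IndexError(index)
    name, count = RECORD.unpack(raw)
    return name.rstrip(b"\0").decode(), count

f = io.BytesIO()
write_record(f, 0, "pony", 3)
write_record(f, 1, "zebra", 5)     # append a new record
write_record(f, 0, "pony", 4)      # overwrite record 0 in place
print(read_record(f, 1))           # ('zebra', 5)
print(read_record(f, 0))           # ('pony', 4)
```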
(c) Tree
The file consists of a tree of records, not necessarily all the same length, each containing a key field in a fixed position
in the record. The tree is sorted on the key field, to allow rapid searching for a
particular key.
The basic operation here is not to get the "next" record, although that is also
possible, but to get the record with a specific key.
For the zoo file, one could ask the system to get the record whose key is pony, for
example, without worrying about its exact position in the file.
New records can be added to the file, with the operating system, and not the user,
deciding where to place them.
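The key-based access just described can be illustrated with a plain binary search tree. This is a simplification chosen for brevity: the text does not say which tree the system uses, and production systems would typically use a balanced, multiway structure. The animal names and counts are hypothetical. Note that `insert` decides where each record goes, and `get` needs only the key.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: str
    record: dict
    left: Optional["Node"] = None
    right: Optional["Node"] = None

class RecordTree:
    """Keyed records in an (unbalanced) binary search tree; the tree, not the
    caller, decides where each new record is placed."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: str, record: dict) -> None:
        if self.root is None:
            self.root = Node(key, record)
            return
        node = self.root
        while True:
            if key == node.key:
                node.record = record          # same key: replace the record
                return
            if key < node.key:
                if node.left is None:
                    node.left = Node(key, record)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key, record)
                    return
                node = node.right

    def get(self, key: str) -> Optional[dict]:
        """Fetch by key without knowing the record's position."""
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return None if node is None else node.record

    def keys_in_order(self) -> list[str]:
        """Sequential ("next record") traversal is still possible."""
        out: list[str] = []

        def walk(n: Optional[Node]) -> None:
            if n is not None:
                walk(n.left)
                out.append(n.key)
                walk(n.right)

        walk(self.root)
        return out

zoo = RecordTree()
for animal, count in [("lion", 2), ("ant", 500), ("pony", 4), ("hen", 12)]:  # hypothetical data
    zoo.insert(animal, {"count": count})
print(zoo.get("pony"))        # {'count': 4}
print(zoo.keys_in_order())    # ['ant', 'hen', 'lion', 'pony']
```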
As an example, consider a simple executable binary file taken from an early version of UNIX.

Even though a file is fundamentally just a sequence of bytes, the operating system (OS) enforces
specific rules on how an executable file must be structured to run properly.

For an OS to recognize and execute a file, it must follow a specific format. This format typically consists
of five sections:

1. Header – Contains metadata about the file, including a magic number, which uniquely identifies it
as an executable.
2. Text – The actual machine code (instructions) that the CPU executes.
3. Data – Contains initialized global and static variables.
4. Relocation Bits – Helps the OS adjust memory addresses if the program is loaded at a different
location in memory.
5. Symbol Table – Stores information about variable and function names, which is useful for
debugging or linking.

The magic number in the header is a special identifier that tells the OS, "This file is an executable", so
it knows how to handle it correctly. If the magic number is missing or incorrect, the OS will refuse to
execute the file.
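A magic-number check takes only a few lines. The early UNIX format described above is long obsolete, so this sketch assumes the ELF format used by Linux today, whose header begins with the bytes `0x7F 'E' 'L' 'F'`. It checks only those four bytes; a real loader validates much more of the header. The file names and the padded "header" written below are hypothetical and do not form a runnable program.

```python
import os
import tempfile

ELF_MAGIC = b"\x7fELF"   # first four bytes of every ELF file

def has_elf_magic(path: str) -> bool:
    """Check only the magic number at the start of the file."""
    with open(path, "rb") as f:
        return f.read(len(ELF_MAGIC)) == ELF_MAGIC

with tempfile.TemporaryDirectory() as d:
    fake_exe = os.path.join(d, "prog")        # hypothetical files with hand-written contents
    text_file = os.path.join(d, "notes.txt")
    with open(fake_exe, "wb") as f:
        f.write(ELF_MAGIC + bytes(60))        # magic number plus padding, not a real program
    with open(text_file, "w") as f:
        f.write("just text\n")
    print(has_elf_magic(fake_exe), has_elf_magic(text_file))  # True False
```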
Figure: an archive file and an executable file.
Directories
To keep track of files, file systems normally have directories or folders, which are
themselves files.
● Single Level Directory Systems:
The simplest form of directory system is having one directory containing all
the files. Sometimes it is called the root directory.
● The advantages of this scheme are its simplicity and the ability to locate files
quickly.
● It is sometimes still used on simple embedded devices such as digital
cameras and some portable music players.
Directories
● Hierarchical Directory Systems:
● There can be as many directories as are needed to group the files in natural ways.
● If multiple users share a common file server, each user can have a private root
directory for his or her own hierarchy.
● The ability for users to create an arbitrary number of subdirectories provides a
powerful structuring tool for users to organize their work.
● Nearly all modern file systems are organized in this manner.
Path Names
When the file system is organized as a directory tree, some way is needed for
specifying file names.

Two different methods are commonly used:

In the first method, each file is given an absolute path name consisting of the
path from the root directory to the file. Absolute path names always start at the
root directory and are unique.
Path Names
The other kind of name is the relative path name.

This is used in conjunction with the concept of the working directory (also called the current directory).

A user can designate one directory as the current working directory, in which case all path names not
beginning at the root directory are taken relative to the working directory.

For example, if the current working directory is /usr/ast, then the file whose absolute path is
/usr/ast/mailbox can be referenced simply as mailbox. In other words, the UNIX command

cp /usr/ast/mailbox /usr/ast/mailbox.bak

and the command

cp mailbox mailbox.bak

do exactly the same thing if the working directory is /usr/ast.
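Resolving a relative path against the working directory can be sketched with Python's `posixpath` module. This resolution is purely lexical: it does not consult the disk, so it treats `..` as "drop the previous component" and does not follow symbolic links, which a real kernel lookup would. The path `../jim/prog.c` is a hypothetical example.

```python
import posixpath

def resolve(cwd: str, path: str) -> str:
    """Turn a path into an absolute one, using `cwd` for relative paths."""
    if path.startswith("/"):                            # absolute: starts at the root
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(cwd, path))

print(resolve("/usr/ast", "mailbox"))            # /usr/ast/mailbox
print(resolve("/usr/ast", "/usr/ast/mailbox"))   # /usr/ast/mailbox
print(resolve("/usr/ast", "../jim/prog.c"))      # /usr/jim/prog.c
```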


File System Implementation
Implementation involves:
● how files and directories are stored,
● how disk space is managed, and
● how to make everything work efficiently and reliably.
File System Layout:
● Sector 0 of the disk is called the MBR (Master Boot Record) and is used to boot the
computer.
● The end of the MBR contains the partition table.
● This table gives the starting and ending addresses of each partition.
● One of the partitions in the table is marked as active.
● The BIOS reads in and executes the MBR.
● The first thing the MBR program does is locate the active partition, read in the boot
block, and execute it.
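Locating the active partition can be sketched by parsing the classic PC MBR: the partition table occupies bytes 446 to 509 as four 16-byte entries, bytes 510 and 511 hold the signature `0x55 0xAA`, and a status byte of `0x80` marks the active partition. Each entry stores legacy CHS start and end addresses plus the starting sector (LBA) and the sector count, which the sketch returns. The MBR built below is synthetic, and its partition values are hypothetical.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446          # four 16-byte entries, bytes 446..509
BOOT_SIGNATURE = b"\x55\xaa"          # bytes 510..511
# status, CHS first, type, CHS last, first LBA, sector count (little-endian)
ENTRY = struct.Struct("<B3sB3sII")    # 16 bytes

def find_active_partition(mbr: bytes):
    """Return (first_lba, num_sectors, partition_type) of the active partition, or None."""
    if len(mbr) < SECTOR_SIZE or mbr[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR")
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + i * ENTRY.size
        status, _chs_first, ptype, _chs_last, first_lba, count = ENTRY.unpack_from(mbr, start)
        if status == 0x80:                # 0x80 marks the active (bootable) partition
            return first_lba, count, ptype
    return None

# Synthetic MBR: the second entry is active (hypothetical values).
mbr = bytearray(SECTOR_SIZE)
mbr[510:512] = BOOT_SIGNATURE
ENTRY.pack_into(mbr, PARTITION_TABLE_OFFSET + ENTRY.size, 0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 409600)
print(find_active_partition(bytes(mbr)))  # (2048, 409600, 131)
```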
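Every partition starts with a boot block, even if it does not contain a bootable operating system.

After the boot block, the layout varies from file system to file system. Often the first item is the superblock. It contains all the key parameters about the file
system and is read into memory when the computer is booted or the file system is
first touched.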
File System Layout
The superblock includes a magic number to identify the file-system type, the number
of blocks in the file system, and other key administrative information.

Next comes information about free blocks in the file system, which may be kept as a bitmap or a list of pointers.

This may be followed by the i-nodes, an array of data structures, one per file, telling all about the file.

After that comes the root directory, which contains the top of the file-system tree.

Finally, the remainder of the disk contains all the other directories and files.
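A free-block bitmap keeps one bit per disk block. In this sketch a 1 bit means "free" (an assumption: some systems use the opposite convention), allocation takes the lowest-numbered free block, and the "disk full" error is a hypothetical choice. Padding bits past the last real block are cleared so they are never handed out.

```python
class FreeBitmap:
    """One bit per disk block: 1 = free, 0 = in use (assumed convention)."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.bits = bytearray(b"\xff" * ((num_blocks + 7) // 8))   # everything free
        for block in range(num_blocks, len(self.bits) * 8):      # clear padding bits
            self._set(block, False)

    def _set(self, block: int, free: bool) -> None:
        byte, bit = divmod(block, 8)
        if free:
            self.bits[byte] |= 1 << bit
        else:
            self.bits[byte] &= ~(1 << bit) & 0xFF

    def is_free(self, block: int) -> bool:
        byte, bit = divmod(block, 8)
        return bool((self.bits[byte] >> bit) & 1)

    def allocate(self) -> int:
        for byte_index, value in enumerate(self.bits):
            if value:                                       # skip bytes with no free block
                bit = (value & -value).bit_length() - 1     # lowest set bit
                block = byte_index * 8 + bit
                self._set(block, False)
                return block
        raise OSError("disk full")

    def free(self, block: int) -> None:
        self._set(block, True)

bm = FreeBitmap(10)
first, second = bm.allocate(), bm.allocate()   # blocks 0 and 1
bm.free(first)
print(bm.allocate(), bm.is_free(second))       # 0 False
```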
Contiguous Allocation
● It is easy to implement because you only need to remember two numbers: the
disk address of the first block and the total number of blocks in the file.
● Given the number of the first block, the number of any other block can be
found by a simple addition.
● The read performance is excellent because the entire file can be read from
the disk in a single operation. Only one seek is needed (to the first block).
Thus contiguous allocation is simple to implement and has high performance.
Drawback: over the course of time, the disk becomes fragmented. The disk
ultimately consists of files and holes, as illustrated in the figure.
Reusing the space requires maintaining a list of holes, which is doable. However,
when a new file is to be created, it is necessary to know its final size in order to
choose a hole of the correct size to place it in.
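Both halves of contiguous allocation fit in a short sketch: block lookup is a single addition, and creating a file means finding a hole large enough for its final size. The hole-selection strategy used here is first fit, one of several possible choices (best fit is another), and the hole list is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ContiguousFile:
    start: int      # disk address of the first block
    length: int     # number of blocks in the file

    def block_address(self, n: int) -> int:
        if not 0 <= n < self.length:
            raise IndexError(n)
        return self.start + n                      # one addition, no disk reads

def first_fit(holes: list[tuple[int, int]], size: int) -> int:
    """Take the first hole (start, length) with room for `size` blocks,
    shrink it in place, and return the chosen start block."""
    for i, (start, length) in enumerate(holes):
        if length >= size:
            if length == size:
                holes.pop(i)
            else:
                holes[i] = (start + size, length - size)
            return start
    raise OSError("no hole large enough")

holes = [(3, 2), (10, 6)]                           # hypothetical free holes
f = ContiguousFile(first_fit(holes, 4), 4)
print(f.start, f.block_address(3), holes)           # 10 13 [(3, 2), (14, 2)]
```

After this allocation, 4 blocks are still free in total, yet a 3-block file no longer fits anywhere: that is the fragmentation problem in miniature.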
Linked-List Allocation
Another method for storing files is to keep each one as a linked list of disk blocks, as
shown. The first word of each block is used as a pointer to the next one. The rest
of the block is for data.
Linked-List Allocation
Unlike contiguous allocation, every disk block can be used in this method. No
space is lost to disk fragmentation (except for internal fragmentation in the last block).

While reading a file sequentially is easy, random access is very slow. To reach
block n, the operating system must start from the beginning and read the
previous n - 1 blocks one by one.

Also, the amount of data storage in a block is no longer a power of two because
the pointer takes up a few bytes.
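The cost of random access can be made visible by counting block reads. In this sketch the disk is simulated as a dictionary mapping a block number to a pair (pointer to next block, data); the block numbers, contents, and the `NIL` end marker are hypothetical. Reaching a block requires reading every block before it, because each pointer lives inside the previous block.

```python
NIL = -1   # hypothetical end-of-chain marker

def read_block_n(disk: dict[int, tuple[int, bytes]], first: int, n: int) -> tuple[bytes, int]:
    """Return (data of the file's n-th block, number of disk blocks read)."""
    block, reads = first, 0
    for _ in range(n):                       # must read each earlier block to find the next
        next_block, _data = disk[block]
        reads += 1
        if next_block == NIL:
            raise IndexError(n)
        block = next_block
    _next, data = disk[block]
    return data, reads + 1

# A file occupying blocks 4 -> 7 -> 2 -> 10 -> 12 (hypothetical layout).
disk = {4: (7, b"A0"), 7: (2, b"A1"), 2: (10, b"A2"), 10: (12, b"A3"), 12: (NIL, b"A4")}
print(read_block_n(disk, 4, 3))   # (b'A3', 4)
```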
Linked-List Allocation Using a Table in Memory
Both disadvantages of the linked-list allocation can
be eliminated by taking the pointer word from each
disk block and putting it in a table in memory.

File A starts at block 4 and file B starts at block 6.

Both chains are terminated with a special marker (e.g., −1) that is not a valid block number. Such a
table in main memory is called a FAT (File Allocation Table).
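Following a chain through the table can be sketched as follows. Only the starting blocks (4 for file A, 6 for file B) are given in the text; the remaining block numbers follow the classic textbook figure and should be treated as illustrative. Because the table is in memory, walking a chain requires no disk reads, even for random access.

```python
EOF = -1   # end-of-chain marker, as in the text

def fat_chain(fat: dict[int, int], first: int) -> list[int]:
    """List a file's blocks by following the in-memory table."""
    blocks, block = [], first
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks

fat = {4: 7, 7: 2, 2: 10, 10: 12, 12: EOF,   # file A (illustrative chain)
       6: 3, 3: 11, 11: 14, 14: EOF}         # file B (illustrative chain)
print(fat_chain(fat, 4))   # [4, 7, 2, 10, 12]
print(fat_chain(fat, 6))   # [6, 3, 11, 14]
```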
Linked-List Allocation Using a Table in Memory
The primary disadvantage of this method is that the entire table must be in
memory all the time to make it work.

With a 1-TB disk and a 1-KB block size, the table needs 1 billion entries, one for
each of the 1 billion disk blocks. Each entry has to be a minimum of 3 bytes; for
faster lookup, entries should be 4 bytes.

Thus the table will take up 3 GB or 4 GB of main memory all the time, depending
on whether the system is optimized for space (3-byte entries) or time (4-byte entries).
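The sizes above follow from a short calculation (binary units, so "1 billion" is really 2^30):

```latex
\frac{1\,\text{TB}}{1\,\text{KB}} = \frac{2^{40}}{2^{10}} = 2^{30} \approx 10^{9}\ \text{entries}
\qquad
2^{30} \times 3\,\text{B} \approx 3\,\text{GB},
\qquad
2^{30} \times 4\,\text{B} \approx 4\,\text{GB}
```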

The FAT approach was used by the original MS-DOS file system and is still fully supported by all versions of
Windows.
I-nodes
Another method for keeping track of which blocks belong to which file is to
associate with each file a data structure called an i-node (index-node). The
i-node stores the file's attributes and the disk addresses of its blocks.
Unlike the FAT, the i-node scheme requires an array in memory whose size is
proportional to the maximum number of files that may be open at once, because
an i-node needs to be in memory only while its file is open. It does not matter
if the disk is 100 GB, 1000 GB, or 10,000 GB.
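Mapping a byte offset to a disk block through an i-node can be sketched as follows. The block size, the number of direct addresses, and the 4-byte address width are hypothetical choices, and only one level of indirection is modelled. Real UNIX i-nodes also have double and triple indirect blocks. Here the indirect block is an in-memory list standing in for what would be an extra disk read.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 1024                 # hypothetical block size in bytes
NUM_DIRECT = 10                   # hypothetical number of direct addresses in the i-node
PTRS_PER_BLOCK = BLOCK_SIZE // 4  # assumes 4-byte disk addresses

@dataclass
class Inode:
    size: int                                           # file size in bytes (an attribute)
    direct: list[int]                                   # disk addresses of the first blocks
    indirect: list[int] = field(default_factory=list)   # contents of the single indirect block

    def disk_block(self, offset: int) -> int:
        """Map a byte offset within the file to the disk block that holds it."""
        if not 0 <= offset < self.size:
            raise ValueError("offset outside the file")
        n = offset // BLOCK_SIZE                         # logical block number
        if n < NUM_DIRECT:
            return self.direct[n]                        # address stored in the i-node itself
        return self.indirect[n - NUM_DIRECT]             # would cost an extra read on disk

MAX_FILE_SIZE = (NUM_DIRECT + PTRS_PER_BLOCK) * BLOCK_SIZE
print(MAX_FILE_SIZE)   # 272384 bytes (266 KB) with these assumptions

ino = Inode(size=12 * BLOCK_SIZE, direct=list(range(100, 110)), indirect=[500, 501])
print(ino.disk_block(0), ino.disk_block(9 * BLOCK_SIZE + 5), ino.disk_block(11 * BLOCK_SIZE))  # 100 109 501
```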
Implementing Directories
For systems that use i-nodes, another possibility for storing the attributes is in the
i-nodes, rather than in the directory entries. In that case, the directory entry can be
shorter: just a file name and an i-node number. This approach is illustrated in Fig.
4-14(b).

So far we have assumed that files have short, fixed-length names.

The simplest way to support long file names is to set a limit on file-name length, typically 255
characters, and then use one of the designs of Fig. 4-14 with 255 characters
reserved for each file name. This approach is simple, but wastes a great deal of
directory space, since few files have such long names.
Implementing Directories:
Two ways of handling long file names in a directory:
(a) In-line
(b) In a heap
Implementing Directories:
In the in-line approach, each directory entry begins with a fixed-length header, typically the
length of the entry followed by attributes such as the owner, creation time, and protection information.
This fixed-length header is followed by the actual file name, however long it may
be, as shown in Fig.(a) in big-endian format (e.g., SPARC). In this example we
have three files, project-budget, personnel, and foo. Each file name is terminated
by a special character (usually 0), which is represented in the figure by a box with
a cross in it. To allow each directory entry to begin on a word boundary, each file
name is filled out to an integral number of words.

A disadvantage of this method is that when a file is removed, a variable-sized gap
is introduced into the directory into which the next file to be entered may not fit.
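The in-line layout can be sketched with `struct`. The 6-byte header (a 2-byte entry length and a 4-byte i-node number, big-endian as in the figure) is a hypothetical simplification of the real attribute header, and the i-node numbers are made up. Each name is 0-terminated and padded so the next entry starts on a 4-byte word boundary; the entry length lets a reader skip from one entry to the next.

```python
import struct

WORD = 4
HEADER = struct.Struct(">HI")   # hypothetical header: entry length, i-node number (big-endian)

def pack_entry(name: str, inode: int) -> bytes:
    raw = name.encode() + b"\0"                       # name terminated by a 0 byte
    body = HEADER.size + len(raw)
    padded = (body + WORD - 1) // WORD * WORD         # round up to a word boundary
    return HEADER.pack(padded, inode) + raw + b"\0" * (padded - body)

def parse_directory(data: bytes) -> list[tuple[str, int]]:
    entries, pos = [], 0
    while pos < len(data):
        length, inode = HEADER.unpack_from(data, pos)
        name_start = pos + HEADER.size
        name_end = data.index(b"\0", name_start)
        entries.append((data[name_start:name_end].decode(), inode))
        pos += length                                 # jump to the next entry
    return entries

directory = b"".join(pack_entry(n, i) for n, i in
                     [("project-budget", 12), ("personnel", 7), ("foo", 41)])
print(len(directory))              # 52: entries of 24, 16 and 12 bytes
print(parse_directory(directory))  # [('project-budget', 12), ('personnel', 7), ('foo', 41)]
```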
Implementing Directories:
Another way to handle variable-length names is to make the directory entries themselves
all fixed length and keep the file names together in a heap at the end of the directory, as
shown in Fig.(b).
This method has the advantage that when an entry is removed, the next file entered will
always fit there.
Disadvantages: Of course, the heap must be managed and page faults can still occur
while processing file names.
Advantage: One minor win here is that there is no longer any real need for file names to
begin at word boundaries, so no filler characters are needed after file names.
● For extremely long directories, linear searching can be slow. One way to speed up
the search is to use a hash table in each directory.
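A hashed directory can be sketched as a fixed-size table whose slots hold chains of (name, i-node number) entries, so a lookup scans only the one chain the name hashes to. The table size, the hash function, and the i-node numbers are assumptions for illustration, not any particular file system's choices.

```python
from typing import Optional

class HashedDirectory:
    """Directory entries found through a hash table with chaining."""

    def __init__(self, slots: int = 64) -> None:        # table size is an assumption
        self.table: list[list[tuple[str, int]]] = [[] for _ in range(slots)]

    def _chain(self, name: str) -> list[tuple[str, int]]:
        h = 0
        for byte in name.encode():
            h = (h * 31 + byte) % len(self.table)        # simple deterministic string hash
        return self.table[h]

    def add(self, name: str, inode: int) -> None:
        chain = self._chain(name)
        if any(existing == name for existing, _ in chain):
            raise FileExistsError(name)
        chain.append((name, inode))                      # colliding names share a chain

    def lookup(self, name: str) -> Optional[int]:
        for existing, inode in self._chain(name):        # only one short chain is scanned
            if existing == name:
                return inode
        return None

d = HashedDirectory(slots=8)
for name, inode in [("project-budget", 12), ("personnel", 7), ("foo", 41)]:
    d.add(name, inode)
print(d.lookup("personnel"), d.lookup("bar"))   # 7 None
```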
File Sharing
When several users are working together on a project, they often need to share files.
Here, one of C’s files is present in one of B’s directories as well.
The connection between B’s directory and the
shared file is called a link.
The file system itself is now a Directed Acyclic
Graph, or DAG, rather than a tree. (Having the
file system be a DAG complicates maintenance)
Problem with File Sharing
If directories really do contain disk addresses, then a copy of the disk addresses
will have to be made in B’s directory when the file is linked.

If either B or C subsequently appends to the file, the new blocks will be listed only
in the directory of the user doing the append.

The changes will not be visible to the other user, thus defeating the purpose of sharing.


Solution in File Sharing
In the first solution, disk blocks are not listed in directories, but in a little data structure
associated with the file itself.
The directories would then point just to the little data structure. This is the approach used
in UNIX (where the little data structure is the i-node).
In the second solution, B links to one of C’s files by having the system create a new file, of
type LINK, and entering that file in B’s directory.
The new file contains just the path name of the file to which it is linked.
When B reads from the linked file, the operating system sees that the file being read from
is of type LINK, looks up the name of the file, and reads that file.
This approach is called symbolic linking, to contrast it with traditional (hard) linking.
Drawbacks of the Solutions to File Sharing:
In the first method, at the moment that B links to the shared file, the i-node records
the file’s owner as C. Creating a link does not change the ownership, but it does
increase the link count in the i-node, so the system knows how many directory
entries currently point to the file.
If C subsequently tries to remove the file, the system is faced with a problem: If it
removes the file and clears the i-node, B will have a directory entry pointing to an
invalid i-node.
If the i-node is later reassigned to another file, B’s link will point to the wrong file.
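The system can tell from the link count that the file is still in use, but it has no easy way to find all
the directory entries for the file in order to erase them: pointers to the directories cannot be stored in
the i-node because there can be an unlimited number of directories.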
File Sharing
File Sharing with i-nodes:

The only thing to do is remove C’s directory entry, but leave the i-node intact, with
count set to 1, as shown in Fig. 4-17(c).

If the system does accounting or has quotas, C will continue to be billed for the file
until B decides to remove it, if ever, at which time the count goes to 0 and the file
is deleted.
File Sharing with Symbolic Links
● With symbolic links this problem does not arise because only the true
owner has a pointer to the i-node.
● Users who have linked to the file just have path names, not i-node
pointers.
● When the owner removes the file, it is destroyed.
● Subsequent attempts to use the file via a symbolic link will fail when the
system is unable to locate the file.
● Removing a symbolic link does not affect the file at all.
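The contrast between hard and symbolic links can be observed with the POSIX calls `link()` and `symlink()`, exposed as `os.link` and `os.symlink`. This sketch assumes a UNIX-like system (creating symbolic links on Windows may need extra privileges), and the file names are hypothetical. It shows that a hard link shares the i-node and bumps the link count, that removing C's entry leaves the hard link working with a count of 1, and that the symbolic link is left pointing at a path that no longer exists.

```python
import os
import tempfile

def link_demo() -> dict[str, object]:
    with tempfile.TemporaryDirectory() as d:
        c_file = os.path.join(d, "c_file")       # owner C's file (hypothetical names)
        b_hard = os.path.join(d, "b_hard")       # B's hard link: another entry for the same i-node
        b_soft = os.path.join(d, "b_soft")       # B's symbolic link: a file holding a path name
        with open(c_file, "w") as f:
            f.write("shared data")
        os.link(c_file, b_hard)
        os.symlink(c_file, b_soft)
        result: dict[str, object] = {
            "same_inode": os.stat(c_file).st_ino == os.stat(b_hard).st_ino,
            "links_before": os.stat(c_file).st_nlink,   # symbolic links are not counted
        }
        os.remove(c_file)                        # C removes its directory entry
        result["links_after"] = os.stat(b_hard).st_nlink
        with open(b_hard) as f:
            result["hard_link_reads"] = f.read()
        # The symlink itself still exists, but the path it names does not.
        result["symlink_dangling"] = os.path.lexists(b_soft) and not os.path.exists(b_soft)
        return result

print(link_demo())
```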
Symbolic Links Drawbacks & Advantage
● The problem with symbolic links is the extra overhead required. The file
containing the path must be read, then the path must be parsed and followed,
component by component, until the i-node is reached.
● An extra i-node is needed for each symbolic link, as is an extra disk block to
store the path, although if the path name is short, the system could store it in
the i-node itself, as a kind of optimization.
● Symbolic links have the advantage that they can be used to link to files on
machines anywhere in the world, by simply providing the network address of
the machine where the file resides in addition to its path on that machine.
File Sharing Drawback:
When links are allowed, files can have two or more paths. Programs that start at a
given directory and find all the files in that directory and its subdirectories will
locate a linked file multiple times.
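One way a tree-walking program can recognise a hard-linked file it has already seen is to key each path by its (device, i-node number) pair, since all hard links share one i-node. This is a sketch of that idea, not a technique stated in the source; the directory and file names are hypothetical. `os.walk` does not descend into symbolically linked directories by default, and `os.lstat` examines a symbolic link itself rather than its target.

```python
import os
import tempfile

def files_by_inode(root: str) -> dict[tuple[int, int], list[str]]:
    """Group every path under `root` by (device, i-node number)."""
    groups: dict[tuple[int, int], list[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)                       # do not follow symbolic links
            groups.setdefault((st.st_dev, st.st_ino), []).append(path)
    return groups

with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "c"))
    os.makedirs(os.path.join(d, "b"))
    original = os.path.join(d, "c", "report")        # hypothetical shared file
    with open(original, "w") as f:
        f.write("data")
    os.link(original, os.path.join(d, "b", "report_link"))
    groups = files_by_inode(d)
    print(len(groups), [len(paths) for paths in groups.values()])   # 1 [2]
```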
9.5 pts & above
A brief case study on:
○ Log Structured File Systems
○ Journaling File Systems
