0% found this document useful (0 votes)
6 views22 pages

Unit 12final

Unit 12 covers file structures and external storage devices, including magnetic tapes, drums, and disks, as well as their organization and processing. It introduces file organization methodologies, focusing on sequential, indexed sequential, and direct file access methods. The unit aims to equip learners with the ability to explain external storage types and describe the structure and processing of various file types.

Uploaded by

Suresh Khadka
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views22 pages

Unit 12final

Unit 12 covers file structures and external storage devices, including magnetic tapes, drums, and disks, as well as their organization and processing. It introduces file organization methodologies, focusing on sequential, indexed sequential, and direct file access methods. The unit aims to equip learners with the ability to explain external storage types and describe the structure and processing of various file types.

Uploaded by

Suresh Khadka
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Data and File Structures Unit 12

Unit 12 File Structures


Structure:
12.1 Introduction
Objectives
12.2 External Storage Devices
Magnetic tapes
Magnetic drums
Magnetic disks
Mass storage devices
Intermediate storage devices
12.3 Introduction to File Organization
12.4 Sequential Files
Structure of sequential files
Processing sequential files
12.5 Indexed Sequential Files
Structure of indexed sequential files
Processing indexed sequential files
12.6 Direct Files
Structure of direct files
Processing direct files
12.7 Summary
12.8 Terminal Questions
12.9 Answers

12.1 Introduction
In the previous unit, we have discussed about sorting as one of the
important technique which need to be applied for any group of data for easy
future retrivel. We also discussed various sorting technique like bubble,
selection, insertion and heap. Sorting make the searching process easy and
we discussed two important searching techniques called linear and binary
search.
This unit begins with the external storage devices, why external devices are
required and we discussed about various types of external storage devices
like magnetic tape, drum and disk. We are extending the discussion of
external storage devices with mass and intermediate storage devices with

Sikkim Manipal University Page No.: 192


Data and File Structures Unit 12

its role. We also have one more important topic under this unit is file
structure. Files are mandatory where we use large amount of data. File
systems support with efficient storage and easy retrieval process. We are
concluding this unit with the discussion on three different file accesses like
sequential, indexed sequential and direct files.
Objectives:
After studying this unit, you should be able to:
 explain external storage devices and it types
 describe structure and processing of sequential file
 discuss structure and processing of indexed sequential file
 describe structure and processing of direct files.

12.2 External Storage Devices


The capacity of main memory is limited by two factors, the cost of main
memory and the technical problem in developing a large capacity of main
memory. We discover that in some instances programs can also reside in
storage units which do not belong to main memory.
An external storage device may be defined as device other than the main
memory on which information or data can be stored and from which the
information retrieved for processing. External storage devices are having
larger capacities and fewer expenses to store compared to main memory.
Primary uses of external storage devices are:
 Backup or overlay of program during execution.
 Storage of programs, subprograms and data.
 Storage of information in files.
Under this section we are going to discuss some of the external devices like
magnetic tape, drum and disk.
12.2.1 Magnetic tape
Magnetic tape is an information storage medium consisting of a
magnetisable coating on a thin plastic strip. Nearly all recording tape is of
this type, whether used for video with a video cassette recorder, audio
storage (reel-to-reel tape, compact audio cassette, digital audio tape (DAT),
digital linear tape (DLT) and other formats including 8-track cartridges) or
general purpose digital data storage using a computer (specialized tape

Sikkim Manipal University Page No.: 193


Data and File Structures Unit 12

formats, as well as the above-mentioned compact audio cassette, used with


home computers of the 1980s, and DAT, used for backup in workstation
installations of the 1990s).
Modern magnetic tape is most commonly packaged in cartridges and
cassettes. The device that performs actual writing or reading of data is a
tape drive. When storing large amounts of data, tape can be substantially
less expensive than disk or other data storage options. Tape storage has
always been used with large computer systems. Modern usage is primarily
as a high capacity medium for backups and archives. Early industry-
standard magnetic tape was half an inch wide and wound on removable
reels 10.5 inches in diameter as shown in figure 12.1. Different lengths were
available with 2400 feet and 4800 feet being common. DECtape was a
variation on this “round tape” In modern magnetic tape systems the reels
are much smaller and are fixed inside a cartridge to protect the tape and for
ease of handling. Cartridge formats include QIC, DAT Exabyte. Tape is read
and written on a tape drive (or "deck") which winds the tape from one reel to
the other causing it to move past a read/write head. Early tape had seven
parallel tracks of data along the length of the tape allowing six bit characters
plus parity written across the tape. A typical recording density was 556
characters per inch. The tape had reflective marks near its end which
signaled beginning of tape (BOT) and end of tape (EOT) to the hardware.
Data is written to tape in blocks with inter-block gaps between them. Each
block is typically written in a single operation with the tape running
continuously during the write. The larger the block the larger the data
buffer required in order to supply or receive the data written to or read from
the tape. The smaller the block the more tape is wasted as inter-block gaps.
Several logical records may be combined into one physical block to reduce
wastage. Finding a certain block on the tape generally involved reading
sequentially from the beginning, in contrast to magnetic disks. Tape is not
suitable for random access. The exception to this is that some systems
allow tape marks to be written which can be detected while winding the tape
forward or rewinding it at high speed. These are typically used to separate
logical files on a tape. Most tape drives now include some kind of data
compression.

Sikkim Manipal University Page No.: 194


Data and File Structures Unit 12

Figure 12.1: Magnetic tape

12.2.2 Magnetic drums


A magnetic drum, is a direct-access or random-access storage device, also
referred to as drum. is a metal cylinder coated with magnetic iron-oxide
material on which data and programs can be stored. Magnetic drums were
once used as a primary storage device but have since been implemented as
auxiliary storage devices.
The tracks on a magnetic drum are assigned to channels located around the
circumference of the drum, forming adjacent circular bands that wind around
the drum. A single drum can have up to 200 tracks. As the drum rotates at a
speed of up to 3,000 rpm, the device's read/write heads deposit magnetized
spots on the drum during the write operation and sense these spots during a
read operation. This action is similar to that of a magnetic tape or disk drive.
Unlike some disk packs, the magnetic drum cannot be physically removed.
The drum is permanently mounted in the device. Magnetic drums are able to
retrieve data at a quicker rate than tape or disk devices but are not able to
store as much data as either of them.
A magnetic drum differs from a magnetic disk in that the tracks in which the
data is stored are assigned to channels located around the circumference of
the drum as shown in figure 12.2. That is, the channels form circular bands
around the drum. The coded representation of data in figure 12.2 is similar
to that used on 9-track magnetic tape, 8-bit code. The basic functions of the
read/write heads are to place magnetized spots (those little binary 0's and
1's) on the drum during a writing operation and to sense these spots during
a reading operation. The read/write heads of a drum perform in a manner
similar to the read/write heads of a magnetic tape unit or disk drive unit.
The tracks on each channel are grouped into sectors. It is almost like the
format used on disk packs when referring to tracks (or cylinders) and

Sikkim Manipal University Page No.: 195


Data and File Structures Unit 12

sectors. As the drum rotates, the reading or writing occurs when the
specified sector of a given channel passes under the read/write head for
that channel. Some drums are mounted in a horizontal position, while
others are mounted in a vertical position. Another major difference in the
design is the number of read/write heads. Some drums use only one
read/write head, which services all channels on the drum. In this case, the
head moves back and forth (or up and down) over the surface of the drum
as required. Other drums, using multiple read/write heads, have one
principal advantage over drums with the single-head type. Since one
read/write head is assigned to each channel, no read/write head movement
is required. That is, the time required for head positioning is zero. The only
significant time required when reading or writing is the rotational delay that
occurs in reaching a desired record location.

Figure 12.2: Magnetic drum

12.2.3 Magnetic disks


Magnetic disks are generally termed as secondary storage for Computer
systems. They are used to temporarily hold data that is not immediately
required for computer operations and to store programs that are not
currently being executed. Through the years, magnetic disk data capacities
have increased at tremendous rates. The first fixed disk drives had a
capacity of just 5 megabytes. Today, fixed disk capacities are approaching
several gigabytes. The same holds true for floppy disk drives. The original
8-inch floppy was a single-sided disk with a total capacity of 180 kilobytes.
Today we have 3.5-inch floppy disks with a capacity of over 1.4 megabytes.

Sikkim Manipal University Page No.: 196


Data and File Structures Unit 12

Also, there are disk file units with removable disk packs that have capacities
of several gigabytes. Disk file units are used with mainframe computer
systems with large databases to speed up access times and to provide
flexibility to system configuration.
Types of Disks:
As mentioned previously, there are currently two types of disks: the hard
disk and the floppy disk or diskette.
Hard disks: These are divided into two groups, the disk packs used with
disk file systems and the fixed disks.
 Disk packs: Disk packs contain large (usually 14”) platters. They are
packaged in vertical stacks of up to 16 disks. Each disk surface is
coated with a magnetic medium and can be used for data storage,
although the top and bottom surfaces of the pack are usually used as
protective surfaces. Disk packs are easily removed from the drive
system. They have very large capacities and can store from 500
megabytes to several gigabytes. Disk cartridges are another form of disk
pack with the heads and head actuator assemblies contained
within a sealed cartridge. Since the disk pack is never removed from the
cartridge, disk cartridges suffer less contamination problems from dust
and dirt than standard disk packs.
 Fixed disks: Fixed disks are small sealed units that contain one or more
disk platters. Fixed disks are known by several terms, such as
Winchester drive, hard drive, or fixed disk. Fixed disks are used in
minicomputers and personal computers. They can also be adapted for
use in mainframe computers instead of having separate disk file units.
Floppy disks: Floppy disks come in several sizes and densities. They are
called floppy disks because the magnetic coating is placed on a thin flexible
polyester film base. The 8-Inch floppy disk was the first disk widely used for
commercial purposes. It is available as both single- or double-sided and
single- or double density. The 8-inch disk is quickly becoming obsolete.
The 3.5 Inch floppy disk is the current choices these disks are also used
with personal computers and minicomputers. These smaller disks have data
capacities of 720K for double-density disks and 1.44M for high-density disks.

Sikkim Manipal University Page No.: 197


Data and File Structures Unit 12

Organizing data on disk


Before data can be stored on a magnetic disk, the disk must first be
divided into numbered areas so that data can be easily retrieved. Dividing
the disk is done so that data can be easily written and retrieved, which is
known as formatting the disk. The format program divides each data
surface into tracks and sectors.
Tracks: Concentric rings, called tracks, are written on the disk during the
formatting process. Floppy disks have 40 or 80 tracks per side. Fixed disks
and disk packs can have from 300 to over 1,000 tracks per side. Figure 12.3
shows an example of how tracks are written on a disk surface. Each track is
assigned a number. The outermost track on a disk is assigned number 00.
The innermost track is assigned the highest consecutive number.
Sectors: Each track is divided into sectors. Sectors are numbered divisions
of the tracks designed to make data storage more manageable. Without
sectors, each track would hold more than 4,500 bytes of information and
small files would use an entire track.

Figure 12.3: Magnetic disk

12.2.4 Mass storage devices


Mass storage refers to various techniques and devices for storing large
amounts of data. The earliest storage devices were punched paper cards,
but modern mass storage devices include all types of disk drives and tape
drives. Mass storage is distinct from memory, which refers to temporary
storage areas within the computer. Unlike main memory, mass storage
devices retain data even when the computer is turned off. Mass storage

Sikkim Manipal University Page No.: 198


Data and File Structures Unit 12

devices are characterized by sustainable transfer speed, seek time, cost


and capacity.
We will discuss now some of the uses of mass storage devices. Mass
storage devices used in desktop and most server computers typically have
their data organized in a file system. The choice of file system is often
important in maximizing the performance of the device: general purpose file
systems) tend to do poorly on slow-seeking optical storage such as compact
discs. Some relational databases can also be deployed on mass storage
devices without an intermediate file system or storage manager. Oracle and
MYSQL, for example, can store table data directly on raw block devices. On
removable media, archive formats tar achieves on magnetic tape, which
pack file data end-to-end) are sometimes used instead of file systems
because they are more portable and simpler to stream. On embedded
computers, it is common to memory map the contents of a mass storage
device (usually ROM or flash memory) so that its contents can be traversed
as in-memory data structures or executed directly by programs.
Types of mass storages are:
Floppy disks : Relatively slow and have a small capacity, but they are
portable, inexpensive, and universal.
Hard disks : Very fast and with more capacity than floppy disks, but also
more expensive. Some hard disk systems are portable (removable
cartridges), but most are not.
Optical disks : Unlike floppy and hard disks, which use electromagnetism
to encode data, optical disk systems use a laser to read and write data.
Optical disks have very large storage capacity, but they are not as fast as
hard disks. In addition, the inexpensive optical disk drives are read-only.
Read/write varieties are expensive.
Tapes: Relatively inexpensive and can have very large storage capacities,
but they do not permit random access of data.
12.2.5 Intermediate storage devices
A new development in the area of external storage devices is the electronic
disk, is named “electronic” because it provides fast access times without
mechanical movement. It is an intermediate form of storage because it is
being developed to fill the gap between main memory and direct-access

Sikkim Manipal University Page No.: 199


Data and File Structures Unit 12

memory. This is achieved by having lower cost than main memory and a
lower access time than is currently available with external storage devices.
The devices that currently show the greatest potential for being an
acceptable electronic disk are charge-coupled devices. They are based on
semi conductor technology, but their cost per bit should be one third that of
main memory. The average access time they provide is 60 micro seconds.
Self Assessment Questions
1. Magnetic tapes, magnetic disks, optical disks are all the kind of
_______________ storage devices.
2. The typical recording density of an old magnetic tape was ______
characters per inch.
3. The magnetic drum is another example of a ________________ storage
device.
4. Mass storage refers to various techniques and devices for storing
__________ amounts of data.

12.3 Introduction to File Organization


In our daily life, huge amount of data has to be collected and processed, so
it is very difficult to handle it. But this can be handled fast and easily by
using files. File organization is the methodology which is applied to
structured computer files. Files contain computer records which can be
documents or information which is stored in a certain way for later retrieval.
File organization refers primarily to the logical arrangement of data (which
can itself be organized in a system of records with correlation between the
fields/columns) in a file system. It should not be confused with the physical
storage of the file.
Different application requires a variety of record types and file structure; one
basic distinction is between fixed and variable length records. A fixed length
record has all the field sizes and a number of fields fixed or known in
advance whereas in the variable length record type the number of fields is
not specified in advance. The variable length record makes programming
and file design a complex one. The only way out be to break this variable-
length record in to several fixed length records and identify them as a
header and trailer records (which may be a variable).

Sikkim Manipal University Page No.: 200


Data and File Structures Unit 12

A file should be organized in such a way that the records are always
available for processing with no delay. This should be done in line with the
activity and volatility of the information. Make sure the following conditions
are fulfilled.
1) Rapid access to a record or a number of records which are related to
each other.
2) The Adding, modification, or deletion of records.
3) Efficiency of storage and retrieval of records.
4) Redundancy, being the method of ensuring data integrity.
Field
A field is a meaningful collection of related characters. It is the smallest
logical data entity that is treated as a single unit in data processing.
Record
Fields are generally grouped together to form a record. A record is a
collection of related fields that are treated as a single unit. A record can be
classified as:
Logical record: A logical record is a group of related information uniquely
identifiable and treated as a unit. The data record is usually considered to
be a group item comprising of several related items.
Physical record: A physical record is a physical unit of information whose
size and record mode is convenient for a particular input or output device for
the storage of data. The physical and the logical records are related to each
other in many possible ways. A physical record may consist of a portion of
the logical records and may consist of a mixture of several complete and
partial logical record.
Label record: Label records are normally used for files that are stored on
magnetic tape or direct access devices. The record usually contains the
information relative to the file. Card files do not contain any label records.
Generally label records are written at the end or at the beginning of a file.
File
A file is a collection of number of related records that may have same or
varying length stored in a particular area on the disk. There are two types of
access on the files, they are sequential access files and random access
files. In sequential access files, the data or text can be stored or read back

Sikkim Manipal University Page No.: 201


Data and File Structures Unit 12

sequentially. In the random file access files, data can be stored and
accessed randomly. Also files can be divided into two types, one is a data
file and other is a program file. Two types of operations can be done on data
i.e. data can be transferred between the console and the program and
further can be transferred between program and a disk file.

12.4 Sequential Files


This is the most common structure for large files; here all the records have
the same size and same field format, with the fields having fixed size as
well. The records are sorted in the file according to the content of a field of a
scalar type, called “key”. However, adding and deleting records to this kind
of is a tricky process, the logical sequence of record typically matches their
physical layout to the media storage, so to ease file navigation, hence
adding a record and maintaining the key order requires a reorganization if
the whole file. The usual solution to make use of a “log file”(also called
“transaction file”), structured as a pile, to perform this kind of modification,
and periodically perform a batch update on the master file. In a sequential
file, records are arranged in the ascending or descending order of
chronological order of a key field which may be numeric (such as customer's
name), or both (a file of vehicle license including both letters of alphabet and
numerals). Since the records are ordered by the key field, there is no
storage location identification. Sequential file organization is particularly
suited to such applications are payroll in which the file is to be processed
entirely, i. e, each and every record is processed in the same setup.
12.4.1 Structure of sequential files
All external devices support a sequential file organization some devices by
their nature supports only sequential file. For example, Information stored in
magnetic tape as a continuous series of records along the length of the
tape. In order to reach a particular record we need to access all previous
records in the file. Magnetic disks and drums supports both direct and
sequential access. While storing sequential file in a magnetic disk or drum it
will be stored in a adjacent locations on the track. If the file size is larger
than the track then the records are stored in adjacent tracks.
Operations performed on sequential files vary with the storage devices,
which we are using. File on the magnetic tape can be opened either for
input or output but not both at a time. Now we are going to discuss how
Sikkim Manipal University Page No.: 202
Data and File Structures Unit 12

information from a file is transmitted to a user program and vice verse.


Execution of a read or write instruction in a high level programming
language corresponding a logical record follows the structure specified in
the parameter list of an instruction. For example
READ FILE (MASTER) INTO (STUDENT) where STUDENT is specified
with structure as: DECLARE STUDENT
NAME
SUR NAME CHAR(20)
INITIAL CHAR(6)
SCORE FIXED(5,2)
PERCENTAGE FIXED(4);
Each the read instruction is executed the next record from the MASTER
sequential file is moved in to the program area and assigned to the structure
STUDENT. Where ever the read and write operation executed for a
particular storage device, a block of logical record is transferred. Read or
write commands issued for a particular device is resolved by using a buffer
between external storage and data area of program. A buffer is a section of
main memory equal to size of a block of logical records.
12.4.2 Processing sequential files
Sequential processing is the access of records one after other in the
physical order in which they appear on the file. Sequential processes are
more effective if high percentage of the records need to be processed. We
will see some of the notations that are used while processing the file.
Following are the algorithmic notations for opening a file in a different mode.
Open <file name> file for input
Open <file name> file for output
Open <file name> file for update
Input specifies the records may be read only, output specifies that records
may be written only and update specifies that the records can be read and
possibly written at the end of the file. When the processing of a file is
complete the buffer space can be deallocated by writing
Close <file name> file.
A Program cannot open a file that is currently open for writing by another
program, until that program closes the file. But file allows several users to
read from a file concurrently. The object of the read statement or the source

Sikkim Manipal University Page No.: 203


Data and File Structures Unit 12

of the write statement should be the identifiers of a variable or structure that


corresponds to the records in the file.
Read from file name <file name> file into <record name>
Write <record name> on <file name> file.
Rewrite <record name> on <file name> file.
In the above notation record name is used as variable or structure.
Major focuses of sequential processing of sequential files are:
1) Sequential processing is most advantageous if a large number of
transactions can be batched to form a single “run” on the file.
2) A new file should be created if there are any additions and a significant
number of deletions requested.
3) Quick transactions time cannot be expected for a transaction or a batch
of transactions.
4) The requirement that the records in a sequential file be ordered by a
particular key is not essential if the file is being scanned to perform the
same operation on every record.
Self Assessment Questions
5. File organization is the methodology which is applied to
_____________ computer files.
6. Sequential processes are more effective if ______ percentage of the
records need to be processed
7. Magnetic disks and drums supports both ________ and sequential
access.
8. File organization refers primarily to the logical arrangement of data in a
____________.

12.5 Indexed Sequential Files


In situations where we want to access a record directly without scanning all
of the records, then we use indexed file organization. In this technique we
use two separate files to store records.
 Master file
 Index file
Master file: It is that table which contains actual data.
Index file: An index file is a table or other data structure that holds
addresses of other individual records of a file which contain only two data

Sikkim Manipal University Page No.: 204


Data and File Structures Unit 12

fields called Primary key and Storage address. Primary key uniquely
identifies each record and storage address show the secondary storage
address where each record is stored.
In a sequential indexed organization records are stored sequentially by
primary key value and a simple index that normally points to (address to) a
group of records. Sequential indexed is faster than sequential organization,
because here first we access an index file and find the address of record
where it is stored, then we directly access it by visiting that address. Master
file contains the data which is sequentially stored.
12.5.1 Structure of indexed sequential files
If we are searching for a record with a given primary key value and if the
sequential file is sorted on the primary key, we can reduce the time by using
a binary search on the blocks of a file. If we have an index on the primary
key, we can do the search even faster. Furthermore, indexes can also
speed up the search for a record based on values other than the primary
key. An index is an auxiliary structure for a file that consists of an ordered
sequence of value-pointer pairs, or, more generally, an ordered sequence of
value-tuple/pointer-list pairs.
A value-pointer pair, for example, might be a customer-ID value and a disk-
address pointer that points to the block that includes the customer-ID value.
A sequence of these pairs ordered by customer ID would be an index. More
generally, the value can be a value-tuple because we may be searching on
a composite key such as Name, Address or we may be searching on
several attribute values such as Customer ID and Date to find an order.
The general idea of an index is simple, but there are a number of factors to
consider. One is whether the index itself is stored on disk or in main
memory. If it is on disk, we must consider the number of disk accesses
required to read the index into main memory. The number of disk accesses
depends on how large the index is. If it is particularly large, we may organize
it as a two or three-level, or in general as an n-level, index tree so that we
need not read all the blocks of the index to find the index value we are
seeking. Pointers in the upper-level blocks of an index tree point to index
blocks, and pointers in the leaf-level blocks point to file blocks. When the
values in the index pairs are primary keys, we call an index a primary-key
index or sometimes just a primary index. When the values are not primary

Sikkim Manipal University Page No.: 205


Data and File Structures Unit 12

keys, we call an index a secondary-key index or often just a secondary


index.
A sequential file sorted on its primary key, along with a sparse index on the
primary key, is called an indexed sequential file. Figure 12.4 shows an
example of an indexed sequential file organization for our Customer relation.
The index holds the CustomerID value of the first record in each block. We
can use the index to find a record with a given customer ID – say b12 – as
follows. First, we do a binary search on the index to find that b12 lies
between a92 and b22. The record, if it exists, must therefore be in the
second block. Hence, we make one access to the disk to retrieve the
second block and then do a binary search on the records of the second
block to retrieve the record. If we fail to find it, we need not look elsewhere,
and we can report that the record does not exist.

Figure 12.4: Indexed sequential file

Insertions are more complicated than deletions, suppose we wish to insert a


record into an indexed sequential file we need to find an appropriate block to
place the record. Even if the space is not found exactly the right place we
can reorder the records in a block so that we can retain the sort order. If no
block is empty to place the record we have to use overflow bucket which
holds all the overflow records of the indexed sequential file. There is no
guarantee that the overflow blocks are physically next to each other.
Records in overflow block will not be sorted order because it is scattered
across. So searching in overflow block will happen in the sequential way,
which leads to time consuming. An overflow bucket can become long with
Sikkim Manipal University Page No.: 206
Data and File Structures Unit 12

respect to the size of an indexed sequential file. The longer an overflow


bucket becomes, the more an indexed sequential file degenerates into an
unordered file. Periodic reorganization, which rebuilds the indexed
sequential file from scratch, may be necessary to keep the file efficient.
12.5.2 Processing indexed sequential files
Indexed sequential file is identified in PL/I program through a declaration
statement of the form
DECLARE MASTER FILE RECORD KEYED ENV(INDEXED)
OPEN FILE(MASTER) SEQUENTIAL OUTPUT
Remember all the indexed sequential files must be created sequentially and
thereafter can be accessed directly or sequentially. We can express the
user level commands required for the direct processing of an indexed
sequential file in the following six steps.
1. Open MASTER file for update and read TRAN_REC
2. If TRANS = „ADD‟
then write INFO of TRAN_REC
with key of TRAN_REC on MASTER file
Exit
3. Read from MASTER file into MTR_REC with key of
TRAN_REC
4. If TRANS = „ALTER‟
then rewrite INFO of TRAN_REC
with key of TRAN_REC on MASTER file
Exit
5. If TRANS = „DELETE‟
then Delete from MASTER file the record with
KEY of TRAN_REC
Exit
6. If TRANS ≠ „READ‟
then print “ILLEGAL TRANSACTION”
Exit

Sikkim Manipal University Page No.: 207


Data and File Structures Unit 12

Self Assessment Questions


9. Indexed file organization uses two separate files to store records, a
Master file and an ______________.
10. ______________ is that table which contains actual data.
11. The ____________ holds all the overflow records of the indexed
sequential file.
12. Periodic reorganization rebuilds the indexed sequential file from
scratch, to keep the file __________________.

12.6 Direct Files


A direct file is one whose records are stored in such a way that access to
any one of them is direct, that is without going through a lot of records to get
one particular record. Records in a Direct File are not placed one after the
other – rather a location for the file is set aside on disk and then the records
are written to any location that is free within this area. A technique known as
“hashing” uses a Key to produce the location on the disk for each of the
records, and through this individual records can be accessed directly.
12.6.1 Structure of direct files
Direct file organization offers an effective way to organize data when there,
is a need to access individual records directly. To access record directly( or
random access) a relationship is used to translate the key value into a
physical address. This is called the mapping function R.R (key value)
address.
Hashing is a useful searching technique, which can be used for
implementing indexes. The main motivation for hashing is improving
searching time. The idea of hashing is to discover the location of a key by
simply examining the key. For that we need to design a hash function. A
hash function is a function h(k) that transforms a key into an address. An
address space is chosen before hand. For example we may decide the file
will have 1,000 available addresses. If we have set of possible keys varies
from 0 to 999 that is h: U { 0, 1,.... 999}

Sikkim Manipal University Page No.: 208


Data and File Structures Unit 12

Example:
Name Ascii Code for 2 letters Product Address
BALL 66 65 66 X 65 = 4290 290
LOWELL 76 79 76 X 79 = 6004 004
TREE 84 82 84 X 82 = 6888 888

LOWELL 000
.
.
BALL h(n) 004 LOWELL
.
290 BALL
TREE .
.
888 TREE
.
999

There is no obvious connection between the key and the location. Two
different keys may be sent to the same address generating a collision. For
example names like LOWELL and OLIVER will result to the same location.
These keys are called as synonyms. We can reduce the collisions by
selecting the good hash function. Use extra memory that is increase the size
of the address space. Put more than one record at a single address using
the bucket concept.
Direct files are stored on DASD (Direct Access Storage Device). A
calculation if performed on the key value to get an address. This address
calculation technique is often termed as hashing. The calculation applied is
called a hash function. Here we discuss a very commonly used hash
function called Division – Reminder
Division-Remainder hashing
According to this method, key value is divided by an appropriate number,
generally a prime number, and the division of remainder is used as the
address for the record. The choice of appropriate divisor may not be so

Sikkim Manipal University Page No.: 209


Data and File Structures Unit 12

simple. If it is known that the file is to contain n records, then we must,


assuming that only one record can be stored a given address, have
divisor n.
Also we may have a very large key space as compared to the address
space. Key space refers to all the possible key values. The address space
possibly may not match the actual number of key values in the file, the size
of the key space; therefore a one mapping may not be there. That is the
address calculated may not be unique which is called as collision.
R(K1) = R(K2) but K1 ≠ K2 Two unequal keys have been calculated to have
the same address.
Hashing with buckets is variation of hashed files in which more than one
record/key is stored per hash address. In hash table one bucket holds the
block of records to one address.

000
.
.
004 LOWELL
. OLIVER
290 BALL
.
.
888 TREE
.
999

You can observe in the above table LOWELL and OLIVER leads to same
address are handled with the help of bucket. But there are some
implementation issues with bucket that it should contain a counter that
keeps track of the number of records stored in it and empty slots in a bucket
need to be marked.
12.6.2 Processing direct files
Direct files are primarily processed directly, that is a key is mapped to an
address, and depending on the nature of the file transaction, a record is
created, deleted, updated or accessed at that address or possibly at some
subsequent address if a collision takes place. Subsequent address is

Sikkim Manipal University Page No.: 210


Data and File Structures Unit 12

determined by the overflow handling technique which is adopted. The


following steps explain how we can search a key x from the primary bucket.
Repeat for j=1,2 ... m
If x = kij
then R  bij
exit
else if kij < 0
then exit unsuccessfully

The required record is assigned to R if it is found in bucket b i otherwise the


process exists unsuccessfully. Now we will see in the case of unsuccessful
result how the algorithm proceeds with search in over flow chain.
Repeat while true
If P = NULL
then exit unsuccessfully
if KEY(P) = x
then R OR(P)
exit
else PLINK(P)

Here the successive node of the overflow chain is examined until the record
is found or the end of the list is encountered.
Self Assessment Questions
13. To access record directly, a relationship is used to translate the key
value into a __________________.
14. _________________ is a useful searching technique, which can be
used for implementing indexes.
15. In Hashing with buckets one bucket holds the block of records to one
___________________________ address.
16. Direct files are stored on ____________________________.

12.7 Summary
We started this unit with the discussion of external storage devices and its
different types and you also learnt about one more important topic under this

Sikkim Manipal University Page No.: 211


Data and File Structures Unit 12

unit, which is file structure. A file system is a method of storing and


organizing computer files and their data. Essentially, it organizes these files
into a database for the storage, organization, manipulation, and retrieval by
the computer's operating system. File systems are used on data storage
devices such as hard disks or CD-ROMs to maintain the physical location of
the [Link] discussed three different file structures in this unit which are:
sequential, indexed sequential and direct files. Sequential file structure is a
common and popular structure. Indexed sequential files are used when you
have huge amount of data to manipulate and the direct files are used when
you want to manipulate the data in the fastest way.

12.8 Terminal Questions


1. What do you mean by external storage device? Explain its various
types.
2. Explain sequential file structure
3. Describe the structure and processing of indexed file.
4. What do you mean by direct file? Discuss its structure.

12.9 Answers
Self Assessment Questions
1. External
2. 556
3. direct-access
4. large
5. structured
6. high
7. direct
8. file system
9. Index file
10. 10 Master File
11. overflow bucket
12. efficient
13. physical address
14. Hashing
15. address
16. DASD (Direct Access Storage Device)

Sikkim Manipal University Page No.: 212


Data and File Structures Unit 12

Terminal Questions
1. Some instances programs can also reside in storage units which do not
belong to main memory. (Refer section 12.2 for detail)
2. This is the most common structure for large files; here all the records
have the same size and same field format. (Refer section 12.4 for detail)
3. In situations where we want to access a record directly without scanning
all of the records, then we use indexed file organization. (Refer section
12.5 for detail)
4. A direct file is one whose records are stored in such a way that access
to any one of them is direct. (Refer section 12.6 for detail)

Sikkim Manipal University Page No.: 213

You might also like