Module 1 Part2

This document discusses fundamental concepts of file structures and managing files of records. It covers stream files, field structures, record structures, and using classes to manipulate buffers. Some key points include: - Stream files store data as a continuous stream without structure, making it hard to retrieve organized records. Field and record structures add organization. - Common field structure methods include fixed-length fields, length indicators, and delimiters. Record structures group related fields and use similar methods like length indicators. - Classes can represent buffers to read, write and unpack variable-length records that use length indicators or delimiters. Fixed-length buffers use simpler methods. Proper file structures are important for organizing, reading and writing data in

Uploaded by

Chirag Srinivas
Module-1: Chapters 4 & 5

Fundamental File Structure Concepts & Managing Files of Records

1
Outline I: Fundamental File
Structure Concepts
• Stream Files
• Field Structures
• Reading a Stream of Fields
• Record Structures
• Record Structures that use a length
indicator

2
Outline II: Managing Files of
Records
• Record Access
• More About Record Structures
• File Access and File Organization
• More Complex File Organization and
Access
• Portability and Standardization

3
Field and Record Organization:
Overview
• When we deal with file structures:
– Data must be persistent
– i.e. data written to a file by one program should be
the same when read back by another program.
• The basic logical unit of data is the field
which contains a single data value.
• Fields are organized into aggregates, either as
many copies of a single field (an array) or as
a list of different fields (a record).
4
Field and Record Organization:
Overview
• When a record is stored in memory, we
refer to it as an object and refer to its
fields as members.
• Here we will study the ways that objects
can be represented as records in files.

5
Stream Files
• Here we deal with how data is handled
in streams.
• For example:

6
Stream Files

• If our input is as follows

Input 1                 Input 2
Mary Ames               Alan Mason
123 Maple               90 Eastgate
Stillwater, OK 74075    Ada, OK 74820

7
Stream Files
• In Stream Files, the information is written as a
stream of bytes containing no added
information as follows:

AmesMary123 MapleStillwaterOK74075MasonAlan90 EastgateAdaOK74820

• Problem: There is no way to get the information
back in the organized record format.
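
The problem can be reproduced with a short sketch. The `Person` struct and the function names below are hypothetical, chosen only for illustration; the fields are written back to back exactly as on the slide, so nothing in the output marks where one field ends and the next begins.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one name-and-address entry.
struct Person {
    std::string last, first, address, city, state, zip;
};

// Write the fields back to back: no lengths, no delimiters.
void writeUnstructured(std::ostream& out, const Person& p) {
    out << p.last << p.first << p.address << p.city << p.state << p.zip;
}

// Build the byte stream for the two sample inputs.
std::string streamOfTwo() {
    std::ostringstream out;
    writeUnstructured(out, {"Ames", "Mary", "123 Maple", "Stillwater", "OK", "74075"});
    writeUnstructured(out, {"Mason", "Alan", "90 Eastgate", "Ada", "OK", "74820"});
    return out.str();
}
```

Calling `streamOfTwo()` yields the single run of bytes shown above; a reader cannot tell whether the first name is "Ames", "AmesMary", or something else.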
8
Field Structures
• Due to the above problem we should use
some types of structures.
• There are many ways of adding structure to
files to maintain the identity of fields:
– Force the field into a predictable length
– Begin each field with a length indicator
– Place a delimiter at the end of each field to
separate from next field.
– Use a “keyword = value” expression to identify
each field and its content.
9
Field Structures
• Method 1: Force the field into a predictable length

The last byte of each field is used for '\0'.

• Each field has a fixed length, specified in the above
class/structure.
• In the above class, one record => 10+10+15+15+2+9 => 61
bytes
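
A minimal sketch of such a structure, assuming the six field sizes implied by the sum above (10, 10, 15, 15, 2, 9) plus one extra byte per field for '\0'. The names `FixedPerson` and `setField` are illustrative assumptions, not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Each array holds the data bytes plus one terminating '\0'.
struct FixedPerson {
    char last[11];     // 10 data bytes
    char first[11];    // 10 data bytes
    char address[16];  // 15 data bytes
    char city[16];     // 15 data bytes
    char state[3];     //  2 data bytes
    char zip[10];      //  9 data bytes
};

// Data bytes per record, not counting the six terminators: 61.
constexpr std::size_t kDataBytes = 10 + 10 + 15 + 15 + 2 + 9;

// Copy a value into a fixed-length field, truncating if it is too long.
// strncpy pads the unused remainder of the field with '\0'.
template <std::size_t N>
void setField(char (&field)[N], const char* value) {
    std::strncpy(field, value, N - 1);
    field[N - 1] = '\0';
}
```

Storing "Ames" in `last` still occupies the full field, and a value longer than the field is silently truncated.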
10
Field Structures
Method 1 Contd…
• Result looks as follows:

• Problems:
– Wastage of space
• Ames requires 4 bytes but we use 10 bytes.
– A value may require more space than is allotted.
• The second problem can be solved by fixing the lengths to a
larger size, at the cost of wasting more space.
11
Field Structures
Method 2: Begin each field with a length indicator
• Begin each field with the length of that field
value.
• If field values can be very long, the length indicator
itself requires more space (one byte can only count up to 255).
• Looks as follows:
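
A small sketch of length-prefixed fields. It assumes the length is stored as two ASCII digits in front of each value (so a field may hold at most 99 bytes); the function name `appendWithLength` is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Append one field preceded by its length as two ASCII digits.
// Assumed format: values longer than 99 bytes are not supported.
void appendWithLength(std::string& buffer, const std::string& field) {
    assert(field.size() <= 99);
    char len[3];  // two digits plus '\0'
    std::snprintf(len, sizeof len, "%02zu", field.size());
    buffer += len;
    buffer += field;
}
```

Appending "Ames", "Mary" and "123 Maple" in turn produces `04Ames04Mary09123 Maple`.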

12
Field Structures
Method 3: Place a delimiter at the end of each field to
separate it from the next field.
• Each field is separated by a delimiter.
• The delimiter can be a white-space character such as blank,
new line or tab.
• But these characters can occur within the values; for example, a blank
can appear in an address.
• Hence we use the vertical bar character.
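
A sketch of the delimiter method, assuming '|' as the delimiter. The function name `joinWithDelimiter` is an illustrative choice; the assertion documents the one rule the method relies on, namely that the delimiter never appears inside a value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Write each field followed by the delimiter.
std::string joinWithDelimiter(const std::vector<std::string>& fields,
                              char delim = '|') {
    std::string out;
    for (const std::string& f : fields) {
        // The delimiter must not occur inside a field value.
        assert(f.find(delim) == std::string::npos);
        out += f;
        out += delim;
    }
    return out;
}
```

For the first sample input this gives `Ames|Mary|123 Maple|Stillwater|OK|74075|`; the blank inside "123 Maple" causes no confusion because blank is not the delimiter.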

13
Field Structures
Method 4: Use a “keyword = value” expression to
identify each field and its content.

This type of method is self-describing.

Someone unfamiliar with the file can also understand the contents.
Useful for identifying missing values.
It is an overhead for applications which do not demand
this much information.
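
A sketch of how a self-describing record might be read back. It assumes each field is written as `keyword=value` and terminated by '|'; the keywords and the function name `parseKeywordRecord` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Split a "keyword=value|keyword=value|" record into a keyword -> value map.
std::map<std::string, std::string> parseKeywordRecord(const std::string& record,
                                                      char delim = '|') {
    std::map<std::string, std::string> fields;
    std::istringstream in(record);
    std::string item;
    while (std::getline(in, item, delim)) {
        std::size_t eq = item.find('=');
        if (eq == std::string::npos) continue;  // skip malformed items
        fields[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return fields;
}
```

A missing field simply has no entry in the map, which is how this layout helps to identify missing values.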
14
Reading a Stream of Fields
• A program can easily read a stream of
fields and output each one ===>
Output

15
Reading a Stream of Fields
• This time, we do preserve the notion of
fields, but something is missing:
– The result is just one stream of fields
– It should be two records
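
A sketch of such a reading program, assuming '|'-delimited fields; `readFields` is an illustrative name. It recovers every field but has no way to know where one record stops.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read delimited fields until the stream is exhausted.
std::vector<std::string> readFields(std::istream& in, char delim = '|') {
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Reading the two sample entries returns twelve fields in a row; the boundary between the Ames record and the Mason record is not recorded anywhere in the file.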

16
Record Structure I
• A record can be defined as a set of fields that
belong together when the file is viewed in
terms of a higher level of organization.
• Like the notion of a field, a record is another
conceptual tool which need not exist in the
file in any physical sense.
• Yet, they are an important logical notion
included in the file’s structure.

17
Record Structures II
• Methods for organizing the records of a file
include:
– Requiring that the records be a predictable number of
bytes in length.
– Requiring that the records be a predictable number of
fields in length.
– Beginning each record with a length indicator
consisting of a count of the number of bytes that the
record contains.
– Using a second file to keep track of the beginning byte
address for each record.
– Placing a delimiter at the end of each record to
separate it from the next record.
18
Record Structures II
Method 1: Requiring that the records be a predictable number of
bytes in length (the fixed length applies to the record, not to each field).

Method 2: Requiring that the records be a predictable number of
fields in length.

19
Record Structures II
Method 3: Beginning each record with a length indicator
consisting of a count of the number of bytes that the
record contains.

Method 4: Using a second file to keep track of the
beginning byte address for each record.

20
Record Structures II
Method 5: Placing a delimiter at the end of each
record to separate it from the next record.

21
Record Structures that Use a
Length Indicator
• To see how record structures are handled,
we will consider the length indicator method.
• Implementation:
– Writing the variable-length records to the
file
– Representing the record length
– Reading the variable-length record from the
file.

22
Record Structures that Use a
Length Indicator
Writing the variable-length records to the file:
–We want to write the length of a record at its initial position.
–So we need to know the length of the record before writing it.
–Hence we first collect the data in a buffer, then find its length
using the strlen function.
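
The steps above can be sketched as follows. The sketch assumes the buffer already holds the packed, '|'-delimited fields as a C string, and that the length is written as ASCII digits followed by one space; `writeRecord` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>

// Write one variable-length record: "<length> <record bytes>".
void writeRecord(std::ostream& out, const char* buffer) {
    std::size_t length = std::strlen(buffer);  // length is known only now
    out << length << ' ';
    out.write(buffer, static_cast<std::streamsize>(length));
}
```

For the buffer `Ames|Mary|123 Maple|Stillwater|OK|74075|` the record is 40 bytes long, so the file receives `40 ` followed by those 40 bytes.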

23
Record Structures that Use a
Length Indicator
Representing the record length:
• As a 2-byte binary integer, or
• Converted into a character string:
fprintf(file, "%d ", length); //C stream
stream << length << ' '; //C++ stream
The above two statements insert the length and place a
space after it as a delimiter.
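
For comparison, a sketch of the 2-byte binary representation. The byte order (high-order byte first) and the function names are assumptions made for this example; the slides do not specify them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode a record length as two bytes, high-order byte first (assumed order).
std::string encodeLength16(std::uint16_t length) {
    std::string bytes(2, '\0');
    bytes[0] = static_cast<char>(length >> 8);
    bytes[1] = static_cast<char>(length & 0xFF);
    return bytes;
}

// Decode the two bytes back into a length.
std::uint16_t decodeLength16(const std::string& bytes) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned char>(bytes[0]) << 8) |
        static_cast<unsigned char>(bytes[1]));
}
```

A length of 40 becomes the bytes 0x00 0x28 in binary form, whereas the character form stores the two characters '4' and '0'. Two bytes can represent lengths up to 65535.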

24
Record Structures that Use a
Length Indicator
Reading the variable-length record from the file:
–Read the records from the file:
– each record is read into a buffer,
–then unpacked into the object p.
–The value from the buffer is read into the character string
strbuff.
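
A sketch of the reading side, assuming records were written as ASCII length, one space, then the record bytes. The name `readRecord` is hypothetical, and the sketch stops at filling the buffer; unpacking into an object is a separate step.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read one "<length> <record bytes>" record into buffer.
// Returns false at end of file or if the record is incomplete.
bool readRecord(std::istream& in, std::string& buffer) {
    std::size_t length = 0;
    if (!(in >> length)) return false;  // no more records
    in.get();                           // skip the space after the length
    buffer.resize(length);
    in.read(&buffer[0], static_cast<std::streamsize>(length));
    return in.gcount() == static_cast<std::streamsize>(length);
}
```

Because the length is read first, the reader knows exactly how many bytes belong to the record, so the two sample entries come back as two separate records.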

25
Mixing numbers & characters:
Use a file dump Contd..
• The actual length represented in a file
as a character string is as follows:

• If the data needs to be represented as a
2 byte integer:

26
Mixing numbers & characters:
Use a file dump Contd…

• Finally the data will be viewed in a file
as follows,
• when the 2 byte representation is used.

27
Mixing numbers & characters:
Use a file dump
• On a UNIX platform the data is dumped as
shown (od is the UNIX dump command).

28
Using Classes to Manipulate
Buffers
• Buffer classes mainly depend upon whether
the records are:
– Fixed length
– Variable length
• They also depend on:
– Whether a delimiter is used

29
Using Classes to Manipulate
Buffers-I
• Class with delimiter:

30
Using Classes to Manipulate
Buffers-I
• Pack function of a delimited buffer:

• In practice the data is packed as
follows:

31
Using Classes to Manipulate
Buffers-I
• Unpack Function (Fields):

// Next field to be read hence NextByte is initialized
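
A simplified sketch of a delimited buffer with pack and unpack operations. It is not the class from the slides: it uses std::string instead of a fixed character array, and the names `DelimBuffer`, `Data` and the member names are assumptions. The `nextByte_` member plays the role of NextByte, the position of the next field to be read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class DelimBuffer {
public:
    explicit DelimBuffer(char delim = '|') : delim_(delim), nextByte_(0) {}

    // Append one field followed by the delimiter.
    // Fails if the field itself contains the delimiter.
    bool Pack(const std::string& field) {
        if (field.find(delim_) != std::string::npos) return false;
        buffer_ += field;
        buffer_ += delim_;
        return true;
    }

    // Extract the next field, starting at nextByte_.
    // Fails when no complete field is left.
    bool Unpack(std::string& field) {
        std::size_t end = buffer_.find(delim_, nextByte_);
        if (end == std::string::npos) return false;
        field = buffer_.substr(nextByte_, end - nextByte_);
        nextByte_ = end + 1;  // move past the delimiter
        return true;
    }

    const std::string& Data() const { return buffer_; }

private:
    char delim_;
    std::string buffer_;
    std::size_t nextByte_;  // position of the next field to unpack
};
```

Packing "Ames" and then "Mary" leaves `Ames|Mary|` in the buffer, and successive Unpack calls return the fields in the same order.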

32
Using Classes to Manipulate
Buffers-II
• For Fixed length buffers:

33
Using Classes to Manipulate
Buffers-II
• There is an initialize function which
initializes the fields of the file.

34
Using Inheritance for Record
Buffer Classes
• Here we use inheritance to remove
duplication of code when the same procedures
are used by several classes.
• We have seen the classes
– fstream, istream, ostream
– fstream inherits its input/output operations
from its parent class iostream,
– which in turn inherits from istream and
ostream.
35
Using Inheritance for Record
Buffer Classes

• They have used multiple inheritance:
more than one base class.
• Virtual: ensures that the class ios is
included only once in the hierarchy.
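
The effect of virtual inheritance can be seen in a small stand-alone sketch. The four classes below are hypothetical stand-ins that mirror the shape of ios, istream, ostream and iostream; they are not the library classes themselves.

```cpp
#include <cassert>

struct StreamState { int state = 0; };        // plays the role of ios
struct Reader : virtual StreamState {};       // plays the role of istream
struct Writer : virtual StreamState {};       // plays the role of ostream
struct ReaderWriter : Reader, Writer {};      // plays the role of iostream

// True if both parents of rw refer to the very same StreamState subobject.
bool sharesOneBase(ReaderWriter& rw) {
    StreamState* viaReader = &static_cast<Reader&>(rw);
    StreamState* viaWriter = &static_cast<Writer&>(rw);
    return viaReader == viaWriter;
}
```

Without the `virtual` keyword, `ReaderWriter` would contain two separate `StreamState` subobjects and an unqualified access to `state` would be ambiguous.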
36
Using Inheritance for Record
Buffer Classes
• 2 main classes
– iostream (basic stream operations)
– fstreambase (to access the OS file
operations)

37
Using Inheritance for Record
Buffer Classes

• Class hierarchy for record buffer
objects

38
Using Inheritance for Record
Buffer Classes
• IOBuffer is the base class
• Protected members: can be used only by the class
itself and by classes that inherit from it

39
Using Inheritance for Record
Buffer Classes
• All methods are declared virtual: this allows
subclasses to provide their own implementation.
• = 0 (pure virtual function):
– IOBuffer doesn’t include an implementation of
any such method.
– No objects of IOBuffer can be created.
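
A reduced sketch of the idea, with hypothetical names and signatures: an abstract base class declares pure virtual methods, and a derived class supplies the implementation. A call made through a base-class reference runs the derived version.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Abstract base: "= 0" makes each method pure virtual.
class IOBufferSketch {
public:
    virtual ~IOBufferSketch() = default;
    virtual int Pack(const std::string& field) = 0;
    virtual int Write(std::ostream& out) const = 0;
protected:
    std::string buffer_;  // visible to derived classes only
};

// Derived class with '|'-delimited fields (delimiter is an assumption).
class DelimFieldBufferSketch : public IOBufferSketch {
public:
    int Pack(const std::string& field) override {
        buffer_ += field;
        buffer_ += '|';
        return static_cast<int>(buffer_.size());
    }
    int Write(std::ostream& out) const override {
        out << buffer_;
        return out.good() ? static_cast<int>(buffer_.size()) : -1;
    }
};

// The base class cannot be instantiated.
static_assert(std::is_abstract<IOBufferSketch>::value,
              "IOBufferSketch has pure virtual methods");
```

Code that holds only an `IOBufferSketch&` can pack and write records without knowing which concrete buffer class it is using.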

40
Using Inheritance for Record
Buffer Classes
• Write function of variable length buffer class.
• tellp(): returns the current position in the output
sequence.
• Write returns the address at which the record was written.
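
A sketch of a write function that reports where the record was placed. It assumes a text length prefix followed by a space, which is a simplification chosen for this example; `writeAndTell` is a hypothetical name.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Write one record and return the byte address where it begins,
// or -1 if the position cannot be determined or the write fails.
std::streamoff writeAndTell(std::ostream& out, const std::string& record) {
    std::streampos addr = out.tellp();  // position before writing
    if (addr == std::streampos(-1)) return -1;
    out << record.size() << ' ' << record;
    return out.good() ? static_cast<std::streamoff>(addr) : -1;
}
```

The returned address can be stored, for example in an index, and later used to seek straight to the record.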

41
Using Inheritance for Record
Buffer Classes

Here we are checking which function is called.
We are calling the DelimFieldBuffer function.

42
Assignment-1
• Explain with a program how data is
packed, unpacked with fixed length
records.
• Explain with a program how data is
packed, unpacked with variable length
records.

43
Record Access: Keys

• When looking for an individual record, it is
convenient to identify the record with a key
based on the record’s content (e.g., the Ames
record).
• When we retrieve a record using a
key, the key should satisfy the following
constraints:
– It has a canonical form (rules that define the key)
– It uniquely identifies a record
44
Record Access: Keys
• Rules:
– E.g. if key = AMES
• Then data can be written as Ames / AMES /
ames
• We should design a rule so that whatever
is input:
– It should convert any input to all caps.
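
A sketch of such a rule, assuming keys contain plain ASCII letters; `canonicalKey` is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Convert a key to its canonical form: all upper case.
std::string canonicalKey(const std::string& key) {
    std::string out;
    for (unsigned char c : key) {
        out += static_cast<char>(std::toupper(c));
    }
    return out;
}
```

Applying the same function to both the stored key and the search key makes "Ames", "AMES" and "ames" compare equal.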

45
Record Access: Keys
• Unique key:
– There may be many records with the same
• key: AMES
• To prevent the above:
– Define a primary key
– which is unique to a record
• We can also create a secondary key in
support of the primary key.
46
Record Access: Keys
• When we choose a primary key we
should be careful if it contains real
data:
• The key should be unchangeable.
• To avoid the above problem we should
not choose data of a record as the key
(discussed later).

47
Record Access:
Using Sequential Search
• Evaluating Performance of Sequential
Search.
• Improving Sequential Search
Performance with Record Blocking.
• When is Sequential Search Useful?

48
Record Access:
Using Sequential Search
Evaluating Performance of Sequential
Search:
– Best case: 1
– Average case: n/2
– Worst case: n
Sequential search steps:
– One read call for each record
– Each read may require a seek before the record can be read.
– E.g. 10 records => 10 read calls => 10 seeks
– Seeking takes more time than reading.
49
Record Access:
Using Sequential Search
Improving Sequential Search
Performance with Record Blocking:
•If we have 100 records => 100 read calls
•Hence group the records into blocks
– E.g. 1 block => 10 records
– Then 100 records => 10 blocks => 10 read calls
– The block size is usually sector oriented.
– E.g. 1 sector => 512 bytes => 10 records
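
The arithmetic above can be written as a one-line helper. The name `readCalls` is hypothetical; it counts how many read calls a full scan needs when each read fetches one block.

```cpp
#include <cassert>
#include <cstddef>

// Read calls needed to scan `records` records, `perBlock` records per read.
std::size_t readCalls(std::size_t records, std::size_t perBlock) {
    assert(perBlock > 0);
    return (records + perBlock - 1) / perBlock;  // round up to whole blocks
}
```

With 100 records, unblocked reading (1 per block) needs 100 read calls, while blocks of 10 need only 10. The number of records examined is the same in both cases.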
50
Record Access:
Using Sequential Search
Points of record blocking:
– Searching is still O(n), as the number of records is the
same.
– Seek time is reduced.
– The amount of data transferred is larger,
• even if we need to access only the first record.
– This can be too expensive.

51
Record Access:
Using Sequential Search
When is Sequential Search Good?
– It is extremely easy to program
– Simple file structures
Mainly depends on:
• Processor speed
Mainly used with:
• Tapes
• Files with a small number of records

52
Record Access:
UNIX tools for sequential
processing

File structure in UNIX:

•ASCII file: new line character => record delimiter
white space => field delimiter
•Provides a rich set of tools, which process files sequentially
•cat myfile: displays the contents of myfile

•wc (word count): no. of lines, words, characters
–2 12 76
53
Record Access:
UNIX tools for sequential
processing
• grep (global regular expression print):
searches for a pattern
– grep Ada myfile: displays as follows

– grep Ada myfile | wc
• 1 6 36

54
Direct Access
• How do we know where the beginning of the
required record is?
– It may be in an index (discussed in a different
unit)
– We know the relative record number (RRN)
– Position of a record relative to the beginning of the file
– E.g. first record => RRN 0, next record => RRN 1
and so on

55
Direct Access
• RRNs are not useful when working with variable-
length records: the access is still sequential.
• In order to work with RRNs we need to work with
fixed-length records.
– If records are of fixed length:
• Using the RRN we can calculate the byte offset
• ByteOffset = n * r, where n => record length in bytes
and r => RRN of the record.
– If the fixed length is 512 bytes & RRN = 500 then
byte offset?
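
A sketch of the calculation and of a direct read that uses it. It assumes fixed-length records stored from byte 0 of the file with no header; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// ByteOffset = record length in bytes * RRN.
std::uint64_t byteOffset(std::uint64_t recordLength, std::uint64_t rrn) {
    return recordLength * rrn;
}

// Seek directly to the record with the given RRN and read it.
// Returns false if the record does not exist.
bool readByRRN(std::istream& in, std::size_t recordLength, std::size_t rrn,
               std::string& record) {
    in.clear();  // reset any earlier end-of-file state
    in.seekg(static_cast<std::streamoff>(byteOffset(recordLength, rrn)),
             std::ios::beg);
    record.resize(recordLength);
    in.read(&record[0], static_cast<std::streamsize>(recordLength));
    return in.gcount() == static_cast<std::streamsize>(recordLength);
}
```

For a record length of 512 bytes and RRN 500, `byteOffset(512, 500)` is 256000, so the record starts 256000 bytes from the beginning of the file.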
56
Record Structure
• Choosing a Record Structure and Record
Length
• Header Records
• Adding Headers to C++ Buffer classes

57
Record Structure
Choosing a Record Structure and Record Length:
•To use the RRN for direct access:
– First we should fix the record length.
– Fixing the record length also means deciding how the field sizes are fixed.
•Two ways to do this:
– Fixed-length fields

– Fixed record length

58
Record Structure
1. Fixed-length field approach:
• Simplicity
2. Fixed record length approach:
• More efficient; any unused space is left as a block at the end of the record.
In both methods one distinction has to be made:
– Differentiate between real data and unused space in
the record.
– This can be done with one of the following:
• Record length indicator
• Delimiter
• Count of fields
59
Record Structure
Header records:
•General information of a file.
•Header record at the beginning of the file to
hold this information.
•Information in the header record:
– Count of no. of records
– Length of data records
– Date and time of the file updated.
– Name of the file
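
A sketch of a header record holding the four items listed above. The struct layout, the keyword names and the "keyword=value|" text encoding are all assumptions made for this example, not the format used on the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical header contents.
struct FileHeader {
    std::uint32_t recordCount;   // count of records in the file
    std::uint32_t recordLength;  // length of each data record in bytes
    std::string lastUpdated;     // date and time of the last update
    std::string fileName;        // name of the file
};

// Encode the header as self-describing text (assumed encoding).
std::string packHeader(const FileHeader& h) {
    std::ostringstream out;
    out << "count=" << h.recordCount << '|'
        << "length=" << h.recordLength << '|'
        << "updated=" << h.lastUpdated << '|'
        << "name=" << h.fileName << '|';
    return out.str();
}
```

A program that opens the file can read this record first and learn the record length before it touches any data record.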
60
Record Structure
• The header record makes the file a self-describing
object
• Any program that accesses the file will know about:
– The file structures used in the file
– This helps in accessing a record
– E.g. header record:

61
Record Structure
• Header record: an example

62
Encapsulating Record I/O
Operation in a single class
• Till now we have done a read/write
operation in
– two steps:
• Read/write to a buffer
• Then buffer to a file
• Here we will use a class that hides the
buffer.
• It looks as though we have read/written
directly with a file.
63
Encapsulating Record I/O
Operation in a single class
• RecordFile is a class that inherits from BufferFile
• BufferFile contains functions to read/write from a
buffer.
• We will use only these functions.

64
Encapsulating Record I/O
Operation in a single class
• Shows how the read/write functions of
BufferFile are used to perform our task of
reading/writing.

65
File Access and File
Organization: A Summary

• File organization depends on:
– What use you want to make of the
file
• Using a file involves:
– File access
– File organization
– Both are linked.
66
File Access and File
Organization: A Summary
• Example:
– Fixed-length records make direct access easier.
– If the documents have variable lengths, fixed-
length records are not a good solution
– The application determines our choice of both
access and organization.
– Hence we need to determine both access and
organization of a file.

67
