
Advanced Database Management System

Unit 1 Comparison between Different Databases


Structure:
1.1 Introduction
Objectives
1.2 Significance of Databases
1.3 Applications of Database System
Personal databases
Two-Tier client/server databases
Multi-tier client/server databases
Enterprise application
1.4 Different Types of DBMS
Based on data model
Based on number of users
Based on number of sites
Based on cost
Based on purpose
1.5 Comparison between Centralised and Distributed Database
1.6 Summary
1.7 Glossary
1.8 Terminal Questions
1.9 Answers

1.1 Introduction
A Database Management System, commonly termed a DBMS, is computer software that manages the organisation of, and access to, data in a database; it is employed for the management of different databases. Organisations and people around the world use DBMSs to manage their valuable data effectively. Here, data refers to information that is recorded and stored on computers. A DBMS implements a database model and provides the structure of the database; it supports both the storage and the recovery of data. The terms database and database management system are sometimes confused: a database is essentially a collection of data stored in a computer system in a planned manner, while a database model is used to organise that data.

Various types of databases exist to support the different needs of users. For a few thousand rupees, you can buy a DBMS for your personal computer, whereas large-scale enterprises need much more complex and costly DBMSs. Many organisations also lease mainframe-based DBMSs.
As there are many types of DBMSs in the market, you ought to be familiar with their fundamental features, as well as their advantages and disadvantages. In this unit, you will learn about the various types of databases, their importance, and their applications. You will also study the advantages and disadvantages of various DBMSs.
Objectives:
After studying this unit, you should be able to:
 describe the significance of databases
 discuss the applications of databases
 elucidate the advantages and disadvantages of different database management systems
 differentiate between DBMS and RDBMS
 compare different types of databases

1.2 Significance of Databases


Generally, data is considered the most significant resource of any organisation. It is gathered and used practically everywhere, from businesses attempting to discover customer patterns from credit card usage, to space agencies attempting to gather data from various planets.
As data is a very significant resource, it requires robust, secure, and easily accessible software that can store and retrieve it quickly. A substantial and consistent database is the solution to these requirements.
An organisation that runs a database system usually includes a person called the Database Administrator (DBA), who is responsible for the operational data. A database administrator is responsible for the following:
 Installation of databases
 Configuration of databases
 Administration of databases
 Upgrade of databases
 Maintaining databases
 Monitoring databases
A DBA possesses detailed technical knowledge and also has the ability to recognise and analyse the needs of management.
A database is very significant for an organisation mainly because of the following reasons:
1. Reduces redundancy: Databases avoid storing the same data in multiple places. In non-database systems, every application maintains its own private files, which frequently causes significant redundancy in the stored data and thus wastes storage space.
2. Security can be imposed: A DBMS ensures appropriate security for the database. Security rules defined in the DBMS govern the use of the data, so the database can be accessed only through a suitable login. Different kinds of restrictions can be set up for each kind of access (insert, retrieve, delete, and so on) to each part of the information in the database; a small SQL sketch of this appears after this list.
3. Providing permission for multiple user interfaces: A DBMS can offer simultaneous execution of numerous parts of the database, and it also manages deadlocks and other conflicts that arise from concurrent access.
4. Maintaining integrity: Integrity of a database means that its data is accurate. Two entries that are supposed to represent the same "information" but differ in value are an example of a lack of integrity, which may arise from redundancy or from inaccurate data. For instance, a record may show that an employee worked 400 hours in a week instead of 40, or may refer to a department that does not exist. Such problems can be prevented by controlling the database in a centralised manner. Integrity constraints (also called business rules) can be defined and applied by the DBMS, and are checked whenever an update operation is carried out (see the sketch after this list).
5. Backup & recovery: A DBMS provides suitable backup in the event of failure, and numerous methods are used for recovering from failures in a DBMS.
6. Standards can be imposed: Standardising the representation of data is important for exchanging data or moving data among systems. The DBMS permits various global standards to be imposed.
7. Providing persistent storage for database objects: The database can be used as a stable store for application objects and database structures; this means the database can store complex objects of a programming language.
8. Application development time decreases: With a database, the effort needed to develop client utility programs decreases significantly, and development time decreases accordingly.
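As a minimal SQL sketch of points 2 and 4 above, the statements below declare one integrity rule and one security rule. The table, column, and user names (EMPLOYEE, HOURS_WORKED, CLERK) are hypothetical, chosen only for illustration:

CREATE TABLE EMPLOYEE
(
EMPNO NUMBER (4) PRIMARY KEY,
ENAME VARCHAR2 (30) NOT NULL,
HOURS_WORKED NUMBER (3) CHECK (HOURS_WORKED BETWEEN 0 AND 60) -- integrity: a value such as 400 is rejected
);

GRANT SELECT, INSERT ON EMPLOYEE TO CLERK; -- security: CLERK may read and insert, but not update or delete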
Self Assessment Questions
1. What is a repository of data, designed to support efficient storage, retrieval, and maintenance of data, called?
a. DBMS
b. ADBMS
c. Database
d. RDBMS
2. A DBMS can support simultaneous use of different portions of a database. (True/False)

1.3 Applications of Database System


Having studied the significance of databases, we will now discuss the applications of database systems. As shown in Figure 1.1, people can interact with the data in a database through various methods. Clients can interact with the database directly by means of the user interface offered by the DBMS and issue commands against the database; these commands are known as queries. The clients can inspect the results, or collect and store them in Excel or Microsoft Word format.


Figure 1.1: Database Environment Constituents

Interacting with the database in this way is known as ad-hoc querying. It requires the client to know a query language; since many clients do not have this degree of knowledge, application programs are the other, more common way of using a database. A tiny example of an ad-hoc query follows.
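An ad-hoc query is simply a statement typed straight into the DBMS interface. A minimal sketch, assuming a hypothetical CUSTOMERS table:

SELECT CUST_NAME, CITY
FROM CUSTOMERS
WHERE CITY = 'Jaipur';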
An application program has two main constituents. A graphical user interface accepts the clients' requests, such as to enter, modify, or delete data, and also provides a way of displaying the data retrieved from the database. Business logic contains the programming logic necessary to carry out the clients' requests. The machine that runs the user interface is called the client, while the machine that runs the DBMS and holds the database is called the database server. Note that the applications and the database need not reside on the same workstation.
To understand database applications more easily, they can be classified into various categories, chosen according to the location of the client and of the database software. We define these categories as below:
 Personal databases
 Two-tier databases
 Multi-tier databases
 Enterprise application
These categories are discussed in detail in the following sections.
1.3.1 Personal databases
These databases are designed to support a single user. They give the user the ability to manage (that is, store, delete, update, and retrieve) a modest quantity of data efficiently. They are mainly used on the following:
 personal computers
 laptops
 PDAs and smart phones
Nowadays these databases are used extensively, as they can significantly enhance individual productivity. However, they have limitations: there is risk involved, and users cannot easily share the data among themselves. For instance, a marketing manager who needs a combined view of client contacts cannot obtain it quickly from the personal databases of all the marketing staff. Personal databases are therefore mostly used in small organisations where the need to share data with others is practically negligible.
1.3.2 Two-Tier client/server databases
A personal database is restricted to a single user. However, there are numerous circumstances in which the members of a workgroup want data to be distributed or shared among them. The most frequent method of sharing data is to create a two-tier client/server database, as shown in Figure 1.2 below.


Figure 1.2: Two-tier Databases

The nodes of a workgroup are computer workstations, linked to each other through a network, which may be a wired or a wireless LAN. Typically, each computer runs an application (the client) that provides the user interface and the business logic through which data is manipulated. The database and the DBMS are stored on the workgroup's central machine, and the members of the workgroup are permitted to use the shared data.
Different members of the group, for example the project manager and the programmers, may have different views of the database that is shared among them. This arrangement overcomes the main objection to personal-computer databases, namely that users cannot share data easily. However, it introduces data-management concerns that do not exist with single-user databases, including data security and data integrity when numerous users try to alter and update the data simultaneously.
1.3.3 Multi-tier client/server databases
The two-tier database architecture has one drawback: the total functionality that must be coded into the application on the client's system can be quite large, because it must include both the business logic and the user interface logic. This means the client computers must be sufficiently powerful to run the application. Another disadvantage is that whenever the business logic or the user interface changes, the application on every client computer has to be modified.
To avoid these disadvantages, most current applications that must support numerous users are built using the multi-tier architecture concept. In most companies, such applications are intended to support a branch or a division, which is usually larger than a workgroup (typically between 25 and 100 individuals). Figure 1.3 below shows an example of a company with several multi-tier database applications:

Figure 1.3: Architecture of Three-Tiered Client/Server Database


In multi-tier databases, the clients' personal computers run the user interface. The business logic required to carry out the business transactions requested by clients resides in the application layer, or Web server layer, which in turn interacts with the database server.
This architecture allows the database to be developed from information-system modules that concentrate on business logic, presentation logic, or both, which improves the performance and maintainability of both the database and the applications.
1.3.4 Enterprise application
This database is designed for use by an entire organisation. It supports organisation-wide processes as well as decision making. Sometimes, particularly for medium to large organisations, a single enterprise database is not considered appropriate, for the reasons given below:
 the performance complexities of huge databases,
 different users having different requirements,
 the difficulty of achieving a single definition of the data for every user of the database
However, the information requirements of various branches and divisions are successfully supported by an enterprise database. Progress in enterprise databases has resulted in the following developments:
 ERP (Enterprise Resource Planning) systems
 Data warehousing
Enterprise Resource Planning (ERP): There is an important connection between ERP systems and database systems: databases store the integrated data needed by ERP applications. Other specialised applications, such as CRM (customer relationship management) systems and SCM (supply chain management) systems, also rely on databases.
Data warehouses let users work with historical data, in order to recognise patterns and trends and to answer strategic business questions. A data warehouse therefore needs data from every branch of the organisation.


Table 1.1 summarises the various kinds of database applications, the typical clients of each, and the typical sizes of the database applications.
Table 1.1: Database Applications Summary

Activity 1
Illustrate the DBMS application in the industries specified below:
(a) E-commerce Industry
(b) Health Industry
(c) Hotel and Tourism

Self Assessment Questions


3. For sharing data among clients, the most frequent way is the creation
of two-tier client/server databases. (True/False)
4. An application program comprises two components, one being the GUI.
Name the other component.
a) Presentation logic
b) Business logic
c) Message logic
d) User interface logic
5. The requirements of information from various branches as well as
divisions are successfully supported by a _________ database.

1.4 Different Types of DBMS


A DBMS (database management system) is a piece of software, together with a group of software applications, used to manage the creation, maintenance, and use of a database. It allows organisations to build databases for different applications, and it permits several users to use the same database simultaneously. A DBMS offers resources for controlling access to data and imposing data integrity; it also allows the database to recover after a crash, with the data restored from backup, and it maintains the security of the database. Database servers are dedicated systems that hold the actual databases and run only the DBMS and its associated software.
Several criteria are used to categorise DBMSs. These are given below:
 based on data model
 based on user
 based on sites
 based on cost
 based on purpose
We will now discuss the types of DBMS based on each of these criteria.
1.4.1 Based on data model
Database systems are based on various data models:
 Hierarchical Model
 Network Model
 Relational Model
 Object-relational Model
 Object Model
This is the most widely accepted criterion for classifying databases, so we will now study in detail the DBMSs based on each of these data models.
(i) Hierarchical data model: Here, data is organised into a tree-like structure in which every node has exactly one parent node; the root node is the exception, having no parent. Note that although a node can have numerous child nodes, it can have only one parent node. Each node in the tree represents a record, and every attribute of a particular record appears below its entity type. Entity types are connected to one another by 1:N mappings.


Advantages
Hierarchical databases include the following advantages:
 Simplicity: The design process is simple, since in many practical circumstances data naturally has a hierarchical relationship, and it is easy to visualise data organised in this manner.
 Security: These databases can enforce varying levels of security.
 Database integrity: Integrity is strongly promoted in these systems by the built-in parent-child structure.
 Relationship handling: These databases are extremely efficient for relationships of the 1:M type.
Disadvantages
Hierarchical databases include the following disadvantages:
 Complication in implementation: Implementing a hierarchical database is quite difficult, since it depends on the physical storage of the data.
 Trouble in management: A hierarchical database is not easy to manage. For instance, moving a piece of data from one position to another requires you to modify every application that accesses that data, which is complicated to do.
 Structural dependence: Relationships are defined rigidly in this database, so if any part of the database structure changes, the programs using it must also change; maintaining the database is therefore a complicated and tedious job.
 Difficulty in programming: Programming this type of database is quite difficult, because programmers must know the physical route followed by the data items.
 Poor portability: These databases offer poor portability, since no standard exists for them.
(ii) Network model: This model was invented by Charles Bachman and was later formulated into a standard specification by the CODASYL (Conference on Data Systems Languages) association in 1969. It is similar to the hierarchical model, but it permits every record to have numerous parent as well as child records, thus generating a network structure. An example of this model is shown in Figure 1.4 below.

Figure 1.4: Network Model

Some of the most commonly used network databases are:
 Turbo IMAGE
 IDMS
 RDM Embedded
 RDM Server
Advantages
Network databases include the following advantages:
 Simplicity: This model is easy to design, because most instances of data relationships naturally occur as the N:M type.
 Relationship handling: It handles relationships well; N:M relationships can accommodate the various complicated relationships found in real data.
 Flexibility: These databases provide more flexibility than hierarchical databases, because data items can be navigated in numerous ways, giving a high degree of flexibility in data access.
 Standards: Worldwide standards have been developed for these databases.


Disadvantages
Network databases include the following disadvantages:
 Complication in implementation: These databases are not easy to implement.
 Problem in management: They are not easy to manage.
 Structural dependence: Because access relies on the navigational paths present in the database at any moment, programs cannot be considered independent of the database structure, and they must be modified whenever the structure changes.
Because of these drawbacks, network databases quickly lost favour with users and were replaced by relational databases.
(iii) Relational model: This model makes use of the mathematical concept of a relation, so you can consider a database to be a group of relations. A relation can be thought of as a table in which each row represents a set of associated data values; a row of the table is called a tuple, and a column header is called an attribute.

A relation schema R, denoted R(A1, ..., An), consists of the relation name R and a list of attributes A1, ..., An. The domain of Ai, abbreviated dom(Ai), is the set of all possible values that the attribute may take. The relation schema describes a relation named R, and the degree of the relation is the number of attributes in its schema. For example, STUDENT(Name, RollNo, Age) is a relation schema of degree 3.

A record holds information about a person or thing of interest, and a field holds data about one particular facet of that person or thing.
Advantages
The relational model includes the following advantages:
 Simplicity: An RDBMS is simple and easy to handle. Users can access a relation through commonly used attributes, and can modify or manipulate relations easily.
 Relationship handling: These databases can handle every type of relationship.


 Flexibility: This model arguably provides more flexibility than the other database models.
 Mathematical foundation: Being based on the idea of relational algebra, the model obeys a well-defined mathematical theory.
Disadvantages
The relational model includes the following disadvantages:
 Hardware expenses: This type of database requires more powerful hardware systems and data storage devices than its predecessors.
 Poor design arising from easy design: This type of database is simple to design, apply, and manage, and the user does not need to understand the complexities of data storage. However, this ease of design often results in a very badly designed DBMS.
(iv) Object-relational DBMS: This type of database is a DBMS that is analogous to a relational database but has an object-oriented flavour, like an OODBMS. Here, classes, objects, and inheritance are directly supported in the database schemas and in the query language, and the data model can be extended with user-defined data types and methods.
This system bridges the gap between conceptual data-modelling techniques, such as the ERD (Entity-Relationship Diagram) and ORM (Object-Relational Mapping), which frequently make use of:
 classes
 inheritance
 relational databases

Relational databases do not support these concepts directly. An ORDBMS also bridges the gap between relational databases and the object-oriented techniques used in languages such as C++, Java, etc.
An ORDBMS permits software developers to integrate their own types and methods, which can then be used in the DBMS. This database thus offers a bridge between relational and object-oriented DBMSs.


Advantages
Object-Relational databases include the following advantages:
 Flexibility: These databases are flexible, although not as flexible as relational databases.
 Reusability: Reusability and sharing follow from this data model. The DBMS server is permitted to carry out functionality centrally, instead of it being coded in every application. If functionality can be embedded in the server, it does not have to be specified in every application that needs it, and every application can share it.
 Managing relationships: This data model exploits the relationships among data to gather related records effortlessly.
 Abstraction: An ORDBMS permits software developers to integrate their own types, and the techniques that apply to them, into the DBMS. ORDBMS technology is intended to let developers raise the level of abstraction at which the problem domain is viewed.
Disadvantages
The object-relational database model includes the following disadvantages:
 Complication in implementation: These databases are difficult and confusing to implement.
 Costly: They carry increased overheads in operation.
(v) Object database: This type of database is also known as an object-oriented database. In simple words, an Object Database Management System (ODBMS) is a combination of database and Object-Oriented Programming (OOP) concepts. In this type of DBMS, database objects appear as programming-language objects in one or more programming languages. An ODBMS extends the programming language with qualities such as:
 concurrency control
 transparently persistent data
 data recovery
 associative queries, etc.

Some of these databases are designed to work closely with object-oriented programming languages such as Java, C++, C#, and Smalltalk; others have their own programming languages.


An OODBMS is usually recommended when the business requires good performance on complicated data.
Advantages
An ODBMS includes the following advantages:
 Relationship handling: These databases can handle every type of relationship.
 Better than other models in navigation: The main reason an ODBMS outperforms other database management systems is that its operations are carried out through navigational interfaces rather than the declarative interfaces used in the other types; the use of pointers gives very efficient navigational access to data.
Disadvantages
An ODBMS includes the following disadvantages:
 Complexity: For general-purpose queries over the same information, pointer-based access becomes slower and more complicated than the relational approach.
 Deficiency of interoperability: An ODBMS lacks numerous tools and features that are standard in the relational world, such as:
 SQL
 reporting tools
 OLAP tools
 backup and recovery standards
 Flaws in query support: Furthermore, unlike the relational model, these databases have no formal mathematical basis, which is the reason for the flaws in their query support. This limitation is offset by the fact that some ODBMSs, for example Matisse, SQL++, etc., completely support SQL and navigational access.
1.4.2 Based on number of users
The next criterion is the number of users the system supports. A single-user system supports just one user at a time and is mainly used with personal computers. Multi-user systems, which include the majority of DBMSs, can support multiple simultaneous users.
1.4.3 Based on number of sites
Another essential classification criterion is the number of sites over which the database is distributed.
A DBMS is centralised if the data is stored at a single computer site. A centralised DBMS can support numerous users, but the DBMS and the database reside entirely at a single site.
In a DDBMS (distributed DBMS), the actual database and the DBMS software are distributed over a number of sites connected by a computer network. Distributed DBMSs may be further divided into homogeneous and heterogeneous systems: a homogeneous DDBMS uses the same DBMS software at every site, whereas a heterogeneous DDBMS uses different DBMS software at different sites.
1.4.4 Based on cost
Another criterion is cost. Free (open-source) DBMSs such as MySQL are supported by third-party vendors offering additional services. Some DBMSs are available as one-month free trial copies, as well as personal versions that may cost about ₹5,000 and permit richer functionality. Some are licensed with restrictions based on the number of simultaneous users or the number of user seats at a site.
Separate single-user versions of databases such as MS Access are sold per copy, or come bundled with a desktop or laptop configuration. Advanced features, such as data warehousing, data mining, and support for more data types, can be obtained by paying a higher price.
1.4.5 Based on purpose
Finally, you can classify a DBMS into the following:
 general-purpose DBMS
 special-purpose DBMS
When the level of performance is the main concern, a special-purpose DBMS is designed and built for a particular application; such a system cannot be used to support other applications without major changes. Many airline reservation and telephone directory systems are special-purpose DBMSs. These fall under the category of OLTP (online transaction processing) systems, which must support a large number of simultaneous transactions without much delay.

Activity 2
Use the different kinds of data models to represent various people in your institution or organisation.

Self Assessment Questions


6. In a _________ schema, we organise data into a structure which
appears as a tree.
7. Network schema provides permission for only 1:1 relationships.
(True/False)
8. In the relational schema, every tuple is divided into fields, which we call _________.
9. Which of the following is not considered as a logical structure of
database?
a) Tree
b) Relational
c) Network
d) Chain
10. The relational model uses some unfamiliar terminology. A tuple is said to be equal to a _________.
11. Logical data structure having 1:M relationship is considered as a:
a) Network
b) Tree
c) Chain
d) Relation

1.5 Comparison between Centralized and Distributed Database


We will now differentiate between centralised and distributed database systems. First, let us define both terms. In a centralised database system, every component of the system resides on a single computer. A distributed database is a database whose data is spread across multiple sites, managed by a distributed DBMS; its storage devices are not all attached to a common central processing unit. The main points of difference between centralised and distributed database systems are shown in Table 1.3 below:
Table 1.3: Differences between Centralized and Distributed DB


Self Assessment Questions


12. It is easy to maintain and update a _________ database.
13. In case of distributed database, data is handled by numerous servers.
(True/False)

1.6 Summary
Let us recapitulate the important concepts discussed in this unit:


 DBMSs (database management systems) were created to overcome the inherent drawbacks of file systems. A DBMS imposes data integrity, eliminates redundancy, and promotes data security.
 An organisation with a database system usually includes a person called the DBA (Database Administrator), who carries the main responsibility for the operational data.
 Database applications can be classified into various categories, based on the location of the client and of the database software. These are termed personal, two-tier, and multi-tier databases.
 DBMSs can be classified according to various criteria: data model, number of users, number of sites, types of access paths, and cost.
 In the relational data model, a database is considered a set of tables, and every table can be stored as an individual file.
 In a centralised DBMS, data is stored at a single computer site. A distributed database is spread across multiple sites and managed by a distributed DBMS.
 There are also various distinctions between a DBMS and an RDBMS. An RDBMS follows all of E.F. Codd's principles and is based on the relational database model; a DBMS need not follow the relational database model.

1.7 Glossary
 Centralised Database: In a centralised DBMS, data is stored at a single computer site.
 CODASYL (Conference on Data Systems Languages): A standard that specifies how data is stored in, and retrieved from, a network database.
 Data Model: A conceptual representation that captures the real meaning of the relationships among data items.
 Database Client: The machine that runs the user interface is called the client.


 Database Server: The machine that runs the DBMS and holds the database is called the database server.
 Database: A repository for all the files of an organisation, structured and integrated so that the files are easy to update and information is easy to retrieve from them.
 Distributed Database Management System: A DDBMS manages a distributed database, that is, a database spread across multiple sites.
 Hierarchical Data Model: In this model, data is organised into a tree-like structure in which every node has one parent node; the root node is the exception, having no parent.
 IMS (Information Management System): Offers the information needed to manage organisations efficiently and effectively; it is used to examine the operational activities taking place in an organisation.
 Network Data Model: This model views the database as a set of linked items connected to one another by links.
 Personal Databases: Databases intended to support a single user.
 Relational Model: This model uses a set of tables to represent data and the relationships among that data. Each table has numerous columns, and every column has a unique name.

1.8 Terminal Questions


1. Explain hierarchical model and network model.
2. Describe a relational model and its characteristics.
3. Explain the significance of database.
4. Discuss the various applications of database.
5. Briefly describe network database model. Also explain the difference
between network database and hierarchical database.
6. What are the various types of database management system? Briefly
explain.
7. Discuss the classification of DBMS based on location of database.
8. Differentiate between centralised and distributed database systems.


9. What do you understand by the term RDBMS? When and by whom was it developed?
10. Enumerate the advantages and disadvantages of RDBMS.

1.9 Answers
Self Assessment Questions
1. c) Database
2. True
3. True
4. b) Business logic
5. Enterprise
6. Hierarchical schema
7. False
8. Domains
9. d) Chain
10. Record (row)
11. b) Tree
12. Centralised
13. True
Terminal Questions
1. In the hierarchical data model, data is organised into a tree-like structure in which every node has one parent node; the root node is the exception, having no parent. The network model permits every record to have numerous parent as well as child records, generating a network structure. Refer Section 1.4 for more details.
2. The relational model uses a set of tables to represent data and the relationships among that data. Each table has numerous columns, and every column has a unique name. Refer Section 1.4 for more details.
3. DBMS imposes data integrity, supports data security, and removes
redundancy. Refer Section 1.2 for more details.
4. Database applications can be classified into three categories, based on
the location of the client (application) and the database software. Refer
Section 1.3 for more details.


5. The network data model views the database as a group of related items associated with each other through links. Refer Section 1.4 for more details.
6. DBMSs can be classified based on various criteria such as data model,
number of sites, number of users, etc. Refer Section 1.4 for more
details.
7. On the basis of location, databases can be of two types: centralised and distributed. Refer Section 1.4 for more details.
8. In a centralised DBMS, data is stored at a single computer site, whereas in a distributed database, the data is spread across multiple sites and managed by a distributed DBMS. Refer Section 1.5 for more details.
9. RDBMS refers to relational database management system. Refer
Section 1.4 for more details.
10. The relational model arguably provides a higher level of flexibility than other database models. Refer Section 1.4 for more details.



Unit 2 RDBMS and SQL


Structure:
2.1 Introduction
Objectives
2.2 Relational Query Languages
2.3 SQL
2.4 Integrity Constraints
Entity integrity
Domain integrity
Referential integrity
2.5 Data Definition Statements
Creating relations in SQL
Adding and deleting tuples
Destroying and altering relations
2.6 Data Manipulation Language
SELECT statement
Subquery
Querying multiple relations
Functions
GROUP BY
Updating the database
2.7 Views
2.8 Embedding SQL Statements
2.9 Transaction Processing
2.10 Dynamic SQL
2.11 Summary
2.12 Glossary
2.13 Terminal Questions
2.14 Answers

2.1 Introduction
In the previous unit, you studied various types of database systems along with their advantages and disadvantages.
The level of interaction with a database depends on its usage: the more intensively users work with the database, the richer the interaction must be. Hence, every database system should provide methods, languages, and groups of software through which users can submit a request, have the request processed, and obtain its output. This unit introduces some of these database query languages and tools.
Now that we are clear about the various types of DBMS, let us start this unit, in which you will learn about query languages and also study SQL features and queries.
Objectives:
After studying this unit, you should be able to:
 create relational database objects using SQL
 create tables and manipulate the data residing in them
 create and manipulate views
 describe transaction processing
 discuss the concept of embedded SQL and dynamic SQL

2.2 Relational Query Languages


Modern RDBMSs support several query languages for user interaction. The two most common query languages available with RDBMSs are SQL (Structured Query Language) and QBE (Query by Example).
 Others are ISBL (Information System Base Language) from the Peterlee Relational Test Vehicle (PRTV) system, and QUEL (Query Language) from INGRES (Interactive Graphics Retrieval System). ISBL is based on relational algebra; QUEL and SQL resemble the tuple calculus; and QBE resembles the domain calculus.
 In this section we focus on QBE; from Section 2.3 onwards you will study SQL in detail.
 Query by Example (QBE): QBE was developed in the mid-1970s at IBM Research, concurrently with the development of SQL. Designed by M.M. Zloof, QBE is a relational database query language and was the first graphical query language. QBE presents a visual representation of tables in which the user gives commands defining what is to be done, examples defining how it is done, and conditions that records must satisfy to be admitted into the processing.


Self Assessment Questions


1. QBE stands for _________.
2. SQL is supported by RDBMS. (True/False)

2.3 SQL
SQL (Structured Query Language) is the standard relational database language, used for the creation, deletion, and modification of database tables and their data.
(Note: SQL keywords are case-insensitive (SELECT, FROM, WHERE, etc.); we have used capitalised words where we want to put emphasis on them. Table names, column names, etc. are case-insensitive on Windows but case-sensitive on UNIX.)

Features: SQL has a very rich set of features, which are summarised in Table 2.1 below:
Table 2.1: SQL Features

The Data Manipulation Language (DML): As the name says, this language is used for manipulating the data stored in database objects. DML uses the SELECT, INSERT, DELETE, and UPDATE commands to query and modify the data.
The Data Definition Language (DDL): This language is used to define the structure of tables. With the CREATE, ALTER, and DROP commands, the structure of a table can be created, modified, or deleted.
Triggers and Complex Integrity Constraints: SQL supports triggers and complex integrity constraints (ICs) that can be applied to the database. Triggers are actions that are run by the DBMS whenever some event related to the database occurs.
Dynamic (Run-time) and Embedded SQL: With dynamic SQL, users can construct and execute queries at run time. With embedded SQL, SQL statements can be placed inside some other host language (such as C or COBOL).
Client-Server Execution and Remote Database Access: These features allow a client program to establish a connection with a server database, and also allow the user to access a remote database.
Transaction Management: SQL commands specify the actions to be taken in order to control the execution of a transaction.
Security: SQL controls access to tables and views, thereby protecting the database.
Advanced Features: SQL provides many further features, such as recursive and decision-support queries and object-oriented features.

Self Assessment Questions


3. The SELECT, INSERT, DELETE and UPDATE commands are used by _________ to modify the data.
4. SQL commands define the actions to be taken to control _________.

2.4 Integrity Constraints


A DBMS maintains data integrity to prevent wrong information from entering the database.
Integrity constraints are conditions defined on the database schema; an integrity constraint restricts the data that can be stored in a database instance. When a database instance satisfies all the integrity constraints defined on the schema, it is known as a legal instance. A DBMS enforces integrity constraints, and therefore permits only legal instances to be stored in the database.
The major relational constraints are domain constraints; key constraints and constraints on nulls; entity integrity; and referential integrity with foreign keys.
2.4.1 Entity integrity
Entity integrity means that every row in a table has a unique identifier, so that each row is distinct from every other row. Entity integrity is enforced by placing a primary key (PK) constraint on particular column(s), which ensures that all values inserted into the column(s) are unique. A PK constraint does not allow duplicate values or null values in the column(s); attempting to enter them results in failure.
The primary key of a relational table uniquely identifies each record in the table. It can either be a normal attribute that is guaranteed to be unique, or it can be generated by the DBMS (such as a globally unique identifier in Microsoft SQL Server). Primary keys may consist of a single attribute or of multiple attributes in combination. An intelligent key uses genuine data as the PK. Only one PK is assigned to a table. A composite PK contains more than one column; we use a composite PK when no single column is unique on its own.
Hence a table can contain only one PK, but a PK can contain more than one column. If we have to enforce uniqueness on more than one column, we place a PK constraint on one column (or combination of columns) and a UNIQUE constraint or IDENTITY property on the other columns that must not contain duplicate values. A minimal sketch follows.
2.4.2 Domain integrity
In database language, a domain is the group of allowed values for a column. (This use of "domain" should not be confused with other kinds of domain, for example an Internet domain or a Windows NT domain.)
Domain integrity is also called 'attribute' integrity; it covers, for example, allowed sizes of values, the right data type, and null status. Domain integrity can be implemented with data types, DEFAULT constraints, FOREIGN KEY constraints, and CHECK constraints. Data types restrict the fields in different ways. A default defines a value to be inserted into a column when none is supplied; a rule defines the acceptable values that may be inserted into a column. Rules and defaults behave like constraints but are not part of the ANSI standard, so their continued use is not encouraged. A small sketch follows.
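A minimal sketch of domain integrity, assuming a hypothetical EMP_HOURS table:

CREATE TABLE EMP_HOURS
(
EMPNO NUMBER (4) NOT NULL, -- null status
HOURS NUMBER (3) DEFAULT 40 -- default value
      CHECK (HOURS BETWEEN 0 AND 60), -- allowed range of values
GRADE CHAR (1) CHECK (GRADE IN ('A', 'B', 'C')) -- allowed set of values
);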
2.4.3 Referential integrity
Referential integrity is formed by the combination of a Primary Key (PK) and a Foreign Key (FK).
Primary key: As explained above, this is a key that uniquely identifies a record through one or more fields of a table, so a particular record can be located without ambiguity.
Foreign key: A foreign key is a column, or a group of columns, in a table (called the 'child' table) that takes its values from the primary key (PK) of another table (called the 'parent' table). To preserve referential integrity, the foreign key in the child table can only take values that exist in the primary key of the parent table. The main aim of referential integrity is to avoid 'orphans': records in the child table that cannot be linked to any record in the parent table.
Implementing referential integrity means that when records undergo insert, delete, and update operations, the relationship between the tables is maintained; the PK-FK combination embodies this referential integrity. An example of a primary key and a foreign key is represented in Figure 2.1.
In the first table, the first column (Account Number) is the PK, and in the same table Branch Name is the FK. The column that is the FK in the first table is the PK of the second table, which connects the two tables.

Figure 2.1: Instance of PK and FK
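The arrangement of Figure 2.1 might be declared as below. This is only a sketch; the table and column names are assumed from the figure rather than prescribed by it:

CREATE TABLE BRANCH
(
BRANCH_NAME VARCHAR2 (20) PRIMARY KEY -- PK of the 'parent' table
);

CREATE TABLE ACCOUNT
(
ACC_NO NUMBER (10) PRIMARY KEY, -- PK of the 'child' table
BRANCH_NAME VARCHAR2 (20) REFERENCES BRANCH (BRANCH_NAME) -- FK: only existing branches allowed
);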

Self Assessment Questions


5. _________ is formed by the combination of a PK and an FK.
6. Domain integrity is also called '_________' integrity.


2.5 Data Definition Statements


The Data Definition Language (DDL) permits users to create and modify database objects. Specifically, DDL statements perform the tasks of creating objects, altering or modifying objects, and dropping or deleting objects.
2.5.1 Creating relations in SQL
We define an SQL relation using the CREATE TABLE command, which creates a table structure. The CREATE TABLE syntax is given below:
CREATE TABLE <tablename>
(
column1 datatype (size) [NULL | NOT NULL],
column2 datatype (size), ...
);
For example, to create the table EMP, enter the following query:
CREATE TABLE EMP
(
EMPNO NUMBER (4) NOT NULL,
ENAME VARCHAR2 (10),
JOB VARCHAR2 (9),
DOJ DATE,
SAL NUMBER (7,2),
COMM NUMBER (7,2),
DEPTNO NUMBER (2) NOT NULL
);
This query creates a table named EMP with seven columns: EMPNO (Employee Number), ENAME (Employee Name), JOB, DOJ (Date of Joining), SAL (Salary), COMM (Commission), and DEPTNO (Department Number).


2.5.2 Adding and deleting tuples


Adding a tuple/record/row: The INSERT command is used to insert a record into a table. The syntax is:
INSERT INTO <tablename>
VALUES (value1, value2, ...);
For example:
INSERT INTO EMP
VALUES (101, 'Nandi', 'President', '17-NOV-88', 5000, NULL, 10);
The above example inserts a record into the EMP table.
To insert values into only the EMPNO, DEPTNO and ENAME fields, enter the following query:
INSERT INTO EMP (EMPNO, DEPTNO, ENAME)
VALUES (101, 29, 'Sujit');
Deleting a tuple: The DELETE command is used to delete rows from a table. The syntax is:
DELETE FROM <tablename>
WHERE <condition>;
For example:
DELETE FROM EMP
WHERE SAL > 1000;
The above example deletes all the employees whose salary is more than 1000. If we omit the WHERE clause, all rows of the table are deleted; a part of a row cannot be deleted.
2.5.3 Destroying and altering relations
The DROP TABLE command deletes all information about the dropped relation from the database. The syntax is DROP TABLE <tablename>.
Example: DROP TABLE details;
There are two forms of the DROP command: CASCADE and RESTRICT.
CASCADE deletes the complete database schema, including tables, domains, and other elements.
RESTRICT deletes the database schema only if it contains no elements; otherwise the command is terminated.
ALTER TABLE command: The ALTER TABLE command adds attributes to an existing relation; the new attribute is assigned a null value in every existing tuple. The syntax is
ALTER TABLE d ADD I D;
where d is an existing relation, I is the added attribute, and D is the domain of the added attribute.
ALTER TABLE d DROP I;
This statement drops an attribute from a relation, where d is an existing relation and I is an attribute of that relation.
Example: ALTER TABLE details ADD Parents_Name VARCHAR (20);
The above example adds the attribute Parents_Name to the table details; a corresponding DROP sketch follows.
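Correspondingly, the attribute could be removed again. A minimal sketch (note that some DBMSs, such as Oracle and SQL Server, require the keyword COLUMN here):

ALTER TABLE details DROP COLUMN Parents_Name;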
Self Assessment Questions
7. There are two types of DROP commands: CASCADE and RESTRICT. (True/False)
8. The _________ command is used to create SQL relations.

2.6 Data Manipulation Language


The Data Manipulation Language (DML) contains commands that manipulate data in existing database schema objects; these statements take effect within the current transaction. You can find these commands in Table 2.2.
Table 2.2: Data Manipulation Language Commands

Command Purpose
DELETE To remove rows from a table
EXPLAIN PLAN To return the execution plan for a SQL statement
INSERT To add new rows to a table
LOCK TABLE To lock a table or view, limiting access to it by other users
SELECT To select data in rows and columns from one or more tables
UPDATE To change data in a table


The SELECT statement is used for retrieving information from the database.
2.6.1 SELECT statement
Apart from information retrieval, this statement provides the full query capability: when a SELECT command runs, the matching information in the table is displayed on the screen.
Syntax: The three common elements of the SELECT command are SELECT, FROM, and WHERE; together they can retrieve information from one or more tables. The syntax is:
SELECT <column_list>
FROM <table_list>
WHERE <search_criteria>
where
 <column_list> defines the list of attributes whose values are to be retrieved
 <table_list> defines the list of relation names
 <search_criteria> defines the conditional expression that identifies the tuples to be retrieved
In SQL, basic logical comparison operators are used in the WHERE clause. The comparison operators and their meanings are given in Table 2.3:
Table 2.3: Logical Comparison Operators and their Meaning

Operator Meaning
= equal to
> greater than
>= greater than equal to
< less than
<= less than equal to
<> not equal to
!= not equal to
!> not greater than
!< not less than
() order of precedence

Let us first begin with a very basic SQL query.


Example: Assume a table named EMPLOYEE:


EMPNO ENAME DESIGNATION DEPTNO PAY INCENTIVES

1821 JOHN PRESIDENT 1 60000 8000

1858 AINA MANAGER 3 30000 6000

1875 KRIPSI MANAGER 1 20000 4000

1877 ARICA MANAGER 2 15000 1000

SELECT EMPNO, ENAME, DEPTNO


FROM EMPLOYEE
WHERE DEPTNO =2;
This query will display three columns, i.e., EMPNO, ENAME and DEPTNO, of
all rows of the EMPLOYEE table whose DEPTNO is 2.
EMPNO ENAME DEPTNO

1877 ARICA 2

Example:
SELECT *
FROM EMPLOYEE
WHERE DESIGNATION = ‘MANAGER’;
This query will display all six columns, i.e., EMPNO, ENAME, DESIGNATION,
DEPTNO, PAY and INCENTIVES, of all rows of the EMPLOYEE table whose
DESIGNATION stores MANAGER.
EMPNO ENAME DESIGNATION DEPTNO PAY INCENTIVES

1858 AINA MANAGER 3 30000 6000

1875 KRIPSI MANAGER 1 20000 4000

1877 ARICA MANAGER 2 15000 1000

Note: An asterisk (*) is used to retrieve all columns from the table.
The complete syntax for the SELECT statement is as follows:
SELECT [ALL | DISTINCT] [TOP n [PERCENT] [WITH TIES]]
select_list
[INTO new_table]
[FROM table_sources]
[WHERE search_condition]
[GROUP BY [ALL] group_by_expression [,...n]
[WITH {CUBE | ROLLUP}]]
[HAVING search_condition]
[ORDER BY {column_name [ASC | DESC]} [,...n]]
[COMPUTE {{AVG | COUNT | MAX | MIN | SUM} (expression)} [,...n]
[BY expression [,...n]]]
[FOR BROWSE]
[OPTION (query_hint [,...n])]
2.6.2 Subquery
With the help of the WHERE and HAVING clauses it is possible to embed
one SQL statement into another. In this situation the inner query is known
as a subquery and the entire SELECT statement is known as a nested query.
The structure is:
SELECT “column_name1”
FROM “table_name1”
WHERE “column_name2” [Comparison Operator]
(SELECT “column_name3”
FROM “table_name2”
WHERE [Condition])
Example: Take the table EMPLOYEE mentioned above
EMPLOYEE
EMPNO ENAME DESIGNATION DEPTNO PAY INCENTIVES

1821 JOHN PRESIDENT 1 60000 8000

1858 AINA MANAGER 3 30000 6000

1875 KRIPSI MANAGER 1 20000 4000

1877 ARICA MANAGER 2 15000 1000


Display the employees whose DEPTNO is the same as that of employee 1821.
SELECT ENAME, DEPTNO
FROM EMPLOYEE
WHERE DEPTNO =
(SELECT DEPTNO
FROM EMPLOYEE
WHERE EMPNO = 1821);
In the example above, the inner query is executed first, and its result is then
used by the outer query.
Result:
ENAME DEPTNO

JOHN 1

KRIPSI 1

2.6.3 Querying multiple relations
SQL has various set operators, for instance IN, ANY, ALL, EXISTS,
NOT EXISTS, UNION, MINUS and INTERSECT. These operators are used
for tests such as:
• membership of a value in a set of values, or
• comparison of a value with the values in a set, or
• membership of a tuple in a set of tuples.
Example: Make a list of all employees working for a department located in
NEW YORK, given the following EMP and DEPT tables:
EMP
EMPNO ENAME DESIGNATION DEPTNO PAY
1821 JOHN President 1 60000
1858 AINA Manager 3 30000
1875 KRIPSI Manager 1 20000
1877 ARICA Manager 2 15000
DEPT
DEPTNO DEPTNAME LOCATION
1 Accounting New York
2 Research Dallas
3 Sales Chicago
4 Operations Boston

SELECT * FROM EMP WHERE DEPTNO IN
(SELECT DEPTNO FROM DEPT WHERE LOCATION = 'NEW YORK');


Result:
EMPNO ENAME DESIGNATION DEPTNO PAY
1821 JOHN President 1 60000
1875 KRIPSI Manager 1 20000

2.6.4 Functions
A subprogram that returns a value is known as a function. SQL supports
various aggregate functions, shown below.
(a) COUNT: The COUNT function takes a column name and returns the
count of tuples in that column. When the DISTINCT keyword is used, it
returns only the count of distinct values of the column. If neither a
column name nor DISTINCT is used, COUNT (*) returns the count of all
tuples, including duplicates.
Example: Write a query to list the number of employees in the company
from the table EMPLOYEE.
SELECT COUNT (*)
FROM EMPLOYEE
(b) SUM: The SUM function is written with a column name and gives the sum
of all tuples present in that column.
(c) AVG: The AVG (average) function is written with a column name and
returns the average value of that column.
(d) MAX: The MAX (maximum value) function, written with a column name,
returns the maximum value present in that column.
(e) MIN: The MIN (minimum value) function, written with a column name,
returns the minimum value present in that column.
Example of queries based on aggregate functions:
Find the sum of salaries of all the employees and also the minimum,
maximum and average salary.
Solution:
SELECT SUM (E.ESAL) AS SUM_SALARY,
MAX (E.ESAL) AS MAX_SALARY,
MIN (E.ESAL) AS MIN_SALARY,
AVG (E.ESAL) AS AVERAGE_SALARY
FROM EMPLOYEE E
This query calculates the total, minimum, maximum and average salaries
and also renames the result columns. (The DISTINCT keyword may
optionally be applied inside AVG to average only distinct salary values.)
2.6.5 GROUP BY
GROUP BY clause is utilised with the group functions for retrieving the data
which is grouped according to one or more columns.
Example: Calculate the total salary spent on each department. What would
be the query?
SELECT DEPT, SUM (SALARY)
FROM EMPLOYEE
GROUP BY DEPT;
The output would be like:
dept salary
---------------- --------------
Electrical 25000
Electronics 55000
Aeronautics 35000
InfoTech 30000
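Groups produced by GROUP BY can themselves be filtered with the HAVING clause seen in the complete SELECT syntax earlier. As an illustrative sketch against the same employee table (column names as above), the following lists only departments whose total salary exceeds 30000:
SELECT DEPT, SUM (SALARY)
FROM EMPLOYEE
GROUP BY DEPT
HAVING SUM (SALARY) > 30000;
From the output above, only the Electronics and Aeronautics groups would satisfy this condition.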

2.6.6 Updating the database
The UPDATE command is used for updating one or more values in selected
tuples without replacing the entire tuple. The syntax is
UPDATE table_name SET attribute = newvalue WHERE condition;
Suppose we wish to change the house name of the student 'Meenu' stored
in the relation ST_DATA. The following statement will serve the purpose.
UPDATE ST_DATA
SET ST_HNAME = 'pranavam'
WHERE ST_NAME = 'meenu';
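Several attributes can also be changed in a single statement by separating the assignments with commas; a minimal sketch, assuming a hypothetical ST_CITY attribute in the same relation:
UPDATE ST_DATA
SET ST_HNAME = 'pranavam', ST_CITY = 'Kochi'
WHERE ST_NAME = 'meenu';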

Self Assessment Questions


9. With the help of WHERE and _________ commands it is possible to
embed a SQL statement into another.
10. It is not possible to query multiple relations in SQL. (True/ False)

Activity 1
Generally, there are numerous ways to specify the same query in SQL. In
your opinion, what are the main advantages and disadvantages of this
flexibility?

2.7 Views
A view is a subschema in which logical tables are generated from one or
more base tables. A view is similar to a window through which the user can
see the information stored in tables. A view is stored as a query, as it does
not contain its own data; during query execution its contents are taken from
the underlying tables. When the table content gets modified or changed,
the view changes dynamically.
The syntax to create a view is given below.
CREATE VIEW <view name>
AS <query>;
If a view is defined on a single table and its query has no GROUP BY clause
and no DISTINCT clause, then the user can UPDATE and DELETE rows
through the view. However, if the query has columns defined by expressions,
the user cannot INSERT rows through it.
Example: In order to create a view of the EMP table named DEPT20, to show
the employees in department 20 and their annual salary, use the following
command.
CREATE VIEW DEPT20
AS SELECT ENAME, SAL * 12 FROM EMP WHERE DEPTNO = 20;
Once the VIEW is created, it can be treated like any other table. Thus the
following is a valid command.
SELECT * FROM DEPT20;
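A view that is no longer needed can be removed without affecting the base table; for the view created above:
DROP VIEW DEPT20;
Only the stored query is deleted; the rows of the EMP table remain untouched.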


Self Assessment Questions


11. A _________ is a subschema in which logical tables are generated
from more than one base table.
12. During the query execution contents are taken from other tables.
(True/False)

2.8 Embedding SQL Statements
SQL statements can be embedded into various programming languages
such as C, COBOL, Pascal, Fortran, etc. The host language is the language
in which the SQL queries are embedded; therefore C, FORTRAN, Pascal,
etc. are host languages. The SQL structure which is embedded in the host
language is termed embedded SQL. The programmers can thus make use
of the various SQL commands to access and update any data stored in the
database.
The use of embedded statements makes it easier to make any amendments
in the database. It also greatly enhances the programmer's capability to
modify the database. The database system is responsible for all query
execution.
The database then returns the result (one tuple at a time) to the program.
Before compiling the program, the embedded SQL statements are processed
by a special pre-processor. To allow the embedded SQL program to be
processed at runtime, these statements are replaced with declarations and
procedure calls of the host language. After doing so, the resultant program is
sent for compilation. For the pre-processor to easily recognise the embedded
SQL statements, you may use the EXEC SQL statement. It has the
following syntax:
EXEC SQL <embedded SQL statement> END-EXEC
The syntax given above is a generalised form; however, the syntax may
differ somewhat depending upon the host language for which it is being
used.
Declaring Variables and Exceptions: SQL INCLUDE can be used in the
host program to determine the place for inserting the special variables
(variables which are used for communication between the database and the
program) by the pre-processor. Host language variables can also be used
inside the embedded SQL statements. It is a good practice to prepend a
colon to the host variables to differentiate them from other variables
used in SQL. The declare cursor statement is used for writing an embedded
SQL query within a host program. It does not run the query; a separate
command is used to fetch the result of the embedded query.
Let us take an example of banking schema. Suppose you have a host-
language variable termed as “amount”, and you want to determine the
names and residing cities of all the bank customers who currently have
balance more than a particular amount in any of their accounts. The query
for finding this can be written as shown below:
EXEC SQL
declare c cursor for
select customer-name, customer-city
from deposit, customer
where deposit.customer-name = customer.customer-name and
deposit.balance > : amount
END-EXEC
The variable c that is used in the above query statement is termed as the
‘cursor’. This cursor is used for identifying a query in an open statement and
also helps in query evaluation.
This cursor variable is also used in the fetch statement. It places the values
of a tuple/row in the host language variable. Given below is an example of
this.

EXEC SQL open c END-EXEC

When any error occurs in the execution of an SQL query, the error report is
stored inside special variables called SQL communication area (SQLCA)
variables. The declarations for the SQLCA variables are contained inside
the SQL INCLUDE statement.
Fetch Statement
A sequence of fetch statements is used to make the tuples of the result
available to the program. One host language variable is required for each
attribute of the result relation. Therefore, in the banking schema example,
we require two separate variables, i.e., one for storing the customer name
and the other for storing the customer's city. Let us assume we take a
variable en for storing the customer name and cc for storing the customer
city. Then a tuple of the result relation can be obtained by using the following
statement:
EXEC SQL fetch c into :en, :cc END-EXEC
After this the programmer can modify the values of the two variables en and
cc by using the host language commands and features.
Close statement
The close statement is another embedded SQL statement, which is used for
deleting the temporary relation that stores the query result. Given below is
the use of close statement in our example:

EXEC SQL close c END-EXEC

Embedded SQL statements for database modifications
The embedded SQL statements which are used for database modification,
such as UPDATE, INSERT and DELETE, return no result. Therefore they
are simple and easy to use. A database-modification statement in embedded
SQL has the following syntax:
EXEC SQL <any valid update, insert, or delete> END-EXEC
An SQL database modification expression may also contain host-language
variables, which are preceded by a colon. In case of an error during
statement execution, SQLCA comes into the picture.
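As a hedged sketch on the same banking schema (host variables amount and name assumed to be declared in the host program), an embedded modification might look like:
EXEC SQL
update deposit
set balance = balance + :amount
where customer-name = :name
END-EXEC
No cursor is required here, since the statement returns no tuples to the program.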
Self Assessment Questions
13. To recognise embedded SQL requests to the pre-processor, we use
the _________ statement.
14. It is a good practice to append a colon before the host variables to
differentiate them from other variables used in SQL. (True/False)

2.9 Transaction Processing
The logical unit of database processing is defined by the mechanism
provided by transaction processing. Transaction processing systems consist
of immense databases and lakhs of users concurrently executing database
transactions. A transaction is a logical unit of data manipulation tasks
wherein either all the component tasks must be completed or none of them
is executed, in order to keep the database consistent. When many


transactions proceed in the database environment, it is imperative that
strict control is applied on them, failing which the consistency of the
database cannot be ensured.
ACID Properties: Unwanted inconsistencies can easily occur in the
database, particularly when various transactions are executing
simultaneously. The term ACID defines those properties that must be
associated with transactions so that the reliability of the database is
assured. The term ACID, when expanded, reads as follows:
A Atomicity
C Consistency
I Isolation
D Durability
• Atomicity: A transaction usually includes various database operations.
This property of a transaction makes sure that either every operation is
executed successfully or none of them is executed at all.
• Consistency: This property requires that the database integrity rules
must be obeyed properly.
• Isolation: In a multi-transaction environment, various transactions may
be carried out simultaneously on a single database. This property
provides assurance that all transactions are executed independently.
• Durability: When a transaction is completed successfully, this property
makes sure that the changes performed in the database are saved in
the physical database.
Transaction support in SQL
SQL offers concurrency control for the execution of a transaction via a
Data Control Language (SQL DCL). When a transaction begins, we use the
statement BEGIN TRANSACTION offered by SQL DCL, whereas when a
transaction ends, we use the statement END TRANSACTION.
There are two statements provided by SQL that make the process of
concurrent transaction control easy.
• COMMIT: On the execution of this statement, every modification done
by the related transaction until now is made permanent.
• ROLLBACK: On the execution of this statement, every change
performed since the preceding COMMIT statement is discarded.
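As a hedged sketch (a transfer between two rows of a hypothetical ACCOUNTS table), a transaction built from these statements could look like:
BEGIN TRANSACTION;
UPDATE ACCOUNTS SET BALANCE = BALANCE - 500 WHERE ACC_ID = 994;
UPDATE ACCOUNTS SET BALANCE = BALANCE + 500 WHERE ACC_ID = 340;
COMMIT;
If either UPDATE fails, issuing ROLLBACK instead of COMMIT discards both changes, which is exactly the atomicity property described above.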


There are some conditions under which transactions may occur. These
conditions are shown in Table 2.4 below:
Table 2.4: Conditions into which Transactions may occur
Dirty read: This condition arises when a transaction reads data written by a
concurrent uncommitted transaction.
Non-repeatable read: This condition is caused by a transaction which reads
data again and finds that the data has been modified by the committed write
operation of some other transaction.
Phantom read: This condition arises when a transaction executes a query
again that it had previously executed and gets rows different from what it
got earlier.

Depending upon the conditions given above, some levels of transaction
isolation are defined by SQL. These levels are discussed below:
• Read uncommitted isolation: Here transactions are permitted to
perform dirty, non-repeatable and phantom reads.
• Read committed isolation: In this level, when a transaction executes,
a SELECT query sees only the data committed before the beginning of
the query.
• Repeatable read: This level does not allow dirty and non-repeatable
reads. It permits only phantom reads.
• Serialisable isolation: Of all the levels of isolation, this level is
considered the most rigid one. Here transactions are forced to execute
in a manner equivalent to sequential execution; thus a transaction can
effectively start only after the completion of existing transactions. As
serialisation failures can occur often at this level, the application must be
prepared for the withdrawal (rollback) of a transaction.
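In many SQL implementations, the isolation level is selected with a SET TRANSACTION statement; a hedged sketch using the hypothetical ACCOUNTS table from before:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT BALANCE FROM ACCOUNTS WHERE ACC_ID = 994;
COMMIT;
A weaker level such as READ COMMITTED can be substituted where non-repeatable or phantom reads are acceptable.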
Self Assessment Questions
15. SQL offers _________ statements that make easy the process of
concurrent transaction control.
16. In transaction processing, the integrity rules of a database are
maintained by _________ property.


Activity 2
Create a list of all Transaction Control commands in SQL and explain
them with their uses.

2.10 Dynamic SQL
Dynamic SQL permits SQL queries to be created and submitted dynamically
at run time. Embedded SQL statements, in contrast, must be entirely present
at compile time and are processed by the embedded SQL pre-processor.
Dynamic SQL is used for creating SQL queries as strings at run time,
depending on the input from the users. Such queries can either be executed
immediately or prepared for later use.
Below example shows the use of Dynamic SQL in C program
char * sqlprog = “UPDATE account SET balance = balance * 1.05
WHERE account-number=?”
EXEC SQL PREPARE dynprog from: sqlprog;
char account [10] = “A-101”;
EXEC SQL EXECUTE dynprog using: account;
A "?" denotes a placeholder for a value in a dynamic SQL program query.
PREPARE and EXECUTE are two important commands, illustrated below:
char c_sqlstring[] = {"DELETE FROM Sailors WHERE rating>5"};
EXEC SQL PREPARE readytogo FROM :c_sqlstring;
EXEC SQL EXECUTE readytogo;
This example shows the declaration of the C variable "c_sqlstring" and its
initialisation to the string representation of an SQL command in the first
statement. The second statement results in this string being parsed and
compiled as an SQL command, with the resulting executable bound to the
SQL variable readytogo. (Since readytogo is an SQL variable, just like a
cursor name, it is not prefixed by a colon.) The third statement executes the
command.


Self Assessment Questions


17. _________ permits to create and submit SQL queries dynamically or
run time
a) Miscellaneous SQL
b) Dynamic SQL
c) Data Definition Language
d) SQL Preprocessor
18. Using dynamic SQL, programs cannot create SQL queries as strings at
run time. (True/ False)

2.11 Summary
Let us recapitulate the important concepts discussed in this unit:
• SQL and QBE are the main types of relational query languages.
• DBMS maintains data integrity to prevent wrong information from
entering the database.
• A DBMS implements integrity constraints; therefore it permits only legal
instances to be stored in the database.
• A PK which does not contain genuine data is known as a
'surrogate/alternate key'.
• A subquery is simply a query within another query.
• SQL supports various functions such as MAX, MIN, AVG, COUNT, etc.
• Transaction Control commands manage changes made by Data
Manipulation Language commands.
• Dynamic SQL permits SQL queries to be created and submitted
dynamically at run time.

2.12 Glossary
• DDL: Data Definition Language
• DML: Data Manipulation Language
• Domain constraints: The set of all the values that an attribute can
attain.
• Dynamic SQL: Dynamic SQL allows a query to be compiled at run-time.
• Embedded SQL: Embedded SQL allows SQL code to be used in a host
language, i.e., a general programming language such as C, COBOL,
Pascal or Fortran.


• ISB: Information Systems Base Language
• PRTV: Peterlee Relational Test Vehicle
• QBE: Query-By-Example, a relational data manipulation language
• QUEL: QUEry Language
• INGRES: INteractive Graphics and REtrieval System
• View: A customised presentation of the data from one or more tables.

2.13 Terminal Questions


1. Explain SQL and its features.
2. Explain with examples different SQL commands used for creating and
deleting relations.
3. What are the three basic components of select statement? Explain with
an example.
4. What are the uses of Insert, Delete and Update commands?
5. What is the function of Create, Alter commands?
6. What do you understand by DDL? Make a list of commands used in
DDL.
7. Write a short note on ACID properties of transaction model.
8. What is primary key and candidate key?
9. Write a short note on Dynamic SQL.

2.14 Answers
Self Assessment Questions
1. Query by example
2. False
3. DML
4. Transaction Execution
5. Referential Integrity
6. Attribute
7. True
8. Create
9. HAVING
10. False
11. View
12. True
13. EXEC SQL
14. True


15. Two
16. Consistency
17. Dynamic SQL
18. False
Terminal Questions
1. SQL refers to Structured Query language. Refer Section 2.3 for more
details.
2. Relations can be created by use of Create command. Refer Section 2.5
for more details.
3. The three basic components of select statement are SELECT, FROM
and WHERE. Refer Section 2.6 for more details.
4. These commands are the DML commands. Refer Section 2.6 for more
details.
5. Create and Alter commands are used to create and alter database
objects. Refer Section 2.5 for more details.
6. DDL refer to data definition language. Refer Section 2.5 for more details.
7. Every transaction must follow ACID property. Refer Section 2.9 for more
details.
8. Primary key is used to uniquely identify a row. Refer Section 2.4 for
more details.
9. Dynamic SQL allows a query to be constructed (and executed) at run-
time. Refer Section 2.10 for more details.

References:
 Peter Rob, Carlos Coronel, "Database Systems: Design,
Implementation, and Management", (7th Ed.), Thomson Learning
 Silberschatz, Korth, Sudarshan, "Database System Concepts", (4th Ed.),
McGraw-Hill
 Elmasri, Navathe, "Fundamentals of Database Systems", (3rd Ed.),
Pearson Education Asia
E-references:
 http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions001.
htm
 http://msdn.microsoft.com/en-us/library/windows/desktop/
ms714570%28v=vs.85%29.aspx
 http://beginner-sql-tutorial.com/sql-commands.htm

Unit 3 Normalisation
Structure:
3.1 Introduction
Objectives
3.2 Functional Dependency
3.3 Anomalies in a Database
Redundancy
Inconsistency
Update anomalies
3.4 The Normalisation Process
First normal form
Second normal form
Third Normal form
Boyce-Codd normal form
Fourth normal form
Fifth normal form
3.5 Normalisation and Database Design
3.6 Denormalisation
3.7 Summary
3.8 Glossary
3.9 Terminal Questions
3.10 Answers
3.11 References

3.1 Introduction
The basic objective of relational database design is to formulate a basic set
of relational schemas which helps the user to store information in the
database without any redundancy and anomaly. We basically strive to
develop a database from which information can be extracted effortlessly.
Normalisation is one way to achieve this aim of designing relational
schemas which are efficient in performance.
Relational schemas which contain the design anomalies are decomposed to
convert them into various normal forms. This procedure of converting the
relational databases into normal forms is referred to as normalisation which


we will cover in this unit. Normalisation technique plays a vital role in
designing good and efficient relational databases.
In Unit 2, you studied various aspects of relational database systems and
SQL. In this unit, you will study problems that arise due to bad designs and
ways to achieve good designs using normalisation techniques.
Objectives:
After studying this unit, you should be able to:
• identify the functional dependencies which may exist among relations
• recall and describe various anomalies in a database
• explain the concept of schema refinement through normalisation
• discuss the relationship between normalisation and database design
• describe the need and techniques of denormalisation

3.2 Functional Dependency
Functional dependency (FD) is the most important component of
normalisation.
Two attributes A and B in any relation R are said to possess a functional
dependency (FD) if for each distinct value of A, there is only one value of
attribute B associated with it. An FD is symbolically represented as
A → B
This denotes that attribute A functionally determines attribute B, or attribute
B is functionally dependent on attribute A.
Note that if B is functionally dependent on A, it is not necessary that A is
also functionally dependent on B.
Let us understand this with the help of few examples given below.
Example: Let us take an example of Relation Sample (P, Q, R) as shown in
Table 3.1:


Table 3.1: Sample (P, Q, R)
P Q R
p1 q1 r1
p2 q2 r2
p1 q1 r2
p1 q1 r3
In this relation Sample (P, Q, R), t[P] symbolises the tuple variable of the
attribute P. This relation contains a functional dependency between P and Q,
where Q is functionally dependent on P. It can be represented as
P → Q
Here, whenever two tuples agree on P they also agree on Q, i.e., for every
value of P there exists a unique value of Q. But there exists no functional
dependency between P and R, as for each value of P there is not a unique
value of R. Similarly there is no FD between Q and R in the above relation.
Example: Table 3.2 below shows functional dependency for an Employee
Relation
Table 3.2: Functional Dependency

ID Name Dept No Sal Mgr

131 Ram 20 10000 134


132 Kiran 20 7000 136
133 Rajesh 20 5000 136
134 Padma 10 20000
135 Devi 30 3000 137
136 Satish 20 6000
137 V.V.Rao 30 10000


Non-key attributes depend on the PK (primary key) attribute.
In this relation, Name, DeptNo, Sal and Mgr are each functionally dependent
on ID (employee Id). Thus, ID → Name, ID → DeptNo, ID → Sal and
ID → Mgr.

Example: Consider another relation where every vehicle owner has a
license and each license contains a unique license number. A license
number uniquely determines a distinct owner; that is, a particular licence
number is capable of uniquely determining the identity of the owner.
Thus we can say that a functional dependency exists between the two
attributes license_id and license_owner. This functional dependency is
shown symbolically below:
license_id → license_owner
This means that license_owner is functionally dependent on license_id,
or we can say that license_id functionally determines license_owner.
Again you must note that the converse of the above functional dependency
need not necessarily hold true. In some cases a person can possess two
types of vehicles, such as a two wheeler and a four wheeler, and then the
owner will possess two licenses.
Functional dependencies are useful in refining the schema. By using
functional dependencies, a relation can be replaced with smaller relations.
Decomposition of a relational schema R consists of replacement of the
relation schema by two or more relational schemas, each of which contains
a subset of the attributes of R. There are two basic objectives of
decomposing a particular relation. These are:
• To reduce data redundancy in the relational schema.
• To retain the ability to recreate the original relation without leaving out
tuples or adding new tuples.
This method of recreating the original table from the decomposed tables is
called a join.
Decomposition is termed lossless join decomposition when the parts of the
table can be joined back again without extra tuples being added to the
database relation. In case the re-joining of the disjoined relations results in
extra tuples, the join is termed a lossy join, because it loses information
with the additional tuples.
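As a hedged illustration with hypothetical relations, suppose R(A, B, C) with the FD A → B is decomposed into R1(A, B) and R2(A, C); the original can then be recreated with a join on the common attribute:
SELECT R1.A, R1.B, R2.C
FROM R1, R2
WHERE R1.A = R2.A;
If this query returns exactly the tuples of the original R, the decomposition is lossless; any extra tuples would make it a lossy join.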
Self Assessment Questions
1. Decomposition helps to reduce data redundancy. (True/False)
2. Functional dependencies can be used to refine the _________.

Activity 1
Explain Armstrong’s axioms for functional dependencies. You may take
help of internet.

3.3 Anomalies in a Database


Many a time a database is not designed appropriately, which results in
different types of anomalies. These database anomalies diminish the
performance of the database. Therefore we need to remove these
anomalies so as to design a good database. Let us first study some of
the most important anomalies, discussed below:
3.3.1 Redundancy
Redundancy is the most common type of database anomaly. If the data
values stored in a relation are repeated, then the database is said to
possess redundancy. A database containing multiple copies of the same
data results in wastage of expensive storage space. It also leads to various
other types of anomalies.
Let us take the example of the relational schema Sales_Persons (SP). This
relation contains the details of sales persons. The relational schema (SP),
as an example, is designed in the following manner:
SP(SPId, SPName, SPAge, MGRId, MGRName, MGRAge, SPLocation,
SPSalary) where the attributes mean as follows:
SPId: Sales person unique identity number
SPName: Sales person’s name
SPAge: Sales person’s age
MGRId: Manager’s unique identity number
MGRName: Manager’s name
MGRAge: Manager’s Age
SPLocation: Sales person’s location
SPSalary: Sales person’s salary


One example of this relational schema is shown in Table 3.3.


Table 3.3: An Example of Relational Schema SP
Id Name Age ManagerId MName MAge Location Pay
11 Ramkumar 32 21 Shammi 35 Hisar 4000
21 Shammi 35 44 Rajan 48 Bhiwani 6000
13 Sam 34 21 Shammi 35 Hisar 3400
44 Rajan 48 Null Null Null Null Null
50 Sunny 33 44 Rajan 48 Bhiwani 3000
26 Raj 29 13 Sam 34 Hisar 2500
47 Harry 32 44 Rajan 48 Karnal 4000

Note that the above table contains an anomaly. Observe that 'Rajan' and his
age appear thrice in Table 3.3. Imagine a large relation containing
thousands of records. If in such a table even 100 records are duplicated,
this would result in the wastage of a lot of storage space as well as lower
the speed of query execution. This repetition of database values is termed
redundancy.
3.3.2 Inconsistency
A poorly designed database may also contain inconsistency. It is one of the
most troublesome database anomalies. If there is any type of disagreement
between data items in a database, then it is referred to as inconsistency.
Let us understand this with the help of the above relation SP. Suppose we
want to modify the age of 'Shammi' from 35 to 36. In such a situation this
needs to be updated at all the places, in whichever record 'Shammi' appears
in the SPName column. If we do not do so, it will result in inconsistency in
the database, as at some places Shammi's age will be shown as 35 years
while at other places it would be 36 years in the same database.
This data inconsistency problem is quite common in situations where there
is redundancy in the database.
Therefore database designers strive to develop a database which contains
very less redundancy. Their primary objective is to keep the redundancy
under control rather than eliminating it completely.


3.3.3 Update anomalies
When a person tries to update a database containing redundancy, it can
result in update anomalies. There are mainly three categories of update
anomalies, discussed below.
1. Insertion Anomalies: We will again illustrate this with the help of
foregoing table SP. Suppose you want to add a new sales person ‘Ajay’
to the database who has recently joined the company but has not been
assigned any manager yet.
In such a scenario the available information with you is SPId, SPName,
SPAge, SPLocation and SPSalary. You will not have any values for
attributes MGRId, MGRName and MGRAge as the sales person has not
been assigned any manager. Therefore you will assign Null to these
attributes.
But doing so gives rise to another problem: the interpretation of the Null.
Different people may interpret Null differently. By seeing the Null in the
attribute, a data entry operator may feel that the value was unavailable,
but some other person might think that the attribute is not applicable to
this entry.
It is not possible to compare Null values, and this creates problems
wherever searching and sorting must be done. One of the design goals is,
therefore, to reduce the use of Null to a minimum.
Now, let us suppose 'Prakash' joins as a manager. If there is no sales
person yet associated with him, then the question arises as to where this
information should be stored. If we insert a record with all other values as
Null, then there is also the problem that one of these attributes is part of
the primary key of the table, and it is not possible to insert Null into any key
attribute. Therefore this situation gives rise to an insertion anomaly in the
relation.
2. Deletion Anomalies: Deletion is another type of update anomaly, which
occurs when inconsistencies arise due to the removal of records from a
relation. Let us consider that 'Shammi' leaves the organisation.
Therefore all the records relating to 'Shammi' need to be deleted
from the database table. But suppose that, accidentally, the record of
sales person 'Sunny' is also deleted along with 'Shammi's' records.


All the information regarding sales person 'Sunny' is lost. Such a
condition gives rise to a deletion anomaly, in which the deletion of one or
more records unintentionally deletes some other data from the database.
3. Modification Anomalies: As explained earlier, when an attribute's
value for a particular record needs to be modified, the change must
be made in all the occurrences of the record; otherwise, it results in a
modification anomaly.
Let us suppose that you want to update the age of 'Rajan' to 49 years.
You then need to update this at all the places where 'Rajan' appears in
the MName column. If by mistake even one occurrence is missed,
inconsistency will arise.
Remember, a good quality database design will be free from any such
anomalies. Clearly, for the above mentioned reasons, the sales person
SP table cannot be considered a good database design.
Self Assessment Questions
3. Which of the following is referred when there is a disagreement
between data items in a database?
a) Redundancy
b) Inconsistency
c) Anomaly
d) Normalisation
4. When the data values are stored repeatedly in multiple copies in the
database, it is known as ________________ .

3.4 The Normalisation Process
Normalisation comprises various sets of rules which are used to make sure
that the database relations are fully normalised, by listing the functional
dependencies and decomposing the relations into smaller, efficient tables.
Normalisation primarily helps to:
1. eliminate data maintenance anomalies
2. minimise database redundancy
3. eliminate data inconsistency
Normalisation technique is established on the idea of normal forms. A table
is said to be in a specific normal form if it fulfils the particular set of
constraints defined for that normal form. These constraints are usually
applicable to the attributes (columns) and the relationships between them.
There are various levels of normal forms (see Figure 3.1). Each normal
form addresses a specific issue whose resolution minimises database
anomalies.
Database normalisation uses the functional dependencies present in a
relation/table and the candidate key in examining the tables. In the
beginning, three normal forms were suggested: First Normal Form (1NF),
Second Normal Form (2NF) and Third Normal Form (3NF). Later on,
Fourth Normal Form (4NF) and Fifth Normal Form (5NF) were also
introduced.
Afterwards, E.F. Codd and R. Boyce presented a more substantial definition
of Third Normal Form, known as Boyce-Codd Normal Form (BCNF).
All the normal forms except 1NF are derived from the concept of functional
dependencies among the attributes of a relation.

Figure 3.1: Normalisation Process


When you initially enter the records into a database table, it is commonly in
unnormalised form. Therefore you need to refine this table with the help of
various types of normalisation forms which are explained below:
3.4.1 First normal form
First normal form, commonly termed 1NF, is the most basic normal form.
The condition for this normal form is that there must not be any repeating
groups in any column. In other words, all the columns in the table must be
composed of atomic values.
Note: Atomic: A column is said to be atomic if the values are indivisible
units.
The table is said to possess atomic values if there is one and only one data
item for any given row & column intersection. Non-atomic values create
repeating groups. A repeating group is just the repetition of a data item or
cluster of data items in records. For example, consider below given
Table 3.4:
Table 3.4: Employee Table with Attribute Dependents

ID Name DeptNo Sal Mgr Dependents

131 Ram 20 10000 134 Father, Mother, Sister

132 Kiran 20 7000 136 Wife, Son

133 Rajesh 20 5000 136 Wife

134 Padma 10 20000 Son, Daughter

135 Devi 30 3000 137 Father, Mother

136 Satish 20 6000 Father, Mother

137 V.V. Rao 30 10000 Wife, First Son, Second Son

In the Table 3.4 you can see that the dependents column contains
non-atomic values. Therefore to convert this table into INF, we need to
modify the non-atomic values into atomic values as shown in Table 3.5.


Table 3.5: Change of Non-atomic Values into Atomic Values of Table 3.4

ID Name DeptNo Sal Mgr Dependents

131 Ram 20 10000 134 Father

131 Ram 20 10000 134 Mother

131 Ram 20 10000 134 Sister

132 Kiran 20 7000 136 Wife

132 Kiran 20 7000 136 Son

133 Rajesh 20 5000 136 Wife

134 Padma 10 20000 Son

134 Padma 10 20000 Daughter

135 Devi 30 3000 137 Father

135 Devi 30 3000 137 Mother
136 Satish 20 6000 Father
136 Satish 20 6000 Mother

137 V.V. Rao 30 10000 Wife

137 V.V. Rao 30 10000 First Son

137 V.V. Rao 30 10000 Second Son

Observe in Table 3.5 that the dependents column now contains atomic
values. You will note that for each dependent the other employee details
such as ID, Name, DeptNo, Sal and Mgr are repeated, which results in the
creation of a repeating group (data redundancy). According to 1NF, the
above relation Employee (Table 3.5) is in 1NF. However, it is best practice
to remove the groups which are being repeated in the table.
According to the first normalisation rule, the table should not contain any
repeating groups of column values. If there exist any such repeating groups,
then they should be decomposed, and the associated columns will form
their own table. The new resulting table must also contain a link with the
original table (from which it was decomposed). Thus, to remove repeating
groups from the Employee relation, it can be decomposed into two relations,
namely Emp and Emp_Depend, as shown in Tables 3.6 and 3.7:


Table 3.6: Emp Relation

ID Name DeptNo Sal Mgr


131 Ram 20 10000 134
132 Kiran 20 7000 136
133 Rajesh 20 5000 136
134 Padma 10 20000
135 Devi 30 3000 137
136 Satish 20 6000
137 V.V. Rao 30 10000

Table 3.7: Emp_Depend Relation
ID Dependents
131 Father
131 Mother
131 Sister
132 Wife
132 Son
133 Wife
134 Son
134 Daughter
135 Father
135 Mother
136 Father
136 Mother
137 Wife
137 First Son
137 Second Son
Here, in the above Table 3.7, the {ID, Dependents} combination acts as the
unique key, and the attribute 'ID', common to both tables (Table 3.6 and
Table 3.7), acts as a link with the original table. The data redundancy in the
columns ID, Name, DeptNo, Sal and Mgr is also eliminated, and these
tables are now in 1NF. Now let us consider another example. Suppose we
have a customer table as shown in Table 3.8.


Table 3.8: Customer Table
Cust_id Name Address Acc_id Acc_type Min_bal Tran_id Tran_type Tran_mode Amount Balance
001 Ravi Hyd 994 SB 1000 14300 B/F 1000 1000
001 Ravi Hyd 994 SB 1000 14301 Deposit Bycash 1000 2000
001 Ravi Hyd 994 SB 1000 14302 Withdrawal ATM 500 1500
110 Tim Sec'bad 340 CA 500 14303 B/F 3500 3500
110 Tim Sec'bad 340 CA 500 14304 Deposit Payroll 3500 7000
110 Tim Sec'bad 340 CA 500 14305 Withdrawal ATM 1000 6000
420 Kavi Vizag 699 SB 1000 14306 B/F 6000 6000
420 Kavi Vizag 699 SB 1000 14307 Credit Bycash 2000 8000
420 Kavi Vizag 699 SB 1000 14308 Withdrawal ATM 6500 1500

You will notice that Table 3.8 contains a repeating group composed of
Cust_id, Name and Address. Therefore, to convert this table into first normal
form, we need to remove this repeating group. This can be done by dividing
the table into two tables: Customer and Customer_Tran (see Tables 3.9 and
3.10).
(Note: The primary key columns of each table are highlighted in the figures.)


Table 3.9: Customer Table

Cust_id Name Address

001 Ravi Hyd

110 Tim Sec'bad

420 Kavi Vizag

Table 3.10: Customer_Tran Table

Tran_id Cust_id Acc_id Acc_type Min_bal Tran_type Tran_mode Amount Balance

14300 001 994 SB 1000 B/F 1000 1000

14301 001 994 SB 1000 Deposit Bycash 1000 2000

14302 001 994 SB 1000 Withdrawal ATM 500 1500

14303 110 340 CA 500 B/F 3500 3500

14304 110 340 CA 500 Deposit Payroll 3500 7000

14305 110 340 CA 500 Withdrawal ATM 1000 6000

14306 420 699 SB 1000 B/F 6000 6000

14307 420 699 SB 1000 Credit Bycash 2000 8000

14308 420 699 SB 1000 Withdrawal ATM 6500 1500

3.4.2 Second normal form
A 1NF table is not fully free from redundancy; it may have partial
dependencies. The Second Normal Form resolves partial dependencies.
The Second Normal Form states that:
• The table must be in First Normal Form
• All the non-key columns must be fully functionally dependent on the
primary key
An attribute (column) is said to be partially dependent if its value can be
determined by one or more attributes of the primary key, but not all.


Every normal form builds upon the previous normal form. Therefore the
first condition for the second normal form is that the table is in first
normal form.
Full functional dependency means that, for a given composite primary key
(a primary key which is made of more than a single attribute), each column
which is not an attribute of the primary key should be dependent on each
and every one of the key's attributes.
If attributes are only partially dependent on the primary key attributes, then
they must be removed and placed in another table. The primary key of the
new table formed must have a portion of the original key that they were
dependent on.
Again consider the earlier example of the Customer relation. After converting
it into 1NF we have two tables: Customer and Customer_Tran. Now we need
to convert it into 2NF (Second Normal Form). For doing so, the Customer_Tran
table is further decomposed into three tables: Customer_Accounts,
Accounts and Transaction, as shown in Tables 3.11, 3.12 and 3.13.
Table 3.11: Customer_Accounts Table
Cust_id Acc_id Balance
001 994 1500
110 340 6000
420 699 1500
Table 3.12: Accounts Table
Acc_id Acc_type Min_bal
994 SB 1000
340 CA 500
699 SB 1000


Table 3.13: Transaction Table

Tran_id Acc_id Tran_type Tran_mode Amount

14300 994 B/F 1000

14301 994 Deposit Bycash 1000

14302 994 Withdrawal ATM 500

14303 340 B/F 3500

14304 340 Deposit Payroll 3500

14305 340 Withdrawal ATM 1000

14306 699 B/F 6000

14307 699 Credit Bycash 2000

14308 699 Withdrawal ATM 6500

Table 3.14: Customer Table


Cust_id Name Address
001 Ravi Hyd
110 Tim Sec'bad
420 Kavi Vizag

As the Acc_type and Min_bal attributes of the Customer_Tran table
(Table 3.10) are not fully functionally dependent on its primary key (they
depend only on Acc_id), a new Accounts table is formed (Table 3.12).
Similarly, Balance depends on Cust_id and Acc_id together, but not on the
whole key of Customer_Tran, resulting in the new Customer_Accounts table
(Table 3.11).
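As a hedged sketch of how this decomposition might be declared (names taken from the tables above, datatypes and constraint details assumed):
CREATE TABLE Accounts (
Acc_id INT PRIMARY KEY,
Acc_type VARCHAR(5),
Min_bal INT);
CREATE TABLE Customer_Accounts (
Cust_id INT,
Acc_id INT REFERENCES Accounts(Acc_id),
Balance INT,
PRIMARY KEY (Cust_id, Acc_id));
The composite key of Customer_Accounts reflects that Balance depends on both Cust_id and Acc_id, while Acc_type and Min_bal have moved out with their determinant, Acc_id.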
3.4.3 Third Normal form
Second normal form tables are not yet completely free from redundancies;
they may show some redundancies due to transitive dependencies. Thus
the objective of the next higher normal form, the third normal form, is to
resolve transitive dependencies. A transitive dependency arises between
two attributes when a non-key attribute is functionally dependent on some
other non-key column which is in turn functionally dependent on the
primary key.
The essential conditions for the Third Normal Form are:
• The table must be in Second Normal Form
• The table must not contain any transitive dependencies
Transitive dependencies: Columns dependent on other columns, which are
in turn dependent on the primary key, are said to be transitively dependent.
In other words, a relation R is said to be in the third normal form (3NF) if and
only if it is in 2NF and every non-key attribute is non-transitively dependent
on the primary key.
Therefore the main objective of 3NF is to make the relation free from all
transitive dependencies. Let us understand how we can do this with the help
of an example.
Example: Let us go back to our previous example. The Accounts table
(Table 3.12) is in the second normal form but it has a transitive dependency
as follows:
Acc_id → Acc_type → Min_bal
In order to remove this transitive dependency, the Accounts table can be
decomposed into two tables, Acc_Detail and Product, as shown in Tables
3.15 and 3.16:
Table 3.15: Acc_Detail Table

Acc_id Acc_type
994 SB
340 CA
699 SB

Table 3.16: Product Table

Acc_type Min_bal
SB 1000
CA 500


Tables after Third Normal Form are given below (Table 3.17, 3.18 and 3.19)
Table 3.17: Customer Table

Cust_id Name Address

001 Ravi Hyd

110 Tim Sec'bad

420 Kavi Vizag

Table 3.18: Customer Accounts Table

Cust_id Acc_id Balance

001 994 1500

110 340 6000

420 699 1500

Table 3.19: Transaction Table

Tran_id Acc_id Tran_type Tran_mode Amount


14300 994 B/F 1000

14301 994 Deposit Bycash 1000


14302 994 Withdrawal ATM 500
14303 340 B/F 3500
14304 340 Deposit Payroll 3500
14305 340 Withdrawal ATM 1000
14306 699 B/F 6000
14307 699 Credit Bycash 2000
14308 699 Withdrawal ATM 6500

3.4.4 Boyce-Codd normal form
BCNF is the common name of Boyce-Codd Normal Form. This normal form
is stricter than 3NF. Remember that every relation which is in BCNF is also
in 3NF, but a relation which is in 3NF may not necessarily be in BCNF.
The essential condition for a relational schema R to be in BCNF is that
"whenever any nontrivial FD X → A holds in R, then X must be a super key
of R".
Let us understand the concept of BCNF with the help of the relation schema
TEACH.
The Teach schema is composed of the following attributes:
Student varchar(5),
Course varchar(5),
Teacher varchar(5)

In this relation there are two dependencies: one in which
(Student, Course) → Teacher, and a second in which Teacher → Course.
In this example, it has been assumed that one teacher teaches only one
course. (Student, Course) is the primary key in this relation.
Figure 3.2: Teach Schema
In this example we will determine whether the table Teach (Figure 3.2) is in
BCNF or not. For this, we need to first check whether the relation is in 3NF.
Here the relation Teach is in 3NF, hence the first condition is satisfied.

Now let us check the second condition. According to the BCNF criteria, for
the FD Teacher → Course, which is of the form X → A and holds on the
relation Teach, Teacher should be a super key. But here Teacher is not a
super key. Therefore this condition is not fulfilled, so we can say that the
relation is not in BCNF.
We have seen that the relation Teach is not in BCNF but is in 3NF. This
arises because, for a relation to be in 3NF, it must follow either of two
conditions: 'X should be a super key of R' or 'A should be a prime attribute'.
In this relation, as Teacher is not a super key, the first condition fails. But
even then, the second condition is satisfied, as Course is a prime attribute
of R. Therefore, the relation is in 3NF, but not in BCNF.
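One way (not the only one) to bring Teach into BCNF is to decompose it on the offending FD Teacher → Course; a hedged SQL sketch using the attribute names above:
CREATE TABLE Teacher_Course (
Teacher VARCHAR(5) PRIMARY KEY,
Course VARCHAR(5));
CREATE TABLE Student_Teacher (
Student VARCHAR(5),
Teacher VARCHAR(5) REFERENCES Teacher_Course(Teacher),
PRIMARY KEY (Student, Teacher));
In each resulting table the determinant of every nontrivial FD is a key, although this particular decomposition does not preserve the dependency (Student, Course) → Teacher, which is the trade-off discussed next.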
Comparison of BCNF with 3NF: For understanding the differences
between BCNF and 3NF, we must again carefully look back at the
definitions of 3NF and BCNF.
According to the 3NF definition, "the condition for a relational schema R to
be in 3NF is that whenever a nontrivial functional dependency (FD) X → A
holds in R, then either of these two conditions must be fulfilled:"
1. X should be a super key of R.
2. A is a prime attribute.
But, according to the definition of BCNF, "the condition for a relational
schema R to be in BCNF is that whenever a nontrivial functional dependency
(FD) X → A holds in R, then X should be a super key of R".
Thus we see that BCNF is stricter than 3NF. One can easily obtain a
3NF relational design without sacrificing lossless-join or dependency
preservation.
But it is not always possible to achieve a BCNF design, lossless join and
dependency preservation altogether. In such situations, in which we cannot
achieve all three objectives together, we have to choose 3NF with lossless
join and dependency preservation.
Multi-Valued Dependencies (MVDs): An MVD arises in situations where one
attribute value is possibly a 'multi-valued fact' about some other attribute
within the same table. An FD, which you have studied earlier, is a special
case of an MVD; therefore, every FD is an MVD.


Now let us understand what an MVD is with the help of a few examples
given below. Let us consider the relational schema CSB with the following
structure:
Stud_name char(10),
Course char(10),
Text_book char(10)

An instance of Relation CSB is shown in Table 3.20.


Table 3.20: Instance of Relation CSB

Stud_name Course Text_book

Brown First_Yr_Optics Phy - 1

Brown First_Yr_Mech Phy - 1

Green First_Yr_Optics Phy - 1

Green First_Yr_Mech Phy - 1

Brown Org_Chem Chem - 1

Brown Inorg_Chem Chem - 1

Jones French_litter French - 1

Jones French_grmr French - 1

In this relation the two attributes 'Stud_name' and 'Text_book' are
independent multi-valued facts about the attribute 'Course'. Therefore, we
can simply say that this relation contains multi-valued dependencies. Here
Stud_name and Text_book are independent multi-valued facts about Course
because the student has no control over the textbooks which are used for a
particular course.
Let us take one more example, a relation schema Emp_Profile with the
following three attributes:
Emp_name char(15),
Equipment char(15),
Language char(15)


An instance of the relation Emp_Profile is shown in Table 3.21.


Table 3.21: Instance of Relation Emp_Profile
Equipment Emp_name Language
PC Smith French
PC Smith German
Workstation Smith German
Workstation Smith French
Workstation Jones French
Workstation Jones German
Workstation Jones Spanish

In this relation, Equipment and Language are the two independent
multi-valued facts about Emp_name. Therefore we can say that the relation
also contains MVDs, as shown in Figure 3.3 below.
[Figure 3.3 depicts, for each employee, the set of equipment (Smith: PC,
Workstation; Jones: Workstation) and the set of languages (Smith: French,
German; Jones: French, German, Spanish) as two independent
multi-valued facts.]
Figure 3.3: Equipment and Language are Independent Multivalued Facts
about Employee

Note that this relation Emp_Profile is in BCNF. Here all the attributes are
required for uniquely identifying the records, hence {Emp_name, Equipment,
Language} is the primary key. But this relation still has a redundancy
problem.
Therefore we can further decompose it to a higher normal form, i.e. 4NF,
to resolve the problem of redundancy. In the next section we will see the
definition of 4NF and how MVDs are associated with it.
3.4.5 Fourth normal form
The Fourth Normal Form is the next higher normal form after 3NF/BCNF.
It is based on the concept of multi-valued dependency. A multi-valued
dependency arises where a relation contains at least three columns, and
one column has several rows whose values match the values of a single
row of one of the other columns (see Table 3.22).
A more formal definition of MVD states that: “A multi valued dependency
exists if, for each value of an attribute A, there exists a finite set of values of
attribute B that are associated with A and a finite set of values of attribute C
that are also associated with A. Attributes B and C are independent of each
other.”
4NF - Addressing Multi-Valued Dependencies
Let us take the example of a relation Branch_Staff_Client (Table 3.22),
which contains information about the various clients of a bank branch, the
various staff who address the clients' needs, and the various requirements
of each client.
Table 3.22: Branch_Staff_Client
BranchNumber StaffName ClientName ClientRequirement

B-41 Radha Surya A

B-41 Radha Ravi B

B-41 Smitha Surya B

B-41 Smitha Ravi A

The above relation contains MVDs. In this relation, the client name
determines the staff names that serve the client, and the client name also
determines the client requirements. But StaffName and ClientRequirement
are not dependent on each other, i.e., they are both independent facts
about ClientName. Hence, there exist MVDs.
The multi-valued dependencies in the Branch_Staff_Client relation can be
symbolically represented as:
ClientName →→ StaffName
ClientName →→ ClientRequirement


The necessary conditions for the Fourth Normal form are as follows:
1. The table should be in Boyce-Codd normal form
2. There should be no multi-valued dependencies.
Thus the basic objective of 4NF is to eliminate multi-valued dependencies
from the relation. In order to remove multi-valued dependencies from a
table, we need to decompose the table and shift the related columns into
separate tables, along with a copy of the determinant. This copy will serve
as a foreign key to the original table.
Table 3.23: Branch_Staff Table before Fourth Normal Form

BranchNumber StaffName ClientName

B-41 Radha Surya

B-41 Radha Ravi

B-41 Smitha Surya

B-41 Smitha Ravi

Table 3.24: Branch_Client and Branch_Staff Tables after Fourth Normal Form
BranchNumber ClientName
B-41 Surya
B-41 Ravi
BranchNumber StaffName
B-41 Radha
B-41 Smitha

3.4.6 Fifth normal form


Fifth normal form is the highest normal form used in relational database
designing. It is mostly used when there is a large relational database.
The Fifth Normal form was developed by an IBM researcher, Ronald Fagin.
According to Fagin's theorem, it must be possible to reconstruct the original
table from the tables into which it has been decomposed. 5NF allows
decomposing a relation into three or more relations.
Fifth normal form is based on the concept of join dependency. Join
dependency means that a relation, after being broken down into 3 or more
smaller relations, should be capable of being combined all over again on
similar keys to recreate the original table. The join dependency is a more
general form of multi-valued dependency.


A relation R satisfies the join dependency *(R1, R2, …, Rn) if and only if R
is equal to the join of its projections on R1, R2, …, Rn (here, each Ri is a
subset of the set of attributes of R).

A relation R is said to be in 5NF, also called project-join normal form
(PJNF), if for every join dependency *(R1, R2, …, Rn) that holds over R, at
least one of the following is true:

(a) *(R1, R2, …, Rn) is a trivial join dependency (that is, one of the Ri is R
itself);

(b) every Ri is a candidate key for relation R.

Definition of Fifth Normal Form: A relation is in fifth normal form (5NF) if
and only if every join dependency in it is implied by the candidate keys of
the relation.
Table before Fifth Normal Form
Table 3.25: Dept-Subject

Dept. Subject Student

Comp. Sc. CP1000 John Smith

Mathematics MA1000 John Smith

Comp. Sc. CP2000 Arun Kumar

Comp. Sc. CP3000 Reena Rani

Physics PH1000 Raymond Chew

Chemistry CH2000 Albert Garcia

Table after Fifth Normal Form


Table 3.26, 3.27 and 3.28 are formed after converting Table 3.25 into Fifth
Normal Form.


Table 3.26: Dept-Subject

Dept.          Subject

Comp. Sc.      CP1000
Mathematics    MA1000
Comp. Sc.      CP2000
Comp. Sc.      CP3000
Physics        PH1000
Chemistry      CH2000

Table 3.27: Subject-Student

Subject Student

CP1000 John Smith

MA1000 John Smith

CP2000 Arun Kumar

CP3000 Reena Rani

PH1000 Raymond Chew

CH2000 Albert Garcia

Table 3.28: Dept-Student

Dept. Student

Comp. Sc. John Smith

Mathematics John Smith

Comp. Sc. Arun Kumar

Comp. Sc. Reena Rani

Physics Raymond Chew

Chemistry Albert Garcia
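Because the join dependency holds, joining the three projections recreates
Table 3.25 exactly, with no spurious tuples. A minimal SQL sketch,
assuming the illustrative table names Dept_Subject (Dept, Subject),
Subject_Student (Subject, Student) and Dept_Student (Dept, Student):

-- Reconstructing the original Dept-Subject-Student relation.
SELECT ds.Dept, ds.Subject, ss.Student
FROM Dept_Subject ds
JOIN Subject_Student ss ON ss.Subject = ds.Subject
JOIN Dept_Student   dst ON dst.Dept = ds.Dept
                       AND dst.Student = ss.Student;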


Self Assessment Questions


5. How does Normalisation help?
a) By eliminating various database anomalies
b) By minimising redundancy
c) By eliminating data inconsistency
d) All of the above
6. An attribute (column) is said to be _____ if its value can be determined
by any one or more attributes of the primary key, but not all.
7. A table which is in __________ normal form may contain
redundancies due to transitive dependencies.
8. The Fifth Normal form is usually useful when we have large relational
data models. (True/False)
9. The join dependency is more generalised form of __________
dependency.
10. An FD is a special case of an MVD and every FD is an MVD.
(True/False)
11. The fifth normal form is also called __________.

3.5 Normalisation and Database Design


Normalisation and database design are two closely integrated terms. In this
section, we will study about the relationship between the two. A database
design refers to the process of moving from real-life business requirements
to a database model that meets them. Normalisation is one technique used
in this process.
You have already studied in detail about normalisation. Normalisation as
you have learnt earlier is a technique that is used for designing relations in
which data redundancies are minimised.
By using the normalisation technique, we want a design for our relational
database that has the following set of properties:
1) It holds all the data required for the purposes that the database is to serve,
2) It has as little redundancy as possible,
3) It holds multiple values for the types of data that require them,
4) It allows efficient updates of the data in the database, and
5) It avoids the risk of accidental data loss.


You have studied that there are mainly five normal forms. Of these, three
are most commonly used in practice: first normal form, second normal form
and third normal form. When you convert an ER (Entity-Relationship) model
in Third Normal Form (3NF) to a relational model:
 Relations are referred to as tables.
 Attributes are referred to as columns.
 Relationships are referred to as data references (primary and foreign key
references).
Third Normal Form is considered the standard normal form from the
viewpoint of the relational database model. Normalised database tables are
easy to maintain and are easily understood by developers. However, a fully
normalised database is not necessarily the best database design. In most
cases, it is suggested that a database be normalised up to third normal
form; we then often need to denormalise our database relations (you will
study denormalisation in detail in section 3.6) to reach the optimum
performance level. An efficiently normalised database has the following
advantages:
1. Simplified and easy data maintenance
2. Enhanced speed of data processing
3. Enhanced design quality
Self Assessment Questions
12. From a __________ point of view, it is standard to have tables that are
in Third Normal Form.
13. According to relational database rules, a completely normalised
database always has the best performance. (True/False).
14. Denormalisation is done to increase the performance of the database.
(True/False).

3.6 Denormalisation
Normalisation is implemented to preserve data integrity. Nevertheless, in a
real world project, you need some level of data redundancy for reasons
relating to performance or maintaining history.
During the normalisation process, you need to decompose database tables
into smaller tables. However, if you create more tables, the database needs
to execute more joins while resolving queries, and joins have a negative
effect on performance. Hence, denormalisation is done to enhance
performance.
Denormalisation is the process of converting higher normal forms to lower
normal forms with the objective of getting faster access to database.
Keep in mind that denormalisation is a common and essential element of
database design process, but it must follow appropriate normalisation.
Techniques used for Denormalisation: There are mainly four techniques
used for denormalisation. Given below is a brief summary of the techniques:
1. Duplicate data: The simplest technique is to add duplicate data to a
relational table (see the sketch after this list). Doing so reduces the
number of joins required to execute a given query; it also reduces the
CPU and I/O resources being utilised and boosts performance.
2. Summary data: Summarising the data stored in the relational database
table is another useful technique used for denormalising the database.
In this technique the records are summarised in some summary columns
thereby reducing the number of records stored in a table. This technique
enhances database performance, as the database server now needs to
process fewer records to execute a given query.
3. Horizontal partitioning: Horizontal fragmentation is another
denormalisation technique, in which a database table is split by rows.
This reduces the number of records per table and hence improves
performance.
4. Vertical fragmentation: Vertical fragmentation splits tables/relations
by columns. The method creates two or more tables that all carry the
original key, allocating a subset of the non-key columns to each newly
created, identically keyed table.
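As an illustration of the first technique, the following sketch duplicates a
customer name into a hypothetical orders table so that common reports
avoid a join; the table and column names are illustrative only.

-- Duplicate data: copy customer.name into orders.customer_name.
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(50);
UPDATE orders
SET customer_name = (SELECT c.name
                     FROM customer c
                     WHERE c.custkey = orders.custkey);
-- Reports can now read orders alone, with no join:
SELECT orderkey, customer_name, totalprice FROM orders;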
Self Assessment Questions
15. Denormalisation is a technique to move from higher to lesser normal
forms of database modelling in order to get faster access to
database.(True/ False)
16. __________ splits tables by rows, thus reducing the number of
records per table.


Activity 2
What problems can you encounter when you decide to introduce some
denormalisation into your model?

3.7 Summary
Let us recapitulate the important concepts discussed in this unit:
 Normalisation is a technique used to design tables in which data
redundancies are minimised.
 Normalisation is based on the concept of normal forms i.e. 1NF, 2NF ,
3NF, BCNF , 4NF and 5NF.
 A partial dependency refers to a condition in which any attribute is
functionally dependent upon only a part of a multi-attribute Primary key .
 A transitive dependency is a condition where attribute is functionally
dependent on another non-key attribute.
 A table is in 1NF when all key attributes are defined and all the
remaining attributes are dependent upon the primary key.
 A table is in 2NF if it is in 1NF and contains no partial dependencies.
 Boyce-Codd ( BCNF ) is a special case of 3NF in which all the
determinant keys are also candidate keys.
 Denormalisation is done to enhance the performance of the database.

3.8 Glossary
 Boyce-Codd normal form (BCNF): When each determinant in a
relation is a candidate key, a relation is said to be in Boyce-Codd
Normal Form ( BCNF ).
 First normal form: A relation is said to be in first normal form if all the
values of all its attributes are atomic in nature.
 Functional dependency: Two attributes A and B in any relation R are
said to possess a functional dependency (FD) if for each distinct value of
A, there is only one value of attribute B associated with it.
 Normalisation: It is the process of obtaining good database design by
decomposing the relations into normal forms based on functional
dependencies.


 Second normal form: If every non-key attribute of a relation schema is


fully FD (functionally dependent) on the key then the relation is said to
be in 2NF.
 Third normal form: A relation is said to be in 3NF if it is in 2NF and
does not contain transitive dependencies.

3.9 Terminal Questions


1. Explain the various types of database anomalies.
2. Define functional dependency. Give examples.
3. What is normalisation? Explain why normalisation is required in
database design.
4. Explain 1NF with a suitable example.
5. Explain second normal form with example.
6. Explain transitive dependencies with examples. Show how these are
significant in designing databases.
7. What is a third normal form? Give example.
8. Explain how BCNF and 3NF differ.
9. What is fourth normal form and fifth normal form? Explain with an
example.
10. Write a short note on denormalisation.

3.10 Answers
Self Assessment Questions
1. True
2. Schema
3. (b) Inconsistency
4. Redundancy
5. (d) All of the above
6. Partially dependent
7. Second
8. True
9. Multi-valued
10. True
11. Project-Join Normal Form (PJNF).
12. Relational model
13. False

14. True
15. True
16. Horizontal Fragmentation
Terminal Questions
1. There are mainly three types of anomalies in database: first is
redundancy, second is inconsistency and the third is update. Refer
Section 3.3 for more details.
2. Functional dependency is a type of constraint in which one attribute is
dependent upon another attribute. Refer Section 3.2 for more details.
3. Normalisation is the process of designing a good database by
converting it into various normal forms by eliminating all the database
anomalies. Refer Section 3.4 for more details.
4. In, 1NF, all attribute values of a relation are atomic in nature. Refer
Section 3.4 for more details.
5. When all the non-key attributes of a relational schema are fully
functionally dependent on the primary key then that relation is said to
be in 2NF. Refer Section 3.4 for more details.
6. A transitive dependency is a condition where one attribute is
functionally dependent on another non-key attribute. Refer Section 3.4
for more details.
7. A table is said to be in 3NF if it is in 2NF and also it does not contains
any transitive dependencies. Refer Section 3.4 for more details.
8. Boyce-Codd normal form (BCNF) is a stricter case of 3NF where all the
determinant keys are also candidate keys. Refer Section 3.4 for more details.
9. A 4NF table necessarily has two conditions i.e. firstly it must be in
Boyce-Codd normal form and secondly it must be free from any multi-
valued dependencies. Refer Section 3.4 for more details.
10. Denormalisation is done to enhance the performance of a normalised
database. Refer Section 3.6 for more details.

3.11 References
 Peter Rob, Carlos Coronel, "Database Systems: Design,
Implementation, and Management", (7thEd.), Thomson Learning
 Silberschatz, Korth, Sudarshan, "Database System Concepts", (4th Ed.),
McGraw-Hill

 Elmasri Navathe, "Fundamentals of Database Systems", (3rd Ed.),
Pearson Education Asia
E-references
 http://www.techbaba.com/q/2494-denormalization+database.aspx
 http://www.gantthead.com/process/popup.cfm?ID=23451

Unit 4 Query Optimisation


Structure:
4.1 Introduction
Objectives
4.2 Query Execution Algorithm
External sorting
Implementing the SELECT operation
Methods to implement JOIN operation
Project and Set operations implementation
Aggregate operations implementation
4.3 Heuristics in Query Optimisation
Notation for query trees and query graphs
General transformation rules for relational algebraic operations
Conversion of query trees into the query execution plans
4.4 Semantic Query Optimisation
4.5 Multi-Query Optimisation and Application
4.6 Execution Strategies for SQL Sub Queries
4.7 Query Processing for SQL Updates
4.8 Summary
4.9 Glossary
4.10 Terminal Questions
4.11 Answers

4.1 Introduction
You have already studied DBMS and SQL in the previous units. Hence,
query optimisation which we are going to cover in this unit will not be new to
you as it is related to DBMS.
Query optimisation is a technique that helps the DBMS reduce query
execution time. Nowadays, every major database product supplies an
optimising SQL compiler that first analyses the SQL query, rewrites it if
required, and finally develops an optimal plan to retrieve the data from the
database. This module of the SQL compiler is called the Query Optimiser.
The optimisation is based on different optimisation rules devised from the
cost of each operation in the query.


We start our discussion with an overview of various algorithms for query
operations in the context of an RDBMS. The unit also covers heuristics in
query optimisation, followed by a brief overview of semantic query
optimisation and of multi-query optimisation and its applications.
The latter part of the unit deals with execution strategies for SQL sub
queries and query processing for SQL updates.
Objectives:
After studying this unit, you should be able to:
 describe the algorithms for executing query operations
 discuss the heuristics in query optimisation
 explain briefly semantic query optimisation
 identify multi-query optimisation and application
 explain the execution strategies for SQL sub queries
 discuss query processing for SQL updates

4.2 Query Execution Algorithm


An RDBMS provides various algorithms for implementing the different types
of relational operations that appear in a query execution strategy.
Algorithms such as external sorting and sort-merge are used to implement
operations like SELECT, JOIN, PROJECT, the set operations (UNION,
INTERSECTION, SET DIFFERENCE) and the aggregate operations (MIN,
MAX, COUNT, AVERAGE and SUM). Let us discuss these in detail.
4.2.1 External sorting
Sorting is one of the primary algorithms used in query processing. It is used
to reorder data into a new desired sequence. It may be avoided if an index
already exists to facilitate ordered access to the records.
Internal sorting uses main memory, so it is fast but expensive; external
sorting, on the other hand, is slower and cheaper as it uses secondary
storage devices. External sorting is appropriate for large files stored on disk
that cannot be fitted completely into main memory, while internal sorting is
suitable for data structures that fit entirely in memory.


An external sorting algorithm makes use of a sort-merge strategy. The
sort-merge strategy divides the main file into smaller sub-files termed runs.
These runs are sorted first and then merged to make larger runs, which are
merged again in turn. External sorting needs buffer space in main memory
to execute the actual sorting and merging of the runs.
External Sorting algorithm carries out the operation in following two phases:
Sorting Phase and Merging Phase.
Phase 1: Sorting phase: In this phase, runs are read into main memory,
sorted there using an internal sorting algorithm, and written back to disk as
temporary sorted runs. The number of initial runs (nR) is governed by the
number of file blocks (b) and the available buffer space (nB).
For instance,
if nB = 5 blocks and b = 1024 blocks,
then nR = ⌈b/nB⌉ = 205 initial runs, each of size 5 blocks.
Hence, after the sort phase, 205 sorted runs are stored as temporary
sub-files on disk.
Phase 2: Merging phase: In this phase, merging of the sorted runs is
carried out over one or more phases. The number of runs that can be
merged together in each pass is termed as the degree of merging (dM). In
each pass, a buffer block is required to hold one single block from every
runs being merged and one block is required for containing one block of the
final result.
Therefore, it can be concluded that
 dM is the smaller of (nB – 1) and nR, and
 number of passes = ⌈log_dM(nR)⌉.
Let us continue with the above example. With dM = 4 (four-way merging),
the 205 initial sorted runs calculated above are merged into 52 runs in the
first pass, which are further merged into 13 runs, then 4, and finally 1 run;
so a total of 4 passes are required.


The worst-case performance occurs with the minimum value of dM, which
is 2. In that case, the number of block accesses is
(2 * b) + (2 * (b * log_dM(nR))). Here, the first term is the number of block
accesses in the sort phase, since each file block is accessed twice: once
for reading into memory and once for writing the sorted record blocks back
to disk. The second term is the number of block accesses for the merge
phase, presuming the worst-case dM of 2.
4.2.2 Implementing the SELECT operation
SELECT (represented by symbol σ) operation performs the task of retrieving
the desired records from the database.
Now, let us write a query that selects all the records with EMPNO 3276
from the table EMP (Table 4.1):

(Query 1): σ_EMPNO='3276'(EMP)
Table 4.1: Instance of EMP Table

ENAME EMPNO DNO SEX SALARY


Mahesh 1234 1 M 30000
Kartik 3276 4 M 4000
Gaurav 4278 5 M 10000
Jaya 2753 3 F 5000
Kiran 3721 6 F 15000

If you want to retrieve all the records with DNO more than 4 from the table
DEPT (Table 4.2), then the query is framed as:

(Query 2): σ_DNO>4(DEPT)
Table 4.2: Instance of DEPT Table
DNAME DNO MGRENO
Research 4 4001
Production 5 5001
Sales 1 1001

If you want to select all the records from the table EMP where the DNO
field is 2, it can be written as:

(Query 3): σ_DNO=2(EMP)


The SELECT operation can also retrieve data with multiple criteria. For
example, selecting from the EMP table all the records of female employees
of department 3 whose SALARY is more than 10000 is framed as:

(Query 4): σ_DNO=3 AND SALARY>10000 AND SEX='F'(EMP)
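For reference, the SQL equivalents of the four selections above are
straightforward, assuming the EMP and DEPT tables as shown:

SELECT * FROM EMP  WHERE EMPNO = 3276;              -- Query 1
SELECT * FROM DEPT WHERE DNO > 4;                   -- Query 2
SELECT * FROM EMP  WHERE DNO = 2;                   -- Query 3
SELECT * FROM EMP
WHERE DNO = 3 AND SALARY > 10000 AND SEX = 'F';     -- Query 4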
Search Methods for Simple Selection: Numerous search algorithms are
possible for selecting records from a file. Algorithms that scan the entire file
and retrieve the records satisfying the user-defined condition are termed
file scans; if the search algorithm involves an index, the search is termed
an index scan. The basic search algorithms are explained below.
 Linear search (brute force): This retrieves every record of the file and
checks whether its attribute values satisfy the criteria, continuing until
the end of the file. Query 1 above, σ_EMPNO='3276'(EMP), may be
executed using the linear search algorithm.
 Binary search: If the file is ordered on a key attribute and the selection
condition is an equality comparison on that attribute, binary search can
be used. Binary search is faster and more efficient than linear search,
as the search space is halved with each comparison.
So, if the EMP file is ordered on EMPNO, then query 1,
σ_EMPNO='3276'(EMP), might use binary search.

 Using a primary index (or hash key): If the selection condition is an
equality comparison on a key attribute carrying a primary index (or hash
key), that index can be used to retrieve the record. For instance, if in
query 1 the EMPNO field of the EMP file is hash-indexed, a hash search
may be used on this field. Using the primary index (or hash key), at
most a single record is retrieved.
Depending upon the cost associated with each method, the query optimiser
picks up the most appropriate method for executing a SELECT operation.
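For instance, an index on EMPNO can be created explicitly (the index
name below is illustrative); with it in place, an optimiser would typically
satisfy Query 1 with an index scan rather than a linear search:

CREATE INDEX emp_empno_idx ON EMP (EMPNO);
SELECT * FROM EMP WHERE EMPNO = 3276;   -- a candidate for an index scan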
Now let us extend our discussion to JOIN operation.


4.2.3 Methods to implement JOIN operation

The JOIN operation is used to join two database tables/relations and is a
time-consuming operation. An equi-join takes the form X ⋈_(M=N) Y,
where M and N are domain-compatible attributes of files X and Y
respectively. There are various algorithms for implementing JOIN
operations; some of these are discussed below:
 Nested-loop join (brute force): For each record t in X (outer loop),
retrieve every record s from Y (inner loop) and check whether the two
records satisfy the join condition t[M] = s[N].
 Single-loop join (using an access structure to retrieve the matching
records): If an index or hash key exists for one of the two join attributes
(say N of Y), retrieve each record t in X, one at a time, and then use the
access structure to directly retrieve all matching records s from Y
satisfying the condition s[N] = t[M].
 Sort-merge join: JOIN can be implemented more efficiently if the
records of files X and Y are physically sorted by the values of the join
attributes M and N respectively.

 Hash join: The records of both files X and Y are hashed to a single
hash file, using the same hashing function on the join attributes M of X
and N of Y as hash keys.
First, the partitioning phase takes place, in which the file with fewer
records (say X) hashes its records into the hash file buckets.
Second, the probing phase takes place, in which each record of the
other file (say Y) is hashed to locate the matching bucket; the matching
records of X found in that bucket are then combined with it.
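All four algorithms are candidate implementations of an ordinary equi-join.
For example, joining the EMP and DEPT tables of section 4.2.2 on the
department number:

SELECT E.ENAME, D.DNAME
FROM EMP E
JOIN DEPT D ON E.DNO = D.DNO;
-- The optimiser picks nested-loop, single-loop (index), sort-merge
-- or hash join for this one logical operation, based on cost.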
4.2.4 Project and Set operations implementation
The PROJECT operation, π_<attribute list>(R), selects a subset of
attributes from a relation, defined by the required attribute list. In its
implementation, if the <attribute list> includes a key of relation R, the
output has the same number of tuples as R, but with only the attribute
values in the <attribute list>.


Now, if the <attribute list> does not include a key of R, the output may
contain duplicate tuples. These duplicates must be removed, typically by
sorting; wherever duplicate tuples then appear consecutively, they are
eliminated.
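In SQL, this duplicate elimination corresponds to the DISTINCT keyword.
For example, projecting EMP onto the non-key attribute DNO:

SELECT DISTINCT DNO FROM EMP;   -- duplicates removed, e.g. by sorting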
Among the Set operations (UNION, INTERSECTION, SET DIFFERENCE
and CARTESIAN PRODUCT), the CARTESIAN PRODUCT operation R x S
is most costly. It is because of two reasons, firstly its output carries a
combination of records from R and S. Secondly, all the attributes of R and S
are present in the output.
For example, if we have two relations R and S as shown in the table,

Relation Records Attributes


R i x
S j y

the CARTESIAN PRODUCT output will carry (i*j) records and (x+y)
attributes. Since the result carries so many records, it is wise to avoid the
CARTESIAN PRODUCT operation and use an equivalent operation
instead.
Set operations like UNION, INTERSECTION and SET DIFFERENCE are
applicable only to union-compatible relations, i.e. relations that have the
same number of attributes drawn from the same attribute domains.
4.2.5 Aggregate operations implementation
Sometimes you need to find out certain statistical values from the table.
Aggregate Functions are functions that provide mathematical operations. If
you need to add, count or perform basic statistics, then these functions are
of great help.
SQL has many built-in functions for performing calculations on data. SQL
Aggregate Functions are the functions that return a single value, calculated
from values in a column which is selected from the table. Some of the useful
aggregate functions are:
AVG() - The AVG() function returns the average value of a numeric column.
COUNT() - Returns the number of rows
MAX() - The MAX() function returns the largest value of the selected column
MIN() - The MIN() function returns the smallest value of the selected column
SUM() - The SUM() function returns the total sum of a numeric column.
Let us take one example. The SQL syntax for using MAX() is:
SELECT MAX(column_name) FROM table_name

Table STD.

STD_Id STUDENTS SUBJECTS MARKS


1 Seema Maths 95
2 Sangeetha Maths 91
3 Seema Physics 97
4 Monika Physics 55
5 Sangeetha Physics 63
6 Seema Chemistry 70

As an example, let’s consider a SQL query for the table STD:


SELECT MAX (MARKS) AS MAXIMMARKS
FROM STD
Where STD is the table name and MARKS is the column name of the table
STD. The above statement would select the highest (maximum) Marks from
the column MARKS of the table STD and result would look like this:

MAXIMMARKS

97

MIN operation will work in the same way. If a SQL query is made for table
STD as:
SELECT MIN (MARKS) AS MINMARKS
FROM STD
Then the above statement would select the lowest (minimum) Marks from
the column MARKS of the table STD and result would look like this:
MINMARKS
55

Similarly AVG(),COUNT(),SUM() would give the result as stated above.


Aggregate functions often need an added GROUP BY statement. The
GROUP BY statement is used in conjunction with the aggregate functions
to group the result-set by one or more columns.
The SQL syntax of GROUP BY statement is:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name

If you want to find the total marks of each student in the table STD, then
you have to use the GROUP BY statement to group by STUDENTS.
If you use the following SQL statement:
SELECT STUDENTS, SUM (MARKS) FROM STD
GROUP BY STUDENTS
Then the result will look like this:

STUDENTS SUM (MARKS)


Monika 55
Sangeetha 154
Seema 262
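Combining the WHERE clause from the GROUP BY template above with
an aggregate, the following query (against the same STD table) averages,
per subject, only the marks above 60:

SELECT SUBJECTS, AVG(MARKS) AS AVGMARKS
FROM STD
WHERE MARKS > 60
GROUP BY SUBJECTS;
-- Chemistry: 70, Maths: (95+91)/2 = 93, Physics: (97+63)/2 = 80;
-- Monika's Physics mark of 55 is filtered out before grouping.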

Self Assessment Questions


1. ______________ may be avoided if an appropriate index exists to allow
ordered access to the records.
2. Relations are said to be Union compatible if they have same
_____________ and that to from same domain.
3. JOIN operation is the most time consuming operation. (True/False)
4. _____________ is another name given to search algorithms.

Activity 1
Discuss the reasons for converting SQL queries into relational algebra
queries before optimization is done.


4.3 Heuristics in Query Optimisation


In general, a heuristic is a rule of thumb for making a good decision; we
use heuristics to solve many daily-life problems. Heuristics can be defined
as rules derived from common sense rather than from an exhaustive
methodology. In query optimisation, heuristic rules are applied to an
internal representation of the query, which may take the form of a query
graph or a query tree, with the objective of improving performance.
High-level programs first generate an internal representation, which is then
optimised according to the heuristic rules. Later, depending upon the
access paths available, the query execution plan is generated.
The basic heuristic rule says that before applying JOIN operation or any
other binary operation, SELECT and PROJECT operations must be applied.
This is done because SELECT and PROJECT operations reduce the size of
a file while JOIN operation increases the file size.
Query graphs and query trees are the data structures used for the internal
representation of queries. A query graph represents a relational calculus
expression, while a query tree represents a relational algebra expression.
4.3.1 Notation for query trees and query graphs
A query tree is employed to represent relational algebra expressions.
Here, the leaf nodes symbolise the query's input relations, and the internal
nodes denote the relational algebra operations.
Executing a query tree consists of executing an internal node operation
whenever its operands are available, and then replacing that internal node
by the resulting relation. Execution terminates when the root node is
executed, producing the result relation of the query.
Figure 4.1 illustrates a query tree for the query block Q: for every project
located in 'Chennai', retrieve the project number, the controlling department
number, and the manager's surname, address and birth date.
Q: SELECT P.PNUMBER, P.DNUM, E.ENAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPT AS D, EMP AS E
WHERE P.DNUM=D.DEPTNO
AND D.MGREMPNO=E.EMPNO
AND P.PLOCATION='Chennai';


(a) A query tree corresponding to one execution order: node (1) selects the
PROJECT tuples with P.PLOCATION = 'Chennai', node (2) joins the result
with D on P.DNUM = D.DEPTNO, node (3) joins that result with E on
D.MGREMPNO = E.EMPNO, and the root projects P.PNUMBER, P.DNUM,
E.ENAME, E.ADDRESS and E.BDATE.

(b) An initial (canonical) query tree for the same query: the Cartesian
product P × D × E, followed by the selection
σ_P.DNUM=D.DEPTNO AND D.MGREMPNO=E.EMPNO AND P.PLOCATION='Chennai',
followed by the same projection.

Figure 4.1: Two Query Trees for the Query Q

In Figure 4.1(a), the leaf nodes P, D and E represent the PROJECT, DEPT
and EMP relations respectively, and the operations are represented by the
internal tree nodes. When this query tree is executed, the nodes marked
(1), (2) and (3) in Figure 4.1(a) operate in sequence, as the resulting tuples
of each node form the input of the following node. The query tree thus
arranges the operations in a specific order for query execution.


Figure 4.2: Query Graph for Q2

Figure 4.2 represents the query graph for the relational algebra expression
given below:
π_PNUMBER, DNUM, ENAME, ADDRESS, BDATE
(((σ_PLOCATION='Chennai'(PROJECT))
⋈_DNUM=DEPTNO (DEPT))
⋈_MGREMPNO=EMPNO (EMP))
In the query graph shown in Figure 4.2, single circles represent the
relations, whereas double circles denote constant values. The graph edges
represent the selection and join conditions, and the square brackets
specify the attributes to be retrieved from each relation.
Moreover, a query graph does not imply any order of preference for
executing the operations, and only a single graph corresponds to each
query.
4.3.2 General transformation rules for relational algebraic operations
Two operations are said to be equivalent if they produce the same result.
Interestingly, two relations can also be considered equivalent if they have
the same set of attributes in a different order but represent the same
information.
Many rules exist for transforming relational algebra operations into
equivalent ones. The symbols used are defined in Table 4.3 below.


Table 4.3: Notations for Various Relational Algebra Operations

Relational Algebra Operation    Symbol
SELECT                          σ
PROJECT                         π
JOIN                            ⋈
Union                           ∪
Intersection                    ∩
Cartesian Product               ×

Let us study some of the transformation rules used in query optimisation:

1. Cascade (sequence) of σ: A conjunctive selection condition can be
broken up into a cascade (sequence) of individual σ operations:
σ_c1 AND c2 AND … AND cn(R) = σ_c1(σ_c2(…(σ_cn(R))…))
Here c1, …, cn are selection conditions and R is the relation.

2. Commutativity of σ: The σ operation is commutative:
σ_c1(σ_c2(R)) = σ_c2(σ_c1(R))

3. Cascade (sequence) of π: In a cascade (sequence) of π operations, all
but the last one can be ignored:
π_List1(π_List2(…(π_Listn(R))…)) = π_List1(R)

4. Commuting σ with π: If the selection condition c involves only those
attributes A1, …, An in the projection list, the two operations can be
commuted:
π_A1, A2, …, An(σ_c(R)) = σ_c(π_A1, A2, …, An(R))

5. Commutativity of ⋈ (and ×): The ⋈ operation is commutative, as is
the × operation:
R ⋈_c S = S ⋈_c R
R × S = S × R
Note: The order of attributes may vary in the resulting relation as compared
to the original relations.

6. Associativity of ⋈, ×, ∪ and ∩: These four operations are individually
associative; that is, if θ stands for any one of these four operations
(throughout the expression), we have:
R θ (S θ T) = (R θ S) θ T

7. Commuting σ with set operations: The σ operation commutes with ∪,
∩ and − (set difference). If θ stands for any one of these three operations
(throughout the expression), we have:
σ_c(R θ S) = (σ_c(R)) θ (σ_c(S))

8. Converting a (σ, ×) sequence into ⋈: If the condition c of a σ that
follows a × corresponds to a join condition, convert the (σ, ×) sequence into
a ⋈ as follows:
σ_c(R × S) = R ⋈_c S

There are other possible transformations. For example, a selection or join
condition c can be converted into an equivalent condition by using the
following rules (DeMorgan's laws):
NOT (c1 AND c2) = (NOT c1) OR (NOT c2)
NOT (c1 OR c2) = (NOT c1) AND (NOT c2)


4.3.3 Conversion of query trees into query execution plans

An execution plan for a relational algebra expression is represented by a
query tree annotated with information about the access methods available
for each relation and the algorithm to be used to compute each relational
operator in the tree.
To understand this, let’s consider a query Q2:
SELECT Fname, LName, Address
FROM DEPARTMENT, EMPLOYEE
WHERE D.Dname=’Research’
AND D.DNumber=E.DNO
The transformation of query Q2 into a relational algebra expression is:

π_FNAME, LNAME, ADDRESS (σ_DNAME='Research'(DEPARTMENT)
⋈_DNUMBER=DNO (EMPLOYEE))

Moving ahead, the query tree for query Q2 is shown in Figure 4.3.

π Fname, Lname, Address
         |
    ⋈ Dnumber=Dno
       /        \
σ Dname='Research'    EMPLOYEE
       |
  DEPARTMENT

Figure 4.3: A Query tree for query Q2

To convert this tree into an execution plan, the optimiser might choose:
 an index search for the SELECT operation,
 a table scan as the access method for EMPLOYEE,
 a nested-loop join algorithm for the JOIN, and
 a scan of the JOIN result for the PROJECT operator.
Additionally, a pipelined or a materialised evaluation can be chosen for
query execution.
Materialised (stored) evaluation means that the result of an operation is
stored as a temporary relation (table). For example, the output of the JOIN
operation can be stored as a temporary relation, which is then read as
input by the PROJECT operation to produce the resultant query table.
Pipelined evaluation results in cost savings, because the intermediate
results are not saved to disk and do not have to be read back for the next
operation.
Self Assessment Questions
5. The size of the file can be reduced by SELECT and ____________
operations.
6. _____________ represents a relational calculus expression.
7. The query graph representation also indicates an order in which
operations perform first. (True/False).

4.4 Semantic Query Optimisation


Semantic query optimisation is a technique to modify one query into another
query by using the relational database constraints. These constraints may
be unique attributes or much more complex constraints. This technique is
used for the efficient execution of the query.
Let’s discuss this approach with the help of an example given below:
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS M
WHERE E.SUPERNO=M.ENO AND E.SALARY>M.SALARY
This query retrieves the names of employees who earn more than their
supervisors. If a constraint on the database schema states that no
employee can earn more than his or her reporting supervisor, the semantic
query optimiser recognises this constraint and need not execute the query
at all, since it knows the result will be empty. If the constraint check is done
efficiently, this approach can save a lot of time.
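A minimal sketch of the same idea with a simpler, declared constraint; the
table name and the salary threshold below are hypothetical:

CREATE TABLE EMP_DEMO (
    ENO    INT PRIMARY KEY,
    LNAME  VARCHAR(30),
    SALARY INT CHECK (SALARY <= 500000)   -- declared constraint
);
-- The constraint guarantees an empty result, so a semantic
-- optimiser may answer this query without reading any data:
SELECT LNAME FROM EMP_DEMO WHERE SALARY > 600000;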


Self Assessment Questions


8. Semantic query optimisation helps in efficient query __________ by
modifying one query into another.
9. Relational database constraints are used in semantic query
optimisation technique. (True/False)

4.5 Multi-Query Optimisation and Application


Query optimisers have been central to the success of relational database
technology. For data stream systems the stakes are even higher: in
contrast to the one-shot queries of a relational DBMS, a stream system
processes multiple continuous queries simultaneously; these queries
process massive streams in real time and remain active for long periods.
Query performance can be strongly affected by the chosen implementation
method. The key to good stream processing performance is to optimise
multiple queries together, instead of optimising them individually.
In stream query processing, the workload is shared among the
concurrently active queries by sharing computation and state. Query
evaluation techniques that exploit this property are known as MQO
(Multi-Query Optimisation) techniques. MQO saves evaluation cost and
execution time by executing the common operations once over a set of
queries, and offers significant improvements to system performance.
Consider the following two queries that retrieve information from an order
processing database.
a)
SELECT name, custkey, orderkey, orderdate, totalprice
FROM customer, orders, lineitem
WHERE orders.custkey = customer.custkey
AND lineitem.orderkey = orders.orderkey
AND lineitem.quantity = '24';
b)
SELECT name, custkey, orderkey, orderdate, totalprice
FROM customer, orders, lineitem
WHERE orders.custkey = customer.custkey
AND lineitem.orderkey = orders.orderkey
AND lineitem.quantity = '24'
AND orders.orderstatus = 'shipping';
The first query retrieves customer and order information for a specific
quantity (24) of items ordered. The second query retrieves the same
information, but only for orders whose status is 'shipping'.
The second query's output is therefore a subset of the first query's output,
so its computation can be fast: multi-query processing optimises the pair by
first finding the customers whose lineitem quantity is 24, and then reusing
that intermediate result, applying the additional constraint that the order
status is 'shipping'.
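One way to realise this sharing in plain SQL is to factor the common
sub-expression into a view that both queries read; the view name below is
illustrative:

CREATE VIEW qty24_orders AS
SELECT customer.name, customer.custkey, orders.orderkey,
       orders.orderdate, orders.totalprice, orders.orderstatus
FROM customer, orders, lineitem
WHERE orders.custkey = customer.custkey
AND lineitem.orderkey = orders.orderkey
AND lineitem.quantity = '24';

-- Query (a): read the shared result directly.
SELECT name, custkey, orderkey, orderdate, totalprice
FROM qty24_orders;
-- Query (b): apply only the extra predicate.
SELECT name, custkey, orderkey, orderdate, totalprice
FROM qty24_orders
WHERE orderstatus = 'shipping';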
Multi-query optimisation techniques can also be used to frame proficient
algorithms for problems like view/index selection, and query result caching
and maintenance.
For example, multiple-query optimisation can be applied in a mobile
database system to batches of pull (on-demand) requests. Several queries
can be answered at once by the resulting view, which is broadcast over a
view channel dedicated to the common answers of multiple queries,
instead of being transmitted over individual downlink channels.
Efficient and extensible algorithms for multi-query optimisation
Multi-query optimisation targets common sub-expressions, sharing them to
minimise evaluation cost. This can be achieved with the help of cost-based
heuristic algorithms designed particularly for multi-query optimisation; the
greedy algorithm is the simplest algorithm in this category.
A greedy algorithm is a cost-based heuristic algorithm. It makes each
decision based on the current situation and never reconsiders that
decision, whatever situation may arise later. To find a good solution, the
greedy algorithm selects a set of nodes to be materialised and then
commits to that decision; the selection is repeated over different sets of
nodes to find the best set of nodes to materialise.


A greedy strategy works in a top-down manner, reducing each problem to
a smaller one by making one greedy choice after another. This turns out to
be a good strategy in some cases; in others it does not find the optimal
solution, but provides a compromise that produces acceptable
approximations.
Self Assessment Questions
10. The key to achieving good stream processing performance is to
optimise ____________ together.
11. MQO (Multi Query Optimisation) saves the evaluation cost and
execution time by executing the common operations once over a set of
queries (True/False)

Activity 2
With the help of internet find out some more practical applications of
multi query optimisation.

4.6 Execution Strategies for SQL Sub Queries


As discussed in Unit 2, sub-queries are a powerful addition to the SQL
language. In different applications they help in query creation, for example
in decision support or in automatic query formulation by ad-hoc reporting
tools. The query optimiser employs different physical execution strategies
for the various logical query plan options.
There are two types of strategies for sub-query execution:
(1) navigational strategies
(2) set-oriented strategies
 Navigational strategies: For executing a sub-query, navigational
strategies depend on nested-loop joins. There are basically two classes
of navigational strategies: forward lookup and reverse lookup. Forward
lookup starts by executing the outer query and invokes the sub-query
as outer rows are generated. Reverse lookup starts with the sub-query
and processes one sub-query row at a time.
 Set-oriented strategies: Set-oriented processing requires that the
query can be effectively de-correlated. If this is the case, set-oriented
operators such as hash and merge joins can execute the query (see
the sketch below).
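A minimal sketch of de-correlation, reusing the customer and orders tables
of section 4.5 (the price predicate is illustrative): the correlated form
naturally suggests a navigational plan, while the rewritten form can be
executed by a set-oriented hash or merge join.

-- Correlated sub-query: conceptually evaluated per outer row.
SELECT c.custkey, c.name
FROM customer c
WHERE EXISTS (SELECT 1 FROM orders o
              WHERE o.custkey = c.custkey
                AND o.totalprice > 1000);

-- De-correlated, set-oriented equivalent.
SELECT DISTINCT c.custkey, c.name
FROM customer c
JOIN orders o ON o.custkey = c.custkey
WHERE o.totalprice > 1000;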


Self Assessment Questions


12. ___________ rely on nested loop joins for implementation.
13. ____________ works reversely it starts with sub query first and after
that executes the outer query.

4.7 Query Processing for SQL Updates

Data update operations are an important part of user applications, and
their proficient processing is just as important as the processing of data
retrieval operations. When data is inserted, deleted or modified in a
database, the DBMS must validate the changes against the stated
constraints (such as CHECK and uniqueness constraints) and maintain the
underlying storage structures, so that a consistent and correct
representation of the data is preserved.
Modelling updates in query processing
An algebraic model forms the base of a SQL query processor: it supports
the addition of new operators as required, and the trees built from those
operators are what the query processor manipulates. Let us now discuss
some concepts related to the update processing approach:
Delta stream: A delta stream is defined as a set of rows that encodes the
changes to a specific base table. It is just a relation with a precise schema,
so any relational operator can process it.
Application of a delta stream: The Stream Update operator is a
side-effecting operator: for every input row, it sends a data modification
instruction to the storage engine. An update plan carries multiple instances
of Stream Update, which are used for maintaining physical structures of
different types.
In Figure 4.4, we have shown a general template used for the
implementation plan of update statements.


Figure 4.4: General Template for the Execution Plan of Update Statement

It incorporates two components:

 The first is read-only and is in charge of delivering a delta stream that
specifies the changes to be applied to the target table.
 The second component consumes the delta stream, applying the
changes to the base table, and then performs all the actions that the
DML (Data Manipulation Language) statement implicitly fires.
The series of actions to be executed is determined by collecting all the
active dependencies against the target table, and then filtering them
according to the requirements of the current statement.
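As a concrete illustration, consider the plan for a simple UPDATE against
the EMP table of Unit 4 (the statement itself is illustrative):

UPDATE EMP
SET SALARY = SALARY * 1.10
WHERE DNO = 3;
-- Conceptually, the read-only component scans EMP, selects the rows
-- with DNO = 3 and computes the new SALARY values (the delta stream);
-- the Stream Update operator then consumes that stream, applying each
-- change to the base table and to any dependent physical structures
-- (such as indexes) that the statement implicitly maintains.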
Self Assessment Questions
14. It is required to validate update operations against stated relational
database constraints (True/False)
15. ____________ is defined as a set of rows that encode the changes
made to a specific base table.

4.8 Summary
Let us recapitulate the important points of this unit:
 Sorting is one of the primary algorithms used in query processing. It is of
two types: internal sorting and external sorting.
 There are multiple categories of query execution algorithms such as
external sorting, binary search, linear search, hash-key search or
primary index etc.


 SELECT (represented by the symbol σ) performs the task of retrieving
the desired records from the database. There are various search
methods, such as linear search, binary search and the use of a primary
index.
 JOIN operation is used to join two database tables/relations. There are
various algorithms for implementing the JOIN operations such as Nested
loop join, single loop join, sort-merge join and hash join.
 Query graph and query tree are the data structures used for queries
internal representation.
 A Multi Query Optimisation (MQO) is a methodology of optimising a
group of SQL queries altogether with the help of common sub-
expressions to save cost and time.
 There are two types of sub- query optimisation strategies. First, is the
navigational strategy and second is the set-oriented strategy.

4.9 Glossary
 Greedy algorithm: A greedy algorithm is a cost-based heuristic
algorithm.
 Mobile database system: A mobile database is a database that can be
connected to by a mobile computing device over a mobile network.
 Multi-query optimisation: A Multi Query Optimisation (MQO) is a
methodology of optimising a group of SQL queries altogether with the
help of common sub-expressions to save cost and time.
 Query optimisation: Query optimisation refers to the procedure of
selecting the best execution strategy among the various options
available.
 Query tree: A query tree is a data structure to represent relational
algebra expressions.

4.10 Terminal Questions


1. Explain the various heuristics involved in query optimisation.
2. Explain the various algorithms for executing query operations
3. Write a short note on Semantic query optimisation.
4. Describe multi-query optimisation and its application.
5. What are the various execution strategies for SQL sub-queries?


4.11 Answers
Self Assessment Questions
1. Sorting
2. Attributes
3. True
4. File scans
5. PROJECT
6. Query graph
7. False
8. Execution
9. True
10. Multiple queries
11. True
12. Navigational strategies
13. Reverse lookup
14. True
15. Delta stream
Terminal Questions
1. Heuristics approach to query optimisation uses heuristic rules and
algebraic techniques to improve the efficiency of query execution. Refer
Section 4.3 for more details.
2. SELECT, JOIN, PROJECT (UNION, INTERSECTION, SET
DIFFERENCE), and aggregate operations (MIN, MAX, COUNT,
AVERAGE, SUM) are various algorithms for query execution. Refer
Section 4.2 for more details.
3. Semantic query optimisation is a different approach to query
optimisation that uses various constraints to modify one query into
another to make it more efficient to execute. Refer Section 4.4 for more
details.
4. To achieve good stream processing performance, multiple queries are
optimised together, rather than individually. This is multi-query
optimisation. Refer Section 4.5 for more details.
5. There are mainly two techniques for SQL sub-queries execution namely
navigational strategies and set-oriented strategies. Refer Section 4.6 for
more details.


References:
 Elmasri, Navathe, Somayajulu, Gupta, (2006) Fundamentals of
Database Systems, (6th Ed.), India: Pearson Education.
 Peter Rob, Carlos Coronel, (2004). Database Systems: Design,
Implementation, and Management, (7th Ed.), US: Thomson Learning.
 Silberschatz, Korth, Sudarshan, (2011). Database System Concepts,
(6th Ed.), McGraw-Hill.
E-references
 http://www.cs.iusb.edu/technical_reports/TR-20080105-1.pdf
 http://research.microsoft.com/pubs/76059/pods98-tutorial.pdf
 http://infolab.stanford.edu/~hyunjung/cs346/ioannidis.pdf

Unit 5 Query Execution


Structure:
5.1 Introduction
Objectives
5.2 Introduction to Physical-Query-Plan Operators
Scanning tables
Sorting while scanning tables
5.3 One-Pass Algorithms for Database Operations
5.4 Nested-Loop Joins
Tuple-based nested-loop join
Iterator for a tuple-based nested-loop join
5.5 Two-Pass Algorithms based on Sorting
5.6 Two-Pass Algorithms Based on Hashing
5.7 Index-Based Algorithms
5.8 Buffer Management
5.9 Parallel Algorithms for Relational Operations
5.10 Using Heuristics in Query Optimisation
5.11 Basic Algorithm for Executing Query Operations
5.12 Summary
5.13 Glossary
5.14 Terminal Questions
5.15 Answers

5.1 Introduction
In the previous unit, you studied query optimization and its various
components such as query execution algorithm, heuristics in query
optimization, semantic query optimization, multi-query optimization and
applications. You also learned execution strategies for SQL sub queries and
query processing for SQL updates. Now, you will study about query
execution in this unit.
The query processor is the group of components of a DBMS that turns user
queries and data-modification commands into a sequence of database
operations and executes those operations. Since SQL lets us express
queries at a very high level, the query processor must supply a lot of detail
regarding how the query is to be executed.


In this unit, we will study query execution, that is, the algorithms that
manipulate the data of the database. We shall cover the principal methods
for execution of the operations of relational algebra. We shall introduce you
to the basic building blocks of physical query plans. You will also be
introduced to the more complex algorithms that implement operators of
relational algebra efficiently; these algorithms also form a necessary part of
physical query plans. Finally, you will study "iterators": an iterator is an
object that enables a programmer to traverse a container.
Objectives:
After studying this unit, you should be able to:
 explain the physical-query-plan operators
 discuss the one-pass algorithm for database operations
 identify and demonstrate nested-Loop joins
 explain two-pass algorithms based on sorting and hashing
 discuss Index-based algorithms
 discuss buffer management
 demonstrate parallel algorithms for relational operations
 explain heuristics in query optimisation
 identify basic algorithms for executing query operations

5.2 Introduction to Physical-Query-Plan Operators


A physical query plan is built from operators, each of which implements
one step of the plan. The physical operators are often specific
implementations of the operators of relational algebra, although we also
require physical operators for other tasks that do not involve a
relational-algebra operator.
For instance, we often need to "scan" a table, that is, bring into main
memory each tuple of some relation that is an operand of a
relational-algebra expression.
5.2.1 Scanning tables
One of the most fundamental things we can do in a physical query plan is
to read the entire contents of a relation R. This step is especially necessary
when we take the union or join of R with another relation. A variation of this
operator includes a simple predicate, where we read only those tuples of
the relation R that satisfy the predicate. There are two fundamental
approaches for locating the tuples of a relation R, as given below:
1. In several cases, there is an index on any attribute of R. We may be able
to use this index to get all the tuples of R. For instance, a sparse index
on R can be used to lead us to all the blocks holding R, even if we don't
know which blocks these are. This operation is known as index-scan.
2. In certain cases, the relation R is stored in an area of secondary
memory with its tuples set in blocks. The blocks which contain the tuples
of R are known to the system. It is possible to get the blocks one by one.
This operation is known as table-scan.
We shall resume the study of index-scan in section 5.7, where we will
discuss the implementation of the operator. But an important observation for
now is that we can use the index not only to get all the tuples of the relation
it indexes, but also to get only those tuples that have a specific value (or a
specific range of values) in the attribute or attributes that make the search
key for the index.
5.2.2 Sorting while scanning tables
There can be several reasons for sorting a relation as we read its tuples.
One reason is that the query may include an ORDER BY clause, requiring
that a relation be sorted. Another is that various algorithms for
relational-algebra operations require one or both of their arguments to be
sorted relations.
The physical-query-plan operator sort-scan takes a relation R and a
specification of the attributes on which the sort is to be made, and
produces R in that sorted order. There are several ways in which sort-scan
can be implemented, as given below:
1. If R is too large to fit in main memory, then the multiway merging approach is a good choice. However, rather than storing the final sorted R back on disk, we can produce one block of the sorted R at a time, as its tuples are needed.
2. If we have to produce a relation R sorted by attribute a, and there is a B-tree index on a, or R is stored as an indexed-sequential file ordered by a, then a scan of the index allows us to produce R in the desired order.

3. If the relation R that we want to retrieve in sorted order is small enough to fit in main memory, then we can retrieve its tuples using a table-scan or index-scan, and then sort them with an efficient main-memory sorting algorithm.
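Case 3 is simple enough to sketch directly. The fragment below is a minimal illustration under the same assumptions as before (a relation is a list of blocks of tuple dicts); the attribute name passed in is hypothetical.

def sort_scan(blocks, attr):
    # Case 3: R fits in main memory. Retrieve all tuples with a
    # table-scan, then sort them with an efficient in-memory sort
    # (Python's built-in sort runs in O(n log n) time).
    tuples = [t for block in blocks for t in block]
    tuples.sort(key=lambda t: t[attr])
    for t in tuples:
        yield t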
Significance of Iterators
When iterators are composed within query plans, they support efficient pipelined execution. They contrast with a materialisation strategy, in which the result of each operator is produced in its entirety and stored on disk or in main memory before the next operator consumes it. When iterators are used, many operations are active at once: tuples pass among operators as needed, which reduces the need for intermediate storage.
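The contrast is visible in a tiny Python sketch, reusing the hypothetical table_scan above. In the first version the selection's entire result is stored before projection begins; in the second, generators pipeline the tuples one at a time.

# Materialisation: the whole intermediate result is built and stored.
selected = [t for t in table_scan(blocks) if t["key"] > 1]
projected = [t["val"] for t in selected]

# Iteration: generators pass tuples along only as they are needed,
# so no intermediate relation is ever stored in full.
selected = (t for t in table_scan(blocks) if t["key"] > 1)
projected = (t["val"] for t in selected)
first = next(projected)   # pulls exactly one tuple through the pipeline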
Self Assessment Questions
1. The system knows the blocks containing the tuples of R, and it is not
possible to get the blocks one by one. (True/ False)
2. We can use the index not only to get all the tuples of the relation it
indexes, but also _________ .
3. It is the Open function that initiates the process of getting tuples, but it does not get a tuple. (True/ False)

Activity 1
In a group of four, analyse and explain how the index can be used not only to get all the tuples of the relation indexed by it, but also only those tuples that possess a specific value in the attribute or attributes that make up the search key for the index.

5.3 One-Pass Algorithms for Database Operations


We are about to begin our study of one of the most important topics in query processing: how to execute the individual steps (for instance, a join or selection) of a logical query plan. The selection of an algorithm for each operator is an essential element of the process of transforming a logical query plan into a physical query plan. The algorithms for operators largely fall into three classes:
1. Index-based techniques, which are explained in Section 5.7.
2. Sorting-based techniques, which are covered in Section 5.5.
3. Hash-based techniques, which are described in Sections 5.6 and 5.9, among other places.
Additionally, algorithms for operators can be divided into the following degrees of cost and difficulty:
(a) Some methods involve reading the data only once from disk. These are the one-pass algorithms. They work when at least one of the arguments of the operation fits in main memory (selection and projection are exceptions, since they do not require an argument to be in memory at all).
(b) Some methods work for data that is too large for the available main memory, but not for the largest possible data sets. An example of such an algorithm is the two-phase, multiway merge sort. These two-pass algorithms are characterised by reading data a first time from disk, processing it somehow, writing all or most of it back to disk, and then reading it again for further processing during the second pass.
(c) Some methods work without any limit on the size of the data. These methods use three or more passes to do their jobs, and are natural, recursive generalisations of the two-pass algorithms.
In this section, we shall concentrate on the one-pass methods. However, both in this section and subsequently, we shall classify operators into three broad groups:
1. Tuple-at-a-time, unary operations: These operations (selection and projection) require neither an entire relation, nor a large part of it, in memory at once. Thus we can read one block at a time, use one main-memory buffer, and produce the output.
2. Full-relation, unary operations: These one-argument operations need to see all or most of the tuples in memory at once, so one-pass algorithms are limited to relations that are approximately of size M (the number of main-memory buffers available) or less. The operations of this class that we consider here are γ (the grouping operator) and δ (the duplicate-elimination operator).
3. Full-relation, binary operations: All other operations are in this class: set and bag versions of union, intersection, difference, joins, and products. Except for bag union, each of these operations requires at least one argument to be limited to size M, if we are to use a one-pass algorithm.
One-pass algorithms for tuple-at-a-time operations
The tuple-at-a-time operations σ(R) (selection) and π(R) (projection) have obvious algorithms, regardless of whether the relation fits in main memory. We read the blocks of R one at a time into an input buffer, perform the operation on each tuple, and move the selected tuples or the projected tuples to the output buffer. Refer to Figure 5.1 for the performance of selection or projection on relation R.
Since the output buffer may be an input buffer of some other operator, or may be sending data to a user or application, we do not count the output buffer as needed space. Thus, we require only one buffer for input (that is, M ≥ 1), irrespective of B, where M is the number of available memory buffers and B is the number of blocks occupied by the argument relation.
The disk I/O requirement for this method depends only on how the argument relation R is provided. If R is initially on disk, then the cost is whatever it takes to perform a table-scan or index-scan of R.

Figure 5.1: Performance of a Selection or Projection on Relation R (tuples flow from an input buffer through the unary operator to an output buffer)

Self Assessment Questions


4. The selection of an algorithm for each operator is one of the most
fundamental elements of the process of transformation of a logical
query plan into a physical query plan. (True/ False)
5. Tuple-at-a-time, unary operations require neither _____________ nor
_________.

Activity 2
If the output of the operation can be stored on full cylinders, we waste
almost no time writing. Analyze what you have understood from this
statement. Explain with an example.

5.4 Nested-Loop Joins


Before proceeding to the more complex algorithms in the next sections, we
shall turn our attention to a family of algorithms for the join operator called
"nested loop" joins.
These algorithms are, in a sense, "one-and-a-half" passes, since in each variation one of the two arguments has its tuples read only once, while the other argument is read repeatedly.
5.4.1 Tuple-based nested-loop join
The simplest variation of nested-loop join has loops that range over individual tuples of the relations involved. In this algorithm, which we call tuple-based nested-loop join, we compute the join R(X, Y) ⋈ S(Y, Z) as follows:
FOR each tuple s in S DO
FOR each tuple r in R DO
IF r and s join to make a tuple t THEN
output t
S is called the outer relation and R the inner relation of the join. One buffer is used for the outer relation and one buffer for the inner relation, so the I/O cost of this algorithm can be as high as T(R)T(S) disk I/O's. It is expensive, since it examines every pair of tuples in the two relations. However, there are many situations where this algorithm can be modified to have much lower cost. A further refinement looks much more carefully at the way tuples of R and S are distributed among blocks, and uses as much of the memory as it can to decrease the number of disk I/O's incurred as we go through the inner loop.
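That refinement is the block-based nested-loop join. The sketch below is illustrative only, under the same block-of-dicts assumptions as the earlier examples; it shows how loading M - 1 blocks of the outer relation at a time reduces the cost from T(R)T(S) to roughly B(S) + B(S)B(R)/(M - 1) disk I/O's.

def block_nested_loop_join(S_blocks, R_blocks, M, attr):
    # Load M-1 blocks of the outer relation S into memory at a time,
    # then scan the inner relation R once per group of outer blocks.
    for i in range(0, len(S_blocks), M - 1):
        s_chunk = [s for block in S_blocks[i:i + M - 1] for s in block]
        for r_block in R_blocks:          # one pass over R per chunk
            for r in r_block:
                for s in s_chunk:
                    if r[attr] == s[attr]:
                        yield {**r, **s}  # concatenate matching tuples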

5.4.2 Iterator for a tuple-based nested-loop join
One benefit of a nested-loop join is that it fits well into an iterator framework, which allows us, in some situations, to avoid storing intermediate relations on disk. The iterator for R ⋈ S is easy to build from the iterators for R and S, which support the functions Open(), GetNext(), and Close(). The code for the three iterator functions for nested-loop join is in Figure 5.2. It makes the assumption that neither relation R nor S is empty.

Open () {
R. Open () ;
S. Open () ;
s : = S. GetNext () ;
}
GetNext () {
REPEAT {
r : = R. GetNext () ;
IF (r = NotFound) { /* R is exhausted for the current s */
R. Close () ;
s : = S. GetNext () ;
IF (s = NotFound) RETURN NotFound ;
/* both R and S are exhausted */
R. Open () ;
r : = R. GetNext () ;
}
}
UNTIL (r and s join) ;
RETURN the join of r and s ;
}
Close () {
R. Close () ;
S. Close () ;
}

Figure 5.2: Iterator Functions for Tuple-based Nested-loop Join of R and S

The functions used in Figure 5.2 are defined as follows:
Open():
R.Open() initialises a main-memory structure to represent a scan over the tuples of R; S.Open() does the same for S.
GetNext():
The REPEAT loop keeps advancing R (reopening R and advancing S whenever R is exhausted) until a pair of tuples r and s that join is found.
Close():
R.Close() and S.Close() release the memory held by the scans.
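The logic of Figure 5.2 maps almost line for line onto a Python generator, in which the language's iteration protocol plays the role of Open(), GetNext() and Close(). This is a sketch only; R_open and S_open are assumed to be functions that "open" a relation and return a fresh iterator over its tuples.

def nested_loop_join(R_open, S_open, joins):
    for s in S_open():          # s := S.GetNext(), until exhausted
        for r in R_open():      # R is reopened for each new s
            if joins(r, s):     # UNTIL (r and s join)
                yield (r, s)    # RETURN the join of r and s

# Example use, with in-memory lists standing in for stored relations:
# pairs = nested_loop_join(lambda: iter(R), lambda: iter(S),
#                          lambda r, s: r["y"] == s["y"])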
Self Assessment Questions
6. _________ joins can be used for relations of any size. One relation
does not need to necessarily fit in the main memory.
7. Nested-loop does not allow us to avoid storing intermediate relations
on disk in some situations. (True/ False)

5.5 Two-Pass Algorithms based on Sorting


We shall now begin the study of multi-pass algorithms for performing relational algebra operations on relations that are larger than what the one-pass algorithms of Section 5.3 can handle. We focus on two-pass algorithms, where data from the operand relations is read into main memory, processed in some way, written out to disk again, and then reread from disk to complete the operation.
We can naturally extend this idea to any number of passes, where the data is read many times into main memory. However, we concentrate on two-pass algorithms because:
1. Two passes are usually enough, even for very large relations.
2. Generalising to more than two passes is not hard.
In this section, we consider sorting as a tool for implementing relational operations. The fundamental idea is as follows: if we have a large relation R, where B(R), the number of blocks of R, is greater than M, the number of memory buffers available, then we can repeatedly:
1. Read M blocks of R into main memory.
2. Sort these M blocks in main memory, using an efficient main-memory sorting algorithm. Such an algorithm will take an amount of processor time that is only slightly more than linear in the number of tuples in main memory, so we expect that the time to sort will not exceed the disk I/O (input/output) time for step (1).
3. Write the sorted list into M blocks of disk. We shall refer to the contents of these blocks as one of the sorted sublists of R.
All the algorithms we shall discuss then use a second pass to "merge" the sorted sublists in some way to execute the desired operator.
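The whole scheme is easy to see in miniature. The sketch below simulates two-phase, multiway merge sort in Python, with lists standing in for disk blocks and sorted sublists; it is an illustration of the idea, not of real buffer handling.

import heapq

def two_phase_merge_sort(blocks, M):
    # Pass 1: read M blocks at a time, sort them in memory, and
    # 'write out' each result as one sorted sublist.
    sublists = []
    for i in range(0, len(blocks), M):
        run = sorted(t for block in blocks[i:i + M] for t in block)
        sublists.append(run)
    # Pass 2: merge all the sorted sublists; this needs one input
    # buffer per sublist, hence at most M - 1 sublists.
    assert len(sublists) <= M - 1, "relation too large for two passes"
    return heapq.merge(*sublists)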
Self Assessment Questions
8. In _________ algorithms, data is read into main memory from the
operand relations.
9. In the second pass, all the sorted sublists are _________.

5.6 Two-Pass Algorithms Based on Hashing


There is a family of hash-based algorithms that attack the same problems as the sorting-based algorithms of Section 5.5. The essential idea behind all of them is as follows: if the data is too big to fit in the main-memory buffers, hash all the tuples of the argument or arguments using an appropriate hash key. For all the common operations, there is a way to choose the hash key so that all the tuples that need to be considered together when we perform the operation have the same hash value.
We then perform the operation by working on one bucket at a time (or on a pair of buckets with the same hash value, in the case of a binary operation). In effect, we have reduced the size of the operand(s) by a factor equal to the number of buckets.
If there are M buffers available, then we can pick M as the number of buckets, gaining a factor of M in the size of the relations that we can handle. Notice that the sort-based algorithms of Section 5.5 also gain a factor of M by preprocessing, although the sorting and hashing approaches achieve their similar gains by rather different means.
Partitioning Relations by Hashing: To begin, let us review how we would partition a relation R into M - 1 buckets of roughly equal size, using M buffers. We shall assume that h is the hash function, and that h takes complete tuples of R as its argument (i.e., all attributes of R are part of the hash key). We associate one buffer with each bucket; the remaining buffer holds the blocks of R, one at a time.
Each tuple t in a block is hashed to bucket h(t) and copied to the appropriate buffer. If that buffer is full, we write it out to disk and initialise another block for the same bucket. At the end, we write out the final block of each bucket if it is not empty.
The algorithm is given in more detail in Figure 5.3. Note that it assumes that
tuples, while they may be variable-length, are never too large to fit in an
empty buffer.

Figure 5.3: Partitioning a Relation R into M - 1 Buckets
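The partitioning loop of Figure 5.3 is short enough to render directly in Python. The sketch below is illustrative: the block size, the hash function h, and the lists standing in for disk are all assumptions.

def partition(R_blocks, M, h, block_size=100):
    buffers = [[] for _ in range(M - 1)]   # one buffer per bucket
    disk = [[] for _ in range(M - 1)]      # blocks written out so far
    for block in R_blocks:                 # the M-th buffer reads R
        for t in block:
            i = h(t) % (M - 1)             # hash tuple t to bucket i
            buffers[i].append(t)
            if len(buffers[i]) == block_size:
                disk[i].append(buffers[i])  # buffer full: write it out
                buffers[i] = []             # start another block
    for i in range(M - 1):                 # flush final partial blocks
        if buffers[i]:
            disk[i].append(buffers[i])
    return disk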

Self Assessment Questions


10. If there are M buffers available and we can pick M as the number of
buckets, we can gain a factor of M in the size of the relations that we
can handle. (True/ False)
11. The essential idea behind all hash-based algorithms is _________.

5.7 Index-Based Algorithms


The existence of an index on one or more attributes of a relation makes
available some algorithms that would not be feasible without the index.
Index-based algorithms are especially useful for the selection operator.

However, algorithms for join and other binary operators also use indexes to very good advantage. In this section, we shall introduce these algorithms. We also continue the discussion of the index-scan operator for accessing a stored table with an index, which we began in Section 5.2.1. To appreciate many of the issues, we first need to study "clustering" indexes.
Clustering and non-clustering indexes
A relation is said to be "clustered" if its tuples are packed into roughly as few blocks as can possibly hold those tuples. All the analysis we have done so far assumes that relations are clustered.
We may also speak of clustering indexes, which are indexes on an attribute or attributes such that all the tuples with a fixed value for the search key of the index appear on roughly as few blocks as can hold them. Note that a relation that is not clustered cannot have a clustering index, but a clustered relation can have non-clustering indexes. (See Figure 5.4.)

Figure 5.4: A Clustering Index having all the Tuples with a Fixed Value Packed into (close to) the Minimum Possible Number of Blocks
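The payoff of clustering is easy to see in a sketch: if R is kept sorted on its search key, all tuples with a fixed key value occupy one contiguous run, which binary search can locate without touching the rest of the relation. The flat sorted list below is an assumed stand-in for the clustered file.

import bisect

def clustered_lookup(tuples_sorted_by_key, keys, value):
    # 'keys' is the list of key values parallel to the sorted tuples.
    lo = bisect.bisect_left(keys, value)   # first tuple with the key
    hi = bisect.bisect_right(keys, value)  # one past the last match
    return tuples_sorted_by_key[lo:hi]     # one contiguous run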

Self Assessment Questions


12. The existence of an index on one or more attributes of a relation makes
available some algorithms that _________.
13. Index-based algorithms are extremely useful for the selection operator.
(True/ False)

5.8 Buffer Management


We have assumed that operators on relations have some number M of main-memory buffers that they can use to store needed data. In practice, these buffers are rarely allocated to the operator in advance, and the value of M may vary depending on system conditions.

The essential task of making main-memory buffers available to processes (for example, queries) that act on the database is given to the buffer manager. It is the responsibility of the buffer manager to let processes get the memory that they need, while minimising delayed and unsatisfiable requests. The role of the buffer manager is illustrated in Figure 5.5.

Figure 5.5: Role of Buffer Manager (processes issue requests to the buffer manager, which performs reads and writes between the buffers and disk)

Buffer management architecture
Buffer-management architectures broadly fall into two categories:
1. In most relational database management systems, the buffer manager controls main memory directly.
2. Alternatively, the buffer manager allocates buffers in virtual memory, allowing the operating system to decide which buffers are actually in main memory at any time and which are in the "swap space" on disk that the operating system manages. Many "main-memory" DBMSs and "object-oriented" DBMSs operate this way.
Whichever approach a DBMS uses, the same problem arises: the buffer manager should limit the number of buffers in use so that they fit in the available main memory.

When the buffer manager controls main memory directly and requests exceed the available space, it has to select a buffer to empty, by returning its contents to disk. If the buffered block has not been changed, then it may simply be erased from main memory; but if the block has changed, it must be written back to its place on disk.
When the buffer manager allocates space in virtual memory, it has the option of allocating more buffers than can fit in main memory. However, if all these buffers are really in use, then there will be "thrashing", a common operating-system problem in which many blocks are moved in and out of the disk's swap space; the system then spends most of its time swapping blocks, and very little useful work gets done.
Normally, the number of buffers is a parameter set when the DBMS is initialised. We would expect that this number is chosen so that the buffers occupy the available main memory, regardless of whether the buffers are allocated in main or virtual memory.
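A toy buffer manager makes the write-back rule concrete. The sketch below keeps an LRU pool of a fixed number of buffers over an assumed dict-like "disk"; it illustrates the policy described above, not any particular DBMS's implementation.

from collections import OrderedDict

class BufferManager:
    def __init__(self, disk, n_buffers):
        self.disk = disk                   # block_id -> contents
        self.n = n_buffers
        self.pool = OrderedDict()          # block_id -> (data, dirty)

    def get(self, block_id):
        if block_id in self.pool:
            self.pool.move_to_end(block_id)        # recently used
        else:
            if len(self.pool) == self.n:           # pool full: evict LRU
                victim, (data, dirty) = self.pool.popitem(last=False)
                if dirty:
                    self.disk[victim] = data       # changed: write back
                # an unchanged block is simply erased from memory
            self.pool[block_id] = (self.disk[block_id], False)
        return self.pool[block_id][0]

    def mark_dirty(self, block_id):
        data, _ = self.pool[block_id]
        self.pool[block_id] = (data, True)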
Self Assessment Questions
14. The buffers are rarely allocated in advance to the _________ , and the
value of M may vary depending on system conditions.
15. If the buffered block has not been changed, then it may simply be
erased from _________.

5.9 Parallel Algorithms for Relational Operations


Database operations, frequently being time-consuming and involving a lot of data, can generally profit from parallel processing. In this section, we shall review the principal architectures for parallel machines. We then concentrate on the "shared-nothing" architecture, which appears to be the most cost-effective for database operations, although it may not be superior for other parallel applications.
There are simple modifications of the standard algorithms for most relational operations that exploit parallelism almost perfectly: the time to complete an operation on a p-processor machine is about 1/p of the time it takes to complete the operation on a uniprocessor.
Models of parallelism
A collection of processors is the heart of a parallel machine. Often the number of processors p is large, in the hundreds or thousands.
We shall assume that each processor has its own local cache, which we do
not show explicitly in our diagrams.
In most organisations, each processor also has local memory, which we do show. Of great importance to database processing is the fact that along with these processors come many disks, perhaps one or more per processor or, in some architectures, a large collection of disks accessible to all processors directly.
Additionally, parallel computers all have some communications facility for passing information among processors. In our diagrams, we show the communications as if there were a shared bus for all the elements of the machine. However, in practice a bus cannot interconnect as many processors or other elements as are found in the largest machines, so the interconnection system in many architectures is a powerful switch, perhaps augmented by buses that connect subsets of the processors in local clusters.
The three most important classes of parallel machines are:
1. Shared Memory: In this architecture, as illustrated in Figure 5.6, each
processor has access to all the memory of all the processors. That is,
there is a single physical address space for the entire machine, rather
than one address space for each processor.

Figure 5.6: A Shared-Memory Machine

The diagram given in Figure 5.6 is in fact an extreme view, suggesting that processors have no private memory at all. Rather, each processor has some local main memory, which it typically uses whenever it can, but it has direct access to the memory of other processors when it needs to. Large machines of this class are of the NUMA (non-uniform memory access) kind, meaning that it takes somewhat more time for a processor to access data in a memory that "belongs" to some other processor than to access its "own" memory, or the memory of processors in its local cluster.
However, the difference in memory-access times is not great in current architectures. Rather, all memory accesses, no matter where the data is, take much more time than a cache access, so the critical issue is whether or not the data a processor needs is in its own cache.

Figure 5.7: A Shared-Disk Machine

2. Shared Disk: In this architecture, shown in Figure 5.7, every processor has its own memory, which is not accessible directly from other processors. However, the disks are accessible from any of the processors through the communication network. Disk controllers manage the potentially competing requests from different processors. The number of disks and the number of processors need not be identical.

3. Shared Nothing: Here, all processors have their own memory and their own disk or disks, as in Figure 5.8. All communication is via the communication network, from processor to processor. For example, if one processor P wants to read tuples from the disk of another processor Q, then P sends a message to Q asking for the data; Q obtains the tuples from its disk and ships them over the network in another message, which is received by P.

Figure 5.8: A Shared-Nothing Machine

The shared-nothing architecture is the most commonly used architecture for high-performance database systems. Shared-nothing machines are comparatively economical to build; however, when we design algorithms for these machines, we must be aware that it is costly to send data from one processor to another.
Normally, data must be sent between processors in a message, which has considerable overhead associated with it. Both processors must execute a program that supports the message transfer, and there may be contention or delays associated with the communication network as well.
Usually, the cost of a message can be broken down into a large fixed overhead plus a small amount of time per byte transmitted. Thus, there is a significant advantage to designing a parallel algorithm so that communication among processors involves large amounts of data sent at once. For instance, we might buffer several blocks of data at processor P, all bound for processor Q; if Q does not need the data immediately, it may be much more efficient to wait until we have a long message at P and then send it to Q.
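A back-of-the-envelope cost model shows why batching pays. The overhead and per-byte figures below are invented for illustration; only the shape of the formula comes from the text.

FIXED = 1000          # assumed fixed overhead per message (cost units)
PER_BYTE = 1          # assumed cost per byte transmitted

def send_cost(n_messages, total_bytes):
    return n_messages * FIXED + total_bytes * PER_BYTE

# Shipping fifty 4096-byte blocks one message at a time versus batched:
unbatched = send_cost(50, 50 * 4096)   # 50*1000 + 204800 = 254800
batched = send_cost(1, 50 * 4096)      #  1*1000 + 204800 = 205800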

Self Assessment Questions


16. The disks are accessible from any of the processors through the
_________ network.
17. The number of disks stored and processors need not be identical.
(True/ false)

5.10 Using Heuristics in Query Optimisation


One of the chief heuristic rules is to apply SELECT and PROJECT operations before applying JOIN or other binary operations.
 A query tree is a tree data structure whose purpose is to represent a relational algebra expression. It represents the relational algebra operations as internal nodes, and the input relations of the query as leaf nodes of the tree.
 An execution of the query tree consists of executing an internal node's operation whenever its operands are available, and then replacing that internal node with the relation that results from executing the operation.
 The execution terminates at the execution of the root node, whose output is the result relation for the query.
Heuristic Optimisation of Query Trees
 Different relational algebra expressions can be equivalent; in other words, they can correspond to the same query.
 A standard initial query tree is produced by the query parser.
 This initial query tree is then transformed by the heuristic query optimiser into a final query tree that is efficient to execute.

Consider, for example, the query:
SELECT NAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'Aquarius' AND ESSN = SSN
AND BDATE > '1-DEC-56' AND PNUMBER = PNO;
To execute this query, we do not need to create a huge file containing the CARTESIAN PRODUCT of the entire EMPLOYEE, PROJECT, and WORKS_ON files. The query needs only a single record from the PROJECT relation, and only the employee records for those whose date of birth is after '1-DEC-56'.

The basic outline of a heuristic algebraic optimisation algorithm is given below:
1. Break up any SELECT operation with a conjunctive condition into a cascade of SELECT operations.
2. Move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition.
3. Rearrange the leaf nodes of the tree by the following criteria:
 Position the leaf-node relations with the most restrictive SELECT operations so that they are executed first in the query representation.
 Ensure that the ordering of leaf nodes does not cause a CARTESIAN PRODUCT operation.
4. Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a JOIN operation, if the selection condition represents a join condition.
5. Break down and move lists of projection attributes down the tree as far as possible, creating new PROJECT operations as needed.
6. Identify sub-trees that represent groups of operations that can be implemented by a single algorithm.
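Step 2 of the outline, pushing a SELECT below a CARTESIAN PRODUCT, can be sketched as a small tree rewrite. The tuple encoding of tree nodes below is a hypothetical convenience, not a real optimiser's representation.

# Nodes: ('REL', name, attrs), ('SELECT', cond_attrs, child),
# ('PRODUCT', left, right). attrs/cond_attrs are sets of attributes.

def attrs_of(node):
    if node[0] == 'REL':
        return node[2]
    if node[0] == 'SELECT':
        return attrs_of(node[2])
    return attrs_of(node[1]) | attrs_of(node[2])

def push_select(node):
    # Move a SELECT below a PRODUCT when all the attributes in its
    # condition come from just one operand of the product.
    if node[0] == 'SELECT':
        _, cond, child = node
        if child[0] == 'PRODUCT':
            _, left, right = child
            if cond <= attrs_of(left):
                return ('PRODUCT', push_select(('SELECT', cond, left)), right)
            if cond <= attrs_of(right):
                return ('PRODUCT', left, push_select(('SELECT', cond, right)))
        return node
    if node[0] == 'PRODUCT':
        return ('PRODUCT', push_select(node[1]), push_select(node[2]))
    return node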
Self Assessment Questions
18. The _________ terminates at the execution of the root node. This
makes the result relation for the query.
19. The first query tree is transformed by the heuristic query optimizer into
a final query tree that is efficient to execute. (True/ False)

5.11 Basic Algorithm for Executing Query Operations


External sorting is the essential algorithm for implementing query operations. Sorting is one of the main algorithms used in query processing (for example, an ORDER BY clause requires sorting). External sorting is appropriate for huge files of records stored on disk that do not fit entirely in main memory.
The usual external sorting algorithm uses a sort-merge approach, which consists of two phases:
1. Sorting Phase
2. Merging Phase

Implementing the SELECT Operation
Many search algorithms are feasible for selecting records from a file. The following search techniques are available:
 Linear search (brute force)
 Binary search
 Using a primary index (or hash function)
 Using a primary index to retrieve multiple records
 Using a clustering index to retrieve multiple records
 Using a secondary (B+-tree) index on an equality comparison
Implementing the JOIN Operation
The JOIN operation is one of the most time-consuming operations in query processing. The most common methods for performing a join are:
1. Nested-loop join (brute force)
2. Sort-merge join
3. Single-loop join (using an access structure to retrieve the matching records)
4. Hash-join
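The hash-join, method 4, is the easiest of these to sketch when both relations fit in memory: build a hash table on one relation and probe it with the other. This in-memory version is an illustration of the idea only.

def hash_join(R, S, attr):
    table = {}
    for r in R:                            # build phase: hash R
        table.setdefault(r[attr], []).append(r)
    for s in S:                            # probe phase: look up S
        for r in table.get(s[attr], []):   # only the matching records
            yield {**r, **s}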
Implementing PROJECT as well as Set Operations
 Implementation of a PROJECT operation is easy if the attribute list includes a key of relation R.
 If the attribute list does not contain a key of R, duplicate tuples must be eliminated.
 Set operations (∪, ∩, −, ×) are sometimes expensive to implement; the Cartesian product operation is especially expensive.
 Since union, intersection, and set difference apply only to union-compatible relations, they can be implemented by using variations of the sort-merge technique.
 Hashing can also be used to implement UNION, INTERSECTION, and SET DIFFERENCE.
Implementing Aggregate Operations
 The aggregate operations (MAX, MIN, SUM, AVERAGE, COUNT), when applied to an entire table, can be computed by a table scan or by using an appropriate index. For example:
SELECT MAX (SALARY) FROM EMPLOYEE;
If an ascending index on SALARY exists for the EMPLOYEE relation, it can be used; otherwise we can scan the entire table.
 The index can also be used for computing the SUM, AVERAGE, and COUNT aggregates. However, the index must be dense, i.e., there must be an index entry for each and every record in the main file.
 When a GROUP BY clause is used in a query, the aggregate operator must be applied independently to each group of tuples.
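The salary example reduces to a comparison between two access paths. The sketch below assumes the ascending index is a sorted list of (salary, record-id) pairs; both the structure and the relation are illustrative.

def max_by_scan(employee):
    # MAX(SALARY) by scanning the entire table.
    return max(t["SALARY"] for t in employee)

def max_by_index(salary_index):
    # With an ascending index, the maximum is simply the last entry.
    return salary_index[-1][0]

# If the index is dense, COUNT(*) is just len(salary_index): there is
# one index entry for every record in the main file.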
Self Assessment Questions
20. The index can be used for _________.
21. _________ can be used to implement INTERSECTION, UNION and
SET DIFFERENCE.

5.12 Summary
Let us recapitulate the important points discussed in this unit:
 The principal methods for executing the operations of relational algebra are scanning, hashing, sorting, and indexing.
 One reason for sorting a relation is that various algorithms for relational-algebra operations require one or both of their arguments to be sorted relations.
 Another reason is that the query may include an ORDER BY clause, requiring that a relation be sorted.
 Iterators support efficient pipelined execution when they are composed within query plans.
 We have assumed that operators on relations have available some
number M of main-memory buffers that can be used to store the needed
data.
 The basic algorithm for execution of query operation is External Sorting.
 External sorting is suitable for huge files of records stored on disk that
do not fit entirely in main memory.
 The number of buffers is a parameter set when the DBMS is initialised.

5.13 Glossary
 Iterators: An object that enables a programmer to traverse a container.

 NUMA: Non Uniform Memory Access. It is a computer memory design


used in multiprocessing, where the memory access time depends on the
memory location relative to a processor.
 Scanning tables: A variation of this operator involves a simple
predicate, where we read only those tuples of the relation R that satisfy
the predicate.
 Shared disk: The disks are accessible from any of the processors
through the communication network.
 Table-scan: Those blocks which contain the tuples of R are known to
the system.
 Two-pass algorithms: Data from the operand relations is read into main memory, processed in some way, written out to disk again, and then read again from disk to complete the operation.

5.14 Terminal Questions


1. Explain the physical query plan operators.
2. Discuss the one-pass algorithms for database operations.
3. What is buffer management?
4. Describe the most important classes of parallel machines.

5.15 Answers
Self Assessment Questions
1. False
2. to get only those tuples that have a specific value (or a specific range of values) in the attribute or attributes that make up the search key for the index
3. True
4. True
5. an entire relation; a large part of it
6. Nested-loop
7. False
8. two-pass
9. merged
10. True
11. to hash all the tuples of the argument or arguments using an appropriate hash key, so that tuples that must be considered together share a hash value
12. would not be feasible without the index
13. True
14. Operator

15. Main memory


16. Communication
17. True
18. Execution
19. True
20. computing the SUM, AVERAGE, and COUNT aggregates
21. Hashing
Terminal Questions
1. Physical query plans are built from operators. Every single operator
implements one step of the plan. Refer Section 5.2 for more details.
2. One-pass algorithms read the data only once from disk. Refer Section
5.3 for more details.
3. Buffer manager allows processes to get the memory they need, while
the unsatisfiable and delayed requests are minimized. Refer Section 5.8
for more details.
4. Shared memory, shared disk, and shared nothing are the important classes of parallel machines. Refer Section 5.9 for more details.


Unit 6 Adaptive Query Processing and Query Evaluation
Structure:
6.1 Introduction
Objectives
6.2 Query Processing Mechanism: Eddy
6.3 Eddy Architecture and how Eddy allows Extreme flexibility
6.4 Properties of Query Processing Algorithms
6.5 Need and Uses of Adaptive Query Processing
6.6 Complexities
6.7 Robust Query Optimisation through Progressive Optimisation
6.8 Query Evaluation Techniques for Large Databases
6.9 Query Evaluation Plans
6.10 Summary
6.11 Glossary
6.12 Terminal Questions
6.13 Answers

6.1 Introduction
In the previous unit, you studied query execution and its various related
aspects such as physical-query-plan operators, nested-loop joins, and
various algorithms. You also studied how to use heuristics in query
optimisation. In this unit, we will introduce you to adaptive query processing
and query evaluation.
With the diversification of the data management field into more complicated settings, where queries are becoming increasingly complex, the traditional optimise-then-execute paradigm is proving insufficient. This has led to new techniques, usually grouped under the common banner of 'Adaptive Query Processing', which focus on applying runtime feedback to modify query processing in order to provide better response time or more efficient CPU usage.
Adaptive query processing refers to a set of procedures used to correct the inherent flaws of traditional query processing. It is used for creating optimal query plans in situations where the traditional plans fail.

In this unit, you will study many of the common techniques, and approaches
associated with Adaptive Query Processing. Our goal in this unit is to
provide not only an overview of each technique, but also a basic framework
for understanding the field of adaptive query processing in general. We
focus primarily on processing mechanism, eddy architecture, query
processing algorithms, complexities, synchronisation barriers, robust query
processing through progressive optimisation. We conclude with a discussion
of query evaluation techniques for large databases and query evaluation
plans.
Objectives:
After studying this unit, you should be able to:
 explain the Eddy architecture and how it allows for extreme flexibility
 discuss the properties of query processing algorithms
 identify and demonstrate the need and uses of adaptive query
processing
 explain different types of complexity
 discuss synchronisation barriers in query processing
 identify Robust query processing through progressive optimisation
 demonstrate query evaluation techniques for large databases
 discuss query evaluation plans

6.2 Query Processing Mechanism: Eddy


Information resources can display erratic performance characteristics in shared-nothing databases and big federated databases. In such environments, assumptions made at the time a query is submitted will rarely hold throughout the duration of query processing, so traditional static query optimisation and execution techniques are not very effective.
In this unit, we introduce a query processing mechanism known as an eddy, which constantly reorders the operators in a query plan as it runs. Pipelined joins can be easily reordered, and the synchronisation barriers that require inputs from various sources can be seamlessly coordinated, by characterising the moments of symmetry (discussed in section 6.4).
We can merge the optimisation and execution phases of query processing by combining eddies with suitable join algorithms. This allows each tuple to follow a flexible ordering of the query operators. This flexibility is controlled by combining a simple learning algorithm with routing algorithms reminiscent of fluid dynamics, as in River. River is basically a dataflow query engine, analogous in certain ways to Volcano, Gamma, and commercial parallel database engines.
Eddies are a more general implementation of an adaptive query processing operator. The basic idea is that an eddy is added to control the various operators in a query; data from these operators (running as independent threads) moves through the eddy, as is apparent from Figure 6.1.

Figure 6.1: Query Processing Mechanism – Eddy

Since it functions as a central unit between the operators, an eddy can adaptively choose the best possible order in which to route tuples through the successive operators.
The eddy maintains a priority queue of all tuples that require processing. A tuple's priority level increases as it moves from one operator to the next, which ensures that tuples at later stages of processing are processed first. Moreover, a tuple's priority is adjusted based on the production and consumption rates of the operators by which it still needs to be processed. A low-cost operator represents a smaller percentage of the total processing time required and can be given more tuples in a shorter duration of time, because a low-cost operator can consume tuples faster than a high-cost one.
By tracking the rates at which tuples are routed through the operators, the eddy learns the operators' relative performance.
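A toy model captures the priority-queue behaviour. In the sketch below the operators are simple filters and the routing order is fixed for brevity, whereas a real eddy also adapts the routing order; the encoding is purely illustrative.

import heapq

def eddy(tuples, operators):
    # Queue entries are (-ops_done, tuple_id, tuple, next_op); the
    # most negative first element wins, so tuples at later stages of
    # processing are processed first.
    queue = [(0, i, t, 0) for i, t in enumerate(tuples)]
    heapq.heapify(queue)
    while queue:
        neg_done, i, t, k = heapq.heappop(queue)
        if k == len(operators):
            yield t                      # passed every operator: output
        elif operators[k](t):            # operator keeps the tuple
            heapq.heappush(queue, (neg_done - 1, i, t, k + 1))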
Self Assessment Question
1. An eddy is a query processing mechanism which constantly reorders
operators in a query plan as it runs. (True/ False)
2. The eddy functions by _______________.

6.3 Eddy Architecture and how Eddy allows Extreme flexibility


We discussed the Query Processing Mechanism: Eddy in section 6.2. Now
we will study Eddy Architecture and how it allows for the extreme flexibility.
The discussion in the previous section allows us to consider easily
reordering query plans at moments of symmetry (discussed in section 6.4).
River and eddies
In this section, we illustrate how the eddy mechanism implements reordering in a natural manner during query processing. The techniques that we describe can be used with any operators; however, operators with frequent moments of symmetry allow more frequent re-optimisation. Before discussing eddies, we first introduce our basic query processing environment.
River: Eddies are implemented in the context of River, a shared-nothing parallel query processing framework that adjusts dynamically to fluctuations in workload and performance.

Figure 6.2: Tuples Generated by Block, Index and Hash Ripple Join

The block ripple join generates all tuples, as depicted in Figure 6.2; however, some of them may be eliminated by the join predicate. For the index and hash ripple joins, the arrows represent the logical portion of the cross-product space checked so far; these joins expend work only on tuples satisfying the join predicate (black dots). In the hash ripple diagram, one relation arrives three times faster than the other.
A River can be used to robustly produce near-record performance on I/O-intensive benchmarks such as hash joins and parallel sorting, in spite of dissimilarities and dynamic variability in the workloads and hardware components across machines in a system.
In Telegraph, our intention is to leverage the adaptability of River to allow dynamic shifting of load (including both data delivery and query processing) in a shared-nothing parallel environment.
A rather simple overview of the River framework will serve our purpose, since we are not discussing parallelism here. As in many database engines, in River "iterator"-style modules (query operators) communicate via a fixed dataflow graph (a query plan).
With the edges in the graph corresponding to finite message queues, each module runs as an independent thread. If the producer and the consumer are running at different rates, the faster thread blocks on the queue, waiting for the slower thread to catch up.
River is essentially multi-threaded, and can take advantage of barrier-free algorithms by reading from its various inputs at independent rates. The River implementation used here is derived from the work on NOW-Sort; it features well-organised I/O mechanisms together with high-performance user-level networking, pre-fetching scans, and avoidance of operating-system buffering.
Pre-Optimisation: Even though we will use eddies to reorder tables among joins, a heuristic pre-optimiser must decide how to initially pair off relations into joins, with the constraint that each relation participates in a single join. This corresponds to choosing a spanning tree of the query graph, in which nodes represent relations and edges represent binary joins.
One reasonable heuristic for selecting a spanning tree first forms a chain of Cartesian products across tables known to be very small; the small tables are chosen so as to handle "star schemas" when base-table cardinality statistics are available. It then selects random equijoin edges (on the assumption that they have relatively low selectivity), and finally adds as many random non-equijoin edges as necessary to complete a spanning tree.
For a given spanning tree of the query graph, the pre-optimiser must choose a join algorithm for each edge. Along each equijoin edge it can use either an index join if an index is available, or a hash ripple join; along each non-equijoin edge it can use a block ripple join.
An eddy in the river: An eddy is implemented as a module in a river containing an arbitrary number of input relations, a number of participating unary and binary modules, and a single output relation.
An eddy encapsulates the scheduling of its participating operators; tuples entering the eddy can flow through its operators in a wide range of orders. In other words, based on the intuition that symmetries are easily captured in an n-ary operator, an eddy merges multiple unary and binary operators into a single n-ary operator within a query plan.
An eddy maintains a fixed-sized buffer of tuples that are to be processed by one or more operators. Each operator participating in the eddy has one or two inputs that are fed tuples by the eddy, and an output stream that returns tuples to the eddy.
Eddies derive their name from the circulation of data within a river. A tuple entering an eddy is associated with a tuple descriptor containing a vector of Ready bits and Done bits, which specify, respectively, the operators that are eligible to process the tuple and those that have already processed it.
The eddy module ships a tuple only to operators for which the corresponding Ready bit is turned on. After processing the tuple, the operator returns it to the eddy, and the corresponding Done bit is turned on. If all the Done bits are on, the tuple is sent to the eddy's output; otherwise, it is sent to another eligible operator for continued processing.
When an eddy receives a tuple from one of its inputs, it zeroes the Done bits and sets the Ready bits appropriately. In the simplest case, the eddy sets all Ready bits on, signifying that any ordering of the operators is acceptable. When there are ordering constraints on the operators, the eddy turns on only the Ready bits corresponding to operators that can be executed first; whenever an operator returns a tuple to the eddy, the eddy turns on the Ready bit of any operator now eligible to process the tuple.
Binary operators produce output tuples that correspond to combinations of input tuples; in such cases, the descriptor of the output tuple is formed from the Ready bits and Done bits of the two input tuples. In this manner an eddy preserves the ordering constraints while maximising opportunities for tuples to follow different possible orderings of the operators.
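The descriptor logic can be mimicked with two small bit vectors. The sketch below is a hypothetical encoding of the Ready/Done bookkeeping described above, with bit i standing for operator i.

N_OPS = 3
ALL_DONE = (1 << N_OPS) - 1

def on_entry(first_eligible):
    # The eddy zeroes Done and turns Ready on only for the operators
    # that may run first (all of them, if there are no constraints).
    ready = 0
    for i in first_eligible:
        ready |= 1 << i
    return ready, 0                      # (Ready bits, Done bits)

def on_return(ready, done, i, now_eligible):
    # Operator i has processed the tuple: set its Done bit, clear its
    # Ready bit, and mark any newly eligible operators as Ready.
    done |= 1 << i
    ready &= ~(1 << i)
    for j in now_eligible:
        if not done & (1 << j):
            ready |= 1 << j
    return ready, done

# The tuple goes to the eddy's output once done == ALL_DONE.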
Two properties of eddies deserve comment here. First, note that eddies do not constrain reordering to moments of symmetry across the eddy as a whole; rather, a given operator must carefully refrain from fetching tuples from certain inputs until its next moment of symmetry. For instance, a nested-loops join would not fetch a new tuple from the current outer relation until it has finished rescanning the inner.
Second, eddies represent the full class of bushy trees corresponding to the set of join nodes. For instance, it is possible that two pairs of tuples are joined independently by two different join modules, and then routed to a third join to perform the 4-way concatenation of the two binary results. There is no requirement that all operators in the eddy (apart from the one that is fetching a new tuple) be at a moment of symmetry when this happens. Thus eddies are quite flexible, both in the scenarios in which they can logically reorder operators and in the shapes of trees that they can produce.
Self Assessment Questions
3. An eddy zeroes the Done bits, and sets the Ready bits suitably when it
receives a tuple from one of its inputs. (True/ False)
4. Two properties of eddies are______________

Activity 1
Discuss how Eddy Architecture allows extreme flexibility in query
processing.

6.4 Properties of Query Processing Algorithms


The different properties of query processing algorithms are:
 Reorderability of plans
 Synchronisation barriers in query processing
 Moments of symmetry
 Joins and indexes
 Physical properties, predicates and commutativity
 Join algorithms and re-ordering
Now let us discuss them in detail.
Reorderability of plans: An essential challenge of run-time re-optimisation is to reorder pipelined query processing operators while they are in flight. To alter a query plan on the fly, a great deal of state in a variety of operators has to be considered, and arbitrary changes can require significant processing and code complexity to guarantee correct results. For instance, the state maintained by an operator like hybrid hash join can grow as large as the size of an input relation, and it may require modification or re-computation if the plan is reordered while the state is being constructed.
We can keep this work to a minimum by constraining the scenarios in which we reorder operators. Since, in a highly variable environment, the best-case scenario rarely persists for a significant length of time, it is better to favour adaptivity over best-case performance, sacrificing marginal improvements in idealised query processing.

Synchronisation barriers in query processing: Binary operators like joins often capture significant state, and a particular form of this state relates to the interleaving of requests for tuples from the operator's different inputs.
As an example, consider a merge join on two duplicate-free, sorted inputs. During processing, the next tuple is always consumed from the relation whose last tuple had the lower value. This significantly constrains the order in which tuples can be consumed. Consider, for instance, joining a slowly-delivered external relation 'slow-lo' with many low values in its join column, and a high-bandwidth but large local relation 'fast-hi' with only high values in its join column: the processing of fast-hi is postponed for a long time while many tuples are consumed from slow-lo.
Using terminology from parallel programming, we describe this phenomenon as a synchronisation barrier: one table-scan waits until the other table-scan produces a value larger than any seen before.
Generally, barriers limit concurrency, and hence performance, when two tasks take different amounts of time to complete (i.e., to "arrive" at the barrier). It is noteworthy that concurrency arises even in single-site query engines, which can carry out disk I/O, network I/O and computation simultaneously. Therefore, in a dynamic (or even heterogeneous but static) performance environment, it is desirable to minimise the overhead of synchronisation barriers. Two issues affect the overhead of barriers in a plan: (a) the frequency of barriers, and (b) the gap between the arrival times of the two inputs at the barrier.
Moments of symmetry: It is noteworthy that the synchronisation barrier in merge join is stated in an order-independent manner: it does not distinguish between the inputs based on any property other than the data they convey. Since its two inputs are treated uniformly, merge join is often described as a symmetric operator. By contrast, in a traditional nested-loops join the "inner" relation is synchronised with the "outer" relation, but not vice versa: after each tuple (or block of tuples) is consumed from the outer relation, a barrier is set until a full scan of the inner is completed.
Performance benefits can often be obtained by reordering the inputs of asymmetric operators like nested-loops join. When a join algorithm reaches a barrier, it has declared the end of a scheduling dependency between its two input relations. In such scenarios, the order of the inputs to the join can often be changed without altering any state in the join; when this is true, the barrier is referred to as a moment of symmetry.
Consider again a nested-loops join, with inner relation S and outer relation R (see Figure 6.3). At a barrier, the join has completed a full inner loop, having joined each tuple in a subset of R with every tuple in S. Reordering the inputs at this point can be done without affecting the join algorithm.

Figure 6.3: Tuples Generated by Nested-Loops Join

Joins and indexes: Nested-loops joins can take advantage of indexes on the inner relation, resulting in a fairly efficient pipelining join algorithm. An index nested-loops join (or "index join" for short) is inherently asymmetric, since one input relation has been pre-indexed.
Even when indexes exist on both inputs, changing the choice of inner and outer relation "on the fly" is problematic. Hence, for the purposes of reordering, it is simpler to think of an index join as a kind of unary selection operator on the unindexed input. The only difference between an index join and a selection is that, with respect to the unindexed relation, the selectivity of the join node may be greater than one. Although one cannot swap the inputs to a single index join, we can reorder an index join and its indexed relation as a unit among the other operators in the plan tree.
It must be noted that the logic for indexes can be applied to external tables that require bindings to be passed; such tables may be gateways to web pages, for example.
Physical properties, predicates and commutativity: Clearly, a pre-optimiser's choice of an index join algorithm constrains the possible join orderings: an ordering constraint must be imposed so that the unindexed join input is ordered before (though not necessarily directly before) the indexed input.
This constraint arises because of a physical property of an input relation: indexes can be probed but not scanned, and hence cannot appear before the corresponding probing tables. More complex but similar constraints arise in preserving the sorted inputs to a merge join (i.e., preserving "interesting orders").
The applicability of certain join algorithms raises additional constraints. Many join algorithms work only on equijoins; since they always require all relations mentioned in their equijoin predicates to be handled before them, such algorithms constrain reordering on the plan tree as well.
Join algorithms and re-ordering: For an eddy to be most effective, we favour join algorithms with frequent moments of symmetry, adaptive or non-existent barriers, and minimal ordering constraints; these algorithms offer the best opportunities for re-optimisation. The need to avoid blocking rules out the use of hybrid hash join, and the desire to minimise ordering constraints and barriers excludes merge joins. Nested-loops joins are also undesirable, because they have infrequent moments of symmetry and imbalanced barriers.

Ripple joins have moments of symmetry at every "corner" of a rectangular ripple, i.e., whenever a prefix of one input stream has been joined with all tuples in a prefix of the other input stream, and vice versa. For hash ripple joins and index joins, this scenario occurs between each consecutive tuple consumed from a scanned input. Ripple joins thus offer frequent moments of symmetry.
Ripple joins are also attractive with respect to barriers. Ripple joins were designed to allow changing rates for each input, in order to proactively expend more processing on the input relation with more statistical influence on intermediate results. The same mechanism also allows reactive adaptivity in the wide-area scenario: in a block ripple join, a barrier is reached at every corner, but the next corner is selected upon reaching the previous corner, and this choice can be made adaptively to reflect the relative rates of the two inputs over time.
At a modest overhead in performance and memory footprint, the ripple join family offers attractive adaptivity features. These algorithms therefore fit well with the principle of sacrificing marginal speed for adaptability, and as a result there is a focus on them in Telegraph.
Self Assessment Questions
5. Ripple joins were designed to allow changing rates
for ______________
6. At a modest overhead in performance and memory footprint, the ripple
join family offers attractive adaptivity features. (True/ False)
7. The possible join orderings are constrained by a pre-optimiser’s choice
of an index join algorithm. (True/ False)

Activity 2
The next tuple, during processing, is always consumed from the relation
whose last tuple had the lower value. Explain this statement with the help
of suitable examples and figures if necessary.

6.5 Need and Uses of Adaptive Query Processing


Now let us study what adaptive query processing is and where it is most appropriately used.
Declarative queries were a key value proposition of the relational model: the user specifies the data to be queried, and the database management system works out an efficient algorithm for retrieving it from the data store. The standard method of doing this is cost-based query optimisation. Figure 6.4 illustrates traditional query processing.

Figure 6.4: Traditional Query Processing

The following three tasks are involved in traditional query processing:


 Optimisation: The optimiser selects a plan to execute a query using available statistics.
 Execution: The executor executes the plan to get query results.
 Statistics tracking: A statistics tracker maintains the statistics utilised by the optimiser.
Adaptive systems are also known as "self-tuning" or "dynamic" systems, i.e. systems that modify their behaviour through "introspection", "learning", etc. A query processing system is said to be adaptive if it has three features:
 It obtains information from its environment

 It applies this information to decide its behaviour,
 This process repeats again and again, resulting in a feedback loop between environment and behaviour.
Static optimisation has the first two of these features; the feedback loop is the key to an adaptive system's effectiveness. Figure 6.5 demonstrates the general structure of adaptive query processing.

Figure 6.5: Adaptive Query processing in Adaptive Environment
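
To make the feedback loop concrete, the following toy Python sketch re-decides, after every batch, which of two filter predicates to apply first, based on the selectivities observed so far. The batch size, the smoothed counters and the data are illustrative assumptions:

def adaptive_filter(rows, pred_a, pred_b, batch_size=100):
    passed_a = seen_a = passed_b = seen_b = 1   # smoothed selectivity counters
    out = []
    for start in range(0, len(rows), batch_size):
        # Decide: apply the apparently more selective predicate first.
        a_first = passed_a / seen_a <= passed_b / seen_b
        first, second = (pred_a, pred_b) if a_first else (pred_b, pred_a)
        for row in rows[start:start + batch_size]:
            ok_first = first(row)
            # Observe: feed the outcome back into the statistics.
            if a_first:
                seen_a += 1; passed_a += ok_first
            else:
                seen_b += 1; passed_b += ok_first
            if ok_first:
                ok_second = second(row)
                if a_first:
                    seen_b += 1; passed_b += ok_second
                else:
                    seen_a += 1; passed_a += ok_second
                if ok_second:
                    out.append(row)
    return out

print(len(adaptive_filter(range(1000), lambda r: r % 2 == 0, lambda r: r < 100)))

All three features above are present: the loop gathers information from its environment (the counters), applies it to decide behaviour (the predicate order), and repeats, closing the feedback loop.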

Now let us study the main applications of adaptive query processing. These are:
Adaptive query processing in data grids: The data grid integrates wide-area autonomous data sources and provides users with a unified data query and processing infrastructure. Data grids rely on adaptive query and processing techniques to offer better quality of service (QoS) to users and applications in spite of dynamically changing resources and environments.
Adaptive query processing in internet applications: Data management on the Internet has gained a lot of popularity. The recent focus is on efficiently dealing with unpredictable, dynamic data volumes and transfer rates by making use of adaptive query processing techniques. Across query processing domains, an equally vital consideration is the high degree of variability in performance needs.

Adaptive query processing in web-based data integration: Mediators for web-based data integration require the capability to handle multiple, often conflicting objectives such as cost, coverage and execution flexibility. This requires the development of query planning algorithms as well as techniques for automatically gathering the necessary cost/coverage statistics from the independent data sources.
To detect and correct optimiser errors: Adaptive query processing has been utilised to detect and correct optimiser errors caused by wrong statistics or simplified cost metrics.
Adaptive query processing in wide-area database systems: In wide-area database systems that run in changeable and volatile environments (such as computational grids), it is difficult to produce efficient database query plans from information available only at compile time. A potent solution to this difficulty is to adjust the query plan to varying conditions during execution and to make use of information that becomes available at query run-time.
Self Assessment Questions
8. Adaptive query processing has been utilised to detect and correct
optimiser errors due to wrong statistics. (True/ False)
9. Adaptive data query and processing is required by ____________ to provide better quality of service (QoS) to users.

6.6 Complexities
Very large query engines in database systems function in changeable and unpredictable environments, and this volatility pushes traditional query processing techniques to the breaking point. Volatility is common in large-scale systems on account of their increased complexity. These complexities are:
Hardware and workload complexity: Variability is frequent in the bursty performance of servers and networks in wide-area environments. These systems often serve big communities of users whose aggregate behaviour can be really hard to predict, and the hardware mix is quite heterogeneous.

Large clusters of "shared-nothing" computers can display analogous performance variations, owing to a mix of heterogeneous hardware evolution and changing user requests.
Hardware performance can be volatile even in entirely homogeneous
environments. For example, the inner tracks of a disk might demonstrate
just half the bandwidth of outer tracks.
Data complexity: Selectivity estimation for static alphanumeric data sets is rather well understood. There has also been preliminary work on estimating statistical properties of static data sets with complex types and methods. However, federated data hardly ever comes with statistical summaries, and multifaceted non-alphanumeric data types are now broadly in use both on the web and in object-relational databases. In these scenarios, as well as in traditional static relational databases, selectivity estimates are usually imprecise.
User interface complexity: Many queries can run for a very long time in large-scale systems. Consequently, there is interest in Online Aggregation and other methods that permit users to control properties of queries while they execute, based on refining approximate results.
Self Assessment Questions
10. Many queries can run for a very long time in large-scale systems.
(True/ False)
11. In wide-area environments, variabilities are frequent in _____________

6.7 Robust Query Optimisation through Progressive Optimisation
In some cases the query optimiser has a choice between a "conservative" plan that is likely to perform reasonably well in many situations, and a more aggressive plan that works better if the cost estimate is accurate, but much worse if the estimate is slightly off.
The requisite probability distributions over the parameters can be calculated using histograms or query workload information. Minimising the expected cost is evidently a more robust optimisation objective, under the assumption that only one plan can be selected and the required probability distributions can be obtained.

Error-aware optimisation (EAO) utilises intervals over query cost estimates rather than single-point estimates; EAO focuses mainly on uncertainty in memory usage. A later work provides several features including the use of intervals: it generates linear query plans (a slight variation of the left-linear or left-deep plan, in that one of the two inputs to every join, not necessarily the right one, must be a base relation) and uses bounding boxes over the estimated cardinalities in order to find and prefer robust plans.
Another way of making plans more robust is to utilise more sophisticated
operators, for example, n-way pipelined hash joins.
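
A minimal sketch of interval-based robust plan choice is shown below. This is an EAO-flavoured heuristic written purely for illustration, not the published algorithm, and the plan names and costs are invented:

def pick_robust_plan(plans):
    # plans: name -> (low_cost, high_cost) interval over the cost estimate.
    # Prefer the best worst case; break ties on the interval midpoint.
    return min(plans, key=lambda p: (plans[p][1], sum(plans[p]) / 2))

candidates = {
    "index-nested-loops": (10.0, 400.0),  # aggressive: great if estimates hold
    "hash-join": (60.0, 90.0),            # conservative and predictable
}
print(pick_robust_plan(candidates))       # -> hash-join

The conservative plan wins here because its interval is narrow: it can never be as fast as the aggressive plan's best case, but it also cannot be nearly as slow as its worst case.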
Self Assessment Questions
12. The needed probability distributions over the parameters can be
calculated by use of _________ or ______________ information.
13. _____________________ utilises intervals over query cost estimates rather than single-point estimates.

6.8 Query Evaluation Techniques for Large Databases


In this section we will focus on various query evaluation techniques for large
databases.
Structural design of query engines: Query processing algorithms iterate over elements of their input sets; the algorithms are algebra operators. The physical algebra is the collection of operators, data representations and associated cost functions that the database execution engine supports, while the logical algebra is more closely associated with the data model and the queries expressible in it (e.g. SQL).
Transfer of data between operators and synchronisation between them is the key. Primitive methods consist of using one process per operator and communicating via IPC or temporary files/buffers. An important practical technique is to implement each operator as a set of procedures (open, next and close), and have operators schedule each other within a single process via simple function calls. Each time an operator needs another piece of data ("granule"), it calls its input operator's next function to produce one. Operators structured in this way are called iterators.
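
The following is a minimal Python sketch of the open/next/close discipline, with a scan and a selection operator scheduling each other through plain method calls. The class names and row format are illustrative:

class Scan:
    def __init__(self, rows): self.rows = rows
    def open(self): self.pos = 0
    def next(self):                       # return one granule, or None at end
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self): pass

class Select:
    def __init__(self, child, pred): self.child, self.pred = child, pred
    def open(self): self.child.open()
    def next(self):                       # pull from the child until a match
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None
    def close(self): self.child.close()

plan = Select(Scan([{"rating": 7}, {"rating": 3}]), lambda r: r["rating"] > 5)
plan.open()
while (row := plan.next()) is not None:
    print(row)
plan.close()

Note that no operator materialises its whole output; data flows one granule at a time through ordinary function calls, which is why the whole plan can run in a single process.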
Query plans are algebra expressions and can be represented as trees of iterators. The three common structures are bushy (arbitrary), left-deep (every right subtree is a leaf) and right-deep (every left subtree is a leaf). In a left-deep tree, every operator draws input from only one subtree, while an inner loop iterates over the other input.
Sorting: All sorting in "real" database systems uses merging techniques. Sorting modules' interfaces must follow the structure of iterators.
Sorting exploits the duality of quicksort and merge sort. A sort proceeds in a divide stage and a combine stage; in each algorithm one of the two stages is based on logical keys while the other simply arranges data items physically, and which stage is the logical one is particular to the algorithm.
There are two kinds of sub-algorithms: one for sorting a run within main memory, and another for managing runs on disk or tape. The degree of fan-in (the number of runs merged in a given step) is a key factor.
For creating the set of initial (level-0) runs, quicksort and replacement selection are the two algorithms of choice. Replacement selection fills memory with a priority heap; the item with the smallest key is written to the current run and replaced by the next input item. If the replacement item is not smaller than the item just written, it can still join the current run; otherwise it is marked for the next run file. Replacement selection (RS) alternates smoothly between read and write operations, whereas quicksort has a bursty I/O pattern.
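
A minimal Python sketch of replacement selection is given below; the heap-based formulation, the memory capacity and the sample data are illustrative (real systems work in pages, not individual keys):

import heapq

def replacement_selection(items, memory=3):
    it = iter(items)
    heap = [(0, x) for _, x in zip(range(memory), it)]  # (run number, key)
    heapq.heapify(heap)
    runs, out = [], []
    current = 0
    while heap:
        run, key = heapq.heappop(heap)
        if run != current:               # heap has rolled over to the next run
            runs.append(out); out = []; current = run
        out.append(key)                  # "write" the smallest key to the run
        nxt = next(it, None)
        if nxt is not None:
            # An input smaller than the key just written must wait for the
            # next run; otherwise it can still extend the current run.
            heapq.heappush(heap, (run + (nxt < key), nxt))
    runs.append(out)
    return runs

print(replacement_selection([5, 1, 8, 2, 9, 3, 7, 4, 6]))  # two sorted runs

On random input, replacement selection produces initial runs averaging about twice the size of memory, which is the usual argument for it over quicksort.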
Level-0 runs are then merged into level-1 runs, and so on. Buffer space must be dedicated to each input run as well as to the merge output. A cluster is the unit of I/O.
Hashing: In general, hashing should be considered for equality matches. Hash-based query processing algorithms utilise an in-memory hash table of database objects. Hash table overflow occurs when the data to be hashed is larger than main memory (the common case).
The assignment of hash buckets to partitions needs to be optimised so that disk accesses hit clustered buckets, both logically and physically.
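
The following is a minimal sketch of handling overflow by partitioning: both inputs are split with the same hash function so that matching keys land in the same partition, and each partition is then joined in memory. The partition count and tuple shape are illustrative:

def partition(rows, key, n):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)   # same function for both inputs
    return parts

def partitioned_hash_join(r, s, key, n=4):
    out = []
    for rp, sp in zip(partition(r, key, n), partition(s, key, n)):
        table = {}
        for row in rp:                           # build on one partition
            table.setdefault(row[key], []).append(row)
        for row in sp:                           # probe with the other
            out.extend((m, row) for m in table.get(row[key], []))
    return out

r = [{"sid": i} for i in range(6)]
s = [{"sid": i % 3} for i in range(6)]
print(len(partitioned_hash_join(r, s, "sid")))   # 6 matching pairs

In a real system each partition would be written to disk in clustered pages, which is why the bucket-to-partition assignment mentioned above matters.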

Disk Access: File scans can be made fast with read-ahead (track-at-a-crack). This requires contiguous file allocation, so it may be necessary to bypass the OS/file system.
Indices offer several ways to reduce disk access:
a) It is possible to scan an index without ever retrieving records, e.g. if just salary values are needed and the index is on salary.
b) Multiple indices can be combined to satisfy query requirements, even if none of the indices is sufficient by itself.
c) The union/intersection of two index scans can be taken if two or more indices apply to individual clauses of a query.
d) Two tables can be joined by joining indices on the two join attributes and then doing record retrievals in the underlying data sets.
Buffer management: Data is cached in the I/O buffer. LRU is not the right policy for many database operations. Iterator implementations can take advantage of buffer management mechanisms, which typically provide fix/unfix semantics on a buffer page, when passing buffer pages amongst themselves.
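
A minimal sketch of fix/unfix semantics is shown below: a page with a non-zero fix count may not be evicted, so an iterator holding a page is safe. The pool, the page loader and the eviction policy are illustrative assumptions:

class BufferPool:
    def __init__(self, capacity, load):
        self.capacity, self.load = capacity, load
        self.pages, self.pins = {}, {}
    def fix(self, pid):                  # pin a page; load and evict as needed
        if pid not in self.pages:
            if len(self.pages) >= self.capacity:
                victim = next((p for p, n in self.pins.items() if n == 0), None)
                if victim is None:
                    raise RuntimeError("all pages fixed; nothing to evict")
                del self.pages[victim], self.pins[victim]
            self.pages[pid] = self.load(pid)
            self.pins[pid] = 0
        self.pins[pid] += 1
        return self.pages[pid]
    def unfix(self, pid):                # release the pin; page is evictable
        self.pins[pid] -= 1

pool = BufferPool(2, load=lambda pid: f"contents of {pid}")
pool.fix("p1")                           # an iterator holds p1 while reading
pool.fix("p2"); pool.unfix("p2")
pool.fix("p3")                           # evicts the unpinned p2, never p1
pool.unfix("p1")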
Self Assessment Questions
14. An in-memory hash table of database objects is utilised by hash-based query processing algorithms. (True/ False)
15. Iterator implementations can take benefit of buffer management
mechanisms which ___________________
16. It is _____________ to scan an index without ever retrieving records.

6.9 Query Evaluation Plans


A query evaluation plan (or simply plan) consists of an extended relational
algebra tree, with extra annotations at every node indicating the
implementation method to use for each relational operator and the access
methods to utilise for each table.
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
AND R.bid = 100 AND S.rating > 5

This query can be expressed in relational algebra as follows:

π sname (σ bid=100 ∧ rating>5 (Reserves ⋈ sid=sid Sailors))

In Figure 6.6, this expression is illustrated in the form of a tree. The algebra expression partly specifies how to evaluate the query: we first compute the natural join of Reserves and Sailors, then perform the selections, and finally project the sname field.

            π sname
               |
    σ bid=100 ∧ rating>5
               |
           ⋈ sid=sid
          /         \
    Reserves       Sailors

Figure 6.6: Query Expressed as a Relational Algebra Tree

To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra operations involved. For example, we can use a page-oriented simple nested loops join with Reserves as the outer table, and apply the selections and projections to every tuple in the result of the join as it is formed; the result of the join prior to the selections and projections is never stored in its entirety. This query evaluation plan is illustrated in Figure 6.7.

            π sname              (On-the-fly)
               |
    σ bid=100 ∧ rating>5         (On-the-fly)
               |
           ⋈ sid=sid             (Simple nested loops)
          /         \
    Reserves       Sailors
   (File scan)    (File scan)

Figure 6.7: Query Evaluation Plan for Sample Query

In drawing the query evaluation plan, we have utilised the convention that
the outer table is the left child of the join operator. We adopt this convention
henceforth.
Self Assessment Questions
17. The algebra expression fully specifies how to evaluate the query. (True/ False)
18. To get a fully specific evaluation plan, we should decide on a
___________ for each of the algebra operations involved.

6.10 Summary
Let us recapitulate the important points discussed in this unit:
 Eddy is a query processing mechanism. It constantly reorders operators
in a query plan as it runs.
 The techniques that we described in this unit can be used with any
operator, but algorithms with recurrent moments of symmetry allow for
more recurrent re-optimisation.
 Reordering pipelined query processing operators in flight is the basic challenge of run-time re-optimisation.
 The data grid integrates wide-area autonomous data sources and
provides users with a unified data query and processing infrastructure.
 The required probability distributions over the parameters can be
computed using histograms or query workload information.

 Naive methods of inter-operator transfer include creating temporary files/buffers, using one process per operator, and using IPC.
 A query evaluation plan (or simply plan) contains an extended relational
algebra tree.

6.11 Glossary
 Binary Operator: Binary operators such as joins often capture significant state.
 EAO: Error-aware optimisation. EAO considers intervals of estimates
and proposes heuristics to identify robust plans. However, the
techniques in EAO assume a single uncertain statistic (memory size)
and a single join.
 Index nested-loops: An index nested-loops join (henceforth an "index join") is inherently asymmetric because one input relation has been pre-indexed.
 LRU: Least Recently Used. This rule may be used in a cache to select
which cache entry to flush. It is based on temporal locality - the
observation that, in general, the cache entry which has not been
accessed for longest is least likely to be accessed in the near future.
 QoS: Quality of service. It refers to several related aspects of telephony and computer networks that allow the transport of traffic with special requirements.
 Ripple Joins: The purpose of ripple joins is to allow changing rates for
each input.
 XML: Extensible Markup Language. It is a markup language that defines
a set of rules for encoding documents in a format that is both human-
readable and machine-readable.

6.12 Terminal Questions


1. Explain the eddy architecture and how it allows for extreme flexibility.
2. What are the basic properties of query processing algorithms?
3. Discuss where adaptive query processing is most widely used.
4. Explain the query evaluation techniques for large databases.

6.13 Answers
Self Assessment Questions
1. True
2. Eddy
3. False
4. Ordering
5. Each input
6. True
7. True
8. True
9. Data grids
10. True
11. The bursty performance of servers and networks
12. Histograms, Query Workload
13. Error-aware optimisation (EAO)
14. True
15. Provide fix/unfix semantics on a buffer page
16. Possible
17. False
18. Implementation
Terminal questions
1. In eddy architecture, we implemented eddies in the context of River, a
shared-nothing parallel query processing framework. Refer Section 6.2
for more details.
2. Reorderability of plans and moments of symmetry are the properties of query processing algorithms. Refer Section 6.4 for more details.
3. Adaptive query processing is used in internet applications and data grids. Refer Section 6.5 for more details.
4. Query evaluation techniques for large databases are hashing and
sorting. Refer Section 6.8 for more details.

References:
 Raghu Ramakrishnan, Johannes Gehrke, Database Management
Systems, (3rd Ed.), McGraw-Hill
 Peter Rob, Carlos Coronel, Database Systems: Design, Implementation,
and Management, (7th Ed.), Thomson Learning

 Silberschatz, Korth, Sudarshan, Database System Concepts, (4th Ed.),


McGraw-Hill
 Elmasri, Navathe, Fundamentals of Database Systems, (3rd Ed.),
Pearson Education Asia
E-reference:
 http://db.cs.berkeley.edu/papers/sigmod00-eddy.pdf
 http://www.it.iitb.ac.in/

Unit 7 Transaction Processing


Structure:
7.1 Introduction
Objectives
7.2 Transaction Processing: An Introduction
7.3 Advantages and Disadvantages of Transaction Processing System
Advantages of transaction processing system
Disadvantages of transaction processing system
7.4 Online Transaction Processing System
7.5 Serialisability and Recoverability
Cascading rollback
Recoverable schedules
Managing rollbacks using locking
7.6 View Serialisability
7.7 Resolving Deadlocks
Deadlock detection by timeout
The waits-for graph
7.8 Distributed Locking
Centralised lock systems
Primary-copy locking
7.9 Transaction Management in Multi-Database System
7.10 Long-Duration Transactions
7.11 High Performance Transaction Systems
7.12 Summary
7.13 Glossary
7.14 Terminal Questions
7.15 Answers

7.1 Introduction
In the previous unit, you studied the concept of adaptive query processing
and query evaluation. You studied query processing mechanism, eddy
architecture, properties of query processing algorithms, synchronisation
barriers in query processing, etc.
A series of database operations that is atomic with regard to concurrency and recovery is called a transaction. As a transaction can include various steps, every step must be performed successfully for the transaction to succeed. If any step of the transaction does not succeed, the entire transaction is bound to fail.
Several common techniques and approaches associated with transaction processing are explained here. Our aim is to provide a basic framework for understanding transaction processing. We focus primarily on the advantages and disadvantages of transaction processing systems, online transaction processing systems, serialisability and recoverability, view serialisability, deadlock resolution, distributed locking and transaction management in multi-database systems. The unit concludes with an analysis of long-duration transactions and high-performance transaction systems.
Objectives:
After studying this unit, you should be able to:
 discuss the concept of transaction processing
 list various advantages and disadvantages of transaction processing
system
 discuss online transaction processing system
 explain the concept of serialisability and recoverability
 recognise and explain different methods used for resolving deadlock
 explain distributed locking
 summarise the concept of transaction management in multi-database
system
 explain long duration transaction
 discuss high performance transaction system

7.2 Transaction Processing: An Introduction


As you know, one or more database operations are generally grouped into a transaction, which is a unit of work that must be performed atomically and in isolation from other transactions. To remind you again, a transaction is an execution of a program that satisfies four basic properties: Atomicity, Consistency, Isolation and Durability (known as the ACID properties). Transactions serve two purposes: concurrency control and recovery of data.

Additionally, a DBMS provides an assurance of stability: the work of a finished transaction will never be lost. The transaction manager obtains transaction commands from an application, which inform the transaction manager when transactions start and end, in addition to information regarding the expectations of the application. Figure 7.1 demonstrates an example transaction processing architecture.

Figure 7.1: An Example of Transaction Processing Architecture

For instance, some applications may not need atomicity. The following tasks are carried out by the transaction manager.
1. Logging: To guarantee stability, each change in the database is logged individually on disk. The log manager applies various policies designed to guarantee that regardless of when a system failure or "crash" takes place, a recovery manager can examine the log of changes and restore the database to some consistent state.
2. Concurrency control: Transactions must appear to execute in isolation. However, in most systems, many transactions are actually executing at once. Therefore, the scheduler must guarantee that the individual actions of numerous transactions are executed in such a way that the total outcome is the same as if the transactions had executed in their entirety, one by one.
3. Deadlock resolution: As transactions compete for resources via the locks that the scheduler provides, they can reach a condition where none can continue because each requires something another transaction holds. The transaction manager has the responsibility to intervene and terminate ("roll back" or "abort") one or more transactions to allow the others to continue.
Self Assessment Questions
1. To provide which type of assurance, each change in the database is
logged individually on disk?
a. Continuity
b. Stability
c. Concurrency
d. Speed
2. To whom the transaction commands inform about the transactions start
and end?
a. Transaction commander
b. Transaction coordinator
c. Transaction manager
d. Transaction master

7.3 Advantages and Disadvantages of Transaction Processing System
A transaction processing system has various advantages and disadvantages, which you are going to study now.
7.3.1 Advantages of transaction processing system
The advantages of transaction processing are:
1. Each business firm needs a system for collecting, storing and retrieving data and statistics so that it can function competently; a transaction processing system fulfils this requirement. Every single transaction is managed and monitored by the transaction processing system, so that the system can identify whether entered data is valid. Once the gathered information clears this test, it is accumulated and processed in the system.
2. The process of monitoring all transactions can be made simpler by means of an organised transaction processing system. This enormously saves a firm's time, energy, money and effort. Furthermore, all entered data is kept protected, and all transactions performed are monitored and recorded correctly. All records are maintained in an organised and protected manner, and only approved people have access to its functions.
3. By means of a consistent transaction processing system, any firm can successfully please its consumers and gain more satisfied consumers. Consumers rely on the consistency of a company, and they will stand by a firm with which they can trust their earnings, money and personal information.
7.3.2 Disadvantages of transaction processing system
The main disadvantages of transaction processing systems are:
1. There is the need to manage hundreds or thousands of simultaneous users.
2. There is the need to permit several users to work on the same set of data, with instant updating.
3. There is the need to manage faults in a protected and consistent way. Transaction systems generally manage faults in a secure and reliable manner, but some faults cannot be avoided, for example network faults or database deadlocks. Thus, there should be a method for managing them when they take place; it is not acceptable to simply terminate an existing process.
Self Assessment Questions
3. _________ transaction processing system can make the process of
monitoring each and every transaction simpler.
4. In transaction processing system, it is not required to manage errors in
a secured and consistent manner. (True/ False)

7.4 Online Transaction Processing System


Online transaction processing is interactive: every transaction is processed as it takes place. In an Online Transaction Processing (OLTP) system, you can continuously interact with the system through a computer or a terminal. Airline reservation, railway reservation, banking ATM and library applications are some examples of online transaction processing systems.
In these types of systems, you are required to enter pre-defined inputs such
as flight number, train number, date of journey, amount to withdraw, etc.


According to these pre-defined inputs, the system generates pre-defined


outputs such as the confirmed tickets, or non availability of ticket, etc.
The problem with online transaction processing (OLTP) system is the high
costs related with the required security & fault acceptance traits. A data is
entered by a person for a system transaction, where it is processed and the
output is obtained before entering the next input.
When you use online entry with postponed processing, then data is input as
the transaction takes place and is stored online, but files are not updated.
Files are updated afterwards in batch. For example, orders received over
the phone may be accepted by the system, but the orders are not processed
until a slow time, like at night.
Figure 7.2 shows an example of online transaction processing. In it, a customer uses an ATM, which presents a simple interface for functions such as cash withdrawal, account balance query, bill payment, transfer, or cash advance. On the same network, an employee in the branch office executes operations such as processing fund applications and consulting.
In the bank head office, a business analyst tunes business transactions for better performance. Other employees execute day-to-day functions such as customer relationship management, budget planning, and stock control. In the figure, you can see that all requests are addressed to the mainframe for processing.

Figure 7.2: Online Transaction Processing Framework

Self Assessment Questions


5. In online transaction processing systems, which type of input must the user provide?
6. In Online transaction processing (OLTP) system, the consumer cannot
continuously interact with the system by a computer or a terminal.
(True/False)

7.5 Serialisability and Recoverability


In transaction processing, serialisability is the property of a serialisable schedule. A (possibly concurrent) schedule S is serialisable if it is equivalent to a serial schedule S'; that is, S produces the same result database state as S'.
A serial schedule is a schedule in which transactions are executed one by one: upon completion of one transaction, another transaction is scheduled. A schedule in which transactions are executed concurrently is called a non-serial schedule.
A schedule S of n transactions is serialisable if it is equivalent to some serial schedule of the same n transactions. There are n! possible serial schedules of n transactions, and many more possible non-serial schedules.
Serialisability is the main criterion for the correctness of concurrent transaction executions and a main objective of concurrency control. (It should not be confused with serialisation, which in the context of data storage and communication means converting a data structure into a format that can be stored.)
If every transaction is correct by itself, then any serial execution of these transactions is correct.
Notable examples are banking transactions that debit and credit accounts with money. If the associated schedules are not serialisable, then the total sum of money may not be preserved: money could vanish, or be produced from nowhere. This cannot happen if serialisability is maintained.
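
The classic failure is easy to reproduce. The toy Python fragment below (with a hypothetical balance) shows two interleaved credits that both read the same balance, so one update is lost, whereas either serial order preserves the total:

balance = 100

r1 = balance              # T1 reads 100
r2 = balance              # T2 reads 100 before T1 writes
balance = r1 + 50         # T1 writes 150
balance = r2 + 30         # T2 writes 130: T1's credit has vanished

print(balance)            # 130, not the 180 that any serial schedule yields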
In systems where transactions can abort, serialisability by itself is not adequate for correctness; schedules are also required to have the property of recoverability. Recoverability signifies that committed transactions never read data written by aborted transactions.

While serialisability can be compromised in various applications, compromising recoverability always violates the database's integrity.
If a system crash takes place, the actions of the committed transactions can be rebuilt from the disk copy of the database. However, note that only the committed transactions can be rebuilt. The logging system does nothing to assist serialisability: a database state is simply rebuilt even if it is the consequence of a non-serialisable schedule of actions. Commercial database systems do not always insist on serialisability; in some systems it is enforced only when the user demands it.
7.5.1 Cascading rollback
On the failure of a transaction, the system is required to return to the state it was in before the transaction was initiated; we call this rollback. That is, on the failure of a transaction, the changes it had made are returned to their previous values, i.e. "rolled back".
Data written by an uncommitted transaction is said to be "dirty". Dirty data can exist in the buffers, on disk, or both, and either can lead to problems.
A cascading rollback is required if dirty data has been made available to other transactions. When a transaction T aborts, you must determine which transactions have read data written by T, abort them, and recursively abort any transactions that have read data written by an aborted transaction. In other words, you must find each transaction U that read dirty data written by T, abort U, find any transaction V that read dirty data from U, abort V, and so on.
A log can be used to undo the effects of an aborted transaction if it is of a kind that records previous values (e.g. undo or undo/redo logging). The data can also be restored from the copy on disk if the dirty data has not yet migrated to disk.
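
A minimal sketch of computing the set of transactions doomed by a cascading rollback is shown below; the "read dirty data from" edge list is an illustrative stand-in for what a real scheduler would derive from its log:

def cascade_aborts(start, readers):
    # readers maps a transaction to the transactions that read its dirty data.
    doomed, frontier = set(), [start]
    while frontier:
        t = frontier.pop()
        if t not in doomed:
            doomed.add(t)
            frontier.extend(readers.get(t, []))   # recursively abort readers
    return doomed

# T2 read dirty data from T1, and T3 from T2: aborting T1 dooms all three.
print(cascade_aborts("T1", {"T1": ["T2"], "T2": ["T3"]}))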

7.5.2 Recoverable schedules


If you want any of the logging methods to permit recovery, the set of transactions that are considered committed after recovery must be consistent. Specifically, if a transaction T1 is considered committed after recovery, and T1 used a value written by T2, then T2 must also remain committed after recovery. Therefore, we define a recoverable schedule as follows:
"A schedule is said to be recoverable if each transaction commits only after every transaction from which it has read has committed."
7.5.3 Managing rollbacks using locking
This section now analyses the management of cascading rollbacks in a lock-based scheduler. Strict locking is a simple method that provides assurance against cascading rollbacks: until a transaction has either committed or aborted, it must not release any write locks, or other locks (such as increment locks) that permit values to be changed.
Under strict locking a transaction cannot read dirty data, since such data remains locked until the writing transaction commits; the schedule is therefore recoverable. The problem of repairing the data in buffers when a transaction aborts still persists, however, since the effects of its changes must be undone.
Self Assessment Questions
7. When a data is written by a non-committed transaction, what is it
called?
a. Dirty
b. Strict
c. Rolled back
d. Serialisable
8. Write locks should not be released by a transaction in case of strict
locking, until the transaction is either committed or aborted.
(True/ False)

Activity 1
Surf the Internet and find out how serialisability and recoverability is
maintained in a banking transaction. Write a short report.

7.6 View Serialisability


"View-serialisability" is one of the weaker conditions that assure
serialisability. View-serialisability regards all the associations between
transactions T and U in a way that a database element is written by T, the
value of which is read by U. The main dissimilarity among serialisability and
view c become apparent when a value A is written by transaction T and it is
not read by any transaction. A value for A is written thereafter by some other
transaction.
In that situation, you can put the WT (A) action in some different locations of
the schedule that would not be allowed in case of conflict-serialisability. In
this section, we will discuss the concept of view-serialisability.
View equivalence: Let us consider two schedules S1 and S2 over the same set of transactions. Imagine a hypothetical transaction T0 that writes initial values for every database element read by any transaction in the schedules. Also imagine another hypothetical transaction Tf that, at the end of each schedule, reads every element written by one or more transactions.
Then, for every read action ri(A) in one of the schedules, we can find the write action wj(A) that most closely preceded the read in question. Tj is regarded as the source of the read action ri(A); Tj could be the hypothetical initial transaction T0, and Ti could be Tf. If for every read action in one of the schedules the source is the same in the other schedule, then S1 and S2 are said to be view-equivalent.
Certainly, view-equivalent schedules are truly equivalent: when executed on any one database state, they both produce the same behaviour. We say that a schedule S is view-serialisable if S is view-equivalent to a serial schedule.
Self Assessment Questions
9. In case of View-serialisability, all the associations among transactions
T and U are regarded such that T writes a database element, the value
of which is read by U (True/ False)
10. A schedule S is said to be _________, if S is view-equivalent to a
sequential schedule.

7.7 Resolving Deadlocks


You may have noticed that simultaneous execution of transactions can lead to contention for resources and thus to a deadlock. A deadlock is a situation in which two or more transactions are simultaneously in a wait state, each waiting for another to release a resource so that it can proceed. In this state, no progress is made by any of them, as each waits for a resource held by one of the others. Deadlocks can be dealt with by two broad techniques: they can be detected and repaired, or the system can be managed so that they never occur. These methods are discussed below.
7.7.1 Deadlock detection by timeout
On the occurrence of a deadlock, it is usually not possible to repair the situation so that the transactions involved can all continue. Therefore, in such a situation, some transaction must be rolled back (aborted and restarted).
Timeout is the easiest way of detecting and resolving deadlocks: maintain a limit on the active period of a transaction, and roll a transaction back if it exceeds this time.
Example: In a transaction system where typical transactions complete in milliseconds, a timeout of one minute would affect only transactions trapped in a deadlock.
For more complex transactions, you may want the timeout to occur after a much longer interval.
Observe that when a transaction involved in a deadlock times out, its locks and other resources are released. The other transactions involved in the deadlock can then complete before reaching their own timeout limits.
7.7.2 The waits-for graph
The waits-for graph deals effectively with deadlocks that arise from transactions waiting for locks held by others: it indicates which transactions are waiting for locks held by other transactions. The graph can be used either to detect deadlocks after they form, or to prevent them from occurring at all. Prevention requires maintaining the waits-for graph at all times and refusing to permit any action that would form a cycle in the graph.

A lock table maintains, for every database element X, the list of transactions waiting for locks on X, in addition to the transactions that currently hold locks on X. The waits-for graph has a node for every transaction that currently holds a lock or is awaiting one. There is an arc from node T to node U if there is a database element A such that:
1. U holds a lock on A,
2. T is waiting for a lock on A, and
3. T cannot get a lock on A in the desired mode unless U first releases its lock on A.
If the waits-for graph contains no cycle, then every transaction can eventually complete: there will always be some transaction waiting for no other transaction, and that transaction can certainly complete.
On the other hand, if there is a cycle, then no transaction in the cycle can make progress, so there is a deadlock. Thus, a deadlock can be avoided by rolling back any transaction whose request would create a cycle in the waits-for graph.
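
Cycle prevention amounts to one reachability test per lock wait. The following minimal Python sketch (graph representation and names are illustrative) checks whether letting T wait for U would close a cycle:

def creates_cycle(waits_for, t, u):
    # Would adding the arc t -> u make t reachable from u?
    frontier, seen = [u], set()
    while frontier:
        node = frontier.pop()
        if node == t:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(waits_for.get(node, []))
    return False

waits_for = {"T2": ["T1"], "T3": ["T2"], "T4": ["T1"]}   # graph after step (7)
print(creates_cycle(waits_for, "T1", "T3"))   # True: refuse the step (8) wait

This matches the example that follows: after steps (5) to (7) the graph of Figure 7.4 is acyclic, and it is T1's request for B at step (8) that would close the cycle.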
Example: Let us consider four transactions, each of which reads one element and writes another:

T1: l1(A); r1(A); l1(B); w1(B); u1(A); u1(B)
T2: l2(C); r2(C); l2(A); w2(A); u2(C); u2(A)
T3: l3(B); r3(B); l3(C); w3(C); u3(B); u3(C)
T4: l4(D); r4(D); l4(A); w4(A); u4(D); u4(A)

In these notations l = lock, r = read, w = write, u = unlock.


In transaction T1, A is first locked (l1) and read (r1); then B is locked and written (w1), after which A and B are unlocked (u1).
In transaction T2, C is locked (l2) and read (r2); then A is locked (l2) and written (w2), after which both are unlocked (u2).
In transaction T3, B is locked (l3) and read (r3); then C is locked (l3) and written (w3), after which both are unlocked (u3).
In transaction T4, D is locked (l4) and read (r4); then A is locked (l4) and written (w4), after which both are unlocked (u4).
Here, we use a simple locking system with only one lock mode. A similar effect would be observed if we used a shared/exclusive system and took locks in the appropriate mode (shared for a read and exclusive for a write).
1) T1: l1(A); r1(A)
2) T2: l2(C); r2(C)
3) T3: l3(B); r3(B)
4) T4: l4(D); r4(D)
5) T2: l2(A) (denied)
6) T3: l3(C) (denied)
7) T4: l4(A) (denied)
8) T1: l1(B) (denied)

Figure 7.3: Starting of a Schedule with a Deadlock

The beginning of a schedule of these four transactions is shown in Figure 7.3. In the first four steps, every transaction obtains a lock on the element it wants to read. At step (5), T2 attempts to lock A, but the request is denied because T1 already holds a lock on A. Therefore, T2 waits for T1, and an arc is drawn from the node for T2 to the node for T1.

Figure 7.4: Waits-for Graph after Step (7) of Figure 7.3

Likewise, at step (6) T3 is denied a lock on C because of T2. At step (7), T4 is denied a lock on A because of T1. The resulting waits-for graph is displayed in Figure 7.4; it contains no cycle. At step (8), T1 must wait for the lock on B, which is held by T3. If T1 is allowed to wait, there results a cycle in the waits-for graph involving T1, T2, and T3. This is shown in Figure 7.5.

Figure 7.5: Waits-for Graph with a Cycle induced by Step (8) of Figure 7.3

As each of them is waiting for another to finish, none can make progress; the result is a deadlock involving these three transactions. Moreover, T4 cannot complete either, even though it is not in the cycle, since T4's progress depends on T1 making progress.

Figure 7.6: Waits-for graph on roll back of T1

If we roll back the transaction whose request would induce a cycle, then T1 must be rolled back, leaving the waits-for graph of Figure 7.6. T1 relinquishes its lock on A, which may then be granted to T2. Assuming the lock is given to T2, T2 can now complete; it then relinquishes its locks on A and C. Eventually T1 is restarted, but it cannot obtain locks on A and B until T2, T3 and T4 finish.
Self Assessment Questions
11. In case of _________, numerous transactions wait for resources held by others, and none can make progress.


12. The timeout method signifies transactions that are awaiting locks held
by some other transaction. (True/ False)

7.8 Distributed Locking


Now you will analyse extending a locking scheduler to an environment wherein transactions are distributed, i.e. comprise components at several sites. It is presumed that each site handles its own lock table, and that a transaction component at a site can request locks only on the data elements at that site. When data is replicated, the copies of a single element X must be managed so that every transaction changes them in the same way.
This requirement introduces a distinction between locking the logical database element X and locking one or more copies of X. A simple (and at times sufficient) solution to the problem of maintaining locks in a distributed database is centralised locking, discussed below.
7.8.1 Centralised lock systems
The simplest strategy is to designate one site, the lock site, to maintain a lock table for logical elements, irrespective of whether they have copies at that site.
A request is sent to the lock site when a transaction requires a lock on logical element X, and the lock is either granted or denied. Since obtaining a global lock on X is equivalent to obtaining a local lock on X at the lock site, the global locks behave properly provided the lock site manages its locks conventionally.
Unless the transaction happens to be running at the lock site, the normal cost is three messages per lock: request, grant, and release.
A single lock site suffices in some circumstances. However, if there are many sites and many concurrent transactions, the lock site can become a bottleneck. Furthermore, if the lock site crashes, no transaction can obtain any locks. Because of these issues with centralised locking, another method for maintaining distributed locks is used, discussed below.

7.8.2 Primary-copy locking


Primary-copy locking is an enhancement over the centralised locking method. It distributes the function of the lock site, but still preserves the rule that each logical element has a single site accountable for its global lock.
This change removes the risk of the central lock site becoming a bottleneck, while preserving the simplicity of the centralised method.
In the primary-copy method, each logical element X has one copy designated the "primary copy". The site of the primary copy maintains an entry for X in its lock table, and grants or denies the requests sent by transactions, as appropriate.
Global (logical) locks will be managed properly provided each site manages the locks for its primary copies appropriately.
As with a centralised lock site, most lock requests produce three messages, apart from those where the transaction and the primary copy are at the same site. If primary copies are chosen sensibly, it can be expected that these sites will often be the same.
Example: Consider the chain-of-stores example. Each store's sales data should have its primary copy at that store; the copies used at the central office, in the data warehouse, or by sales analysts would not be primary copies. The typical transaction is executed at the store and updates only the sales data for that store. When such a transaction takes its locks, no messages are needed; messages are required only if the transaction reads or modifies data at another store.
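
A minimal sketch of the routing logic is shown below; the element names, the site table and the three-message cost model are illustrative assumptions:

primary_site = {"store1.sales": "store1", "store2.sales": "store2"}

def request_lock(element, requesting_site, lock_tables):
    site = primary_site[element]                    # site of the primary copy
    messages = 0 if site == requesting_site else 3  # request, grant, release
    holder = lock_tables.setdefault(site, {}).setdefault(element, None)
    if holder is None:
        lock_tables[site][element] = requesting_site
        return True, messages
    return False, messages                          # denied: already held

tables = {}
print(request_lock("store1.sales", "store1", tables))  # (True, 0): local, free
print(request_lock("store1.sales", "hq", tables))      # (False, 3): remote

Choosing primary copies so that the typical requester and the primary copy coincide is precisely what drives the message count towards zero.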
Self Assessment Questions
13. Centralised Lock Systems include assigning one site, viz. the lock site,
to preserve a lock table for logical elements, irrespective of whether or
not they comprise copies at the site. (True/ False)
14. _________ Locking is as an enhancement over the centralised locking
method.

7.9 Transaction Management in Multi-Database System


A multi-database system is one that permits customer transactions to invoke retrieval and update commands against data situated in various hardware and software environments.
Two types of transactions are defined below:
 Local transactions: transactions that execute within a single local database system.
 Global transactions: transactions that operate across multiple database systems.
Transaction management is complex in multi-database systems due to the assumption of autonomy.
In global two-phase locking, each local site uses strict two-phase locking (locks are released only at the end of the transaction). Locks set on behalf of a global transaction are released only when that transaction reaches its end. Global two-phase locking assures global serialisability.
Because of autonomy requirements, sites cannot collaborate to perform a common concurrency control method. For example, there is no way to make sure that all databases follow strict two-phase locking.
The solutions to these problems include:
 providing a very low level of concurrent execution, or
 utilising weaker levels of consistency.
Each local DBMS executes local transactions; these are executed outside the control of the multi-database system. Global transactions are performed under multi-database control.
Local database management systems cannot communicate directly to coordinate global transaction execution, and the multi-database cannot manage local transaction execution.
To make sure that each DBMS's schedule is serialisable, a local concurrency control technique is needed; with locking, the DBMS must be able to protect against local deadlocks.
Additional methods are required to guarantee global serialisability. Each DBMS guarantees local serialisability among its local transactions, together with those that are part of a global transaction.

The multi-database guarantees serialisability among global transactions; it does so by ignoring the orderings induced by local transactions. Two-level serialisability (2LSR) does not guarantee global serialisability; however, it can fulfil the needs for strong correctness.
Now let us discuss the global-read protocol and the local-read protocol.
Global-read protocol: Global transactions may read, but not update, local data items; local transactions cannot access global data. There are no consistency constraints between local and global data items.
Local-read protocol: Local transactions have read access to global data, but global transactions are prohibited from accessing local data.
A transaction has a value dependency if the value it writes to a data item at one site depends on a value it read for a data item at another site.
Self Assessment Questions
15. Multiple database system is controlled by _________ transactions.
16. 2LSR does not guarantee global serialisability; but can fulfil the needs
for _________.

7.10 Long-Duration Transactions


Sometimes you will find a set of applications in which a database system is suitable for storing the data, yet the model of many small transactions, for which database concurrency-control techniques are designed, is not suitable. Now you will study the problems that arise with long transactions.
Problems of long transactions: Roughly, a long transaction is one that holds locks for too much time when those locks are needed by another transaction. Depending on the environment, "long" could mean seconds, minutes or hours.
We can presume that "long" transactions take at least several minutes, and most likely hours. There are three broad categories of applications that involve long transactions. These are:
1. Conventional DBMS applications: Although general database applications mostly perform short transactions, several applications require sporadic long transactions. For instance, a transaction may inspect all of a bank's accounts to verify that the total balance is correct. A different application might need an index to be rebuilt occasionally to keep performance at its peak.
2. Design systems: The object designed may be mechanical, such as a car; electronic, like a microwave; or a software system. The universal aspect of design systems is that the design is separated into a set of different parts or components (for example, the files of a software system), and many designers work on these components simultaneously. We would not like two designers working on copies of one file, each editing the design and then writing back the new file versions, because one set of changes would overwrite the other.
Therefore, a system of check-out and check-in permits one designer to "check out" a file and "check in" the final changes, possibly hours or days later. Even while the first designer is making changes, a second designer may want to check out the file to learn something about its contents. If the check-out procedure were tantamount to an exclusive lock, such logical and reasonable actions would be delayed, possibly for days.
3. Workflow systems: These systems involve collections of processes, some performed by software alone, some involving human interaction, and perhaps some performed entirely by human action. For example, consider the office paperwork involved in paying a bill. Such applications may take days to perform, and throughout this period, some database components might be subject to change.
Self Assessment Questions
17. A long transaction does not hold locks for a long time. (True/ False)
18. Which of the following applications is based on the concept that the
design is separated into a set of different parts or components?
a. Conventional DBMS applications
b. Design systems
c. Workflow systems.
d. None of the above

7.11 High Performance Transaction Systems


High-performance hardware and parallelism assist in enhancing the rate of transaction processing, but by themselves they are inadequate to obtain high performance:
 Disk I/O is a bottleneck; I/O time does not decrease at a rate analogous to the increase in processor speeds.
 Parallel transactions may try to read or write the same data item, resulting in data conflicts that reduce effective parallelism.
We can diminish the extent to which a database system is disk-bound by augmenting the size of the database buffer.
Main memory databases: Commercial 64-bit systems can sustain main memories of tens of gigabytes, and memory-resident data permits faster processing of transactions.
There remain some disk-related limitations, listed below (a group-commit sketch follows the list):
 Logging is a bottleneck when the transaction rate is high.
 Group commit can be used to reduce the number of output operations.
 If the update rate for modified buffer blocks is high, the disk data-transfer rate could become a bottleneck.
 If the system crashes, all of main memory is lost.
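
A minimal sketch of group commit is given below: commit records from several transactions accumulate in a log buffer and are forced to disk in a single write, so many commits share one output operation. (A real system would acknowledge each commit only after its group is forced; the group size here is an illustrative threshold.)

class GroupCommitLog:
    def __init__(self, group_size=4):
        self.buffer, self.group_size = [], group_size
        self.forced = 0                  # number of physical log writes
    def commit(self, txn_id):
        self.buffer.append(f"COMMIT {txn_id}")
        if len(self.buffer) >= self.group_size:
            self.flush()
    def flush(self):
        if self.buffer:
            self.forced += 1             # one disk write covers the group
            self.buffer.clear()

log = GroupCommitLog()
for t in range(8):
    log.commit(t)
print(log.forced)                        # 2 forced writes instead of 8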
Self Assessment Questions
19. The extent to which a database system is disk bound can be
diminished by augmenting the _________ of the database buffer.
20. Commercial 64-bit systems can sustain main memories of tens of
gigabytes. (True/False)

Activity 2
Distinguish between centralised locking and primary-copy locking.

7.12 Summary
Let us recapitulate the important points discussed in this unit:
 A series of database operations that is atomic with regard to concurrency and recovery is called a transaction.
 The transaction manager obtains transaction commands from an
application, which inform the transaction manager when transactions
start and end.

 Online transaction processing is interactive: every transaction is processed as it takes place.
 Serialisability is the main criterion for the correctness of concurrent transaction executions and a main objective of concurrency control.
 Recoverability signifies that committed transactions never read data written by aborted transactions.
 "View-serialisability" is one of the weaker conditions that assure
serialisability.
 We can identify stalemates and repair them, or we can handle
transactions in such a way that these problems may never occur.
 Transaction management is considered as complex in multi-database
systems due to the supposition of autonomy.
 High-performance transaction systems assist in enhancing the rate of
transaction processing, but are inadequate to acquire high performance.

7.13 Glossary
 Online transaction processing system: A system in which processing
is interactive and every transaction is processed as it occurs.
 Recoverability: The property that committed transactions have not
read data written by aborted transactions.
 Serialisability: The main criterion for the correctness of concurrent
transaction execution and a main objective of concurrency control.
 Transaction: A sequence of operations, including database operations,
that is atomic with regard to concurrency and recovery.

7.14 Terminal Questions


1. What are the advantages and disadvantages of a transaction processing
system? Discuss.
2. Explain the concepts of serialisability and recoverability. Illustrate how to
manage rollbacks by locking.
3. What do you mean by view-serialisability? Illustrate the concept.
4. Explain the different methods used for resolving deadlocks.
5. What are the problems that occur during long transactions? Illustrate.

7.15 Answers
Self Assessment Questions
1. Stability
2. Transaction manager
3. Organised
4. False
5. Pre-defined
6. False
7. Dirty
8. True
9. True
10. View-serialisable
11. Deadlock
12. False
13. True
14. Primary-Copy
15. Global
16. Strong accuracy
17. False
18. b) Design systems
19. Size
20. True
Terminal Questions
1. One advantage is that every business firm needs a system for
collecting, storing and retrieving data and statistics so that it can
function efficiently. One disadvantage is the need to manage hundreds
or thousands of simultaneous users. Refer to Section 7.3 for more
details.
2. Serialisability is the main criterion for the correctness of concurrent
transaction execution and a main objective of concurrency control.
Recoverability means that committed transactions have not read data
written by aborted transactions. The
problem of cascading rollbacks can be handled in a lock-based
scheduler: a simple method, known as strict locking, guarantees that
there are no cascading rollbacks. Refer to Section 7.5 for more details.
3. View-serialisability" is one of the weaker conditions that assure
serialisability. View-serialisability considers all the associations among
transactions T &U such that T writes a database element whose value U
reads. Refer Section 7.6 for more details.
4. The methods used for resolving deadlocks include deadlock detection
by timeout and the waits-for graph. In the timeout method, we keep a
limit on the active period of a transaction, and if a transaction exceeds
this time, we roll it back. The waits-for graph indicates which
transactions are waiting for locks held by other transactions. Refer to
Section 7.7 for more details.
5. Roughly, a long transaction is one that holds locks for too long, where
those locks are needed by other transactions. Depending on the
environment, "too long" could mean seconds, minutes, or hours. Refer
to Section 7.10 for more details.
References:
 Lewis, P.M. et al. (2002). Databases and transaction processing: an
application-oriented approach, Addison-Wesley.
 Silberschatz, A. et al. (2006). Database system concepts, McGraw-Hill
Higher Education.
E-references
 http://ambarwati.dosen.narotama.ac.id/files/2011/05/FIS-2011-w2.pdf,
retrieved on 09-04-12
 http://www.scribd.com/doc/44537611/23/Serializability, retrieved on
09-04-12
 http://publib.boulder.ibm.com/infocenter/zos/basics/index.jsp?topic=/com
.ibm.zos.zmainframe/zconc_onlinetrans.htm, retrieved on 09-04-12.
Unit 8 Concurrency Control


Structure:
8.1 Introduction
Objectives
8.2 Enforcing Serialisability by Locks
Locks
Locking scheduler
Two phase locking
8.3 Locking Systems with Several Lock Modes
8.4 Architecture for a Locking Scheduler
Two-part scheduler
The lock table
8.5 Managing Hierarchies of Database Elements
8.6 Concurrency Control by Timestamps
Timestamp resolution
Timestamp locking
8.7 Concurrency Control by Validation
8.8 Database Recovery Management
8.9 Summary
8.10 Glossary
8.11 Terminal Questions
8.12 Answers

8.1 Introduction
In the previous unit, you learned about transaction processing. We
discussed the process in detail and also studied its advantages and
disadvantages, serialisability and recoverability, distributed locking,
transaction management in multi-database systems, long-duration
transactions and high-performance transaction systems.
In this unit, we will consider the concept of concurrency control, including
the methods used for enforcing serialisability by locks, the lock modes used
in locking systems and the architecture of a locking scheduler. You will also
learn about the process of managing hierarchies of database elements. We
will then discuss concurrency-control methods such as concurrency control
by timestamps and by validation. Lastly, you will be introduced to the
concept of database recovery management.
Objectives:
After studying this unit, you should be able to:
 recognise the methods used for enforcing serialisability by locks
 identify several lock modes used in locking system
 discuss the architecture of locking scheduler
 discuss how to manage hierarchies of database elements
 explain concurrency control by timestamp and validation
 discuss database recovery management

8.2 Enforcing Serialisability by Locks


Serialisability concerns the concurrent execution of several transactions.
The objective of serialisability is to find non-serial schedules that allow
transactions to execute concurrently, without interfering with one another,
while producing a database state that could be produced by a serial
execution. Serialisability must be guaranteed to prevent inconsistency
arising from transactions interfering with one another. The order of read
and write operations is important for serialisability.
We use the following methods for enforcing serialisability by locks:
 Locks
 Locking scheduler
 Two phase locking
These are discussed as below.
8.2.1 Locks
Placing locks on a resource is one of the methods used to serialise
transactions: any data that is going to be used in support of a transaction is
locked by the process.
A lock is used by a transaction to deny data access to other transactions
and thus avoid incorrect updates. Locks can be of different types, such as
read (shared) or write (exclusive) locks.
A write lock on a data item stops other transactions from reading that data
item. Read locks, on the other hand, just prevent other transactions from
editing (writing to) the data item. You will study these types later in the unit.
So we can say that a lock can be in three states:
 Exclusive (write) lock
 Shared (read) lock
 Unlocked
You can use locks in the following manner:
1. First, a transaction that needs to access a data item must lock the item.
It requests a shared lock for read-only access or an exclusive lock for
read and write access.
2. If the item is not locked by another transaction, the lock will be granted.
3. If the item is currently locked, the database management system
(DBMS) determines whether the request is compatible with the current
lock. If an item holding a shared lock receives another shared-lock
request, the request will be granted. Otherwise, the transaction must
wait until the current lock is released.
4. A transaction continues to hold a lock until it is explicitly released, either
during execution or when it finishes (that is, aborts or commits). The
effects of a write operation become visible to other transactions only
when the exclusive lock has been released.
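The steps above can be condensed into a small sketch. The following Python fragment is a simplified model of our own, not code from any real DBMS: a shared lock is granted when only shared locks are held, and every other combination must wait.

# Simplified lock manager for 'S' (shared) and 'X' (exclusive) requests.
locks = {}   # data item -> list of (transaction, mode) currently holding it

def request(txn, item, mode):
    holders = locks.setdefault(item, [])
    # Grant when nobody holds the item, or when the request and every
    # holder are all shared -- the only compatible combination.
    if not holders or (mode == 'S' and all(m == 'S' for _, m in holders)):
        holders.append((txn, mode))
        return "granted"
    return "wait"                      # incompatible: the transaction waits

def release(txn, item):
    locks[item] = [(t, m) for t, m in locks.get(item, []) if t != txn]

print(request("T1", "A", 'S'))   # granted
print(request("T2", "A", 'S'))   # granted: shared locks coexist
print(request("T3", "A", 'X'))   # wait: exclusive conflicts with shared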
8.2.2 Locking scheduler
The scheduler is used to establish the order in which the operations within
simultaneous transactions are executed. To guarantee serialisability, the
scheduler interleaves the execution of database operations. To identify the
proper order, the scheduler bases its actions on concurrency-control
algorithms, such as the timestamping or locking methods discussed later in
the unit. It also ensures that the computer's central processing unit (CPU) is
used in an efficient manner.
The locking scheduler grants lock requests only if the resulting schedule
remains legal. A lock table stores the information about the existing locks
on the elements.
8.2.3 Two phase locking
Two phase locking is another method used to ensure serialisability. A
transaction follows the "two-phase" locking protocol if all of its lock requests
occur before its first unlock operation. Two phase locking defines the
process of acquiring and giving up locks. It does not prevent deadlocks.
(Deadlocks take place when two or more transactions are each
waiting for locks held by the others to be released.)
Two main phases exist within the transaction:
 Growing phase: In the growing phase, a transaction acquires all the
locks it needs without releasing any.
 Shrinking phase: In the shrinking phase, a transaction releases its
locks and cannot acquire any new ones.
The rules followed by the two-phase locking protocol are given below:
 Two transactions cannot hold conflicting locks at the same time.
 No unlock operation can precede a lock operation in the same
transaction.
 No data is affected until all the required locks have been acquired.
If, during the growing phase, a transaction is unable to obtain all the locks it
needs, it must release all of them, wait, and begin once more. Note that if
two-phase locking is used by all transactions, then every schedule created
by interleaving them is serialisable.
To make sure that data is not accessed by a transaction until another
transaction operating on the data has either committed or aborted, locks
may be held until the transaction commits or aborts. We call this strict
two-phase locking. It is comparable to two-phase locking except that here,
the shrinking phase occurs during the commit or abort. One consequence of
strict two-phase locking is that, by placing the shrinking phase at the end of
a transaction, all lock acquisitions and releases can be managed by the
system without the transaction's knowledge.
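To make the two phases concrete, here is a minimal sketch of the strict two-phase locking discipline, reusing the request and release helpers sketched in Section 8.2.1. As a simplification of our own, this toy version simply gives up and reports an abort instead of queueing when a lock cannot be granted:

# Strict two-phase locking for one transaction: acquire all locks first
# (growing phase), do the work, then release everything at commit
# (shrinking phase).
def run_transaction(txn, items, do_work):
    acquired = []
    for item in items:                     # growing phase
        if request(txn, item, 'X') != "granted":
            for held in acquired:          # could not get them all:
                release(txn, held)         # release everything, retry later
            return "aborted"
        acquired.append(item)
    do_work()                              # all reads/writes happen here
    for item in acquired:                  # shrinking phase at commit
        release(txn, item)
    return "committed"

print(run_transaction("T4", ["B", "C"], lambda: None))   # committed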
Self Assessment Questions
1. Which phase of locking defines the process of acquiring and giving up
the locks?
(a) Growing Phase
(b) Shrinking Phase
(c) Two Phase
(d) None of the above
2. Locking scheduler permits lock requests only if it is in a legal schedule.
(True/ False)
8.3 Locking Systems with Several Lock Modes


A lock has a mode that identifies its strength. The mode identifies, for
example, whether the lock stops other users from reading the data, or just
from changing it. Depending on the lock mode, when one user holds a lock
on a record, the lock may prevent other users from changing that record, or
even from reading it.
Let us now discuss the several lock modes below:
1. Shared locks: At row level, shared locks permit various users to read
data, but no user is permitted to change that data. At table level, shared
locks permit various users to carry out read and write operations on the
table, but no user is permitted to carry out Data Definition Language
(DDL) operations. Numerous clients can hold shared locks
simultaneously.
2. Exclusive locks: In this case, only one user is permitted to update a
specific part of the data (that is, insert, update, and delete). When one
client holds an exclusive lock on a row or table, no other lock of any
type can be placed on it.
3. Update locks: Update locks are always row-level locks. When a client
accesses a row by means of the SELECT FOR UPDATE statement, the
row is locked with an update-mode lock. This signifies that the row
cannot be update-locked or updated by any other user, and makes sure
that the existing user can update the row afterwards. Update locks are
comparable to exclusive locks. The major distinction between the two is
that you can obtain an update lock when another user already holds a
shared lock on the same record. This allows the update lock's holder to
read the data without prohibiting other clients from doing so. On the
other hand, once the data is changed by the update-lock holder, the
update lock is transformed into an exclusive lock.
In addition, update locks are asymmetric with regard to shared locks.
You can obtain an update lock on a record that already carries a shared
lock, but you cannot obtain a shared lock on a record that already
carries an update lock. Since an update lock prevents subsequent read
locks, it is simpler to transform the update lock into an exclusive lock.
4. Upgrading locks: Let us assume that a transaction needs to both read
and write an element. In this case, it first obtains a shared lock on the
element and carries out its calculations; then, when it is ready to write,
an exclusive lock is granted to it.
Transactions with unpredictable read-write needs can make use of
upgrading locks. Used carelessly, however, upgrading locks can
generate a deadlock, for example when two transactions both want to
upgrade their locks on the same element.
5. Increment locks: Increment locks are used for incrementing and
decrementing stored values. For example, in ticket-selling transactions,
the number of seats is decremented after every transaction. An
increment lock allows neither read nor write locks on the element, but
various transactions can hold increment locks on the same element at
once. While an increment lock is granted on an element, shared and
exclusive locks cannot be granted on it.
You cannot merge shared and exclusive locks. If user 1 holds an exclusive
lock on a record, then user 2 cannot obtain a shared lock or an exclusive
lock on that same record.
Within a specific group, all locks are treated as equal:
 All users, regardless of their privileges, are equal. Locks placed by a
database administrator are equivalent to locks placed by any other
client.
 All methods of executing statements that place locks are equal. It does
not matter whether you execute the lock as part of a typed statement,
call it from a remote (compiled) application or from inside a local
application, or place the lock as a consequence of a statement inside a
stored procedure or trigger.
Some locks cannot be escalated. For example, suppose you are using a
scroll cursor, you obtain a shared lock on a record, and later within that
same transaction the record is updated. Obtaining an exclusive lock is only
possible if no other locks are present on the table: if you and another client
both hold shared locks on the same record, the server cannot upgrade your
shared lock to an exclusive lock until the other client drops her shared lock.
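The asymmetry between shared, update and exclusive locks described above can be summarised in a small compatibility table. The sketch below is our own simplification; the key is the mode already held paired with the mode requested:

# Compatibility of lock modes: the key is (mode already held, mode
# requested).  Note the asymmetry: U is granted over an existing S,
# but S is refused once U is held.
COMPATIBLE = {
    ('S', 'S'): True,  ('S', 'U'): True,  ('S', 'X'): False,
    ('U', 'S'): False, ('U', 'U'): False, ('U', 'X'): False,
    ('X', 'S'): False, ('X', 'U'): False, ('X', 'X'): False,
}

def can_grant(held, requested):
    return COMPATIBLE[(held, requested)]

print(can_grant('S', 'U'))   # True: update lock over a shared lock
print(can_grant('U', 'S'))   # False: no shared lock once U is held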
Self Assessment Questions


3. Which types of locks are used for incrementing & decrementing stored
values?
a) Exclusive locks
b) Upgrade locks
c) Decrement locks
d) Increment lock
4. In shared locks, only one user is permitted to update a specific part of
data. (True/False)

Activity 1
How can you compare Shared locks and Exclusive locks? Illustrate.

8.4 Architecture for a Locking Scheduler


To understand the architecture of a locking scheduler, consider a simple
scheduler that:
 Inserts locks for transactions
 Releases locks when told to
Now we will discuss the two-part scheduler.
8.4.1 Two-part scheduler
Here, the scheduler is divided into two parts as shown in Figure 8.1.

Figure 8.1: Two-part Scheduler

Manipal University of Jaipur B1649 Page No. 182


Advanced Database Management System Unit 8

Let us now discuss the concept of the two-part scheduler. As you can see
in Figure 8.1 above, it contains two parts, that is, Part I and Part II.
Part I receives requests from the transactions and inserts lock actions
before all database-access operations. It must choose a suitable lock
mode. Thus, Part I selects and inserts suitable lock modes for database
(DB) operations such as read, write, or update.
Actions (such as a lock or a DB operation) produced by Part I are taken by
Part II, which executes the actions received from Part I. It finds out whether
transaction T must be delayed because a lock has not been granted. If this
is the case, the action is added to the list of actions that must eventually be
performed for transaction T.
So, we identify the transaction (T) to which an action belongs and find out
the status of T (delayed or not).
 If T is delayed, then the action is delayed and added to the wait list.
 If T is not delayed, then two cases are possible:
 If the action is a DB operation, it is transmitted to the database
and executed.
 If the action is a lock request, check the lock table to see whether the
lock can be granted.
 If the lock can be granted, modify the lock table to include the
lock just granted.
 If not, make an entry in the lock table to specify that the lock has
been requested. Further actions for transaction T are postponed
until the lock is granted.
 When T is done (commits or aborts), the transaction manager belonging
to T tells Part I to release all of T's locks; if any transaction is still waiting
for one of those locks, Part I notifies Part II.
 When Part II is informed that a lock is available on some database
element:
 It identifies the next transaction or transactions that can now be
granted a lock on the element.
 Those transactions are permitted to execute their postponed actions
until they either complete or reach another lock request that cannot
be granted.
Let us now examine the concept of the lock table.
8.4.2 The lock table
The lock table represents a relation that associates database elements with
information about their locks. Its size is proportional only to the number of
locked elements, not to the size of the whole database. The lock table is
shown in Figure 8.2 below.

Figure 8.2: The Lock Table

As you can see in the figure, there are various group modes, which are
defined as below:
 S: Only shared locks are held.
 U: One update lock and zero or more shared locks are held.
 X: One exclusive lock and no other locks are held.
The waiting bit notifies that at least one transaction is waiting for a lock on
the element (A or B in the figure). The list records all the transactions that
are either holding or waiting for a lock on A or B. Each entry has:
 The name of the transaction
 The mode of the lock
 Whether the transaction is holding or waiting for the lock
 A pointer linking the entries together
 A pointer linking all entries for a specific transaction (Tnext), used when
a transaction commits or aborts to find all the locks that must be
released.
Handling lock requests: Let us assume that a transaction T requests a
lock on A.
 If there is no lock-table entry for A, then there are no locks on A, so
create the entry and grant the lock request.
 If the lock-table entry for A exists, use the group mode to guide the
decision about the lock request:
 If the group mode is U (update) or X (exclusive), then
 reject the lock request by T,
 place an entry on the list noting that T requests a lock,
 and set Wait? = 'yes'.
 If the group mode is S (shared), then
 the request is granted for an S or U lock,
 an entry is created for T on the list with Wait? = 'no',
 and if the new lock is an update lock, the group mode is changed
to U.
Handling unlock requests: Now let us assume that a transaction T
unlocks A. Then:
 Delete T's entry on the list for A.
 If T's lock does not match the group mode, the group mode need not
change.
 Otherwise, if T's lock is:
 X, then there are no other locks on A;
 U or S, then determine whether any S locks remain.
 If the value of Wait? is 'yes', one or more waiting lock requests must be
granted.
There are various approaches to choosing which waiting request to grant:
 First-come-first-served: Grant the longest-waiting request. This
approach prevents starvation.
 Shared locks priority: Grant all waiting S locks first, then one U lock.
Grant an X lock only if no other requests are waiting.
 Upgrading priority: If there is a U lock waiting to upgrade to an X lock,
grant that first.
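A hedged sketch of this lock-table logic follows; the dictionary layout and helper name are invented for illustration, but the grant/wait decisions follow the group-mode rules just described:

# Lock-table sketch with group modes S, U and X.  Each element maps to
# an entry holding its group mode, a waiting flag and the list of
# (transaction, mode, status) records described above.
table = {}

def request_lock(txn, elem, mode):
    entry = table.get(elem)
    if entry is None:                      # no locks on elem: create entry
        table[elem] = {"group": mode, "waiting": False,
                       "list": [(txn, mode, "holding")]}
        return "granted"
    if entry["group"] in ('U', 'X'):       # group mode U or X: reject
        entry["list"].append((txn, mode, "waiting"))
        entry["waiting"] = True
        return "wait"
    if mode in ('S', 'U'):                 # group mode S: grant S or U
        entry["list"].append((txn, mode, "holding"))
        if mode == 'U':                    # an update lock changes the
            entry["group"] = 'U'           # group mode to U
        return "granted"
    entry["list"].append((txn, mode, "waiting"))   # X request must wait
    entry["waiting"] = True
    return "wait"

print(request_lock("T1", "A", 'S'))   # granted
print(request_lock("T2", "A", 'U'))   # granted, group mode becomes U
print(request_lock("T3", "A", 'S'))   # wait: group mode is now U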
Self Assessment Questions


5. The _______ represents a relation that associates database elements
with information regarding locking.
6. Part II selects and inserts suitable lock modes for database (DB)
operations such as read, write, or update. (True/False)

8.5 Managing Hierarchies of Database Elements


To understand this concept, let us first define the term "database element".
The term "database element" refers to the various elements in a database.
Different systems lock database elements of different sizes (such as
relations, blocks or tuples), depending on the application and the structure
of the data for which the database is used.
Managing hierarchies concentrates on two problems that occur when there
is a tree structure to our data:
1. The first tree structure is the hierarchy of lockable elements. It concerns
the process of allowing locks both on large elements, such as relations,
and on the smaller elements contained in them, such as blocks and
individual tuples.
2. The other type of hierarchy arises when the data itself is organised in a
tree. A major example is a B-tree index.
Locks with multiple granularity
Locking functions in any situation; however, it is important to decide
whether to lock large objects or small objects, that is, to choose the level of
granularity at which we lock. The trade-off is as follows: the lower the level
of granularity, the more concurrency, but also the more locks and the higher
the locking overhead. The best choice depends on the application; for
example, a bank database might lock blocks or tuples, while a document
database might lock entire documents.
There may be a requirement for locks at numerous levels of granularity,
even inside the same application. Database elements such as relations,
blocks, and tuples are structured in a hierarchy, as shown in Figure 8.3
below.
Figure 8.3: Hierarchy of Database Elements

Example: Bank application (as shown in Figure 8.4)
Here, the relation, blocks and tuples are considered as follows:
 Relation: Account Balances
 Blocks: Accounts
 Tuples: Add/Deposit, Remove/Withdraw, Sum

Figure 8.4: Example of Bank Application

 If locks are taken at the relation level, there is only one lock for the
whole Account Balances relation.
 Most transactions change the overall account balances; thus, an
exclusive lock on the Account Balances relation is needed by those
transactions.
 Only one deposit or withdrawal could then happen at any time.
 As a result, the system would permit very little concurrency.
 The alternative is to lock separate account blocks or account tuples.
 However, offering a lock for each tuple is too fine-grained and is
perhaps not worth the effort.
Warning protocol
The warning protocol is used to handle locks on a hierarchy of database
elements. Two new types of locks are introduced:
 IS: signifies the intention to request an S lock.
 IX: signifies the intention to request an X lock.
An IS (or IX) lock specifies the intention to request an S (or X) lock on a
sub-element lower down in the hierarchy. To request an S (or X) lock on an
element A of the database, a path is travelled from the root down to A: at
each ancestor along the way an IS (or IX) lock is requested and, once
acquired, we continue to the corresponding child. When we arrive at A
itself, the S (or X) lock is requested.
Now let us see the Compatibility matrix between locks as shown in
Figure 8.5 below:

Figure 8.5: Compatibility Matrix between Locks

If two transactions intend to read or write a sub-element, an I lock is
granted to both of them, and the potential conflict is resolved at a lower
level. An I lock on a super-element restricts the locks that the same
transaction can acquire on a sub-element:
 If the parent element P (shown in Figure 8.6) is locked by Ti in IS mode,
then Ti can lock a child element C in IS or S mode.
 If the parent element P is locked by Ti in IX mode, then Ti can lock a
child element C in IS, S, IX or X mode.

Figure 8.6: Parent Element and Child Element
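The path-locking rule of the warning protocol can be sketched in a few lines. The helper below is purely illustrative: given the path from the root down to an element, it lists the intention locks to place on the ancestors and the final S or X lock on the element itself.

# Warning-protocol sketch: to lock an element in mode S or X, place the
# matching intention lock (IS or IX) on every ancestor along the path
# from the root, and the S or X lock on the element itself.
def warning_path_locks(path, final_mode):
    intent = 'IS' if final_mode == 'S' else 'IX'
    plan = [(node, intent) for node in path[:-1]]   # ancestors
    plan.append((path[-1], final_mode))             # the element itself
    return plan

# Exclusively locking one tuple inside a block inside a relation:
print(warning_path_locks(["Relation R", "Block B1", "Tuple t3"], 'X'))
# [('Relation R', 'IX'), ('Block B1', 'IX'), ('Tuple t3', 'X')]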

Self Assessment Questions


7. _______ lock specifies the intention to request an S lock.
8. The process of managing locks on a hierarchy of database elements is
performed by warning protocol. (True/ False)

8.6 Concurrency Control by Timestamps


Concurrency control is defined as the coordination of concurrent execution
in a multi-processing database system. The purpose of concurrency control
is to guarantee the serialisability of transactions in a multi-user database
environment.
Whenever a transaction begins, a timestamp is assigned to it. Through this,
we can tell the order in which the transactions are supposed to be applied.
Thus, given two transactions that affect the same object, the transaction
with the earlier timestamp is meant to be applied before the other one.
However, if the wrong transaction is presented first, it is aborted and must
be restarted.
All the objects in the database have a read timestamp and a write
timestamp. The read timestamp is updated when the object's data is read;
the write timestamp is updated when the object's data is changed. If a
transaction needs to read an object, but the transaction started before the
object's write timestamp, the object's data was changed after the
transaction began. In this situation, the transaction is aborted and must be
restarted.
If a transaction needs to write to an object, and the transaction started
before the object's read timestamp, the object has already been
viewed by a later transaction, which is presumed to have taken a copy of
the object's data. Writing to the object would invalidate that copied data, so
the transaction is aborted and must be restarted.
8.6.1 Timestamp resolution
Timestamp resolution is the minimum amount of time that elapses between
two adjacent timestamps. If the timestamp resolution is too large, the
chance of two or more timestamps being equal increases, which can allow
some transactions to commit out of the correct order.
For example, suppose a system can generate only a limited number of
unique timestamps per second; two events that occur 2 milliseconds apart
may then be given the same timestamp even though they actually occurred
at different times.
8.6.2 Timestamp locking
Although this is a non-locking method, in that the object is not locked
against simultaneous access for the duration of a transaction, the act of
recording each timestamp against the object requires a very short-duration
lock on the object or its substitute.
Concurrency control by means of timestamps differs from locking in one
significant way: when a transaction encounters a later timestamp, it aborts,
whereas with locking it would either wait or proceed immediately.
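The read and write tests described above translate directly into code. The following minimal sketch is our own simplification of basic timestamp ordering; the field names rts and wts are invented for illustration:

# Basic timestamp-ordering checks.  Each object carries a read
# timestamp (rts) and a write timestamp (wts), updated as in the text.
def try_read(txn_ts, obj):
    if txn_ts < obj["wts"]:          # object changed after txn began
        return "abort and restart"
    obj["rts"] = max(obj["rts"], txn_ts)
    return "read ok"

def try_write(txn_ts, obj):
    if txn_ts < obj["rts"]:          # a later txn has already read it
        return "abort and restart"
    if txn_ts < obj["wts"]:          # a later txn has already written it
        return "abort and restart"
    obj["wts"] = txn_ts
    return "write ok"

account = {"rts": 0, "wts": 0}
print(try_write(5, account))    # write ok: wts becomes 5
print(try_read(3, account))     # abort: the object was written at time 5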
Self Assessment Questions
9. If we present the wrong transaction first, it is not necessary to abort it.
(True/False)
10. What is the minimum time passed among two neighbouring timestamps
known?
a) Timestamp Collection
b) Timestamp Locking
c) Timestamp Resolution
d) Timestamp Revision
8.7 Concurrency Control by Validation


Under validation, transactions can proceed without locking, and all
database modifications are performed on a local copy. We then verify
whether the transaction's schedule is serialisable. If it is, the changes in the
local copy are applied to the global database; otherwise, the local
modifications are discarded and the transaction is restarted.
For every transaction T, the scheduler maintains two sets of significant
database elements:
 RS(T), the read set of T: the set of all database elements read by T.
 WS(T), the write set of T: the set of all database elements written by T.
This information is essential to determine whether a schedule that has
already been executed was really serialisable.
We execute a transaction T in three phases:
1. Read: In this phase, the transaction reads the elements it needs from
the database and performs all its actions in its local address space.
2. Validate: In this phase, the serialisability of the schedule is verified by
comparing RS(T) and WS(T) with the read/write sets of concurrent
transactions. If validation fails, phase 3 is skipped and the transaction is
restarted.
3. Write: In this phase, the new values of the elements in WS(T) are
written back to the database.
The scheduler maintains three sets of transactions, together with some
significant timing information, at any time:
1. START: the set of transactions that have started but have not yet
completed their validation phase. For every transaction T in START,
keep START(T).
2. VAL: the set of transactions that have been validated but have not yet
completed their write phase. For each T in VAL, record VAL(T).
3. FIN: the set of transactions that have completed all three phases. For T
in FIN, keep FIN(T).
If the validation order is T1, T2, T3, …, then the resulting schedule will be
conflict-equivalent to the serial schedule S = T1, T2, T3, …
You can consider every transaction that successfully validates as executing
entirely at the moment that it validates.
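One of the standard validation tests compares the read set of the validating transaction with the write sets of transactions that validated while it was active. The sketch below is a simplified illustration of that single test, not a complete validation scheduler:

# Simplified validation test: transaction T fails validation if its read
# set intersects the write set of any transaction that validated while
# T was still active, since T may then have read a stale value.
def validate(T, overlapping_validated):
    for U in overlapping_validated:
        if T["RS"] & U["WS"]:
            return False              # conflict: restart T
    return True                       # safe: proceed to the write phase

T = {"RS": {"A", "B"}, "WS": {"B"}}
U = {"RS": {"C"}, "WS": {"A"}}        # U validated while T was running
print(validate(T, [U]))               # False: T must be restarted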
Self Assessment Questions
11. What do you call a set of transactions that have initiated, but have not
still finished their validation phase?
a) START
b) VAL
c) FIN
d) STOP
12. FIN is a set of transactions that have finished validation, but still not
finished their write phase. (True/False)

Activity 2
Illustrate how concurrency control by validation is performed. Explain with
example.

8.8 Database Recovery Management


Database recovery is concerned with restoring the database to an accurate
state in the event of a failure. Data is held on numerous storage devices:
main memory, magnetic disk, magnetic tape and optical disks. The DBMS
needs to recover from different types of failures, listed below:
1. System crashes (software and hardware)
2. Media failures
3. Application software errors
4. Natural physical disasters
5. Unintentional destruction
6. Sabotage
7. Programming exceptions
To protect data from failure, different backup methods are used. There are
three levels of backup, discussed below:
 Full database backup: A full backup, or dump, of the entire database is
made.
 Differential backup: Only the changes made since the last backup are
copied.
 Transaction log backup: Only the transaction-log operations that are
not reflected in a previous backup copy of the database are backed up.
The steps generally followed in the database recovery process are given
below:
 First, identify the type and the scope of the required recovery.
 If the whole database needs to be recovered to a consistent state, the
recovery uses the most recent backup copy of the database in a known
consistent state.
 The backup copy is then rolled forward to restore all subsequent
transactions by means of the transaction log information.
 If the database needs to be recovered but the committed portion of the
database is still usable, the recovery procedure uses the transaction log
to 'undo' every transaction that was not committed.
Transaction recovery involves the following:
 Write-ahead protocol: This protocol makes sure that transaction logs
are always written before any database data is actually updated.
 Redundant transaction logs: Most DBMSs maintain several copies of
the transaction log to ensure that a physical disk failure will not harm
the DBMS's ability to recover data.
 Database buffers: A buffer is a temporary storage area in primary
memory used to speed up disk operations.
 Database checkpoint: An operation in which the DBMS writes all the
updated buffers to disk.
There are two fundamental techniques for transaction recovery:
1. Deferred write (deferred update): When the recovery process uses
deferred write or deferred update, transaction operations do not update
the physical database immediately.
2. Write-through (immediate update): When the recovery process uses
write-through or immediate update, the database is updated by
transaction operations immediately, during the transaction's execution,
even before the transaction reaches its commit point.
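The write-ahead protocol listed above can be illustrated with a small sketch; the in-memory log and undo helper are our own toy stand-ins for a real log manager. The log record carrying the before-image is written before the data itself, so an uncommitted transaction can always be rolled back:

# Write-ahead sketch: the log record holding the before-image is
# appended before the data itself is changed.
log = []
database = {"A": 100}

def update(txn, key, new_value):
    log.append((txn, key, database[key], new_value))  # 1. log first
    database[key] = new_value                         # 2. then the data

def undo(txn):
    for t, key, before, _after in reversed(log):      # newest first
        if t == txn:
            database[key] = before                    # restore before-image

update("T1", "A", 150)
undo("T1")
print(database)    # {'A': 100}: the update has been rolled back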
Self Assessment Questions


13. Which type of backup of the database is used to copy only the last
modifications that are performed on the database?
a) Full database backup
b) Differential backup
c) Transaction log backup
d) None of the above
14. A temporary storage area in primary memory is known as buffer.
(True/ False)

8.9 Summary
Let us recapitulate the important points discussed in this unit.
 The purpose of concurrency control is to guarantee the transactions’
serialisability in a multi-user database environment.
 A lock is used by a transaction to deny data access to other
transactions and thus avoid incorrect updates.
 The scheduler establishes the order in which the operations within
simultaneous transactions are executed. To guarantee serialisability,
the scheduler interleaves the execution of database operations.
 A transaction follows the two-phase locking protocol if all of its lock
requests occur before its first unlock operation.
 Depending on the lock mode, when one user holds a lock on a record,
the lock stops other users from changing, or even reading, that record.
 The lock table represents a relation that associates database elements
with information about their locks.
 The term "database element" refers to the various elements in the
database. Database elements such as relations, blocks, and tuples are
organised in a hierarchy.
 Whenever a transaction begins, a timestamp is assigned to it. Through
this, we can tell the order in which the transactions are supposed to be
applied.
 Database recovery is concerned with restoring the database to an
accurate state in the event of a failure.
8.10 Glossary
 B-tree index: A tree data structure that keeps data sorted and allows
searches, sequential access, insertions, and deletions in logarithmic
time.
 Concurrency control: The coordination of concurrent execution in a
multiprocessing database system.
 Database element: Any of the various lockable elements in a
database, such as relations, blocks or tuples.
 Database recovery: The restoring of the database to an accurate state
in the event of a failure.
 Lock table: A relation that associates database elements with
information about their locks.
 Lock: A mechanism used by a transaction to deny data access to other
transactions and thus avoid incorrect updates.
 Scheduler: The component that establishes the order in which the
operations within simultaneous transactions are executed.
 Two-phase locking: The protocol followed by a transaction when all of
its lock requests occur before its first unlock operation.

8.11 Terminal Questions


1. How is the two-phase locking method used to ensure serialisability?
Illustrate.
2. What are the different lock modes used in the locking system? Discuss.
3. Explain the architecture for a Locking Scheduler.
4. Illustrate the concept of locks with multiple granularity with example.
5. Explain the concept of database recovery management. Discuss the
different levels of backup used for recovering data.

8.12 Answers
Self Assessment Questions
1. c) Two phase
2. True
3. Increment
4. False
5. lock table
6. False
7. IS
8. True
9. False
10. Timestamp Resolution
11. START
12. False
13. b) Differential backup
14. True
Terminal Questions
1. A transaction follows the two-phase locking protocol if all of its lock
requests occur before its first unlock operation. Two-phase locking
defines the process of acquiring and giving up locks; it does not prevent
deadlocks. If two-phase locking is used by all transactions, then all
schedules created by interleaving them are serialisable. Refer to
Section 8.2 for more details.
2. The locking system includes various lock modes, such as shared locks,
exclusive locks, update locks, upgrading locks, and increment locks.
Refer to Section 8.3 for more details.
3. The scheduler is divided into two parts. Part I selects and inserts
suitable lock modes for database (DB) operations such as read, write,
or update. Actions (such as a lock or a DB operation) produced by
Part I are taken by Part II, which executes the actions received from
Part I.
The lock table represents a relation that associates database elements
with information about their locks; its size is proportional only to the
number of locked elements, not to the size of the whole database.
Refer to Section 8.4 for more details.
4. Locking works in any situation; however, it is important to decide
whether to lock large objects or small objects, that is, the level of
granularity at which we lock. The trade-off is: the lower the level of
granularity, the more concurrency, but also the more locks and the
higher the locking overhead. The best choice depends on the
application, for
example, locking blocks or tuples in a bank database, and entire
documents in a document database. Refer to Section 8.5 for more
details.
5. Database recovery is concerned with restoring the database to an
accurate state in the event of a failure. Data is held on numerous
storage devices. To protect data from failure, different backup methods
are used. Refer to Section 8.8 for more details.

References:
 Majumdar, A.K. & Bhattacharya, P. (2006) Database Management
Systems, 18th edition, Tata McGraw-Hill Education.
 Singh, S.K. (2009) Database Systems: Concepts, Design and
Applications, 3rd edition, Pearson Education.
E-references
 http://www.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html,
retrieved on 05-04-12
 http://www.slideshare.net/koolkampus/ch16, retrieved on 05-04-12
Unit 9 Parallel Database Architectures for Parallel Databases
Structure:
9.1 Introduction
Objectives
9.2 Parallel Database
Advantages of parallel database
Disadvantages of parallel database
Parallelism in Database Management System
9.3 Parallel Query Evaluation
Parallel query processing
When to implement parallelism
How parallel-execution works
Parallelised SQL statements
9.4 Parallelising Individual Operations
9.5 I/O Parallelism
Partitioning techniques (number of disks = n)
Comparison of partitioning techniques
9.6 Inter-Query Parallelism
9.7 Intra Query Parallelism
Intra partition parallelism
Inter partition parallelism
9.8 Inter Operation and Intra Operation Parallelism
9.9 Design of Parallel Systems
9.10 Summary
9.11 Glossary
9.12 Terminal Questions
9.13 Answers

9.1 Introduction
In the previous unit, you studied Concurrency Control and its various related
aspects such as enforcing, serialisability by locks, locking systems,
architecture for a locking scheduler, managing hierarchies of database
elements, concurrency control and database recovery management. In this
unit, we will introduce you to transaction processing.
Databases are becoming increasingly large as they contain large volumes
of transaction data and various types of multimedia objects such as
images. Hence, large-scale parallel database systems are used more and
more for storing large volumes of data, processing time-consuming
decision-support queries and providing high throughput for transaction
processing.
Thus, in this unit we will look at the issues of parallelism and data
distribution in a DBMS. You will learn about the basic concepts of parallel
databases and the alternatives for parallel database architecture.
Thereafter, you will be introduced to the concept of data partitioning and
study its influence on parallel query evaluation. The unit explores a variety
of parallelisation techniques, including I/O parallelism, inter-query and
intra-query parallelism, and inter-operation and intra-operation parallelism,
and winds up with the various parallel-system design issues.
Objectives:
After studying this unit, you should be able to:
 describe parallel databases and its architecture
 identify parallel query evaluation
 demonstrate parallelisation of individual operations in parallel databases
 identify the concept of I/O parallelism
 differentiate between inter-parallelism and intra-parallelism
 compare intra operation with inter operation parallelism
 recognise the various design issues of parallel system

9.2 Parallel Database


A parallel database can be defined as a database that runs multiple
instances that "share" a single physical database. A variety of hardware
architectures allow multiple computers to share access to data, software, or
peripheral devices, and a parallel database is designed to take advantage
of such architectures.
Parallel database systems are based on the concept of "parallelism in data
management". They deliver high performance and high availability at a
much lower price than equivalent mainframe computers.
Parallel databases are widely used nowadays because of their unique
benefits. Parallel database technology benefits specific types of
applications by enabling features such as higher performance, greater
flexibility, high availability and the capacity to serve many users
simultaneously. You might wonder how a parallel database is able to do all
this efficiently.
In parallel database technology, more than one CPU is accessible to an
application; hence higher speed-up and scale-up can be achieved. Also, the
nodes are separated from each other, so a malfunction at one node does
not affect the entire system. In case of any failure, one of the properly
working nodes covers for the failed node and the system carries on
furnishing data access to users; this provides high database availability.
In a parallel database, instances can also be allocated and de-allocated as
necessary. For example, more instances can be allocated if database
demand increases, and some instances can be de-allocated if demand
falls. A parallel database also makes it possible to overcome the memory
constraint by empowering a single system to serve a multitude of users.
9.2.1 Advantages of parallel database
There are numerous advantages of parallel database technology. Some of
these have been listed below:
1. Increased throughput
2. Decreased response time
3. Ability to process an extremely large number of transactions
4. Substantial performance improvement
5. Increased availability of system
6. Greater flexibility
7. Possible to serve large number of users
9.2.2 Disadvantages of parallel database
Along with its advantages, parallel database technology also has some
disadvantages, as listed below:
1. Higher start-up costs: when several processes start in parallel, the
start-up time tends to dominate the real computation time.
2. An interference problem is created by the slow-down that each new
process imposes on the other processes.
3. The service time of the slowest step of a task determines the service
time of the whole system.
9.2.3 Parallelism in Database Management Systems
Parallelism in database management systems is available in various forms
and has numerous goals and objectives to fulfil. All this, along with a
detailed explanation of parallel DBMS architectures, will be discussed in the
following sections.
Goals/Objectives of parallelism in database
There are mainly two objectives of parallelism in a database:
1. Speed-up: The first objective is to speed up the processing of a given
task or query, that is, to reduce the response time. This is done with the
help of additional hardware that processes the same task in less time
than a single piece of hardware would, just as two men performing a
task take less time than one man doing it alone.
Speed up is measured by the formula:
Speed up = Time_Original / Time_Parallel
If an original system takes 2 minutes to process a task and a parallel
system takes 1 minute to process the same task, then the speed-up will be
2 (i.e. 2 min/1 min).
2. Scale-up: The next goal of a parallel database is scale-up. Scale-up
means processing a larger number of jobs in the same time period; it
may also be described as increasing the throughput of the system.
Scale up can be calculated by applying the formula below:
Scale up = Volume_Parallel / Volume_Original
For example: if an original system processes 1000 transactions in a
minute and a parallel system processes 2000 transactions in a minute,
then the scale-up will be 2 (i.e. 2000/1000).
Most traditional DBMSs used the relational database approach, which was
dominant at one time and is still a dominant approach today. The fact that
relational queries are ideally suited to parallel execution created great
interest among researchers in adopting parallelism in DBMSs, and resulted
in the birth of parallel DBMSs. Relational queries are composed of uniform
operations applied to consistent streams of data. Every operator
creates one new relation; therefore, the operators can be composed into
parallel data-flow graphs.
Forms of parallelism
There are two forms of parallelism that can be applied to DBMSs:
1. Pipelined parallelism: In the pipelined approach, one operator's output
is streamed into the input of another operator, and the two operators
can work simultaneously in series. Figure 9.1 depicts an example of
pipelined parallelism in which the output of a scan is fed immediately
into the input of a sort, so the sort executes in parallel with the data
scan.
2. Partitioned parallelism: In the partitioned approach, the input is
partitioned among different processors and memories, and an operator
can often be divided into several autonomous operators, each working
on a part of the data. Figure 9.1 also illustrates partitioned parallelism,
in which four processors scan and sort the input simultaneously and
the results of all four are merged together to generate the final output.
Figure 9.1: Pipelined and Partitioned Parallelism
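The two forms can be imitated in ordinary Python. In the hedged sketch below, generators stand in for pipelined operators (the consumer pulls rows while the producer is still scanning) and a thread pool stands in for the partitioned processors; both are simplifications of our own, not DBMS internals:

import heapq
from multiprocessing.dummy import Pool   # thread pool, for illustration

data = [5, 3, 8, 1, 9, 2, 7, 6, 4, 0]

# Pipelined parallelism: generators consume their input row by row, so
# the filter below is already working while the scan is still producing.
def scan(rows):
    for r in rows:
        yield r

def keep_even(rows):
    for r in rows:
        if r % 2 == 0:
            yield r

pipelined = list(keep_even(scan(data)))

# Partitioned parallelism: split the input into four partitions, sort
# each independently, then merge the sorted runs into the final output.
partitions = [data[i::4] for i in range(4)]
with Pool(4) as pool:
    runs = pool.map(sorted, partitions)
partitioned = list(heapq.merge(*runs))

print(pipelined)      # [8, 2, 6, 4, 0]
print(partitioned)    # [0, 1, 2, ..., 9]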

Parallel DBMS architectures
Now we will briefly discuss parallel DBMS architectures. There are mainly
three machine architectures on which you can run a parallel DBMS so as to
minimise the response time and maximise the throughput.
These three architectures are discussed below:
(a) Shared-memory multiprocessor: In a shared-memory multiprocessor
architecture, several processors share the storage disks and a
common memory with the assistance of an interconnection network.
Figure 9.2 shows one such shared-memory architecture.

Figure 9.2: Shared Memory Architecture

This shared-memory architecture is suited to a lower degree of
parallelism, that is, to systems without many processors. For this type
of system, one major problem is network interference. Moreover,
partitioning creates many of the skew and load-balancing problems
faced by shared-nothing machines.
(b) Shared-nothing multiprocessor: In a shared-nothing architecture,
each processor has its own private memory and one or more private
disks. As a shared-nothing multiprocessor passes only questions and
answers through the network, low traffic is likely in the interconnection
network. This results in little network interference among processors,
and hence permits high scalability. Figure 9.3 shows a shared-nothing
illustration.
Figure 9.3: Shared-Nothing Architecture

Shared-nothing parallel machine architecture is possibly the most
efficient architecture on which to run a parallel database system.
(c) Shared-disk multiprocessor: In a shared-disk architecture, every
processor has its own private memory and only the storage disks are
shared, via an interconnection network. The shared-disk multiprocessor
alleviates the network-interference difficulty of the shared-memory
multiprocessor. Figure 9.4 shows a shared-disk illustration.
Figure 9.4: Shared-Disk Architecture
This architecture is effective in the following situations:
 big read-only databases
 databases in which there is no simultaneous sharing
It is not really suitable for database applications that read and write a
shared database: concurrency control becomes an issue when a processor
wants a shared disk page for writing. To implement this in a shared-disk
architecture, inter-processor communication is necessary to reserve and
release pages. There are various optimisations of such protocols, but they
all exchange large data pages and reservation messages, which produces
processor interference as well as delays caused by heavy traffic on the
shared interconnection network.
Self Assessment Questions
1. Nearly all traditional DBMS uses relational database approach.
(True/False)
2. In which architecture, each processor has private memory and one or
more private disk storage?
(a) Shared memory
(b) Shared Disk
(c) Shared nothing
(d) Pipelined

Activity 1
Briefly compare the three alternative design approaches for parallel
databases based on their potential advantages and disadvantages.

9.3 Parallel Query Evaluation


Query evaluation is optimised by using parallel optimisation methodologies
to make repartitioning more efficient. Efficiency is enhanced by identifying
the potential partitioning requirements for obtaining parallelism in a query
operation, and by identifying when the data's partitioning property already
fulfils the partitioning needs of a query operation.
A database management system following this approach uses parallel
query processing to optimise the repartitioning of data, or to
avoid it altogether. Now let us understand the basics of parallel query
processing.
9.3.1 Parallel query processing
In parallel query processing, multiple processes work together at the same
time to handle one individual SQL statement. By dividing the work
necessary to process a statement among multiple servers, the database
server can handle the statement more swiftly than a single server could.
This feature dramatically enhances performance for the data-intensive
operations associated with decision-support applications and very big
database environments. The parallel query feature is most useful on SMP
(symmetric multiprocessing), clustered or MPP (massively parallel)
systems, because in such systems the query processing can be efficiently
split among the various central processing units of a single system.
In parallel query processing, the query is parallelised dynamically at
execution time. It automatically adapts to optimise itself for parallelisation if
the location or distribution of the data changes.
9.3.2 When to implement parallelism
Parallel execution is helpful for various types of operations that access
considerable amounts of data. Parallelism enhances performance for:
 Queries
 Creation of large indexes
 Bulk inserts, updates, and deletes
 Aggregations and copying
Parallel execution is advantageous for systems with the following
characteristics:
 Symmetric multiprocessors (SMP), clusters, or massively parallel
systems (that is, multiple CPUs)
 Sufficient I/O bandwidth
 Under-utilised or only occasionally used central processing units (for
instance, systems where CPU use is typically less than 30 per cent)

Manipal University of Jaipur B1649 Page No. 206


Advanced Database Management System Unit 9

 Adequate memory to sustain extra memory intensive processes like


hashing, sorts and I/O buffers
9.3.3 How parallel-execution works
The basic unit of work in parallelism is called a granule. When a database instance starts up, the database creates a pool of parallel-execution servers that are available for any parallel operation. One process, called the parallel-execution coordinator, acquires parallel-execution servers from this pool when a parallel operation is performed, assigns the execution of granules to the parallel-execution servers, coordinates the results from every server process and returns the results to the user.
Figure 9.5 below displays many such servers executing a scan of the table
employees.

[Figure 9.5 shows the statement SELECT * FROM EMP arriving at the parallel-execution coordinator, which distributes the scan of the EMP table among several parallel-execution servers.]

Figure 9.5: Parallel Full Table Scan


As you can see in Figure 9.5, the table EMP is segmented dynamically into granules by the coordinator. A granule here is simply a collection of blocks of the employees table. Each granule is read by a distinct parallel-execution server; note that this mapping is not static, but is decided at execution time. When an execution server finishes reading the rows corresponding to one granule, it obtains another granule from the coordinator if any granules remain.


This process continues until all granules are used up, in other words until the complete employees table has been read. The servers then send their results back to the coordinator, which assembles the pieces into the required full scan. The total number of servers allocated to a single operation is the degree of parallelism (DOP) for that operation. The various operations within the same SQL statement share the same degree of parallelism.
9.3.4 Parallelised SQL statements
Every SQL statement goes through an optimisation and parallelisation process when it is parsed. Optimisation is performed by the optimiser, and the database can adapt to a new situation if it finds a more optimal execution plan for a given SQL statement.
After the optimiser decides the execution plan for a given SQL statement, the coordinator decides the appropriate parallelisation method for every operation in the execution plan. It is therefore the responsibility of the coordinator to determine whether an operation should be executed in parallel and, if so, how many servers are required. The number of servers employed for an operation is its ‘degree of parallelism’.
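As a hedged Oracle-style sketch (the statement and table name are illustrative, not from the source text), a default degree of parallelism can also be declared on a table; the optimiser and coordinator then take it into account for statements that touch the table:

-- Declare a default degree of parallelism of 4 for the emp table.
ALTER TABLE emp PARALLEL 4;
-- Subsequent operations on emp, such as full scans, may then be
-- parallelised without a per-statement hint.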
Self Assessment Questions
3. Oracle is capable of increasing or decreasing the number of parallel-
execution servers based on the requirement. (True/False)
4. Basic unit of work in parallelism is _______________.

9.4 Parallelising Individual Operations


In this section, you will learn how individual operations are parallelised in a database. As you learnt in the previous section, before enlisting query server processes the query coordinator process scrutinises the operations in the query execution plan to determine whether the individual operations can be parallelised.
The server can parallelise the following operations:
 sorts
 joins
 table scans
 table population
 index creation
Partitioning rows to each query server
An essential step in parallelisation is partitioning the rows among the query servers. The query-coordinator process determines the partitioning requirement of each operation: the method by which the rows acted on by the operation must be partitioned, or divided, between the query server processes. The partitioning may be any of the following:
 random
 round robin
 hash
 range
Note: You will learn about the partitioning techniques in detail in the next section.
The next step, again performed by the query coordinator, is to determine the ordering requirement for every operation in the execution plan. The coordinator establishes the data flow of the statement, i.e. which operation precedes and succeeds which. Operations that need the output of other operations are known as parent operations.
Parallelism between operations
After deciding upon the partitioning and ordering, the database can proceed to parallelism between operations. For this it is essential that child operations start producing output before their parent operations, since a parent operation consumes the output of its child operations.
As an example, consider a query together with its data flow and data-flow diagram, given in Figures 9.6 and 9.7 respectively:

Figure 9.6: The Data Flow for the Above Query


[Figure 9.7 shows the query coordinator above a tree of operations: (1) a GROUP BY sort, fed by (2) a MERGE JOIN, which is fed in turn by (3) a FULL SCAN of emp and (4) a FULL SCAN of dept.]

Figure 9.7: Data Flow Diagram

Figure 9.8 below illustrates the parallel-execution of our sample query.


[Figure 9.8 shows a user process issuing SELECT * FROM emp ORDER BY ename to the parallel-execution coordinator. One set of parallel-execution servers performs the full table scan of EMP (intra-operation parallelism) and sends rows, by ename range (A-G, H-M, N-S, T-Z), to a second set of servers performing the ORDER BY operation (also intra-operation parallelism); the hand-off between the two sets is inter-operation parallelism.]

Figure 9.8: Inter operation Parallelism and Dynamic Partitioning

As you can see in Figure 9.8, while one set of query servers is producing rows in the full scan operation, another set of query servers can simultaneously consume those rows and perform the ORDER BY operation. Likewise, in the plan of Figure 9.7, while query servers generate rows in the FULL SCAN of dept, another set of query servers can start performing the MERGE JOIN operation to consume those rows; when the FULL SCAN of dept is complete, the FULL SCAN of emp can begin to produce rows.
It is noteworthy that in reality there are eight parallel-execution servers working on the query above, even though the degree of parallelism is four. The reason is that a parent and a child operator can be executed simultaneously on account of inter-operation parallelism.
You should also notice that all of the servers engaged in the scan operation transmit their rows to the appropriate server executing the sort operation. If a row read by a parallel-execution server has a value of the ename column between A and G, that row is sent to the first ORDER BY parallel-execution server. After the scan operation completes, the sorting processes send the sorted results back to the coordinator, which returns the query results to the user.
Degree of parallelism
The parallel-execution coordinator can enlist two or more parallel-execution servers of the instance to process a given statement. The total number of parallel-execution servers associated with a single operation is known as the ‘degree of parallelism’, and it refers to intra-operation parallelism only. If inter-operation parallelism is achievable, the total number of parallel-execution servers for one statement can be double the stipulated degree of parallelism; at most two sets of parallel-execution servers run at one time, and each set of parallel-execution servers may process numerous operations in turn. To ensure optimal inter-operation parallelism, only two sets of parallel-execution servers should be active.
Self Assessment Questions
5. Which of the following operations require the output of other
operations?
(a) Child operations
(b) Dependent operations
(c) parent operations
(d) Inter-related operations
6. The number of ____________ linked with a single operation is known
as the degree of parallelism.

9.5 I/O Parallelism


I/O parallelism (input/output parallelism) is the simplest type of parallelism. It attempts to minimise the time required to retrieve relations from disk by partitioning the relations across multiple disks. The input data is partitioned and each partition is then processed in parallel; the results obtained are combined after all the partitioned data has been processed. This technique greatly reduces retrieval time. I/O parallelism is also referred to as data partitioning.
In horizontal partitioning, the tuples of a relation are distributed across several disks in such a manner that each tuple resides on one disk.
Partitioning can significantly enhance data access and overall application performance. Employing the partitioning techniques explained here, you can tune SQL statements to avoid unnecessary index and table scans (through partition pruning). You can also speed up very large join operations, in which huge amounts of data (for instance, millions of rows) are joined together, by employing partition-wise joins. Finally, partitioning data greatly improves the manageability of large databases and cuts down the time needed for administrative tasks such as backup and restore.
9.5.1 Partitioning techniques (number of disks = n)
There are mainly three types of partitioning techniques i.e., Round robin,
Hash partitioning and Range partitioning. Each partitioning method has
different advantages and design considerations. Thus, each method is more
appropriate for a particular situation. Now we will discuss these techniques
in detail.
(a) Round robin: The simplest partitioning approach divides tuples between the fragments in the manner of a round-robin tournament. In this technique, the mth tuple of a relation Z is sent to disk d(m mod n); in simple words, the disks take turns receiving new tuples, so the first tuple is placed on D1, the second on D2, the third on D3, and so on. The round-robin approach is the partitioned variant of the standard entry-sequenced file.
Advantage: Round-robin partitioning is best when every application accesses the relation by sequentially scanning all of it on each query.
Disadvantage: Round robin is not suitable for sophisticated access to the relation; a join, for example, would take too much time under this scheme.


(b) Hash partitioning: In this technique we select one or more attributes as the partitioning attributes and choose a hash function h with range 0 ... n − 1. Each tuple of the relation is hashed on the partitioning attributes; if m is the result of applying h to the partitioning attribute value of a tuple, then that tuple is sent to disk m.
Advantages: Hash partitioning helps to avoid data skew, i.e. data is evenly distributed across the disks, provided the hash function distributes values uniformly. It is practically suitable for applications that need only associative and sequential access to the data. Since the hash function determines the disk on which each tuple is placed, associative access to the tuples with a particular partitioning-attribute value can be directed to a single disk, bypassing the extra work of starting queries on numerous disks.
Disadvantages: Hash partitioning is not suitable for point queries (queries that involve exact matches) on non-partitioning attributes, nor for range queries. For example:
SELECT *
FROM STUDENT
WHERE Stud_Age > 8 AND Stud_Age < 17;
This type of range query would require more time if implemented with hash partitioning, since every partition has to be searched.
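To make this concrete, here is a hedged Oracle-style sketch of declaring such a table with hash partitioning (the column names and partition count are assumptions for illustration):

-- Hash-partition STUDENT on stud_id across four partitions.
CREATE TABLE student (
    stud_id   INTEGER,
    stud_name VARCHAR2(40),
    stud_age  INTEGER
)
PARTITION BY HASH (stud_id)
PARTITIONS 4;

A point query on stud_id can then be routed to a single partition, whereas the range query on Stud_Age above must still search every partition.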
(c) Range partitioning: In this technique, the administrator chooses a partitioning attribute and a partitioning vector [v0, v1, ..., vn-2] of attribute values, which assigns attribute values within a certain range to a certain disk. Let v be the partitioning attribute value of a tuple. Tuples such that vi ≤ v < vi+1 go to disk i + 1, tuples with v < v0 go to disk 0, and tuples with v ≥ vn-2 go to disk n − 1. Thus, range partitioning assigns a contiguous range of attribute values to each disk.
Advantages: Range partitioning keeps tuples with similar attribute values together in the same partition. It is effective for clustering data and for both sequential and associative access. Range partitioning is best suited to range-based queries, and is also good for point queries (finding exact matches) involving the partitioning attribute.
Disadvantages: The basic disadvantage of range partitioning is that it may result in data skew, where most of the data ends up in a few partitions, and in execution skew, where most of the processing takes place in a few partitions. By contrast, round robin and hashing are less prone to such skew problems.
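As a hedged Oracle-style sketch (the column names and boundary values are illustrative assumptions), the same table could instead be declared with range partitioning on the age attribute; the range query on Stud_Age shown earlier would then be confined by partition pruning to the partitions that can contain matching rows:

-- Range-partition STUDENT on stud_age: each partition holds a
-- contiguous range of ages.
CREATE TABLE student (
    stud_id   INTEGER,
    stud_name VARCHAR2(40),
    stud_age  INTEGER
)
PARTITION BY RANGE (stud_age) (
    PARTITION p_young  VALUES LESS THAN (9),
    PARTITION p_middle VALUES LESS THAN (17),
    PARTITION p_old    VALUES LESS THAN (MAXVALUE)
);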
9.5.2 Comparison of partitioning techniques
Table 9.1 below shows the comparison between the three partitioning
techniques on the basis of sequential scan, point query and range query
execution.
Table 9.1: Comparison between Round robin, Hashing and Range techniques
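Based on the discussion in section 9.5.1, the comparison can be summarised as follows:

Technique      Sequential scan   Point query                          Range query
Round robin    Good              Poor (all disks searched)            Poor (all disks searched)
Hash           Good              Good on the partitioning attribute   Poor (all disks searched)
Range          Good              Good on the partitioning attribute   Good on the partitioning attribute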

Self Assessment Questions


7. Partitioning data very much improves managing capacity of very big
databases. (True/False)
8. _____________ partitioning is very suitable for point queries and also
range queries.

9.6 Inter-Query Parallelism


Inter-query parallelism means executing different independent queries
simultaneously on separate CPUs. Figure 9.9 below shows how three
independent queries can be executed simultaneously by three different
processors. Each request (task) runs on a single thread and executes on a
single processor.


[Figure 9.9 shows Query 1, Query 2 and Query 3 being executed by Processor 1, Processor 2 and Processor 3 to produce Result 1, Result 2 and Result 3 respectively.]

Figure 9.9: Inter-Query Parallelism

In inter-query parallelism, transactions or queries execute in parallel with one another. Inter-query parallelism is used effectively in OLTP (on-line transaction processing) applications; it is used mainly to scale up a transaction-processing system so that it can sustain a higher transaction throughput.
Advantages: It is perhaps the easiest form of parallelism to support in a database system, particularly in a shared-memory system. It increases transaction throughput and helps OLTP applications support more users simultaneously.
Disadvantages: The response times of individual transactions or queries are no faster than if they ran in isolation. It is also quite complicated to implement in a shared-nothing or shared-disk architecture.
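As a small illustrative sketch (the table and column names are assumptions, not from the source text), inter-query parallelism simply means that independent statements issued by different sessions may run at the same time on different processors:

-- Session 1 (running on processor 1):
UPDATE account SET balance = balance - 100 WHERE acc_no = 17;
-- Session 2 (running concurrently on processor 2):
SELECT * FROM account WHERE acc_no = 42;
-- Each statement still executes serially on its own; only the
-- statements as wholes proceed in parallel.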
Self Assessment Questions
9. The full form of OLTP is ________________________.
10. Inter query parallelism aids in increasing the transaction throughput.
(True/False)

9.7 Intra Query Parallelism


In intra-query parallelism, a single large query is broken into a number of pieces (subtasks) and those subtasks are executed in parallel on multiple processors. This is sometimes also referred to as parallel query processing.
Figure 9.10 below shows intra-query parallelism: a large query is broken down into two subtasks, 1 and 2, which are executed simultaneously on two different processors. The two results, R1 and R2, are then combined to form the query result.

[Figure 9.10 shows a large query split into Subtask 1 and Subtask 2, whose results are combined into the query results.]

Figure 9.10: Intra-Query Parallelism
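Conceptually, and assuming an id column that splits the data evenly (an illustrative assumption, not from the source text), the two subtasks of Figure 9.10 could correspond to range-restricted pieces of one large query whose partial results are then combined:

-- Subtask 1, executed on processor 1:
SELECT SUM(amount) FROM orders WHERE id BETWEEN 1 AND 500000;
-- Subtask 2, executed concurrently on processor 2:
SELECT SUM(amount) FROM orders WHERE id BETWEEN 500001 AND 1000000;
-- The coordinator adds the two partial sums (R1 + R2) to obtain the
-- result of SELECT SUM(amount) FROM orders.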

Now let us discuss intra-partition parallelism and inter-partition parallelism in detail.
9.7.1 Intra partition parallelism
Intra-partition parallelism refers to the ability to break up a query into several parts within a single database partition and execute those parts at the same time. It splits a particular database operation, such as a database load, index creation or SQL query, into numerous parts, all or many of which can be executed in parallel within one database partition. Intra-partition parallelism can be applied to take advantage of the multiple processors of an SMP (symmetric multiprocessor) server.
Figure 9.11 shows a query that is subdivided into four pieces that can be executed in parallel, each working on a single subset of the data. When this happens, the results are returned more quickly than if the query ran serially.


Figure 9.11: Intra Partition Parallelism

9.7.2 Inter partition parallelism


Inter-partition parallelism refers to the ability to break up a query into several parts across the partitions of a partitioned database, which may reside on a single machine or on several machines, and to run the query in parallel. Figure 9.12 shows a query subdivided into four parts that can be run in parallel. In this scenario, the results are returned more swiftly than if the query were run serially on a single partition.

Figure 9.12: Inter-Partition Parallelism

One can also use intra-partition parallelism and inter-partition parallelism simultaneously. This combination provides both dimensions of parallelism, resulting in an even greater increase in the speed of query processing. Figure 9.13 shows a query in which both intra-partition and inter-partition parallelism are implemented at the same time.

Figure 9.13: Simultaneous Implementation of Intra-Partition and Inter-Partition Parallelism

Self Assessment Questions


11. Intra query parallelism is sometimes also referred as ______ .
12. One cannot employ intra partition parallelism and inter partition
parallelism at the same time. (True/False)

9.8 Inter Operation and Intra Operation Parallelism


Two complementary forms of intra-query parallelism are inter-operation parallelism and intra-operation parallelism. In this section we discuss these two briefly; they were already touched upon in section 9.4.


Intra-operation parallelism: In intra-operation parallelism, the execution of each individual operation in a query, such as a sort, join or projection, is parallelised. Since the number of tuples (rows/records) processed by each operation is typically very large, much larger than the number of operations in a query, intra-operation parallelism scales better with increasing parallelism.
Inter-operation parallelism: In inter-operation parallelism, the various operations in a query expression are executed in parallel.
Self Assessment Questions
13. Intra operation parallelism performs better with increasing parallelism.
(True/False)
14. In ____________ parallelism, the various operations in a query
expression are parallelly executed.

Activity 2
Explore some of the practical applications of inter and intra operation
parallelism.

9.9 Design of Parallel Systems


Designing parallel systems is not an easy task, and many issues are associated with it. Some major issues in parallel system design are discussed below:
 Parallel loading of data from external sources is required in order to handle large quantities of arriving data.
 A proper technique is needed to manage the problem of data skew.
 Resilience in case of failure of some disks or processors:
 The chance of some processor or disk failing is higher in a parallel system, since it has many more components. The breakdown of a single processor or disk should not bring problems to the entire system.
 Operations should be able to execute, even with degraded performance, in spite of a breakdown or failure.
 The system must support schema changes and on-line data reorganisation. For example, constructing an index on a terabyte database could take hours, days or even weeks on a parallel system, so other processes (insertions, updates and deletions) must be allowed to operate on the relation even as the index is being constructed.
 There should be support for schema changes and on-line repartitioning (performed concurrently with other processing).
Self Assessment Questions
15. The chances of any processor or disk failing is more in a parallel
system. (True/False)
16. There must be a suitable technique to manage ____________ skew
problem in parallel system design.

9.10 Summary
Let us recapitulate the important points of this unit:
 A parallel database can be defined as the database having many
memory areas that share one disk drive.
 Parallel database systems are based on the concept of “parallelism in
data management”.
 In a parallel database technology, more than one CPU is accessible to
an application, hence higher speed up and scale up can be achieved.
 With a parallel database, it is possible to overcome the memory constraint by empowering a single system to serve many users.
 Parallelism in Database Management System is available in various
forms.
 There are two forms of parallelism that can be applied to DBMSs:
Pipelined parallelism and Partitioned parallelism
 There are mainly three machine architectures available on which you
can run parallel DBMSs; Shared memory multiprocessor, shared nothing
multiprocessor and Shared-disk multiprocessor.
 Query evaluation is optimised using parallel optimisation methodologies that make repartitioning more efficient.
 In parallel query processing, multiple processes work together at the same time to handle one individual SQL statement.
 In parallel query processing, the query is parallelised dynamically at the
execution time itself.


 Every SQL statement goes through an optimisation and parallelisation process when it is parsed.
 An essential step in parallelisation is partitioning the rows to each query
server.
 The total quantity of parallel-execution servers that are linked with a
single operation is known as the ‘degree of parallelism’.
 I/O parallelism (Input/output parallelism) is the simplest type of
parallelism.
 I/O parallelism is also known as data partitioning.
 There are mainly three types of partitioning techniques; Round robin,
Hash partitioning and Range partitioning.
 Inter-query parallelism means executing different independent queries
simultaneously on separate CPUs.
 In Intra query parallelism a single large query is broken into a number of
pieces (subtasks) and those subtasks are executed in parallel on
multiple processors.
 Intra partition parallelism relates to the capacity to break up a query into
several parts within a single database partition and execute these parts
at the same time.
 Inter partition parallelism relates to the capacity to break up a query into
various parts across various partitions of a partitioned database which
may be on one single machine or several machines.

9.11 Glossary
 Degree of parallelism: The total number of parallel-execution servers
for a single operation is known as ‘degree of parallelism’.
 Intra-query parallelism: The ability to subdivide a single query into a number of parts and run them concurrently using either intra-partition parallelism or inter-partition parallelism or both.
 Parallel database systems: The database having various memory
regions and sharing one single disk drive.
 Parallel I/O: The process of reading from or writing to two or more I/O
devices concurrently.


 Shared disk architecture: In this model, every processor has its own
memory and share storage disks only, through interconnection network.
 Shared memory multiprocessor architecture: In this model,
processors share both memory and storage disks.
 Shared nothing architecture: In this model, each processor has its
own individual memory and private disk storage which may be one or
more.

9.12 Terminal Questions


1. What do you mean by parallel database? What are the advantages and
disadvantages of parallel database?
2. What are the three machine architectures upon which parallel DBMS
run?
3. Discuss in detail the concept of I/O Parallelism.
4. What are the different types of partitioning techniques? Describe in
detail.
5. What is Intra-query and inter-query parallelism? Explain with help of a
diagram.
6. Differentiate between inter operation and intra operation parallelism.
7. What are the major design issues of parallel system?

9.13 Answers
Self Assessment Questions
1. True
2. (c) shared-nothing
3. True
4. granule
5. (c) Parent operations
6. Parallel-execution servers
7. True
8. Range
9. On-line transaction processing
10. True
11. Parallel query processing
12. False
13. True

14. Inter operation


15. True
16. Data
Terminal Questions
1. A parallel database is a database that has numerous memory areas and shares one disk drive. Refer to Section 9.2 for more details.
2. Three machine architectures, namely shared-memory, shared-disk and shared-nothing, are available on which to run a parallel DBMS. Refer to Section 9.2 for more details.
3. I/O parallelism tries to minimise the time needed to retrieve relations from disk by partitioning the relations on multiple disks. Refer to Section 9.5 for more details.
4. There are three major partitioning methods: round robin, hash and range partitioning. Refer to Section 9.5 for more details.
5. Intra-query parallelism denotes the processing of one query in parallel on multiple discs/processors. Refer to Sections 9.6 and 9.7 for more details.
6. In intra-operation parallelism, the working of each individual operation in the query is parallelised. Refer to Section 9.8 for more details.
7. The various design issues associated with parallel databases are data skew, resilience to failure, parallel loading of data, etc. Refer to Section 9.9 for more details.



Unit 10 Object Oriented DBMS


Structure:
10.1 Introduction
Objectives
10.2 Object Oriented Paradigm
10.3 OODBMS Architectural Approaches
Distributed client - server approach
Data access mechanism
Object clustering
Heterogeneous operation
10.4 Object Identity
10.5 Procedures and Encapsulation
10.6 Object Oriented Data Model
10.7 Relationships
10.8 Identifiers
10.9 Basic OODBMS Terminology
10.10 Basic Interface and Class Structure
10.11 Type Hierarchies and Inheritance
10.12 Type Extents and Persistent Programming Languages
10.13 Summary
10.14 Glossary
10.15 Terminal Questions
10.16 Answers

10.1 Introduction
In the previous unit, you studied the concept of parallel database
architectures. You also studied the concept of parallel query evaluation,
parallelising individual operations, I/O Parallelism etc.
In today's world, client-server applications that rely on a database on the server as a data store while servicing requests from multiple clients are quite commonplace. The majority of these applications use a Relational Database Management System (RDBMS) as their data store together with an object oriented programming language for development. This causes certain inefficiencies, as objects must be mapped to tuples in the database and vice versa, instead of the data being stored in a way that is consistent with the programming model. To overcome this problem, Object Oriented Database Management Systems (OODBMS) have been developed.
In this unit, you will study the concept of object oriented DBMS. You will
learn about object oriented paradigm and architectural approaches of
OODBMS. Also you will recognise the concept of object oriented data
model, OODBMS terminology, type hierarchies and inheritance. We will also
discuss the concept of type extents and persistent programming languages.
Objectives:
After studying this unit, you should be able to:
 discuss object oriented paradigm
 recognise OODBMS architectural approaches
 describe object identity, its procedures and encapsulation
 explain object oriented data model
 describe relationships and identifiers
 discuss basic OODBMS Terminology
 recognise basic interface and class structure
 explain the concept of type hierarchies and inheritance
 discuss the concept of type extents and persistent programming
languages

10.2 Object Oriented Paradigm


The object oriented model, or paradigm, relies on the encapsulation of data and code into a single unit. All interactions between an object and the rest of the system are performed through messages; the set of allowed messages therefore defines the interface between an object and its system. Generally, an object is associated with:
 a set of messages to which the object responds;
 a set of methods, where every method is a body of code that implements one message and returns a value as the reply to that message;
 a set of variables that contain the object's data, where the value of every variable is itself an object.


Now let us discuss the motivation for using messages and methods. As an example, consider employees as objects and annual-wage as a message. Every employee object responds to the annual-wage message, but with a different calculation for managers, back-office employees, and so on.
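A minimal sketch of this, in the pseudo-code style used later in this unit (the class and method declarations are illustrative, not from the source text):

class employee {
    int annual-wage();    /* message to which every employee responds */
};
class manager isa employee {
    int annual-wage();    /* overriding method: a different calculation
                             is used for managers */
};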
Since the sole external interface presented by an object is the set of messages to which it responds, it is feasible to:
 modify the definitions of its variables and methods without affecting the rest of the system;
 substitute a variable with a method that computes a value.
The main benefit of the object oriented paradigm is this ability to modify an object's definition without affecting the rest of the system. The methods of an object can be classified as either read-only or update, and a message can likewise be classified as read-only or update. The derived attributes of an entity in the E-R model can be expressed as read-only messages.
Another major benefit of the object oriented paradigm is that it is easy to understand. It provides a natural representation of real-world objects, their mutual relationships and their behaviour, and is thus close to how users think. An object oriented application comprises a set of objects, each with its own private state, interacting with one another. Object oriented systems can be maintained easily, since they are modular and objects are independent of one another: a change to one object should not affect other objects in the system. The object oriented paradigm removes the need for shared data areas, thereby reducing system coupling. The paradigm also assists reusability: objects are self-reliant and may be utilised in other, suitably similar applications.
Self Assessment Questions
1. __________ are said to be self-reliant and may be utilised in other
suitably similar applications.
2. The methods of an object cannot be classified as either read-only or
update. (True/ False)


10.3 OODBMS Architectural Approaches


Now you will learn about various architectural approaches relevant to an
OODBMS. These approaches are as follows:
 Distributed Client - Server Approach
 Data Access Mechanism
 Object Clustering
 Heterogeneous Operation
Let’s discuss them in detail.
10.3.1 Distributed client – server approach
Improvements in local area network and workstation technologies have given rise to group-design applications, for example electronic offices, CASE and CAD, which create a need for OODBMSs. An OODBMS is usually implemented in a multi-process, distributed environment. Server processes offer the various database services, such as managing secondary storage and controlling transactions.
Client processes manage application-specific activities such as the use and updating of individual objects. These processes may be situated on the same workstation or on different workstations. Usually, a single server will communicate with numerous clients making simultaneous requests for data managed by that server, and a client may interact with numerous servers to use data distributed throughout the network.
Three different workstation-server architectures have been proposed for use with an OODBMS. These are discussed below:
Object server approach
The object is the unit of transfer from server to client. Both machines store objects and are capable of executing methods on objects, so object-level locking is carried out easily.
The main disadvantage of this approach is the overhead of the server interaction needed to access each object. Another disadvantage is the added complexity of the server software, which must offer the whole of the OODBMS functionality.


Page server approach


In this approach, the page is the unit of transfer from server to client. Page-level transfer decreases the overhead of object access, since a server interaction is not needed for every object. The architecture and implementation of the server are simplified, as it need only provide back-end database services.
A probable disadvantage of this approach is that methods can be evaluated only on the client, so all objects that an application uses must be transported to the client. It is also difficult to implement object-level locking here.
File server approach
In this approach, the OODBMS client processes interact with a network file service to read and write database pages. Server implementation is simpler because the server does not need to manage secondary storage. The main disadvantage of this approach is that it requires two network interactions for accessing data.
Of the three approaches discussed above, the page server approach benefits from buffer pools and efficient clustering algorithms. The object server approach performs badly when applications scan large amounts of data, but it is better than the page server approach for applications that perform numerous updates and run on workstations with small buffer pools.
10.3.2 Data access mechanism
Evaluation of OODBMS products should take into account the procedure required to move data from secondary storage into a client application. Usually this necessitates interaction with the server process, possibly across a network. Objects stored into a client's memory may need further processing, and the cost and procedure of releasing locks and of returning updated objects to the server should also be considered.
10.3.3 Object clustering
Transferring units larger than a single object is done on the assumption that when an application accesses a given object, there is a high probability that it will also access other related objects. When a number of objects are transferred together, further server interactions may not be required to satisfy these additional object accesses.
Object clustering can be defined as the capability for an application to give the OODBMS information so that objects which are usually accessed together can be stored close to each other, and therefore benefit from bulk transfers of data.
10.3.4 Heterogeneous operation
In this approach, an OODBMS offers a method by which applications can work together by sharing access to a common group of objects. A typical OODBMS supports numerous concurrent applications executing on numerous processors connected through a local area network.
Frequently, the processors will be from different computer manufacturers, each with its own data representation formats. To make applications work together in this kind of environment, data must be converted to the representation format appropriate for each processor: the data is stored permanently by a server and temporarily by any client that wishes to access it. To be an efficient integration mechanism, an OODBMS must support data access in such heterogeneous processing environments.
Self Assessment Questions
3. In which of the following approach, the unit of transfer to client from
server is regarded as a page?
a) Page Server
b) File Server
c) Object Server
d) Blade server
4. You can define object clustering as the potential of an application to
offer information to object oriented DBMS. (True/ False)


10.4 Object Identity


An object's identity is maintained even when some or all of the values of its variables, or even the definitions of its methods, change with time. The concept of object identity is essential in applications, but it does not apply to relational database tuples.
Object identity is a more powerful notion of identity than those usually found in programming languages or in data models that are not based on object orientation.
Various forms of identity exist, defined as below:
 Name: A user-supplied name is used for identity; e.g., the name of a file in a file system.
 Value: A data value is used for identity; e.g., the primary key of a row in a relational database.
 Built-in: The notion of identity is built into the programming language or the data model, and no user-supplied identifier is needed.
Object identity can be implemented through a unique, system-generated object identifier (OID). The value of the OID is not visible to the external user; the system uses it internally to identify every object uniquely and to create and manage inter-object references.
In many circumstances, automatic generation of identifiers by the system is an advantage, because it does not require people to do that task. However, this facility should be used with caution. Identifiers produced by the system are generally specific to that system: if data are moved to a different database system, the identifiers must be translated; and if the entities being modelled already have distinctive identifiers from outside the system, system-generated identifiers may not be necessary.
Self Assessment Questions
5. The __________ of object identity can’t be seen by the user.
6. Value of OID can’t be viewed by the external user. (True/ False)


10.5 Procedures and Encapsulation


Procedures, also known as functions or methods, describe an object's behaviour. In an OODBMS the user sees a high level of abstraction: data is encapsulated inside the object and can be accessed only by the procedures associated with that particular object. To be a proper OODBMS, the database must therefore contain procedures as well as data.
Encapsulation basically means hiding the data inside the object from outside classes. Classes encapsulate the attributes and behaviours of their objects.
Through behaviour encapsulation, users of a class cannot see the internal implementation of its behaviours. This provides a degree of data independence, in that users do not have to be modified when behaviour implementations are modified. Attributes of a class may or may not be encapsulated: changing the definition of attributes that are not encapsulated requires changing all users of those attributes, whereas encapsulated attributes are those that users of a class cannot access directly. Encapsulated attributes generally come with behaviours that offer some kind of controlled access to the attribute, and changes to these attributes usually do not require changes to users of the class.
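A minimal sketch in the pseudo-code style used elsewhere in this unit (the class name and members are illustrative assumptions): the attribute balance is encapsulated, and users of the class can reach it only through the behaviours declared beside it:

class account {
    int balance;                  /* encapsulated: hidden from users  */
    int get-balance();            /* behaviours providing the only    */
    void deposit(int amount);     /* access to the attribute          */
};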
Self Assessment Questions
7. Which of the following process hides the internal data of the object from
the outside classes?
a) Implementation
b) Encapsulation
c) Attribute hiding
d) Inheritance
8. By means of behaviour encapsulation, the users of the class are
allowed to view the internal implementation of behaviour. (True/False)


10.6 Object Oriented Data Model


A data model is an organisation of real-world objects (entities), the constraints on them, and the relationships between the objects. A database language is a concrete syntax for a data model, and a data model is implemented by the database system.
The object oriented data model comprises various object oriented concepts:
1. Object and object identifier: Any real-world entity is modelled as an object, associated with a unique identifier that is used to refer to the object for retrieval. In object oriented databases, the OID identifies each object uniquely; the OID format is specific to each system.
2. Attributes and methods: Each object has a state and a behaviour, where the state is the set of values of the object's attributes and the behaviour is the set of methods, that is, code that operates on the state of the object. The state and behaviour encapsulated in an object can be accessed or invoked only through explicit message passing. (An attribute is an instance variable whose domain may be any class, user-defined or primitive.)
3. Class: A class comprises the set of all objects that share the same group of methods and attributes. An object must belong to exactly one class and is considered an instance of that class.
4. Class hierarchy and inheritance: A new class (known as a subclass) is derived from an existing class (known as a superclass). The subclass inherits all the methods and attributes of the existing class and may also define additional attributes and methods. The concepts of single inheritance (class hierarchy) and multiple inheritance are discussed further in this unit.
Self Assessment Questions
9. You cannot use database system to implement a data model.
(True/ False)
10. Which of the following can be defined as the group of values for the
object’s attributes?
a) State
b) Class
c) Behaviour
d) Method

10.7 Relationships
Relationships are one of the significant constituents of the object oriented paradigm. They permit objects to refer to one another, resulting in networks of inter-related objects, and they are the paths used to carry out navigation-based data access.
The capability to represent relationships directly and efficiently is one of the main advances of the object oriented data model over the relational data model, although relying on the presence of particular relationships and indexes in this way does reduce data independence.
Conceptually, relationships are abstract entities that permit objects to refer to each other. An OODBMS may choose to represent relationships as attributes of the class from which they originate, as independent objects (in which case relationships may be extensible, permitting attributes to be added to a relationship), or as hidden data structures connected to the owning object in some way.
Relationships are frequently called references, associations or links. Sometimes the term relation is used for the schema definition of the potential inter-connections among objects, and the term relationship for actual occurrences of an inter-connection among objects. Here, the term relationship is used interchangeably for both the schema definition and the object-level existence of connections among objects.
Though we can discover much regarding an object by observing its
attributes, at times a significant fact regarding an object is the manner in
which it connects to other objects in the same or another class.
Example: Let us consider a class known as Movie. Given below are declarations of the four attributes possessed by all Movie objects.
1. class Movie {
2.    attribute string name;
3.    attribute integer year;
4.    attribute integer length;
5.    attribute enum Film {colour, blackAndWhite} filmType;
Now assume that you want to add a property, namely a set of stars, to the declaration of the Movie class. More specifically, we would like to connect each Movie object to the set of Star objects for that movie. The best way to represent this connection between the classes Movie and Star is with a relationship, which can be represented in Movie by the following line:
relationship Set<Star> stars;
This line appears in the declaration of class Movie and may be placed after any of the lines numbered (1) to (5). It signifies that in every object of class Movie there is a set of references to Star objects; this set of references is known as stars. The keyword relationship indicates that stars contains references to other objects, while the keyword Set before <Star> indicates that stars refers to a set of Star objects rather than a single object.
Self Assessment Questions
11. Relationships can be considered as __________entities that permit
objects to refer to each other.
12. Which of the following component of Object Oriented paradigm is used
to carry out navigation-based data access?
a) Identifiers
b) Relationships
c) Inheritance
d) Attributes

Activity 1
Illustrate the concept of relationships in OODBMS with example.

10.8 Identifiers
Object identifiers can uniquely identify objects.
 Object identifiers can be stored as fields of an object, where they refer to another object. For example, the field of a person object named spouse can be the identifier of another person object.
 Object identifiers may be system generated (produced by the database) or external (such as a social-security number).


In some systems, only 4 bytes holding the object's position or index within a file are sufficient to identify the object. In other systems, object identifiers are more elaborate and maintain uniqueness even beyond the scope of the local computer.
Self Assessment Questions
13. Object identifiers cannot be accumulated as a field of an object. (True/
False)
14. For identifying the object in some systems, only __________ bytes with
object index or object position in the file is sufficient.

10.9 Basic OODBMS Terminology


OODBMS terminology includes the following:
 Object identity: We have already discussed this concept above.
 Classes: A class defines the data values stored by an object of that class. Every object is related to exactly one class, and an object is frequently described as an instance of its class. The specification of a class provides the external view of the class's instances. In an OODBMS, the class construct is usually used to define the schema of the database, which defines the objects to be stored in the database. Note that some object oriented databases use the term type in place of class.
 Encapsulation: We have already discussed this concept in the section above.
 Inheritance: Inheritance is the process whereby a child object inherits the behaviour and properties of its parent object. If class ‘A’ is derived from class ‘B’, all methods of class ‘B’ are inherited by class ‘A’, and ‘A’ can be employed wherever ‘B’ can be employed.
Self Assessment Questions
15. Object is considered as an __________ of a class.
16. What do you call the process where the behaviour and properties of
parent object are inherited by the child object?
a) Encapsulation
b) Inheritance
c) Polymorphism


d) Message passing

10.10 Basic Interface and Class Structure


An interface describes the behaviour or capability of a class without committing to a specific implementation. An interface represents an agreement between a supplier and its users, defining what is required of every implementer in terms of the services they must offer, regardless of how they manage to provide them.
The declaration of interfaces and classes is comparable to C++ and Java syntax, but not quite the same; the general form of a class or interface declaration, however, is taken directly from C++. The declarations are shown below.
Class class_name
{
// class methods
};
Interface interface_name
{
// interface methods
};
Every declaration begins with the keyword class or interface, to identify the element being declared, followed by the name of the class or interface. Note that class and interface names start with uppercase letters.
If a class implements one or more interfaces, those interfaces are separated from the class name by a colon:
Class class_name : interface interface_name
{
//class methods
};


Now, if a class inherits from a superclass, it extends that class, as shown below:
Class class_name extends superclass_name:interface_name
{
//class methods
};
Self Assessment Questions
17. If you want to implement one or more interfaces by a class, then a
separation between those interfaces from a class name is provided by
a __________.
18. An interface illustrates the behaviour or ability of a class without
performing to a specific implementation. (True/ False)

10.11 Type Hierarchies and Inheritance


Object types can be specified by means of a type hierarchy, which permits the inheritance of both the attributes and the methods of previously defined types. In the simplest form, a type is defined by giving it a type name and then listing the names of its public (visible) functions.
An object oriented database usually needs numerous classes, and frequently several classes are similar; for instance, a bank employee is similar to a customer. To permit the direct representation of similarities between classes, classes are placed in a specialisation hierarchy. As an example, Figure 10.1 shows a specialisation hierarchy in the E-R model.


[Figure 10.1 shows person at the top, with ISA links down to employee and customer, and further ISA links from employee down to officer, teller and secretary.]

Figure 10.1: Specialisation Hierarchy for Banking Example

The class hierarchy concept is analogous to the specialisation hierarchy. The corresponding class hierarchy is shown in Figure 10.2.

[Figure 10.2 shows the same structure as a class hierarchy: person; below it employee and customer; below employee, officer, teller and secretary.]

Figure 10.2: Class Hierarchy matching the Banking Example

The class hierarchy, together with the variables associated with each class, can be defined in pseudo-code as shown below:


class person {
    string name; string address;
};
class customer isa person {
    int credit-rating;
};
class employee isa person {
    date start-date; int salary;
};
class officer isa employee {
    int office-number; int expense-account-number;
};
class teller isa employee {
    int hours-per-week; int station-number;
};
class secretary isa employee {
    int hours-per-week; int manager;
};

Now let us define subtypes and supertypes.


When the user creates a new type that is similar, but not identical, to a previously defined type, the new type is known as a subtype, and it inherits all the functions of its supertype. The supertype, which is situated at the top of the type hierarchy, includes a set of fields that is inherited by all related subtypes; a supertype must exist before a subtype can be created.
The keyword ‘isa’ signifies that one class is a specialisation of another class. The specialisations of a class are called its sub-classes: for example, employee can be a sub-class of person (the super-class), and cashier (or teller) a sub-class of employee; conversely, employee is a super-class of cashier (or teller).
The concept of substitutability is a significant benefit of inheritance in object oriented systems. Any method of a class ‘A’ can be invoked on an object belonging to any subclass ‘B’ of ‘A’, so code can be reused: there is no need to rewrite the methods and functions of class ‘A’ (such as getname in class person) for objects of class ‘B’, as the sketch below illustrates.
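A minimal sketch in the pseudo-code style of this unit (the variable declaration and the call are illustrative, not from the source text):

teller t;
print(t.getname());   /* getname is defined once in class person and is
                         inherited, via employee, by class teller */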
There are two possible ways of associating objects with non-leaf classes:
 associate all employee objects, including the instances of officer, teller and secretary, with the employee class;
 associate with the employee class only those employee objects that are instances of neither officer, nor teller, nor secretary.
Usually the second option is chosen in object oriented systems. In this case, the set of all employee objects can be identified by combining the objects associated with all the classes in the subtree rooted at employee. Many object oriented systems permit specialisation to be partial; that is, they permit objects that belong to a class such as employee without belonging to any of its subclasses.
Multiple inheritance
The capability of a class to inherit variables and methods from several superclasses is known as multiple inheritance.


Usually a tree-structured organisation of classes is sufficient to model applications, but some situations cannot be represented well in a single tree-structured class hierarchy. As an example, in Figure 10.3 we have created subclasses such as part-time-secretary, full-time-secretary, and so on. This causes some problems:
(a) the redundancy can result in inconsistencies on updates;
(b) full-time and part-time employees who are neither secretaries nor tellers cannot be represented in the hierarchy.

[Figure 10.3 shows person; below it employee and customer; below employee, officer, teller and secretary; below teller, full-time teller and part-time teller; below secretary, full-time secretary and part-time secretary.]

Figure 10.3: Class Hierarchy for Part Time and Full Time Employees

Figure 10.4 shows the class-subclass relationship represented by a rooted DAG (directed acyclic graph), in which a class may have more than one superclass.

[Figure: a rooted DAG in which Employee and Customer are subclasses of
Person; Officer, Full-time, Part-time, Teller and Secretary are subclasses of
Employee; and each of Full-time Teller, Part-time Teller, Full-time Secretary
and Part-time Secretary inherits from two superclasses, e.g. Full-time Teller
isa Full-time and isa Teller.]
Figure 10.4: Class Directed Acyclic Graph for Banking Example


Dealing with name conflicts


When multiple inheritance is used, there is a potential ambiguity: the same
method or variable may be inheritable from more than one superclass.
Example: Consider the banking example, where a variable salary is defined
for each of full-time, part-time, teller and secretary as given below:
 full-time: salary is an integer from 0 to 100,000, containing annual
wages.
 part-time: salary is an integer from 0 to 30, containing an hourly rate of
wages.
 teller: salary is an integer from 0 to 15,000, containing annual wages.
 secretary: salary is an integer from 0 to 20,000, containing annual
wages.
A class such as part-time-secretary could inherit the definition of salary
either from part-time or from secretary. For part-time-secretary, we have
the following alternatives:
 Include both variables, renaming them to part-time-salary and
secretary-salary.
 Select one or the other, depending on the order of creation.
 Ask the user to make a choice at the time of class definition.
 Treat the situation as an error.
So far, no single solution has been accepted as best, and different systems
make different choices.
Not all cases of multiple inheritance lead to ambiguity. If, rather than
defining salary in each of these classes, we keep its definition in class
employee and define it nowhere else, then all the subclasses inherit salary
from employee and there is no ambiguity.
Multiple inheritance can also be used to model the concept of roles. For
example, consider the subclasses student, teacher and football-player. An
object may belong to several of these categories at once, and each of
these categories is called a role.
Multiple inheritance can be used to create subclasses, such as
student-teacher and student-football-player, to model the possibility of an
object simultaneously playing multiple roles.


Self Assessment Questions


19. A class can be considered as a specialisation of another class by using
the keyword __________.
20. The object types can be specified by means of a type hierarchy. (True/
False)

10.12 Type Extents and Persistent Programming Languages


In object oriented databases, the collection of all objects belonging to a
particular type or class is called a type extent. Since most object oriented
databases support types or classes, it is assumed that every extent is a
collection of objects of the same type.
Persistent data is defined as data that continues to exist even after the
program that created it has terminated.
A programming language that is extended with constructs to handle
persistent data is known as a persistent programming language. It is
distinguished from embedded SQL in the following ways:
 In a persistent programming language, the query language is fully
integrated with the host language, and both share the same type
system. Any format changes required in the database are carried out
transparently.
 With embedded SQL, the programmer is responsible for writing explicit
code to fetch data into memory or to send data back to the database.
 In a persistent programming language, a programmer can manipulate
persistent data without writing such code explicitly.
Disadvantages of persistent programming languages
 A persistent programming language is powerful; however, it is easy to
make programming errors that can damage the database.
 Performing automatic high-level optimisation is difficult.
 Declarative querying is not supported well.
Self Assessment Questions
21. A programming language which is expanded by means of constructs to
manage persistent data is known as a __________.

22. Performing automatic high-level optimisation in a persistent
programming language is difficult. (True/ False)

Activity 2
Explain the concept of handling name conflicts in multiple inheritance.
Give suitable examples.

10.13 Summary
Let us recapitulate the important points discussed in this unit:
 The object oriented model or paradigm relies on the encapsulation of
data and code into a single unit.
 A set of allowed messages defines the interface between an object and
its system.
 Enhancements in local area network and workstation technologies have
given rise to group-design applications, fuelling the need for OODBMS.
 In the object server approach, an object is the unit of transfer from
server to client.
 In the page server approach, a page is the unit of transfer from server
to client.
 In the file server approach, the client processes of the OODBMS
interact with a network file service for reading and writing database
pages.
 The capability of a class to inherit methods and variables from several
superclasses is known as multiple inheritance.
 A programming language which is extended with constructs to manage
persistent data is known as a persistent programming language.

10.14 Glossary
 Data model: It is an organisation of the real world entities, restrictions
on them, and the relationships between the objects.
 Directed acyclic graph (DAG): It is a directed graph with no directed
cycles. It is formed by a collection of vertices and directed edges, each
edge linking one vertex to another.

 Inheritance: Inheritance is defined as the process whereby the
behaviour and properties of a parent object are inherited by the child
object.
 Multiple inheritance: The capability of a class to inherit variables and
methods from numerous superclasses is known as multiple inheritance.
 Non-leaf classes: Non-leaf classes are abstract and can have further
subclasses (child classes).
 Persistent programming language: A programming language which is
expanded by means of constructs to manage persistent data is known
as a persistent programming language.
 Relationships: Relationships permit objects to refer to each other,
resulting in networks of inter-related objects.

10.15 Terminal Questions


1. Explain various architectural approaches of OODBMS.
2. Illustrate the concept of object oriented data model.
3. Discuss the concept of Type Hierarchies and Inheritance with example.
Also illustrate multiple inheritance.
4. What is a persistent programming language? How can it be
differentiated from embedded SQL? Illustrate.
5. Discuss the declaration of interfaces and classes. Also illustrate how to
implement one or more interfaces by a class.

10.16 Answers
Self Assessment Questions
1. Objects
2. False
3. Page server
4. True
5. Value
6. True
7. Encapsulation
8. False
9. False
10. State

11. Abstract
12. Relationships
13. False
14. Four
15. Instance
16. Inheritance
17. Colon
18. True
19. isa
20. True
21. Persistent programming language
22. True
Terminal Questions
1. The various architectural approaches of OODBMS include Distributed
Client - Server Approach, Data Access Mechanism, Object Clustering,
and Heterogeneous Operation. Refer Section 10.3 for more details.
2. Object oriented data model include various object oriented concepts
such as attributes & methods, objects & objects identifiers, class
hierarchy & inheritance. Refer Section 10.6 for more details.
3. A type hierarchy allows the inheritance of both attributes and methods.
The capability of a class to inherit variables and methods from
numerous superclasses is known as multiple inheritance. Refer Section
10.11 for more details.
4. A programming language which is extended with constructs to manage
persistent data is known as a persistent programming language. Refer
Section 10.12 for more details.
5. An interface is used to describe the behaviour or capability of a class
without committing to a specific implementation. Refer Section 10.10 for
more details.

References:
 Prabhu, C.S.R. (2005), Object Oriented Database Systems, (2nd Ed.),
PHI Learning Pvt. Ltd.


 Khoshafian S. (1993), Object Oriented Databases, (1st Ed.), John Wiley.


E-references
 http://fria.fri.uniza.sk/~kmat/dbs/oodbs/OODBS1b.htm.
 www.cs.cityu.edu.hk/~jfong/cs3462/Lectures/Lecture9.ppt.


Unit 11 Distributed Databases


Structure:
11.1 Introduction
11.2 Introduction of Distributed Databases
DDBMS architectures
Functions of distributed database management system
Components of distributed database management system
11.3 Homogeneous and Heterogeneous Database
11.4 Distributed Data Storage
Data fragmentation
Data replication
11.5 Advantages and Disadvantages of Data Distribution
Advantages of data distribution
Disadvantages of data distribution
11.6 Distributed Transaction
11.7 Commit Protocols
Components of atomic commit
Two phase commit
11.8 Concurrency Control
11.9 Recovery of Distributed Database
11.10 Directory Systems
11.11 DDBMS Transparency Features
11.12 Distribution Transparency
11.13 Summary
11.14 Glossary
11.15 Terminal Questions
11.16 Answers

11.1 Introduction
In the previous unit, you studied about Object Oriented DBMS. You read
about the various OODBMS architectural approaches. You also read about
object identity, procedures, encapsulation, relationship, identifiers and
inheritance. You became familiar with basic interface and class structure,
type hierarchies, type extent, persistent programming languages and
OODBMS storage issues. In this unit we will study about distributed
databases.

Distributed database technology is expected to have a significant impact on
data processing in the coming years. With the introduction of commercial
products, the expectation is that distributed database management systems
will by and large replace centralised ones within the next decade.
This unit explains the role, technologies, and unique database design
features of distributed databases. The goals of and trade-offs for distributed
databases, the uses of data replication, the advantages and disadvantages
of distributed databases, and distributed transactions and data storage are
covered. The unit also contains a thorough coverage of database
concurrency and recovery under data distribution.
Objectives:
After studying this unit, you should be able to:
 explain the DDBMS Architecture
 discuss homogeneous and heterogeneous database
 explain distributed data storage
 explain data fragmentation and data replication
 identify and demonstrate advantages and disadvantages of data
distribution
 explain distributed transactions
 discuss commit protocols
 demonstrate concurrency control and recovery in distributed databases
 identify directory system
 explain distributed database transparency features
 discuss distribution transparency

11.2 Introduction of Distributed Databases


A database that physically resides entirely on one machine under a single
DBMS is known as a local database. A database that resides entirely on a
machine different from the user's, connected through a network, is known
as a remote database.
In either case, the entire database is controlled by a single site and hence
is known as a centralised database system. In contrast, a database may be
fragmented such that each of its fragments is stored on a different machine
connected through a network, is controlled by a different DBMS, or
operates under a different operating system. Such a multiple-source and
multiple-location database is called a distributed database.

Usually, a distributed database is a set of several logically interrelated
databases which are spread over a computer network. A Distributed
Database Management System (DDBMS) is the software system that
handles the distributed database and makes the distribution transparent to
the user.
The user is not aware that the database is fragmented. A Distributed
Database Management System ensures that users can access the
distributed database as though it were a single database.

11.2.1 DDBMS architectures


Generally, a distributed database contains a pool of sites, each of which
has a local database system (Figure 11.1). Each site is capable of
processing local transactions, i.e. transactions which access data at that
particular site only. Additionally, a site may participate in the processing of
global transactions, i.e. transactions that access data at several sites. The
processing of global transactions requires communication between the
sites, usually through a network.

Figure 11.1: Distributed Database Architecture

The sites within the system can be linked physically in numerous ways. The
different topologies are represented as graphs whose nodes correspond to
sites. A link from node 'A' to node 'B' represents a direct connection
between the two sites. The configurations differ from each other in the
following aspects:
 Installation Cost: It is the cost of connecting the sites physically in
system.

 Communication Cost: The time required and the expenditure incurred
to transmit a message from site 'A' to site 'B'.
 Reliability: Reliability means the frequency of failures in link or sites.
 Availability: The level at which data could be accessed in spite of the
malfunction of some sites or links.
These differences perform a vital part in selecting the suitable mechanism
for managing the allocation of data.
The participating or collaborating sites of one distributed database might be
physically spread over a large geographical region, for example across all
the Indian state capitals, or over a small physical region, for example a
single building or several adjoining buildings. The first type of network is
known as a wide area network, while the latter is known as a local area
network.
The links of a network between its nodes may be of different patterns known
as its topology. Some of the network topologies are depicted in Figure 11.2.

[Figure: diagrams of star, mesh, tree and ring network topologies]
Figure 11.2: Network Topologies


11.2.2 Functions of distributed database management system


Distribution results in increased complexity in system design and
implementation. To obtain the maximum advantages of a DDBMS, the
DDBMS software should be able to offer the following functions besides
those of a centralised DBMS:
 Tracking of data: The ability to keep track of data distribution,
replication and fragmentation by extending the DDBMS catalogue.
 Distributed query processing: The capability to transmit queries
between sites and access remote sites via the communication network.
 Distributed transaction management: The capability to devise
execution strategies for queries and transactions that access data at
multiple sites, to coordinate access to the data, and to maintain the
integrity of the overall database.
 Replicated data management: The capability to decide which copy of a
replicated data item to access, and to maintain the consistency of all
copies of replicated data.
 Distributed database recovery: The capability to recover from
individual site crashes and from other types of failures, for example the
breakdown of a communication link.
 Security: Distributed transactions should be executed with proper
management of the security of the data and of the access/authorisation
rights of users.

11.2.3 Components of distributed database management system


A DDBMS has a lot of components linked together. Some of the
components that a DDBMS should have are:
 Sites or nodes (workstations): The end users machines (mostly PCs)
that form the network. The distributed database system is independent
of the hardware of the workstations.
 Network hardware and software: Each workstation should have
essential hardware and software that allow them to establish a network
with other components on the distributed database system. The DDB
system should be independent of the network type of each workstation.


 Transaction processor (TP): Every data-requesting workstation has
this software component, which receives and processes requests for
data (local or remote). Data access is transparent to the user. The
transaction processor is also called the Transaction Manager (TM) or
Application Processor (AP).
 Data processor (DP): A software component on every computer in the
distributed database system, which stores and retrieves the data located
at that single site. It is also known as the Data Manager (DM).
Self Assessment Questions
1. Distributed database system consists of a collection of _________ ,
every one of which has a local database system.
2. The consumer is aware about the database which is fragmented.
(True/ False)
3. The most widespread channels are ________ base band coaxial, fibre
optics and broadband coaxial.
4. Which of the following is the basic component of Distributed Database
Management System?
(a) Data Processor
(b) Data Definition
(c) Data warehousing
(d) Data Mining

Activity 1
Analyse how DDB system is independent of the network type of each
workstation.

11.3 Homogeneous and Heterogeneous Database


When the database technology is the same at each of the locations and the
data at the several locations are also compatible, the database is known as
a homogeneous database. The following conditions should exist for a
homogeneous database:
 The operating system used at each of the locations is the same, or at
least highly compatible.
 The data models used at every location should be the same.

 The database management systems used at every location should be
the same, or at least highly compatible.
 The data at the different locations should have common definitions and
formats.
A homogeneous database makes the sharing of data between the different
users simpler, and this is the design goal for the distributed database.
Achieving this objective requires a very high level of planning during the
design phase.
Heterogeneous database: In a heterogeneous DDBMS, each site may run
a different DBMS product, which need not be based on the same underlying
data model; the system may therefore be composed of RDBMS, OODBMS
and ORDBMS products.
 In a heterogeneous database, communication between the different
DBMSs is required for translation.
 To provide DBMS transparency, users should be able to make requests
in the language of the DBMS at their local site.
 Other sites may have different hardware, different DBMS products, or a
combination of different hardware and DBMS products.
 Locating such data and performing any necessary translation are the
responsibilities of the heterogeneous DDBMS.

Self Assessment Questions


5. A homogeneous database makes sharing of data among different
________ simpler.
6. In which type of database, communication between various DBMS is
required for translations?
(a) Relation Database
(b) Heterogeneous Database
(c) Flat-file database
(d) Operational Database

11.4 Distributed Data Storage


A distributed data store means either the Distributed Database where users
store their information on a number of nodes, or a network in which users
store their information on a number of peer network nodes. It provides the
following functions:
 Replication
 The system keeps several copies of data, stored at various sites,
for quick retrieval and for fault tolerance.
 Fragmentation
 A relation is divided into several fragments stored at separate sites.
 Replication and fragmentation can be combined
 A relation is divided into several fragments, and the system keeps
and maintains several identical replicas of each of these fragments.
Data Fragmentation and Data Replication are explained in detail below:

11.4.1 Data fragmentation


In a distributed database system the database is broken into smaller
pieces. Here we will discuss the techniques that are used to break up the
database into logical units, known as fragments, which may be assigned for
storage at the various sites.
If a relation N is fragmented, N is split into a number of fragment relations
N1, N2, ..., Nn. These fragments contain enough information to reconstruct
the original relation N. Reconstruction is done by applying the union
operation, or a special kind of join operation, on the fragments, depending
on how they were obtained from the original relation. Of the many methods
of fragmentation, two shall be discussed here: horizontal fragmentation and
vertical fragmentation.
For illustration purposes, let us consider the customer relation CUSTOMER
of some company:
CUSTOMER (CUS_ID, CUS_NAME, CUS_STATE, CUS_DEPOSIT,
CUS_BALANCE, CUS_RATING, CUS_DUE)

A sample instance of the CUSTOMER relation is shown in Table 11.1
below:

Table 11.1: Sample Customer Relation

CUS_ID  CUS_NAME    CUS_STATE  CUS_DEPOSIT  CUS_BALANCE  CUS_RATING  CUS_DUE
10      Puranchand  Haryana    3000         2000         3           1000
11      Rohit       Punjab     4000         3000         2           1500
21      Ramlal      Haryana    2000         190          3           280
23      Pankaj      Bihar      2300         230          3           320
33      Rahul       Punjab     3300         450          2           400
43      Satbir      Haryana    4500         1000         1           900

Horizontal fragmentation: Under this fragmentation scheme, a table (or
relation) N is partitioned into subsets N1, N2, N3, .... Each subset Ni
(i = 1, 2, 3, ...) is composed of a number of tuples of relation N. Every tuple
of relation N must belong to at least one of the fragments, so that the
original relation can be rebuilt when needed.
Each fragment may be defined as a selection on the global relation N, and
the union of all the fragments must generate the original relation. In our
sample relation CUSTOMER, suppose that each state headquarters
requires data belonging to that state only. The relation can therefore be
horizontally fragmented as shown in Table 11.2 below:
Table 11.2: Horizontally fragmented Relation


The three resulting fragment relations are shown in Table 11.3 below:
Table 11.3: Three Resulting Fragment Relations
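
Each horizontal fragment is simply a selection on CUSTOMER. As a
hedged sketch (the fragment table names CUS_HAR, CUS_PUN and
CUS_BHR are illustrative, and in a real DDBMS each fragment would be
placed at its own site), the state-wise fragments could be derived as:

CREATE TABLE CUS_HAR AS
    SELECT * FROM CUSTOMER WHERE CUS_STATE = 'Haryana';

CREATE TABLE CUS_PUN AS
    SELECT * FROM CUSTOMER WHERE CUS_STATE = 'Punjab';

CREATE TABLE CUS_BHR AS
    SELECT * FROM CUSTOMER WHERE CUS_STATE = 'Bihar';

Taking the union of the three fragments reconstructs the original
CUSTOMER relation.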

Vertical fragmentation: Vertical fragmentation is similar to decomposition.
A vertical fragmentation of a relation or table is obtained by dividing the
table into a number of sub-tables whose columns are disjoint, apart from
the common key column.
Relation N can be reconstructed from the fragments by taking the natural
join. Suppose now that the company is divided into two departments – a
customer department and a collection department – each concerned with
its own data only. The relation CUSTOMER can then be vertically
fragmented into two fragments, as given in Table 11.4 below:


Table 11.4: Example of Vertical Fragmentation

Fragment name   Location           Node name   Attributes
CUS_DEPT        Customer Office    CUS         CUS_ID, CUS_NAME,
                                               CUS_STATE
COL_DEPT        Collection Office  COL         CUS_ID, CUS_DEPOSIT,
                                               CUS_BALANCE, CUS_RATING,
                                               CUS_DUE

The resulting two fragment relations are given in Table 11.5:


Table 11.5: Resulting Fragment Relations

Fragment name: CUS_DEPT


Location: Customer Office
Node:CUS
CUS_ID CUS_NAME CUS_STATE
10 Puranchand Haryana
11 Rohit Punjab
21 Ramlal Haryana
23 Pankaj Bihar
33 Rahul Punjab
43 Satbir Haryana

Fragment name: COL_DEPT Location: Collection Office Node: COL


CUS_ID CUS_DEPOSIT CUS_BALANCE CUS_RATING CUS_DUE
10 3000 2000 3 1000
11 4000 3000 2 1500
21 2000 190 3 280
23 2300 230 3 320
33 3300 450 2 400
43 4500 1000 1 900
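
In SQL terms, each vertical fragment is a projection of CUSTOMER that
retains the key CUS_ID in both fragments. A minimal sketch (the
CREATE TABLE ... AS SELECT form is illustrative; distributed DDL varies
by product):

CREATE TABLE CUS_DEPT AS
    SELECT CUS_ID, CUS_NAME, CUS_STATE
    FROM CUSTOMER;

CREATE TABLE COL_DEPT AS
    SELECT CUS_ID, CUS_DEPOSIT, CUS_BALANCE, CUS_RATING, CUS_DUE
    FROM CUSTOMER;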

Usually, vertical fragmentation is achieved by adding a special attribute
called a 'tuple-id' to the schema N (CUS_ID in our case). A tuple-id is a
logical or physical address of a tuple. Since each tuple in N must have a
unique address, the tuple-id attribute serves as a key for the augmented
schema.

To rebuild the original CUSTOMER relation from the fragments, we
compute
CUSTOMER = CUS_DEPT ⋈ COL_DEPT
Note that the expression CUS_DEPT ⋈ COL_DEPT is a special form of the
natural join. The join attribute is CUS_ID. Since the tuple-id value
constitutes an address, it is possible to join a tuple of CUS_DEPT with the
matching tuple of COL_DEPT by means of the address given by the
CUS_ID value. This address permits direct retrieval of a tuple with no need
for an index. Therefore, this natural join can be computed much more
efficiently than typical natural joins.
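
The reconstruction can be written as an ordinary natural join over the
shared CUS_ID column (a sketch; in a real DDBMS the two fragments
would reside at different sites):

SELECT *
FROM CUS_DEPT NATURAL JOIN COL_DEPT;
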
Although the tuple-id attribute is essential in the implementation of vertical
partitioning, it is vital that this attribute is not visible to users. If users were
given access to tuple-ids, it would become impossible for the system to
change tuple addresses. In addition, exposing the internal addresses goes
against the concept of data independence, which is one of the major
qualities of the relational model.
Mixed fragmentation: A relation can also be fragmented both vertically
and horizontally. In such cases, both fragmentation criteria are specified.
Suppose, in our example of the CUSTOMER relation, we need each
department's data individually in two different offices in each state's
headquarters. The required fragments will be as shown in Table 11.6
below:
Fragment name Location Horizontal criterion Node name Attributes
CUS_BHR_CUS Patna CUS_STATE="Bihar" BHRCUS CUS_ID,
CUS_NAME,
CUS_STATE
CUS_BHR_COL Gaya CUS_STATE="Bihar" BHRCOL CUS_ID,
CUS_DEPOSIT,
CUS_BALANCE,
CUS_DUE
CUS_HAR_CUS Hisar CUS_STATE="Haryana" HARCUS CUS_ID,
CUS_NAME,
CUS_STATE
CUS_HAR_COL Karnal CUS_STATE="Haryana" HARCOL CUS_ID,
CUS_DEPOSIT,
CUS_BALANCE,
CUS_DUE
CUS_PUN_CUS Amritsar CUS_STATE="Punjab" PUNCUS CUS_ID,
CUS_NAME,
CUS_STATE
CUS_PUN_COL Bhatinda CUS_STATE="Punjab" PUNCOL CUS_ID,
CUS_DEPOSIT,
CUS_BALANCE,
CUS_DUE


The resulting fragments are shown in Table 11.7.


Table 11.7: Resulting Fragments

Fragment name: CUS_BHR_CUS


Location: Patna
Node:BHRCUS
CUS_ID CUS_NAME CUS_STATE
23 Pankaj Bihar

Fragment name: CUS_BHR_COL
Location: Gaya
Node: BHRCOL
CUS_ID CUS_DEPOSIT CUS_BALANCE CUS_RATING CUS_DUE
23 2300 230 3 320

Fragment name: CUS_HAR_CUS


Location: Hisar
Node:HARCUS
CUS_ID CUS_NAME CUS_STATE
10 Puranchand Haryana
21 Ramlal Haryana
43 Satbir Haryana

Fragment name: CUS_HAR_COL
Location: Karnal
Node: HARCOL
CUS_ID CUS_DEPOSIT CUS_BALANCE CUS_RATING CUS_DUE
10 3000 2000 3 1000
21 2000 190 3 280
43 4500 1000 1 900

Fragment name: CUS_PUN_CUS


Location: Amritsar
Node: PUNCUS
CUS_ID CUS_NAME CUS_STATE
11 Rohit Punjab
33 Rahul Punjab

Fragment name: CUS_PUN_COL
Location: Bhatinda
Node: PUNCOL
CUS_ID CUS_DEPOSIT CUS_BALANCE CUS_RATING CUS_DUE
11 4000 3000 2 1500
33 3300 450 2 400


11.4.2 Data replication


In simple words, replication means making a copy of a relation. When a
relation N is replicated, copies of N are stored at other sites. The copies
may be kept at only a few selected sites, or every site may keep a copy.
When each site of the system has a copy of the relation, this is known as
full replication.
Replication comes in handy when you want to improve the availability of
data. The most extreme case is to replicate the whole database at every
site of the distributed system, which results in a fully replicated distributed
database. This improves availability to a great degree, because the system
keeps operating even if only one site is still working.
It also improves the execution of global retrieval queries, as the result of
such a query can be obtained from any local site; a retrieval query can
therefore be executed at the site at which it is submitted, provided that site
contains a server module.
One drawback of full replication is that it can slow down update operations
drastically, since a single logical update must be executed on every copy of
the database to keep the copies consistent. This especially applies if there
are many copies of the database.
Full replication also makes recovery techniques and concurrency control
much more expensive than they would be without replication.
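
As one hedged illustration of keeping a local, read-only copy of a relation,
PostgreSQL-style materialized views can hold a snapshot; this sketch
assumes the CUSTOMER relation is already visible at the local site (for
example, through a foreign table):

-- take a local snapshot of the relation
CREATE MATERIALIZED VIEW customer_replica AS
    SELECT * FROM CUSTOMER;

-- later, bring the local copy up to date again
REFRESH MATERIALIZED VIEW customer_replica;

A full replication scheme would maintain such a copy at every site, which is
exactly why updates become expensive.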

Self Assessment Questions


7. Horizontal fragmentation splits the relation by allotting each tuple of N
to one or more fragments. (True/ False)
8. Vertical fragmentation of a relation or a table can be acquired by
splitting the table into a number of sub-tables having disjoint columns.
(True/ False)
9. _______ makes concurrency control and recovery techniques more
expensive.

Activity 2
Examine how the information concerning data fragmentation and
replication is stored in a global directory.


11.5 Advantages and Disadvantages of Data Distribution


In the following sections we will discuss the benefits and drawbacks of data
distribution.
11.5.1 Advantages of data distribution
Distributed database systems have a number of advantages over their
centralised counterparts. The primary goal of distributed database systems
is to provide the ability to access and share data stored in databases
spread across different machines, operating systems and DBMSs in a
reliable, fast and efficient manner. The benefits of distributed databases are
explained below:
 Space independence: If various sites are linked to one another, then
the user at a site might access data that is present at some other site.
The user does not have to be present physically at the database site.
Therefore, the database becomes space independent.
 Availability of data where it is required: The data in the distributed
database system are so dispersed as to match the data needs of the
users.
 Faster data access: The end-users use only a subset of the whole
database. If this section of the database is locally stored and accessed,
it will be many times quicker than when remotely located.
 Distributed control: The main advantage of achieving data sharing by
the method of the data distribution is that each of the sites is capable of
keeping some degree of control on the stored data.
 User-friendly interface: The end users are free to have interfaces of
their own choice at their sites.
 Increased Reliability: In case of a centralised database system, a
failure renders the entire system useless. Such is not the case with the
distributed database systems. Even in case of a failure the end users
can still access their own database stored locally.
 Query speedup: If a query involves data at different sites, it might be
possible to break the query into sub-queries that can be executed in
parallel at different sites. This parallel computation enables quicker
processing of a user's query.


11.5.2 Disadvantages of data distribution


Distributed database systems are not entirely free from limitations. The
main drawback of distributed database systems is the extra complexity
needed to ensure proper coordination between the sites. This increased
complexity takes the following forms:
 Complexity of management and control: All the related management
and control activities become more complex with the degree of
distribution.
 Cost of software development: It is more costly and more difficult to
implement a distributed database system than a centralised local
database.
 Higher possibility of bugs: As the sites in a distributed system operate
in parallel, it is harder to ensure the correctness of the algorithms. This
mode of operation makes them very susceptible to bugs. Designing
correct distributed algorithms is still a large area of research.
 Processing overheads: The cost of exchanging messages and data,
and the additional computation needed to achieve coordination in
distributed database systems, is absent in centralised systems.

Self Assessment Questions


10. The ________ in the distributed database system are so dispersed as
to match the data requirements of the users.
11. What is the primary drawback of distributed database systems?
(a) complexity
(b) standards
(c) overheads
(d) control

11.6 Distributed Transaction


The updating of data on two or more networked computer systems is
termed a distributed transaction. Distributed transactions extend the
advantages of transactions to applications that must update distributed
data.

Implementing robust applications is not easy, since these applications
encounter various failures, such as client failure, server failure, and failure
of the network connecting client and server. In the absence of distributed
transactions, the application program itself has to detect these failures and
recover from them.
In distributed transactions, each computer system has a local transaction
manager. When a transaction does work at several computers, the
transaction managers interact with one another through subordinate or
superior relationships. These relationships are valid only for one specific
transaction.
Each transaction manager performs all the enlisting, prepare, commit, and
abort calls for its enlisted resource managers (typically those that reside on
the same computer). Resource managers handle durable data and work in
cooperation with the Distributed Transaction Coordinator (DTC) to
guarantee atomicity and isolation to the application.
In each distributed transaction, every participating component must agree
to commit a change action (for example, a database update) before the
transaction can be executed. The DTC performs the transaction
coordination role for the components involved.
When executing a distributed transaction across many computers, the
transaction manager transmits prepare, commit, and abort messages to all
of its subordinate transaction managers.
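
As a hedged sketch of what a distributed transaction can look like at the
SQL level, the fragment below uses Oracle-style database-link syntax; the
account table, the link name branch2 and the two-site funds transfer are
illustrative assumptions (in this style, a transaction starts implicitly with the
first statement):

UPDATE account SET balance = balance - 500
WHERE acc_id = 10;                       -- executed at the local site

UPDATE account@branch2 SET balance = balance + 500
WHERE acc_id = 77;                       -- executed at the remote site

COMMIT;  -- the transaction managers at both sites must agree:
         -- commit everywhere or abort everywhere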

Self Assessment Questions


12. For distributed transactions, each computer does not require to have
local transaction manager.(True/ False)
13. __________ manage constant or durable data and work in cooperation
with the Distributed Transaction Coordinator.

11.7 Commit Protocols


Commit protocols are utilised to make sure of atomicity across sites.
Suppose a transaction is distributed across 'n' processes. Each process
can unilaterally choose to abort the transaction, but the transaction must
either commit at all sites or abort at all sites.
The two phase commit (2PC) protocol is the most extensively used.
The three phase commit (3PC) protocol is more complex and more costly,
but has several advantages over 2PC. This protocol is, however, rarely
used in practice.

11.7.1 Components of atomic commit


Termination protocol
 When a site fails, the surviving (correct) sites should still be able to
decide on the outcome of pending transactions.
 To decide on all pending transactions, they execute a termination
protocol.
Recovery
 When a failed site restarts, it has to perform recovery for all transactions
that it had not yet committed or aborted properly.
 In single-site recovery, the site must terminate all transactions that
were active at the time of the failure.
 In a distributed system, the recovering site must also ask the other
sites whether an active transaction was committed in the rest of the
system.
 With independent recovery, a site does not have to communicate with
other sites at restart.

11.7.2 Two phase commit


 Two phase commit is a transaction protocol designed to handle the
complications that arise with distributed resource managers.
 The two phase commit protocol is widely used in stock market
transactions, credit card systems, hotel and airline reservations, and
banking applications.
 A distributed transaction manager using the two phase commit protocol
employs a coordinator to manage the individual resource managers.
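
PostgreSQL, for example, exposes the two phases of the protocol at the
SQL level. The sketch below shows one participant's side; the coordinator
(which collects the votes of all participants) is assumed to be external, and
'tx_42' is an illustrative global transaction identifier:

BEGIN;
UPDATE account SET balance = balance - 500 WHERE acc_id = 10;
PREPARE TRANSACTION 'tx_42';  -- phase 1: vote yes; the work is made durable

-- ... the coordinator gathers the votes of every participant ...

COMMIT PREPARED 'tx_42';      -- phase 2: apply the global commit decision
-- had any participant voted no: ROLLBACK PREPARED 'tx_42';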

Self Assessment Questions


14. _______ are utilised to make sure atomicity across sites.
15. _______ is a transaction protocol which is used for the complexities
that are associated with distributed resource managers.


11.8 Concurrency Control


Concurrency control techniques are also used in the distributed database
environment. We presume that every site participates in the execution of a
commit protocol, to ensure global transaction atomicity, and that all copies
of any data item are kept up to date.
 Locking protocols: We present some possible schemes that are
applicable to an environment where data can be replicated at numerous
sites.
 Single lock manager approach: The system has a single lock manager
residing at a single chosen site, say Si. Whenever a transaction wants to
lock an item, it transmits a lock request to Si, and the lock manager
decides whether the lock can be granted right away. If the answer is
yes, the manager transmits a message to the site that initiated the
request. If the answer is no, the request is held over until it can be
granted, at which time a message is transmitted to the initiating site.
 Distributed lock manager: In this technique, the locking functionality is
provided by lock managers at every site. Each lock manager has the
responsibility of managing access to the local data items, but special
protocols may be used for replicas.

Self Assessment Questions


16. A transaction is distributed across __________.
17. System maintains a single __________ that exists in a single chosen
site.

11.9 Recovery of Distributed Database


When a distributed database node fails, its main-memory contents are lost
and must somehow be recovered. The most common technique is to keep,
on stable storage, a log of all updates performed on the node's data. When
a node recovers, it reads the log entries and reapplies the updates to its
database.
Since the log can grow arbitrarily large, logging is almost always combined
with the taking of periodic checkpoints. Checkpointing decreases the time
needed for recovery, since only the log records postdating the most recent
checkpoint have to be read during the recovery process.

In distributed databases, the consistency of the recovered data should also
be considered. If immediate consistency is compulsory, then once the node
is reintegrated into the system, its data should be consistent with the data
on the other nodes.
If only eventual consistency is essential, the reintegrated node may still be
in a state that becomes consistent only after an agreed period of time.
Diskless distributed recovery: For various embedded systems, for
example those working in environments with electromagnetic radiation or
intense vibrations, it might not be practical to utilise disks as permanent
storage for checkpoints and logs. Moreover, many disk drives are
unpredictable, which makes them inappropriate for real-time systems.
Additionally, disks add to the cost of building the system. We thus see a
requirement for recovery mechanisms which are not dependent on disks.
In the replicated database approach, the system's inherent redundancy can
be utilised for recovery. Rather than reading a checkpoint from disk, the
contents of another node in the system (the recovery source) can be copied
to the recovering node (the recovery target). If updates are executed at the
recovery source simultaneously with the recovery process, then those
updates should also be forwarded to the recovery target after the copying
of the database image.
If a complete recovery is executed on one distributed database system,
then no other action is needed on the other databases. If a partial recovery
is executed, a coordinated time-based and change-based recovery should
be done on all the databases having dependencies.

Self Assessment Questions


18. In a _________, the inherent redundancy in the system can be utilised
for recovery.
19. In __________ , the consistency of the recovered data should also be
considered.


11.10 Directory Systems


Employee information, for example name, id, email, phone, office address
and even private information, may need to be accessed from many places,
such as Web-browser bookmarks. In directory systems there are two types
of entries:
 In white pages, entries are organised by identifier or name, and are
intended for forward lookup, i.e. to find out more about an entry.
 In yellow pages, entries are organised by properties, to locate entries
matching particular requirements.

Self Assessment Questions


20. In _________, entries are planned by identifier or name and intended
for forward lookup to locate more about entry.
21. In ____________ , entries are designed/planned by properties to locate
entries matching particular requirements.

11.11 DDBMS Transparency Features


This feature of a DDBMS ensures that each of its users perceives the
system as if she were the only user. The users are unaware that the
database is located at various sites, or that many users are accessing the
data at the same time; all implementation difficulties are hidden from them.
The different DDBMS transparency features are:
1. Distribution transparency: This feature permits a distributed database
to be treated as a single (albeit logical) database. The user need not
know that the database is partitioned, that it is replicated at many sites,
or where it is located, and can work as if the database were stored
locally.
2. Transaction transparency: This feature permits a transaction to be
executed either completely or not at all just as in case of a local
database system. This feature ensures the integrity of the distributed
database system.
3. Failure transparency: This feature permits the distributed database to
continue to be operational even if there is a failure at some nodes. The
functions performed by the failed node are carried out by some other
operational nodes.

4. Heterogeneity transparency: Owing to this feature, users do not know
that the constituent databases are of different types, that the hardware
systems are of different types, and so on.

Self Assessment Questions


22. Distribution transparency feature permits one distributed database to be
treated as a single (albeit logical) database (True/False).
23. ___________ feature permits distributed database to continue to be
operational even if there is a failure at some nodes.

11.12 Distribution Transparency


Distribution transparency permits a physically dispersed database to be
managed as though it were a centralised database. The degree of
transparency maintained by the DDBMS differs from one system to
another. Three degrees of distribution transparency can be identified:
 Fragmentation transparency, the highest degree of transparency,
exists when the user need not know that the database is partitioned. As
a result, neither the fragment names nor the fragment locations are
specified before data is accessed.
 Location transparency exists when the user must know the database
fragment names but need not declare the locations at which these
fragments are situated.
 Local mapping transparency exists when the end user or programmer
must specify both the fragment names and their locations.
Transparency features are summarised in Table 11.8.
Table 11.8: Transparency Features

IF THE SQL STATEMENT REQUIRES:
FRAGMENT NAME?  LOCATION NAME?  THE DBMS SUPPORTS            LEVEL OF DISTRIBUTION TRANSPARENCY
Yes             Yes             Local mapping transparency   Low
Yes             No              Location transparency        Medium
No              No              Fragmentation transparency   High

As you examine Table 11.8, you might ask why there is no row in which the
fragment name is "No" and the location name is "Yes". The explanation is
simple: you cannot specify a location name without referencing an
accessible fragment name. (And if you need not specify a fragment name,
its location is clearly irrelevant.)
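
The three levels can be made concrete with a hedged sketch of the same
request (the names of the Haryana customers) written at each level,
reusing the fragments of Table 11.6; the NODE clause is purely illustrative,
since the syntax for naming a remote site varies between products:

-- Local mapping transparency: fragment name AND its location are needed
SELECT CUS_NAME FROM CUS_HAR_CUS NODE HARCUS;

-- Location transparency: only the fragment name is needed
SELECT CUS_NAME FROM CUS_HAR_CUS;

-- Fragmentation transparency: the user sees a single logical table
SELECT CUS_NAME FROM CUSTOMER WHERE CUS_STATE = 'Haryana';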

Self Assessment Questions


24. Which transparency exists when the programmer should indicate both
fragment names as well as their locations?
(a) Fragmentation transparency
(b) Local mapping transparency
(c) Location transparency
(d) Failure transparency
25. The end user or programmer needs to know whether or not a database
is partitioned. (True/ False)

11.13 Summary
Let us recapitulate the important points discussed in this unit:
 A database that physically resides entirely on one machine under a
single DBMS is known as a local database.
 When the database technology is the same at each of the locations and
the data at the several locations are also compatible, the database is
known as a homogeneous database.
 The transaction manager transmits prepare, commit, and abort
messages to all of its subordinate transaction managers.
 When a distributed database node fails, its main-memory contents are
lost and must somehow be recovered.
 In the horizontal fragmentation scheme, a relation N is partitioned into
subsets, each tuple of N being assigned to one or more fragments.
 The degree of transparency maintained by the DDBMS differs from one
system to another.


11.14 Glossary
 Data fragmentation: A system supports data fragmentation if a relation
or file can be split into fragments for the purpose of physical storage.
 Distributed database: A database which is multiple-source and
multiple-location is called a distributed database.
 Distributed transactions: Distributed transactions extend the
advantages of transactions to applications that need to update
distributed data.
 DTC: Distributed Transaction Coordinator; the DTC coordinates
transactions that update two or more transaction-protected resources,
such as databases and file systems.
 Homogeneous database: A homogeneous database simplifies data
sharing between different users.

11.15 Terminal Questions


1. Explain the functions and components of DDBMS.
2. What is the difference between homogeneous and heterogeneous
databases?
3. Discuss data fragmentation and its types.
4. What are commit protocols?

11.16 Answers
Self Assessment Questions
1. Sites
2. False
3. Twisted pair
4. (a) Data Processor
5. Users
6. (b) Heterogeneous Database
7. True
8. True
9. Full replication
10. Data
11. (a) Complexity
12. False
13. Resource managers
14. Commit protocols
15. Two-phase commit
16. Processes
17. Lock manager
18. Replicated database approach
19. Distributed databases
20. White pages
21. Yellow pages
22. True
23. Failure transparency
24. (b) Local mapping transparency
25. False

Terminal questions
1. Keeping track of data and processing distributed queries are among the
functions of a DDBMS. Refer Section 11.2 for more details.
2. A homogeneous database uses a single DBMS, while heterogeneous
databases involve multiple DBMSs. Refer Section 11.3 for more details.
3. Breaking up the database into logical units known as fragments is data
fragmentation. Refer Section 11.4 for more details.
4. Commit protocols are utilised to make sure of atomicity across sites.
Refer Section 11.7 for more details.

References:
 Ramakrishnan, R. & Gehrke, J. (2003), Database Management
Systems, Third Edition, New Delhi, India: McGraw-Hill.
 Rob, P. & Coronel, C. (2006), Database Systems: Design,
Implementation and Management, Seventh Edition, Thomson Learning.
 Silberschatz, Korth & Sudarshan (1997), Database System Concepts,
Fourth Edition, McGraw-Hill
 Navathe, E. (2000), Fundamentals of Database Systems, Third Edition,
Pearson Education Asia.


E-references
 http://pages.cs.wisc.edu/~dbbook/openAccess/thirdEdition/slides/slides3
ed-english/Ch22b_DistributedDBs-95.pdf
 http://pcbunn.cacr.caltech.edu


Unit 12 Object Relational and Extended Relational Databases
Structure:
12.1 Introduction
Objectives
12.2 Object Relational Database
Reasons behind the development of ORDBMS
Advantages of ORDBMS
Disadvantages of ORDBMS
Characteristics of object relational databases
12.3 Extension Techniques in RDBMS
12.4 Standards for OODBMS Products and Applications
ODMG-93 standards
ODMG Smalltalk binding
SQL3
12.5 Nested Relations and Collections
12.6 Storage and Access Methods
12.7 Implementation Issues for Extended Type
12.8 Comparing RDBMS, OODBMS and ORDBMS
12.9 Summary
12.10 Glossary
12.11 Terminal Questions
12.12 Answers

12.1 Introduction
In previous units, you have read about databases in general, with an
emphasis on relational databases. Relational databases, though quite
popular, have some demerits, especially when it comes to representing
real-world entities.
There are some other database models worth mentioning, like object
oriented databases and object relational databases, which have
advantages over relational databases because of their object-based
approach. In these models, information is represented in the form of
objects, as in object-oriented programming. Object databases are mainly
used in object oriented application areas.


In this unit, you will study the features of these two models: object relational
and extended relational databases. You will learn about the general
practices and approaches associated with these two models and, through
this, build a basic structure for understanding object relational database
management systems. We will focus primarily on design techniques used in
RDBMS, extension techniques in RDBMS, and standards for OODBMS
products and applications. We will also discuss nested relations and
collections, storage and access methods, and implementation issues for
extended types. We will conclude with a comparison of RDBMS, OODBMS
and ORDBMS.
Objectives:
After studying this unit, you should be able to:
 describe object-relational DBMSs
 state the design techniques used in RDBMS
 discuss the extension techniques in RDBMS
 identify the OODBMS standards
 recognise the storage and access methods in ORDBMS
 analyse the implementation issues for extended types
 differentiate between RDBMS, OODBMS and ORDBMS

12.2 Object Relational Database


Let us start our discussion with the basic concepts of object relational
databases. This section lays the foundation for a better understanding of
the subsequent topics.
Database systems that are based on the object relational model are known
as object relational databases. In simple words, an object relational
database is the 'Relational Database + Object Oriented Features'. For
relational database users, it is easy to migrate to object relational
databases.
Object relational data models extend the relational data model by providing
a richer type system, including object orientation, and add constructs to
relational query languages, such as SQL, to deal with the added data types.
Some examples of databases based on Object Relational technology are:
 Informix Dynamic Server (formerly Postgres, Illustra and Informix
Universal Server)
 IBM’s DB2 Universal Database
 Oracle-8
 UniSQL/X, OSMOS
 Ingres II
 Sybase Adaptive Server
Most of the above databases are enhancements of their relational
predecessors. They provide full support for multimedia and the web, various
types of ADTs (abstract data types), spatial and geographic data,
time-series data, collection-valued attributes, and others. Thus, they are
well equipped with facilities for efficient application development.
12.2.1 Reasons behind the development of ORDBMS
In the commercial world, there are many DBMSs available, so you may
wonder what the reasons behind the development of ORDBMS are. We will
briefly discuss the main forces behind the development of ORDBMSs.
ORDBMSs evolved, basically, to meet the challenges of new applications.
New applications are complex and sophisticated and have diverse data
needs. They require various types of data, such as text, images, audio,
streamed data and BLOBs (binary large objects), to be recorded in the
system.
In addition, there is a rising trend to amalgamate the best features of object
databases into the relational model, so that developers can meet the
growing challenges of developing new applications.
All these factors led to the development of the object relational databases
which we see in the market.
12.2.2 Advantages of ORDBMS
ORDBMSs are widely used nowadays because of the various advantages
associated with them:
 The main advantage of the object relational data model arises from the
concept of "reuse and sharing". By reuse, we mean that the programmer
can easily extend the DBMS server to provide standard functionality
centrally, rather than coding it in each and every application.
 Sharing means that if the developer wants applications to use particular
database functionality, it can be embedded in the functionality of the
server.
 ORDBMSs allow enterprises to easily migrate from their existing
systems to an ORDBMS without making major changes.
 In addition, a user may easily make use of object-oriented features in
parallel with the RDBMS features.
12.2.3 Disadvantages of ORDBMS
In spite of its various advantages and benefits, ORDBMS still has some
disadvantages. Some of these are listed below:
1. The ORDBMS approach is complex in nature and is associated with
increased costs.
2. Relational database proponents hold the view that the extensions to the
relational model in the form of ORDBMS have diluted the simplicity of
the relational model, which was the major factor behind its success.
3. Some database experts also believe that ORDBMSs will be of use only
for a limited set of applications for which relational technology is not
practical.
4. Also, the physical architecture of the object-relational model is not well
suited to handling high-speed web applications.
12.2.4 Characteristics of object relational databases
Some of the important characteristics of object relational databases are as
follows:
 Nested relations
 Complex types and object-orientation
 Querying with complex types
 Creation of complex values and objects
We will discuss about these features in detail in the coming sections.
Self Assessment Questions
1. For the relational database users, it is easy to migrate to object
relational databases. (True/False)
2. The main advantage of object relational data model arises from the
concept of “_____________ ”

Activity 1
With the help of the internet, find a few examples of object relational databases in real life. For example, Oracle introduced Oracle8, an ORDBMS.

12.3 Extension Techniques in RDBMS


Commercial RDBMSs can be extended to incorporate the features of object-oriented languages. This concept of extensibility is the basic idea behind ORDBMS technology.
As you studied in section 12.2, it is very difficult to model complex data structures such as streamed data, BLOBs, CLOBs, etc., in an RDBMS. Hence, developers proposed some ideas to solve this problem. One idea was that the DBMS vendors should provide more data types and functions built into their products.
But this approach is not practical, as ever more new data types keep appearing. Hence a better solution was proposed: build a DBMS engine capable of accepting the addition of new, application-specific functionality.
Therefore the database developers can now specialise or extend many of
the features of an ORDBMS such as the data types, OR-SQL expressions,
the aggregates, and so on. ORDBMS acts as a framework into which
specialised software modules can be embedded.
Extensions to a RDBMS mainly fall into following categories:
 Use of Type Constructors to denote complex objects
 Object Identifiers using references
 Support for additional or extensible data types
 Support for user-defined routines (procedures or functions)
 Unstructured complex objects
Now, let us discuss each of these in brief.
Type Constructors: The type constructors are used to specify complex types. As they are defined by the user, they are also known as user-defined types (UDTs).
A row type is declared using the following syntax:
CREATE TYPE row_type_name AS [ROW] «component declarations»;
An array type is used to define a collection. For example:
CREATE TYPE Cust_type AS (
Custname VARCHAR(20),
Contactnumber INTEGER ARRAY[5]
);
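Such a row type can then serve as the type of a table. The following brief sketch shows one possible use (the Customer table name and the positional array access are assumptions; exact syntax varies by product):
CREATE TABLE Customer OF Cust_type;
-- Retrieve each customer's name and first contact number (array element 1):
SELECT Custname, Contactnumber[1]
FROM Customer;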
Object Identifiers Using References: A user-defined type can be used in two ways: either as the type of an attribute or to define the row types of tables. For example, two tables can be created based on the row type declarations as follows:
CREATE TABLE Employee OF Emp_type
(REF IS emp_id SYSTEM GENERATED);
CREATE TABLE Company OF Comp_type
(REF IS comp_id SYSTEM GENERATED,
PRIMARY KEY (compname));
We can also declare the object identifiers explicitly in the type description rather than in the table declaration.
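Once a table has system-generated object identifiers, rows of other tables can reference it through REF attributes. A minimal sketch, assuming Comp_type is already defined and giving Emp_type a hypothetical works_for attribute (exact syntax varies by product):
CREATE TYPE Emp_type AS (
ename VARCHAR(30),
works_for REF(Comp_type) -- holds the comp_id of a Company row
);
-- SQL:1999-style dereferencing follows the reference to the target row:
SELECT e.ename, e.works_for->compname
FROM Employee e;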
Support for additional or extensible data types: SQL-92 specifies the syntax and behaviour of about ten data types. SQL3 adds about a hundred more, including geographic, temporal and text types.
Many ORDBMSs provide support for these common extensions by means of commercial packages of abstract data type (ADT) products. For example, Data Cartridges are the application packages for Oracle, DataBlades for Informix Universal Server, and Extenders for DB2. Yet relying on the DBMS vendor to deliver all of the innovative data type extensions does not fully deal with the problem.
Therefore, with an object-relational DBMS, developers can implement data types of their own, depending upon the application. Developers also define the operations associated with those data types.
Support for user-defined routines (procedures or functions): An ORDBMS provides built-in functions for user-defined types (UDTs). For example, if you have a UDT called Type_T, a constructor function Type_T() will return a new object of that type, with each of its attributes initialised to its default value.
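Beyond built-in constructors, developers can register routines of their own. The sketch below illustrates the idea in SQL3-style syntax; the Circle_T type, the area function and the Drawing table are hypothetical, and the declaration syntax varies by product:
CREATE TYPE Circle_T AS (radius DOUBLE PRECISION);
CREATE FUNCTION area (c Circle_T)
RETURNS DOUBLE PRECISION
RETURN 3.14159 * c.radius * c.radius;
-- The routine can then be used inside ordinary queries:
SELECT * FROM Drawing d WHERE area(d.shape) > 10;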
Unstructured complex objects: RDBMSs are also extended with data types for large objects and with large object locators. Binary large objects (BLOBs) and character large objects (CLOBs) are the two variations that exist.
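As a brief, hedged illustration, a table mixing ordinary columns with large object columns might be declared as follows (the Film table and the size limits are hypothetical; syntax varies by product):
CREATE TABLE Film (
film_id INTEGER PRIMARY KEY,
title VARCHAR(100),
script CLOB(10M), -- character large object holding the full script
trailer BLOB(2G) -- binary large object holding the video trailer
);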

Self Assessment Questions


3. ORDBMS acts as a framework into which specialised software
modules can be embedded. (True/False)
4. Which of the following is the application package for Informix Universal
Server?
(a) Extenders
(b) Data blades
(c) Data cartridge
(d) Blade smith

12.4 Standards for OODBMS Products and Applications


The Object Data Management Group (ODMG) is an association of ODBMS vendors. It was formed to provide a standard for various ODBMS products.
The objectives of the ODMG association are as follows:
 To continue to develop the ODMG standard,
 To train the developer community regarding the standard,
 To advance its use, and
 To offer official recognition of conformity to the standard.
12.4.1 ODMG-93 standards
The ODMG-93 standard was established by a working group composed of five representatives of the OODBMS vendors. It provides a comprehensive framework for designing an object database, writing portable applications in C++ or Smalltalk, and querying the database with a simple and very powerful query language.
Major components of ODMG-93 standard: The ODMG-93 standard consists of the following parts:
 Object model: This is the model that supplies the fundamental concepts of object-oriented data structures, such as classes, objects, encapsulation, inheritance, interfaces, operations, etc.
 Object definition language (ODL): Object definition language defines
how to formulate a database schema (the structure of a database). It is
neutral to programming languages.
 Object interchange format: This determines how the various objects
are represented for exchanging them between different OODBMS.

 Object query language (OQL): OQL is the query language for the ODMG object model; thus, it is intended to retrieve data from an object base. It is not concerned with updating the data, and it does not define SQL-like abstractions such as views, constraints and stored procedures. It is intended to work directly with the programming languages for which a binding is defined, such as C++, Java or Smalltalk.
 Bindings to programming languages C++, Smalltalk and Java: It is
the ODMG bindings to programming languages like C++, Java and
Smalltalk.
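To give a flavour of the OQL component described above, the following sketch queries a hypothetical Students extent (class and attribute names are assumptions); note how closely it resembles SQL:
select s.name
from s in Students
where s.age > 21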
12.4.2 ODMG Smalltalk binding
The ODMG Smalltalk binding provides the ability to store, retrieve and modify persistent objects in Smalltalk. The binding provides a mechanism to invoke OQL, as well as procedures for transactions and database operations. Smalltalk objects are made persistent by reachability: an object becomes persistent when another persistent object in the database references it, and it is garbage-collected when it is no longer reachable.
All of the classes described in the ODMG object model are directly mapped
to Smalltalk classes by the Smalltalk binding. Wherever possible, the Object
Model collection classes and operations are mapped to standard collection
classes and methods. Relationships, transactions and database operations
are also mapped to Smalltalk constructs.
12.4.3 SQL3
ANSI and ISO have developed a new SQL standard, SQL3. It is intended to be a programming language with full computational power.
The main features of SQL-3 are listed below:
 Various types of user-defined abstract data types (ADTs) are supported
by SQL-3. It also supports various methods, object identifiers, subtypes,
inheritance and polymorphism.
 The statements defining tables are extended in SQL3, especially with row types, row identifiers, and (specific) inheritance between rows of families of tables.
 Each SQL3 table has a predefined column called IDENTITY that contains row identifiers, which can be used as values in other rows. Thus, through SQL3, it is possible to create pointer-based, network data structures in the style of DBTG (CODASYL).
 The ADT (abstract data type) concept in SQL3 helps the database developer to associate values stored in the tables with methods. In such a case, these values are not directly accessible, but only through methods.
 Methods can be specified in SQL3 or in other languages such as C++ and Java.
 SQL3 introduced many extensions to earlier SQL versions, including enriched support for BLOBs, CLOBs, collections, triggers, transactions, cursor processing, etc.
 It also comprises some new features such as support for user-defined
aggregate operations, transitive closures, and even deductive rules.
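Of these features, transitive closure is a good illustration of SQL3's added power: a recursive query can follow a relationship to any depth. A hedged sketch (the PartOf table with columns part and subpart is hypothetical):
WITH RECURSIVE Contains (part, subpart) AS (
SELECT part, subpart FROM PartOf
UNION ALL
SELECT c.part, p.subpart
FROM Contains c, PartOf p
WHERE c.subpart = p.part
)
SELECT subpart FROM Contains WHERE part = 'engine';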
Self Assessment Questions
5. ____________ language defines how to formulate a database schema
(the structure of a database).
6. The _____________ concept in SQL3 helps the database developer to
associate values stored in the tables with methods.
7. Select the name of the predefined column in SQL 3 that contains row
identifiers which can be used as values in other rows.
(a) ADT
(b) Key
(c) Identity
(d) Index

12.5 Nested Relations and Collections


The nested relational model removes the first-normal-form restrictions and constraints of the basic relational model. It is also called the Non-1NF, Non-First Normal Form (NFNF) or NF2 relational model. The basic relational model (also called the flat relational model) requires attributes to be single-valued and to have atomic domains. Nested relations permit composite and multivalued attributes, which result in complex tuples with a hierarchical arrangement. Nested relations thus naturally characterise objects that are hierarchically ordered.

In Figure 12.1, part (a) shows a nested relation schema DEPT based on part of the COMPANY database, part (b) gives an example of a Non-1NF tuple in DEPT, and part (c) shows the tree structure of the relation schema.

Figure 12.1: (a) A Nested Relation Schema DEPT (b) An Example of a Non-1NF Tuple in DEPT (c) Tree Structure of the Relation Schema

In order to define the DEPT schema in the form of a nested structure, the following can be written:
dept = (dno, dname, manager, employees, projects, locations)
employees = (ename, dependents)
projects = (pname, ploc)
locations = (dloc)
dependents = (dname, age)
 Define all attributes of the DEPT relation.
 Then the nested attributes of DEPT viz. EMPLOYEES, PROJECTS, and
LOCATIONS are themselves defined.

The second-level nested attributes, viz. DEPENDENTS of EMPLOYEES, are defined in the next step, and so on. In the nested relation definition, the names of the various attributes must be distinct.
A nested attribute is usually a multivalued composite attribute and within
each tuple, it leads to a "nested relation". For example, a relation is defined
with two attributes (PNAME, PLOC) for the value of the PROJECTS
attribute within each DEPT tuple.
The PROJECTS attribute in the DEPT tuple of Figure 12.1 (b), contains
three tuples as its value. Attributes, such as LOCATIONS of DEPT, may be
multivalued simple attributes.
Generally, nested relational models treat a nested attribute as multivalued, though it is possible to have a nested attribute that is single-valued and composite.
A nested relational database schema consists of various external relation
schemas that define the top level of the individual nested relations. Nested
attributes define relational structures that are nested inside another relation,
thus, they are also called internal relation schemas. Here in our case, DEPT
is the only external relation.
On the contrary, EMPLOYEES, PROJECTS, LOCATIONS, and
DEPENDENTs are internal relations. At last, the attributes that are not
nested are simple attributes and they appear at the leaf level.
Each relation schema by means of a tree structure can be represented as
shown in Figure 12.1 (c). In this tree structure, the root represents an
external relation schema, the leaves symbolise simple attributes, and the
internal nodes are internal relation schemas.
The three first-level nested relations in DEPT represent independent information: EMPLOYEES corresponds to the employees working for the department, PROJECTS to the projects controlled by the department, and LOCATIONS to the various department locations. The schema does not represent the relationship between EMPLOYEES and PROJECTS; being an M:N relationship, it is difficult to represent in a hierarchical arrangement.

Extensions to the relational algebra, the relational calculus and SQL have been proposed for nested relations. Conclusively, nested collections offer immense modelling power but also raise difficult design decisions.
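As a rough sketch of how such a nested schema could be declared with SQL:1999/SQL:2003-style collection types (the MULTISET syntax and the type names are assumptions; products expose nested tables in different ways):
CREATE TYPE Dependent_T AS (dname VARCHAR(30), age INTEGER);
CREATE TYPE Employee_T AS (
ename VARCHAR(30),
dependents Dependent_T MULTISET -- second-level nesting
);
CREATE TABLE Dept (
dno INTEGER,
dname VARCHAR(30),
manager VARCHAR(30),
employees Employee_T MULTISET, -- nested relation
locations VARCHAR(30) MULTISET -- multivalued simple attribute
);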
Self Assessment Questions
8. Nested relational model is also known as ____________.
9. Nested attributes are also called internal relation schemas.
(True/False)

12.6 Storage and Access Methods


ORDBMS data storage is quite similar to that of an RDBMS: the data is organised into pages, and a collection of pages holds all the data for a given table. There is also a caching facility, wherein the ORDBMS retains blocks of memory specifically to store frequently visited pages.
Caching avoids the expense of referring to disk every time a page is accessed. As the user issues a query, the system retrieves pages from disk into the memory cache. If the cache is full, modified pages, or rarely visited pages, may be written back to disk. The ORDBMS passes pointers to the memory-resident data as arguments.
The ORDBMS is not concerned with what function the embedded logic performs, or how it performs it; it is concerned only with the logic's return value. The ORDBMS is free to read more data from pages in memory every time a single execution of the embedded logic is completed, and it is also free to invoke the logic again with other arguments.
The ORDBMS does not store the logic with the object data; rather, it combines them at run-time.
Large object data storage: Large object data raises a specific set of issues. As extensible database applications are expected to involve many large data objects, dealing with them efficiently is particularly important.
Table records use relatively small data pages, usually only a few kilobytes in size. It would be an unacceptable constraint to require the new types to fit within these pages, so handles (identifiers) offer a special means of supporting data objects of any size.
Storing large object data, such as the polygon corresponding to a state boundary, together with table data, such as a city's name and present population, results in a considerable increase in the size of the table. If only a few queries access the large object data, such a tactic would significantly increase the time consumed to complete a major percentage of queries.
The ORDBMS therefore separates the storage of large object data from the table data. If the value of a data object is stored as a large object, only a handle for it is stored with the row. The user-defined functions that implement behaviour for large objects use this handle to open the large object and access its contents.
Indexing new types: One important reason for users to place their data in a
database is to allow for efficient access via indexes. Unfortunately, the
standard RDBMS index structures support only equality conditions (B+ trees
and hash indexes) and range conditions (B+ trees). An important issue for
ORDBMSs is to provide efficient indexes for ADT methods and operators on
structured objects.
One way to make the set of index structures extensible is to publish an
access method interface that lets users implement an index structure
outside of the DBMS. The index and data can be stored in a file system, and
the DBMS simply issues the open, next, and close iterator requests to the
user's external index code.
Such functionality makes it possible for a user to connect a DBMS to a Web
search engine, for example. A main drawback of this approach is that data
in an external index is not protected by the DBMS's support for concurrency
and recovery.
An alternative is for the ORDBMS to provide a generic 'template' index
structure that is sufficiently general to encompass most index structures that
users might create. Because such a structure is implemented within the
DBMS, it can support high concurrency and recovery.
The Generalised Search Tree (GiST) is such a structure. It is a template
index structure based on B+ trees, which allows most of the tree index structures invented so far to be implemented with only a few lines of user-defined ADT code.
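In practice, many ORDBMSs let such ADT-aware indexes be created with ordinary DDL. A hedged sketch of a function-based index (the City table, Polygon_T type and area function are hypothetical):
CREATE TABLE City (
name VARCHAR(40),
boundary Polygon_T -- user-defined ADT
);
CREATE INDEX city_area_idx ON City (area(boundary));
-- Queries filtering on area(boundary) can now use the index:
SELECT name FROM City WHERE area(boundary) > 100;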
Self Assessment Questions
10. ORDBMS does not care about the logic’s return value. (True/False)
11. The standard RDBMS index structures support only equality conditions
(B+ trees and hash indexes) and range conditions (B+ trees).
(True/False)

12.7 Implementation Issues for Extended Type


Extended type systems with associated functions (operations) raise a variety of implementation issues. These are briefly described below:
 The ORDBMS must link a user-defined function into its address space dynamically, only when it is required. Dynamic linking is available in various types of ORDBMS; static linking of all such functions would increase the DBMS address space by an order of magnitude.
 Client-Server concerns relate with the activation and placement of the
functions. There is a large overhead involved in doing it remotely hence
it must be done in the DBMS address space.
 It must be possible to execute queries inside functions in the DBMS. A function must operate in the same manner irrespective of whether it is being executed from an API or by the DBMS as part of an SQL statement.
 The DBMS must provide efficient storage and access of data, as an ORDBMS supports a variety of data types and associated operators. This, especially given new types, is very important for an ORDBMS.
 There exist several other issues relating to Object-relational database
design which are important to be addressed. Some of these are given
below:
 Object-relational design is more complicated
 Query processing and optimisation
 Interaction of rules with transactions

Self Assessment Questions


12. Which linking is available in various types of ORDBMS?
(a) Dynamic linking
(b) Refresh linking
(c) Static linking
(d) Pure linking
13. It must be possible to execute the queries inside functions in DBMS.
(True/False)

12.8 Comparing RDBMS, OODBMS and ORDBMS


After reading so much about the various types of database systems, you
can now understand the difference between them and compare them.
Hence, in this section we will compare the three important types of database
systems i.e. RDBMS, OODBMS and ORDBMS. Table 12.1 gives the
comparison between three database systems.
Table 12.1: Comparison between RDBMS, OODBMS and ORDBMS

Defining standard
RDBMS: SQL2
OODBMS: ODMG-2.0
ORDBMS: SQL3 (in process)

Support for object-oriented features
RDBMS: No support provided; a program object is hard to map to the database
OODBMS: Extensive support provided
ORDBMS: Restricted support, mainly for new data types

Usage
RDBMS: Uncomplicated usage
OODBMS: Reasonable for programmers; some SQL access for end users
ORDBMS: Easy to use, except for some extensions

Support for complex relationships
RDBMS: Abstract data types are not supported
OODBMS: An extensive range of data types and data with complex inter-relationships is supported
ORDBMS: Abstract data types and complicated relationships are supported

Performance
RDBMS: High
OODBMS: Moderate
ORDBMS: Expected to perform very well

Product maturity
RDBMS: Comparatively well developed, and so very mature
OODBMS: The concept is only a few years old, and so relatively mature
ORDBMS: Still developing, and so immature

SQL usage
RDBMS: Extensively supports SQL
OODBMS: OQL is similar to SQL, but with additional features like complex objects and object-oriented features
ORDBMS: SQL3 is being developed with OO features integrated in it

Advantages
RDBMS: Its dependence on SQL and relatively simple query optimisation, hence good performance
OODBMS: Complex applications can be handled; reusability of code; less coding
ORDBMS: Can query compound applications and is capable of handling large and complex applications

Disadvantages
RDBMS: Unable to deal with complex applications
OODBMS: Low performance due to difficult query optimisation; unable to support large-scale systems
ORDBMS: Cannot perform well in web applications

Support from vendors
RDBMS: Treated as highly successful, so it has a huge market size, but many suppliers are shifting to ORDBMS
OODBMS: Lacks supplier support due to the enormous size of the RDBMS market
ORDBMS: Very good future among RDBMS suppliers

Self Assessment Questions


14. Out of RDBMS, OODBMS and ORDBMS, which has got the best
performance?
15. Which one of the following is the easiest to use?
(a) ORDBMS
(b) OODBMS
(c) RDBMS
(d) DBMS
Activity 2
Prepare a comparative study, with an example of each, between RDBMS, OODBMS and ORDBMS.

12.9 Summary
Let us recapitulate the important concepts discussed in this unit:
 An ORDBMS is a data repository that can be extended to manage any kind of data and to organise any kind of data processing. It represents the next logical step in the evolution of DBMS technology. It preserves those aspects of relational DBMS technology that proved so successful (abstraction, parallelism, and transaction management) and augments them with ideas, such as extensibility and advanced data modelling, derived from object-oriented approaches to software engineering.
 Users can also implement their own extensions as needed by using the
ADT facilities of these systems. We briefly discussed some
implementation issues for ADTs. Finally, we gave an overview of the
nested relational model, which extends the flat relational model with
hierarchically structured complex objects.
 ODMG 3.0 was developed by the Object Data Management Group
(ODMG). The ODMG is a consortium of vendors and interested parties
that work on specifications for object database and object-relational
mapping products.
 The major components of the ODMG 3.0 specification are the Object Model, Object Definition Language (ODL), Object Query Language (OQL), Object Interchange Format, and the C++, Java and Smalltalk language bindings.
 ODMG Smalltalk binding is the binding of ODMG implementations to
Smalltalk.
 There are various implementation issues regarding the support of
extended type systems with associated functions (operations).

12.10 Glossary
 Abstract data type: A combination of an atomic data type and its
associated methods is called an abstract data type, or ADT.
 Access Method: The method used to store, find and retrieve the data
from a database.
 BLOB: Binary Large Object. Generally used to store multimedia data such as video, images and audio. It stores data in binary format only.
 CLOB: Character Large Object. It is used for documents or large strings that use the database character set. It stores the string data in the database character set format.
 NCLOB: National Character Large Object. A fixed-width multibyte CLOB, used for documents or large strings that use the National Character Set. It stores the string data in the National Character Set format.
 ODMG Smalltalk binding: This is the binding of ODMG
implementations to Smalltalk.

12.11 Terminal Questions


1. What do you mean by object relational database? Also explain its
advantages and disadvantages.
2. Explain the characteristics of object-relational databases.
3. Write short notes on ODMG standards.
4. Briefly describe SQL3 and its important features.
5. What are the main reasons behind the development of ORDBMS?
6. Describe the various implementation issues for extended types.
7. Differentiate between RDBMS, OODBMS and ORDBMS.
8. What are the various components of ODMG-93?
9. What are the storage methods in ORDBMS? Briefly describe.

12.12 Answers
Self Assessment Questions
1. True
2. reuse and sharing
3. True
4. b (Data Blades)
5. Object definition
6. ADT(abstract data types)
7. C (Identity)
8. Non-First Normal Form (NFNF)
9. True

10. False
11. True
12. a (Dynamic linking)
13. True
14. RDBMS
15. C (RDBMS)

Terminal Questions
1. Database systems that are based on the object relational model are
known as Object relational databases. Refer Section 12.2 for more
details.
2. Some of the important characteristics of object relational databases are
Nested relations, Complex types and object-orientation, Querying with
complex types etc. Refer Section 12.2 for more details.
3. ODMG-93 provides specifications for object database and object-
relational mapping products. Refer Section 12.4 for more details
4. SQL3 is a new SQL standard developed by ANSI and ISO. Refer
Section 12.4 for more details.
5. ORDBMS evolved to meet the challenges of new applications. Refer Section 12.2 for more details.
6. ORDBMS implementation issues for extended types include various client-server issues, storage and access issues, etc. Refer Section 12.7 for more details.
7. There are various differences between an RDBMS, OODBMS and ORDBMS. These have been discussed in Table 12.1 in Section 12.8.
8. The major components of ODMG 3.0 specification are Object Model,
Object Definition Language (ODL), Object Query Language, Object
Interchange Format, C++, Java and Smalltalk Language Binding. Refer
Section 12.4 for more details.
9. In an ORDBMS, data is organised into pages, similar to RDBMS storage. Refer Section 12.6 for more details.

References:
 Peter Rob, Carlos Coronel, (2007) "Database Systems: Design, Implementation, and Management", (7th Ed.), India: Thomson Learning.
 Silberschatz, Korth, Sudarshan, (2010) "Database System Concepts", (6th Ed.), India: McGraw-Hill.
 Elmasri, Navathe, (2000) "Fundamentals of Database Systems", (3rd Ed.), Pearson Education Asia.
E-references
 http://infolab.usc.edu/
 http://www.odbms.org/
 infolab.stanford.edu/
 db.cs.berkeley.edu/

Unit 13 XML Query Processing


Structure:
13.1 Introduction
Objectives
13.2 XML Query Languages
XML-QL
Lorel
Quilt
XQL
XQuery
13.3 Approaches for XML Query Languages
Query processing for relational structure
Query processing on storage schema
13.4 XML Database Management Systems
13.5 Summary
13.6 Glossary
13.7 Terminal Questions
13.8 Answers

13.1 Introduction
In the previous unit, you studied the concept of object relational and
extended relational databases which included design techniques used in
RDBMS, extension techniques in RDBMS, standards for OODBMS products
and applications, etc. In this unit, we will reflect on the concept of XML
Query Processing.
XML is used to separate presentation from data, and therefore provides independence and flexibility in organising content. Because of this flexibility, data interchanged between two dissimilar systems can use XML as the data format. XML represents a tree-like structure which is intuitive, understandable and simple to recognise. By means of an XML schema or DTD (document type definition), the type and attributes of every tag that applies to some XML document can be properly defined.
The function of an XML query engine or processor is to translate the query syntax and carry out the operations coded in the query, returning the output with processing time kept to a minimum. This provides efficient processing.

In this unit, you will study various XML query languages like XML-QL, Lorel, Quilt, XQL, and XQuery. You will also learn the approaches used for XML query processing, such as query processing on a relational structure and on a storage schema, and we will discuss the concept of an XML database management system.
Objectives:
After studying this unit, you should be able to:
 recognise XML query languages like XML-QL, Lorel, Quilt, XQL, etc.
 discuss the approaches used for XML query processing
 explain XML database management system

13.2 XML Query Languages


Nowadays XML is turning out to be the most significant new standard for the representation and exchange of data on the World Wide Web. Many new languages have been proposed for extracting and transforming XML content.
Some of the existing languages are conventional (such as SQL and OQL), while others are more recent and strongly motivated by XML. So far, no standard XML query language has been settled; however, research is going on within the World Wide Web Consortium (W3C), in various academic institutions, and in the main companies associated with the Internet.
Now we will discuss various XML query languages as below.
13.2.1 XML-QL
XML-QL is an XML query language used for querying, constructing, converting and integrating XML data. XML-QL basically views XML as semi-structured data with an irregular or rapidly evolving structure. For matching data in an XML document, patterns are used by XML-QL. An extension of XML-QL, known as Elixir, was proposed to support ranked queries based on textual similarity.
In XML-QL, the client is permitted to query an XML document like a database and to specify an output construct.

XML-QL syntax comprises two sections: a WHERE clause and a CONSTRUCT clause. We use a WHERE clause to describe the data to search for, and a CONSTRUCT clause to describe how the data that is found is returned. Now we will discuss these two clauses below.
1. WHERE
The syntax for WHERE clause is shown as below:
WHERE XML-searchstring [ORDER-BY variable [DESCENDING]
[variable [DESCENDING]]] IN 'filename'
You can divide the WHERE clause into various parts. The first part is the search string, and the second is the optional ORDER-BY clause, which is similar to ORDER BY in SQL. The last part specifies the required XML document file name. Let us now discuss all these parts.
XML-searchstring: The search string must be a valid XML section. In this part, the module varies from the specification; it is implemented in this manner so that the search string can be parsed by the module known as XML::Parser.
Now in constructing a query, the tags are listed first that are to be
searched in the document. For example, let us consider the search
string as below:
<BOOK>
<AUTHOR></AUTHOR>
</BOOK>
From the above example, you can see that this search string is
searching for the AUTHOR tag which is nested inside a BOOK tag.
Make note that till now no information has been chosen for retrieval.
Now we will actually take some information, as shown in the example
below:
<BOOK>
<AUTHOR>$author</AUTHOR>
</BOOK>

Here, $author is the variable name which will capture the information found inside this tag, making that information available for use in the CONSTRUCT section of the query. As you have observed, variable names begin with a dollar sign ($), because the dollar sign is called for by the specification. In Perl, this means that if the query is contained in double quotes, you will need to escape the dollar signs.
Now suppose we want to search for books that are non-fiction. This is
shown as below:
<BOOK TYPE='non-fiction'>
<AUTHOR>$author</AUTHOR>
</BOOK>
This can also be expressed as a regular expression, as shown below:
<BOOK TYPE='non-.*'>
<AUTHOR>$author</AUTHOR>
</BOOK>
Here this module varies from the specification. As defined in the specification, the regular expression capability only permits a subset of what is available in a Perl regular expression. By means of this module, the complete range of regular expression syntax has been made available. This means that there is a need to escape characters like periods (.), parentheses (()), and brackets ([]). All non-tag matches are treated as case insensitive.
Now suppose that apart from matching the TYPE, we also want to
retrieve the value. This is shown in the following example:
<BOOK TYPE='non-.* AS_ELEMENT $type'>
<AUTHOR>$author</AUTHOR>
</BOOK>
Here, the keyword named as AS_ELEMENT permits you to save the
corresponding value to make use of it afterwards in the CONSTRUCT
section of the query.

ORDER-BY: The ORDER-BY clause permits you to sort the data retrieved in the variables. Numerous variables can be specified, and DESCENDING can be specified for a reverse sort. This clause is optional. For example:
ORDER-BY $type, $author DESCENDING
IN: The IN clause is a required clause that provides the file name of the XML file. It can be a single file name included in quotes.
2. CONSTRUCT
The CONSTRUCT section permits you to state a pattern for the output. The pattern comprises every character from the first space after the word CONSTRUCT to the end of the XML-QL query. For example:
$ql = '(where clause...) CONSTRUCT Type: $type Author: $author';
This construct is repeated for each match discovered, and the results are returned as a single string.
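Putting the two clauses together, a complete query in this module's syntax might look like the following sketch (the books.xml file and its element names are hypothetical):
WHERE
<BOOK TYPE='non-fiction'>
<AUTHOR>$author</AUTHOR>
<TITLE>$title</TITLE>
</BOOK>
ORDER-BY $author
IN 'books.xml'
CONSTRUCT Author: $author, Title: $title
For each non-fiction book found, the author and title are captured and emitted in the stated pattern, sorted by author.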
13.2.2 Lorel
Lorel is considered an early query language for semi-structured data. Lorel uses the OEM (Object Exchange Model) as its data model for semi-structured data. Lorel extends OQL (Object Query Language) for querying elements, relying on coercion at various levels to relax the strong typing of OQL. Lorel also extends OQL with path expressions, so that the user can state patterns that are matched against actual paths in the referenced data.
One advantage of the Lorel language is its simple syntax, which makes it easy for users to understand.
The drawback of the Lorel language is that it depends on an OQL parser and provides only limited functionality.
13.2.3 Quilt
Quilt is considered a functional language in which a query is represented as an expression. Quilt expressions include seven major forms. They are:
 path expressions

 element constructors
 FLWR expressions
 expressions with operator and functions
 conditional expressions
 quantifiers
 variable bindings
Apart from join operations, Quilt also supports nested expressions; thus, a subquery can appear inside a single query. Important traits of Quilt were used in the development of XQuery (discussed later).
The syntax below shows the top-level structure of a Quilt query. In the existing version, the grammar is incomplete and is still being developed as the language progresses. In the syntax shown in Figure 13.1 below, terminal symbols are shown in angular brackets and their lexical structure is not further specified.
Query ::= Function_defn* Expression
Function_defn ::= 'FUNCTION' <Function_name>
'(' <Variable>* ')' '{' Expression '}'
Expression ::= <Variable> | <Constant> |
<Function_name> '(' ExpressionList ')' |
Expression Operator Expression |
<XPathExpression> |
ElementConstructor |
FLWR_Expression |
'(' Expression ')'
ExpressionList ::= Expression | Expression ',' Expression
ElementConstructor ::= StartTag ExpressionList EndTag
StartTag ::= '<' <String> Attribute* '>'
EndTag ::= '</' <String> '>'
Attribute ::= <String> '=' Expression | Expression
FLWR_Expression ::= (FOR_clause | LET_clause)+
WHERE_clause?
RETURN_clause
FOR_clause ::= 'FOR' FOR_binding (, FOR_binding)*
FOR_binding ::= <Variable> 'IN' 'DISTINCT'? Expression
LET_clause ::= 'LET' LET_binding (, LET_binding)*
LET_binding ::= <Variable> ':=' 'DISTINCT'? Expression
WHERE_clause ::= 'WHERE' Expression
RETURN_clause ::= 'RETURN' Expression
SORTBY_clause?
SORTBY_clause ::= 'SORTBY' '(' ExpressionList ')'
('ASCENDING' | 'DESCENDING')?
Operator ::= '<' | '<=' | '>' | '>=' | '=' | '!=' | '+' | ...

Figure 13.1: Quilt Query Syntax

You can begin a Quilt query by defining one or more functions that are used in the body of the query. The body of the query is just an expression. A significant type of query relies on a "FLWR expression", which is constructed from FOR, LET, WHERE, and RETURN clauses.
However, a query may also rely on other kinds of expressions, such as an XPath expression or an element constructor. We use element constructors to produce new output elements that enclose data computed by the query.
Note that XPathExpression is considered a terminal symbol in this grammar. This symbol represents an expression based on the abbreviated syntax of XPath, enhanced with the following operators taken from XQL:
1. BEFORE and AFTER operators: Consider "a BEFORE b": it evaluates to the set of nodes in 'a' that occur before some node in 'b'. Similarly, "a AFTER b" evaluates to the set of nodes in 'a' that occur after some node in 'b'.
2. INTERSECT operator: "a INTERSECT b" evaluates to the set of nodes in 'a' that are also located in 'b'. This is an improvement over XPath, which includes an operator for set union but not for intersection.
3. EXCEPT operator: "a EXCEPT b" evaluates to the set of nodes in 'a' that are not located in 'b'.
Note that no definitive standard function library has been developed for Quilt so far. The advantage of using Quilt is that it provides strong functionality; the drawback is that it does not provide any support for textual similarity.
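As a brief illustration of a Quilt FLWR expression, the following sketch (the bib.xml document and its element structure are hypothetical) returns the titles of books published after 1995:
FOR $b IN document("bib.xml")//book
WHERE $b/year > "1995"
RETURN $b/title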
13.2.4 XQL
XQL is another XML query language that makes use of path expressions; thus the main constructs of XQL correspond directly to the main structures of XML. Because of this nature, XQL is considered to be very closely associated with XPath. In XQL, document nodes have an essential role: nodes have identity, and node identity, the sequence of query results, and containment relationships are all maintained. The nodes may emerge from various different sources, but the process of bringing nodes to the query is not specified by XQL. Joins and some functions are also supported by XQL.
XQL is generally used to find and sort the text and elements in an Extensible Markup Language (XML) document. We use XML files to send collections of data among computers on the Web.
XQL offers a device for searching and/or choosing particular items in the collection of data. It relies on the pattern syntax used in the Extensible Stylesheet Language (XSL) and is proposed as an extension to it.
The XSL pattern language provides a declarative manner in which particular elements are specified for processing, using a simple directory notation. For example, book/author signifies: choose every author element in every book element in a specific context.
XQL adds to this directory notation the capability to use Boolean logic, to filter elements, and to index into a collection of elements. By means of XQL, you can write a program to search repositories of XML files, for example to offer hypertext links to particular elements, and for further applications.
XQL queries are short expressions which are easy to use; however, their semantics may not be very intuitive.
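Building on the book/author example, a few XQL expressions might look like the following sketch (element and attribute names are hypothetical):
book/author
book[@type = 'non-fiction']/author
book/author[0]
The first expression selects every author of every book; the second filters the selection to non-fiction books; the third indexes into the collection, returning the first author of each book.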
13.2.5 XQuery
XQuery is an XML query language designed for XML data sources. The W3C working group produced XQuery, and it is now in approval status. XQuery is considered an industry-standard query language.
XQuery provides declarative access to XML data, similar to what SQL does for relational data. XQuery and other associated XML standards are shown in Figure 13.2, from which you can see that XQuery builds on XPath and makes use of XPath expressions.

Figure 13.2: XQuery and Related Standards

XQuery comprises two parts, which are discussed below:
 The prolog, which is optional, comprises declarations that describe the query's execution environment.
 The query body comprises an expression that provides the query's result. XQuery's input and output are always XDM instances.
The XQuery body may comprise any or all of the following expressions:
 Literals and variables
 Path expressions
 Predicates
 If ..then..else
 Constructors
 Comparisons
 FLWOR expressions

Now we will discuss FLWOR expressions.
FLWOR expressions
FLWOR expressions are considered a main section of the XQuery language. FLWOR stands for FOR, LET, WHERE, ORDER BY and RETURN.
The FLWOR expression is frequently compared with the SELECT FROM WHERE ORDER BY statement in SQL.
We have shown an example of a simple FLWOR expression in an XQuery
as below:
XQuery
for $x in db2-fn:xmlcolumn('ITEM.DESCRIPTION')
let $p := $x/Item/description/name
where $x/Item/description/price < 3
return <cheap_items> {$p} </cheap_items>
The "for" clause iterates through a sequence and binds a variable to the items in the sequence, one at a time. Here, $x is bound to the sequence of XML documents stored in a column named DESCRIPTION. During each iteration of the "for" clause, $x holds one item from the set of XML documents, while the "let" clause binds the sequence of matching name elements to the XQuery variable $p. For example, if the XML document bound to $x comprises four product names, all the product names become part of the sequence and are bound to $p.
The "where" clause is analogous to the where clause in SQL and filters the result set according to the specified predicate. In the given example, the where clause chooses the items whose price is less than 3.
The "return" clause specifies the XDM instance that is to be returned. The XDM instance can be returned as it comes out of the query evaluation, or combined with further elements created in the return clause; an example of this can be seen above.
We have given below some more examples of XQuery FLWOR:
1. XQuery for $i in (1 to 3) return $i
Execution result:
1
2
3
2. XQuery for $x in db2-fn:xmlcolumn('ITEM.DESCRIPTION')//text()
This query returns all the text nodes from all the XML documents stored in the DESCRIPTION column of the ITEM table.
Self Assessment Questions
1. The syntax of XML-QL comprises of two parts, that is a WHERE clause
and a ______________ clause.
2. Which XML query language is used to extend OQL with path
expressions?
a) Quilt
b) Lorel
c) Xquery
d) XML-QL
3. Quilt query language does not include a subquery inside a single
query. (True/ False)
4. Which query language is used to find and sort the elements and text in
an Extensible Markup Language (XML) document?
5. ________ provides declarative access to XML data similar to as SQL
does for relational data.

Activity 1
What are the major forms included in Quilt? Also illustrate how to construct a Quilt query.

13.3 Approaches for XML Query Languages


The function of a query processor is to map the query's high-level abstraction, through its procedural evaluation, onto a group of low-level operations. Note that an XML query, like an SQL query in an SQL processor, is first transformed at the logical access model before the physical storage model is accessed. Table 13.1 below shows the different levels of abstraction in XML query processing as compared to the SQL abstraction levels.
Table 13.1: Levels of Abstraction in XML Query Processing as Compared to
SQL Abstraction Levels

From the above table, XDBS indicates XML database management system
and RDBS indicates Relational Database Management System.
The language model is intended to fulfil the demands on the language's capabilities: search functionality and document-order awareness (the document-centric features), and then the data-centric features, which relate to powerful selection and transformation.
Semantic processing should then be able to examine the query and convert it into a global representation to be used during the subsequent optimisation steps.
The logical access model applies algebraic and non-algebraic procedures to optimise the internal representation of the query. Non-algebraic optimisation reduces intermediate results by restructuring the query and performing the most selective operations early.
Algebraic optimisation converts the query into a more optimised expression in a semantics-preserving way.
The physical access model is associated with system-specific concerns. Here, every logical algebra operator is mapped onto matching physical operators.

Lastly, for optimised query processing, a suitable storage model should be organised to reduce I/O costs, CPU costs, storage costs for intermediate results, and communication costs.
Storage models used at present include LOBs (large objects), various XML-to-relational mappings, and native storage formats such as Niagara and Timber. The relational XML data model and the native storage model draw the most attention, as shown by the different proposals for query processors layered on top of them.
A variety of XML query processors have been proposed for more efficient query processing. Based on the levels of abstraction, query processors are divided into the following approaches according to their storage models: query processing for relational structure and query processing on storage schema, which are discussed below.
13.3.1 Query processing for relational structure
When performing query processing for relational structure, the XML document, or information associated with the XML document, is stored in a relational database.
This is because a relational database provides better indexing than simple index formation. The RDBMS engine carries out the query processing by transforming the XQuery into SQL, executing the SQL query, and serialising the result as XML.
You can classify the relational storage methods for XML documents into
three groups. They are:
 no XML schema method,
 XML schema method, and
 user defined method.
When no XML schema is available, the relational representation must be derived from the data. When a schema is present, a relational schema is produced which captures the relationships between the root element and all the sub-elements.
Relational schemes are divided into
 schema-oblivious and
 schema-conscious methods.

A schema-oblivious method preserves a predetermined, fixed relational schema. On the contrary, a schema-conscious approach first generates a relational schema depending on the DTD/schema of the XML. It has been observed that the schema-oblivious method can perform better than the schema-conscious method.
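A common schema-oblivious layout is the "edge table", which stores every parent-child edge of the XML tree as one row. The following is a minimal sketch under that assumption (the table, its columns and the sample query are hypothetical):
CREATE TABLE Edge (
id INTEGER PRIMARY KEY, -- node identifier
parent_id INTEGER, -- identifier of the parent node
tag VARCHAR(50), -- element name
value VARCHAR(200) -- text content, if any
);
-- The path query /book/author then translates into a self-join:
SELECT a.value
FROM Edge b, Edge a
WHERE b.tag = 'book'
AND a.tag = 'author'
AND a.parent_id = b.id;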
The query processor BEA/XQRL implements such a relational scheme by means of XQuery. Its query compiler parses and optimises the query; the XDBC interface acts as the interface between the front-end application and the query processor for receiving the query.
The compiler then produces a query plan so as to optimise the query. The XML data is represented as a stream, which the XML parser parses as input. The function and operator libraries included in the runtime operators process the stream and produce output according to the query plan.
Now we will show the overview of BEA in Figure 13.3.

Figure 13.3: Overview of BEA

13.3.2 Query processing on storage schema


In this approach, the elements of XML are allocated labels. The reason behind labelling is to generate unique identifiers that are useful for query processing.
There are many labelling methods, which trade off space occupancy, information content, and suitability for updates. The region-based labelling method is most often utilised; its purpose is to label elements so as to show nesting.
The labelling method for simple nesting is shown in Figure 13.4. Each label indicates, for the node, its start, end, and level.

Figure 13.4: Region-based Labelling Scheme
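With region labels of the form (start, end, level), structural relationships reduce to simple comparisons: node a is an ancestor of node d exactly when a.start < d.start and d.end < a.end. A labelled store can therefore answer ancestor-descendant queries with a plain predicate; a hedged SQL sketch (the Node table and its column names are hypothetical):
SELECT d.*
FROM Node a, Node d
WHERE a.tag = 'book'
AND d.tag = 'author'
AND a.start_pos < d.start_pos
AND d.end_pos < a.end_pos; -- d lies entirely inside a's region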

ORDPATH is another labelling method, implemented in SQL Server. This method labels every node with a series of integer numbers; as the name suggests, order, depth, parent, and ancestor-descendant relationships are all captured by this method.
The XML document is then stored as persistent trees. If disk is utilised as the storage medium, XML nodes are distributed across disk pages, and the node representation is optimised for the fixed page size.
You can attain efficient query processing in such storage by means of stack-based algorithms such as StackTreeDesc and holistic twig joins. The StackTreeDesc algorithm makes use of a stack structure to store the labels of parent elements.
When the path reaches the destination child node, information from the stack is joined with the child label and returned as results in descendant order; then, for the next operation, the stack is emptied. Alternatively, holistic twig joins aim to avoid constructing intermediate results when matching twig patterns.

Self Assessment Questions


6. When performing query processing for storage schema, XML
document or information associated to XML document is accumulated
in relational database. (True/ False)
7. Which of the following methods is used to preserve a fixed schema by
capturing the tree structure of XML documents?
a) scheme-oblivious method
b) scheme conscious method
c) XML schema method
d) user defined method

13.4 XML Database Management Systems


Now we know that XML will form a major portion of DBMS progress, but even today there is no precise definition of an XML database. At present, there are two main approaches used for storing and retrieving XML-based content. They are:
1. A relational or object-oriented database to store XML data, with middleware (integrated or intermediary) to carry out data transfers between the database and the XML documents.
2. An application XML server that generates XML based on an initial query of some type (for example, an e-commerce platform for structuring distributed system applications that uses XML for data transfer). Usually, we refer to these as content management systems.
The common point to be noted in both approaches is that XML offers a bridge between the structured and unstructured classes of data in databases.
The relational database community's concentration on XML as another data format for relational and object-relational data processing tools led to the growth of query languages like XML-QL, Lorel, and XML Query Language (XQL).
These languages lean towards the data-centric outlook of XML, which requires data to be completely structured but unordered. Fundamentally, XML is here used to publish structured data in a platform- and application-independent way.

Therefore XML just offers "syntactic sugaring" over the basic relational data. While this data-centric utilisation of XML is suitable, it does not harness the full power of XML.
The document-centric utilisation of XML depends not only on the data depiction expressed via markup, but also on the ordering of the data components. Various systems taking the document-centric outlook make use of XPath.
XPath is a language designed to identify parts of XML documents in order to query XML data. Recently, the W3C published XQuery, which unites XML's data-centric and document-centric features, as a candidate standard query language for XML.
Therefore, existing XML management systems can be divided into XML-oriented databases and native XML databases.
XML-oriented databases are typically relational and contain model- or pattern-driven extensions for transporting data to and from XML documents; they are usually intended for data-centric documents.
The major differences between XML-oriented databases and native XML databases are discussed below:
 A native XML database maintains the physical document structure. An XML-oriented database can do so as well, but practice shows a different story.
 Native XML databases can store data without a schema. Techniques could be used to recognise structure in unprocessed documents to be stored in XML-oriented systems, but these types of techniques are relatively limited.
 XPath, DOM, or comparable XML-associated APIs are required to
access data in native XML systems. Alternatively, XML-oriented systems
provide direct access to the data via open-standard APIs, like open-
database connectivity (ODBC).
Example: XTRON is considered one of the XML data management systems built by means of an RDBMS. The XTRON data management system uses the relational engine without modification. XTRON utilises schema information if it is available, and stores XML data in identical relational tables whether or not a DTD exists.

Self Assessment Questions


8. Which language is designed to recognise parts of XML documents, to
query the data of XML?
a) Xpath
b) Xquery
c) XQL
d) Quilt
9. Many systems that subscribe to the document centric outlook make
use of XPath. (True/ False)

Activity 2
What is the function of the BEA query processor? Illustrate with a diagram.

13.5 Summary
Let us recapitulate the important points discussed in this unit;
 XML represents a tree-like structure which is intuitive, human-understandable and simple to recognise.
 XML-QL is an XML query language which is used for the process of
querying, constructing, converting and incorporating the data of XML.
 Lorel is considered as an early query language used for semi structured
data. Lorel language makes use of the OEM (Object Exchange Model)
as the data model for semi structured data.
 Quilt is considered as a functional language where you can represent a
query as expression. Quilt expressions include seven major forms: path
expressions, element constructors, FLWR expressions, expressions with
operator and functions, conditional expressions, quantifiers, and variable
bindings.
 XQL is another XML query language which makes use of path
expressions. XQL is generally used to find and sort the elements (data
fields) and text in an Extensible Markup Language (XML) document.
 XQuery is an XML query language designed for XML data sources. The
W3C working group has produced XQuery, and it is now in approval
status.


 FLWOR expressions are considered a main section of the XQuery
language. FLWOR stands for FOR, LET, WHERE, ORDER BY and
RETURN.
 The function of a query processor is to take out the query’s high level
abstraction and its procedural assessment into a group of low-level
operations.
 When performing query processing for a relational structure, the XML
document, or information associated with the XML document, is stored
in a relational database.
 In the approach of query processing on a storage schema, elements of
an XML document are assigned labels. The purpose of labelling is to
generate unique identifiers that are useful for query processing.
 Existing XML management systems can be divided into XML-oriented
databases and native XML databases.

13.6 Glossary
 BEA/XQRL: A query processor that executes a relational scheme by
means of XQuery; its query compiler parses and optimises the query.
 FLWOR: FLWOR expressions are considered as a main section of the
XQuery language which signifies FOR LET WHERE ORDER BY and
RETURN.
 Lorel: It is considered an early query language for semi-structured
data.
 ORDPATHS: A labelling method implemented in SQL Server which
labels every node with a series of integers. As the name suggests, the
labels capture order, depth, parent, and ancestor-descendant
relationships.
 Query processor: The function of a query processor is to take out the
query’s high level abstraction and its procedural assessment into a
group of low-level operations.
 Quilt: It is considered as a functional language where you can represent
a query as expression.
 XML-QL: It is an XML query language which is used for the process of
querying, constructing, converting and incorporating the data of XML.


 XQL: It is generally used to find and sort the elements (data fields) and
text in an Extensible Markup Language (XML) document.
 XQuery: It is an XML query language which is generated for XML data
sources.

13.7 Terminal Questions


1. What is XML-QL? Discuss its syntax with examples.
2. Illustrate the concept of XQuery. Also discuss FLWOR expressions with
example.
3. What are the different approaches for XML Query Languages? Explain.
4. Differentiate between XML oriented databases and native XML
databases.
5. What are the advantages and disadvantages of Lorel query language?
Discuss.

13.8 Answers
Self Assessment Questions
1. CONSTRUCT
2. b) Lorel
3. False
4. XQL
5. XQuery
6. False
7. a) Scheme-oblivious
8. a) Xpath
9. True
Terminal Questions
1. XML-QL is an XML query language which is used for the process of
querying, constructing, converting and incorporating the data of XML.
Refer Section 13.2 for more details.
2. XQuery is an XML query language designed for XML data sources. The
W3C working group has produced XQuery, and it is now in approval
status. Refer Section 13.2 for more details.
3. Based on the levels of abstraction, query processors are divided into the
following approaches according to their storage models: relational
processing and native storage processing. Refer Section 13.3 for more
details.
4. XML-oriented databases are typically relational and enclose model- or
pattern driven extensions for transporting data to and from XML
documents and are usually intended for data-centric documents. Refer
Section 13.4 for more details.
5. One advantage of the Lorel language is its simple syntax, which makes
it easy for users to understand. Its drawback is that it depends on an
OQL parser. Refer Section 13.2 for more details.

References:
 Jiang, Z. (2008) On XML Query Processing, Southern Illinois University
at Carbondale.
 Hunter, D. (2007) Beginning XML, 4th Edition, John Wiley & Sons.
E-references
 http://www3.uji.es/~aramburu/curdoc/StoreXML.pdf, 13-04-12
 http://ils.unc.edu/MSpapers/2634.pdf, 13-04-12


Unit 14 Database Application


Structure:
14.1 Introduction
Objectives
14.2 Active Database
Design principles for active rules
Starburst
Oracle
DB2
Application of active database (Active DB)
14.3 Temporal Database
14.4 Multimedia Database
14.5 Video Database Management
Storage management for video
Video pre-processing for content representation and indexing
Image and semantic-based query processing
Real-time buffer management
14.6 Summary
14.7 Glossary
14.8 Terminal Questions
14.9 Answers

14.1 Introduction
In the previous unit, we studied XML Query processing and several
associated aspects such as XML query languages, approaches for XML
query processing, query processing on relational structure and storage
schema, and XML DBMS. In this unit, we will cover applications related to
database.
An active database system is one that can take action automatically in
response to events happening within or outside the database system itself,
and it is built around an event-driven architecture. The main uses of active
database systems include statistics gathering, authorisation, security
monitoring and alerting.
In this unit, you will study active databases, temporal databases and
multimedia databases in detail. You will also learn about the video
database management system (VDBMS), which offers integrated support
for video storage management and for semantic- and image-based query
processing.
Objectives:
After studying this unit, you should be able to:
 explain the concept of active database
 discuss temporal database
 identify and explain multimedia databases
 explain video database management system

14.2 Active Database


Active databases do not form two distinct classes; rather, they define two
ends of a spectrum of database rule languages. In active databases,
production-style rules are used to execute database operations
automatically in response to certain events and/or conditions.
Active databases differ from conventional databases in that they detect
predefined situations in the database and trigger predefined actions when
such situations occur. The actions are usually database updates.
14.2.1 Design principles for active rules
Active rules are triggered automatically by events that take place, such as
updates to the database or the initiation of certain actions. The functionality
provided by active databases is included in various commercial packages,
where it appears in the form of triggers. Rules that cause certain actions to
be taken automatically when certain events occur are considered a major
enhancement to database systems.
A trigger is the mechanism by which active rules are specified. Triggers
appeared in early versions of the SQL specification for relational databases,
and they are now part of the SQL:1999 and later standards.
DB2, Oracle, and MS SQL Server are well-known commercial relational
databases, and various versions of triggers are available in them. In
addition, a great deal of research has been carried out on common models
for active databases.

Active rules are explained below in detail for STARBURST, Oracle and DB2.
14.2.2 Starburst
We now give some examples to demonstrate how rules are defined in
STARBURST. These examples also illustrate how statement-level rules are
written, since only this kind of rule is permitted in STARBURST.
The three active rules R1S, R2S, and R3S are as follows:
R1S: CREATE RULE Total_sal1 ON EMPLOYEE
WHEN INSERTED
IF EXISTS (SELECT * FROM INSERTED WHERE Dno IS NOT
NULL)
THEN UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal +
( SELECT SUM (I.Salary) FROM INSERTED AS I WHERE D.Dno = I.Dno )
WHERE D.Dno IN ( SELECT Dno FROM INSERTED );
R2S: CREATE RULE Total_sal2 ON EMPLOYEE
WHEN UPDATED ( Salary )
IF EXISTS ( SELECT * FROM NEW-UPDATED WHERE Dno IS NOT
NULL )
OR EXISTS ( SELECT * FROM OLD-UPDATED WHERE Dno IS NOT
NULL )
THEN UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal +
( SELECT SUM (N.Salary) FROM NEW-UPDATED AS N
WHERE D.Dno = N.Dno ) -
( SELECT SUM (O.Salary) FROM OLD-UPDATED AS O
WHERE D.Dno = O.Dno )
WHERE D.Dno IN ( SELECT Dno FROM NEW-UPDATED) OR
D.Dno IN ( SELECT Dno FROM OLD-UPDATED );
R3S: CREATE RULE Total_sal3 ON EMPLOYEE
WHEN UPDATED ( Dno )
THEN UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal +
( SELECT SUM (N.Salary) FROM NEW-UPDATED AS N
WHERE D.Dno = N.Dno )
WHERE D.Dno IN ( SELECT Dno FROM NEW-UPDATED );
UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal -
( SELECT SUM (O.Salary) FROM OLD-UPDATED AS O
WHERE D.Dno = O.Dno )
WHERE D.Dno IN ( SELECT Dno FROM OLD-UPDATED );
The rule structure can now be examined using R1S. The CREATE RULE
statement specifies a rule name: Total_sal1 for R1S. The ON clause
specifies the relation on which the rule is defined: EMPLOYEE for R1S. The
WHEN clause specifies the events that trigger the rule. The optional IF
clause specifies any conditions that need to be checked. Finally, the THEN
clause specifies the actions to be taken, which are usually one or more SQL
statements.
In STARBURST, the events that can trigger rules are the standard SQL
update commands: INSERT, DELETE, and UPDATE. A rule designer also
needs a way to refer to the tuples that have been changed. For this
purpose, STARBURST uses the following keywords:
 INSERTED
 DELETED
 NEW-UPDATED
 OLD-UPDATED


These keywords refer to the corresponding transition tables, which hold the
tuples affected by the triggering event.
Clearly, which transition tables are available depends on the triggering
events. The rule writer refers to these tables when writing the condition and
action parts of the rule.
Under statement-level semantics, the rule designer can refer only to the
transition tables, and the rule is triggered only once per statement; such
rules must therefore be written differently from rules with row-level
semantics. Because a single INSERT statement may insert several
employee tuples, the rule must check whether any newly inserted employee
tuple is related to a department. In R1S, the following condition is checked:
EXISTS (SELECT * FROM INSERTED WHERE Dno IS NOT NULL)

If this condition evaluates to true, the action is executed. The action
updates, in a single statement, the DEPARTMENT tuple(s) related to the
newly inserted employee(s), adding their salaries to the Total_sal attribute
of each related department. Because several newly inserted employees
may belong to the same department, the SUM aggregate function is used to
ensure that all of their salaries are added.
Rule R2S is similar to R1S, but it is triggered by an UPDATE operation that
modifies employee salaries rather than by an INSERT.
Rule R3S is triggered by an update of the Dno attribute of EMPLOYEE,
which signifies that one or more employees have moved from one
department to another. R3S has no condition (no IF clause), so its action is
executed every time the triggering event occurs. The action updates both
the old and the new departments of the reassigned employees, adding their
salaries to the Total_sal of each new department and subtracting them from
the Total_sal of each old department.


The execution model for active rules in STARBURST uses deferred
consideration. Every rule triggered within a transaction is placed in a set,
called the conflict set, which is not considered for condition evaluation and
action execution until the transaction completes by issuing its COMMIT
WORK command.
STARBURST also allows the user to begin rule consideration explicitly in
the middle of a transaction, by means of the PROCESS RULES command.
Because multiple rules may need to be evaluated, it is necessary to specify
an order among them; the rule declaration syntax in STARBURST therefore
permits an ordering between rules, which instructs the system in which
order to consider a set of rules.
Furthermore, the transition tables (INSERTED, DELETED, and so on)
contain the net effect of all the operations within the transaction on each
table, since a transaction may apply multiple operations to the same table.
14.2.3 Oracle
The model used to specify active database rules is known as the
Event-Condition-Action (ECA) model. A rule in the ECA model has three
components:
(a) The event (or events) that triggers the rule: these events are usually
database update operations that are applied explicitly to the database.
(b) The condition that determines whether the rule action should be
executed: once the triggering event has occurred, an optional condition
may be evaluated. If no condition is specified, the action is executed
whenever the event occurs; if a condition is specified, it is evaluated
first, and only if it is true is the rule action executed.
(c) The action to be taken: the action is usually a sequence of SQL
statements, but it can also be a database transaction or an external
program that is executed automatically.
For example, figure 14.1 shows a COMPANY database on which active
rules can be defined. EMPLOYEE has the attributes Name, Social Security
Number (Ssn), Salary, Department Number (Dno, a foreign key to
DEPARTMENT) and Supervisor_ssn (a recursive foreign key to
EMPLOYEE).

Figure 14.1: Example of Active Rule

In this example, a null value is permitted for Department Number (Dno),
indicating that an employee may not be assigned to any department.
DEPARTMENT has the attributes Department Name (Dname), Department
Number (Dno), Total Salary (Total_sal) and Manager (Manager_ssn, a
foreign key to EMPLOYEE).
Active rules are applied to Total_sal because it must always equal the sum
of the salaries of the employees in each department. We first list the events
that can change Total_sal:
1. Inserting one or more new employees
2. Changing the salary of one or more existing employees
3. Reassigning an employee from one department to another
4. Deleting one or more employees
For the first event, Total_sal must be recalculated only if the new employee
is assigned to a department, that is, only if the Dno attribute is not null. The
same condition applies to the second and fourth events, where the
department of the affected employee must be determined. For the third
event an action is always performed to keep the value of Total_sal correct,
so no condition is needed (the action is always executed).
For the first, second and fourth events, the action updates the total salary of
the employee's department to reflect the added, changed or removed
salary. For the third event, two actions are needed: one to update the total
salary of the employee's old department and one to update the total salary
of the new department.


The active rules R1 to R4 for these events are shown below, followed by an additional rule R5:
(a) R1: CREATE TRIGGER Total_sal1
AFTER INSERT ON EMPLOYEE
FOR EACH ROW
WHEN ( NEW.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal + NEW.Salary
WHERE Dno = NEW.Dno;
R2: CREATE TRIGGER Total_sal2
AFTER UPDATE OF Salary ON EMPLOYEE
FOR EACH ROW
WHEN ( NEW.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal + NEW.Salary – OLD.Salary
WHERE Dno = NEW.Dno;
R3: CREATE TRIGGER Total_sal3
AFTER UPDATE OF Dno ON EMPLOYEE
FOR EACH ROW
BEGIN
UPDATE DEPARTMENT
SET Total_sal = Total_sal + NEW.Salary
WHERE Dno = NEW.Dno;
UPDATE DEPARTMENT
SET Total_sal = Total_sal – OLD.Salary
WHERE Dno = OLD.Dno;
END;

R4: CREATE TRIGGER Total_sal4
AFTER DELETE ON EMPLOYEE
FOR EACH ROW
WHEN ( OLD.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal – OLD.Salary
WHERE Dno = OLD.Dno;
(b) R5: CREATE TRIGGER Inform_supervisor1
BEFORE INSERT OR UPDATE OF Salary, Supervisor_ssn
ON EMPLOYEE
FOR EACH ROW
WHEN ( NEW.Salary > ( SELECT Salary FROM EMPLOYEE
WHERE Ssn = NEW.Supervisor_ssn ) )
inform_supervisor( NEW.Supervisor_ssn, NEW.Ssn );
The CREATE TRIGGER statement specifies the name of an active rule
(trigger): Total_sal1 for R1. The AFTER clause specifies that the rule is
triggered after the triggering event occurs; the triggering event itself (the
insertion of a new employee, for R1) follows the AFTER keyword. The ON
clause specifies the relation on which the rule is defined. FOR EACH ROW
specifies that the rule is triggered once for each affected row. The optional
WHEN clause specifies a condition to be checked after the rule is triggered
but before the action is executed. Finally, the action is given as a PL/SQL
block containing one or more SQL statements or calls to external
procedures.
The rules above illustrate a number of characteristics of active rules. First,
the events that trigger the rules are the SQL update commands: INSERT,
DELETE, and UPDATE.


14.2.4 DB2
DB2 is a relational model database server developed by IBM. There are
three DB2 products that are very similar, but not identical: DB2 for LUW
(Linux, Unix, and Windows), DB2 for z/OS (mainframe), and DB2 for iSeries
(formerly OS/400). The DB2 LUW product runs on multiple Linux and UNIX
distributions, such as Red Hat Linux, SUSE Linux, AIX, HP/UX, and Solaris,
and most Windows systems.
The syntax is:
DB2-trigger: CREATE TRIGGER <trigger-name>
{ BEFORE | AFTER } <trigger-event>
ON <table-name>
[ REFERENCING <references> ]
FOR EACH { ROW | STATEMENT }
WHEN ( <SQL-condition> )
<SQL-procedure-statements>
<trigger-event> : INSERT | DELETE | UPDATE [ ON <column-names> ]
<reference> : OLD AS <old-value-tuple-name> |
NEW AS <new-value-tuple-name> |
OLD_TABLE AS <old-value-table-name> |
NEW_TABLE AS <new-value-table-name>
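As an illustration of this syntax, the rule R1 from the Oracle example above
could be written in DB2 roughly as follows (a sketch only; some DB2
releases additionally require a MODE DB2SQL clause after FOR EACH
ROW):

CREATE TRIGGER Total_sal1
AFTER INSERT ON EMPLOYEE
REFERENCING NEW AS N
FOR EACH ROW
WHEN ( N.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal + N.Salary
WHERE Dno = N.Dno;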
14.2.5 Application of active database (Active DB)
Some possible applications of active databases can now be outlined. The
first is notification: an active database allows the occurrence of certain
conditions to be announced. For example, an active database might be
used to monitor the temperature of an industrial heating system: the
application periodically inserts temperature readings, taken directly from
temperature sensors, into the database, and active rules are triggered
whenever a reading is inserted. The rule condition checks whether the
temperature has risen above a danger level, in which case the rule action
causes an alarm to be raised.


Another application of active rules is the enforcement of integrity
constraints, by specifying the types of events that may cause a constraint to
be violated and then evaluating appropriate conditions that check whether
the constraint has actually been violated.
Hence complex application constraints, often known as business rules, can
be enforced in this way.
For example, in a college database, one rule might monitor the grade
average of each student whenever a new grade is entered, and alert the
advisor if the average of a student falls below a certain threshold. Another
rule might check that course prerequisites are fulfilled before permitting a
student to register for a course.
A further application is the maintenance of derived data, for example
keeping materialised views consistent with the base data as update
operations are applied.
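As a sketch of how a business rule of this kind might be written, using the
simplified trigger notation of this unit (the GRADE table and the
compute_gpa and inform_advisor procedures are hypothetical):

CREATE TRIGGER Check_gpa
AFTER INSERT ON GRADE
FOR EACH ROW
WHEN ( compute_gpa( NEW.Student_id ) < 2.0 )
inform_advisor( NEW.Student_id );

Here compute_gpa would recalculate the student's grade average after the
new grade is inserted, and inform_advisor would alert the advisor when the
average falls below the threshold.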
Self Assessment Questions
1. Active rules are automatically ____________ by the events that take place.
2. The main events that are defined for rules triggered are the common
SQL statements in STARBURST. (True/ False)
3. What does the CREATE TRIGGER statement specify?
a) trigger name
b) trigger class
c) trigger function
d) trigger variable
4. The model which is utilised to state active database rules is known as
____________ model.

14.3 Temporal Database


Temporal databases are used to record time-referenced data. In fact, most
database applications are temporal in nature. For example:
 Record-keeping functions (inventory management, medical records and
personnel records)
 Financial functions (banking, accounting and portfolio management)
 Scientific functions (weather monitoring)
 Scheduling functions (project management; hotel, airline and train
reservations)
All of these functions rely on temporal databases.


Temporal databases are best suited to applications in which information
must be organised according to time. Temporal databases therefore
provide a good example of the need to develop a coherent set of concepts
for application developers to use. The framing of a temporal database
(objectives, design, coding, interface and implementation) is the work of
application designers and developers.
There are numerous applications where time is an important factor in storing
the information. For example:
 Insurance, to keep record of accidents and claims.
 Healthcare, to maintain patient histories.
 Reservation systems, to check the reservation and availability of seats in
train, airline, hotel, car rental, and many more places.
 Scientific databases, where experiment outcomes need to be stored
along with the time at which they were obtained.
Even the two example databases used so far can easily be extended into
temporal applications. In the COMPANY database, for instance, it may be
desirable to keep PROJECT, JOB and SALARY histories for all employees.
The same applies to the UNIVERSITY database, which can store the grade
history of each STUDENT, including the YEAR, SEMESTER, COURSE and
SECTION of every grade.
In fact, it can safely be concluded that most database applications store
some temporal information. However, many designers try to ignore the
temporal dimension, because it adds complexity to their applications.

Different forms of Temporal databases


Temporal databases can be distinguished into various types depending on
the notion of time they support, namely valid time and transaction time.
Valid time is the time during which a fact is true in the real world.
Transaction time is the time at which a fact is recorded in the database.
A historical database stores data with respect to valid time.
A rollback database stores data with respect to transaction time.
A bitemporal database stores data with respect to both valid time and
transaction time.
As an example, a snapshot database stores only a single state of the real
world, usually the most recent state, in the context of both valid time and
transaction time. A TimeDB database stores the history of data with respect
to both valid time and transaction time.
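A minimal sketch of a valid-time (historical) relation is given below, using an
employee salary history in the spirit of the COMPANY example; the table
and column names are illustrative:

CREATE TABLE Emp_salary_history (
   Ssn        CHAR(9),
   Salary     DECIMAL(10,2),
   Valid_from DATE,   -- when this salary became true in the real world
   Valid_to   DATE    -- when it stopped being true (e.g. 9999-12-31 if current)
);

-- Salary of employee 123456789 as of 1 July 2010:
SELECT Salary
FROM Emp_salary_history
WHERE Ssn = '123456789'
AND Valid_from <= DATE '2010-07-01'
AND Valid_to > DATE '2010-07-01';

A bitemporal version of the table would add a second pair of columns, say
Transaction_from and Transaction_to, recording when each fact was stored
in, and logically deleted from, the database.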
Self Assessment Questions
5. Temporal databases are the technique which record ____________
data.
6. Financial function of temporal database include:
a) Banking
b) accounting
c) portfolio organisation
d) all of the above

Activity 1:
Explain how temporal databases help database applications organise
their information on the basis of time.

14.4 Multimedia Database


Multimedia databases give users the ability to store multimedia information
and to issue queries for retrieving it. Such information may include:
 Documents (such as articles, books and journals)
 Images (such as drawings and photographs)
 Video clips (such as newsreels, movies and home videos)
 Audio clips (such as speeches, phone messages and songs)
The primary type of database query in this setting tries to locate multimedia
sources containing particular objects of interest. For example, one user may
want to locate all video clips featuring a specific person, say Michael
Jackson, in a video database.
Another user may wish to retrieve video clips based on specific activities,
such as clips in which a goal is scored by a certain soccer player or team.
Such queries are referred to as content-based retrieval, because they
retrieve information from the multimedia sources based on a certain object
or activity in their content.
To make such retrieval fast, the multimedia database must use some model
to index and organise multimedia sources based on their contents.
However, identifying the contents of multimedia sources is a lengthy and
difficult task. Two approaches can be followed to accomplish it, as defined
below:
 Automatic analysis of the multimedia sources, carried out to identify
mathematical characteristics of their contents (a small sketch of this
approach is given after this list).
 Manual identification of the objects and activities of interest in each
multimedia source, with the sources then indexed on the basis of this
information.
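As a minimal sketch of the automatic approach, a system might store one
low-dimensional colour feature vector per image and answer similarity
queries by ranking on distance. The table below is purely illustrative; real
systems use much higher-dimensional features and multidimensional
indexes, and the LIMIT syntax varies between SQL dialects:

CREATE TABLE ImageFeature (
   image_id INT PRIMARY KEY,
   f_red    REAL,   -- average red component of the image
   f_green  REAL,   -- average green component
   f_blue   REAL    -- average blue component
);

-- The 10 images most similar to a query image with features (0.8, 0.1, 0.1),
-- ranked by squared Euclidean distance:
SELECT image_id
FROM ImageFeature
ORDER BY (f_red - 0.8)*(f_red - 0.8)
       + (f_green - 0.1)*(f_green - 0.1)
       + (f_blue - 0.1)*(f_blue - 0.1)
LIMIT 10;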
Self Assessment Questions
7. The main type of database query those are required to include locating
multimedia sources consists of certain objects of interest. (True/ False)
8. Identifying the contents of multimedia sources is an easy task. (True/
False)

14.5 Video Database Management


The video database management system (VDBMS) is built as an open-
source system. The VDBMS research group developed the extensions and
adaptations required to support full video database functionality.
The extensions to the core database consist of storage management for
video, video pre-processing for content representation and indexing,
image- and semantic-based query processing, and real-time buffer
management.
14.5.1 Storage management for video
Video database storage and buffer managers handle massive amounts of
data under real-time constraints. In VDBMS, the buffer pool is split into the
database buffer zone and the streaming zone.
The extended buffer manager handles both individual page requests and
the allocation of large segments for streaming requests made through the
stream manager. An interface between the stream manager and the buffer
manager exchanges the information used to direct buffer caching.
The storage manager supports long-running essential video operations and
serves both real-time and non-real-time requests. VDBMS manages
extended storage hierarchies that provide real-time, transparent access to
disk, buffer and tertiary storage.
A series of caching levels across buffer and disk storage improves access
to the most frequently referenced data, while a tertiary-storage server
manages access to tertiary data so that it is directly available to VDBMS.
Frequently used tertiary-storage items are identified and kept on a
dedicated disk partition, which the cache disk manager maintains and
reports on.
14.5.2 Video pre-processing for content representation and indexing
Metadata and indexes on video content can take considerably more disk
space than the video itself, because high-dimensional feature data are
gathered for every video frame and aggregated for every video shot.
The sheer volume of this metadata, stored in the database as
multi-dimensional vectors, poses serious indexing and search problems
when executing and optimising feature-based queries.
The VDBMS research group extended the indexing capability of Shore by
integrating an implementation of GiST v2.0, and modified Predator's
query-processing layer to use either Shore or GiST indexes.
VDBMS also added a vector ADT to be used by feature fields. The
statement
CREATE GSR INDEX <table> <fieldname>
creates an instance of a GiST SR-tree that can be used as an access path
for feature-matching queries on the given field.
The high-dimensional feature vectors generated by visual feature extraction
and used in image similarity searches are managed by this
multi-dimensional indexing structure. An interface is also being built to
support plug-in components for indexing methods, so that different indexing
methods can be implemented, tested and compared within VDBMS.
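For instance, using the statement given above, an index on a hypothetical
table of shot feature vectors might be created as:

CREATE GSR INDEX ShotFeature FeatureVec

where ShotFeature and FeatureVec are invented names for a table and its
vector-valued field. Feature-matching queries on FeatureVec could then
use the GiST SR-tree as their access path instead of scanning every stored
vector.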


The query processor was modified extensively to handle the new indexing
scheme and the new video query operators, and to incorporate them into
query execution plans. Query processing in a video DBMS must take video
methods and operators into account when generating and executing query
plans. Image similarity search is performed by issuing nearest-neighbour
queries against the high-dimensional access path.
In image similarity queries with multiple features, users typically present a
sample image and the database is queried for similar images; the results
should be ranked according to a combined order of similarity.
14.5.3 Image and semantic-based query processing
The VDBMS video pre-processing toolkit employs image- and
semantic-based processing to partition raw video into shots, to combine the
visual and semantic descriptors extracted from those shots, and to index
the video content for searching.
Image- and semantic-based pre-processing algorithms identify the
boundaries of video scenes, cutting the video into meaningful shots. This
can be done using a procedure that measures colour-histogram changes
between frames.
Video shots are then processed to extract MPEG-7-compliant low-level
visual feature descriptors, spatial and temporal segmentation,
camera-motion classification, representative key frames, and, additionally,
semantic annotations from domain experts. The video features and content
are captured in VDBMS along with physical metadata.
The video indices are represented in XML, in the manner of MPEG-7
multimedia description schemes.
14.5.4 Real-time buffer management
Continuous media servers support content-based search and use
main-memory buffers to stage requested media streams before transferring
them to the user.
Several studies have investigated buffering policies for media streaming.
Chang and Garcia-Molina, for example, introduced a memory-efficient
prefetching scheme that relies on determining the time offsets between
requesting streams.
In media streaming, dynamic buffer allocation reduces the memory
requirement for concurrent streams. Work on buffer management for
delay-sensitive multimedia data defines the alterations a DBMS requires in
order to support such data. Buffer allocation can also be goal-oriented, as
used for mixed database workloads where a target objective is assigned to
every workload.
Caching parts of media streams that may be referenced in the near future
enhances streaming performance in two ways:
1. It reduces the number of accesses to disk storage.
2. It minimises the waiting time before streaming is initiated.
Accurate caching decisions are, however, very hard to make. Optimal
prefetching and replacement methods fetch data just before their first
reference and replace the data that will not be referenced for the longest
time, but such policies rely on information that is usually not available for
ordinary streams. In a video database, by contrast, there is an inherent
connection between query processing and streaming: streaming choices
are generally based on query results, and the buffer manager can exploit
this link to prefetch and cache pages that are expected to be referenced. An
effective buffer-management policy therefore takes feedback from the
search engine and uses it to make more accurate replacement and
prefetching decisions.
Top-ranked query results from the query processor can be used to predict
future video streaming requests, with a weight function then determining the
candidates for caching. By incorporating knowledge of both its query and
streaming components in this way, VDBMS can achieve better caching of
media streams, minimising initial latency and reducing disk I/O.
Self Assessment Questions
9. Video database management method is used to take care of
____________ storage hierarchies.


10. In Video database management, the database ____________ and the
streaming zone are split by the buffer pool.
11. Video shots are processed to take out MPEG7-supporting low-level
visual characteristic signifiers, (True/ False)
12. Which processing is used by VDBMS video pre-processing toolkit to
partition raw video?
a) Semantic based query processing
b) Distributed query processing
c) Both
d) None of the above

Activity 2:
The Video Database Management System toolkit employs image and
semantic-based query processing to separate raw video into shots.
Analyse this statement and make a note of it.

14.6 Summary
Let us recapitulate the important points discussed in this unit:
 Active rules are triggered automatically by the occurrence of events,
such as database accesses or updates.
 ECA (Event-Condition-Action) model is used to state active database
rules.
 An application can periodically insert temperature readings taken
directly from temperature sensors into the database, and active rules
can be written that are triggered by these insertions.
 The main type of multimedia database query involves locating
multimedia sources that contain particular objects of interest.
 Video database storage and buffer managers handle massive volumes
of data under real-time constraints.
 The video pre-processing toolkit in VDBMS uses image and semantic
processing to partition raw video streams into shots.

14.7 Glossary
 Active database: A database system that can take action automatically
in response to events occurring inside or outside the database.


 Active rules: Rules that are triggered automatically by the occurrence
of events.
 Dynamic buffer allocation: Minimises the memory requirement for
concurrent media streams.
 ECA: Event-Condition-Action.
 Multimedia databases: Databases that let users store and query
various kinds of multimedia information.
 VDBMS: Video Database Management System

14.8 Terminal Questions


1. What is an active database? Explain the active rules for Oracle.
2. What is the difference between temporal and multimedia database?
3. Explain the storage management for video.
4. Discuss real time buffer management.
5. Write short notes on:
(a) Starburst
(b) Oracle
(c) DB2

14.9 Answers
Self Assessment Questions
1. Triggered
2. True
3. (a) Trigger name
4. Event-Condition-Action (ECA)
5. Time-referenced
6. d. all of the above
7. True
8. False
9. Extended
10. Buffer zone
11. True
12. a. Semantic based query processing


Terminal Questions
1. An active database detects predefined situations in the database and
triggers predefined actions when such situations occur. Refer Section
14.2 for more details.
2. A temporal database records time-referenced data, while a multimedia
database lets users store and query various types of multimedia
information. Refer Sections 14.3 and 14.4 for more details.
3. Video database storage and buffer managers handle massive volumes
of data under real-time constraints. Refer Section 14.5 for more details.
4. Continuous media servers support content-based search and use
main-memory buffers to store requested media streams before
transferring them to the user. Refer Section 14.5 for more details.
5. STARBURST, Oracle and DB2 each provide their own version of active
rules; STARBURST is a research prototype, while Oracle and DB2 are
commercial relational databases. Refer Section 14.2 for more details.

References:
 Ramakrishnan, R. & Gehrke, J. (2003) Database Management Systems,
Third Edition, McGraw-Hill, Higher Education.
 Rob, P. & Coronel, C. (2006) Database Systems: Design,
Implementation and Management, Seventh Edition, Thomson Learning.
 Silberschatz, Korth & Sudarshan (1997) Database System Concepts,
Fourth Edition, McGraw-Hill
 Navathe, E. (2000) Fundamentals of Database Systems, Third Edition,
Pearson Education Asia
E-reference
 portal.aauj.edu/.../database/delphi_database_application_developers_...
retrieved on May 14, 2012.
 deep.yweb.sk/dbs/p63-paton.pdf retrieved on May 14, 2012.
 www.cs.arizona.edu/~rts/pubs/LNCS639.pdf retrieved on May 14, 2012.

_____________________
