
INFOLINK UNIVERSITY COLLEGE

DILLA CAMPUS

DEPARTMENT OF INFORMATION TECHNOLOGY

Advanced Database Systems Module

Meskele Yirgalem

(MSc in Computer Science and Networking)

Dilla, Ethiopia

Nov, 2022
Advanced Database Systems

Contents
1. Concepts for Object-Oriented Databases
   1.1. Why Object-Orientation in Databases?
   1.2. Object-Oriented Databases (ODBMS)
   1.3. Object Identity
   1.4. Object Structure
   1.5. Type Constructors
   1.6. Type (Class) Hierarchy
2. Algorithms for Query Processing and Optimization
   2.1. Selection (or Restriction)
   2.2. Projection
   2.3. Combining Selection and Projection
   2.4. Union
   2.5. Set Difference
   2.6. Intersection
   2.7. Cartesian Product
   2.8. Join Operations
   2.9. Overview of Query Processing
3. Transaction Processing
   3.1. Introduction to Transaction Processing
   3.2. Transaction Processing Concepts and Theory
   3.3. DB Architectures
   3.4. Concurrency Control
4. Concurrency Control Techniques
   4.1. Locking and Timestamping
5. Database Recovery Techniques
   5.1. Introduction to Database Recovery
   5.2. Log-Based Recovery Method
2|Page IT dep’t Meskele Y .


   5.3. Review Questions
6. Database Security
   6.1. Introduction to Database Security Issues
   6.2. Database Security and the DBA
   6.3. Access Protection, User Accounts, and Database Audits
      6.3.1. Types of Discretionary Privileges
      6.3.5. Specifying Limits on Propagation of Privileges
7. Distributed Database Systems
   7.1. Basic Concepts
   7.2. Database Systems: Levels of Data and Process Distribution
   7.3. Distributed Database Systems
   7.4. DDBMS Advantages and Disadvantages
   7.5. DDBMS Functions
   7.6. Distributed Database Transparency Features
   7.7. Distributed Concurrency Control
   7.8. Distributed Database Design


Chapter One

1. Concepts for object-oriented Databases


1.1 Overview of Object Database Concepts
The term object-oriented, abbreviated OO or O-O, has its origins in OO programming
languages, or OOPLs. Today OO concepts are applied in the areas of databases, software
engineering, knowledge bases, artificial intelligence, and computer systems in general.
OOPLs have their roots in the SIMULA language, which was proposed in the late 1960s.
The programming language Smalltalk, developed at Xerox PARC in the 1970s, was one
of the first languages to explicitly incorporate additional OO concepts, such as message
passing and inheritance. It is known as a pure OO programming language, meaning that it
was explicitly designed to be object-oriented. This contrasts with hybrid OO programming
languages, which incorporate OO concepts into an already existing language.
What is an object?
An object typically has two components: state (value) and behavior (operations). It can
have a complex data structure as well as specific operations defined by the programmer.
Objects in an OOPL exist only during program execution; therefore, they are called
transient objects.
An OO database can extend the existence of objects so that they are stored permanently in
a database, and hence the objects become persistent objects that exist beyond program
termination and can be retrieved later and shared by other programs. In other words, OO
databases store persistent objects permanently in secondary storage, and allow the sharing
of these objects among multiple programs and applications. This requires the incorporation
of other well-known features of database management systems, such as indexing
mechanisms to efficiently locate the objects, concurrency control to allow object sharing
among concurrent programs, and recovery from failures.

1.2 Object Oriented Databases (ODBMS)


Definition. An object database management system (ODBMS, also referred to as
object-oriented database management system or OODBMS) is a database management
system (DBMS) that supports the modelling and creation of data as objects.


An object database (also object-oriented database management system, OODBMS) is a
database management system in which information is represented in the form of objects, as
used in object-oriented programming. It stores data together with the appropriate methods
for accessing it (i.e. encapsulation).
This enables:
• Complex data types to be stored (e.g. CAD applications)
• A wide range of data types in the same database (e.g. multimedia applications)
• Objects to be followed more easily through time (e.g. "evolutionary applications")
Applications
The first areas where OODBMSs were widely used were:
CASE (computer-aided software engineering)
CAD (computer-aided design)
CAM (computer-aided manufacturing)
They are increasingly used in:
Telecommunications
Healthcare
Finance
Multimedia
Text/document/quality management

 Terminologies
Data is formally represented as instances of one or more relations.
A relation is a table.
An attribute is a named column of a relation, representing a "property."
A tuple is a row of the table, representing a "record."

Informal term    Formal term
Table            Relation
Column           Attribute / Domain
Row              Tuple


1.3 Object Identity

An object retains its identity even if some or all of the values of its variables or the
definitions of its methods change over time.
Object identity is a stronger notion of identity than that found in programming languages
or data models not based on object orientation. Identity may be based on:
Value: a data value; e.g. the primary key value used in relational systems.
Name: supplied by the user; used for variables in procedures.
Built-in: identity built into the data model or programming language.
 No user-supplied identifier is required.
 This is the form of identity used in object-oriented systems.

1.4 Object Structure

The state (current value) of a complex object may be constructed from other objects (or
other values) by using certain type constructors.
An object can be represented by a triple (i, c, v), where:
i is a unique object identifier,
c is a type constructor, and
v is the object state.
Complex objects are built by applying constructors to simpler objects, including sets, lists
and tuples. An example is illustrated below:
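Since the original illustration is not reproduced here, the (i, c, v) representation can be sketched in Python. The helper below and all object names are illustrative assumptions, not part of any real ODBMS API:

```python
from itertools import count

_oid = count(1)  # source of unique object identifiers (the i component)

def make_object(constructor, state):
    """Return an (i, c, v) triple: identifier, type constructor, state."""
    return (next(_oid), constructor, state)

# Atomic objects built with the atom constructor
name = make_object("atom", "Smith")
salary = make_object("atom", 40000)

# A complex object built with the tuple constructor; its state refers
# to the simpler objects by their identifiers
employee = make_object("tuple", {"name": name[0], "salary": salary[0]})

# A complex object built with the set constructor
cs_dept = make_object("set", {employee[0]})
```

The key point of the sketch is that complex objects refer to their component objects through identifiers, not by embedding copies of their values.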


An object has associated with it:

• A set of variables that contain the data for the object. The value of each variable is
itself an object.
• A set of messages to which the object responds; each message may have zero, one, or
more parameters.
• A set of methods, each of which is a body of code to implement a message; a method
returns a value as the response to the message.

The physical representation of data is visible only to the implementer of the object.
Messages and responses provide the only external interface to an object.
The term message does not necessarily imply physical message passing; messages can
be implemented as procedure invocations.

1.5 Type Constructors:

In OO databases, the state (current value) of a complex object may be constructed from
other objects (or other values) by using certain type constructors.
The three most basic constructors are atom, tuple, and set.
Other commonly used constructors include list, bag, and array.
The atom constructor is used to represent all basic atomic values, such as integers,
real numbers, character strings, Booleans, and any other basic data types that the
system supports directly.

Encapsulation
Data and functions are combined in one object.
The implementation of operations and the object structure are hidden.

Methods
Methods are programs written in a general-purpose language with the following features:
Only variables in the object itself may be referenced directly.
Data in other objects are referenced only by sending messages.
Methods can be read-only or update methods; read-only methods do not change the
value of the object.
Strictly speaking, every attribute of an entity must be represented by a variable and two
methods, one to read and the other to update the attribute.


e.g. the attribute address is represented by a variable address and two messages,
get-address and set-address.
For convenience, many object-oriented data models permit direct access to variables of
other objects.
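A minimal Python sketch of this convention, using the address attribute from the example (the class name Person is invented for illustration):

```python
class Person:
    """Each attribute is backed by one variable plus two methods."""

    def __init__(self, address):
        self._address = address  # the variable holding the attribute

    def get_address(self):
        # read-only method: returns the value without changing the object
        return self._address

    def set_address(self, new_address):
        # update method: changes the value of the object
        self._address = new_address

p = Person("Dilla, Ethiopia")
p.set_address("Addis Ababa")
```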

Inheritance
Data and functions are organized in a hierarchy.
Objects inherit the characteristics and functions of their ancestor objects.
Sharing of data within the hierarchy scope supports code reusability.
Inheritance is a mechanism of reusability, the most powerful concept of OO programming.
Ex1.


Eg2.
Animals: a head and a body; feed
Mammals: a head and a body; feed + four legs; sit
Fish: a head and a body; feed + swim

Inheritance allows one class to be defined as a special case of a more general class. These
special classes are known as subclasses, and the more general classes are known as
superclasses. By default, a subclass inherits all the properties (both state and behavior) of
its superclass, and it also defines its own unique properties. In addition, a subclass can
redefine inherited methods. If class B is a subclass of class A, then every instance of B is
automatically an instance of A; however, the reverse is not true.
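The Animals/Mammals/Fish example above can be sketched directly in Python (method names and return values are illustrative):

```python
class Animal:
    """Superclass: a head and a body, feed."""
    def feed(self):
        return "feeding"

class Mammal(Animal):
    """Subclass: inherits feed, adds four legs and sit."""
    def sit(self):
        return "sitting"

class Fish(Animal):
    """Subclass: inherits feed, adds swim."""
    def swim(self):
        return "swimming"

# Every instance of a subclass is automatically an instance of its
# superclass; the reverse is not true.
assert isinstance(Mammal(), Animal)
assert not isinstance(Animal(), Mammal)
```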

1.6 Type (class) Hierarchy

A type (class) hierarchy classifies entities into subclasses. There are two approaches:
 Specialization: the process of specifying subclasses of an entity set (the superclass)
that share some characteristics. The superclass is defined first, followed by the subclasses.
 Generalization: the opposite of specialization. The subclasses are defined first, and the
more general superclass is then derived from them.


Chapter 2

2. Algorithms for query processing and optimization


2.1 Selection (Or Restriction)
Selection is a unary operator.
The selection operator is sigma (σ).

The selection operation acts like a filter on a relation by returning only a certain number
of tuples.
The resulting relation will have the same degree as the original relation.
The resulting relation may have fewer tuples than the original relation.
The tuples to be returned depend on a condition that is part of the selection operator.
σC(R) returns only those tuples in R that satisfy condition C.
A condition C can be made up of any combination of comparison or logical operators
that operate on the attributes of R.
Comparison operators: =, ≠, <, ≤, >, ≥
Logical operators: ∧ (AND), ∨ (OR), ¬ (NOT)

Selection Examples

Assume the relation EMP has the following tuples:

Name    Office   Dept   Rank
Smith   400      CS     Assistant
Jones   220      Econ   Adjunct
Green   160      Econ   Assistant
Brown   420      CS     Associate
Smith   500      Fin    Associate

Select only those Employees in the CS department:


σ Dept = 'CS' (EMP)

Result:
Name    Office   Dept   Rank
Smith   400      CS     Assistant
Brown   420      CS     Associate

Select only those Employees with last name Smith who are assistant professors:

σ Name = 'Smith' ∧ Rank = 'Assistant' (EMP)

Result:
Name    Office   Dept   Rank
Smith   400      CS     Assistant


Select only those Employees who are either Assistant Professors or in the Economics
department:

σ Rank = 'Assistant' ∨ Dept = 'Econ' (EMP)

Result:
Name    Office   Dept   Rank
Smith   400      CS     Assistant
Jones   220      Econ   Adjunct
Green   160      Econ   Assistant


Select only those Employees who are neither Adjuncts nor in the CS department:

σ ¬(Rank = 'Adjunct' ∨ Dept = 'CS') (EMP)

Result:
Name    Office   Dept   Rank
Green   160      Econ   Assistant
Smith   500      Fin    Associate
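The selection operator can be sketched in Python over the EMP relation above, modelling tuples as dictionaries. This is a sketch of the algebra, not of a real query engine:

```python
EMP = [
    {"Name": "Smith", "Office": 400, "Dept": "CS",   "Rank": "Assistant"},
    {"Name": "Jones", "Office": 220, "Dept": "Econ", "Rank": "Adjunct"},
    {"Name": "Green", "Office": 160, "Dept": "Econ", "Rank": "Assistant"},
    {"Name": "Brown", "Office": 420, "Dept": "CS",   "Rank": "Associate"},
    {"Name": "Smith", "Office": 500, "Dept": "Fin",  "Rank": "Associate"},
]

def select(condition, relation):
    """sigma_C(R): keep only the tuples of R that satisfy condition C."""
    return [t for t in relation if condition(t)]

# Employees in the CS department
cs = select(lambda t: t["Dept"] == "CS", EMP)

# Employees who are neither Adjuncts nor in the CS department
others = select(lambda t: not (t["Rank"] == "Adjunct" or t["Dept"] == "CS"),
                EMP)
```

Both results have the same degree (the same four attributes) as EMP, but fewer tuples, as the definition above requires.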

2.2 Projection
• Projection is also a unary operator.
• The projection operator is pi (π).
• Projection limits the attributes that will be returned from the original relation.


• The general syntax is: π attributes (R), where attributes is the list of attributes to be
displayed and R is the relation.
• The resulting relation will have the same number of tuples as the original relation
(unless duplicate tuples are produced and then eliminated).
• The degree of the resulting relation may be equal to or less than that of the original
relation.

 Projection Examples
Assume the same EMP relation above is used.
Project only the names and departments of the employees:

π Name, Dept (EMP)

Result:
Name    Dept
Smith   CS
Jones   Econ
Green   Econ
Brown   CS
Smith   Fin
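Projection can be sketched the same way; note how duplicate tuples are eliminated when the projected attributes no longer distinguish two rows (the column mapping below is an illustrative choice):

```python
EMP = [
    ("Smith", 400, "CS", "Assistant"),
    ("Jones", 220, "Econ", "Adjunct"),
    ("Green", 160, "Econ", "Assistant"),
    ("Brown", 420, "CS", "Associate"),
    ("Smith", 500, "Fin", "Associate"),
]
COLS = {"Name": 0, "Office": 1, "Dept": 2, "Rank": 3}

def project(attributes, relation):
    """pi_attributes(R): keep the listed columns, dropping duplicates."""
    result = []
    for t in relation:
        row = tuple(t[COLS[a]] for a in attributes)
        if row not in result:  # duplicate elimination
            result.append(row)
    return result

# All five (Name, Dept) pairs are distinct, so no tuples are lost...
name_dept = project(["Name", "Dept"], EMP)
# ...but projecting on Dept alone collapses five rows to three.
depts = project(["Dept"], EMP)
```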
2.3 Combining Selection and Projection
• The selection and projection operators can be combined to perform both operations.
• Show the names of all employees working in the CS department:

π Name (σ Dept = 'CS' (EMP))

Result:
Name
Smith
Brown


Show the name and rank of those Employees who are neither Adjuncts nor in the CS
department:

π Name, Rank (σ ¬(Rank = 'Adjunct' ∨ Dept = 'CS') (EMP))

Result:
Name    Rank
Green   Assistant
Smith   Associate
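The same combined query can be checked in SQL using Python's built-in sqlite3 module (the table definition below is an illustrative guess at the schema, not taken from the module):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMP (Name TEXT, Office INT, Dept TEXT, Rank TEXT)")
con.executemany("INSERT INTO EMP VALUES (?, ?, ?, ?)", [
    ("Smith", 400, "CS",   "Assistant"),
    ("Jones", 220, "Econ", "Adjunct"),
    ("Green", 160, "Econ", "Assistant"),
    ("Brown", 420, "CS",   "Associate"),
    ("Smith", 500, "Fin",  "Associate"),
])

# pi_Name(sigma_Dept='CS'(EMP)): the projection appears in the SELECT
# list, the selection in the WHERE clause.
rows = con.execute("SELECT Name FROM EMP WHERE Dept = 'CS'").fetchall()
```

This illustrates the point made in section 2.9: SQL is declarative, so the user states the selection and projection while the DBMS chooses how to evaluate them.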
2.4 Union
R ∪ S: The union of two relations R and S defines a relation that contains all the tuples of
R, or S, or both R and S, duplicate tuples being eliminated. R and S must be
union-compatible:

 Same number of fields

 Corresponding fields have the same type.

Consider the following relations R and S:

R
First   Last    Age
Bill    Smith   22
Sally   Green   28
Mary    Keen    23
Tony    Jones   32

S
First     Last      Age
Forrest   Gump      36
Sally     Green     28
DonJuan   DeMarco   27

R ∪ S
First     Last      Age
Bill      Smith     22
Sally     Green     28
Mary      Keen      23
Tony      Jones     32
Forrest   Gump      36
DonJuan   DeMarco   27

2.5 Set Difference

R − S: The set difference operation defines a relation consisting of the tuples that are in
relation R, but not in S. R and S must be union-compatible.

R − S
First   Last    Age
Bill    Smith   22
Mary    Keen    23
Tony    Jones   32

2.6 Intersection
R ∩ S: The intersection operation defines a relation consisting of the set of all tuples that
are in both R and S. R and S must be union-compatible.

R ∩ S
First   Last    Age
Sally   Green   28
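Because the relations are union-compatible, the three set operations map directly onto Python's set operators, with tuples standing for rows:

```python
R = {("Bill", "Smith", 22), ("Sally", "Green", 28),
     ("Mary", "Keen", 23), ("Tony", "Jones", 32)}
S = {("Forrest", "Gump", 36), ("Sally", "Green", 28),
     ("DonJuan", "DeMarco", 27)}

union = R | S         # R U S: all tuples of R or S, duplicates eliminated
difference = R - S    # R - S: tuples in R but not in S
intersection = R & S  # R n S: tuples in both R and S
```

Using Python sets makes the duplicate elimination automatic, which is exactly the behaviour the union definition above requires.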

2.7 Cartesian product

The Cartesian product, also referred to as a cross-join, returns all the rows in all the tables
listed in the query. Each row in the first table is paired with all the rows in the second
table. This happens when there is no join relationship defined between the two tables.


R × S: The Cartesian product operation defines a relation that is the concatenation of
every tuple of relation R with every tuple of relation S.

R
First   Last    Age
Bill    Smith   22
Mary    Keen    23
Tony    Jones   32

S
Dinner    Dessert
Steak     Ice Cream
Lobster   Cheesecake

R × S
First   Last    Age   Dinner    Dessert
Bill    Smith   22    Steak     Ice Cream
Bill    Smith   22    Lobster   Cheesecake
Mary    Keen    23    Steak     Ice Cream
Mary    Keen    23    Lobster   Cheesecake
Tony    Jones   32    Steak     Ice Cream
Tony    Jones   32    Lobster   Cheesecake
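A sketch of R × S with itertools; the result has |R| × |S| = 3 × 2 = 6 rows, matching the table above:

```python
from itertools import product

R = [("Bill", "Smith", 22), ("Mary", "Keen", 23), ("Tony", "Jones", 32)]
S = [("Steak", "Ice Cream"), ("Lobster", "Cheesecake")]

# Concatenate every tuple of R with every tuple of S
RxS = [r + s for r, s in product(R, S)]
```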

2.8 Join Operations

Typically we want only combinations of the Cartesian product that satisfy certain
conditions, and so we would normally use a join operation instead of the Cartesian product
operation.

Join is a derivative of the Cartesian product, equivalent to performing a selection
operation, using the join predicate as the selection formula, over the Cartesian product of
the two operand relations.

@ The join operation is one of the most difficult operations to implement efficiently in an
RDBMS, and is one of the reasons why relational systems have intrinsic performance
problems.

There are various forms of join operations, each with subtle differences:

 Theta join (Ө-join)


 Equijoin(a particular type of Theta join)

 Natural join
 Outer join

 Semijoin

Theta join (Ө-join)

R ⋈F S: The theta join operation defines a relation that contains tuples satisfying the
predicate F from the Cartesian product of R and S.

The predicate F is of the form R.ai Ө S.bi, where Ө may be one of the comparison
operators (<, ≤, >, ≥, ≠).

In the case where the predicate F contains only equality (=), the term Equijoin is used
instead.

Natural Join

R ⋈ S: The natural join is an Equijoin of the two relations R and S over all common
attributes x. One occurrence of each common attribute is eliminated from the result.

Ex. List the names and comments of all clients who have viewed a property for rent.

The Equijoin is followed by a projection that removes one of the duplicated attributes,
giving the natural join:

(∏clientNo,fName,lName(Client)) ⋈ (∏clientNo,propertyNo,comment(Viewing))
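A natural join over the single common attribute clientNo can be sketched as follows; the sample Client and Viewing rows are invented for illustration:

```python
Client = [
    {"clientNo": "CR56", "fName": "Aline", "lName": "Stewart"},
    {"clientNo": "CR62", "fName": "Mary",  "lName": "Tregear"},
]
Viewing = [
    {"clientNo": "CR56", "propertyNo": "PG36", "comment": "too small"},
    {"clientNo": "CR56", "propertyNo": "PA14", "comment": "no dining room"},
]

def natural_join(R, S, common):
    """Equijoin on the common attribute, which is kept only once."""
    result = []
    for r in R:
        for s in S:
            if r[common] == s[common]:  # the implicit join predicate
                merged = dict(r)
                merged.update(s)        # common attribute appears once
                result.append(merged)
    return result

joined = natural_join(Client, Viewing, "clientNo")
```

The nested loops mirror the definition: a Cartesian product restricted by the equality predicate, with the duplicate clientNo column merged away.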

Outer Join

R ⟕ S: The (left) outer join is a join in which tuples from R that do not have matching
values in the common attributes of S are also included in the result relation. Missing
values in the second relation are set to null.

Ex. Produce a status report on property viewings: @ both properties that have been
viewed, with comments, and those that have not been viewed.

(∏propertyNo,street,city(PropertyForRent)) ⟕ Viewing

Semi Join

R ⋉F S: The semijoin operation defines a relation that contains the tuples of R that
participate in the join of R with S.


It decreases the number of tuples that need to be handled to form the join.

@ It is particularly useful for computing joins in distributed systems.

Ex. List complete details of all staff who work at a branch in Glasgow.

Staff ⋉ Staff.branchNo = Branch.branchNo (σ city = 'Glasgow' (Branch))

Division Operation

R ÷ S: Assume relation R is defined over the attribute set A and relation S is defined
over the attribute set B such that B ⊆ A. Let C = A − B; that is, C is the set of attributes
of R that are not attributes of S. We then have the following definition of the Division
operation:

The Division operation defines a relation over the attributes C that consists of the set of
tuples from R that match the combination of every tuple in S.

Ex. Identify all clients who have viewed all properties with three rooms.

(∏clientNo,propertyNo(Viewing)) ÷ (∏propertyNo(σ rooms = 3 (PropertyForRent)))
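Division can be sketched over (clientNo, propertyNo) pairs; in the invented data below, only CR56 has viewed every property in S:

```python
# R: which client viewed which property; S: the required properties
R = {("CR56", "PG4"), ("CR56", "PA14"), ("CR62", "PA14"), ("CR76", "PG4")}
S = {"PG4", "PA14"}

def divide(R, S):
    """R / S: values c such that (c, s) is in R for every s in S."""
    candidates = {c for (c, _) in R}
    return {c for c in candidates if all((c, s) in R for s in S)}

viewed_all = divide(R, S)  # only CR56 viewed both properties
```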

2.9 OVERVIEW OF QUERY PROCESSING

There are many ways that a complex query can be performed, and one of the aims of
query processing is to determine which one is the most cost-effective.

In declarative languages such as SQL, the user specifies what data is required rather than
how it is to be retrieved.

 Query processing: the activities involved in retrieving data from the database.

Giving the DBMS the responsibility for selecting the best strategy prevents users from
choosing strategies that are known to be inefficient, and gives the DBMS more control
over system performance.

The aims of query processing are to transform a query written in a high-level language,
typically SQL, into a correct and efficient execution strategy expressed in a low-level
language (implementing the relational algebra), and to execute the strategy to retrieve the
required data.

Query processing can be divided into four main phases:

1. Decomposition (consisting of parsing and validation)
2. Optimization
3. Code generation
4. Execution

[Figure: the phases of query processing. A query in a high-level language (typically SQL)
is decomposed, with the help of the system catalog, into a relational algebra expression;
query optimization and code generation then produce the generated code, which the
runtime query processor executes against the database to produce the query output.]

Query Decomposition

The aim of query decomposition is to transform a high-level query into a relational
algebra query, and to check that the query is syntactically and semantically correct.

The typical stages of query decomposition are:

1. Analysis
2. Normalization
3. Semantic analysis
4. Simplification
5. Query restructuring

1. Analysis

In this stage, the query is lexically and syntactically analyzed using the techniques of
programming language compilers

In addition, this stage verifies that the relations and attributes specified in the query are
defined in the system catalog.

It also verifies that any operations applied to database objects are appropriate for the
object type.

e.g.


SELECT StaffNumber
FROM Staff
WHERE position > 10;   -- rejected: data type mismatch (position is a string)

On completion of this stage, the high-level query has been transformed into some
internal representation that is more suitable for processing.

The internal form that is typically chosen is some kind of tree, which is constructed as
follows:

• A leaf node is created for each base relation in the query.
• A non-leaf node is created for each intermediate relation produced by a relational
algebra operation.
• The root of the tree represents the result of the query.
• The sequence of operations is directed from the leaves to the root.

Ex. Find all managers who work at a Wolkite branch:

                   ⋈ s.branchNo = b.branchNo              (root)
                  /                          \
σ s.position = 'Manager'           σ b.city = 'Wolkite'   (intermediate operations)
          |                                  |
        Staff                             Branch          (leaves)

2. Normalization

Converts the query into a normalized form that can be more easily manipulated.

Conjunctive normal form: a sequence of conjuncts that are connected with the ∧ (AND)
operator.

@ A conjunctive selection contains only those tuples that satisfy all conjuncts.

Example

(position = 'Manager' ∨ salary > 20000) ∧ branchNo = 'B003'

Disjunctive normal form: a sequence of disjuncts that are connected with the ∨ (OR)
operator.

@ A disjunctive selection contains those tuples formed by the union of all tuples that
satisfy the disjuncts.

Example

(position = 'Manager' ∧ branchNo = 'B003') ∨ (salary > 20000 ∧ branchNo = 'B003')

3. Semantic Analysis

The objective of semantic analysis is to reject normalized queries that are incorrectly
formulated or contradictory.

A query is incorrectly formulated if components do not contribute to the generation of
the result, which may happen if some join specifications are missing.

A query is contradictory if its predicate cannot be satisfied by any tuple. E.g.:

(position = 'Manager' ∧ position = 'Assistant') ∨ salary > 20000

@ Since the first disjunct is a contradiction, the predicate is equivalent to
F ∨ salary > 20000 and could be simplified to (salary > 20000).

4. Simplification

The objectives of the simplification stage are:

• to detect redundant qualifications,
• to eliminate common subexpressions, and
• to transform the query to a semantically equivalent but more easily and efficiently
computed form.

Typically:

• access restrictions,
• view definitions, and
• integrity constraints

are considered at this stage, some of which may also introduce redundancy.

* If the user does not have the appropriate access to all the components of the query,
the query must be rejected.
the query must be rejected.

View definition:

CREATE VIEW Staff3 AS
SELECT staffNo, fName, lName, salary, branchNo
FROM Staff
WHERE branchNo = 'B003';

A query on the view:

SELECT *
FROM Staff3
WHERE (branchNo = 'B003' AND salary > 20000);

After view resolution this becomes:

SELECT staffNo, fName, lName, salary
FROM Staff
WHERE (branchNo = 'B003' AND salary > 20000) AND branchNo = 'B003';

which simplifies to the predicate:

WHERE (branchNo = 'B003' AND salary > 20000)

 Assuming that the user has the appropriate access privileges, an initial optimization is to
apply the well-known idempotency rules of Boolean algebra, such as:

• p ∧ p = p

• p ∧ false = false

• p ∧ true = p

• p ∧ ¬p = false

• p ∧ (p ∨ q) = p

• p ∨ p = p

• p ∨ false = p

• p ∨ true = true

• p ∨ ¬p = true

• p ∨ (p ∧ q) = p
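These identities are easy to verify exhaustively; the loop below checks each rule over every truth assignment of p and q:

```python
from itertools import product

for p, q in product([True, False], repeat=2):
    assert (p and p) == p              # p ^ p = p
    assert (p and False) == False      # p ^ false = false
    assert (p and True) == p           # p ^ true = p
    assert (p and not p) == False      # p ^ ~p = false
    assert (p and (p or q)) == p       # p ^ (p v q) = p
    assert (p or p) == p               # p v p = p
    assert (p or False) == p           # p v false = p
    assert (p or True) == True         # p v true = true
    assert (p or not p) == True        # p v ~p = true
    assert (p or (p and q)) == p       # p v (p ^ q) = p
```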

Integrity constraints

Ex. Every manager must earn a salary greater than 20000
(position = 'Manager' ⇒ salary > 20000):

CREATE ASSERTION OnlyManagerSalaryHigh
CHECK ((position <> 'Manager' AND salary < 20000)
OR (position = 'Manager' AND salary > 20000));

5. Query restructuring
In the final stage, the query is restructured to provide a more efficient implementation.

Heuristic approach: uses transformation rules to convert one relational algebra
expression into an equivalent form that is known to be more efficient.

Ex. Perform the selection before the join, e.g. select the managers working at the
Wolkite branch before joining Staff with Branch.

 QUERY OPTIMIZATION

Query optimization is the activity of choosing an efficient execution strategy for
processing a query. The aim of query optimization is to choose the strategy that
minimizes resource usage. Generally, we try to reduce the total execution time of the
query, which is the sum of the execution times of all the individual operations that make
up the query.

However, resource usage may also be viewed as the response time of the query, in which
case we concentrate on maximizing the number of parallel operations (pipelining).

Both methods of query optimization depend on database statistics to evaluate properly the
different options that are available.

The accuracy and currency of these statistics have a significant bearing on the
efficiency of the execution strategy chosen.

The statistics cover information about relations, attributes, and indexes.


Ex. The system catalog may store statistics giving:

 the cardinality of relations,

 the number of distinct values for each attribute, and

 the number of levels in a multilevel index.


Chapter three

3. Transaction processing
3.1. Introduction to Transaction processing
A transaction is an executing program that forms a logical unit of database processing.
A transaction includes one or more database access operations, such as insertion,
deletion, modification, or retrieval operations.
The database operations that form a transaction can either be embedded within an
application program or they can be specified interactively via a high-level query language
such as SQL.
One way of specifying the transaction boundaries is by specifying explicit begin
transaction and end transaction statements in an application program; in this case, all
database access operations between the two are considered as forming one transaction. If
the database operations in a transaction do not update the database but only retrieve data,
the transaction is called a read-only transaction; otherwise it is known as a read-write
transaction.

A transaction is an atomic unit of work that should either be completed in its entirety or not
done at all. For recovery purposes, the system needs to keep track of when each transaction
starts, terminates, and commits or aborts. Therefore, the recovery manager of the DBMS
needs to keep track of the following operations:
 BEGIN_TRANSACTION: This marks the beginning of transaction execution.
 READ or WRITE: These specify read or write operations on the database items that are
executed as part of a transaction.
 END_TRANSACTION: This specifies that READ and WRITE transaction operations
have ended and marks the end of transaction execution. However, at this point it may be
necessary to check whether the changes introduced by the transaction can be permanently
applied to the database (committed) or whether the transaction has to be aborted because
it violates serializability.


 COMMIT_TRANSACTION: This signals a successful end of the transaction so that


any changes (updates) executed by the transaction can be safely committed to the
database and will not be undone.
 ROLLBACK (or ABORT): This signals that the transaction has ended unsuccessfully,
so that any changes or effects that the transaction may have applied to the database must
be undone.
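As a concrete illustration of these operations, here is a minimal sketch in Python using the standard sqlite3 module; the account table and the transfer function are hypothetical, invented purely for this example:

```python
import sqlite3

# Hypothetical account table, for illustration only.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0), (2, 50.0)")

def transfer(conn, src, dst, amount):
    """One transaction: either both updates are applied, or neither."""
    conn.execute("BEGIN")                           # BEGIN_TRANSACTION
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, src))                 # WRITE
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                     (amount, dst))                 # WRITE
        conn.execute("COMMIT")                      # COMMIT_TRANSACTION
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")                    # ROLLBACK: undo all effects
        return False

transfer(conn, 1, 2, 30.0)
print(conn.execute("SELECT balance FROM account ORDER BY id").fetchall())
# [(70.0,), (80.0,)]
```

With isolation_level=None the connection is in autocommit mode, so the explicit BEGIN/COMMIT/ROLLBACK statements mark the transaction boundaries themselves.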

3.2 Transaction processing concepts and theory


A DBMS should provide closely related functions intended to ensure that the DB is reliable and remains in a consistent state, namely:

Transaction support
Concurrency control service
Recovery service

Both concurrency control and recovery are required to protect the DB from data inconsistencies and data loss.

If concurrent operations are not controlled, the accesses may interfere with one another and the DB can become inconsistent.

To overcome this, the DBMS implements a concurrency control protocol (set of standard
rules) that prevents DB access from interfering with one another.

Transaction: An action, or series of actions, carried out by a single user or application program, which reads or updates the contents of the database.

ex.
read(staffNo=x, salary)
salary = salary * 1.1
write(staffNo=x, newsalary)

 Read and write are operations on the DB

If all these updates are not made, referential integrity will be lost and the DB will be in an
inconsistent state.

A transaction should always transform the DB from one consistent state to another, although we accept that consistency may be violated while the transaction is in progress (e.g. tuples with the new staffNo and the old one may coexist temporarily).

At the end of the transaction, all necessary tuples should have the new staffNo value.

A transaction can have one of two outcomes.

If it completes successfully, the transaction is said to have committed and the DB reaches a new consistent state.
On the other hand, if the transaction does not execute successfully, the transaction is aborted.
 If a transaction is aborted, the DB must be restored to the consistent state it was in before the transaction started. Such a transaction is rolled back or undone.

A committed transaction cannot be aborted. If the committed transaction was a mistake, a compensating transaction must be designed to reverse its effects.

The keywords BEGIN TRANSACTION, ROLLBACK and COMMIT are available in many DMLs to delimit transactions.

A transaction is in the PARTIALLY COMMITTED state after its final statement has been executed.

Properties of transaction

The four basic properties of transaction are:

Atomicity: the “all or nothing” property. A transaction is an indivisible unit that is either performed in its entirety or not performed at all. (Maintained by the DBMS.)
Consistency: a transaction must transform the DB from one consistent state to another consistent state. (Maintained by application developers or the DBMS.)
Isolation: transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions.
Durability: the changes of a committed transaction should be permanent.

3.3 DB architectures
Four high level DB modules that handle transaction concurrency control and recovery:

Transaction manager: coordinates transactions on behalf of application programs.

Scheduler: Module responsible for implementing a particular strategy for concurrency


control.
Recovery manager: Responsible for restoring the DB to a consistent state in the case of failure.
Buffer manager: Responsible for the transfer of data between disk storage and main memory.

3.4 Concurrency control


The process of managing simultaneous operations on the database without having them interfere with one another.

A major objective in developing a DB is to enable many users to access shared data


concurrently

When only reading data, users cannot interfere with one another. When writing (updating), there may be interference that can result in inconsistencies.

Potential problems caused by concurrency

The lost update problem


Uncommitted dependency problem
Inconsistent analysis problem

The lost update problem

An apparently successfully completed update operation by one user can be overridden by


another user.
Time  T1                 T2                 balx
t1                       begin transaction  100
t2    begin transaction  read(balx)         100
t3    read(balx)         balx = balx + 100  100
t4    balx = balx - 10   write(balx)        200
t5    write(balx)        commit             90
t6    commit
The loss of T2’s update is avoided by preventing T1 from reading the value of balx until after T2’s update has been completed.
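The schedule in the table can be replayed in a few lines of Python; the t1_local/t2_local variables stand for each transaction's private workspace (an illustrative simulation, not DBMS code):

```python
balx = 100                  # shared database item

t2_local = balx             # t2: T2 reads balx (100)
t1_local = balx             # t3: T1 reads balx (100)
t2_local = t2_local + 100   # T2 adds 100 to its private copy
balx = t2_local             # t4: T2 writes 200
t1_local = t1_local - 10    # T1 subtracts 10 from its stale copy
balx = t1_local             # t5: T1 writes 90 -- T2's update is lost

print(balx)                 # 90, though a serial execution would give 190
```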

Uncommitted dependency problem

The uncommitted dependency (or dirty read) problem occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed.

Time  T3                 T4                 balx
t1                       begin transaction  100
t2                       read(balx)         100
t3                       balx = balx + 100  100
t4    begin transaction  write(balx)        200
t5    read(balx)                            200
t6    balx = balx - 10   rollback           100
t7    write(balx)                           190
t8    commit                                190

Inconsistent analysis problem

The problem occurs when a transaction reads several values from the DB, but a second transaction updates some of them during the first transaction’s execution.

Time  T5                 T6                 balx  baly  balz  sum
t1                       begin transaction  100   50    25    0
t2    begin transaction  sum = 0            100   50    25    0
t3    read(balx)         read(balx)         100   50    25    0
t4    balx = balx - 10   sum = sum + balx   100   50    25    100
t5    write(balx)        read(baly)         90    50    25    100
t6    read(balz)         sum = sum + baly   90    50    25    150
t7    balz = balz + 10                      90    50    25    150
t8    write(balz)                           90    50    35    150
t9    commit             read(balz)         90    50    35    150
t10                      sum = sum + balz   90    50    35    185
t11                      commit             90    50    35    185

The problem is avoided by preventing transaction T6 from reading balx and balz until after T5 has
committed its updates.

 Serializability and recoverability

The objective of concurrency control protocol is to schedule transactions in such a way as


to avoid any interference between them.

One solution is to allow only one transaction to execute at a time. One transaction is
committed before the next transaction is allowed to begin.

This violates the aim of a multiuser DBMS, which is to maximize the degree of concurrency or parallelism in the system. For example, transactions that access different parts of the DB can be scheduled together without interference.

Schedule: A sequence of the operations by a set of concurrent transaction that preserves


the order of the operations in each of the individual transactions.

Serial schedule: A schedule where the operations of each transaction are executed consecutively without any interleaved operations from other transactions.
The results of different serial schedules need not be identical: T1 followed by T2 may differ from T2 followed by T1.
Nonserial schedule: A schedule where the operations from a set of concurrent transactions are interleaved (i.e. run in parallel). If it produces the same result as some serial schedule, the schedule is said to be correct.

The objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another, and thereby produce a DB state that could be produced by a serial execution.

We say that a nonserial schedule is correct if it produces the same result as some serial execution. Such a schedule is called serializable.

In serializability, the ordering of read and write operations is important.


If two operations only read a data item, they do not conflict and the order is not
important.

If two operations either read or write completely separate data items, they do not conflict
and order is not important.

If one transaction writes a data item and another either reads or writes the same data item,
the order of execution is important.

Time T7 T8
T1 Begin T

T2 read(balx)

T3 write(balx)
T4 Begin T

T5 read(balx)

T6 write(balx)
T7 read(baly)
T8 write(baly)
T9 commit
T10 read(baly)
T11 write(baly)

T12 commit

This type of serializability is known as conflict serializability.

A conflict serializable schedule orders any conflicting operations in the same way as some serial execution.

A precedence (or serialization) graph can be produced to test for conflict serializabilty.

For a schedule S, a precedence graph is directed graph G(N,E) that consists of a set of
nodes N and a set of directed edges E which is constructed as follows:

Create node for each transaction


Create a directed edge TiTj, if Tj reads the value of an item written by Ti.

Create a directed edge TiTj, if Tj writes a value into an item after it has been read by Ti
Create a directed edge TiTj, if Tj writes a value into an item after it has been written by
Ti

If an edge TiTj exists in the precedence graph for S, then in any serial schedule S’
equivalent to S, Ti must appear before Tj.

If the precedence graph contains a cycle, the schedule is not conflict serializable.

Ex. Constructing the precedence graphs for schedules to test for conflict serializability
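The three edge rules can be sketched as one check — two operations on the same item by different transactions conflict when at least one of them is a write — followed by a cycle search. An illustrative Python sketch (the schedule shown is a hypothetical example, not from the text):

```python
# Conflict-serializability test: build a precedence graph from a schedule
# of (transaction, operation, item) triples and look for a cycle.
def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))   # conflicting ops: Ti must precede Tj
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in graph.get(node, ()))
    return any(visit(n, set()) for n in graph)

# Schedule: T1 reads x, T2 writes x, T2 reads y, T1 writes y
s = [("T1", "R", "x"), ("T2", "W", "x"), ("T2", "R", "y"), ("T1", "W", "y")]
edges = precedence_graph(s)
print(sorted(edges))      # [('T1', 'T2'), ('T2', 'T1')]
print(has_cycle(edges))   # True -> not conflict serializable
```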

Chapter four

4. Concurrency Control Techniques


4.1 Locking and time stamping
Locking: A procedure used to control concurrent access to data. When one transaction is accessing the DB, a lock may deny access to other transactions to prevent incorrect results.

A transaction must claim a shared (read) or exclusive (write) lock on a data item before
the corresponding DB read or write operation.

Data items of various sizes, ranging from the entire DB down to a file, may be locked.

The size of the data item determines the fineness or granularity of the lock.
Locks are used in the following way:

 Any transaction that needs to access a data item must first lock the item.
 If the item is not locked by another transaction, the lock will be granted.
 If the item is currently locked, the DBMS determines whether the request is compatible with the existing lock. If a shared lock is requested on an item that already has a shared lock on it, the request will be granted; otherwise, the transaction must wait until the existing lock is released.
 A transaction continues to hold a lock until it explicitly releases it either during
execution or when it terminates (aborts or commits).

To guarantee serializability, we must follow an additional protocol concerning the


positioning of the lock and unlock operations in every transaction.

The best-known protocol is two-phase locking (2PL).

2PL: A transaction follows the 2PL protocol if all locking operations precede the first unlock operation in the transaction.

Every transaction can be divided in to two phases.


1. A growing phase: in which it acquires all the locks needed but cannot release any locks

2. A shrinking phase: in which it releases its locks but cannot acquire any new locks.

There is no requirement that all locks be obtained simultaneously.

It is possible to prevent the lost update, uncommitted dependency and inconsistent analysis problems using 2PL.
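A minimal sketch of the 2PL rule in Python (all class and variable names are hypothetical; a real lock manager would make the requester wait rather than raise an error, and would distinguish shared from exclusive locks):

```python
# Minimal two-phase locking sketch (illustrative, single-threaded):
# a transaction may acquire locks only while it has released none.
class TwoPhaseTxn:
    def __init__(self, lock_table):
        self.lock_table = lock_table   # item -> owning transaction
        self.held = set()
        self.shrinking = False         # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock after first unlock")
        if self.lock_table.get(item) not in (None, self):
            raise RuntimeError("item locked by another transaction: wait")
        self.lock_table[item] = self
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True          # growing phase is over
        self.held.discard(item)
        self.lock_table.pop(item, None)

table = {}
t = TwoPhaseTxn(table)
t.lock("balx"); t.lock("baly")   # growing phase
t.unlock("balx")                  # shrinking phase starts
# t.lock("balz") would now raise RuntimeError
```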

Problem of locking

1. Cascading rollback:

A single transaction’s rollback that leads to a series of further rollbacks is called cascading rollback.

Time T1 T2
T1 begin T
T2 write_lock(balx)
T3 read(balx)
T4 read_lock(baly)
T5 read(baly)
T6 balx=balx+baly
T7 write(balx)
T8 unlock(balx) begin T
T9 . write_lock(balx)
T10 . read(balx)
T11 . balx=balx+100
T12 rollback write(balx)
T13 unlock(balx)
T14
Cascading rollback is undesirable.

The solution is rigorous 2PL: leaving the release of all locks until the end of the transaction (commit or abort).
2. Deadlock: since transactions can wait for locks on data items, if two transactions each wait for a lock on an item held by the other, deadlock will occur.

Time T1 T2
T1 begin T begin T
T2 write_lock(balx) write_lock(baly)
T3 read(balx) read(baly)
T4 write_lock(baly) write_lock(balx)
T5 wait ……. Wait ……
T6 wait ……. Wait ……
Once the deadlock occurs, the application involved cannot resolve the problem. Instead
the DBMS has to recognize that deadlock exists and break the deadlock in some way.
Unfortunately, there is only one way to break deadlock: aborting one or more of the
transactions.
Deadlock should be transparent to the user.
There are three general techniques for handling deadlock:
• Timeouts
• Deadlock prevention
• Deadlock detection and recovery
Timeouts
The transaction that has requested a lock waits for at most a specified period of time.
Deadlock prevention
The DBMS looks ahead to determine if a transaction would cause deadlock and never allows deadlock to occur.
Deadlock detection and recovery
The DBMS allows deadlock to occur, but recognizes the occurrence of deadlock and breaks it. Detection may be scheduled at short, long, or dynamically chosen intervals; the only recovery technique is aborting one or more transactions.
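Deadlock detection is usually implemented with a wait-for graph, where an edge Ti → Tj means Ti is waiting for a lock held by Tj; a cycle means deadlock. An illustrative sketch (the transaction names are hypothetical):

```python
# Deadlock detection sketch: a cycle in the wait-for graph means deadlock.
def find_deadlock(wait_for):
    def reachable(start, node, seen):
        for nxt in wait_for.get(node, ()):
            if nxt == start:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(start, nxt, seen):
                    return True
        return False
    # every transaction that can reach itself is part of a cycle
    return [t for t in wait_for if reachable(t, t, set())]

# T1 waits for T2 and T2 waits for T1 (the schedule above); T3 is independent.
wf = {"T1": ["T2"], "T2": ["T1"], "T3": []}
print(find_deadlock(wf))   # ['T1', 'T2'] -> abort one of them as the victim
```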
Choice of deadlock victim
Abort the transaction that incurs the minimum cost, considering:
A. How long the transaction has been running
B. How many data items have been updated by the transaction
C. How many data items the transaction is still to update
D. How far to roll the transaction back (all or part)
E. Avoiding starvation

4.2 Time stamping method

Time stamping methods for concurrency control are quite different from locking methods. No locks are involved, and therefore there can be no deadlock. With time stamping methods, there is no waiting: transactions involved in conflict are simply rolled back and restarted.

Time stamp: a unique identifier created by the DBMS that indicates the relative starting time
of a transaction
Time stamping: a concurrency control protocol that orders transactions in such a way that
older transactions, transaction with smaller time stamps, get priority in the event of conflict.
With time stamping, if a transaction attempts to read or write a data item, then the read or write is only allowed to proceed if the last update on the data item was carried out by an older transaction. Otherwise, the transaction requesting the read or write is restarted and given a new time stamp.
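The rule above can be sketched with the usual read_TS/write_TS bookkeeping per data item (a simplified illustration of basic timestamp ordering; class and function names are hypothetical):

```python
# Basic timestamp-ordering sketch: each item keeps the timestamps of its
# last reader and writer; an operation that arrives "too late" is rejected
# (the transaction would be rolled back and restarted with a new timestamp).
class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(item, ts):
    if ts < item.write_ts:          # item already written by a younger txn
        return False                # reject: roll back and restart
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return False                # reject: roll back and restart
    item.write_ts = ts
    return True

x = Item()
assert write(x, ts=5)      # T5 writes x
assert not read(x, ts=3)   # older T3 arrives too late and must restart
assert read(x, ts=7)       # younger T7 may read
```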
Optimistic techniques
Are based on the assumption that conflict is rare, and that it is more efficient to allow transactions to proceed without imposing delays to ensure serializability.
When a transaction wishes to commit, a check is performed to determine whether conflict has occurred.

Granularity of data items

Granularity: the size of data items chosen as the unit of protection by concurrency control
protocol.
It can be:
The entire DB or
A file or
A page or
A record or
A field value of a records ex. Updating 1
record1 record is locked
Updating 95% of the recordentire file/relation is locked
Updating 90% of the DBentire DB is locked

4.3 Database recovery


The process of restoring the DB to a correct state in the event of a failure.
Causes of failure:
1. System crashes:- due to HW or SW errors, results in loss of main memory.
2. Media failures:-such a head crashes or unreadable media, resulting in the loss of parts
of secondary storages.
3. Application SW errors: such as logical errors in the program that is accessing the DB, which cause one or more transactions to fail.
4. Natural physical disasters: such as fires, floods, earthquakes, or power failures.
5. Carelessness or unintentional destruction of data or facilities by operators or users
6. Sabotage:- intentional corruption or destruction of data, HW, SW facilities.

Implementing read operation

 Find the address of the disk block that contains the record with primary key value x.
 Transfer the disk block in to a database buffer in main memory
 Copy the salary data from the DB buffer into the variable salary.

For write operation, the DBMS carries out the following steps:

 Find the address of the disk block that contains the record with primary key value x.
 Transfer the disk block in to a database buffer in main memory
 Copy the salary data from the variable salary into the DB buffer.
 Write the DB buffer back to disk

It is only once the buffers have been flushed to secondary storage that any update
operations can be regarded as permanent.
The explicit writing of the buffers to secondary storage is known as force-writing
(commit). If a failure occurs between writing to the buffer and flushing the buffer to
secondary storage, the recovery manager must determine the status of the transaction that
performed the write at the time of failure.
If the transaction had issued its commit, then to ensure durability the recovery manager
would have to redo that transaction’s update to the DB (roll forward).

On the other hand, if the transaction had not committed at the time of failure, then the recovery manager would have to undo (rollback) any effects of that transaction on the DB to guarantee transaction atomicity.
[Figure: transactions T1–T6 running relative to the last checkpoint (tcheck) and the failure time (tfailure).]
Recovery after the failure:
T1 and T6 are aborted (undo)
T2 and T3 need no redo (committed before the checkpoint)
T4 and T5 are redone
Checkpoint: the point of synchronization between the DB and the transaction log file. All buffers are force-written to secondary storage. Checkpoints are scheduled at predetermined intervals.

 Recovery facilities
1. Backup mechanism
2. Logging facilities, which keep track of the current state of transactions and DB change
3. A check point, which enables update to the DB that are in progress to be made
permanent

Chapter Five
Database Recovery Techniques


5.1. Introduction of Database Recovery
It is the process of restoring a database to the correct state in the event of a failure. It is also a service provided by the DBMS to ensure that the database is reliable and remains in a consistent state in case of a failure. In general, recovery refers to the various operations involved in restoring, rolling forward and rolling back a backup.

It is the process of restoring the database to the most recent consistent state that existed just before the failure. Recovery is needed when the database is in an inconsistent state and the atomicity property is violated. The three states of database recovery are:
 Precondition
 Condition
 Post condition

• Restoring a physical backup means reconstructing it and making it available to the
database server.
• To recover a restored backup, data is updated using redo command after the
backup was taken.
• Database server such as SQL server or ORACLE server performs cash recovery
and instance recovery automatically after an instance failure.
• In case of media failure, a database administrator (DBA) must initiate a recovery
operation.

Recovering a backup involves two distinct operations: rolling the backup forward to a more recent time by applying redo data, and rolling back all changes made by uncommitted transactions to their original state.

Backup and recovery refers to the various strategies and operations involved in Protecting
the database against data loss and reconstructing the database

5.2 Explain Log Based Recovery Method

5.2.1. Log based recovery


In short, a transaction log is a journal (a data file) that contains the history of all transactions performed, maintained on stable storage. It stores information about the transaction history so that, if a transaction suddenly fails, the log still holds a record of what was done before the failure.
The most widely used structure for recording database modifications is the log. The log is a sequence of log records, recording all the update activities in the database. Since the log contains a complete record of all database activity, the volume of data stored in the log may become unreasonably large. For log records to be useful for recovery from system and disk failures, the log must reside on stable storage. A log record contains:
1. Start of transaction
2. Transaction-id
3. Record-id
4. Type of operation (insert, update, delete)
5. Old value, new value
6. End of transaction that is committed or aborted
A. All such files are maintained by DBMS itself. Normally these are sequential files.
B. Recovery has two factors Rollback (Undo) and Roll forward (Redo).
C. When transaction Ti starts, it registers itself by writing a <Ti start>log record
D. Before Ti executes write(X), a log record <Ti , X, V1, V2> is written, where
V1 is the value of X before the write, and V2 is the value to be written to X.
E. Log record notes that Ti has performed a write on data item Xj

F. Xj had value V1 before the write, and will have value V2 after the write
G. When Ti finishes its last statement, the log record <Ti commit> is written.
H. Two approaches are used in log-based recovery.
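The log records described in points C–G can be sketched as tuples appended to a list (an illustrative Python sketch; the record layout is a simplification of the <Ti start>, <Ti, X, V1, V2>, <Ti commit> notation above):

```python
# Sketch of a transaction log: each entry records the start, every write
# (with the old value V1 and the new value V2), and the commit.
log = []

def begin(t):
    log.append(("start", t))                        # <Ti start>

def write(t, db, x, new_value):
    log.append(("write", t, x, db[x], new_value))   # <Ti, X, V1, V2>
    db[x] = new_value

def commit(t):
    log.append(("commit", t))                       # <Ti commit>

db = {"A": 500}
begin("T0")
write("T0", db, "A", 450)
commit("T0")
print(log[1])   # ('write', 'T0', 'A', 500, 450)
```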

5.2.2. Log based Recovery Techniques


Once a failure occurs, DBMS retrieves the database using the back-up of database and
transaction log. Various log based recovery techniques used by DBMS are:
1. Deferred Database Modification
2. Immediate Database Modification
Both of the techniques use transaction logs. These techniques are explained in
following sub-sections.
5.2.2.1. Deferred Database Modification log based recovery method.
Concept
Updates (changes) to the database are deferred (or postponed) until the transaction
commits.
• During the execution of transaction, updates are recorded only in the transaction
log and in buffers. After the transaction commits, these updates are recorded in
the database. When failure occurs
• If transaction has not committed, then it has not affected the database. And so, no
need to do any undoing operations. Just restart the transaction.
• If transaction has committed, then, still, it may not have modified the database.
And so, redo the updates of the transaction.
Transaction Log
 In this technique, the transaction log is used in the following ways:
 Transaction T starts by writing <T start> to the log.
 Any update is recorded as <T, X, V>, where V indicates the new value for data item X. Here, there is no need to preserve the old value of the changed data item. Also, V is not written to X in the database immediately, but is deferred.
 Transaction T commits by writing <T commit> to the log. Once this is entered in the log, the actual updates are recorded to the database.
 If a transaction T aborts, the transaction log records are ignored, and no updates are recorded to the database.
Example

 Consider the following two transactions, T0 and T1, given in the figure, where T0 executes before T1. Also consider that the initial values for A, B and C are 500, 600 and 700 respectively.

The following figure shows the transaction log for the above two transactions at three different instances of time.

 If failure occurs in case of:

1. No redo actions are required.
2. As transaction T0 has already committed, it must be redone.
3. As transactions T0 and T1 have already committed, they must be redone.
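Under deferred modification, recovery therefore needs only a redo pass over committed transactions. A sketch using the simplified tuple log format assumed here (write records carry only the new value):

```python
# Deferred-modification recovery sketch: only transactions whose <commit>
# record appears in the log are redone; uncommitted ones are simply ignored.
def recover_deferred(log, db):
    committed = {t for (kind, t, *_) in log if kind == "commit"}
    for kind, t, *rest in log:
        if kind == "write" and t in committed:
            x, new_value = rest        # deferred log keeps only the new value
            db[x] = new_value          # REDO

db = {"A": 500, "B": 600}
log = [("start", "T0"), ("write", "T0", "A", 450),
       ("commit", "T0"),
       ("start", "T1"), ("write", "T1", "B", 650)]   # T1 never committed
recover_deferred(log, db)
print(db)   # {'A': 450, 'B': 600} -- T0 redone, T1 ignored
```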

5.2.2.2. Immediate Database Modification log-based recovery method.


Concept
 Updates (changes) to the database are applied immediately as they occur without
waiting to reach to the commit point.
 Also, these updates are recorded in the transaction log.

 It is possible here that updates of an uncommitted transaction are also written to the database, and other transactions can access these updated values.
When failure occurs:
 If the transaction has not committed, then it may have modified the database, so undo the updates of the transaction.
 If the transaction has committed, then it still may not have modified the database, so redo the updates of the transaction.
Transaction Log
 In this technique, the transaction log is used in the following ways:
 Transaction T starts by writing <T start> to the log.
 Any update is recorded as <T, X, Vold, Vnew>, where Vold indicates the original value of data item X and Vnew indicates the new value for X. Here, as an undo operation may be required, the old value of the changed data item must be preserved.
 Transaction T commits by writing <T commit> to the log.
 If a transaction T aborts, the transaction log records are consulted, and the required undo operations are performed.
Example
Again, consider the two transactions, T0 and T1, given in the figure, where T0 executes before T1. Also consider that the initial values for A, B and C are 500, 600 and 700 respectively.
The following figure shows the transaction log for the above two transactions at three different instances of time. Note that, here, the transaction log contains the original values along with the new updated values for the data items.
If failure occurs in case of:
1. Undo the transaction T0 as it has not committed, and restore A and B to 500 and 600 respectively.
2. Undo the transaction T1, restoring C to 700; and redo the transaction T0, setting A and B to 400 and 700 respectively.
3. Redo the transactions T0 and T1, setting A and B to 400 and 700 respectively, and C to 500.
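Under immediate modification, recovery redoes committed transactions and undoes the rest, which is possible because each write record keeps both the old and the new value. An illustrative sketch with a hypothetical log matching case 2 above (T0 committed, T1 not):

```python
# Immediate-modification recovery sketch: write records hold old and new
# values, so committed transactions are redone and the rest are undone.
def recover_immediate(log, db):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for kind, t, *rest in log:             # forward pass: REDO committed
        if kind == "write" and t in committed:
            x, old, new = rest
            db[x] = new
    for kind, t, *rest in reversed(log):   # backward pass: UNDO the rest
        if kind == "write" and t not in committed:
            x, old, new = rest
            db[x] = old

db = {"A": 500, "B": 600, "C": 700}
log = [("start", "T0"), ("write", "T0", "A", 500, 400),
       ("write", "T0", "B", 600, 700), ("commit", "T0"),
       ("start", "T1"), ("write", "T1", "C", 700, 500)]  # T1 uncommitted
recover_immediate(log, db)
print(db)   # {'A': 400, 'B': 700, 'C': 700} -- T0 redone, T1 undone
```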

Explain system recovery procedure with Checkpoint record concept


Problems with deferred and immediate updates:
 Searching the entire log is time-consuming.
 It is possible to redo transactions that have already stored their updates to the database.
Checkpoint
 A point of synchronization between data base and transaction log file.
 Specifies that any operations executed before this point are done correctly and
stored safely.
 At this point, all the buffers are force-fully written to the secondary storage.
 Checkpoints are scheduled at predetermined time intervals. They are used to limit:
1. The size of transaction log file
2. Amount of searching, and
3. Subsequent processing that is required to carry out on the transaction log file.
When failure occurs:
Find the nearest checkpoint.
If a transaction committed before this checkpoint, ignore it.
If a transaction was active at or after this point and committed before the failure, redo that transaction.
If a transaction was active at or after this point and has not committed, undo that transaction.
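These rules reduce to a small decision function over each transaction's commit time. An illustrative sketch (the numeric timestamps are hypothetical, with tc the checkpoint time and tf the failure time):

```python
# Checkpoint recovery decision sketch: classify each transaction by its
# commit time relative to the checkpoint (tc) and the failure time (tf).
def action(start, commit, tc, tf):
    """commit is None if the transaction had not committed by tf."""
    if commit is not None and commit <= tc:
        return "ignore"   # its updates were flushed at the checkpoint
    if commit is not None and commit <= tf:
        return "redo"     # committed after the checkpoint
    return "undo"         # still active at the time of failure

tc, tf = 50, 100
assert action(start=10, commit=40, tc=tc, tf=tf) == "ignore"   # like T1
assert action(start=30, commit=80, tc=tc, tf=tf) == "redo"     # like T2/T3
assert action(start=60, commit=None, tc=tc, tf=tf) == "undo"   # like T4
```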

Example
Consider the transactions given in following figure. Here, Tc indicates checkpoint,
while Tf indicates failure time.
Here, at failure time –
4. Ignore the transaction T1 as it had already committed before the checkpoint.
5. Redo transactions T2 and T3 as they were active at/after the checkpoint, but committed before the failure.
6. Undo transaction T4 as it was active after the checkpoint and has not committed.

5.2.3. Explain Shadow Paging Technique.


Concept
Shadow paging is an alternative to transaction-log based recovery techniques.
Here, database is considered as made up of fixed size disk blocks, called pages. These
pages are mapped to physical storage using a table, called page table.
The page table is indexed by a page number of the database. The information about
physical pages, in which database pages are stored, is kept in this page table.
This technique is similar to paging technique used by Operating Systems to allocate
memory, particularly to manage virtual memory.
The following figure depicts the concept of shadow paging

5.2.3.1. Execution of Transaction


During the execution of the transaction, two page tables are maintained.
1. Current Page Table: Used to access data items during transaction execution.
2. Shadow Page Table: Original page table, and will not get modified during
transaction execution.

Whenever any page is about to be written for the first time:
1. A copy of this page is made onto a free page,
2. The current page table is made to point to the copy,
3. The update is made on this copy.
• At the start of the transaction, both tables are the same and point to the same pages.
• The shadow page table is never changed, and is used to restore the database in case any failure occurs. However, current page table entries may change during transaction execution, as the table is used to record all updates made to the database.
• When the transaction completes, the current page table becomes shadow page
table. At this time, it is considered that the transaction has committed.
• The following figure explains working of this technique.
• As shown in this figure, two pages – page 2 & 5 – are affected by a
transaction and copied to new physical pages. The current page table points to
these pages.
• The shadow page table continues to point to old pages which are not changed
by the transaction. So, this table and pages are used for undoing the
transaction.
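The two page tables can be sketched as dictionaries (a toy illustration with hypothetical names and page contents; real shadow paging works on fixed-size disk blocks):

```python
# Shadow-paging sketch: the shadow table stays fixed for the whole
# transaction; writes copy pages and update only the current table.
pages = {0: "p2-old", 1: "p5-old"}      # physical pages (illustrative data)
shadow = {2: 0, 5: 1}                   # database page -> physical page
current = dict(shadow)                  # starts identical to shadow
next_free = 2

def write_page(page_no, data):
    global next_free
    if current[page_no] == shadow[page_no]:   # first write: copy the page
        pages[next_free] = pages[shadow[page_no]]
        current[page_no] = next_free
        next_free += 1
    pages[current[page_no]] = data      # update only the copy

write_page(2, "p2-new")
write_page(5, "p5-new")
# commit: current becomes the new shadow table
# abort:  discard current; shadow still points at the old pages (undo is free)
print(pages[shadow[2]], pages[current[2]])   # p2-old p2-new
```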

Figure 5.1 Shadow paging technique


Advantages
• No overhead of maintaining transaction log.
• Recovery is quite fast, as no redo or undo operations are required.
Disadvantages
• Copying the entire page table is very expensive.
• Data are scattered or fragmented.
• After each transaction, free pages need to be collected by garbage collector.
• Difficult to extend this technique to allow concurrent transactions

5.3. Review Questions


1. In which database update strategy are a transaction’s updates applied to the physical database only after the transaction reaches its commit point?
A. Immediate update
B. Deferred update
C. Summary problem update

D. All of the above


2. Transaction Ti reads item X and finally writes item X, but before Ti is permanently committed, transaction Tj starts and reads the value written by Ti, which is not yet committed. Which concurrency problem does this describe?
A. Lost update problem
B. Uncommitted dependency
C. Dirty read problem
D. Incorrect summary
3. From the following, which one is true about the locking technique?
A. More than one transaction can use the item X at the same time
B. When one transaction is using the item X, other transactions must wait until the lock is acquired
C. When one transaction is using the item X, other transactions must wait until the lock is released
4. Which one of the following best describes the shared and exclusive lock types?
A. If the transaction only needs to write item X, it uses a shared lock
B. If the transaction only needs to read item Y, it uses an exclusive lock
C. If the transaction needs to read and write item Z, it uses an exclusive lock
D. If the transaction needs to read and write item Z, it uses a shared lock
E. All of the above
5. From the following, which one is false about two-phase locking?
A. The lock and unlock operations are done in two different phases
B. The transaction locks first in one phase, then releases in the other phase
C. The lock point is the point where locking stops and unlocking is allowed
D. In the growing phase, both locking and releasing can occur
E. None of the above
F. All of the above
6. Which one best describes the multiple granularity locking technique?
A. It allows data items of various sizes in a hierarchy of the data structure
B. With this mechanism, when locking a table, the whole database would be locked

C. With this mechanism, when locking a table, all its ascendants are locked
D. With this mechanism, when locking a table, all its descendant records are locked
E. All of the above
7. Which of the following is not a recovery technique?
A. Deferred update
B. Immediate update
C. Two-phase commit
D. Recovery management
8. …. deals with soft errors, such as power failures.
A. System recovery
B. Media recovery
C. Database recovery
D. Failure recovery
9. Rollback of transactions is normally used to:
A. Recover from transaction failure
B. Update the transaction
C. Retrieve old records
D. Repeat a transaction
10. A transaction operating at the Read Uncommitted isolation level may experience:
A. A dirty read occurs
B. Non-repeatable read occurs
C. Phantom reads occurs
D. All of the above
Chapter 6
Database Security

6.1 Introduction to Database Security Issues

Types of Security:
• Legal and ethical issues
• Policy issues
• System-related issues
• The need to identify multiple security levels

Threats to databases
Threats to databases can result in the loss or degradation of some or all of the following
commonly accepted security goals:
a. Loss of integrity
b. Loss of availability
c. Loss of confidentiality

To protect databases against these types of threats, four kinds of countermeasures can be
implemented:
◦ Access control ◦ Inference control ◦ Flow control ◦ Encryption
A DBMS typically includes a database security and authorization subsystem that is
responsible for ensuring the security portions of a database against unauthorized access.
Two types of database security mechanisms:
◦ Discretionary security mechanisms
◦ Mandatory security mechanisms
The security mechanism of a DBMS must include provisions for restricting access to the
database as a whole.
This function is called access control and is handled by creating user accounts and
passwords to control the login process of the DBMS.
The security problem associated with databases is that of controlling the access to a
statistical database, which is used to provide statistical information or summaries of
values based on various criteria.
The countermeasure to the statistical database security problem is called inference control
measures.
Another security mechanism is flow control, which prevents information from flowing in
such a way that it reaches unauthorized users.
Channels that are pathways for information to flow implicitly in ways that violate the
security policy of an organization are called covert channels.
A final security issue is data encryption, which is used to protect sensitive data (such as
credit card numbers) that is being transmitted via some type communication network. The
data is encoded using some encoding algorithm.
An unauthorized user who accesses encoded data will have difficulty deciphering it, but
authorized users are given decoding or decrypting algorithms (or keys) to decipher data.

6.2 Database Security and the DBA


The database administrator (DBA) is the central authority for managing a database
system.
The DBA’s responsibilities include:
 Granting privileges to users who need to use the system
 Classifying users and data in accordance with the policy of the organization
The DBA is responsible for the overall security of the database system.
The DBA has a DBA account in the DBMS, sometimes called a system or superuser
account. These accounts provide powerful capabilities such as:

1. Account creation
2. Privilege granting
3. Privilege revocation
4. Security level assignment

Action 1 is access control, whereas 2 and 3 are discretionary and 4 is used to control
mandatory authorization

6.3 Access Protection, User Accounts, and Database Audits

Whenever a person or group of persons needs to access a database system, the individual
or group must first apply for a user account.
The DBA will then create a new account id and password for the user if he/she deems
there is a legitimate need to access the database
The user must log in to the DBMS by entering account id and password whenever
database access is needed.
The database system must also keep track of all operations on the database that are
applied by a certain user throughout each login session.
To keep a record of all updates applied to the database and of the particular user who
applied each update, we can use the system log, which includes an entry for each operation
applied to the database that may be required for recovery from a transaction failure or
system crash.
If any tampering (interfere or cause damage) with the database is suspected, a database
audit is performed
A database audit consists of reviewing the log to examine all accesses and operations
applied to the database during a certain time period.
A database log that is used mainly for security purposes is sometimes called an audit
trail.
Discretionary Access Control Based on Granting and Revoking Privileges
The typical method of enforcing discretionary access control in a database system is
based on the granting and revoking privileges.

6.3.1 Types of Discretionary Privileges

The account level:

• At this level, the DBA specifies the particular privileges that each account holds
independently of the relations in the database.

The relation level (or table level):

• At this level, the DBA can control the privilege to access each individual relation or view
in the database.

The privileges at the account level apply to the capabilities provided to the account itself
and can include

 The CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base relation;
 The CREATE VIEW privilege;
 The ALTER privilege, to apply schema changes such as adding or removing attributes from relations;
 The DROP privilege, to delete relations or views;
 The MODIFY privilege, to insert, delete, or update tuples;
 And the SELECT privilege, to retrieve information from the database by using a
SELECT query.

The second level of privileges applies to the relation level

 This includes base relations and virtual (view) relations.


 The granting and revoking of privileges generally follow an authorization model for
discretionary privileges known as the access matrix model where:
 The rows of a matrix M represent subjects (users, accounts, programs)
 The columns represent objects (relations, records, columns, views, operations).

Each position M(i,j) in the matrix represents the types of privileges (read, write, update)
that subject i holds on object j.
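The access matrix model described above can be sketched in a few lines of Python; the account names, relation names, and privilege sets below are made up for illustration, not part of any real DBMS:

```python
# Minimal sketch of the access matrix model: rows are subjects,
# columns are objects, and each cell M(i,j) holds a privilege set.
access_matrix = {
    "A1": {"EMPLOYEE": {"read", "write", "update"},
           "DEPARTMENT": {"read"}},
    "A2": {"EMPLOYEE": {"read"}},
}

def holds_privilege(subject, obj, privilege):
    """Check whether M(i,j) contains the requested privilege."""
    return privilege in access_matrix.get(subject, {}).get(obj, set())

print(holds_privilege("A1", "EMPLOYEE", "write"))  # True
print(holds_privilege("A2", "EMPLOYEE", "write"))  # False
```

A real system would consult such a structure (or its row/column decompositions, capability lists and access control lists) on every request.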

To control the granting and revoking of relation privileges, each relation R in a database
is assigned an owner account, which is typically the account that was used when the
relation was created in the first place.
The owner of a relation is given all privileges on that relation.
In SQL2, the DBA can assign an owner to a whole schema by creating the schema and
associating the appropriate authorization identifier with that schema, using the CREATE
SCHEMA command.
The owner account holder can pass privileges on any of the owned relation to other users
by granting privileges to their accounts.
In SQL the following types of privileges can be granted on each individual relation R:
SELECT (retrieval or read) privilege on R:
Gives the account retrieval privilege.
In SQL this gives the account the privilege to use the SELECT statement to retrieve
tuples from R.
MODIFY privileges on R:
This gives the account the capability to modify tuples of R.
In SQL this privilege is further divided into UPDATE, DELETE, and INSERT
privileges to apply the corresponding SQL command to R.
In addition, both the INSERT and UPDATE privileges can specify that only certain
attributes can be updated by the account.
REFERENCES privilege on R:
This gives the account the capability to reference relation R when specifying integrity
constraints.
The privilege can also be restricted to specific attributes of R.
Notice that to create a view, the account must have the SELECT privilege on all relations
involved in the view definition.

6.3.2 Specifying Privileges Using Views


The mechanism of views is an important discretionary authorization mechanism in its
own right. For example,

If the owner A of a relation R wants another account B to be able to retrieve only some
fields of R, then A can create a view V of R that includes only those attributes and then
grant SELECT on V to B.
The same applies to limiting B to retrieving only certain tuples of R; a view V’ can be
created by defining the view by means of a query that selects only those tuples from R
that A wants to allow B to access.
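The view part of this mechanism can be demonstrated with SQLite (which has no GRANT statement, so only the view itself is shown; the table, column names, and sample rows are invented for the sketch):

```python
import sqlite3

# In-memory database standing in for the owner A's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (NAME TEXT, SALARY REAL, DNO INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [("Abebe", 5000, 5), ("Sara", 7000, 4)])

# View V hides SALARY and restricts tuples to department 5; in a full
# DBMS, A would GRANT SELECT on V (not on EMPLOYEE) to account B.
conn.execute("CREATE VIEW V AS SELECT NAME, DNO FROM EMPLOYEE WHERE DNO = 5")

rows = conn.execute("SELECT * FROM V").fetchall()
print(rows)  # [('Abebe', 5)]
```

B sees only the attributes and tuples the owner chose to expose through the view.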

6.3.3 Revoking Privileges


In some cases it is desirable to grant a privilege to a user temporarily. For example, the
owner of a relation may want to grant the SELECT privilege to a user for a specific
task and then revoke that privilege once the task is completed.
Hence, a mechanism for revoking privileges is needed. In SQL, a REVOKE command
is included for the purpose of canceling privileges.

6.3.4 Propagation of Privileges using the GRANT OPTION


Whenever the owner A of a relation R grants a privilege on R to another account B,
privilege can be given to B with or without the GRANT OPTION.
If the GRANT OPTION is given, this means that B can also grant that privilege on R to
other accounts.
Suppose that B is given the GRANT OPTION by A and that B then grants the privilege
on R to a third account C, also with GRANT OPTION. In this way, privileges on R can
propagate to other accounts without the knowledge of the owner of R.
If the owner account A now revokes the privilege granted to B, all the privileges that B
propagated based on that privilege should automatically be revoked by the system.
An Example
Suppose that the DBA creates four accounts A1, A2, A3, and A4, and wants only A1 to
be able to create base relations. Then the DBA must issue the following GRANT
command in SQL:
GRANT CREATETAB TO A1;
In SQL2 the same effect can be accomplished by having the DBA issue a CREATE
SCHEMA command as follows:

CREATE SCHEMA EXAMPLE AUTHORIZATION A1;

User account A1 can create tables under the schema called EXAMPLE.
Suppose that A1 creates the two base relations EMPLOYEE and DEPARTMENT. A1
is then the owner of these two relations and hence holds all the relation privileges on
each of them.

Suppose that A1 wants to grant A2 the privilege to insert and delete tuples in both of
these relations, but A1 does not want A2 to be able to propagate these privileges to
additional accounts. A1 can issue:
GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;
Suppose that A1 wants to allow A3 to retrieve information from either of the two tables
and also to be able to propagate the SELECT privilege to other accounts.
A1 can issue the command:
GRANT SELECT ON EMPLOYEE, DEPARTMENT
TO A3 WITH GRANT OPTION;
A3 can grant the SELECT privilege on the EMPLOYEE relation to A4 by issuing:
GRANT SELECT ON EMPLOYEE TO A4;

◦ Notice that A4 can’t propagate the SELECT privilege because GRANT OPTION was not
given to A4
Suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE relation
from A3; A1 can issue:
REVOKE SELECT ON EMPLOYEE FROM A3;
The DBMS must now automatically revoke the SELECT privilege on EMPLOYEE from
A4, too, because A3 granted that privilege to A4 and A3 does not have the privilege any
more.
Suppose that A1 wants to give back to A3 a limited capability to SELECT from the
EMPLOYEE relation and wants to allow A3 to be able to propagate the privilege.
◦ The limitation is to retrieve only the NAME, BDATE, and ADDRESS attributes and only
for the tuples with DNO=5.
A1 then creates the view:
CREATE VIEW A3EMPLOYEE AS
SELECT NAME, BDATE, ADDRESS
FROM EMPLOYEE
WHERE DNO = 5;
After the view is created, A1 can grant SELECT on the view A3EMPLOYEE to A3 as
follows:
GRANT SELECT ON A3EMPLOYEE TO A3 WITH GRANT OPTION;
Finally, suppose that A1 wants to allow A4 to update only the SALARY attribute of

EMPLOYEE;
A1 can issue:
GRANT UPDATE ON EMPLOYEE (SALARY) TO A4;
◦ The UPDATE or INSERT privilege can specify particular attributes that may be updated
or inserted in a relation.
◦ Other privileges (SELECT, DELETE) are not attribute specific.
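The cascading behaviour of REVOKE in the example above can be sketched as a small grant graph. The account names follow the example, but the code is only an illustration of the idea, not how a DBMS implements it internally:

```python
# Each grant records who gave a privilege to whom.
# Revoking a grant recursively revokes everything propagated from it.
grants = set()  # elements are (grantor, grantee, privilege)

def grant(grantor, grantee, privilege):
    grants.add((grantor, grantee, privilege))

def revoke(grantor, grantee, privilege):
    grants.discard((grantor, grantee, privilege))
    # Cascade: if the grantee no longer holds the privilege from any
    # other grantor, revoke everything the grantee propagated.
    if not any(g[1] == grantee and g[2] == privilege for g in grants):
        for g in [g for g in grants if g[0] == grantee and g[2] == privilege]:
            revoke(*g)

def has_privilege(account, privilege):
    return any(g[1] == account and g[2] == privilege for g in grants)

grant("A1", "A3", "SELECT")   # A1 grants to A3 WITH GRANT OPTION
grant("A3", "A4", "SELECT")   # A3 propagates the privilege to A4
revoke("A1", "A3", "SELECT")  # cascades: A4 loses SELECT as well
print(has_privilege("A4", "SELECT"))  # False
```

This mirrors the rule stated above: when A1 revokes SELECT from A3, the system must also revoke the SELECT privilege A3 had granted to A4.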

6.3.5 Specifying Limits on Propagation of Privileges

Techniques to limit the propagation of privileges have been developed, although they
have not yet been implemented in most DBMSs and are not a part of SQL.

◦ Limiting horizontal propagation to an integer number i means that an account B given
the GRANT OPTION can grant the privilege to at most i other accounts.
◦ Vertical propagation is more complicated; it limits the depth of the granting of
privileges.

6.4 Mandatory Access Control and Role-Based Access Control for Multilevel
Security
The discretionary access control techniques of granting and revoking privileges on
relations have traditionally been the main security mechanism for relational database
systems.
This is an all-or-nothing method:
◦ A user either has or does not have a certain privilege.
In many applications, an additional security policy is needed that classifies data and
users based on security classes.

◦ This approach, known as mandatory access control, would typically be combined with
the discretionary access control mechanisms.
Typical security classes are top secret (TS), secret (S), confidential (C), and unclassified
(U), where TS is the highest level and U the lowest: TS ≥ S ≥ C ≥ U

The commonly used model for multilevel security, known as the Bell-LaPadula model,
classifies each subject (user, account, and program) and object (relation, tuple, column,
view, operation) into one of the security classifications TS, S, C, or U:
◦ Clearance (classification) of a subject S as class(S) and to the classification of an object
O as class(O).
Two restrictions are enforced on data access based on the subject/object classifications:
◦ Simple security property: A subject S is not allowed read access to an object O unless
class(S) ≥ class (O).

◦ A subject S is not allowed to write an object O unless class(S) ≤ class(O). This is known
as the star property (or * property).
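The two Bell-LaPadula restrictions can be expressed directly in code; the numeric ordering below simply encodes TS ≥ S ≥ C ≥ U:

```python
# Security classes ordered TS >= S >= C >= U.
LEVEL = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(subject_class, object_class):
    """Simple security property: read allowed iff class(S) >= class(O)."""
    return LEVEL[subject_class] >= LEVEL[object_class]

def can_write(subject_class, object_class):
    """Star property: write allowed iff class(S) <= class(O)."""
    return LEVEL[subject_class] <= LEVEL[object_class]

print(can_read("S", "C"))   # True: a secret subject may read confidential data
print(can_write("S", "C"))  # False: no "write-down" from S to C
```

Together the two properties block both reading up and writing down, which is exactly what prevents information from leaking to lower classification levels.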
To incorporate multilevel security notions into the relational database model, it is
common to consider attribute values and tuples as data objects.
Hence, each attribute A is associated with a classification attribute C in the schema, and
each attribute value in a tuple is associated with a corresponding security classification.
In addition, in some models, a tuple classification attribute TC is added to the relation
attributes to provide a classification for each tuple as a whole.
Hence, a multilevel relation schema R with n attributes would be represented as

◦ R(A1,C1,A2,C2, …, An,Cn,TC)
Where each Ci represents the classification attribute associated with attribute Ai. The
value of the TC attribute in each tuple t – which is the highest of all attribute
classification values within t – provides a general classification for the tuple itself,
whereas each Ci provides a finer security classification for each attribute value within the
tuple.
◦ The apparent key of a multilevel relation is the set of attributes that would have formed
the primary key in a regular (single-level) relation.
A multilevel relation will appear to contain different data to subjects (users) with
different clearance levels.

◦ In some cases, it is possible to store a single tuple in the relation at a higher classification
level and produce the corresponding tuples at a lower-level classification through a
process known as filtering.

◦ In other cases, it is necessary to store two or more tuples at different classification levels
with the same value for the apparent key.
This leads to the concept of polyinstantiation where several tuples can have the same
apparent key value but have different attribute values for users at different classification
levels.
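The attribute classifications, the tuple classification TC, and the filtering process can be sketched for one multilevel tuple; the attribute names and values here are invented for illustration:

```python
LEVEL = {"U": 0, "C": 1, "S": 2, "TS": 3}

# One multilevel tuple: each attribute value is paired with its
# classification attribute Ci.
tuple_t = {"NAME": ("Smith", "U"), "SALARY": (40000, "C"), "JOB": ("Spy", "S")}

def tuple_class(t):
    """TC is the highest classification of any attribute value in t."""
    return max((c for _, c in t.values()), key=LEVEL.get)

def filter_for(t, clearance):
    """Filtered view: values classified above the user's clearance
    appear as null (None) rather than their real values."""
    return {a: (v if LEVEL[c] <= LEVEL[clearance] else None)
            for a, (v, c) in t.items()}

print(tuple_class(tuple_t))      # 'S'
print(filter_for(tuple_t, "C"))  # {'NAME': 'Smith', 'SALARY': 40000, 'JOB': None}
```

A user with clearance C thus sees the same apparent key but a weaker version of the tuple, which is the situation that gives rise to polyinstantiation when such a user updates the hidden attributes.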
In general, the entity integrity rule for multilevel relations states that all attributes that
are members of the apparent key must not be null and must have the same security
classification within each individual tuple.
In addition, all other attribute values in the tuple must have a security classification
greater than or equal to that of the apparent key.
This constraint ensures that a user can see the key if the user is permitted to see any part
of the tuple at all.
Other integrity rules, called null integrity and interinstance integrity, informally ensure
that if a tuple value at some security level can be filtered (derived) from a higher-classified
tuple, then it is sufficient to store the higher-classified tuple in the multilevel relation.

6.4.1 Comparing Discretionary Access Control and Mandatory Access Control
Discretionary Access Control (DAC) policies are characterized by a high degree of
flexibility, which makes them suitable for a large variety of application domains. The
main drawback of DAC models is their vulnerability to malicious attacks, such as
Trojan horses embedded in application programs.
By contrast, mandatory policies ensure a high degree of protection; in effect, they prevent
any illegal flow of information.
Mandatory policies have the drawback of being too rigid and they are only applicable in
limited environments.

In many practical situations, discretionary policies are preferred because they offer a
better trade-off between security and applicability.

6.4.2 Role-Based Access Control


Role-based access control (RBAC) emerged rapidly in the 1990s as a proven
technology for managing and enforcing security in large-scale enterprise wide systems.
Its basic notion is that permissions are associated with roles, and users are assigned to
appropriate roles.
Roles can be created using the CREATE ROLE and DESTROY ROLE commands.
The GRANT and REVOKE commands discussed under DAC can then be used to assign
and revoke privileges from roles.
RBAC appears to be a viable alternative to traditional discretionary and mandatory
access controls; it ensures that only authorized users are given access to certain data or
resources.
Many DBMSs have allowed the concept of roles, where privileges can be assigned to
roles.
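The basic RBAC notion stated above — permissions attached to roles, users assigned to roles — can be sketched as follows (the role names, users, and permissions are hypothetical):

```python
# Permissions are associated with roles, and users with roles.
role_permissions = {
    "payroll_clerk": {("EMPLOYEE", "SELECT"), ("EMPLOYEE", "UPDATE")},
    "auditor": {("EMPLOYEE", "SELECT"), ("ACCOUNTS", "SELECT")},
}
user_roles = {"alice": {"payroll_clerk"}, "bob": {"auditor"}}

def authorized(user, obj, action):
    """A user holds a permission iff one of their roles grants it."""
    return any((obj, action) in role_permissions[r]
               for r in user_roles.get(user, set()))

print(authorized("alice", "EMPLOYEE", "UPDATE"))  # True
print(authorized("bob", "EMPLOYEE", "UPDATE"))    # False
```

The administrative benefit is visible even in this sketch: changing what a payroll clerk may do means editing one role, not every user who holds it.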
Role hierarchy in RBAC is a natural way of organizing roles to reflect the organization’s
lines of authority and responsibility.
Another important consideration in RBAC systems is the possible temporal constraints
that may exist on roles, such as time and duration of role activations, and timed triggering
of a role by an activation of another role.
Using an RBAC model is highly desirable goal for addressing the key security
requirements of Web-based applications.
In contrast, discretionary access control (DAC) and mandatory access control (MAC)
models lack the capabilities needed to support the security requirements of emerging
enterprises and Web-based applications.

6.4.3 Access Control Policies for E-Commerce and the Web


E-Commerce environments require elaborate policies that go beyond traditional
DBMSs.

In an e-commerce environment the resources to be protected are not only traditional data
but also knowledge and experience.
The access control mechanism should be flexible enough to support a wide spectrum of
heterogeneous protection objects.

A related requirement is the support for content-based access-control.


Another requirement is related to the heterogeneity of subjects, which requires access
control policies based on user characteristics and qualifications.
A possible solution, to better take into account user profiles in the formulation of access
control policies, is to support the notion of credentials.
A credential is a set of properties concerning a user that are relevant for security
purposes, for example, age or position within an organization.
It is believed that the XML language can play a key role in access control for e-commerce
applications.

6.5 Introductions to Statistical Database Security


Statistical databases are used mainly to produce statistics on various populations. The
database may contain confidential data on individuals, which should be protected
from user access.
Users are permitted to retrieve statistical information on the populations, such as averages,
sums, counts, maximums, minimums, and standard deviations.
A population is a set of tuples of a relation (table) that satisfy some selection condition.
Statistical queries involve applying statistical functions to a population of tuples. For
example, we may want to retrieve the number of individuals in a population or the
average income in the population.
However, statistical users are not allowed to retrieve individual data, such as the income
of a specific person.
Statistical database security techniques must prohibit the retrieval of individual data. This
can be achieved by prohibiting queries that retrieve attribute values and by allowing only
queries that involve statistical aggregate functions such as COUNT, SUM, MIN,
MAX, AVERAGE, and STANDARD DEVIATION.
Such queries are sometimes called statistical queries.
It is DBMS’s responsibility to ensure confidentiality of information about individuals,
while still providing useful statistical summaries of data about those individuals to users.

Provision of privacy protection of users in a statistical database is paramount. In some
cases it is possible to infer the values of individual tuples from a sequence of statistical
queries.
This is particularly true when the conditions result in a population consisting of a small
number of tuples.
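The small-population inference problem, and a simplified version of the usual countermeasure (refusing queries whose selected population is below a threshold), can be illustrated as follows; the people and incomes are made up:

```python
# Toy population: (name, department, income). Values are invented.
people = [("Abebe", "IT", 9000), ("Sara", "IT", 7000), ("Lidya", "HR", 8000)]

MIN_POPULATION = 2  # simplified inference-control threshold

def avg_income(condition):
    """Statistical query: AVERAGE income over a selected population,
    refused when the population is too small to protect individuals."""
    pop = [income for name, dept, income in people if condition(name, dept)]
    if len(pop) < MIN_POPULATION:
        raise PermissionError("population too small - query refused")
    return sum(pop) / len(pop)

print(avg_income(lambda n, d: d == "IT"))  # 8000.0
# avg_income(lambda n, d: d == "HR") would reveal Lidya's exact income
# (a population of one), so the guard refuses it.
```

Real inference control is harder than this sketch suggests: overlapping queries on slightly different populations can still leak individual values, which is why additional techniques such as query auditing or noise addition are used.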

6.6 Introductions to Flow Control


Flow control regulates the distribution or flow of information among accessible objects.
A flow between object X and object Y occurs when a program reads values from X and
writes values into Y.
Flow controls check that information contained in some objects does not flow explicitly
or implicitly into less protected objects.
A flow policy specifies the channels along which information is allowed to move.
The simplest flow policy specifies just two classes of information:
Confidential (C) and nonconfidential (N)
And allows all flows except those from class C to class N.
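This two-class policy amounts to a single rule, which can be written down directly:

```python
# Simplest flow policy: confidential (C) and nonconfidential (N)
# classes; every flow is allowed except C -> N.
def flow_allowed(source_class, target_class):
    return not (source_class == "C" and target_class == "N")

print(flow_allowed("N", "C"))  # True: upgrading information is fine
print(flow_allowed("C", "N"))  # False: would leak confidential data
```

A flow-control mechanism would apply this check to every program statement that reads from one object and writes into another.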

6.6.1 Covert Channels


A covert channel allows a transfer of information that violates the security policy.
A covert channel allows information to pass from a higher classification level to a lower
classification level through improper means.
Covert channels can be classified into two broad categories:
Storage channels do not require any temporal synchronization, in that information is
conveyed by accessing system information or what is otherwise inaccessible to the user.
Timing channels allow information to be conveyed by the timing of events or processes.
Some security experts believe that one way to avoid covert channels is for programmers to
not actually gain access to sensitive data that a program is supposed to process after the
program has been put into operation.

Chapter Seven

7. Distributed Database Systems
7.1 Basic Concepts:
Distributed Database: A single logical database that is spread physically across
computers in multiple locations that are connected by a data communications link.
Distributed database is a collection of connected sites:
 Each site is a DB in its own right
 Has its own DBMS and its own users
 Operations can be performed locally as if the DB was not distributed
 The sites collaborate (transparently from the user’s point of view)
 The union of all DBs = the DB of the whole organisation (institution)
Schema: A structure that contains descriptions of the database objects. (Tables, views,
constraints...)
- Local schema describes database objects on its own site only.
- Global schema describes database objects on all network nodes.

Distributed database management system (DDBMS)


- A software system that permits the management of the distributed database and makes the
distribution transparent to the users.

Characteristics of distributed databases:


Local Autonomy: local data is locally owned and managed
 local data belongs to the local server even if it is accessible from other servers
 security and integrity are in the responsibility of the local server
Transparency: users should not have to know where data is physically stored
Distributed Processing:

- Distributed Database:

• Distributed Catalog Management: Must keep track of how data is distributed across
sites.
• Site Catalog: Describes all objects (fragments, replicas) at a site + Keeps track of
replicas of relations created at this site.

7.2. Database Systems: Levels of Data and Process Distribution


                          Single-Site Data        Multiple-Site Data

Single-Site Process       Host DBMS               Not applicable

Multiple-Site Process     File Server,            Fully Distributed
                          Client/Server DBMS      Client/Server DDBMS

63 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems

7.2.1 Single-Site Processing, Single-Site Data (SPSD):

Centralized database management systems required that corporate data be stored in a
single central location.

All processing is done on single CPU or host computer (mainframe, midrange, or PC)
All data are stored on host computer’s local disk
Processing cannot be done on end user’s side of the system
Typical of most mainframe and midrange computer DBMSs
DBMS is located on the host computer, which is accessed by dumb terminals connected to it
Also typical of the first generation of single-user microcomputer databases

7.2.2 Multiple-Site Processing, Single-Site Data (MPSD):

Multiple processes run on different computers sharing a single data repository


MPSD scenario requires a network file server running conventional applications that are
accessed through a LAN
Many multi-user accounting applications, running under a personal computer network, fit
such a description
Also called Distributed Processing Environment

7.2.3 Multiple-Site Processing, Multiple-Site Data (MPMD)

A Fully Distributed Database Management System

Fully distributed database management system with support for multiple data processors
and transaction processors at multiple sites
Distributed Database Environment

7.3. Distributed Database systems

7.3.1 Types of DDBS:

a) Homogeneous DDBMSs

Integrate only one type of centralized DBMS over a network. Data is distributed
across all the nodes.

Same DBMS at each node.

All data is managed by the distributed DBMS (no exclusively local data). All
access is through one global schema.
The global schema is the union of all the local schemas.
b) Heterogeneous DDBMSs

Integrate different types of centralized DBMSs over a network. Data is distributed
across all the nodes.
Local access is done using the local DBMS and schema.
Remote access is done using the global schema.
Fully heterogeneous DDBMS: Support different DBMSs that may even support different
data models (relational, hierarchical, or network) running under different computer
systems, such as mainframes and microcomputers

7.3.2 DDB Components

DDBS must include (at least) the following components:

• Computer workstations
• Network hardware and software
• Communications media
• Transaction processor (or, application processor, or transaction manager): Software component found
in each computer that requests data
• Data processor or data manager: Software component residing on each computer that stores and
retrieves data located at the site, may be a centralized DBMS

7.4 DDBMS Advantages and Disadvantages


DDBMS Advantages:
• Reflects organizational structure (data are located near the “greatest demand” site)
• Faster data access
• Improved performance (data is located near the site of greatest demand)
• Modularity: systems can be modified, added and removed from the distributed
database without affecting other modules
• Improved communications
• Economics: it costs less to create a network of smaller computers with the power of a
single large computer
• Improved availability: a fault in one database system will only affect one fragment,
instead of the entire database
• Processor independence

DDBMS Disadvantages:
• Complexity of management and control: extra work must be done to ensure the
transparency of the system and to maintain multiple disparate systems, instead of one
big one
• Increased complexity and a more extensive infrastructure mean extra labour costs
• Security: remote database fragments must be secured, and the infrastructure must also
be secured (e.g. by encrypting the network links between remote sites)
• Lack of standards
• Increased storage requirements
• Difficulty maintaining integrity: enforcing integrity over a network may require too
many networking resources to be feasible

7.5 DDBMS Functions:


DDBMS must perform all the functions of a centralized DBMS, and must handle all
necessary functions imposed by the distribution of data and processing

• Must perform these additional functions transparently to the end user

• Provide the user interface needed for location transparency


• Locate the data - directs queries to proper site(s)
• Process queries - local, remote, compound (global)
• Provide network-wide concurrency control and recovery procedures
• Provide data translation in heterogeneous systems

7.6. Distributed Database Transparency Features


Allow the end user to see the distributed database as though it were a centralized one.
Transparency features include:

• Distribution transparency
• Heterogeneity transparency
• Transaction transparency
• Failure transparency
• Performance transparency
7.6.1 Distribution Transparency

Allows management of a physically dispersed database as though it were a centralized
database

Three levels of distribution transparency are recognized:

• Fragmentation transparency
• Location transparency
• Local mapping transparency
• A Summary of Transparency Features:

If the SQL statement requires:
Fragment Name    Location Name    Then the DDBMS supports         Level of distribution transparency
Yes              Yes              Local Mapping Transparency      Low
Yes              No               Location Transparency           Medium
No               No               Fragmentation Transparency      High

• Replica transparency: DDBMS’s ability to hide the existence of multiple copies of data
from the user

7.6.2 Transaction Transparency

Ensures database transactions will maintain the distributed database’s integrity and
consistency. We should distinguish between distributed requests and distributed
transactions:

Remote request: Lets a single SQL statement access data to be processed by a single
remote database processor

Remote transaction: Accesses data at a single remote site

Distributed transaction:
Can update or request data from several different remote sites on a network. A
distributed transaction allows a transaction to reference several different (local or
remote) DP sites.

Distributed request: Lets a single SQL statement reference data located at several
different local or remote DP sites


7.6.3 Performance Transparency and Query Optimization

Objective of query optimization routine is to minimize total cost associated with the
execution of a request.

Example of Query optimization transparency:


Site A: Suppliers ( S_id, City )       10,000 tuples
        Contracts ( S_id, P_id )    1,000,000 tuples

Site B: Parts ( P_id, Colour )        100,000 tuples

SELECT S.S_id
FROM Suppliers S, Contracts C, Parts P
WHERE S.S_id = C.S_id
  AND P.P_id = C.P_id
  AND S.City = 'London'
  AND P.Colour = 'red';

• Possible evaluation procedures:


(1) move relation Parts to site A and evaluate the query at A

(2) move relations Suppliers and Contracts to B and evaluate at B

(3) join Suppliers with Contracts at A, restrict the tuples to suppliers from London, and for each of these tuples check at site B to see whether the corresponding part is red

(4) join Suppliers with Contracts at A, restrict the tuples to suppliers from London, transfer them to B, and complete the processing there

(5) restrict Parts to tuples containing red parts, move the result to A and process there

(6) think of other possibilities …

• There is an extra dimension added by the site where the query was issued.

Costs associated with a request are a function of the:

• Access time (I/O) cost
• Communication cost
• CPU time cost


The DDBMS must provide distribution transparency as well as replica transparency. Query optimization techniques can be classified as:

• Manual (performed by the end user or programmer) or automatic (performed by the DDBMS)
• Static (takes place at compilation time) or dynamic (takes place at run time)
• Statistically based (uses information about the database, such as size, number of records, and average access time; typically automatic) or rule-based (uses a set of user-defined rules to determine the best query access strategy; typically manual)
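The cost comparison behind the evaluation procedures listed earlier can be sketched in a few lines. The row counts below come from the Suppliers/Contracts/Parts example; the two selectivity figures (share of London suppliers, share of red parts) are illustrative assumptions, not values given in the text.

```python
# Compare the data-shipping cost (tuples moved between sites) of three
# of the candidate evaluation procedures for the example query.

SUPPLIERS = 10_000      # tuples at site A
CONTRACTS = 1_000_000   # tuples at site A
PARTS     = 100_000     # tuples at site B

LONDON_FRACTION = 0.05  # assumed share of suppliers located in London
RED_FRACTION    = 0.10  # assumed share of parts that are red

def cost_move_parts_to_a():
    """Procedure (1): move the whole Parts relation to site A."""
    return PARTS

def cost_move_suppliers_contracts_to_b():
    """Procedure (2): move Suppliers and Contracts to site B."""
    return SUPPLIERS + CONTRACTS

def cost_restrict_parts_then_move():
    """Procedure (5): restrict Parts to red tuples at B, ship only those."""
    return int(PARTS * RED_FRACTION)

strategies = {
    "(1) move Parts to A": cost_move_parts_to_a(),
    "(2) move Suppliers + Contracts to B": cost_move_suppliers_contracts_to_b(),
    "(5) restrict Parts at B, move result": cost_restrict_parts_then_move(),
}

# A statistically based optimizer would pick the cheapest plan
for name, rows in sorted(strategies.items(), key=lambda kv: kv[1]):
    print(f"{rows:>9,} tuples shipped  {name}")
```

Under these assumptions procedure (5) ships the fewest tuples, which illustrates why pushing restrictions to the data's site before shipping usually wins.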

7.7. Distributed Concurrency Control


Multisite, multiple-process operations are much more likely to create data
inconsistencies and deadlocked transactions than are single-site systems

7.7.1 The Effect of a Premature COMMIT

7.7.2 Two-Phase Commit Protocol

• Distributed databases make it possible for a transaction to access data at several sites
• The objective of the 2PC protocol is to ensure that all nodes commit their part of the
transaction.

• Final COMMIT must not be issued until all sites have committed their parts of the
transaction


The two-phase commit protocol requires:
DO-UNDO-REDO protocol: used by a DP to roll transactions back and/or forward with the help of transaction log entries:
 DO performs the operation and writes the before and after values in the transaction log
 UNDO reverses the operation, using the transaction log entry written by the DO operation
 REDO redoes the operation, using the transaction log entries

Write-ahead protocol: forces the log entry to be written to permanent storage before the actual operation takes place; each individual DP’s transaction log entry must be written before the database fragment is actually updated.

The 2PC protocol defines operations between two types of nodes:
 Coordinator: the TP at the site where the transaction is executed
 Subordinates (cohorts): the DPs at the sites where data affected by the transaction are located
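The DO-UNDO-REDO idea with a write-ahead log can be sketched as follows. The in-memory `db` dictionary and log-entry layout are illustrative stand-ins for a database fragment and its transaction log on permanent storage.

```python
# Minimal sketch of DO-UNDO-REDO with a write-ahead log.

db = {"balance": 100}
log = []  # write-ahead: an entry is appended BEFORE db is touched

def do(key, new_value):
    """DO: log the before and after values first, then perform the update."""
    log.append({"key": key, "before": db[key], "after": new_value})
    db[key] = new_value

def undo():
    """UNDO: reverse the last operation using its log entry."""
    entry = log.pop()
    db[entry["key"]] = entry["before"]

def redo(entry):
    """REDO: reapply an operation from its log entry (e.g. after a crash)."""
    db[entry["key"]] = entry["after"]

do("balance", 250)   # update applied, log entry written first
saved = log[-1]
undo()               # roll back: balance is 100 again
redo(saved)          # roll forward: balance is 250 again
```

Because the log entry is written before the fragment is updated, a crash between the two steps always leaves enough information to UNDO or REDO the operation.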
Phases of Two-Phase Commit Protocol

Phase 1: Preparation

The coordinator sends a PREPARE TO COMMIT message to all subordinates.
The subordinates receive the message, write a transaction log entry, and send a reply (PREPARED / NOT PREPARED) to the coordinator.
The coordinator makes sure that all nodes are ready to commit, or it aborts the transaction.
Phase 2: the Final COMMIT

The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies.
Each subordinate receives the message and updates the database using the DO protocol.
The subordinates reply (COMMITTED or NOT COMMITTED) to the coordinator.
If one or more subordinates did not commit, the coordinator sends an ABORT message, forcing all subordinates to UNDO their changes.
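The two phases above can be sketched as a toy, single-process simulation. Node names and the vote strings are illustrative; a real DDBMS would also write DO-UNDO-REDO log records under the write-ahead protocol at every step.

```python
# Toy simulation of the two-phase commit protocol.

class Subordinate:
    """A DP at a site holding data affected by the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: write a transaction log entry and vote
        self.state = "prepared" if self.can_commit else "aborted"
        return "PREPARED" if self.can_commit else "NOT PREPARED"

    def commit(self):
        # Phase 2: apply the update (the DO operation)
        self.state = "committed"

    def abort(self):
        # Roll back using the UNDO operation
        self.state = "aborted"

def two_phase_commit(subordinates):
    """Coordinator logic: returns True if the transaction commits."""
    # Phase 1: broadcast PREPARE TO COMMIT and collect the votes
    votes = [s.prepare() for s in subordinates]
    if any(v != "PREPARED" for v in votes):
        for s in subordinates:
            s.abort()                 # broadcast ABORT
        return False
    # Phase 2: all nodes are ready -- broadcast the final COMMIT
    for s in subordinates:
        s.commit()
    return True

# All subordinates can commit -> transaction commits everywhere
print(two_phase_commit([Subordinate("B"), Subordinate("C")]))

# One subordinate fails to prepare -> the whole transaction aborts
print(two_phase_commit([Subordinate("B"), Subordinate("C", can_commit=False)]))
```

The key property: the final COMMIT is broadcast only after every subordinate has voted PREPARED, so no site commits while another aborts.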

7.8. Distributed Database Design


DDBMS Design Strategies:

• Data fragmentation: How to partition the database into fragments


• Data replication: Which fragments to replicate


• Data allocation: Where to locate those fragments and replicas
7.8.1 Data Fragmentation

• Breaks a single object into two or more segments or fragments
• Each fragment can be stored at any site over a computer network
• Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests

7.8.1.1 Data Fragmentation Strategies

• Horizontal fragmentation: Division of a relation into subsets (fragments) of tuples (rows):
FRAGMENT Emp INTO
  Lo_Emp AT SITE 'London' WHERE Dept_id = 'Sales'
  Le_Emp AT SITE 'Leeds'  WHERE Dept_id = 'Dev' ;

• Vertical fragmentation: Division of a relation into attribute (column) subsets
• Mixed fragmentation: Combination of horizontal and vertical strategies
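The three strategies can be sketched on a small Emp relation held as a list of dicts. The attribute names follow the FRAGMENT example above; the sample rows themselves are made up.

```python
# Horizontal, vertical, and mixed fragmentation of a tiny Emp relation.

emp = [
    {"Emp_id": 1, "Name": "Jones", "Dept_id": "Sales", "Salary": 900},
    {"Emp_id": 2, "Name": "Smith", "Dept_id": "Dev",   "Salary": 1200},
    {"Emp_id": 3, "Name": "Adams", "Dept_id": "Sales", "Salary": 950},
]

def horizontal(rows, predicate):
    """Horizontal fragment: the subset of tuples satisfying a condition."""
    return [r for r in rows if predicate(r)]

def vertical(rows, attributes):
    """Vertical fragment: a subset of attributes; the key is retained
    so the original relation can be reconstructed by a join."""
    keep = {"Emp_id"} | set(attributes)
    return [{k: v for k, v in r.items() if k in keep} for r in rows]

# Horizontal: Lo_Emp at site 'London' (Dept_id = 'Sales')
lo_emp = horizontal(emp, lambda r: r["Dept_id"] == "Sales")

# Vertical: a payroll site that only needs the Salary attribute
payroll = vertical(emp, ["Salary"])

# Mixed: horizontal first, then vertical on the resulting fragment
lo_payroll = vertical(lo_emp, ["Salary"])
```

Note that each vertical fragment keeps the key (`Emp_id`), which is what lets the DDBMS rebuild the whole relation with a join when a query needs all attributes.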


Vertically Fragmented Table Contents (fragments stored at Site 1 and Site 2)

7.8.2 Data Replication

• Storage of data copies at multiple sites served by a computer network
• Fragment copies can be stored at several sites to serve specific information requirements
• Can enhance data availability and response time
• Can help to reduce communication and total query costs

Updating distributed copies:

Primary copy scheme:
 one copy is designated the primary copy (unique), or
 primary copies can exist at different sites (distributed)
 an update is logically complete once the primary copy has been updated
 the site holding the primary copy must propagate the update to the other sites; this has to be done before COMMIT (preserving the ACID properties)


 in some DDBMSs, update propagation is only guaranteed by some future time

Synchronous Replication:
 All copies of a modified relation (fragment) must be updated before the modifying
transaction commits.
 Data distribution is made transparent to users.
Asynchronous Replication:
 Copies of a modified relation are only periodically updated; different copies may get out
of synch in the meantime.
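The two propagation schemes can be contrasted in a short sketch built on the primary copy scheme. Site names are illustrative, and "committing" here is simply the update call returning.

```python
import copy

class ReplicatedFragment:
    """A fragment replicated at several sites; the first site holds
    the primary copy."""

    def __init__(self, sites, data):
        self.copies = {site: copy.deepcopy(data) for site in sites}
        self.primary = sites[0]
        self.pending = []        # updates awaiting async propagation

    def update_sync(self, key, value):
        """Synchronous replication: every copy is updated before the
        modifying transaction commits."""
        for site in self.copies:
            self.copies[site][key] = value

    def update_async(self, key, value):
        """Asynchronous replication: only the primary copy is updated
        now; the other copies catch up when propagate() runs."""
        self.copies[self.primary][key] = value
        self.pending.append((key, value))

    def propagate(self):
        """Periodic propagation of pending updates to all copies."""
        for key, value in self.pending:
            for site in self.copies:
                self.copies[site][key] = value
        self.pending.clear()

    def in_synch(self):
        values = list(self.copies.values())
        return all(v == values[0] for v in values)

frag = ReplicatedFragment(["A", "B"], {"balance": 100})
frag.update_async("balance", 250)
print(frag.in_synch())   # copies have drifted apart
frag.propagate()
print(frag.in_synch())   # copies are consistent again
```

The sketch makes the trade-off concrete: synchronous updates keep every copy consistent at commit time, while asynchronous updates let copies get out of synch between propagation runs.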

Replication Scenarios:

Fully replicated database:

• Stores multiple copies of each database fragment at multiple sites
• Can be impractical due to the amount of overhead

Partially replicated database:

• Stores multiple copies of some database fragments at multiple sites
• Most DDBMSs are able to handle partially replicated databases well

Unreplicated database:

• Stores each database fragment at a single site
• No duplicate database fragments

7.8.3 Data Allocation

Deciding where to locate data. Allocation strategies:

• Centralized data allocation - Entire database is stored at one site


• Partitioned data allocation - Database is divided into several disjoint parts (fragments) and stored at several sites
• Replicated data allocation - Copies of one or more database fragments are stored at several sites
• Data distribution over a computer network is achieved through data partitioning, data replication, or a combination of both

Problems

1. Figure P10.1 The DDBMS Scenario for Problem 1

TABLE        FRAGMENT    LOCATION
CUSTOMER     N/A         Site A
PRODUCT      PROD_A      Site A
             PROD_B      Site B
INVOICE      N/A         Site B
INV_LINE     N/A         Site B

(Queries are issued from Sites A, B, and C.)

Specify the minimum type(s) of operation(s) the database must support (remote
request, remote transaction, distributed transaction, or distributed request) in order
to perform the following operations:

At Site C:

a. SELECT * FROM CUSTOMER;

b. SELECT * FROM INVOICE
   WHERE INV_TOTAL > 1000;

c. SELECT * FROM PRODUCT
   WHERE PROD_QOH < 10;

d. BEGIN WORK;
   UPDATE CUSTOMER
   SET CUS_BALANCE = CUS_BALANCE + 100
   WHERE CUS_NUM = '10936';
   INSERT INTO INVOICE(INV_NUM, CUS_NUM, INV_DATE, INV_TOTAL)
   VALUES ('986391', '10936', '15-FEB-2002', 100);
   INSERT INTO INV_LINE(INV_NUM, PROD_CODE, LINE_PRICE)
   VALUES ('986391', '1023', 100);
   UPDATE PRODUCT
   SET PROD_QOH = PROD_QOH - 1
   WHERE PROD_CODE = '1023';
   COMMIT WORK;

e. BEGIN WORK;
   INSERT INTO CUSTOMER(CUS_NUM, CUS_NAME, CUS_ADDRESS, CUS_BALANCE)
   VALUES ('34210', 'Victor Ephanor', '123 Main St', 0.00);
   INSERT INTO INVOICE(INV_NUM, CUS_NUM, INV_DATE, INV_TOTAL)
   VALUES ('986434', '34210', '10-AUG-1999', 2.00);
   COMMIT WORK;

At Site A:

f. SELECT CUS_NUM, CUS_NAME, INV_TOTAL
   FROM CUSTOMER, INVOICE
   WHERE CUSTOMER.CUS_NUM = INVOICE.CUS_NUM;

g. SELECT * FROM INVOICE
   WHERE INV_TOTAL > 1000;

h. SELECT * FROM PRODUCT
   WHERE PROD_QOH < 10;

At Site B:

i. SELECT * FROM CUSTOMER;

j. SELECT CUS_NAME, INV_TOTAL
   FROM CUSTOMER, INVOICE
   WHERE INV_TOTAL > 1000;

k. SELECT * FROM PRODUCT
   WHERE PROD_QOH < 10;
2. The following data structure and constraints exist for a magazine publishing
company.

a. The company publishes one regional magazine each in Florida (FL), South Carolina (SC),
Georgia (GA), and Tennessee (TN).

b. The company has 300,000 customers (subscribers) distributed throughout the four states listed in
Part a.
c. On the first of each month an annual subscription INVOICE is printed and sent to all customers
whose subscription is due (CUS_SUB_DATE) for renewal. The INVOICE entity contains a
REGION attribute to indicate the state (FL, SC, GA, TN) in which the customer resides.
CUSTOMER (CUS_NUM, CUS_NAME, CUS_ADDRESS, CUS_CITY, CUS_STATE,
CUS_SUB_DATE)

INVOICE (INV_NUM, REG_CODE, CUS_NUM, INV_DATE, INV_TOTAL)

The company's management is aware of the problems associated with centralized management and
has decided that it is time to decentralize the management of the subscriptions in its four regional
subsidiaries. Each subscription site will handle its own customer and invoice data. The company's
management, however, wants to have access to customer and invoice data to generate annual
reports and to issue ad hoc queries, such as:

• List all current customers by region.
• List all new customers by region.
• Report all invoices by customer and by region.
Given these requirements, how must you partition the database?

The CUSTOMER table must be partitioned horizontally by state, and the INVOICE table horizontally by region. (We show the partitions in the answer to Problem 3c.)

3. Given the scenario and the requirements in Problem 2, answer the following questions:

a. What recommendations will you make regarding the type and characteristics of the
required database system?

The DDBMS must be able to support distributed transparency features, such as fragmentation
transparency, replica transparency, transaction transparency, and performance transparency.
Heterogeneous capability is not a mandatory feature since we assume there is no existing DBMS
in place and that the company wants to standardize on a single DBMS.

b. What type of data fragmentation is needed for each table?

The database must be horizontally partitioned, using the STATE attribute for the CUSTOMER
table and the REGION attribute for the INVOICE table.

c. What must be the criteria used to partition each database?

The following fragmentation segments reflect the criteria used to partition each database:

Horizontal Fragmentation of the CUSTOMER Table by State

Fragment Name   Location         Condition           Node Name
C1              Tennessee        CUS_STATE = 'TN'    NAS
C2              Georgia          CUS_STATE = 'GA'    ATL
C3              Florida          CUS_STATE = 'FL'    TAM
C4              South Carolina   CUS_STATE = 'SC'    CHA

Horizontal fragmentation of the INVOICE table by region uses the same criteria (fragments I1 through I4, one per region).

e. What type of distributed database operations must be supported at each remote site?

                                      Node
Fragment             NAS     ATL     TAM     CHA     Headquarters
CUSTOMER             C1      C2      C3      C4
INVOICE              I1      I2      I3      I4
Distributed
operations required  none    none    none    none    distributed request

f. What type of distributed database operations must be supported at the headquarters site?

As the table for part e shows, the headquarters site must support distributed requests, because its annual reports and ad hoc queries reference customer and invoice data located at several different remote sites.
