Advanced Database Module
Advanced Database Module
DILLA CAMPUS
Meskele Yirgalem
Dilla, Ethiopia
Nov, 2022
Advanced Database Systems
Contents
1. Concepts for object-oriented Databases.......................................................................... 4
1.1. Why object-orientation in databases? ...................................................................... 4
1.2. Object Oriented Databases (ODBMS) ...................................................................... 4
1.3. Object Identity ........................................................................................................... 6
1.4. Object Structure......................................................................................................... 6
1.5. Type Constructors: .................................................................................................... 7
1.6. Type (class) Hierarchy .............................................................................................. 9
2. Algorithms for query processing and optimization ...................................................... 10
2.1. Selection (Or Restriction)........................................................................................ 10
Selection Examples.................................................................................................... 10
2.2. Projection ................................................................................................................ 11
2.3. Combining Selection and Projection ....................................................................... 12
2.4. Union ....................................................................................................................... 13
2.5. Set Difference ..........................................................................................................
14
2.6. Intersection .............................................................................................................. 14
2.7. Cartesian product..................................................................................................... 14
2.8. Join Operations ........................................................................................................
15
2.9. OVERVIEW OF QUERY PROCESSING ............................................................. 17
3. Transaction processing.................................................................................................. 24
3.1. Introduction to Transaction processing .................................................................. 24
3.2. Transaction processing concepts and theory ........................................................... 25
3.3. DB architectures ...................................................................................................... 26
3.4. Concurrency control ................................................................................................ 27
4. Concurrency Control Techniques ................................................................................. 32
4.1. Locking and time stamping ..................................................................................... 32
5. Database Recovery Techniques .................................................................................... 38
5.1. Introduction of Database Recovery ........................................................................ 38
5.2. Explain Log Based Recovery Method .................................................................... 39
Chapter One
Terminologies
Data is formally represented as instances of one or more relations.
Attribute is a named column of a relation table, representing a “Property.”
Relation is a table:
Tuple is a row of the table, representing a “record.”
Informal terms Formal terms
Table Relation
Column Attribute/Domain
Row Tuple
e.g., the attribute address is represented by a variable address and two messages get-
address and set-address.
For convenience, many object-oriented data models permit direct access to variables of
other objects.
Inheritance
Data and functions are organized in a hierarchy
Objects inherit characteristics and functions of their ancestor objects
Sharing of data within hierarchy scope supports code reusability
A mechanism of reusability, the most powerful concept of OO programming
Ex1.
Eg2.
Animals:A head and a body, feed
Mammals:A head and a body, feed + Four legs, sit
Fish:A head and a body, feed + swim
Inheritance allows one class to define as a special case of more general class. These
special classes are known as subclass, and the more general classes are known as super
classes. By default, a subclass inherits all the properties (both state and behavior) of its
super class, and it also defines its own unique properties. In addition, a subclass can
redefine inherited methods. If class B is a subclass of class A, then if class B inherits the
characteristics of class A, every instance of B is automatically an instance of A. however
the reverse is not true.
Chapter 2
The selection operation acts like a filter on a relation by returning only a certain number
of tuples.
Selection Examples
10 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Result:
Name Office Dept Rank
Select only those Employees with last name Smith who are assistant professors:
Name = 'Smith' Rank = 'Assistant' (EMP) Result:
2.2 Projection
• Projection is also a Unary operator.
• The Projection operator is pi:
• Projection limits the attributes that will be returned from the original relation.
11 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Projection Examples
Assume the same EMP relation above is used.
Project only the names and departments of the employees:
name, dept (EMP) Results:
Name Dept
Smith CS
Jones Econ
Green Econ
Brown CS
Smith Fin
2.3 Combining Selection and Projection
• The selection and projection operators can be combined to perform both operations.
• Show the names of all employees working in the CS department:
Name
Smith
Brown
12 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Show the name and rank of those Employees who are not in the CS department or
Adjuncts:
Green Assistant
Smith Associate
2.4 Union
R U S: The union of two relations R and S defines a relation that contains all the tuples of
R or S or both R and S, duplicate tuples being eliminated. R and S must be unioncompatible
Bill Smith 22
Sally Green 28
Mary Keen 23
Tony Jones 32
S
First Last Age
Forrest Gump 36
Sally Green 28
DonJuan DeMarco 27
R S
Bill Smith 22
13 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Sally Green 28
Mary Keen 23
Tony Jones 32
Forrest Gump 36
DonJuan DeMarco 27
R-S
First Last Age
Bill Smith 22
Mary Keen 23
Tony Jones 32
2.6 Intersection
R n S: The intersection operation defines a relation consisting of the set of all tuples that
are in both R and S.R and S must be union compatible
R S
First Last Age
Sally Green 28
14 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Tony Jones 32
RXS
First Last Age Dinner Dessert
There are various forms of Join operations, each with subtle differences
15 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Natural join
Outer join
Semijoin
R F S: The theta join operation defines a realtion that contains tuples satsfying the
predicate F from the Cartesian product of R and S
The predicate F is of the form R.ai Ө S.bi where Ө may be one of the comparison
operators (<, <=,>,>=, ≠)
In the case where the predicate F contains only equality (=), the turm Equijoin is used
instead.
Natural Join
R S: The natural join is an Equijoin of the two relations R and S over all common
attributes x. One occurrence of each common attribute is eliminated from the result.
Ex. List the names and comments of all clients who have viewed a property for rent
(∏clientNo,fName,lName(client) ) (∏clientNo,propertyNo,comment(Viewing))
Outer Join
R S: The (left) outer join is a join in which tuples from R that do not have matching
values in the common attributes of S are also included in the result relation. Missing
values in the second relation are set to null.
Ex. Produce a status report on property viewings.@ properties that have been viewed
with comments and those that have not been viewed.
(∏propertyNo,street,city(PropertyForRent) Viewing
Semi Join
R F S: The semi join operation defines a relation that contains the tuples of R that
participate in the join of R with S
16 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
It decreases the number of tuples that need to be handled to form the join.
Division Operation
R ÷ S: Assume relation R is defined over the attributes set A and relation S is defined
over the attribute set B such that B C A.Let C= A-B,that is,C is the set of attributes of R
that are not attributes of S.We have the following definition of the Division operation
The Division operation defines a relation over the attributes C that consists of the set of
tuples from R that match the combination of every tuple in S.Where c is the set of
attributes that are in R but not in S
Ex. Identify all clients who have viewed all properties with three rooms
(∏clientNo,propertyNo(viewing))÷ (∏propertyNo(σrooms=3(PropertyForRent)))
In declarative languages such as SQL, the user specifies what data is required rather than
how it is retrieved
Giving the DBMS the responsibility for selecting the best strategy prevents users from
choosing strategies that are known to be inefficient and gives the DBMS more control
system performance
The aim of query processing are to transform a query written in a high level language,
typically SQL, into a correct and efficient execution strategy expressed in low level
language (implementing the relational algebra),and to execute the strategy to retrieve the
required data.
17 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
2. Optimization
3. Code generation and
4. Execution
Cost generation
Generated code
Query output
Query Decomposition
Analysis
Normalization
Semantic analysis Simplification and Query restructuring.
1. Analysis
In this stage, the query is lexically and syntactically analyzed using the techniques of
programming language compilers
In addition, this stage verifies that the relations and attributes specified in the query are
defined in the system catalog.
It also verifies that any operations applied to database objects are appropriate for the
object type.
e.g.
18 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
SELECT StaffNumber
FROM staff
On compilation of this stage, the high-level query has been transformed into some
internal representation that is more suitable for processing.
The internal form that is typically chosen is some kind of tree, which is constructed as
follows:
A non-leaf node is created for each intermediate relation produced by a relational algebra
operation.
∞s.branchNo=b.branchNo
root
σs.position=’manager’ σb.city=’Wolkite’
Intermediate op
Staff Branch
leaves
2. Normalization
Converts the query into a normalized form that can be more easily manipulated
19 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Conjunction normal form- a sequence of conjunction that are connected with the ᴧ (And)
operator
@ A conjunction selection contains only those tuples that satisfy all conjuncts
Example
Disjunctive normal form-A sequence of disjuncts that are connected with the ᴠ(OR)
operator.
@ A disjunctive selection contains those tuples formed by the union of all tuples that
satisfy the disjuncts.
Example
3. Semantic Analyses
If components do not contribute to the generation of the results happen if some join
specifications are missing.
@ F v salary >20000
4. Simplifications
To transform the query to a semantically equivalent but more easily and efficiently
computed form.
Typically:
20 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
• Access restrictions
• View definitions and
• Integrity constraints are considered at this stage ,some of which may also
introduce redundancy
* If the user the does not have the appropriate access to all the components of the query
the query must be rejected.
View definition
CREATE VIEW Staff3 AS
SELECT staffNo,fName,lName,salary,branchNo
FROM Staff
WHERE branchNo=‘B003’;
SELECT *
FROM Staff3
SELECT staffNo,fName,lName,salary
Assuming that the user has the appropriate access privileges an initial optimization is to
apply the well-known idempotent rules of Boolean algebra, such as:
• p ^ p =p
• P ^ False= False
• P ^ true = P
• P ^ -P=False
• p^ (p V q)=p
• p v p=p
• p v False =p
21 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
• p v true =true
• p v –p = true
• p v (p^q)=p
Integrity constraints
5. Query restructuring
The query is restructured to provide a more efficient implementation.
Ex
QUERY OPTIMIZATION
The activity of choosing an efficient execution strategy for processing a query. The aim of
query optimization is to choose the one that minimizes resource usage. Generally, we try
to reduce the total execution time of the query which is the sum of execution time of all
individual operations that make up the query
However, resource usage may also be viewed as the response time of the query in which
case we concentrate on maximizing the number of parallel operations. (Pipelining)
Both methods of query optimization depend on database statistics to evaluate properly the
difference options that are available.
The accuracy and currency and of these statistics have a significant bearing on the
efficiency of the execution strategy chosen.
22 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
23 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Chapter three
3. Transaction processing
3.1. Introduction to Transaction processing
A transaction is an executing program that forms a logical unit of database processing.
Transaction includes one or more database access operations—these can include insertion,
deletion, modification, or retrieval operations.
The database operations that form a transaction can either be embedded within an
application program or they can be specified interactively via a high-level query language
such as SQL.
One way of specifying the transaction boundaries is by specifying explicit begin
transaction and end transaction statements in an application program; in this case, all
database access operations between the two are considered as forming one transaction. If
the database operations in a transaction do not update the database but only retrieve data,
the transaction is called a read-only transaction; otherwise it is known as a read-write
transaction.
A transaction is an atomic unit of work that should either be completed in its entirety or not
done at all. For recovery purposes, the system needs to keep track of when each transaction
starts, terminates, and commits or aborts. Therefore, the recovery manager of the DBMS
needs to keep track of the following operations:
BEGIN_TRANSACTION: This marks the beginning of transaction execution.
READ or WRITE: These specify read or write operations on the database items that are
executed as part of a transaction.
END_TRANSACTION: This specifies that READ and WRITE transaction operations
have ended and marks the end of transaction execution. However, at this point it may be
necessary to check whether the changes introduced by the transaction can be permanently
applied to the database (committed) or whether the transaction has to be aborted because
it violates serializability
24 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Transaction support
Concurrency control service
Recovery service
Both concurrency control and recovery are required to protect the DB from data
inconsistencies and data lose.
If the operations are not controlled, the access may interfere with another and the DB can
become inconsistent.
To overcome this, the DBMS implements a concurrency control protocol (set of standard
rules) that prevents DB access from interfering with one another.
ex. Read(staffNo=x,salary)
Salary=salary*1.1
Write(staffNo=x, newsalary )
If all these updates are not made, referential integrity will be lost and the DB will be in an
inconsistent state.
25 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
A transaction should always transform the DB from one consistent state to another,
although we accept that consistency may be violated while the transaction is in the
progress (tuples with new staffNo and old one).
At the end of the transaction all necessary tuples should have the newstaffNo value.
If it completes successfully, the transaction is said to have commit and the DB reaches a
new consistent state.
On the other hand, if the transaction does not execute successfully, the transaction is
aborted.
If a transaction is aborted, the DB must be restored to the consistent state it was in before
the transaction state. Such transaction is rollback or undone
The key words BEGIN TRANSACTION, ROLLBACK and COMMIT are available in
many DML to delimit transactions.
PARTIALLY COMMITED, which occur after the final statement has been executed.
Properties of transaction
3.3 DB architectures
Four high level DB modules that handle transaction concurrency control and recovery:
26 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
In reading data, there is no way users interfere with one another. In writing (updating)
there may be interference that can result in inconsistencies.
T6 commit
The lost of T2 update is avoided by preventing T1 from reading the value of balx, until
T2’s update has been completed.
27 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
The uncommitted dependency problem (or dirty read) problem occurs when one
transaction is allowed to see the intermediate results of another transaction before it has
committed.
Time T3 T4 Balx
T2 read(balx) 100
T3 balx=balx+100 100
T5 read(balx) . 200
T7 write(balx) 190
T8 commit 190
The problem occurs when a transaction reads several values from the DB, but a 2nd transaction
update some of them during the execution.
T7 balz=balz+10 90 50 25 150
T8 write(balz) 90 50 35 150
T9 commit read(balz) 90 50 35 185
28 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
The problem is avoided by preventing transaction T6 from reading balx and balz until after T5 has
committed its updates.
One solution is to allow only one transaction to execute at a time. One transaction is
committed before the next transaction is allowed to begin.
Serial schedule: A schedule where the operations of each transaction are executed
consecutively without any interleaved operations from the other transactions.
The result may not be identical
T1T2 is different from T2T1
Non serial: A schedule where the operations from a set of concurrent transactions are
interleaved (i.e. parallel). But the result is the same as serial schedule and the schedule is
said to be correct.
The objective of serializability is to find non serial schedule that allow transactions to
execute concurrently without interfering with one other, and there by produce a DB state
that could be produced by a serial execution.
We say that the non serial schedule is correct if it produces the same result as serial
execution. Such schedule is called serializable.
29 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
If two operations either read or write completely separate data items, they do not conflict
and order is not important.
If one transaction writes a data item and another either reads or writes the same data item,
the order of execution is important.
Time T7 T8
T1 Begin T
T2 read(balx)
T3 write(balx)
T4 Begin T
T5 read(balx)
T6 write(balx)
T7 read(baly)
T8 write(baly)
T9 commit
T10 read(baly)
T11 write(baly)
T12 commit
A conflict serializability schedule orders any conflicting operations in the same as some
serial execution.
A precedence (or serialization) graph can be produced to test for conflict serializabilty.
For a schedule S, a precedence graph is directed graph G(N,E) that consists of a set of
nodes N and a set of directed edges E which is constructed as follows:
30 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Create a directed edge TiTj, if Tj writes a value into an item after it has been read by Ti
Create a directed edge TiTj, if Tj writes a value into an item after it has been written by
Ti
If an edge TiTj exists in the precedence graph for S, then in any serial schedule S’
equivalent to S, Ti must appear before Tj.
If the precedence graph contains a cycle, the schedule is not conflict serializable.
Ex. Constructing the precedence graphs for schedules to test for conflict serializability
Chapter four
31 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
A transaction must claim a shared (read) or exclusive (write) lock on a data item before
the corresponding DB read or write operation.
Data items of various sizes, ranging from the entire DB down to a file, may be locked.
The size of the data item determines the fineness or granularity of the lock. Locks
Any transaction that needs to access a data item must first lock the item.
If the item is not locked by another transaction, the lock will be granted. If the item
is currently locked, the DBMS determines whether the request is compatible with the
existing lock. If a shared lock is requested on an item that already has a shared lock on
it, the request will be granted; otherwise, the transaction must wait until the existing
lock is released.
A transaction continues to hold a lock until it explicitly releases it either during
execution or when it terminates (aborts or commits).
2PL: A transaction follows the 2PL protocol if all locking operations precede the first
unlock operations in the transaction.
32 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
2. Shirking phase: in which it releases its locks but cannot acquire any new locks There
Problem of locking
1. Cascading rollback:
Time T1 T2
T1 begin T
T2 write_lock(balx)
T3 read(balx)
T4 read_lock(baly)
T5 read(baly)
T6 balx=balx+baly
T7 write(balx)
T8 unlock(balx) begin T
T9 . write_lock(balx)
T10 . read(balx)
T11 . balx=balx+100
T12 rollback write(balx)
T13 unlock(balx)
T14
Cascading rollback is undesirable.
Solution is rigorous 2PL: Leaving release of all locks until the end of the transaction
(committed or aborted)
2. Deadlock: since transactions can wait for locks on data items. If two transactions wait
for locks on item held by the other, deadlock will occurs.
33 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Time T1 T2
T1 begin T begin T
T2 write_lock(balx) write_lock(baly)
T3 read(balx) read(baly)
T4 write_lock(baly) write_lock(balx)
T5 wait ……. Wait ……
T6 wait ……. Wait ……
Once the deadlock occurs, the application involved cannot resolve the problem. Instead
the DBMS has to recognize that deadlock exists and break the deadlock in some way.
Unfortunately, there is only one way to break deadlock: aborting one or more of the
transactions.
Deadlock should be transparent to the user.
There are three general techniques for handling deadlock:
• Timeouts
• Deadlock prevention
• Deadlock detection and recovery Timeouts
The transaction that has requested a lock waits for at most a specified period of time.
Deadlock prevention
The DBMS looks ahead to determine if a transaction would cause deadlock and never
allows deadlock to occur. Deadlock detection and recovery
The DBMS allows deadlock to occur but recognizes occurrence of deadlock and break
them.
Frequency of deadlock detection (short time, long time, dynamic) Recovery
technique is only abortion.
1. Choice of deadlock victim
Abort the transaction that incur the minimum cost
A. How long the transaction has been running
B. How many data items the transaction have been updated by the transaction
C. How many data items the transaction is still to update
D. How far to roll a transaction to back (all or part)
E. Avoiding starvation
34 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Time stamping methods for concurrency control are quite different from locking method. No
locks are involved, and therefore there can be no deadlock. With time stamps method, there is
no waiting: transactions involved in conflict are simply rolled back and restored.
Time stamp: a unique identifier created by the DBMS that indicates the relative starting time
of a transaction
Time stamping: a concurrency control protocol that orders transactions in such a way that
older transactions, transaction with smaller time stamps, get priority in the event of conflict.
With time stamping, if a transaction attempts to read o r write a data item, then the read or
write is only allowed to proceed if the last update on data item was carried out by an older
transaction. Otherwise, the transaction requesting the read or write is restarted and given a
new time stamp.
Optimistic technique
Are based on the assumption that conflict is rare, and that is more efficient to allow
transactions to proceed without imposing delays to ensure serializability.
When the transaction wishes to commit a check is performed to determine
Granularity: the size of data items chosen as the unit of protection by concurrency control
protocol.
It can be:
The entire DB or
A file or
A page or
A record or
A field value of a records ex. Updating 1
record1 record is locked
Updating 95% of the recordentire file/relation is locked
Updating 90% of the DBentire DB is locked
35 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Find the address of the disk block that contains the record with primary key value x.
Transfer the disk block in to a database buffer in main memory
Copy the salary data from the DB buffer into the variable salary.
For write operation, the DBMS carries out the following steps:
Find the address of the disk block that contains the record with primary key value x.
Transfer the disk block in to a database buffer in main memory
Copy the salary data from the variable salary into the DB buffer.
Write the DB buffer back to disk
It is only once the buffers have been flushed to secondary storage that any update
operations can be regarded as permanent.
The explicit writing of the buffers to secondary storage is known as force-writing
(commit). If a failure occurs between writing to the buffer and flushing the buffer to
secondary storage, the recovery manager must determine the status of the transaction that
performed the write at the time of failure.
If the transaction had issued its commit, then to ensure durability the recovery manager
would have to redo that transaction’s update to the DB (roll forward).
36 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
On the other hand, if the transaction had not committed at the time of failure, then the
recovery manager would have undo (rollback) any effects of that transaction on the DB to
guarantee transaction atomicity.
T1
T2 :
T3 :
T4 :
T6
t0
T5 tcheck tfailure
:
Recovery failure
T1 and T6 aborted (undo)
T2 and T3 omit redo
T4 and T5 redo
Check point: the point of synchronization between the DB and the transaction log file.
All buffers are force-written to secondary storage. Check points are scheduled
predetermined interval.
Recovery facilities
1. Backup mechanism
2. Logging facilities, which keep track of the current state of transactions and DB change
3. A check point, which enables update to the DB that are in progress to be made
permanent
37 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
It the process of restoring the database to the most recent consistence state that exist just
before failure. This occur when the database is in the inconsistent state and violate the
atomic properties. The three states of database recovery are
Precondition
Condition
Post condition
• Database recovery is the process of restoring a database to the correct state in the
event of a failure
• Database recovery is a service that is provided by the DBMS to ensure that the
database is reliable and remain in consistent state in case of a failure.
• Restoring a physical backup means reconstructing it and making it available to the
database server.
• To recover a restored backup, data is updated using redo command after the
backup was taken.
• Database server such as SQL server or ORACLE server performs cash recovery
and instance recovery automatically after an instance failure.
• In case of media failure, a database administrator (DBA) must initiate a recovery
operation.
Recovering a backup involves two distinct operations: rolling the backup forward to a
more recent time by applying redo data and rolling back all changes made in
uncommitted transactions to their original state.
38 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
39 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
F. Xj had value V1 before the write, and will have value V2 after the write
G. When Ti finishes it last statement, the log record < Ti commit> is written. H.
Two approaches are used in log based recovery
40 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
The following figure shows the transaction log for above two transactions at three
different instances of time.
41 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
It is possible here that updates of the uncommitted transaction are also written to the
database. And, other transactions can access these updated values. When failure
occurs
If transaction has not committed, then it may have modified the database. And so,
undo the updates of the transaction.
If transaction has committed, then still it may not have modified the database.
And so, redo the updates of the transaction.
Transaction Log
In this technique, transaction log is used in following ways :
Transaction T starts by writing to the log.
Any update is recorded as where Vold indicates the original value of data item X
and Vnew indicates new value for X. Here, as undo operation is required, it
requires preserving old value of the changed data item.
Transaction T commits by writing to the log.
If a transaction T aborts, the transaction log record is consulted, and required
undo operations are performed.
Example
Again, consider the two transactions, T 0 and T1, given in figure, where T0 executes
before T1.
Also consider that initial values for A, B and C are 500, 600 and 700 respectively.
The following figure shows the transaction log for above two transactions at three
different instances of time. Note that, here, transaction log contains original values
also along with new updated values for data items.
If failure occurs in case of –
Undo the transaction T0 as it has not committed, and restore A and B to 500 and 600
respectively.
Undo the transaction T1, restore C to 700; and, Redo the Transaction T0 set A and B
to 400 and 700 respectively.
Redo the Transaction T0 and Transaction T0; and, set A and B to 400 and 700
respectively, while set C to 500.
42 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
43 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Example
Consider the transactions given in following figure. Here, Tc indicates checkpoint,
while Tf indicates failure time.
Here, at failure time –
4. Ignore the transaction T1 as it has already been committed before checkpoint.
5. Redo transaction T2 and T3 as they are active at/after checkpoint, but have
committed before failure.
6. Undo transaction T4 as it is active after checkpoint and has not committee\d.
44 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
45 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
46 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
B. During this time when locking the tables all the database would be locked
47 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
C. On this mechanism when locking the table all the ascendants are locked
D. On this mechanism when locking the table all the descendent records are locked
E. All of the above
7. Which one is more describe about multiple granularity of locking techniques?
A. It allow the data with the various size and the hierarchy of the data structure
B. During this time when locking the tables all the database would be locked
C. On this mechanism when locking the table all the ascendants are locked
D. On this mechanism when locking the table all the descendent records are locked
E. All of the above
Which of the following is not a recovery technique?
A. Deferred update
B. Immediate update
C. Two-phase commit
D. Recovery management
8. …. deals with soft errors, such as power failures.
A. System recovery
B. Media recovery
C. Database recovery
D. Failure recovery
9. Rollback of transactions is normally used to :
A. Recover from transaction failure
B. Update the transaction
C. Retrieve old records
D. Repeat a transaction
10. A transaction performs on the isolation level of the Read Uncommitted if :
A. A dirty read occurs
B. Non-repeatable read occurs
C. Phantom reads occurs
D. All of the above
48 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Chapter 6
Database Security
Threats to databases
Threats to databases can result in the loss or degradation of some or all of the following
commonly accepted security goals.
a. Loss of integrity b. Loss of availability c. Loss of confidentiality
To protect databases against these types of threats four kinds of countermeasures can be
implemented:
◦ Access control ◦ Inference control ◦ Flow control ◦ Encryption
A DBMS typically includes a database security and authorization subsystem that is
responsible for ensuring the security portions of a database against unauthorized access.
Two types of database security mechanisms:
◦ Discretionary security mechanisms
◦ Mandatory security mechanisms
The security mechanism of a DBMS must include provisions (supplies)s for restricting
access to the database as a whole
This function is called access control and is handled by creating user accounts and
passwords to control login process by the DBMS.
The security problem associated with databases is that of controlling the access to a
statistical database, which is used to provide statistical information or summaries of
values based on various criteria.
49 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
1. Account creation
2. Privilege granting
3. Privilege revocation
4. Security level assignment
Action 1 is access control, whereas 2 and 3 are discretionary and 4 is used to control
mandatory authorization
50 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Whenever a person or group of person s need to access a database system, the individual
or group must first apply for a user account.
The DBA will then create a new account id and password for the user if he/she deems
there is a legitimate need to access the database
The user must log in to the DBMS by entering account id and password whenever
database access is needed.
The database system must also keep track of all operations on the database that are
applied by a certain user throughout each login session.
To keep a record of all updates applied to the database and of the particular user who
applied each update, we can modify system log, which includes an entry for each operation
applied to the database that may be required for recovery from a transaction failure or
system crash.
If any tampering (interfere or cause damage) with the database is suspected, a database
audit is performed
A database audit consists of reviewing the log to examine all accesses and operations
applied to the database during a certain time period.
A database log that is used mainly for security purposes is sometimes called an audit
trail.
Discretionary Access Control Based on Granting and Revoking Privileges
The typical method of enforcing discretionary access control in a database system is
based on the granting and revoking privileges.
• At this level, the DBA specifies the particular privileges that each account holds
independently of the relations in the database.
51 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
• At this level, the DBA can control the privilege to access each individual relation or view
in the database.
The privileges at the account level apply to the capabilities provided to the account itself
and can include
Each position M(i,j) in the matrix represents the types of privileges (read, write, update)
that subject i holds on object j.
To control the granting and revoking of relation privileges, each relation R in a database
is assigned and owner account, which is typically the account that was used when the
relation was created in the first place.
The owner of a relation is given all privileges on that relation.
In SQL2, the DBA can assign and owner to a whole schema by creating the schema and
associating the appropriate authorization identifier with that schema, using the CREATE
52 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
SCHEMA command.
The owner account holder can pass privileges on any of the owned relation to other users
by granting privileges to their accounts.
In SQL the following types of privileges can be granted on each individual relation R:
SELECT (retrieval or read) privilege on R:
Gives the account retrieval privilege.
In SQL this gives the account the privilege to use the SELECT statement to retrieve
tuples from R.
MODIFY privileges on R:
This gives the account the capability to modify tuples of R.
In SQL this privilege is further divided into UPDATE, DELETE, and INSERT
privileges to apply the corresponding SQL command to R.
In addition, both the INSERT and UPDATE privileges can specify that only certain
attributes can be updated by the account.
In SQL the following types of privileges can be granted on each individual relation R:
REFERENCES privilege on R:
This gives the account the capability to reference relation R when specifying integrity
constraints.
The privilege can also be restricted to specific attributes of R.
Notice that to create a view; the account must have SELECT privilege on all relations
involved in the view definition.
If the owner A of a relation R wants another account B to be able to retrieve only some
fields of R, then A can create a view V of R that includes only those attributes and then
grant SELECT on V to B.
The same applies to limiting B to retrieving only certain tuples of R; a view V’ can be
created by defining the view by means of a query that selects only those tuples from R
that A wants to allow B to access.
53 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
User account A1 can create tables under the schema called EXAMPLE.
Suppose that A1 creates the two base relations EMPLOYEE and DEPARTMENT A1
is then owner of these two relations and hence all the relation privileges on each of
them.
54 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Suppose that A1 wants to grant A2 the privilege to insert and delete tuples in both of
these relations, but A1 does not want A2 to be able to propagate these privileges to
additional accounts:
Suppose that A1 wants to allow A3 to retrieve information from either of the two tables
and also to be able to propagate the SELECT privilege to other accounts.
A1 can issue the command:
GRANT SELECT ON EMPLOYEE, DEPARTMENT
TO A3 WITH GRANT OPTION;
A3 can grant the SELECT privilege on the EMPLOYEE relation to A4 by issuing:
GRANT SELECT ON EMPLOYEE TO A4;
◦ Notice that A4 can’t propagate the SELECT privilege because GRANT OPTION was not
given to A4
Suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE relation
from A3; A1 can issue:
REVOKE SELECT ON EMPLOYEE FROM A3;
The DBMS must now automatically revoke the SELECT privilege on EMPLOYEE from
A4, too, because A3 granted that privilege to A4 and A3 does not have the privilege any
more.
Suppose that A1 wants to give back to A3 a limited capability to SELECT from the
EMPLOYEE relation and wants to allow A3 to be able to propagate the privilege.
◦ The limitation is to retrieve only the NAME, BDATE, and ADDRESS attributes and only
for the tuples with DNO=5.
A1 then create the view:
CREATE VIEW A3EMPLOYEE AS
SELECT NAME, BDATE, ADDRESS
FROM EMPLOYEE
WHERE DNO = 5;
After the view is created, A1 can grant SELECT on the view A3EMPLOYEE to A3 as
follows:
GRANT SELECT ON A3EMPLOYEE TO A3 WITH GRANT OPTION;
Finally, suppose that A1 wants to allow A4 to update only the SALARY attribute of
55 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
EMPLOYEE;
A1 can issue:
GRANT UPDATE ON EMPLOYEE (SALARY) TO A4;
◦ The UPDATE or INSERT privilege can specify particular attributes that may be updated
or inserted in a relation.
◦ Other privileges (SELECT, DELETE) are not attribute specific.
Techniques to limit the propagation of privileges have been developed, although they
have not yet been implemented in most DBMSs and are not a part of SQL.
6.4 Mandatory Access Control and Role-Based Access Control for Multilevel
Security
The discretionary access control techniques of granting and revoking privileges on
relations have traditionally been the main security mechanism for relational database
systems.
This is an all-or-nothing method:
◦ A user either has or does not have a certain privilege.
In many applications, and additional security policy is needed that classifies data and
users based on security classes.
◦ This approach as mandatory access control, would typically be combined with the
discretionary access control mechanisms.
Typical security classes are top secret (TS), secret (S), confidential (C), and unclassified
(U), where TS is the highest level and U the lowest: TS ≥ S ≥ C ≥ U
56 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
The commonly used model for multilevel security, known as the Bell-LaPadula model,
classifies each subject (user, account, and program) and object (relation, tuple, column,
view, operation) into one of the security classifications, T, S, C, or U:
◦ Clearance (classification) of a subject S as class(S) and to the classification of an object
O as class(O).
Two restrictions are enforced on data access based on the subject/object classifications:
◦ Simple security property: A subject S is not allowed read access to an object O unless
class(S) ≥ class (O).
◦ A subject S is not allowed to write an object O unless class(S) ≤ class(O). This known as
the star property (or * property).
To incorporate multilevel security notions into the relational database model, it is
common to consider attribute values and tuples as data objects.
Hence, each attribute A is associated with a classification attribute C in the schema, and
each attribute value in a tuple is associated with a corresponding security classification.
In addition, in some models, a tuple classification attribute TC is added to the relation
attributes to provide a classification for each tuple as a whole.
Hence, a multilevel relation schema R with n attributes would be represented as
◦ R(A1,C1,A2,C2, …, An,Cn,TC)
Where each Ci represents the classification attribute associated with attribute Ai. The
value of the TC attribute in each tuple t – which is the highest of all attribute
classification values within t – provides a general classification for the tuple itself,
whereas each Ci provides a finer security classification for each attribute value within the
tuple.
◦ The apparent key of a multilevel relation is the set of attributes that would have formed
the primary key in a regular (single-level) relation.
A multilevel relation will appear to contain different data to subjects (users) with
different clearance levels.
◦ In some cases, it is possible to store a single tuple in the relation at a higher classification
level and produce the corresponding tuples at a lower-level classification through a
process known as filtering.
57 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
◦ In other cases, it is necessary to store two or more tuples at different classification levels
with the same value for the apparent key.
This leads to the concept of polyinstantiation where several tuples can have the same
apparent key value but have different attribute values for users at different classification
levels.
In general, the entity integrity rule for multilevel relations states that all attributes that
are members of the apparent key must not be null and must have the same security
classification within each individual tuple.
In addition, all other attribute values in the tuple must have a security classification
greater than or equal to that of the apparent key.
This constraint ensures that a user can see the key if the user is permitted to see any part
of the tuple at all.
Other integrity rules, called null integrity and interinstance integrity, informally ensure
that if a tuple value at some security level can be filtered (derived) from a higherclassified
tuple, then it is sufficient to store the higher-classified tuple in the multilevel relation.
In many practical situations, discretionary policies are preferred because they offer a
better trade-off between security and applicability.
58 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
In an e-commerce environment the resources to be protected are not only traditional data
but also knowledge and experience.
The access control mechanism should be flexible enough to support a wide spectrum of
heterogeneous protection objects.
59 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
60 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Chapter Seven
61 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
7. Distributed Database
Systems
7.1 Basic Concepts:
Distributed Database: A single logical database that is spread physically across
computers in multiple locations that are connected by a data communications link.
Distributed database is a collection of connected sites:
Each s7ite is a DB in its own right
Has its own DBMS and its own users
Operations can be performed locally as if the DB was not distributed
The sites collaborate (transparently from the user’s point of view)
The union of all DBs = the DB of the whole organisation (institution)
Schema: A structure that contains descriptions of the database objects. (Tables, views,
constraints...)
- Local schema describes database object on it’s own site only.
- Global schema describes database objects on all network nodes.
62 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
- Distributed Database:
• Distributed Catalog Management: Must keep track of how data is distributed across
sites.
• Site Catalog: Describes all objects (fragments, replicas) at a site + Keeps track of
replicas of relations created at this site.
63 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
All processing is done on single CPU or host computer (mainframe, midrange, or PC)
All data are stored on host computer’s local disk
Processing cannot be done on end user’s side of the system
Typical of most mainframe and midrange computer DBMSs
DBMS is located on the host computer, which is accessed by dumb terminals connected
to
it
Also typical of the first generation of single-user microcomputer databases
64 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
65 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Fully distributed database management system with support for multiple data processors
and transaction processors at multiple sites
Distributed Database Environment
a) Homogeneous DDBMSs
Integrate only one type of centralized DBMS over a network Data is distributed
across all the nodes.
66 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
All data is managed by the distributed DBMS (no exclusively local data.) All
access is through one, global schema.
The global schema is the union of all the local schema.
b) Heterogeneous DDBMSs
Integrate different types of centralized DBMSs over a network Data distributed across
all the nodes.
Local access is done using the local DBMS and schema.
Remote access is done using the global schema.
Fully heterogeneous DDBMS: Support different DBMSs that may even support different
data models (relational, hierarchical, or network) running under different computer
systems, such as mainframes and microcomputers
• Computer workstations
• Network hardware and software
• Communications media
• Transaction processor (or, application processor, or transaction manager): Software component found
in each computer that requests data
• Data processor or data manager: Software component residing on each computer that stores and
retrieves data located at the site, may be a centralized DBMS
67 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
68 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
• Distribution transparency
• Heterogeneity transparency
• Transaction transparency
• Failure transparency
• Performance transparency
7.6.1 Distribution Transparency
• Fragmentation transparency
• Location transparency
• Local mapping transparency
• A Summary of Transparency Features:
IF the SQL statement A Summary of Transparency Features:
requires:
69 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Remote request: Lets a single SQL statement access data to be processed by a single
remote database processor
70 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Distributed transaction:
Can update or request data from several different remote sites on a network.
Distributed transaction Allows a transaction to reference several different (local or
remote) DP sites
71 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Distributed request: Lets a single SQL statement reference data located at several
different local or remote DP sites
Objective of query optimization routine is to minimize total cost associated with the
execution of a request.
72 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
SELECT S.S_id
Colour = ‘red’ ;
(3) join Suppliers with Contracts at A, restrict the tuples for suppliers from London, and
for each of these tuples check at site B to see whether the corresponding part is red
(4) join Suppliers with Contracts at A, restrict the tuples for suppliers from London,
transfer them B and terminate the processing there
(5) restrict Parts to tuples containing red parts, move the result to A and process there
• there is an extra dimension added by the site where the query was issued Costs
associated with a request are a function of the:
Access time (I/O) cost
Communication cost
73 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
• Distributed databases make it possible for a transaction to access data at several sites
• The objective of the 2PC protocol is to ensure that all nodes commit their part of the
transaction.
• Final COMMIT must not be issued until all sites have committed their parts of the
transaction
• Two-phase commit protocol requires:
74 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Phase 1: Preparation
The coordinator broadcast a COMITT message to all subordinates and waits for replies.
Each subordinate receive the message, then updates the database using the DO protocol.
The subordinates reply (COMMITTED or NOT COMMITTED) message to coordinator.
If one or more subordinates did not commit, the coordinator send ABORT message,
75 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
76 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
77 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Synchronous Replication:
All copies of a modified relation (fragment) must be updated before the modifying
transaction commits.
Data distribution is made transparent to users.
Asynchronous Replication:
Copies of a modified relation are only periodically updated; different copies may get out
of synch in the meantime.
Replication Scenarios:
78 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
Problems
CUSTOMER N/A A
PRODUCT PROD_A A
PROD_B B
INVOICE N/A B
INV_LINE N/A B
Site A Site B
Site C
Specify the minimum type(s) of operation(s) the database must support (remote
request, remote transaction, distributed transaction, or distributed request) in order
to perform the following operations:
FROM CUSTOMER;
At Site C: b. SELECT *
79 | P a g e
ITdep’t Meskele Y .
Advanced Database Systems
80 | P a g e ITdep’t Meskele Y .
Advanced Database Systems
81 | P a g e ITdep’t Meskele Y .
Advanced Database Systems
FROM PRODUCT
WHERE PROD_QOH < 10;
At Site B:
i. SELECT *
FROM CUSTOMER;
j. SELECT CUS_NAME,
INV_TOTAL FROM
CUSTOMER,
INVOICE WHERE
k. SELECT *
FROM PRODUCT
82 | P a g e ITdep’t Meskele Y .
2. The following data structure and constraints exist for a magazine publishing
company.
a. The company publishes one regional magazine each in Florida (FL), South Carolina (SC),
Georgia (GA), and Tennessee (TN).
b. The company has 300,000 customers (subscribers) distributed throughout the four states listed in
Part a.
c. On the first of each month an annual subscription INVOICE is printed and sent to all customers
whose subscription is due (CUS_SUB_DATE) for renewal. The INVOICE entity contains a
REGION attribute to indicate the state (FL, SC, GA, TN) in which the customer resides.
CUSTOMER (CUS_NUM, CUS_NAME, CUS_ADDRESS, CUS_CITY, CUS_STATE,
CUS_SUB_DATE)
The company's management is aware of the problems associated with centralized management and
has decided that it is time to decentralize the management of the subscriptions in its four regional
subsidiaries. Each subscription site will handle its own customer and invoice data. The company's
management, however, wants to have access to customer and invoice data to generate annual
reports and to issue ad hoc queries, such as:
The CUSTOMER table must be partitioned horizontally by state. (We show the partitions in the
answer to 3c.)
3. Given the scenario and the requirements in Problem 2, answer the following questions:
83 | P a g e ITdep’t Meskele Y .
a. What recommendations will you make regarding the type and characteristics of the
required database system?
The DDBMS must be able to support distributed transparency features, such as fragmentation
transparency, replica transparency, transaction transparency, and performance transparency.
Heterogeneous capability is not a mandatory feature since we assume there is no existing DBMS
in place and that the company wants to standardize on a single DBMS.
The database must be horizontally partitioned, using the STATE attribute for the CUSTOMER
table and the REGION attribute for the INVOICE table.
The following fragmentation segments reflect the criteria used to partition each database:
84 | P a g e ITdep’t Meskele Y .
C4 South Carolina CUS_STATE = 'SC' CHA
Node
CUSTOMER C1 C2 C3 C4
INVOICE I1 I2 I3 I4
f. What type of distributed database operations must be supported at the headquarters site?
85 | P a g e ITdep’t Meskele Y .