3 DBMS

The document discusses key concepts in database transaction processing, including ACID properties (Atomicity, Consistency, Isolation, Durability) and the importance of concurrency control and crash recovery. It explains transaction states, schedules, serializability, and the significance of ensuring recoverable and cascadeless schedules to maintain database integrity. Additionally, it covers view serializability and the conditions for determining if schedules are view equivalent to serial schedules.


Topics

• Transaction Processing
• Concurrency Control
• Crash Recovery
• Query processing and optimization
• Distributed and parallel database
• Object Oriented Databases
• Data warehouse and data mining
• Database Security
Transaction processing and concurrency control
Transaction Concept
• A transaction is a unit of program execution that accesses and
possibly updates various data items.
• To preserve the integrity of data, the database system must ensure the ACID properties
• Two main issues to deal with:
• Failures of various kinds, such as hardware failures and system crashes
• Concurrent execution of multiple transactions
ACID Properties
• Atomicity. Either all operations of the transaction are properly reflected in the database or none are. This is handled by the recovery system of the database.
• Consistency. Execution of a transaction in isolation preserves the consistency of the database. Maintaining consistency is the responsibility of the application programmer.
• Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of the other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executing transactions.
  • That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
  • This is handled by the concurrency control system.
• Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures. This is handled by the recovery system of the database.
Required Properties of a Transaction
• Consider a transaction to transfer $ 50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Atomicity requirement
• If the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent
database state
• Failure could be due to software or hardware
• The system should ensure that updates of a partially executed transaction are not reflected in the
database
• Durability requirement — once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there
are software or hardware failures.
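As a concrete, hedged illustration of the atomicity and durability requirements, the minimal sketch below wraps the same A-to-B transfer in a single transaction using Python's sqlite3 module; the accounts table and its column names are assumptions made for the example, not part of the slides.

import sqlite3

def transfer(conn: sqlite3.Connection, amount: float = 50.0) -> None:
    cur = conn.cursor()
    try:
        # Steps 1-3: read A, subtract the amount, write A
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
        # Steps 4-6: read B, add the amount, write B
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()    # durability: once commit returns, the transfer persists
    except Exception:
        conn.rollback()  # atomicity: a failure between the two updates undoes the partial work
        raise

If a crash happens after the first UPDATE but before commit(), the rollback (or the recovery system at restart) ensures the half-done transfer is not reflected in the database.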
Required Properties of a Transaction (Cont.)
• Consistency requirement in the above example: the sum of A and B is unchanged by the execution of the transaction
• In general, consistency requirements include:
• Explicitly specified integrity constraints such as primary keys and foreign keys
• Implicit integrity constraints
• e.g., sum of balances of all accounts, minus sum of loan amounts must equal value of
cash-in-hand
• A transaction, when starting to execute, must see a consistent database. During transaction
execution the database may be temporarily inconsistent. When the transaction completes
successfully the database must be consistent. Erroneous transaction logic can lead to inconsistency
Required Properties of a Transaction (Cont.)
• Isolation requirement — if between steps 3 and 6 (of the fund transfer transaction) ,
another transaction T2 is allowed to access the partially updated database, it will see an
inconsistent database (the sum A + B will be less than it should be).

        T1                              T2
1. read(A)
2. A := A – 50
3. write(A)
                                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running transactions serially
• That is, one after the other.
• However, executing multiple transactions concurrently has significant benefits, as we will see
later.
Execution of a transaction in isolation preserves the _________ of a database
a) Atomicity
b) Consistency
c) Durability
d) All of the mentioned

Which of the following systems is responsible for ensuring isolation?


a) Recovery system
b) Atomic system
c) Concurrency control system
d) Compiler system

The scheme that controls the interaction between executing transactions is called as _____
a) Concurrency control scheme
b) Multiprogramming scheme
c) Serialization scheme
d) Schedule scheme

The “all-or-none” property is commonly referred to as _________


a) Isolation
b) Durability
c) Atomicity
d) None of the mentioned
Transaction State
Active – the initial state; the transaction stays
in this state while it is executing
Partially committed – after the final statement
has been executed.
Failed -- after the discovery that normal
execution can no longer proceed.
Aborted – after the transaction has been rolled
back and the database restored to its state
prior to the start of the transaction. Two
options after it has been aborted:
Restart the transaction
can be done only if no internal logical
error
Kill the transaction
Committed – after successful completion.
Concurrent Executions : Transaction Isolation
• Multiple transactions are allowed to run concurrently in the system.
Advantages are:
• Increased processor and disk utilization, leading to better transaction
throughput
• E.g. one transaction can be using the CPU while another is reading from or writing to the
disk
• Reduced average response time for transactions: short transactions need not
wait behind long ones.
• Concurrency control schemes – mechanisms to achieve isolation
• That is, to control the interaction among the concurrent transactions in order
to prevent them from destroying the consistency of the database.
Schedules
• Schedule – a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed
• A schedule for a set of transactions must consist of all instructions of those
transactions
• Must preserve the order in which the instructions appear in each individual
transaction.
• A transaction that successfully completes its execution will have a commit instruction as its last statement
  • By default, a transaction is assumed to execute a commit instruction as its last step
• A transaction that fails to successfully complete its execution will have an
abort instruction as the last statement
Schedule 1 and Schedule 2
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
• Schedule 1 is an example of a serial schedule in which T1 is followed by T2.
• Schedule 2 is a serial schedule in which T2 is followed by T1.
• Suppose A=1000 and B=2000. After Schedule 1 executes there will be A=855 and B=2145; after Schedule 2 executes there will be A=850 and B=2150. In both cases the sum A + B = 3000 is preserved.
Schedule 3 and Schedule 4
• Let T1 and T2 be the transactions defined previously. Schedule 3 is not a serial schedule, but it is equivalent to Schedule 1.
• Schedule 4 is a concurrent schedule that does not preserve the sum of “A + B”.
• Note -- in Schedules 1, 2 and 3 the sum “A + B” is preserved; in Schedule 4 it is not.
Serializability
• Basic Assumption – Each transaction preserves database consistency.
• Thus, serial execution of a set of transactions preserves database
consistency.
• A (possibly concurrent) schedule is serializable if it is equivalent to a
serial schedule.
• Serializability is the concept that helps identify which non-serial schedules are correct, i.e., will maintain database consistency.
• If a schedule of ‘n’ concurrent transactions can be converted into an equivalent serial schedule, then the schedule is serializable, and this property is known as serializability.
• Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
Conflicting Instructions
• Let li and lj be two Instructions of transactions Ti and Tj respectively.
Instructions li and lj conflict if and only if there exists some item Q
accessed by both li and lj, and at least one of these instructions wrote
Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
• Intuitively, a conflict between li and lj forces a (logical) temporal order
between them.
• If li and lj are consecutive in a schedule and they do not conflict, their results
would remain the same even if they had been interchanged in the schedule.
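This rule is easy to state in code. The minimal sketch below (my own illustration, using an assumed Op tuple that is not part of the slides) returns True exactly when the two operations belong to different transactions, touch the same item Q, and at least one of them is a write.

from typing import NamedTuple

class Op(NamedTuple):
    txn: str    # transaction id, e.g. "T1"
    kind: str   # "read" or "write"
    item: str   # data item, e.g. "Q"

def conflicts(li: Op, lj: Op) -> bool:
    # Two instructions conflict iff they are from different transactions,
    # access the same item, and at least one of them writes that item.
    return li.txn != lj.txn and li.item == lj.item and "write" in (li.kind, lj.kind)

assert not conflicts(Op("T1", "read", "Q"), Op("T2", "read", "Q"))   # read/read: no conflict
assert conflicts(Op("T1", "read", "Q"), Op("T2", "write", "Q"))      # read/write: conflict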
Conflict Serializability
• If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting
instructions, we say that S and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule

• Schedule 3 can be transformed into Schedule 6 -- a serial schedule where T2 follows T1 -- by a series of swaps of non-conflicting instructions. Therefore, Schedule 3 is conflict serializable.
• Example of a schedule that is not conflict serializable: one in which we are unable to swap instructions to obtain either the serial schedule < T3, T4 > or the serial schedule < T4, T3 >.
[Figures: Schedule 3 and Schedule 6.]
Testing for Conflict Serializability
• Consider some schedule of a set of transactions T1, T2, ..., Tn
• Precedence graph — a directed graph where the vertices are the transactions (names). We draw an arc from Ti to Tj if the two transactions conflict and Ti accessed the data item on which the conflict arose earlier. We may label the arc with the item that was accessed.
• A schedule is conflict serializable if and only if its precedence graph is acyclic.
• Cycle-detection algorithms exist which take order n² time, where n is the number of vertices in the graph.
• If precedence graph is acyclic, the serializability order can be
obtained by a topological sorting of the graph.
• That is, a linear order consistent with the partial order of the graph.
• For example, a serializability order for the schedule (a) would be
one of either (b) or (c)
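A minimal sketch of this test (my own illustration; it assumes the schedule is given as (transaction, operation, item) tuples, which is not a notation used in the slides) builds the precedence graph and returns a serializability order, or None when the graph has a cycle.

from collections import defaultdict
from graphlib import TopologicalSorter, CycleError

def precedence_test(schedule):
    """schedule: list of (txn, kind, item) tuples in execution order."""
    preds = defaultdict(set)
    txns = {t for t, _, _ in schedule}
    for i, (ti, ki, qi) in enumerate(schedule):
        for tj, kj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "write" in (ki, kj):
                preds[tj].add(ti)          # conflict: Ti accessed the item first, so Ti -> Tj
    ts = TopologicalSorter({t: preds.get(t, set()) for t in txns})
    try:
        return list(ts.static_order())     # a topological order = a serializability order
    except CycleError:
        return None                        # cycle => not conflict serializable

# One possible interleaving of the R(A)/W(A) example discussed below: cycle between T1 and T2.
print(precedence_test([("T1", "read", "A"), ("T2", "write", "A"),
                       ("T1", "write", "A"), ("T3", "write", "A")]))   # None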
Conflicting Transactions and Conflict Serializability
• The conflicting pairs are R-W, W-R and W-W on the same data item.
• Check the conflict pairs against the other transactions and draw an edge in the precedence graph for each one, from the transaction that accessed the item first to the other.
• If there is no loop/cycle in the precedence graph, the schedule is conflict serializable.
• If there is a loop, it is not conflict serializable, but we still need to test it for view serializability.
[Example: a schedule over T1, T2, T3 with operations R(x), R(y), R(x), R(y), R(z), W(y), W(z), R(z), W(x), W(z), and its precedence graph over T1, T2, T3.]
• Here there is no loop in the precedence graph, so the schedule is conflict serializable – it can be converted to an equivalent serial schedule and is consistent.
• Order of serializability: T2 → T3 → T1
Check whether the schedule is conflict serializable or not.
[Example: a schedule over T1, T2, T3 interleaving the operations R(A), W(A), W(A), W(A), and its precedence graph.]
• The schedule is not conflict serializable, because the precedence graph contains a loop.
• If there is a loop between any two transactions in a precedence graph, we cannot yet conclude whether the schedule is serializable or not; we need to reconfirm with the view serializability test.
• Ans: T1 → T3 → T2
Recoverable Schedules
• Recoverable schedule — if a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti must appear before the commit operation of Tj.
• The following schedule is not recoverable if T9 commits immediately after the read(A) operation.
• If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. For the above schedule to be recoverable, T9 would have to delay committing until after T8 commits.
• Hence, the database must ensure that schedules are recoverable.

Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule, where none of the transactions has yet committed (so the schedule is recoverable): if T10 fails, T11 and T12 must also be rolled back. This can lead to the undoing of a significant amount of work.
Cascadeless Schedules
• Cascadeless schedules — for each pair of transactions Ti and Tj such that Tj reads a
data item previously written by Ti, the commit operation of Ti appears before the
read operation of Tj.
• Every cascadeless schedule is also recoverable
• It is desirable to restrict the schedules to those that are cascadeless
• Example of a schedule that is NOT cascadeless
View Serializability
• View serializability is a concept that is used to determine whether schedules are view serializable or not. A schedule is said to be view serializable if it is view equivalent to a serial schedule (where no interleaving of transactions is possible).
• A schedule S is view serializable if it is view equivalent to a serial schedule.
• If a schedule is conflict serializable, then it is also view serializable. So it is better to first check whether the schedule is conflict serializable.
• A view serializable schedule which is not conflict serializable contains blind writes.
• A blind write is a write operation performed by a transaction without reading the item first.
[Diagram: conflict serializable schedules are contained in the view serializable schedules, which are contained in the consistent schedules; schedules outside these are inconsistent.]
View Serializability Test Conditions
• Let S and S´ be two schedules with the same set of transactions.
• S and S´ are view equivalent if the following three conditions are met, for each data item
Q,
1. Initial Read: If in schedule S transaction Ti reads the initial value of Q, then in schedule S’ transaction Ti must also read the initial value of Q. In simple words: if a transaction T1 reads data item A from the database in S, then in S’ T1 should also read A from the database.
2. Updated Read: If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S’ transaction Ti must also read the value of Q that was produced by the same write(Q) operation of transaction Tj. In simple words: if Ti reads A as updated by Tj in S, then in S’ Ti should also read A as updated by Tj.
3. Final Write operation: The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S’. In simple words: if a transaction T1 performs the last update of A in S, then in S’ T1 should also perform the final write operation.
• As can be seen, view equivalence is also based purely on reads and writes alone.
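A hedged sketch of this check (my own illustration; schedules are assumed to be lists of (transaction, operation, item) tuples, and for simplicity each transaction is assumed to read a given item at most once):

from collections import Counter

def reads_from(schedule):
    """For each read, record which transaction's write it reads (None = initial read)."""
    last_writer, seen = {}, []
    for txn, kind, item in schedule:
        if kind == "read":
            seen.append((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return Counter(seen)

def final_writes(schedule):
    """The transaction that performs the final write on each data item."""
    return {item: txn for txn, kind, item in schedule if kind == "write"}

def view_equivalent(s1, s2):
    # Conditions 1 and 2: same initial reads and same reads-from relationships;
    # condition 3: same final writer for every data item.
    return reads_from(s1) == reads_from(s2) and final_writes(s1) == final_writes(s2)

# The blind-write example below: S is view equivalent to the serial schedule T1, T2, T3.
S      = [("T1", "read", "A"), ("T2", "write", "A"), ("T1", "write", "A"), ("T3", "write", "A")]
serial = [("T1", "read", "A"), ("T1", "write", "A"), ("T2", "write", "A"), ("T3", "write", "A")]
print(view_equivalent(S, serial))   # True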
View Serializability (Cont.)
• A schedule S is view serializable if it is view equivalent to a serial schedule.
• Every conflict serializable schedule is also view serializable.
• Below is a schedule which is view-serializable but not conflict serializable.

        T1          T2          T3
        R(A)
                    W(A)
        W(A)
                                W(A)

• What serial schedule is the above equivalent to? (It is view equivalent to the serial schedule T1 → T2 → T3.)
• Every view serializable schedule that is not conflict serializable has blind writes.
• (A blind write is a write operation performed by a transaction without reading the item first. Here, W(A) in T2 is a blind write.)
View Serializability (Cont.)
Check whether the schedule is conflict serializable or not.

Original schedule (S):
        T1          T2          T3
        R(A)
                    W(A)
        W(A)
                                W(A)

• The schedule is not conflict serializable because its precedence graph contains a loop, but it is view serializable, as shown below.
• In the original schedule S the final write operation W(A) is done by T3, so T3 must also perform the final write in the view equivalent schedule.
• View serializable schedule S’: the serial schedule T1 → T2 → T3.

For more details:
https://www.youtube.com/watch?v=Ewzmv2jBlmM
https://www.youtube.com/watch?v=FJteasXARxg
Test for View Serializability
• The precedence graph test for conflict serializability cannot be used directly to
test for view serializability.
• Extension to test for view serializability has cost exponential in the size of
the precedence graph.
• The problem of checking if a schedule is view serializable falls in the class of
NP-complete problems.
• Thus, existence of an efficient algorithm is extremely unlikely.
• If there are three transactions, we need to check 3! candidate serial schedules; in general, for N transactions in a schedule we need to create and check N! possible serial orders, testing the three conditions for each.
• However, practical algorithms that just check some sufficient conditions for view serializability can still be used.
Q. Given a Schedule S below check View Serializability

Schedule S
With 3 transactions, the total number of possible serial schedules is 3! = 6:
1. S1 = <T1 T2 T3>
2. S2 = <T1 T3 T2>
3. S3 = <T2 T3 T1>
4. S4 = <T2 T1 T3>
5. S5 = <T3 T1 T2>
6. S6 = <T3 T2 T1>
Taking the first schedule S1:
Schedule S1
Step 1: Updated read
In both schedules S and S1 there is no read other than the initial read, so this condition does not need to be checked.
Step 2: Initial read
The initial read operation in S is done by T1, and in S1 it is also done by T1.
Step 3: Final write
The final write operation in S is done by T3, and in S1 it is also done by T3. So S and S1 are view equivalent.
The first serial schedule S1 satisfies all three conditions, so we don't need to check another schedule.
Hence, the view equivalent serial schedule is: T1 → T2 → T3
Is the given schedule conflict serializable?

Ans: NO
The given schedule is conflict serializable. Which one will be the correct order of execution of the transactions?

A. T1T3T2
B. T2T3T1
C. T1T2T3
D. It can execute in any sequence

Ans: A
Consider the following Schedule

Which of the following statement is true about schedule S

Ans: C
If a schedule S can be transformed into a schedule S’ by a series of swaps of non-conflicting
instructions, then S and S’ are
a) Non conflict equivalent
b) Equal
c) Conflict equivalent
d) Isolation equivalent

A schedule is __________ if it is conflict equivalent to a serial schedule.


a) Conflict serializable
b) Conflicting
c) Non serializable
d) None of the mentioned
Concurrency Control
Concurrency Control
• Process of managing simultaneous execution of transaction in a shared database, to
ensure serializability of transaction
• Goal – to develop concurrency control protocols that will assure serializability.
• Purpose:
• To enforce isolation,
• To Preserve database consistency
• To resolve read-write and write-write conflicts
• A database must provide a mechanism that will ensure that all possible schedules are conflict serializable, recoverable, and preferably cascadeless
• A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency
• Concurrency-control schemes trade off between the amount of concurrency they allow and the amount of overhead that they incur
• Tests for serializability help us understand why a concurrency control protocol is
correct
Concurrency Control protocols Types
• Lock-Based Protocols
  • Shared/Exclusive Lock
  • 2PL (Strict 2PL, Rigorous 2PL)
• Timestamp-Based Protocols
• Validation-Based Protocols
• Multiple Granularity
  • Intention Lock Modes
• Multiversion Schemes
  • Multiversion Timestamp Ordering
  • Multiversion Two-Phase Locking
Lock-compatibility matrix (S is compatible only with S; X is compatible with nothing):

            S        X
    S       true     false
    X       false    false
S/X Lock-Based Protocols
• A lock is a mechanism to control concurrent access to a data item
• Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested
using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S
instruction.
• Lock requests are made to the concurrency-control manager by the programmer.
• Transaction can proceed only after request is granted.
    T1          T2
    S(A)        X(A)
    R(A)        R(A)
    U(A)        W(A)
                U(A)

In T2 the lock is exclusive, so T2 can both read and write A; in T1 the lock is shared, so T1 can only perform the read operation.
S/X Lock-Based Protocols
• A locking protocol is a set of rules followed by all transactions while
requesting and releasing locks.
• Locking protocols restrict the set of possible schedules.
• A transaction may be granted a lock on an item if the requested lock
is compatible with locks already held on the item by other
transactions
• Any number of transactions can hold shared locks on an item, But if
any transaction holds an exclusive on the item no other transaction
may hold any lock on the item.
• If a lock cannot be granted, the requesting transaction is made to wait
till all incompatible locks held by other transactions have been
released.
• The lock is then granted.
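A minimal sketch of this grant decision, hard-coding the S/X compatibility rules stated above (my own illustration, not a real lock-manager API):

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested: str, held: list[str]) -> bool:
    """Grant iff the requested mode is compatible with every lock already held
    on the item by other transactions; otherwise the requester must wait."""
    return all(COMPATIBLE[(held_mode, requested)] for held_mode in held)

print(can_grant("S", ["S", "S"]))   # True  - any number of shared locks may coexist
print(can_grant("X", ["S"]))        # False - requester waits until the S lock is released
print(can_grant("S", []))           # True  - no locks are currently held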
Problem of S/X locking
• May not be sufficient to produce only serializable schedules
• May not be free from irrecoverability
• May not be free from deadlock
• May not be free from starvation
Lock-Based Protocols
• Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
• Locking as above is not sufficient to guarantee serializability — if A and B
get updated in-between the read of A and B, the displayed sum would be
wrong.
The Two-Phase Locking Protocol
• A protocol which ensures conflict-
serializable schedules.
• Phase 1: Growing Phase
  • Transaction may obtain locks
  • Transaction may not release locks
• Phase 2: Shrinking Phase
  • Transaction may release locks
  • Transaction may not obtain locks
[Figure: number of locks held vs. time – it rises during the growing phase and falls during the shrinking phase.]
• The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e., the point where a transaction acquired its final lock).
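A minimal sketch of the two-phase discipline for a single transaction (my own illustration): once the first unlock happens, any further lock request is refused.

class TwoPhaseTxn:
    def __init__(self, name: str):
        self.name = name
        self.locks = set()        # items currently locked
        self.shrinking = False    # becomes True after the first unlock

    def lock(self, item: str) -> None:
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violation - cannot lock after unlocking")
        self.locks.add(item)      # growing phase

    def unlock(self, item: str) -> None:
        self.locks.discard(item)
        self.shrinking = True     # shrinking phase begins at the first unlock

t1 = TwoPhaseTxn("T1")
t1.lock("A"); t1.lock("B")        # growing phase
t1.unlock("A")                    # the lock point has passed; shrinking phase begins
# t1.lock("C") would now raise, because 2PL forbids acquiring locks after a release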
The Two-Phase Locking Protocol (Cont.)
• Example: under 2PL, transaction T1 acquires all its locks in the growing phase – X(A), S(B), R(A), W(A), R(B), S(C), R(C), S(D), R(D) – and only then enters its shrinking phase, releasing U(A), U(B), and so on.
• Example: T1 issues Lock-S(A) and then T2 also issues Lock-S(A); the second Lock-S(A) is granted because the two shared locks are not conflicting. The transactions then acquire Lock-X(B) and Lock-X(D) and later release their locks with Unlock(A), Unlock(B), Unlock(A), Unlock(D).
• If Lock-X(A) were requested instead while the other transaction still holds its shared lock on A, it would be blocked immediately due to the conflicting lock request.
Drawbacks of 2PL

• May not be free from irrecoverability
• Not free from deadlocks
• Not free from starvation
• Not free from cascading rollback
Deadlocks
• System is deadlocked if there is a set of transactions such that every transaction in
the set is waiting for another transaction in the set.

• Consider the partial schedule

• Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3
to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to
release its lock on A.
• Such a situation is called a deadlock.
• To handle a deadlock one of T3 or T4 must be rolled back
and its locks released.
Strict 2PL and Rigorous 2PL
• Cascading roll-back is possible under basic two-phase locking.
• Extensions to basic two-phase locking are needed to ensure recoverability and freedom from cascading roll-back
• Strict two-phase locking
  • A transaction must hold all its exclusive locks till it commits/aborts.
  • Helps in producing cascadeless schedules
• Rigorous two-phase locking is even stricter.
  • Here, all locks (both shared and exclusive) are held (cannot be released) till commit/abort.
  • In this protocol transactions can be serialized in the order in which they commit.
  • Avoids cascading rollback
[Diagram: rigorous 2PL schedules ⊂ strict 2PL schedules ⊂ basic 2PL schedules.]
• Most databases implement rigorous two-phase locking, but refer to it as simply two-phase locking.
Lock Conversions
• Two-phase locking with lock conversions:
– First Phase:
• can acquire a lock-S on item
• can acquire a lock-X on item
• can convert a lock-S to a lock-X (upgrade)
– Second Phase:
• can release a lock-S
• can release a lock-X
• can convert a lock-X to a lock-S (downgrade)
• This protocol assures serializability.
• But still relies on the programmer to insert the various locking
instructions.
Starvation

• Two-phase locking does not ensure freedom from deadlocks.


• In addition to deadlocks, there is a possibility of starvation.
• Starvation occurs if the concurrency control manager is badly designed.
For example:
• A transaction may be waiting for an X-lock on an item, while a sequence of other
transactions request and are granted an S-lock on the same item.
• The same transaction is repeatedly rolled back due to deadlocks.
• Concurrency control manager can be designed to prevent starvation.
Implementation of Locking
• A lock manager can be implemented as a separate process to which
transactions send lock and unlock requests
• The lock manager replies to a lock request by sending a lock-grant message (or a message asking the transaction to roll back, in case of a deadlock)
• The requesting transaction waits until its request is answered
• The lock manager maintains a data-structure called a lock table to
record granted locks and pending requests
• The lock table is usually implemented as an in-memory hash table
indexed on the name of the data item being locked
Lock Table • Dark blue rectangles indicate granted locks;
light blue indicate waiting requests
• Lock table also records the type of lock
granted or requested
• New request is added to the end of the queue
of requests for the data item, and granted if it
is compatible with all earlier locks
• Unlock requests result in the request being
deleted, and later requests are checked to see
if they can now be granted
• If transaction aborts, all waiting or granted
requests of the transaction are deleted
• lock manager may keep a list of locks held
by each transaction, to implement this
efficiently
Deadlock Handling
• Deadlock prevention protocols ensure that the system will never enter into a
deadlock state.
• Some prevention strategies :
• Require that each transaction locks all its data items before it begins execution (pre-
declaration).
• Impose partial ordering of all data items and require that a transaction can lock data
items only in the order specified by the partial order.
• wait-die scheme — non-preemptive
• older transaction may wait for younger one to release data item. (older means smaller
timestamp) Younger transactions never Younger transactions never wait for older ones;
they are rolled back instead.
• a transaction may die several times before acquiring needed data item
• wound-wait scheme — preemptive
• older transaction wounds (forces rollback) of younger transaction instead of waiting for
it. Younger transactions may wait for older ones.
• may be fewer rollbacks than wait-die scheme.
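As a small illustration (my own, using integer timestamps where a smaller value means an older transaction), the two decisions can be written as:

def wait_die(requester_ts: int, holder_ts: int) -> str:
    # Non-preemptive: an older requester waits; a younger requester dies (is rolled back).
    return "wait" if requester_ts < holder_ts else "die (rollback requester)"

def wound_wait(requester_ts: int, holder_ts: int) -> str:
    # Preemptive: an older requester wounds (rolls back) the younger holder; a younger requester waits.
    return "wound (rollback holder)" if requester_ts < holder_ts else "wait"

# T5 (ts=5) requests an item held by T10 (ts=10):
print(wait_die(5, 10))    # wait   (older waits for younger)
print(wound_wait(5, 10))  # wound  (older preempts younger)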
Deadlock prevention (Cont.)
• Both in wait-die and in wound-wait schemes, a rolled-back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.
• Timeout-Based Schemes:
• a transaction waits for a lock only for a specified amount of time. If the lock
has not been granted within that time, the transaction is rolled back and
restarted,
• Thus, deadlocks are not possible
• simple to implement; but starvation is possible. Also difficult to determine
good value of the timeout interval.
Deadlock Detection
• Deadlocks can be described as a wait-for graph, which consists of a pair G =
(V,E),
• V is a set of vertices (all the transactions in the system)
• E is a set of edges; each element is an ordered pair Ti → Tj.
• If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.
• When Ti requests a data item currently being held by Tj, the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.
• The system is in a deadlock state if and only if the wait-for graph has a
cycle. Must invoke a deadlock-detection algorithm periodically to look for
cycles.
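A brief sketch of the periodic detection step (my own illustration; the wait-for graph is assumed to be kept as an adjacency map, which is not a real DBMS API):

def has_cycle(wait_for: dict[str, set[str]]) -> bool:
    """DFS for a cycle; an edge Ti -> Tj means Ti waits for Tj to release an item."""
    visiting, done = set(), set()

    def dfs(node: str) -> bool:
        if node in visiting:
            return True            # back edge => cycle => deadlock
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(nxt) for nxt in wait_for.get(node, set())):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(t) for t in wait_for)

# T3 waits for T4 and T4 waits for T3 => deadlock
print(has_cycle({"T3": {"T4"}, "T4": {"T3"}}))   # True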
Tree Protocol
• Only exclusive locks are allowed.
• The first lock by Ti may be on any data item. Subsequently, a data Q can be locked by
Ti only if the parent of Q is currently locked by Ti.
• Data items may be unlocked at any time.
• A data item that has been locked and unlocked by Ti cannot subsequently be
relocked by Ti
Graph-Based Protocols

• Graph-based protocols are an alternative to two-phase locking


• Impose a partial ordering → on the set D = {d1, d2, ..., dh} of all data items.
• If di → dj, then any transaction accessing both di and dj must access di before accessing dj.
• Implies that the set D may now be viewed as a directed acyclic graph, called a database
graph.
• The tree-protocol is a simple kind of graph protocol.
Graph-Based Protocols (Cont.)
• The tree protocol ensures conflict serializability as well as freedom from deadlock.
• Unlocking may occur earlier in the tree-locking protocol than in the two-phase locking protocol.
• Shorter waiting times, and increase in concurrency
• Protocol is deadlock-free, no rollbacks are required
• Drawbacks
• Protocol does not guarantee recoverability or cascade freedom
• Need to introduce commit dependencies to ensure recoverability
• Transactions may have to lock data items that they do not access.
• increased locking overhead, and additional waiting time
• potential decrease in concurrency
• Schedules not possible under two-phase locking are possible under the tree protocol, and vice versa.
Multiple Granularity

• Allow data items to be of various sizes and define a hierarchy of data


granularities, where the small granularities are nested within larger
ones
• Can be represented graphically as a tree (but don't confuse with tree-
locking protocol)
• When a transaction locks a node in the tree explicitly, it implicitly
locks all the node's descendants in the same mode.
• Granularity of locking (level in tree where locking is done):
• Fine granularity (lower in tree): high concurrency, high locking overhead
• Coarse granularity (higher in tree): low locking overhead, low concurrency
Example of Granularity Hierarchy
The levels, starting from the coarsest (top) level are
• database
• area
• file
• record
Intention Lock Modes
• In addition to S and X lock modes, there are three additional lock
modes with multiple granularity:
• intention-shared (IS): indicates explicit locking at a lower level of the tree but
only with shared locks.
• intention-exclusive (IX): indicates explicit locking at a lower level with
exclusive or shared locks
• shared and intention-exclusive (SIX): the subtree rooted by that node is
locked explicitly in shared mode and explicit locking is being done at a lower
level with exclusive-mode locks.
• intention locks allow a higher level node to be locked in S or X mode
without having to check all descendent nodes.
Compatibility Matrix with Intention Lock Modes
• The compatibility matrix for all lock modes is:

            IS      IX      S       SIX     X
    IS      yes     yes     yes     yes     no
    IX      yes     yes     no      no      no
    S       yes     no      yes     no      no
    SIX     yes     no      no      no      no
    X       no      no      no      no      no
Multiple Granularity Locking Scheme
• Transaction Ti can lock a node Q, using the following rules:
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by
Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently
locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.
• Observe that locks are acquired in root-to-leaf order, whereas they are released
in leaf-to-root order.
• Lock granularity escalation: in case there are too many locks at a particular
level, switch to higher granularity S or X lock
Timestamp-Based Protocols
• Timestamp is a Unique value assign to every transaction
• It tells the order (when they enters into the system).
• Each transaction is issued a timestamp when it enters the system.
• The protocol manages concurrent execution such that the time-stamps
determine the serializability order.
• The timestamp ordering protocol ensures that any conflicting read and
write operations are executed in timestamp order.
• Order the transactions based on their Timestamps
• Timestamp-based protocols manage concurrent execution such that
time-stamp order = serializability order
Timestamp-Based Protocols
• Each transaction Ti is issued a timestamp TS(Ti) when it enters the system.
• Each transaction has a unique timestamp
• Newer transactions have timestamps strictly greater than earlier ones
• The timestamp could be based on a logical counter
  • Real (wall-clock) time alone may not be unique
  • Can use the pair (wall-clock time, logical counter) to ensure uniqueness
• Suppose, if an old transaction Ti has timestamp TS(Ti), a new transaction Tj is
assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
• The protocol manages concurrent execution such that the timestamps
determine the serializability order.
Timestamp-Based Protocols
• If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj is
assigned time-stamp TS(Tj) such that TS(Ti) < TS(Tj).
• In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:
  • W-timestamp(Q), or WTS(Q), is the largest timestamp of any transaction that executed write(Q) successfully.
  • R-timestamp(Q), or RTS(Q), is the largest timestamp of any transaction that executed read(Q) successfully.
• The protocol imposes rules on read and write operations to ensure that any conflicting operations are executed in timestamp order
  • Out-of-order operations cause transaction rollback
Timestamp-Based Protocols: Rules
• Suppose a transaction Ti issues a read(Q)
  1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
  2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).
• Suppose that transaction Ti issues write(Q)
  1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
  2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
  3. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
Timestamp-Based Protocols: Rules Summary
A. Transaction Ti issues a Read(A) Operation
1) If WTS(A) > TS(Ti) , Rollback Ti
2) Otherwise execute Read(A) operation
Set RTS(A)= MAX { RTS(A), TS(Ti) }
B.Transaction Ti Issues Write (A) Operation
a) If RTS(A) > TS(Ti) then Rollback Ti
b) If WTS(A) > TS(Ti) then Rollback Ti
c) Otherwise execute Write(A) Operation
Set WTS(A)=TS(Ti)
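The summarized rules translate almost directly into code. The sketch below is my own illustration, keeping RTS/WTS per data item in plain dictionaries and assuming positive integer timestamps.

class Rollback(Exception):
    pass

RTS: dict[str, int] = {}   # largest timestamp that successfully read each item
WTS: dict[str, int] = {}   # largest timestamp that successfully wrote each item

def read(item: str, ts: int) -> None:
    if ts < WTS.get(item, 0):
        raise Rollback(f"TS {ts} reads {item} already overwritten")   # rule A.1
    RTS[item] = max(RTS.get(item, 0), ts)                             # rule A.2

def write(item: str, ts: int) -> None:
    if ts < RTS.get(item, 0) or ts < WTS.get(item, 0):                # rules B.a and B.b
        raise Rollback(f"TS {ts} writes {item} out of timestamp order")
    WTS[item] = ts                                                    # rule B.c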
Correctness of Timestamp-Ordering Protocol
• The timestamp-ordering protocol guarantees serializability, since all the arcs in the precedence graph go from the transaction with the smaller timestamp to the transaction with the larger timestamp. Thus, there will be no cycles in the precedence graph.
• The timestamp protocol ensures freedom from deadlock, as no transaction ever waits.
• But the schedule may not be cascade-free, and may not even be recoverable.
TS Protocol Advantages/ Disadvantages
• Schedules produced by Basic TO are guaranteed to be conflict
serializable.
• Already discussed that using Timestamp can ensure that our schedule
will be deadlock free.
• One drawback of the Basic TO protocol is that Cascading Rollback is
still possible.
• Suppose we have a Transaction T1 and T2 has used a value written by
T1. If T1 is aborted and resubmitted to the system then, T2 must also
be aborted and rolled back. So the problem of Cascading aborts still
prevails.
Recoverability and Cascade Freedom
• Problem with timestamp-ordering protocol:
• Suppose Ti aborts, but Tj has read a data item written by Ti
• Then Tj must abort; if Tj had been allowed to commit earlier, the schedule is not
recoverable.
• Further, any transaction that has read a data item written by Tj must abort
• This can lead to cascading rollback --- that is, a chain of rollbacks
• Solution 1:
• A transaction is structured such that its writes are all performed at the end of its
processing
• All writes of a transaction form an atomic action; no transaction may execute while a
transaction is being written
• A transaction that aborts is restarted with a new timestamp
• Solution 2: Limited form of locking: wait for data to be committed before
reading it
• Solution 3: Use commit dependencies to ensure recoverability
Thomas’ Write Rule
• Modified version of the timestamp-ordering protocol in which
obsolete write operations may be ignored under certain
circumstances.
• When Ti attempts to write data item Q, if TS(Ti) < WTS(Q), then Ti is
attempting to write an obsolete value of {Q}.
• Rather than rolling back Ti as the timestamp ordering protocol would have
done, this {write} operation can be ignored.
• Otherwise this protocol is the same as the timestamp ordering
protocol.
• Thomas' Write Rule allows greater potential concurrency.
• Allows some view-serializable schedules that are not conflict-serializable.
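For contrast with the basic write rule, here is a sketch of Thomas' write rule (my own illustration, repeating the per-item RTS/WTS bookkeeping from the earlier sketch so this snippet stands alone): the obsolete-write case simply returns instead of rolling back.

RTS: dict[str, int] = {}
WTS: dict[str, int] = {}

class Rollback(Exception):
    pass

def write_thomas(item: str, ts: int) -> None:
    if ts < RTS.get(item, 0):
        raise Rollback(f"TS {ts}: a later transaction already read {item}")
    if ts < WTS.get(item, 0):
        return                      # obsolete write: ignore it (Thomas' write rule)
    WTS[item] = ts                  # otherwise behave as in basic timestamp ordering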
Validation Based Protocol

• Validation Based Protocol is also called Optimistic Concurrency Control Technique.


• Used where most transactions are read-only and conflicts are low, i.e., in cases where a majority of transactions are read-only, the rate of conflicts among transactions may be low.
• No checking is done while the transaction is being executed. Until the end of the transaction is reached, its updates are not applied directly to the database
• All updates are applied to local copies of data items kept for the transaction. At the end of transaction execution, a validation phase checks whether any of the transaction’s updates violate serializability.
• If there is no violation of serializability, the transaction is committed and the database is updated; otherwise, the transaction is aborted and then restarted.
• The idea behind optimistic concurrency is to do all the checks at once; hence
transaction execution proceeds with a minimum of overhead until the validation
phase is reached
Validation-Based Protocol
• Execution of transaction Ti is done in three phases. Every Ti has to go through three phases for
validation based protocol:
1. Read and execution phase: Transaction Ti writes only to temporary local variables
• Ti reads all the data and stores it in temporary variables (local variables of Ti). After reading, all the write operations are made to the temporary variables instead of the actual DB. Values of committed data items from the database can be read by a transaction. Updates are only applied to local data versions.
2. Validation phase: Transaction Ti performs a ''validation test'' to determine if local variables can be
written without violating serializability.
• Checking is performed to make sure that there is no violation of serializability when the transaction’s updates are applied to the database. If a transaction fails this check, it is aborted and rolled back
3. Write phase: If Ti is validated, the updates are applied to the database; otherwise, Ti is rolled back.
• If Ti passes the validation test then actual changes are made to DB . Validation Test ensures the
violation free execution of transaction .
On success of the validation phase, the transaction updates are applied to the database; otherwise, the updates are discarded and the transaction is rolled back and restarted.
Validation-Based Protocol
• The three phases of concurrently executing transactions can be interleaved, but each transaction must
go through the three phases in that order.
• Assume for simplicity that the validation and write phase occur together, atomically and
serially
• I.e., only one transaction executes validation/write at a time.
• Also called optimistic concurrency control, since the transaction executes fully in the hope that all will go well during validation

Three timestamps in the validation protocol:
1. Start(Ti): the time when Ti started its execution.
2. Validation(Ti): the time when Ti just finished its read phase and began its validation phase.
3. Finish(Ti): the time when Ti ended all its write operations in the database during the write phase.

Two more terms that we need to know are:
1. Write_set of a transaction: the set of data items that Ti writes.
2. Read_set of a transaction: the set of data items that Ti reads.
Validation-Based Protocol (Cont.)
• Each transaction Ti has 3 timestamps
• Start(Ti) : the time when Ti started its execution
• Validation(Ti): the time when Ti entered its validation phase
• Finish(Ti) : the time when Ti finished its write phase
• Serializability order is determined by timestamp given at validation
time; this is done to increase concurrency.
• Thus, TS(Ti) is given the value of Validation(Ti).
• This protocol is useful and gives greater degree of concurrency if
probability of conflicts is low.
• because the serializability order is not pre-decided, and
• relatively few transactions will have to be rolled back.
Validation-Based Protocol : Validation test conditions
To Clear all the validation test by Ti one of the following below conditions must
hold :
1. Finish(Ti) < Starts(Tj),
• It means Ti is older transaction and is get finished before Tj starts (No Overlap
).
• Since Ti finishes its execution means completes its write-phase before
Tj started its execution(read-phase). Then the serializability indeed
maintained.
2. Finish(Ti) < Validation(Tj):
  • This ensures that the actual writes of Ti and Tj do not overlap
  • Tj begins its write phase after Ti completes its write phase, and the write_set of Ti should be disjoint from the read_set of Tj.
3. Validation(Ti) < Validation(Tj)
  • This ensures that Ti completed its read phase before Tj completed its read phase
Validation Test for Transaction Tj
• If for all Ti with TS (Ti) < TS (Tj) either one of the following condition
holds:
• finish(Ti) < start(Tj)
• start(Tj) < finish(Ti) < validation(Tj) and the set of data items written by Ti
does not intersect with the set of data items read by Tj.
then validation succeeds and Tj can be committed. Otherwise,
validation fails and Tj is aborted.
• Justification: Either the first condition is satisfied, and there is no overlapped execution, or the second condition is satisfied and
  • the writes of Tj do not affect reads of Ti since they occur after Ti has finished its reads.
  • the writes of Ti do not affect reads of Tj since Tj does not read any item written by Ti.
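A hedged sketch of this test (my own illustration; the transaction record with 'start', 'finish', 'validation', 'read_set' and 'write_set' fields is an assumed structure, not a real API):

def validate(tj, older_transactions) -> bool:
    """Validate Tj against every older Ti (TS(Ti) < TS(Tj)) using the two conditions above."""
    for ti in older_transactions:
        if ti["finish"] < tj["start"]:
            continue                                   # no overlapped execution at all
        if (tj["start"] < ti["finish"] < tj["validation"]
                and not (ti["write_set"] & tj["read_set"])):
            continue                                   # Ti's writes are not read by Tj
        return False                                   # validation fails: abort Tj
    return True                                        # Tj may commit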
Example: Two transactions Ti and Tj are given; since TS(Tj) < TS(Ti), the validation phase succeeds in Schedule A. It is noteworthy that the final write operations w(x) and w(y) to the database are performed only after the validation of both Ti and Tj. Ti reads the old values of x (12) and y (15) for its print(x+y) operation, because the final write operations have not yet taken place.
[Schedule A: Ti reads x and y and prints x+y, then validates; Tj reads x and y, computes x = x - 10 and y = y + 10, validates after Ti, and only then performs w(x) and w(y).]
Schedule Produced by Validation
• Example of schedule produced using validation
Validation-Based Protocol
Advantages:
• Avoid Cascading-rollbacks:
• avoids cascading rollbacks since the final write operations to the database are performed only after the
transaction passes the validation phase.
• If the transaction fails, no update operation is performed on the database.
• So no dirty read can happen, and hence there is no possibility of cascading rollback.
• Since only local copies of data are involved in a rollback, cascading rollbacks are avoided
• Avoid deadlock:
• Since a strict time-stamping based technique is used to maintain the specific order of transactions.
• Hence deadlock isn’t possible in this scheme.
Disadvantages:
• Starvation:
  • There is a possibility of starvation for long transactions: a sequence of conflicting short transactions can cause repeated restarts of the long transaction. To avoid starvation, conflicting transactions must be temporarily blocked for some time, to let the long transaction finish.
• Not favorable for longer transactions :
• This method is not favorable for longer transactions because they are more likely to have conflicts and might be
repeatedly rolled back due to conflicts with short transactions.
Multiversion Schemes
• Multiversion schemes keep old versions of data item to increase concurrency.
Several variants:
• Multiversion Timestamp Ordering
• Multiversion Two-Phase Locking
• Snapshot isolation
• Key ideas:
• Each successful write results in the creation of a new version of the data item written.
• Use timestamps to label versions.
• When a read(Q) operation is issued, select an appropriate version of Q based on the
timestamp of the transaction issuing the read request, and return the value of the selected
version.
• reads never have to wait as an appropriate version is returned immediately.
• Each data item Q has a sequence of versions <Q1, Q2,...., Qm>. Each version Qk
contains three data fields:
• Content -- the value of version Qk.
• W-timestamp(Qk) -- timestamp of the transaction that created (wrote) version Qk
• R-timestamp(Qk) -- largest timestamp of a transaction that successfully read version Qk
Multiversion Timestamp Ordering (Cont)
• Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk
denote the version of Q whose write timestamp is the largest write
timestamp less than or equal to TS(Ti).
1. If transaction Ti issues a read(Q), then
• the value returned is the content of version Qk
• If R-timestamp(Qk) < TS(Ti), set R-timestamp(Qk) = TS(Ti),
2. If transaction Ti issues a write(Q)
1. if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back.
2. if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten
3. Otherwise, a new version Qi of Q is created
• W-timestamp(Qi) and R-timestamp(Qi) are initialized to TS(Ti).
Multiversion Schemes
• Maintains the different versions of data items
• Multiversion schemes keep old versions of data item to increase
concurrency.
• Multiversion Timestamp Ordering
• Multiversion Two-Phase Locking
• Each successful write results in the creation of a new version of the data
item written.
• Use timestamps to label versions.
• When a read(Q) operation is issued, select an appropriate version of Q
based on the timestamp of the transaction, and return the value of the
selected version.
• reads never have to wait as an appropriate version is returned
immediately.
Multiversion Timestamp Ordering
• Each data item Q has a sequence of versions <Q1, Q2,...., Qm>. Each
version Qk contains three data fields:
• Content – data value of version Qk.
• W-timestamp(Qk) -- timestamp of the transaction that created (wrote)
version Qk
• R-timestamp(Qk) -- largest timestamp of a transaction that successfully read
version Qk
• When a transaction Ti creates a new version Qk of Q, Qk's W-
timestamp and R-timestamp are initialized to TS(Ti).
• R-timestamp of Qk is updated whenever a transaction Tj reads Qk, and
TS(Tj) > R-timestamp(Qk).
Multiversion Timestamp Ordering (Cont)
• Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk
denote the version of Q whose write timestamp is the largest write
timestamp less than or equal to TS(Ti).
1. If transaction Ti issues a read(Q), then the value returned is the content of
version Qk.
2. If transaction Ti issues a write(Q)
1. if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back.
2. if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten
3. else a new version of Q is created.
• Observe that
• Reads always succeed
• A write by Ti is rejected if some other transaction Tj that (in the serialization order
defined by the timestamp values) should read
Ti's write, has already read a version created by a transaction older than Ti.
• Protocol guarantees serializability
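A compact sketch of these rules for a single data item Q (my own illustration; versions are plain (content, W-timestamp, R-timestamp) tuples and timestamps are assumed to be positive integers):

versions = [(0, 0, 0)]          # (content, W-timestamp, R-timestamp) of the initial version of Q

def find_version(ts: int):
    """The version Qk whose write timestamp is the largest one <= TS(Ti)."""
    return max((v for v in versions if v[1] <= ts), key=lambda v: v[1])

def read(ts: int) -> int:
    content, wts, rts = qk = find_version(ts)
    if rts < ts:                                   # reads never wait; just bump R-timestamp
        versions[versions.index(qk)] = (content, wts, ts)
    return content

def write(ts: int, value: int) -> None:
    _, wts, rts = qk = find_version(ts)
    if ts < rts:                                   # a younger transaction already read this version
        raise RuntimeError("rollback Ti")
    if ts == wts:                                  # same timestamp: overwrite the contents of Qk
        versions[versions.index(qk)] = (value, wts, rts)
    else:
        versions.append((value, ts, ts))           # new version Qi with W- and R-timestamp = TS(Ti)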
Multiversion Two-Phase Locking
• Differentiates between read-only transactions and update transactions
• Update transactions acquire read and write locks, and hold all locks up
to the end of the transaction. That is, update transactions follow
rigorous two-phase locking.
• Each successful write results in the creation of a new version of the data item
written.
• Each version of a data item has a single timestamp whose value is obtained
from a counter ts-counter that is incremented during commit processing.
• Read-only transactions are assigned a timestamp by reading the
current value of ts-counter before they start execution; they follow
the multiversion timestamp-ordering protocol for performing reads.
Multiversion Two-Phase Locking (Cont.)
• When an update transaction wants to read a data item:
• it obtains a shared lock on it, and reads the latest version.
• When it wants to write an item
  • it obtains an X lock on it; it then creates a new version of the item and sets this version's timestamp to ∞ (replaced by the real timestamp during commit processing).
• When update transaction Ti completes, commit processing occurs:
• Ti sets timestamp on the versions it has created to ts-counter + 1
• Ti increments ts-counter by 1
• Read-only transactions that start after Ti increments ts-counter will see
the values updated by Ti.
• Read-only transactions that start before Ti increments the
ts-counter will see the value before the updates by Ti.
• Only serializable schedules are produced.
MVCC: Implementation Issues
• Creation of multiple versions increases storage overhead
• Extra tuples
• Extra space in each tuple for storing version information
• Versions can, however, be garbage collected
• E.g., if Q has two versions Q5 and Q9, and the oldest active transaction has timestamp > 9, then Q5 will never be required again
Crash Recovery System
Failure Classification
• Transaction failure :
• Logical errors: transaction cannot complete due to some internal error
condition
• System errors: the database system must terminate an active transaction due
to an error condition (e.g., deadlock)
• System crash: a power failure or other hardware or software failure
causes the system to crash.
• Fail-stop assumption: non-volatile storage contents are assumed to not be
corrupted by system crash
• Database systems have numerous integrity checks to prevent corruption of disk data
• Disk failure: a head crash or similar disk failure destroys all or part of
disk storage
• Destruction is assumed to be detectable: disk drives use checksums to detect
failures
Recovery Algorithms
• Consider transaction Ti that transfers $50 from account A to
account B
• Two updates: subtract 50 from A and add 50 to B
• Transaction Ti requires updates to A and B to be output to the
database.
• A failure may occur after one of these modifications have been made but
before both of them are made.
• Modifying the database without ensuring that the transaction will commit
may leave the database in an inconsistent state
• Not modifying the database may result in lost updates if failure occurs just
after transaction commits
Recovery Algorithms
• Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure enough
information exists to recover from failures
2. Actions taken after a failure to recover the database contents to a state
that ensures atomicity, consistency and durability
Data Access
• Physical blocks are those blocks residing on the disk.
• Buffer blocks are the blocks residing temporarily in main memory.
• Block movements between disk and main memory are initiated through
the following two operations:
• input(B) transfers the physical block B to main memory.
• output(B) transfers the buffer block B to the disk, and replaces the appropriate
physical block there.
• We assume, for simplicity, that each data item fits in, and is stored
inside, a single block.
Example of Data Access
[Figure: main-memory buffer holding buffer blocks A and B, and the disk holding the corresponding physical blocks. input(A) copies block A from disk into the buffer; output(B) copies buffer block B back to disk. read(X) and write(Y) move values between the buffer blocks and the private work areas of the transactions (x1, y1 in T1's work area; x2 in T2's work area).]
Data Access (Cont.)
• Each transaction Ti has its private work-area in which local copies of all
data items accessed and updated by it are kept.
• Ti's local copy of a data item X is called xi.
• Transferring data items between system buffer blocks and its private
work-area done by:
• read(X) assigns the value of data item X to the local variable xi.
• write(X) assigns the value of local variable xi to data item {X} in the buffer block.
• Note: output(BX) need not immediately follow write(X). System can perform the
output operation when it deems fit.
• Transactions
• Must perform read(X) before accessing X for the first time (subsequent reads can
be from local copy)
• write(X) can be executed at any time before the transaction commits
Recovery and Atomicity
• To ensure atomicity despite failures, we first output information
describing the modifications to stable storage without modifying the
database itself.
• Log-based recovery mechanisms
• Less-used alternatives: shadow-copy and shadow-paging
Log-Based Recovery
• A log is a sequence of log records.
• Log of each transaction is maintained in some stable storage so that if any
failure occurs, then it can be recovered from there. The log is kept on stable
storage
• If any operation is performed on the database, then it will be recorded in the log.
• process of storing the logs should be done before the actual transaction is
applied in the database.
• When transaction Ti starts, it registers itself by writing a
<Ti start> log record
• Before Ti executes write(X), a log record
<Ti, X, V1, V2>
is written, where V1 is the value of X before the write (the old value), and V2 is the
value to be written to X (the new value).
• When Ti finishes its last statement, the log record <Ti commit> is written.
Log-Based Recovery
• Two approaches using logs
• Immediate database modification
• Deferred database modification.

<Tn, Start>                            When the transaction is initiated, it writes a 'start' log record.
<Tn, City, 'Kathmandu', 'Lalitpur'>    When the transaction modifies City from 'Kathmandu' to 'Lalitpur', another log record is written to the file.
<Tn, Commit>                           When the transaction finishes, it writes another log record to indicate the end of the transaction.
Immediate Database Modification (Log-Based Recovery)
• The database is modified immediately after every operation; it follows an actual database modification.
• Database modification occurs while the transaction is still active.

Current value in DB: A = 100, B = 200 (each log record holds the item, its old value and its new value)

    T1              Log file 1            Log file 2
    R(A)            <T1, Start>           <T1, Start>
    A = A + 100     <T1, A, 100, 200>     <T1, A, 1000, 2000>
    W(A)            <T1, B, 200, 400>     <T1, B, 5000, 6000>
    R(B)            <T1, Commit>          <T1, Commit>
    B = B + 200                           <T2, Start>
    W(B)                                  <T2, C, 700, 800>

• Log file 1: REDO T1, because both <T1, Start> and <T1, Commit> are present.
• Log file 2: REDO T1, and UNDO T2, because no <T2, Commit> is found for <T2, Start>.
Immediate Database Modification
• The immediate-modification scheme allows updates of an
uncommitted transaction to be made to the buffer, or the disk itself,
before the transaction commits
• Update log record must be written before database item is written
• We assume that the log record is output directly to stable storage
• Output of updated blocks to stable storage can take place at any time
before or after transaction commit
• Order in which blocks are output can be different from the order in
which they are written.
Immediate DB Modification Recovery Example

Below we show the log as it appears at three instances of time.

Recovery actions in each case above are:


(a) undo (T0): B is restored to 2000 and A to 1000, and log records
<T0, B, 2000>, <T0, A, 1000>, <T0, abort> are written out
(b) redo (T0) and undo (T1): A and B are set to 950 and 2050 and C is
restored to 700. Log records <T1, C, 700>, <T1, abort> are written out.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050
respectively. Then C is set to 600
Transaction Commit
• A transaction is said to have committed when its commit log record
is output to stable storage
• All previous log records of the transaction must have been output already
• Writes performed by a transaction may still be in the buffer when
the transaction commits, and may be output later
Immediate Database Modification Example

Log                     Write             Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                        A = 950
                        B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                        C = 600
                                          BB, BC
<T1 commit>
                                          BA

• Note: BX denotes the block containing X.
• BC is output before T1 commits; BA is output after T0 commits.
Undo and Redo Operations

• Undo of a log record <Ti, X, V1, V2> writes the old value V1 to X
• Redo of a log record <Ti, X, V1, V2> writes the new value V2 to X
• Undo and Redo of Transactions
• undo(Ti) restores the value of all data items updated by Ti to their old values,
going backwards from the last log record for Ti
• each time a data item X is restored to its old value V a special log record <Ti , X, V> is
written out
• when undo of a transaction is complete, a log record
<Ti abort> is written out.
• redo(Ti) sets the value of all data items updated by Ti to the new values, going
forward from the first log record for Ti
• No logging is done in this case
Undo and Redo on Recovering from Failure

• When recovering after failure:


• Transaction Ti needs to be undone if the log
• contains the record <Ti start>,
• but does not contain either the record <Ti commit> or <Ti abort>.
• Transaction Ti needs to be redone if the log
• contains the records <Ti start>
• and contains the record <Ti commit> or <Ti abort>
• Note that if transaction Ti was undone earlier and the <Ti abort> record was written to the log, and then a failure occurs, Ti is redone on recovery from failure
• such a redo redoes all the original actions including the steps that
restored old values
• Known as repeating history
• Seems wasteful, but simplifies recovery greatly
Deferred database modification:
• the transaction does not modify the database until it has committed.
• all the logs are created and stored in the stable storage, and the database is updated
when a transaction commits.
• The deferred-modification scheme performs updates to buffer/disk only at the time of
transaction commit
• Simplifies some aspects of recovery
• But has overhead of storing local copy
• Only the new value is stored in each update log record.
• Example: transaction T1, with current database values A = 100, B = 200:
  T1: R(A); A = A + 100; W(A); R(B); B = B + 200; W(B)
• Log file 1:
  <T1, Start>, <T1, A, 200>, <T1, B, 400>, <T1, Commit>
  REDO T1, because both <T1, Start> and <T1, Commit> are present.
• Log file 2:
  <T1, Start>, <T1, A, 200>, <T1, B, 400>, <T1, Commit>, <T2, Start>, <T2, C, 500>
  REDO T1; for T2, do nothing / roll back by simply discarding its log records, since the database was never modified before commit.
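A minimal sketch of the deferred scheme, again in illustrative Python (the DeferredTxn class and its names are assumptions, not a real API): writes go to a local copy, only new values are logged, and the database is touched only at commit, so an incomplete transaction needs no undo.

```python
# Illustrative sketch of deferred database modification.
# Writes are buffered in a local copy; the database is updated only at commit.

log = []
db = {"A": 100, "B": 200}

class DeferredTxn:
    def __init__(self, tid):
        self.tid = tid
        self.local = {}                            # local copies of updated items
        log.append((tid, "start"))

    def read(self, item):
        return self.local.get(item, db[item])

    def write(self, item, new_value):
        log.append((self.tid, item, new_value))    # only the new value is logged
        self.local[item] = new_value               # database is NOT modified yet

    def commit(self):
        log.append((self.tid, "commit"))
        for item, value in self.local.items():     # apply the deferred writes
            db[item] = value

t1 = DeferredTxn("T1")
t1.write("A", t1.read("A") + 100)
t1.write("B", t1.read("B") + 200)
t1.commit()
print(db)    # {'A': 200, 'B': 400}
```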
Recovery using Log records
• When the system crashes, it consults the log to determine which transactions need to be undone and which need to be redone.
1. If the log contains the records <Ti, Start> and <Ti, Commit> (or <Ti, Abort>), then transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>, then transaction Ti needs to be undone.
Checkpoints
• Redoing/undoing all transactions recorded in the log can be very slow
1. processing the entire log is time-consuming if the system has run for a long time
2. we might unnecessarily redo transactions which have already output their updates
to the database.
• Streamline recovery procedure by periodically performing checkpointing
1. Output all log records currently residing in main memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record < checkpoint L> onto stable storage where L is a list of all
transactions active at the time of checkpoint.
• All updates are stopped while doing checkpointing
Checkpoints (Cont.)
• During recovery we need to consider only the most recent
transaction Ti that started before the checkpoint, and transactions
that started after Ti.
1. Scan backwards from end of log to find the most recent <checkpoint L>
record
• Only transactions that are in L or started after the checkpoint need to be
redone or undone
• Transactions that committed or aborted before the checkpoint already have
all their updates output to stable storage.
• Some earlier part of the log may be needed for undo operations
1. Continue scanning backwards till a record <Ti start> is found for every
transaction Ti in L.
• Parts of log prior to earliest <Ti start> record above are not needed for
recovery, and can be erased whenever desired.
Example of Checkpoints
• Timeline: a checkpoint is taken at time Tc and the system fails at time Tf.
• T1 completes before the checkpoint; T2 and T3 commit after the checkpoint but before the failure; T4 is still active when the failure occurs.
• T1 can be ignored (updates already output to disk due to checkpoint).
• T2 and T3 are redone.
• T4 is undone.
Recovery Algorithm
• Logging (during normal operation):
• <Ti start> at transaction start
• <Ti, Xj, V1, V2> for each update, and
• <Ti commit> at transaction end
• Transaction rollback (during normal operation)
• Let Ti be the transaction to be rolled back
• Scan log backwards from the end, and for each log record of Ti of the form <Ti,
Xj, V1, V2>
• perform the undo by writing V1 to Xj,
• write a log record <Ti , Xj, V1>
• such log records are called compensation log records
• Once the record <Ti start> is found stop the scan and write the log record <Ti
abort>
Recovery Algorithm (Cont.)
• Recovery from failure: Two phases
• Redo phase: replay updates of all transactions, whether they committed,
aborted, or are incomplete
• Undo phase: undo all incomplete transactions
• Redo phase:
1. Find last <checkpoint L> record, and set undo-list to L.
2. Scan forward from above <checkpoint L> record
1. Whenever a record <Ti, Xj, V1, V2> or <Ti, Xj, V2> is found, redo it by writing V2 to Xj
2. Whenever a log record <Ti start> is found, add Ti to undo-list
3. Whenever a log record <Ti commit> or <Ti abort> is found, remove Ti from undo-list
Recovery Algorithm (Cont.)
• Undo phase:
1. Scan log backwards from end
1. Whenever a log record <Ti, Xj, V1, V2> is found where Ti is in undo-list perform same
actions as for transaction rollback:
1. perform undo by writing V1 to Xj.
2. write a log record <Ti , Xj, V1>
2. Whenever a log record <Ti start> is found where Ti is in undo-list,
1. Write a log record <Ti abort>
2. Remove Ti from undo-list
3. Stop when undo-list is empty, i.e., <Ti start> has been found for every transaction in undo-list.
• After the undo phase completes, normal transaction processing can commence.
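The two phases can be put together in a compact sketch (illustrative Python over a log represented as a list of tuples; the record layouts follow the slides, while the function and variable names are assumptions):

```python
# Sketch of the recovery algorithm above (illustrative, not a real DBMS).
# Log records are tuples:
#   (T, "start") | (T, "commit") | (T, "abort") | ("checkpoint", [active T's])
#   (T, X, V1, V2)  update record          (T, X, V1)  compensation record

def recover(log, db):
    # Redo phase: find the last checkpoint, set undo-list to L, then scan
    # forward replaying every update (and compensation) record.
    start, undo_list = 0, set()
    for i, rec in enumerate(log):
        if rec[0] == "checkpoint":
            start, undo_list = i, set(rec[1])
    for rec in log[start:]:
        if rec[0] == "checkpoint":
            continue
        if rec[1] == "start":
            undo_list.add(rec[0])
        elif rec[1] in ("commit", "abort"):
            undo_list.discard(rec[0])
        elif len(rec) == 4:                  # <Ti, Xj, V1, V2>: redo new value
            db[rec[1]] = rec[3]
        else:                                # <Ti, Xj, V1>: redo compensation
            db[rec[1]] = rec[2]

    # Undo phase: scan backwards, rolling back transactions still in undo-list,
    # writing compensation records and finally an abort record for each.
    for rec in list(reversed(log)):
        if not undo_list:
            break
        if len(rec) == 4 and rec[0] in undo_list:
            T, X, V1, _ = rec
            db[X] = V1
            log.append((T, X, V1))           # compensation log record
        elif rec[1] == "start" and rec[0] in undo_list:
            log.append((rec[0], "abort"))
            undo_list.discard(rec[0])

# Same situation as case (b) of the earlier example: T0 committed, T1 did not.
db  = {"A": 950, "B": 2050, "C": 600}
log = [("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050),
       ("T0", "commit"), ("T1", "start"), ("T1", "C", 700, 600)]
recover(log, db)
print(db)   # {'A': 950, 'B': 2050, 'C': 700}  (T0 redone, T1 undone)
```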
Example of Recovery with checkpoint
Shadow Paging
• Shadow paging is an alternative to log-based recovery; this scheme is useful
if transactions execute serially
• Idea: maintain two page tables during the lifetime of a transaction –the
current page table, and the shadow page table
• Store the shadow page table in nonvolatile storage, such that state of the
database prior to transaction execution may be recovered.
• Shadow page table is never modified during execution
• To start with, both the page tables are identical. Only current page table is
used for data item accesses during execution of the transaction.
• Whenever any page is about to be written for the first time
• A copy of this page is made onto an unused page.
• The current page table is then made to point to the copy
• The update is performed on the copy
Example of Shadow Paging
• Sample page table.
• Shadow and current page tables after a write to page 4: the current table points to the new copy of page 4, while the shadow table still points to the original page.
Shadow Paging (Cont.)
• To commit a transaction :
1. Flush all modified pages in main memory to disk
2. Output current page table to disk
3. Make the current page table the new shadow page table, as follows:
• keep a pointer to the shadow page table at a fixed (known) location on disk.
• to make the current page table the new shadow page table, simply update the
pointer to point to current page table on disk
• Once pointer to shadow page table has been written, transaction is
committed.
• No recovery is needed after a crash — new transactions can start right
away, using the shadow page table.
• Pages not pointed to from current/shadow page table should be freed
(garbage collected).
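A toy sketch of the copy-on-write and commit steps described above, assuming in-memory dictionaries stand in for disk pages and page tables (illustrative only; a real system works on disk blocks and switches a pointer at a fixed location on disk):

```python
# Toy sketch of shadow paging: pages are copied on first write, the current
# page table is redirected to the copies, and commit is a single pointer switch.

disk_pages = {0: "page0-v1", 1: "page1-v1"}   # physical page id -> contents
next_free = 2

shadow_table = {0: 0, 1: 1}          # logical page -> physical page (stable)
current_table = dict(shadow_table)   # private to the running transaction

def write_page(logical, new_contents):
    """Copy-on-write: the first write to a page redirects the current table."""
    global next_free
    if current_table[logical] == shadow_table[logical]:
        disk_pages[next_free] = disk_pages[current_table[logical]]  # copy page
        current_table[logical] = next_free                          # repoint
        next_free += 1
    disk_pages[current_table[logical]] = new_contents

def commit():
    """Commit: atomically make the current page table the new shadow table."""
    global shadow_table
    shadow_table = dict(current_table)   # stands in for updating the disk pointer

write_page(1, "page1-v2")
print(shadow_table, current_table)   # shadow still sees the old page 1
commit()
print(shadow_table)                  # now points at the updated copy
```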
Shadow Paging (Cont.)
• Advantages of shadow-paging over log-based schemes
• no overhead of writing log records
• recovery is trivial
• Disadvantages :
• Copying the entire page table is very expensive
• Can be reduced by using a page table structured like a B+-tree
• No need to copy entire tree, only need to copy paths in the tree that lead to updated leaf
nodes
• Commit overhead is high even with above extension
• Need to flush every updated page, and page table
• Data gets fragmented (related pages get separated on disk)
• After every transaction completion, the database pages containing old
versions of modified data need to be garbage collected
• Hard to extend algorithm to allow transactions to run concurrently
• Easier to extend log based schemes
In the ___________ scheme, a transaction that wants to update the database first creates a complete
copy of the database.
a. Shadow copy
b. Shadow Paging
c. Update log records
d. All of the mentioned

If database modifications occur while the transaction is still active, the transaction is said to use the
___________technique.
a. Deferred-modification
b. Late-modification
c. Immediate-modification
d. Undo

____________ using a log record sets the data item specified in the log record to the old value.
a. Deferred-modification
b. Late-modification
c. Immediate-modification
d. Undo
In the __________ phase, the system replays updates of all transactions by scanning the log forward
from the last checkpoint.
a. Repeating
b. Redo
c. Replay
d. Undo
Query Processing and Optimization
Query Processing? : Transforming a high-level language query into a low-level (executable) form.
Steps in Query Processing:
1. Parsing and Translation: the high-level query is checked for syntax, its relations are verified, and it is translated into an internal form (a query tree).
2. Optimization: metadata from the DBMS catalog is used to choose the best (lowest-cost) evaluation plan for the query.
3. Evaluation: the chosen plan is executed and the answers to the query are returned.
Basic Steps in Query Processing
• Parsing and translation : translate the query into its internal form. This
is then translated into relational algebra.
• Parser checks syntax, verifies relations
• Evaluation : The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.
• Query Optimization:
• Amongst all equivalent evaluation plans choose the one with lowest cost. Cost
is estimated using statistical information from the database catalog ,
• to determine the most efficient way to execute a given query by considering
the possible query plans
• the overall process of choosing the most efficient means of executing a SQL
statement
• e.g., number of tuples in each relation, size of tuples, etc.
Basic Steps in Query Processing: Optimization

• A relational algebra expression may have many equivalent expressions


• E.g., σ_salary<75000(Π_salary(instructor)) is equivalent to Π_salary(σ_salary<75000(instructor))
• Each relational algebra operation can be evaluated using one of several
different algorithms
• Correspondingly, a relational-algebra expression can be evaluated in many ways.
• Annotated expression specifying detailed evaluation strategy is called an
evaluation-plan. E.g.,:
• Use an index on salary to find instructors with salary < 75000,
• Or perform a complete relation scan and discard instructors with salary ≥ 75000
Measuring of Query Cost
• Many factors contribute to time cost
• disk access, CPU, and network communication
• Disk accesses
  • How many disk accesses it takes to process the query
• CPU cycles
  • How many CPU cycles are consumed to process the query
• Transit time in the network
  • Of concern in distributed and parallel systems
• CPU cost is difficult to calculate, so the focus is mainly on disk access
  • CPU speed is increasing at a faster rate than disk speed
  • CPU cost is relatively lower than disk cost
• Network cost is the primary consideration in distributed/parallel systems
Measures of Query Cost
• Many factors contribute to time cost
• disk access, CPU, and network communication
• Cost can be measured based on
• response time, i.e. total elapsed time for answering query, or
• total resource consumption
• We use total resource consumption as cost metric
• Response time harder to estimate, and minimizing resource consumption is a
good idea in a shared database
• We ignore CPU costs for simplicity
• Real systems do take CPU cost into account
• Network costs must be considered for parallel systems
• We describe how estimate the cost of each operation
• We do not include cost to writing output to disk
Measures of Query Cost (Cont.)
• Required data may already be buffer resident, avoiding disk I/O
  • But this is hard to take into account for cost estimation
• Several algorithms can reduce disk I/O by using extra buffer space
  • The amount of real memory available for buffering depends on other concurrent queries and OS processes, and is known only during execution
• Worst-case estimates assume that no data is initially in the buffer and that only the minimum amount of memory needed for the operation is available
  • More optimistic estimates are used in practice
Disk Access Cost
• Number of seeks: cost = N * average seek time
• Number of blocks read: cost = N * average block-read time
• Number of blocks written: cost = N * average block-write time
  • The cost of writing a block is typically higher than the cost of reading a block
• The number of blocks transferred from disk and the number of seeks are the major parameters:
  • tT = time to transfer one block
  • tS = time for one seek
  • Cost for b block transfers plus S seeks = b * tT + S * tS
Measures of Query Cost
• Disk cost can be estimated as:
• Number of seeks * average-seek-cost
• Number of blocks read * average-block-read-cost
• Number of blocks written * average-block-write-cost
• For simplicity we just use the number of block transfers from disk and the number of
seeks as the cost measures
• tT: time to transfer one block; tS: time for one seek
  • Assuming for simplicity that write cost is the same as read cost
• Cost for b block transfers plus S seeks = b * tT + S * tS
• tS and tT depend on where data is stored; with 4 KB blocks:
  • High-end magnetic disk: tS = 4 msec and tT = 0.1 msec
  • SSD: tS = 20-90 microsec and tT = 2-10 microsec for 4 KB
• We ignore CPU costs for simplicity (real systems do take CPU cost into account)
• We do not include the cost of writing output to disk in our cost formulae
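Plugging the sample figures into the formula b * tT + S * tS, a quick illustrative calculation (not a benchmark):

```python
# Cost of b block transfers plus S seeks: b * tT + S * tS (times in msec)
def io_cost(b, S, tT, tS):
    return b * tT + S * tS

# Reading 100 blocks with 4 seeks:
print(io_cost(100, 4, tT=0.1,  tS=4))     # magnetic disk: 10 + 16 = 26 ms
print(io_cost(100, 4, tT=0.01, tS=0.05))  # SSD (10 us/block, 50 us/seek): 1.2 ms
```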
Evaluation of Expressions
• algorithms for individual operations
• Alternatives for evaluating an entire expression tree
• Materialization: generate results of an expression whose inputs
are relations or are already computed, materialize (store) it on
disk. Repeat.
• Pipelining: pass on tuples to parent operations even as an
operation is being executed
Materialization
• Materialized evaluation: evaluate one operation at a time, starting at the lowest-level. Use
intermediate results materialized into temporary relations to evaluate next-level operations.
• E.g., in the figure below, compute and store σ_building="Watson"(department),
then compute and store its join with instructor, and finally compute the projection on name.

• Materialized evaluation is always applicable


• Cost of writing results to disk and reading them back can be
quite high
• Our cost formulas for operations ignore cost of writing
results to disk, so
Overall cost = Sum of costs of individual operations +
cost of writing intermediate results to disk
• Double buffering: use two output buffers for each operation,
when one is full write it to disk while the other is getting filled
• Allows overlap of disk writes with computation and
reduces execution time
Pipelining
• Pipelined evaluation : evaluate several operations simultaneously, passing the
results of one operation on to the next.
• E.g., in the previous expression tree, don't store the result of σ_building="Watson"(department);
instead, pass tuples directly to the join. Similarly, don't store the result of the join,
pass tuples directly to the projection.
• Much cheaper than materialization: no need to store a temporary relation to disk.
• Pipelining may not always be possible – e.g., sort, hash-join.
• For pipelining to be effective, use evaluation algorithms that generate output
tuples even as tuples are received for inputs to the operation.
• Pipelines can be executed in two ways: demand driven and producer driven
Pipelining (Cont.)
• In demand driven or lazy evaluation
• system repeatedly requests next tuple from top level operation
• Each operation requests next tuple from children operations as required, in order
to output its next tuple
• In between calls, operation has to maintain “state” so it knows what to return
next
• In producer-driven or eager pipelining
• Operators produce tuples eagerly and pass them up to their parents
• Buffer maintained between operators, child puts tuples in buffer, parent removes tuples from
buffer
• if buffer is full, child waits till there is space in the buffer, and then generates more tuples
• System schedules operations that have space in output buffer and can process
more input tuples
• Alternative name: pull and push models of pipelining
Pipelining (Cont.)
• Implementation of demand-driven pipelining
• Each operation is implemented as an iterator implementing the following operations
• open()
• E.g. file scan: initialize file scan
• state: pointer to beginning of file
• E.g.merge join: sort relations;
• state: pointers to beginning of sorted relations
• next()
• E.g. for file scan: Output next tuple, and advance and store file pointer
• E.g. for merge join: continue with merge from earlier state till
next output tuple is found. Save pointers as iterator state.
• close()
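The iterator interface can be sketched as follows (illustrative Python; the TableScan and Select operators and the sample instructor data are made up):

```python
# Sketch of demand-driven (pull) pipelining: every operator implements
# open()/next()/close(), and next() returns one tuple at a time (None = done).

class TableScan:
    def __init__(self, rows):
        self.rows = rows            # stands in for a stored relation / file
    def open(self):
        self.pos = 0                # state: current position in the "file"
    def next(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        pass

class Select:
    """Selection operator: pulls tuples from its child only on demand."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None
    def close(self):
        self.child.close()

# Evaluate sigma_{salary < 75000}(instructor), one tuple at a time.
instructor = [("Mozart", 40000), ("Einstein", 95000), ("Gold", 87000), ("Kim", 72000)]
plan = Select(TableScan(instructor), lambda t: t[1] < 75000)
plan.open()
while (t := plan.next()) is not None:
    print(t)                        # ('Mozart', 40000) then ('Kim', 72000)
plan.close()
```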
Evaluation Algorithms for Pipelining
• Some algorithms are not able to output results even as they get input tuples
• E.g. merge join, or hash join
• intermediate results written to disk and then read back
• Algorithm variants to generate (at least some) results on the fly, as input tuples are read in
• E.g. hybrid hash join generates output tuples even as probe relation tuples in
the in-memory partition (partition 0) are read in

• Double-pipelined join technique: Hybrid hash join, modified to buffer partition


0 tuples of both relations in-memory, reading them as they become available,
and output results of any matches between partition 0 tuples
• When a new r0 tuple is found, match it with existing s0 tuples, output matches, and save it
in r0
• Symmetrically for s0 tuples
Query Optimization
• Query optimization is the process of selecting the most efficient query-evaluation
plan from among the many strategies usually possible for processing a given query,
especially if the query is complex
• We do not expect users to write their queries so that they can be processed
efficiently. Rather, we expect the system to construct a query-evaluation plan that
minimizes the cost of query evaluation. This is where query optimization comes into
play.
Query Optimization
• Alternative ways of evaluating a given query
• Equivalent expressions
• Different algorithms for each operation
Query Optimization
• An evaluation plan defines exactly what algorithm is used for each operation, and how the
execution of the operations is coordinated.

 Find out how to view query execution plans on your favorite database
Optimization
• Query Optimization: Amongst all equivalent evaluation plans choose the one with
lowest cost.
• Cost is estimated using statistical information from the
database catalog
• e.g. number of tuples in each relation, size of tuples, etc.
• A relational algebra expression may have many equivalent expressions
• E.g., σ_salary<75000(Π_salary(instructor)) is equivalent to
Π_salary(σ_salary<75000(instructor))
• Each relational algebra operation can be evaluated using one of several different
algorithms
• Correspondingly, a relational-algebra expression can be evaluated in many ways.
• Annotated expression specifying detailed evaluation strategy is called an evaluation-
plan.
• E.g., can use an index on salary to find instructors with salary < 75000,
• or can perform a complete relation scan and discard instructors with salary ≥ 75000
Cost-based query optimization
• Cost difference between evaluation plans for a query can be enormous
• E.g. seconds vs. days in some cases
• Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
• Estimation of plan cost based on:
• Statistical information about relations. Examples:
• number of tuples, number of distinct values for an attribute
• Statistics estimation for intermediate results
• to compute cost of complex expressions
• Cost formulae for algorithms, computed using statistics
Generating Equivalent Expressions
Transformation of Relational Expressions
• Two relational algebra expressions are said to be equivalent if the two expressions
generate the same set of tuples on every legal database instance
• Note: order of tuples is irrelevant
• we don’t care if they generate different results on databases that violate
integrity constraints
• In SQL, inputs and outputs are multisets of tuples
• Two expressions in the multiset version of the relational algebra are said to be
equivalent if the two expressions generate the same multiset of tuples on every
legal database instance.
• An equivalence rule says that expressions of two forms are equivalent
• Can replace expression of first form by second, or vice versa
Equivalence Rules
Equivalence Rules
Pictorial Depiction of Equivalence Rules
Multiple Transformations
Query Optimization: Additional Optimization Techniques
• Using Nested Subqueries
• Using Materialized Views
• Top-K Queries
• Join minimization
• Multiquery optimization
• Parametric Query Optimization
• Query Plan Caching

Optimizing Nested Subqueries
• Nested query example:
select name
from instructor
where exists (select *
from teaches
where instructor.ID = teaches.ID and teaches.year = 2007)
• SQL conceptually treats nested subqueries in the where clause as
functions that take parameters and return a single value or set of values
• Parameters are variables from outer level query that are used in the nested
subquery; such variables are called correlation variables
• Conceptually, nested subquery is executed once for each tuple in the
cross-product generated by the outer level from clause
• Such evaluation is called correlated evaluation
• Note: other conditions in where clause may be used to compute a join (instead of a
cross-product) before executing the nested subquery
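To see why correlated evaluation can be costly, the toy sketch below contrasts it with an equivalent decorrelated (semi-join) form, using plain Python lists in place of relations (the sample data is made up):

```python
# Correlated evaluation vs. decorrelation for
#   select name from instructor
#   where exists (select * from teaches
#                 where instructor.ID = teaches.ID and teaches.year = 2007)

instructor = [{"ID": 1, "name": "Srinivasan"}, {"ID": 2, "name": "Wu"}]
teaches    = [{"ID": 1, "year": 2007}, {"ID": 1, "year": 2006}, {"ID": 2, "year": 2005}]

# Correlated evaluation: the subquery is re-run once per outer tuple.
names_correlated = [i["name"] for i in instructor
                    if any(t["ID"] == i["ID"] and t["year"] == 2007 for t in teaches)]

# Decorrelated form: build the set of qualifying IDs once, then semi-join.
ids_2007 = {t["ID"] for t in teaches if t["year"] == 2007}
names_join = [i["name"] for i in instructor if i["ID"] in ids_2007]

print(names_correlated, names_join)   # ['Srinivasan'] ['Srinivasan']
```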
Materialized Views
• A materialized view, or snapshot as they were previously known, is a table
segment whose contents are periodically refreshed based on a query,
either against a local or remote table.
• A materialized view is a view whose contents are computed and stored.
• Consider the view
create view department_total_salary(dept_name, total_salary) as
select dept_name, sum(salary)
from instructor
group by dept_name
• Materializing the above view would be very useful if the total salary by
department is required frequently
• Saves the effort of finding multiple tuples and adding up their amounts
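Keeping such a view consistent with its base relation is the view-maintenance problem; the sketch below maintains the department_total_salary aggregate incrementally on inserts (illustrative Python, not any DBMS's refresh mechanism):

```python
# Toy incremental maintenance of the materialized view
#   department_total_salary(dept_name, total_salary)
from collections import defaultdict

instructor = []                              # base relation
department_total_salary = defaultdict(int)   # materialized view contents

def insert_instructor(name, dept_name, salary):
    instructor.append((name, dept_name, salary))
    # maintain the view incrementally instead of recomputing the whole aggregate
    department_total_salary[dept_name] += salary

insert_instructor("Einstein", "Physics", 95000)
insert_instructor("Gold", "Physics", 87000)
insert_instructor("Wu", "Finance", 90000)
print(dict(department_total_salary))
# {'Physics': 182000, 'Finance': 90000}
```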
• Top-K queries: e.g.,
    select *
    from r, s
    where r.B = s.B
    order by r.A ascending
    limit 10

• Join minimization: e.g., for
    select r.A, r.B
    from r, s
    where r.B = s.B
  check whether the join with s is redundant and drop it (e.g., the join condition is on a foreign key from r to s, r.B is declared not null, and there is no selection on s).

• Multiquery optimization: find the best overall plan for a set of queries, exploiting sharing of common subexpressions between queries where it is useful.

• Parametric query optimization: the optimizer generates a set of plans, each optimal for different values of a parameter (e.g., $1); the set of optimal plans is usually small for 1 to 3 parameters.

• Query plan caching: if the optimizer decides that the same plan is likely to be optimal for all parameter values, it caches the plan and reuses it; otherwise it reoptimizes each time. Implemented in many database systems.
Architecture of database systems
• traditional centralized database systems,
• client–server database systems,
• Parallel database systems, and
• distributed database systems
Architecture of database systems
• Centralized database systems are those that run on a single
computer system and do not interact with other computer
systems. Such database systems span a range from single-
user database systems running on personal computers to
high-performance database systems running on high-end
server systems
• Networking of computers allows some tasks to be executed
on a server system and some tasks to be executed on client
systems. This division of work has led to client–server
database systems.
Architecture of database systems
• Parallel processing within a computer system allows database-system activities to
be speeded up, allowing faster response to transactions, as well as more
transactions per second.
• Queries can be processed in a way that exploits the parallelism offered by the
underlying computer system. The need for parallel query processing has led to
parallel database systems.

• Distributing data across sites in an organization allows those data to reside where
they are generated or most needed, but still to be accessible from other sites and
from other departments.
• Keeping multiple copies of the database across different sites also allows large
organizations to continue their database operations even when one site is
affected by a natural disaster, such as flood, fire, or earthquake.
• Distributed database systems handle geographically or administratively
distributed data spread across multiple database systems.
Distributed Database.
A logically interrelated collection of shared data (and a
description of this data), physically distributed over a
computer network

Distributed DBMS.
Software system that permits the management of the
distributed database and makes the distribution
transparent to users.
Concepts in Distributed Database Systems

• Collection of logically-related shared data.


• Data split into fragments.
• Fragments may be replicated.
• Fragments/replicas allocated to sites.
• Sites linked by a communications network.
• Data at each site is under control of a DBMS.
• DBMSs handle local applications autonomously.
• Each DBMS participates in at least one global
application.
• A single application should be able to operate transparently on data that is:
• spread across a variety of different DBMS's
• running on a variety of different machines
• supported by a variety of different operating systems
• connected together by a variety of different communication networks
• The distribution can be geographical or local
An Example of a Distributed Database
Distributed DBMS
Types of DDBMS
• Homogeneous DDBMS
• Heterogeneous DDBMS
Advantages of DDBMSs Disadvantages of DDBMSs
• Organizational Structure • Complexity
• Shareability and Local Autonomy • Cost
• Improved Availability • Security
• Improved Reliability • Integrity Control More Difficult
• Improved Performance • Lack of Standards
• Lack of Experience
• Economics
• Database Design More
• Modular Growth Complex
Parallel Database vs. Distributed Database
• Coupling: in a parallel database the processes are tightly coupled and constitute a single (centralized) database system, with data residing in a single location; in a distributed database the sites are loosely coupled, share no physical components, are geographically separated, and data are distributed across several locations.
• Query processing and transactions: more complicated in distributed database systems than in parallel database systems.
• Local/global transactions: distributed databases distinguish local and global transactions; this does not apply to parallel databases.
• Data placement: in parallel databases the data is partitioned among multiple disks so that it can be retrieved faster; in distributed databases each site maintains a local database for faster processing, because the interconnection between sites is slow.
• Architecture: parallel databases use one of three architectures: shared memory, shared disk, or shared nothing; distributed databases are generally a kind of shared-nothing architecture.
• Query optimization: more complicated in parallel databases; in distributed databases the optimization techniques may differ from site to site and are easier to maintain.
• Replication: in parallel databases data is generally not replicated; in distributed databases data is replicated at several sites to improve performance.
• Homogeneity: parallel databases are generally homogeneous; distributed databases may be homogeneous or heterogeneous.
• Main issues: skew is the major issue as the degree of parallelism increases in parallel databases; blocking due to site failure and transparency are the major issues in distributed databases.
Object-Relational Data Models
• Extend the relational data model by including object orientation and
constructs to deal with added data types.
• Allow attributes of tuples to have complex types, including non-
atomic values such as nested relations.
• Preserve relational foundations, in particular the declarative access to
data, while extending modeling power.
• Upward compatibility with existing relational languages.
Complex Types and SQL
• Extensions introduced in SQL:1999 to support complex types:
• Collection and large object types
• Nested relations are an example of collection types
• Structured types
• Nested record structures like composite attributes
• Inheritance
• Object orientation
• Including object identifiers and references
• Not fully implemented in any database system currently
• But some features are present in each of the major commercial database
systems
• Read the manual of your database system to see what it supports
Structured Types and Inheritance in SQL
• Structured types (a.k.a. user-defined types) can be declared and used in SQL
create type Name as
(firstname varchar(20),
lastname varchar(20))
final
create type Address as
(street varchar(20),
city varchar(20),
zipcode varchar(20))
not final
• Note: final and not final indicate whether subtypes can be created
• Structured types can be used to create tables with composite attributes
create table person (
name Name,
address Address,
dateOfBirth date)
• Dot notation used to reference components: name.firstname
Constructor Functions
• Constructor functions are used to create values of structured types
• E.g.
create function Name(firstname varchar(20), lastname varchar(20))
returns Name
begin
set self.firstname = firstname;
set self.lastname = lastname;
end
• To create a value of type Name, we use
new Name(‘John’, ‘Smith’)
• Normally used in insert statements
insert into Person values
(new Name('John', 'Smith'),
new Address(’20 Main St’, ‘New York’, ‘11001’),
date ‘1960-8-22’);
Type Inheritance
• Suppose that we have the following type definition for people:
create type Person
(name varchar(20),
address varchar(20))
• Using inheritance to define the student and teacher types
create type Student
under Person
(degree varchar(20),
department varchar(20))
create type Teacher
under Person
(salary integer,
department varchar(20))
• Subtypes can redefine methods by using overriding method in place of method in the
method declaration
Object-Relational Mapping
• Object-Relational Mapping (ORM) systems built on top of traditional
relational databases
• Implementor provides a mapping from objects to relations
• Objects are purely transient, no permanent object identity
• Objects can be retrieved from the database
• System uses mapping to fetch relevant data from relations and construct objects
• Updated objects are stored back in database by generating corresponding
update/insert/delete statements
• The Hibernate ORM system is widely used
• Provides API to start/end transactions, fetch objects, etc
• Provides query language operating directly on object model
• queries translated to SQL
• Limitations: overheads, especially for bulk updates
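The basic idea can be sketched in a few lines of generic Python (this is not Hibernate; the Person class and person table are invented for illustration):

```python
# Minimal sketch of object-relational mapping: a transient object is
# materialized from a row and written back as an UPDATE statement.
import sqlite3
from dataclasses import dataclass

@dataclass
class Person:
    id: int
    name: str

conn = sqlite3.connect(":memory:")
conn.execute("create table person (id integer primary key, name text)")
conn.execute("insert into person values (1, 'John Smith')")

def load_person(pid):
    row = conn.execute("select id, name from person where id = ?", (pid,)).fetchone()
    return Person(*row)                  # relation row -> transient object

def save_person(p):
    # the mapping layer generates the corresponding UPDATE statement
    conn.execute("update person set name = ? where id = ?", (p.name, p.id))

p = load_person(1)
p.name = "John A. Smith"                 # work with the object in the program
save_person(p)
print(conn.execute("select * from person").fetchone())   # (1, 'John A. Smith')
```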
Comparison of O-O and O-R Databases
• Relational systems
• simple data types, powerful query languages, high protection.
• Persistent-programming-language-based OODBs
• complex data types, integration with programming language, high
performance.
• Object-relational systems
• complex data types, powerful query languages, high protection.
• Object-relational mapping systems
• complex data types integrated with programming language, but built as a
layer on top of a relational database system
• Note: Many real systems blur these boundaries
• E.g. persistent programming language built as a wrapper on a relational
database offers first two benefits, but may have poor performance.
Data warehouse
• Definition: A data warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management's decision making
process.
• Subject-Oriented: A data warehouse can be used to analyze a particular subject area.
For example, "sales" can be a particular subject.
• Integrated: A data warehouse integrates data from multiple data sources. For
example, source A and source B may have different ways of identifying a product, but
in a data warehouse, there will be only a single way of identifying a product.
• Time-Variant: Historical data is kept in a data warehouse. For example, one can
retrieve data from 3 months, 6 months, 12 months, or even older data from a data
warehouse. This contrasts with a transaction system, where often only the most
recent data is kept. For example, a transaction system may hold the most recent
address of a customer, where a data warehouse can hold all addresses associated
with a customer.
• Non-volatile: Once data is in the data warehouse, it will not change. So, historical
data in a data warehouse should never be altered.
Key term:
ETL: Extract, Transform, Load
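A tiny illustrative ETL sketch (Python with an in-memory SQLite warehouse; the source rows, column names, and sales_fact table are all hypothetical):

```python
# Extract rows from two operational sources, Transform them so that both use
# the same way of identifying a product, and Load them into a warehouse table.
import sqlite3

source_a = [{"prod": "P-001", "qty": 3, "date": "2024-01-05"}]
source_b = [{"product_code": "1", "units": 2, "day": "2024-01-06"}]

def transform():
    # integrate: one product identifier scheme, one schema for both sources
    for r in source_a:
        yield (r["prod"], r["qty"], r["date"])
    for r in source_b:
        yield ("P-" + r["product_code"].zfill(3), r["units"], r["day"])

warehouse = sqlite3.connect(":memory:")
warehouse.execute("create table sales_fact (product_id text, quantity int, sale_date text)")
warehouse.executemany("insert into sales_fact values (?, ?, ?)", transform())
print(warehouse.execute("select * from sales_fact").fetchall())
```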
Data Mining
• Data mining is the process of posing various queries and extracting useful
information, patterns, and trends often previously unknown from large quantities
of data possibly stored in databases.
• Essentially, for many organizations, the goals of data mining include improving
marketing capabilities, detecting abnormal patterns, and predicting the future
based on past experiences and current trends
Data Mining System Architecture
Knowledge Discovery in Databases (KDD) Process
Steps in the KDD process (data mining is one step of it):
1. Data Selection: relevant data is gathered from various sources.
2. Data Pre-processing: data is brought to a consistent state; unnecessary information is removed.
3. Data Transformation: data is converted into a format suitable for mining.
4. Data Mining: algorithms are applied to extract patterns.
5. Pattern Evaluation: the discovered patterns are evaluated to obtain knowledge.
6. Knowledge Representation: the knowledge is presented, e.g., through data visualization.

https://www.javatpoint.com/kdd-process-in-data-mining
Database Security
• Database security is the technique that protects and secures the database against
intentional or accidental threats.
• Security concerns are relevant not only to the data that resides in an organization's database: a breach of security may harm other parts of the system, which may ultimately affect the database structure. Consequently, database security covers hardware, software, human resources, and data.
• Using security effectively requires appropriate controls, which are defined for the specific mission and purpose of the system.
• The requirement for proper security, often neglected or overlooked in the past, is now checked more and more thoroughly by organizations.
Database Security?
• The most common computer-based database security measures are listed below:
Access authorization.
Access controls.
Views.
Auditing
Authentication
Backup and recovery of data.
Data integrity.
Encryption of data.
RAID technology.
New Trends in DBMS
• Databases that bridge SQL/NoSQL
• Graph Database
• Databases in the cloud/Platform as a Service
• Big Data
• Machine learning and artificial intelligence in Database Management
System
• Automated management
• An increased focus on security
• In-memory databases
