5.
Real-Time Databases
Dr Shamala Subramaniam
5.1 Introduction 5.2 Real-time vs. general-purpose databases 5.3 Main memory databases 5.4 Scheduling of transactions 5.5 Real-time disk scheduling 5.6 Concurrency control issues
5.1 Introduction
Databases are structured and convenient way to m anage the sharing of large quantities of data among multiple task.
Applications of RT Databases
airline reservation banking, stock market systems (soft real-time no deadline after catastrophe) radar tracking systems (hard real-time)
May/November 2003
Chapter 5(DrShamala Subramaniam)
5.1 Introduction (cont.)
Definitions
transaction(Tx)- a sequence of read and write operations. history interleaving of the reads and writes query only read or update operations RI(x), WI(x)
transaction I reads from or writes into datum x
Transactions may be aborted anytime
Hence , all transactions are regarded as tentative Updates are only done after a commit point. Commit point is a point of no return, that the transaction will be completed successfully.
May/November 2003 Chapter 5(DrShamala Subramaniam)
5.1 Introduction
Conventional Database has the ACID properties: Atomicity
The transaction is done completely or not. If a transaction has several steps, the end result must be as if either all or none of these steps are carried out.
Consistency
Transaction transforms the database from one consistent state to another. A consistent state is one that results from the execution of some given sequence of transactions.
Isolation The actions of a transaction are not visible to another
transaction.
Durability The actions of a transaction on a database are
permanent.
May/November 2003
Chapter 5(DrShamala Subramaniam)
5.1 Introduction(cont.)
A given history S is serial if only one
transaction is active in the system at any one time. Final-state serialization consistency if the net effect of the operations is as if the transactions were executed in some serial order.
Eg. R1(x)W1(x)R2(x)W2(x) vs, R1(x)R2(x)W1(x)W2(x)
May/November 2003
Chapter 5(DrShamala Subramaniam)
5.2 Real-Time vs. General-Purpose Databases
Real-time databases, are felt by:
1) queries to the database have deadlines 2) the data returned requires consistency
P1. Absolute vs. relative consistency
absolute consistency: accuracy
That is data about the operating environment must be consistent with the environment.
relative consistency:
For multiple data, data must have been collected as close to one another
consistency intervals of a data variable Time 100 200 300
May/November 2003
Temperature 100 300 700
Pressure 360 720 100
If the database returns 100 for the temperatur e (measured at time 1 00) and 100 (measure d at time 300), the co mposite temperaturepressure is inconsiste nt
6
Chapter 5(DrShamala Subramaniam)
P2. Need for response-time predictability
Due to deadlines transaction response time be predictable. BUT : difficult to predict response times of Tx. Why ?
overhead required to meet the ACID properties Transactions may be aborted ; wasted time databases is to large to fit into main memory thus, must fit into in disks Page faults can occur if the desired record is not in main memory. Transactions may be data dependent Transactions may suffer delay because data us locked by another transaction.
If DB is used for hard RT, the worst-case must be used. for large response time variance, worst-case assumptions require the system to be overdesigned. Pre-Analyze Transactions Dry Run of the transaction to evaluate its processor
May/November 2003
Chapter 5(DrShamala Subramaniam)
P3. Relaxing the ACID properties
to reduce the overhead ex) data durability: if outdated data is discarded and data is frequently collected. Machine-tool control system ex) serialization consistency: if interleaving is acceptable or concurrent task.
May/November 2003
Chapter 5(DrShamala Subramaniam)
5.3 Main Memory Databases
MM not practiced largely in general-purpose database
system. Problem of disk-resident databases
disk I/O -> multiprogramming -> interleaving of transactions -> concurrency control -> overhead
Advantages of MMDB
less concurrency control required possible to schedule sequentially possible to increase locking granularity less locks to be maintained but overhead of contention which can be reduced by lowering transactions times. (lock granule a database object which is under the purview of a single lock, I.e. each lock has a domain, and granule means locking all the elements within its domain). flexible use of pointers and indexing scheme x disk Stable main memory (buffer) solution for transaction logs.
May/November 2003 Chapter 5(DrShamala Subramaniam)
5.3 Main Memory Databases (cont.)
Advantages of MMDB
flexible use of pointers indexing scheme x disk B-three (3 levels because its stored in blocks, it must be shallow) x main memory , since it is less expensive to traverse so deeper tree . Stable main memory (buffer) solution for transaction logs.
May/November 2003
Chapter 5(DrShamala Subramaniam)
10
Disadvantage of MMDB
limited size of main memory disk is required for the log Durability becomes harder to maintain. Frequent backups are necessary volatility of MM necessitates use of disk for backup Disk degrade more gracefully single unit fails, rest are functional; if MM fails whole system are possible for degradation
May/November 2003
Chapter 5(DrShamala Subramaniam)
11
5.4 Scheduling of Transactions
Transactions are granted access to processors based on their prioritieseg Performance Measure Performance measure
the fraction of Tx violating deadlines
Transaction classes( due to intractable computation
of scheduling transactions) thus use heuristics classes
those that have missed their deadlines (kept in background) those that have not (scheduled using EDF)
In case of heavy workloads
EDF does not work well congestion control: admission controlrefusing admission of additional transactions.
May/November 2003 Chapter 5(DrShamala Subramaniam)
12
Adaptive earliest deadline (AED) algorithm
combines congestion control with the EDF algorithm
data structure: a list of pending transactions(Tx) the HIT group: those that are located in the first H elements in the list the MISS group: the remaining ones objective: to meet all deadlines in the HIT group and serve the MISS group if any time is left over
May/November 2003
Chapter 5(DrShamala Subramaniam)
13
procedure
1. Arriving Txs are randomly inserted in the list. 2. Prioritize and schedule the Txs in the HIT group according to the EDF algorithm. Schedule the Txs in MISS group if no pending Txs exist in HIT group, in order of their positions in list. 3. Upon completion, Tx is removed from the list. * Once a Tx is assigned to a group, it stays there.
H: control parameter
if H = infinity: AED -> EDF
if H = 0: a random service scheme H can be adjusted adaptively.
May/November 2003
Chapter 5(DrShamala Subramaniam)
14
Clear reflection of load significance
adjusting H repeatedly:
# Txs in HIT group that meet deadlines success(HIT) = ----------------------------------------total # of Txs in HIT group total # of Txs that meet their deadlines success ---- (ALL) = -------------------------------------total # of Txs 1. H = success(HIT) * H * 1.05 2. If success(ALL) < 0.95, then H = min{ H, success(ALL) * total # of Txs * 1.25 }
extensions
case of transactions with different values to the system adaptive earliest virtual deadline (AEVD) algorithm
May/November 2003
Chapter 5(DrShamala Subramaniam)
15
5.6 Concurrency Control Issues in the Pessimistic Approach
Strict two-phase locking
two phases
1. growing phase: acquire the read and write locks - transaction is aborted or committed 2. shrinking phase: release locks
deadlock problem priority inversion problem from real-time standpoint
highest priority task H, middle M, lowest priority task L where H and L share data direct blocking: H is blocked from execution by L indirect blocking: M can preempt L causing delays to H
May/November 2003
Chapter 5(DrShamala Subramaniam)
16
5.6 Concurrency Control Issues
Concurrency Control Issues is to allow transactions to execute in parallel while ensuring that the net effect of their execution on the database is as if they had been executed in some serial order
Pessimistic Approach -first ensure the transaction will not
violate the serialization consistency
Optimistic Approach
May/November 2003
Chapter 5(DrShamala Subramaniam)
17
Pessimistic Concurrency Control
Two-phase locking The procedure requires that the transaction lock and unlock in two(2) distinct phases.
a) The locking phase acquires the read & write it needs b) The unlocking phase must follow(not overlap) the locking phase.
Thus, the transaction must obtain all the locks it will need before it releases any locks. However, is has a potential deadlock
May/November 2003
Chapter 5(DrShamala Subramaniam)
18
For RT systems, unmodified 2-phase problems
Prob: H has to wait for L due to data conflicts. Solutions to priority inversion problem
priority inheritance (L inherit the priority of H)
Not appropriate for RT systems database tasks: long execution times not practical to make H wait until L completes
aborting the lower-priority transaction
may be wasteful
threshold-based solution
if L is sufficiently close to completion, L inherits the priority of H , and makes H wait until it completes sufficiently: defined by the threshold if not, L is aborted, and H starts immediately for H, the blocking time is bounded by the threshold extension: make the threshold a function of the difference in priority levels between H and L
Multiple version (next slide)
May/November 2003 Chapter 5(DrShamala Subramaniam)
19
Multiple version approach
use multiple versions of data
when a Tx writes a data item, it does not lock the database copy; rather it writes into its own private copy write lock: not exclusive Additional certify lock for certification: upon completion of Tx, all of its writes are copied over to DB read operations: on the latest certified version of data
locking rules
requested lock locks already set read write certify -----------------------------------------------read granted granted blocked write granted granted granted certify blocked granted blocked
May/November 2003
Chapter 5(DrShamala Subramaniam)
20
Multiple version approach
use multiple versions of data
when a Tx writes a data item
it does not lock the database copy rather it writes into its own private copy Thus, you can get a write lock on a granule
3 locks :
Write Read Certify (additional)
May/November 2003
Chapter 5(DrShamala Subramaniam)
21
Problem with multiversion
priority inversion
e.g., when H tries to obtain a certify lock on some data item that L currently read-lock
solutions to priority inversion problem
the locking rules modified
lock requested by H locks already set by L read write certify -----------------------------------------------------------------------read granted L-aborted can not occur write granted granted granted certify conversion granted conversion lock requested by L locks already set by H read write certify -----------------------------------------------------------------------read granted granted blocked write blocked granted blocked certify blocked granted blocked
Note: L-aborted = lower-priority transaction(L) is aborted conversion = lower-priority transaction lock converted to write
May/November 2003
Chapter 5(DrShamala Subramaniam)
22
to reduce the number of Ls abortions
when L has a certify lock on a data item, H is made to wait for a read lock on that item when L requests a certify lock on an item that is write-locked by H, the request is granted to L
lock requested by H locks already set by L read write certify ------------------------------------------------------------------------read granted granted L-aborted write g/b granted granted certify blocked granted conversion lock requested by L locks already set by H read write certify ------------------------------------------------------------------------read granted granted blocked write blocked granted g/b certify blocked granted blocked
Note: L-aborted = lower-priority transaction(L) is aborted g/b = either granted or blocked, depending on implementation
May/November 2003
Chapter 5(DrShamala Subramaniam)
23
5.5 Real-Time Disk Scheduling
Unavoidable disk access Access time model
ta = tw + tp + tt tw =time spent in q, tp=time to position the arm over the track tt = time to transfer the desired block of data
Performance measure
miss ratio mean tardy time
Algorithms
FCFS SCAN(elevator)* -in one direction,the return journey serve CSCAN- in one direction but return journey no serves SSTF(Shortest Seek Time) serves the request closest to the arm
May/November 2003
Chapter 5(DrShamala Subramaniam)
24
Real-time algorithms
ELEVATOR (SCAN) modified
Prioritize based on their deadlines Highest Priority is served first If >1, the arm goes to the nearest one If none (no high priority just does the elevator algo
EDF algorithm Feasible deadline SCAN (FD-SCAN)
Pick the earliest feasible request: the target of SCAN Along the way serve all the request pending that lie between them
Another variation
Considers the deadline and the time taken to move the arm to serve it . Thus, feasible if td + current time deadline
May/November 2003
Chapter 5(DrShamala Subramaniam)
25
Real-time algorithms (cont.)
shortest seek and earliest deadline by ordering (SSEDO), by value (SSEDV)
a queue sorted according to deadline a window of size m: first m requests in the queue service first the request with the highest priority in the window SSEDO: based on seek distance in the window & queue SSEDV: based on seek distance and deadline
May/November 2003
Chapter 5(DrShamala Subramaniam)
26
5.9 Two-phase approach to improve predictability
two main causes of uncertainty in estimating execution time of a real-time Tx
data-dependent I/O: where the data will be on disk serialization consistency: blocking, abort
two phases of scheduling
1. the system determines which data items are required, and completes required disk I/O.
upon completion, all data items required are in main memory no transactions are blocked in the first phase
2. carry out the prescribed computations
May/November 2003
Chapter 5(DrShamala Subramaniam)
27