DBMS Revision Notes
Serializability in DBMS
Serializability is the correctness criterion for concurrent transaction execution; it corresponds to the
highest isolation level (SERIALIZABLE) in a database system.
It guarantees that the outcome of executing multiple transactions concurrently is the same as if the
transactions had been executed serially (one after another).
Types of Serializability:
1. Conflict Serializability
- Conflict equivalent: two schedules contain the same operations and order every pair of conflicting
operations (operations on the same item from different transactions, at least one a write) the same way.
- A schedule is conflict serializable if it can be transformed into a serial schedule by swapping
adjacent non-conflicting operations.
2. View Serializability
- View equivalent: the same transactions read the initial value of each item, each read returns the
value produced by the same write, and the same transaction performs the final write of each item.
- More general but harder to test than conflict serializability.
Testing Conflict Serializability:
- Use a Precedence Graph: nodes = transactions; an edge Ti -> Tj means an operation of Ti conflicts
with and precedes an operation of Tj.
- If the graph is acyclic, the schedule is conflict serializable; if it contains a cycle, it is not.
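The precedence-graph test can be sketched in Python (a minimal illustration; the encoding of a schedule as `(transaction, operation, item)` tuples is an assumption, not from the notes):

```python
from collections import defaultdict

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) tuples, op is 'R' or 'W'.
    Builds the precedence graph and returns True iff it is acyclic."""
    edges = defaultdict(set)
    txns = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        txns.add(ti)
        for tj, op_j, y in schedule[i + 1:]:
            # Conflict: same item, different transactions, at least one write.
            if x == y and ti != tj and (op_i == 'W' or op_j == 'W'):
                edges[ti].add(tj)  # Ti's operation precedes Tj's

    # DFS cycle detection with white/gray/black coloring.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in txns}

    def has_cycle(t):
        color[t] = GRAY
        for u in edges[t]:
            if color[u] == GRAY or (color[u] == WHITE and has_cycle(u)):
                return True
        color[t] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in txns)
```

For example, `R1(A) W2(A) W1(A)` produces edges T1 -> T2 and T2 -> T1 (a cycle), so it is not conflict serializable.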
Serial vs Serializable Schedules:
- Serial: one transaction at a time; low concurrency.
- Serializable: interleaved but ensures correctness and consistency.
Advantages:
- Guarantees data consistency.
- Prevents anomalies like dirty read, lost update, etc.
Drawbacks:
- Reduced concurrency.
- Computationally expensive checks in view serializability.
Multi-Version Concurrency Control (MVCC)
MVCC is a concurrency control method that allows multiple transactions to occur simultaneously by
maintaining multiple versions of data items.
How it works:
- Each write creates a new version with a timestamp.
- Reads access a version valid at the transaction's start time.
Advantages:
- Readers don't block writers, and writers don't block readers.
- Reads never conflict with writes, eliminating read-induced blocking and deadlocks.
- Higher concurrency.
Drawbacks:
- More storage required for versions.
- Needs garbage collection for old versions.
Example:
- T1 starts, reads A = 100.
- T2 updates A to 200.
- T1 still reads old version (100), T2 commits new version (200).
Used in: PostgreSQL, Oracle, MySQL (InnoDB).
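The example above can be traced with a toy multi-version store (an illustrative sketch, not how any real engine stores versions; the class and method names are assumptions):

```python
class MVCCStore:
    """Toy MVCC store: each write appends a (commit_ts, value) version;
    a reader sees the newest version committed at or before its start
    timestamp, so concurrent writers never disturb it."""

    def __init__(self):
        self.versions = {}  # item -> list of (commit_ts, value), ascending
        self.clock = 0      # logical timestamp counter

    def begin(self):
        self.clock += 1
        return self.clock   # transaction's start timestamp

    def write(self, item, value):
        self.clock += 1     # commit timestamp of this (auto-committed) write
        self.versions.setdefault(item, []).append((self.clock, value))

    def read(self, item, start_ts):
        # Newest version with commit_ts <= the reader's start timestamp.
        visible = [v for ts, v in self.versions.get(item, []) if ts <= start_ts]
        return visible[-1] if visible else None
```

Replaying the notes' scenario: write A = 100, T1 begins, T2 writes A = 200; T1 still reads 100, while a transaction starting afterwards reads 200.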
Time-Stamping Protocols
Time-stamping protocols assign each transaction a unique timestamp to control the order of
operations for concurrency.
Each data item has:
- read_TS(X): the largest timestamp of any transaction that has read X.
- write_TS(X): the largest timestamp of any transaction that has written X.
Rules:
1. Read(X): if TS(Ti) < write_TS(X), abort and roll back Ti; otherwise execute the read and set
read_TS(X) = max(read_TS(X), TS(Ti)).
2. Write(X): if TS(Ti) < read_TS(X) or TS(Ti) < write_TS(X), abort and roll back Ti; otherwise execute
the write and set write_TS(X) = TS(Ti).
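The two rules can be sketched as follows (a minimal illustration; the class name and the convention that a failed check returns False to signal an abort are assumptions):

```python
class TimestampOrdering:
    """Basic timestamp-ordering protocol: each item tracks read_TS and
    write_TS; an operation arriving 'too late' aborts its transaction."""

    def __init__(self):
        self.read_ts = {}   # item -> largest TS that read it
        self.write_ts = {}  # item -> largest TS that wrote it

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            return False  # Ti would read a value already overwritten: abort
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False  # a younger transaction already read/wrote X: abort
        self.write_ts[item] = ts
        return True
```

For example, after T5 writes X, a read by T3 is rejected (TS 3 < write_TS 5), while a read by T7 succeeds and raises read_TS(X) to 7, which in turn forces a later write by T6 to abort.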
Advantages:
- No deadlocks.
- Guarantees serializability (in timestamp order) without locking.
Drawbacks:
- High abort rate.
- Starvation and overhead.
Optimizations:
- Thomas Write Rule: an obsolete write (TS(Ti) < write_TS(X)) is simply skipped instead of aborting Ti.
- MVCC: reduces aborts by allowing multiple versions.
Log-Based Recovery
Log-based recovery ensures database consistency after a crash using a transaction log.
Types of Logs:
- Undo log: for rolling back uncommitted transactions.
- Redo log: for redoing committed transactions.
Each log record includes: transaction ID, data item, old value, new value.
Checkpoint:
- A point at which the system flushes modified data to disk and records the currently active
transactions in the log.
- Reduces recovery time, since only log records after the checkpoint need to be processed.
Recovery Process:
1. Analysis: identify active transactions.
2. Redo: redo all committed transactions after checkpoint.
3. Undo: rollback all uncommitted transactions.
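The three phases can be sketched for a simple undo/redo log (the record layout of START/WRITE/COMMIT tuples is an assumed toy format, and the analysis phase is folded into collecting the committed set):

```python
def recover(log):
    """log: ordered records ('START', T), ('WRITE', T, item, old, new),
    ('COMMIT', T). Redoes committed transactions forward, then undoes
    uncommitted ones backward; returns the reconstructed state."""
    # Analysis: which transactions committed before the crash?
    committed = {rec[1] for rec in log if rec[0] == 'COMMIT'}

    db = {}
    # Redo pass: forward scan, apply new values of committed transactions.
    for rec in log:
        if rec[0] == 'WRITE' and rec[1] in committed:
            _, _, item, old, new = rec
            db[item] = new
    # Undo pass: backward scan, restore old values of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == 'WRITE' and rec[1] not in committed:
            _, _, item, old, new = rec
            db[item] = old
    return db
```

Here a committed T1 has its write of A redone, while an uncommitted T2 has its writes rolled back to the logged old values.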
Database Triggers in PL/SQL
Triggers are stored procedures in PL/SQL that automatically execute in response to certain events
on a table.
Types:
1. BEFORE Trigger: executes before an operation (e.g., insert/update/delete).
2. AFTER Trigger: executes after an operation.
3. INSTEAD OF Trigger: used on views to perform custom operations.
Uses:
- Enforce business rules.
- Maintain audit logs.
- Validate data.
Example (AFTER):
CREATE OR REPLACE TRIGGER after_insert_trigger
AFTER INSERT ON employees
FOR EACH ROW
BEGIN
  INSERT INTO audit_log(emp_id, action) VALUES (:NEW.id, 'INSERTED');
END;
Example (INSTEAD OF):
CREATE OR REPLACE TRIGGER view_trigger
INSTEAD OF INSERT ON employee_view
FOR EACH ROW
BEGIN
  INSERT INTO employees (id, name) VALUES (:NEW.id, :NEW.name);
END;
Lossless Join Decomposition
A decomposition is lossless if we can reconstruct the original relation by joining the decomposed
relations without losing any information.
Criteria:
- A decomposition of R into R1 and R2 is lossless if the common attributes determine one side:
(R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 (i.e., the intersection is a superkey of R1 or of R2).
Example:
R(A, B, C) with FD: A → B
Decompose into:
R1(A, B)
R2(A, C)
Since R1 ∩ R2 = {A} and A → B (so A determines all of R1), it's a lossless decomposition.
Purpose:
- Ensures data integrity during normalization.
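The criterion can be checked mechanically with an attribute-closure computation (a sketch; representing each FD as a pair of attribute sets is an assumption):

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is fully contained, its rhs is also determined.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless(r1, r2, fds):
    """Binary decomposition is lossless iff (R1 ∩ R2)+ covers R1 or R2."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return set(r1) <= c or set(r2) <= c
```

With FD A → B, splitting R(A, B, C) into R1(A, B) and R2(A, C) passes the check, while splitting on B (which determines nothing) fails it.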
Functional Dependencies and Normal Forms
Functional Dependency (FD): X → Y means if two tuples have the same X value, they must have the
same Y value.
Used in Normalization:
- Identify anomalies and break relations into normal forms.
1NF:
- Atomic values only.
- No repeating groups.
2NF:
- In 1NF and no partial dependency on candidate key.
3NF:
- In 2NF and no transitive dependency.
Example:
R(StudentID, Course, Instructor)
FD: StudentID → Instructor
The candidate key is {StudentID, Course}, so Instructor partially depends on the key.
Break into:
R1(StudentID, Instructor)
R2(StudentID, Course)
Both relations are now in 3NF.
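Detecting the partial dependency that motivates this decomposition can be sketched as follows (a simplified check that assumes a single known candidate key; the function name is illustrative):

```python
def partial_dependencies(candidate_key, fds):
    """Return the FDs that violate 2NF: a proper subset of the candidate
    key determining attributes outside the key. fds is a list of
    (lhs, rhs) attribute-set pairs. Assumes one known candidate key,
    so 'outside the key' approximates 'non-prime'."""
    violations = []
    for lhs, rhs in fds:
        if lhs < candidate_key and rhs - candidate_key:
            violations.append((lhs, rhs))
    return violations
```

For the example above, StudentID → Instructor is flagged against the key {StudentID, Course}; after decomposition, R1(StudentID, Instructor) has StudentID itself as its key, so no partial dependency remains.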