0% found this document useful (0 votes)
18 views43 pages

Unit 3 Notes1

The document discusses schema refinement and normalization in database management systems, emphasizing the importance of eliminating data redundancy and addressing anomalies such as insertion, deletion, and update anomalies. It outlines the process of normalization, detailing various normal forms including 1NF, 2NF, 3NF, and BCNF, and explains functional dependencies and their significance in maintaining data integrity. The document also introduces key concepts like candidate keys, super keys, and the decomposition technique to achieve a well-structured database.

Uploaded by

raja
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views43 pages

Unit 3 Notes1

The document discusses schema refinement and normalization in database management systems, emphasizing the importance of eliminating data redundancy and addressing anomalies such as insertion, deletion, and update anomalies. It outlines the process of normalization, detailing various normal forms including 1NF, 2NF, 3NF, and BCNF, and explains functional dependencies and their significance in maintaining data integrity. The document also introduces key concepts like candidate keys, super keys, and the decomposition technique to achieve a well-structured database.

Uploaded by

raja
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: schema refinement and normalization Class: 2nd CSE-B ,1ST SEM.

UNIT-3

SCHEMA REFINEMENT AND NORMALISATION

Schema Refinement:
• The Schema Refinement refers to refine the schema by using some technique. The
best technique of schema refinement is decomposition.
• Normalization or Schema Refinement is a technique of organizing the data in the
database.
• It is a systematic approach of decomposing tables to eliminate data redundancy and
undesirable characteristics like Insertion, Update and Deletion Anomalies.
Redundancy:
• Redundancy refers to repetition of same data or duplicate copies of same data stored
in different locations.
• Redundancy means having multiple copies of same data in the database.

As it can be observed that values of attribute college name, college rank, course is
being repeated which can lead to problems
Anomalies:
• Anomalies refers to the problems occurred after poorly planned and normalized
databases where all the data is stored in one table which is sometimes called a flat file
database.
• Problems caused due to redundancy are:

1
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: schema refinement and normalization Class: 2nd CSE-B ,1ST SEM.

Insertion anomaly, Deletion anomaly, and Updation anomaly


Anomalies or problems facing without normalization(problems due to redundancy)
• Anomalies refers to the problems occurred after poorly planned and unnormalised
databases where all the data is stored in one table which is sometimes called a flat file
database. Let us consider such type of schema

Here all the data is stored in a single table which causes redundancy of data or say anomalies
as SID and Sname are repeated once for same CID . Let us discuss anomalies one by one.
Due to redundancy of data we may get the following problems
Insertion anomalies: It may not be possible to store some information unless some other
information is stored as well.
Redundant storage: some information is stored repeatedly
Update anomalies: If one copy of redundant data is updated, then inconsistency is created
unless all redundant copies of data are updated.
Deletion anomalies: It may not be possible to delete some information without losing some
other information as well.

2
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: schema refinement and normalization Class: 2nd CSE-B ,1ST SEM.

Problem in updation / updation anomaly:

Insertion Anomaly and Deletion Anomaly


These anomalies exist only due to redundancy, otherwise they do not exist.
Insertion Anomalies: New course is introduced C4, But no student is there who is having C4
subject.

3
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: schema refinement and normalization Class: 2nd CSE-B ,1ST SEM.

Because of insertion of some data, It is forced to insert some other dummy data.
Deletion Anomaly:
• Deletion of S3 student cause the deletion of course.
• Because of deletion of some data forced to delete some other useful data.

Solutions To Anomalies
Decomposition of Tables –
Schema Refinement

4
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: schema refinement and normalization Class: 2nd CSE-B ,1ST SEM.

What is the Solution??


Solution: decomposing into relations as shown below

TO AVOID REDUNDANCY and problems due to redundancy, we use refinement technique


called DECOMPOSITION.
Decomposition:- Process of decomposing a larger relation into smaller relations.
Each of smaller relations contain subset of attributes of original relation.

5
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: FUNCTIONAL DEPENDANCY Class: 2nd CSE-B ,1ST SEM

FUNCTIONAL DEPENDANCY:
• It is a relationship that exists when one attribute uniquely determines another attribute

X Y

• If R is a relation or table with attributes x and y


Then F.D represents XY(Y is functionally dependant on X)
• Because x is always uniquely determines.
• It is a set of constraints or rules between two attributes in a relation.
Functional dependencies
• X is determinant is what we called as identification key.(uniquely determinant)
• Example: if very attribute B of R dependant of A, then A is called a primary key.

ID NAME SUR_NAME

S1 BHANU P

S2 SIVA G

S3 PRIYA M

• ID as identification key
• IDNAME
• ID__SUR_NAME (Functionally dependant table)

6
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: FUNCTIONAL DEPENDANCY Class: 2nd CSE-B ,1ST SEM

Functional dependencies:
• Functional dependency is a relationship that exist when one attribute uniquely
determines another attribute.
• Functional dependency is a form of integrity constraint that can identify schema with
redundant storage problems and to suggest refinement.
• A functional dependency AB in a relation holds true if two tuples having the same
value of attribute A also have the same value of attribute B
• IF t1.X=t2.X then t1.Y=t2.Y where t1,t2 are tuples and X,Y are attributes.
Fully functional dependencies
• Example:ABCD (D is fully FD on ABC)
• ABC combining ,uniquely determines D
• D cannot depends on any subset of ABC
As ..BCD (not possible as FFD)
CD
AD
Because BC cannot determines D,C cannot determines D and A cannot determines D

St_ID St_NAME Id_proof grade

s1 pinky p2 5

s2 lucky p1 6

Here st_ID,id_proof are identification keys


St_ID ,id_proofgrade (fully functional dependency)

7
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: FUNCTIONAL DEPENDANCY Class: 2nd CSE-B ,1ST SEM

Grade fully dependant on combination of two attributes


Depending on two or more attributes is fully dependant
Reasoning about functional dependencies:
Armstrong Axioms :
Armstrong axioms defines the set of rules for reasoning about functional dependencies and
also to infer all the functional dependencies on a relational database.
Various axioms rules or inference rules:
Primary axioms

8
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: FUNCTIONAL DEPENDANCY Class: 2nd CSE-B ,1ST SEM

Secondary or derived axioms

Attribute closure
Attribute closure of an attribute set can be defined as set of attributes which can be
functionally determined from it.
NOTE:
To find attribute closure of an attribute set-
1) Add elements of attribute set to the result set.
2) Recursively add elements to the result set which can be functionally determined from the
elements of result set.

Types of functional dependencies:


• Trivial functional dependency:-If XY is a functional dependency where Y subset X,
these type of FD’s called as trivial functional dependency.
• Non-trivial functional dependency:-If XY and Y is not subset of X then it is called
non-trivial functional dependency.

9
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: FUNCTIONAL DEPENDANCY Class: 2nd CSE-B ,1ST SEM

• Completely non-trivial functional dependency:-If XY and X∩Y=Ф(null) then it is


called completely non-trivial functional dependency.
Prime and non-prime attributes:

Attributes which are parts of any candidate key of relation are called as prime attribute, others
are non-prime attributes.
Candidate Key:
• Candidate Key is minimal set of attributes of a relation which can be used to identify
a tuple uniquely.
• Consider student table: student(sno, sname,sphone,age)
• We can take sno as candidate key. we can have more than 1 candidate key in a table.
types of candidate keys:
• simple(having only one attribute)
• composite(having multiple attributes as candidate key)
Super Key is set of attributes of a relation which can be used to identify a tuple uniquely.
– Adding zero or more attributes to candidate key generates super key.
• A candidate key is a super key but vice versa is not true
Normalization:
Normalization is a process of designing a consistent database with minimum redundancy
which support data integrity by grating or decomposing given relation into smaller relations
preserving constraints on the relation
Characteristics of normalisation:
• Scalar values in each field
• Minimal use of null values
• Absence of redundancy

10
DATA BASE MANAGEMENT SYSTEMS

• Minimal loss of information

Subject: DBMS Faculty Name: R RajaSekhar

Topic: NORMALIZATION Class: 2nd CSE-B ,1ST SEM

Normalization removes data redundancy and it will helps in designing a good data base
which involves a set of normal forms as follows –
• 1)First normal form(1NF)
• 2)Second normal form(2NF)
• 3)Third normal form(3NF)
• 4)Boyce coded normal form(BCNF)
• 5)Forth normal form(4NF)
• 6)Fifth normal form(5NF

• )

11
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: FUNCTIONAL DEPENDANCY Class: 2nd CSE-B ,1ST SEM

12
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: NORMAL FORMS Class: 2nd CSE-B ,1ST SEM

First normal form


• A relation is said to be in first normal form if it contains all atomic values or single
values.
Example:

Domain Courses

Programming C , java

Web designing HTML , PHP

• The above table consist of multiple values in single columns which can be reduced
into atomic values by using first normal form as follows-

Domain Courses

Programming C

Programming Java

Web designing HTML

Web designing PHP

First normal form: Rules


• Single valued attributes or atomic values
• Attributes domain should not change

13
DATA BASE MANAGEMENT SYSTEMS

• Unique name for attributes/columns


• Order does not matter

Subject: DBMS Faculty Name: R RajaSekhar

Topic: NORMAL FORMS Class: 2nd CSE-B ,1ST SEM

Second normal form


• A relation is said to be in second normal form if it is in first normal form without any
partial dependencies.
• In second normal form non-prime attributes should not depend on proper subset of
key attributes.
• Dependancyfunctional dependency all column are depends on one primary key.
• Partial dependency: There are 2 or more Primary keys in one table (candidate key)
• Any attribute depends on a part of candidate key
Second normal form Example:

Student id Student name Project Id Project name

Here (student id, project id) are key attributes and (student name, project name) are non-
prime attributes.
It is decomposed as-

Student id Student name Project id

14
DATA BASE MANAGEMENT SYSTEMS

Project id Project name

(Or)
student

Stu id Stu name address

subject

sub_id Sub_name

score

Score_id St_id Sub_id marks teacher

Marks depends on all the key values but teacher depends on part of the primary key
TEACHER column depends on part of the primary key leads partial dependency

15
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: NORMAL FORMS Class: 2nd CSE-B ,1ST SEM

Sub_id Sub_name teacher

Third normal form


– A relation is said to be in third normal form , if it is already in second normal
form and no transitive dependencies exists.
– Transitive dependency – If A->B and B->C are two FDs then A->C is called
transitive dependency.
– When one column depends on another column which is not primary key and
can take null values
– A relation is in 3NF if at least one of the following condition holds in every
non-trivial function dependency X –> Y
– X is a super key.
– Y is a prime attribute (each element of Y is part of some candidate key).
– Non prime attributenon prime attribute
• A relation will be in 3NF if it is in 2NF and not contain any transitive partial
dependency.

16
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: NORMAL FORMS Class: 2nd CSE-B ,1ST SEM

• It is also used to achieve the data integrity.


• If there is no transitive dependency for non-prime attributes, then the relation must be
in third normal form
score

Score_id Stu_id Sub_id marks Exam_name Total_marks

101 201 1 16 mid 20

102 201 2 69 final 80

103 202 1 19 mid 20

Total_marks as non prime attribute depends on another nonprime attribute exam _name
score

Score_id Stu_id Sub_id marks Exam_id

exam

Exam_id Exam_name Total_marks

Boyce Codd normal form


• It is an extension of third normal form where in a functional dependency X A , X
must be a super key
• A relation is in BCNF if in every non-trivial functional dependency X –> Y, X is a
super key.

17
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: NORMAL FORMS Class: 2nd CSE-B ,1ST SEM

• For every FD. X  Y,, X should be super key of table

STD COURSE TEACHER

pinky DBMS priya

lucky DBMS madhu

deepu COA bhanu

sonu COA bhanu

jhansi DBMS madhu

Two courses with different teachers


Key(std,course)
F.D(std,course)  TEACHER (determines teacher)
But TEACHER is not super key but determines course

STD COURSE

Pinky DBMS

lucky DBMS

deepu COA

sonu COA

jhansi DBMS

18
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: NORMAL FORMS Class: 2nd CSE-B ,1ST SEM

COURSE TEACHER

DBMS Priya

DBMS Madhu

COA bhanu

Eliminating redundant data


Here STD ,COURSE are super keys
No transitive dependency hence its in 3NF
Fourth normal form (4NF)
• A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
• For a dependency A → B, if for a single value of A, multiple values of B exists, then
the relation will be a multi-valued dependency.
Multivalued dependency occurs when two attributes in a table are independent of each
other but, both depend on a third attribute. A multivalued dependency consists of at
least two attributes that are dependent on a third attribute that's why it always requires
at least three attributes
Example

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

19
DATA BASE MANAGEMENT SYSTEMS

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent
entity. Hence, there is no relationship between COURSE and HOBBY
• In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is a
Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
• So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

20
DATA BASE MANAGEMENT SYSTEMS

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey

21
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: DECOMPOSITION Class: 2nd CSE-B ,1ST SEM

What is decomposition?
• Decomposition is the process of breaking down in parts or elements.
• It replaces a relation with a collection of smaller relations.
• It breaks the table into multiple tables in a database.
• It should always be lossless, because it confirms that the information in the original
relation can be accurately reconstructed based on the decomposed relations.
• If there is no proper decomposition of the relation, then it may lead to problems like
loss of information.
Properties of Decomposition
Following are the properties of Decomposition,
1. Lossless Decomposition
2 .Loosy decomposition
3. Dependency Preservation
4. Lack of Data Redundancy

1. Lossless Decomposition
* If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
* The lossless decomposition guarantees that the join of relations will result in the same
relation as it was decomposed.
* The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.

22
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: DECOMPOSITION Class: 2nd CSE-B ,1ST SEM

Example

EMPLOYEE_DEPARTMENT table:

EM EMP_ EMP EMP_ DEP DEPT_


P_I NAME _AG CITY T_I NAME
D E D

22 Denim 28 Mumb 827 Sales


ai
33 Alina 25 Delhi 438 Marketin
g
46 Stephan 30 Bangal 869 Finance
ore
52 Katherin 36 Mumb 575 Producti
e ai on
60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida

23
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: PROPERTIES OF DECOMPOSITION Class: 2nd CSE-B ,1ST SEM

DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME

827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:

Employee⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales


33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition

Lossy Decomposition:

"The decomposition of relation R into R1 and R2 is lossy when the join of R1 and R2 does
not yield the same relation as in R."
One of the disadvantages of decomposition into two or more relational schemes (or tables) is
that some information is lost during retrieval of original relation or table.
Consider that we have table STUDENT with three attribute roll_no , sname and department

24
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: PROPERTIES OF DECOMPOSITION Class: 2nd CSE-B ,1ST SEM

STUDENT:
Roll_no Sname Dept

111 parimal COMPUTER

222 parimal ELECTRICAL

This relation is decomposed into two relation no_name and name_dept :


No_name:
Roll_no Sname

111 parimal

222 parimal

Name_dept :

Sname Dept

parimal COMPUTER

parimal ELECTRICAL

In lossy decomposition, spurious tuples are generated when a natural join is applied to the
relations in the decomposition
stu_joined:

Roll_no Sname Dept

111 parimal COMPUTER

111 parimal ELECTRICAL

222 parimal COMPUTER

222 parimal ELECTRICAL

25
DATA BASE MANAGEMENT SYSTEMS

The above decomposition is a bad decomposition or Lossy decomposition

3. Dependency Preserving

* It is an important constraint of the database.


* In the dependency preservation, at least one decomposed table must satisfy every
dependency.
* If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
* For example, suppose there is a relation R (A, B, C, D) with functional dependency
set (A->BC). The relational R is decomposed into R1(ABC) and R2(AD) which is
dependency preserving because FD A->BC is a part of relation R1(ABC).

4. Lack of Data Redundancy

* Lack of Data Redundancy is also known as a Repetition of Information.


* The proper decomposition should not suffer from any data redundancy.
* The careless decomposition may cause a problem with the data.
* The lack of data redundancy property may be achieved by Normalization process.

Join dependency
* a join dependency is a constraint on the set of legal relations over a database scheme.
* A table T is subject to a join dependency if T can always be recreated by joining
multiple tables each having a subset of the attributes of T. If one of the tables in the
join has all the attributes of the table T, the join dependency is called trivial.
* The join dependency plays an important role in the Fifth normal form, also known
as project-join normal form, because it can be proven that if a scheme R is
decomposed in tables R1to Rn, the decomposition will be a lossless-join
decomposition if the legal relations on R are restricted to a join dependency on
R called *(R1,R2,.. ,Rn).

26
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: 5TH NF Class: 2nd CSE-B ,1ST SEM

Fifth normal form (5NF)

* A relation is in 5NF if it is in 4NF and not contains any join dependency and joining
should be lossless.
* 5NF is satisfied when all the tables are broken into as many tables as possible in order
to avoid redundancy.
* 5NF is also known as Project-join normal form (PJ/NF).
Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math class for Semester 1 but he doesn't
take Math class for Semester 2. In this case, combination of all these fields required to
identify a valid data.
* Suppose we add a new Semester as Semester 3 but do not know about the subject and
who will be taking that subject so we leave Lecturer and Subject as NULL. But all
three columns together acts as a primary key, so we can't leave other two columns
blank.
* So to make the above table into 5NF, we can decompose it into three relations P1, P2
& P3:

Subject: DBMS Faculty Name: R RajaSekhar

Topic: DECOMPOSITION Class: 2nd CSE-B ,1ST SEM

27
DATA BASE MANAGEMENT SYSTEMS

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2
SUBJECT LECTURER

Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen

P3
SEMSTER LECTURER

Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS Class: 2nd CSE-B ,1ST SEM

28
DATA BASE MANAGEMENT SYSTEMS

Transaction Management

Transactions in DBMS:

Transactions are a set of operations used to perform a logical set of work. A transaction
usually means that the data in the database has changed. One of the major uses of DBMS is
to protect the user’s data from system failures. It is done by ensuring that all the data is
restored to a consistent state when the computer is restarted after a crash. The transaction is
any one execution of the user program in a DBMS. Executing the same program multiple
times will generate multiple transactions.

Example –

Transaction to be performed to withdraw cash from an ATM vestibule.

Set of Operations:
Consider the following example for transaction operations as follows.
Example -ATM transaction steps.
 Transaction Start.
 Insert your ATM card.
 Select language for your transaction.
 Select Savings Account option.
 Enter the amount you want to withdraw.
 Enter your secret pin.
 Wait for some time for processing.
 Collect your Cash.
 Transaction Completed.
Three operations can be performed in a transaction as follows.
1. Read/Access data (R).
2. Write/Change data (W).
3. Commit.
Example –
Transfer of 50₹ from Account A to Account B. Initially A= 500 ₹, B= 800 ₹. This data is
brought to RAM from Hard Disk.

R(A) -- 500 // Accessed from RAM.


A = A-50 // Deducting 50 ₹ from A.
W(A)--450 // Updated in RAM.
R(B) -- 800 // Accessed from RAM.
B=B+50 // 50 ₹ is added to B's Account.
W(B) --850 // Updated in RAM.
Commit // The data in RAM is taken back to Hard Disk.
Note –
The updated value of Account A = 450 ₹ and Account B = 850 ₹.

29
DATA BASE MANAGEMENT SYSTEMS

All instructions before commit come under a partially committed state and are stored in
RAM. When the commit is read the data is fully accepted and is stored in Hard Disk.
If the data is failed anywhere before commit we have to go back and start from the
beginning. We can’t continue from the same state. This is known as Roll Back.
Uses of Transaction Management:
 The DBMS is used to schedule the access of data concurrently. It means that the user
can access multiple data from the database without being interfered with each other.
Transactions are used to manage concurrency.
 It is also used to satisfy ACID properties.
 It is used to solve Read/Write Conflict.
 It is used to implement Recoverability, Serializability, and Cascading.
 Transaction Management is also used for Concurrency Control Protocols and Locking
of data.

A transaction is a set of logically related operations. For example, you are transferring
money from your bank account to your friend’s account, the set of operations would be like
this:

 Simple Transaction Example


 1. Read your account balance
2. Deduct the amount from your balance
3. Write the remaining balance to your account
4. Read your friend’s account balance
5. Add the amount to his account balance
6. Write the new updated balance to his account

 This whole set of operations can be called a transaction. Although I have shown you
read, write and update operations in the above example but the transaction can have
operations like read, write, insert, update, delete.

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS Class: 2nd CSE-B ,1ST SEM

In DBMS, we write the above 6 steps transaction like this:

30
DATA BASE MANAGEMENT SYSTEMS

Let’s say your account is A and your friend’s account is B, you are transferring 10000 from A
to B, the steps of the transaction are:

1. R(A);
2. A = A - 10000;
3. W(A);
4. R(B);
5. B = B + 10000;
6. W(B);

In the above transaction R refers to the Read operation and W refers to the write operation.
Transaction failure in between the operations

Now that we understand what transaction is, we should understand what are the problems
associated with it.

The main problem that can happen during a transaction is that the transaction can fail before
finishing the all the operations in the set. This can happen due to power failure, system crash
etc. This is a serious problem that can leave database in an inconsistent state. Assume that
transaction fail after third operation (see the example above) then the amount would be
deducted from your account but your friend will not receive it.

To solve this problem, we have the following two operations

Commit: If all the operations in a transaction are completed successfully then commit those
changes to the database permanently.

Rollback: If any of the operation fails then rollback all the changes done by previous
operations.

Even though these operations can help us avoiding several issues that may arise during
transaction but they are not sufficient when two transactions are running concurrently. To
handle those problems we need to understand database ACID properties.

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS, ACID PROERTIES Class: 2nd CSE-B ,1ST SEM

31
DATA BASE MANAGEMENT SYSTEMS

ACID properties in DBMS

To ensure the integrity of data during a transaction (A transaction is a unit of program that
updates various data items, read more about it here), the database system maintains the
following properties. These properties are widely known as ACID properties:

 Atomicity: This property ensures that either all the operations of a transaction reflect in
database or none. Let’s take an example of banking system to understand this: Suppose
Account A has a balance of 400$ & B has 700$. Account A is transferring 100$ to
Account B. This is a transaction that has two operations a) Debiting 100$ from A’s
balance b) Creating 100$ to B’s balance. Let’s say first operation passed successfully
while second failed, in this case A’s balance would be 300$ while B would be having
700$ instead of 800$. This is unacceptable in a banking system. Either the transaction
should fail without executing any of the operation or it should process both the
operations. The Atomicity property ensures that.
 Consistency: To preserve the consistency of database, the execution of transaction
should take place in isolation (that means no other transaction should run concurrently
when there is a transaction already running). For example account A is having a
balance of 400$ and it is transferring 100$ to account B & C both. So we have two
transactions here. Let’s say these transactions run concurrently and both the
transactions read 400$ balance, in that case the final balance of A would be 300$
instead of 200$. This is wrong. If the transaction were to run in isolation then the
second transaction would have read the correct balance 300$ (before debiting 100$)
once the first transaction went successful.
 Isolation: For every pair of transactions, one transaction should start execution only
when the other finished execution. I have already discussed the example of Isolation in
the Consistency property above.
 Durability: Once a transaction completes successfully, the changes it has made into
the database should be permanent even if there is a system failure. The recovery-
management component of database systems ensures the durability of transaction.

DBMS Transaction States

In this guide, we will discuss the states of a transaction in DBMS. A transaction in DBMS
can be in one of the following states.

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS ,STATE DIAGRAM Class: 2nd CSE-B ,1ST SEM

32
DATA BASE MANAGEMENT SYSTEMS

DBMS Transaction States Diagram

Lets discuss these states one by one.

Active State

As we have discussed in the DBMS transaction introduction that a transaction is a sequence


of operations. If a transaction is in execution then it is said to be in active state. It doesn’t
matter which step is in execution, until unless the transaction is executing, it remains in active
state.

Failed State

If a transaction is executing and a failure occurs, either a hardware failure or a software


failure then the transaction goes into failed state from the active state.

Partially Committed State

As we can see in the above diagram that a transaction goes into “partially committed” state
from the active state when there are read and write operations present in the transaction.

A transaction contains number of read and write operations. Once the whole transaction is
successfully executed, the transaction goes into partially committed state where we have all
the read and write operations performed on the main memory (local memory) instead of the
actual database.

The reason why we have this state is because a transaction can fail during execution so if we
are making the changes in the actual database instead of local memory, database may be left
in an inconsistent state in case of any failure. This state helps us to rollback the changes
made to the database in case of a failure during execution

33
DATA BASE MANAGEMENT SYSTEMS

Committed State

If a transaction completes the execution successfully then all the changes made in the local
memory during partially committed state are permanently stored in the database. You can
also see in the above diagram that a transaction goes from partially committed state to
committed state when everything is successful.

Committed State

If a transaction completes the execution successfully then all the changes made in the local
memory during partially committed state are permanently stored in the database. You can
also see in the above diagram that a transaction goes from partially committed state to
committed state when everything is successful.

Aborted State

As we have seen above, if a transaction fails during execution then the transaction goes into a
failed state. The changes made into the local memory (or buffer) are rolled back to the
previous consistent state and the transaction goes into aborted state from the failed state.
Refer the diagram to see the interaction between failed and aborted state.

DBMS Schedules and the Types of Schedules

We know that transactions are set of instructions and these instructions perform operations on
database. When multiple transactions are running concurrently then there needs to be a
sequence in which the operations are performed because at a time only one operation can be
performed on the database. This sequence of operations is known as Schedule.

Let’s take an example to understand what a schedule in DBMS is.

DBMS Schedule example

The following sequence of operations is a schedule. Here we have two transactions T1 & T2
which are running concurrently. This schedule determines the exact order of operations that
are going to be performed on database. In this example, all the instructions of transaction T1
are executed before the instructions of transaction T2,

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS ,SCHEDULES Class: 2nd CSE-B ,1ST SEM

34
DATA BASE MANAGEMENT SYSTEMS

T1 T2
---- ----
R(X)
W(X)
R(Y)
R(Y)
R(X)
W(Y)

Types of Schedules in DBMS

We have various types of schedules in DBMS. Let’s discuss them one by one.

A series of operation from one transaction to another transaction is known as schedule. It is


used to preserve the order of the operation in each of the individual transaction.

Schedule: A sequence of operations by a set of concurrent transactions that preserves the


order of the operations in each of the individual transactions.

Serial schedule: A schedule where the operations of each transaction are executed
consequently without any interleaved operations from other transactions.

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS ,SCHEDULES Class: 2nd CSE-B ,1ST SEM

35
DATA BASE MANAGEMENT SYSTEMS

Non serial schedule: A schedule the operations from a set of concurrent transactions are
interleaved.

Serializable schedule: The objective of Serializability is to find non_serializable schedules


that allows transactions to execute concurrently without interfering one another and thereby
produce a database state that could be produced by a serial execution ,that means a non-serial
schedule is serializable and correct if it produces same result as some serial execution

TIME T1 T2 T1 T2 T1 T2

t1 Begin_trans Begin_trans Begin_trans

t2 Read(bal_x) Read(bal_x) Read(bal_x)

t3 Write(bal_x) Write(bal_x) Write(bal_x)

t4 begin_tran begin_t Read(bal_y)


s rans

t5 read(bal_x read(b Write(bal_y)


) al_x)

t6 write(bal_ Read(bal_y) commit


x)

t7 Read(bal_y) write(b Begin_tra


al_x) ns

t8 Write(bal_y) (write bal_y) Read(bal


_x)

t9 commit commit Wrie(bal_


x)

t10 read(bal_ read(bal Read(bal_


y) _y) y)

36
DATA BASE MANAGEMENT SYSTEMS

t11 write(bal write(ba Write(bal


_y) l_y) _y)

t12 commit commit commit

Non_serial schedule S1 Non_serial schedule S2 serial schedule S3

CONCURRENT EXECUTION IN TRANSACTION | DBMS

Transaction-processing systems usually allow multiple transactions to run concurrently.


Allowing multiple transactions to update data concurrently causes several complications with
consistency of the data.

Ensuring consistency in spite of concurrent execution of transactions requires extra work; it is


far easier to insist that transactions run serially—that is, one at a time, each starting only after
the previous one has completed.

However, there are two good reasons for allowing concurrency:

Improved throughput and resource utilization:

 A transaction consists of many steps. Some involve I/O activity; others involve CPU activity.
The CPU and the disks in a computer system can operate in parallel. Therefore, I/O activity
can be done in parallel with processing at the CPU.
 The parallelism of the CPU and the I/O system can therefore be exploited to run multiple
transactions in parallel.
 While a read or write on behalf of one transaction is in progress on one disk, another
transaction can be running in the CPU, while another disk may be executing a read or write
on behalf of a third transaction.
 All of this increases the throughput of the system—that is, the number of transactions
executed in a given amount of time.
 Correspondingly, the processor and disk utilization also increase; in other words, the
processor and disk spend less time idle, or not performing any useful work.

Reduced waiting time:

 There may be a mix of transactions running on a system, some short and some long.
 If transactions run serially, a short transaction may have to wait for a preceding long
transaction to complete, which can lead to unpredictable delays in running a transaction.
 If the transactions are operating on different parts of the database, it is better to let them run
concurrently, sharing the CPU cycles and disk accesses among them.
 Concurrent execution reduces the unpredictable delays in running transactions.
 Moreover, it also reduces the average response time: the average time for a transaction to be
completed after it has been submitted.

37
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: SERIALIZABILITY Class: 2nd CSE-B ,1ST SEM

The database system must control the interaction among the concurrent transactions to
prevent them from destroying the consistency of the database. It is achieved
using concurrency-control schemes.

DBMS Serializability

When multiple transactions are running concurrently then there is a possibility that the
database may be left in an inconsistent state. Serializability is a concept that helps us to check
which schedules are serializable. A serializable schedule is the one that always leaves the
database in consistent state.

What is a serializable schedule?

A serializable schedule always leaves the database in consistent state. A serial schedule is
always a serializable schedule because in serial schedule, a transaction only starts when the
other transaction finished execution. However a non-serial schedule needs to be checked for
Serializability.A non-serial schedule of n number of transactions is said to be serializable
schedule, if it is equivalent to the serial schedule of those n transactions. A serial schedule
doesn’t allow concurrency, only one transaction executes at a time and the other starts when
the already running transaction finished.

Types of Serializability

There are two types of Serializability.


1. Conflict Serializability
2. View Serializability

Conflict Serializability is one of the type of Serializability, which can be used to check
whether a non-serial schedule is conflict serializable or not.What is Conflict Serializability?

A schedule is called conflict serializable if we can convert it into a serial schedule after
swapping its non-conflicting operations.

Conflicting operations

Two operations are said to be in conflict, if they satisfy all the following three conditions:

38
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: SERIALIZABILITY Class: 2nd CSE-B ,1ST SEM

1. Both the operations should belong to different transactions.


2. Both the operations are working on same data item.
3. At least one of the operation is a write operation.

Let’s see some examples to understand this:

Example 1: Operation W(X) of transaction T1 and operation R(X) of transaction T2 are


conflicting operations, because they satisfy all the three conditions mentioned above. They
belong to different transactions; they are working on same data item X, one of the operation
in write operation.

Example 2: Similarly Operations W(X) of T1 and W(X) of T2 are conflicting operations.

Example 3: Operations W(X) of T1 and W(Y) of T2 are non-conflicting operations because


both the write operations are not working on same data item so these operations don’t satisfy
the second condition.

Example 4: Similarly R(X) of T1 and R(X) of T2 are non-conflicting operations because


none of them is write operation.

Example 5: Similarly W(X) of T1 and R(X) of T1 are non-conflicting operations because


both the operations belong to same transaction T1.

Example of Conflict Serializability

Let’s consider this schedule:

T1 T2
----- ------
R(A)
R(B)
R(A)
R(B)
W(B)
W(A)
To convert this schedule into a serial schedule we must have to swap the R(A) operation of
transaction T2 with the W(A) operation of transaction T1. However we cannot swap these

39
DATA BASE MANAGEMENT SYSTEMS

two operations because they are conflicting operations, thus we can say that this given
schedule is not Conflict Serializable.

Let’s take another example:

T1 T2
----- ------
R(A)
R(A)
R(B)
W(B)
R(B)
W(A)

Let’s swap non-conflicting operations:

After swapping R(A) of T1 and R(A) of T2 we get:

T1 T2
----- ------
R(A)
R(A)
R(B)
W(B)
R(B)
W(A)

After swapping R(A) of T1 and R(B) of T2 we get:

T1 T2
----- ------
R(A)
R(B)
R(A)
W(B)
R(B)
W(A)
After swapping R(A) of T1 and W(B) of T2 we get:

T1 T2
----- ------
R(A)
R(B)
W(B)
R(A)
R(B)
W(A)

40
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS ,SERIELIZABILITY Class: 2nd CSE-B ,1ST SEM

We finally got a serial schedule after swapping all the non-conflicting operations so we can
say that the given schedule is Conflict Serializable.

DBMS View Serializability

What is View Serializability?

View Serializability is a process to find out that a given schedule is view serializable or not.

To check whether a given schedule is view serializable, we need to check whether the given
schedule is View Equivalent to its serial schedule. Lets take an example to understand what I
mean by that.

Given Schedule:

T1 T2
----- ------
R(X)
W(X)
R(X)
W(X)
R(Y)
W(Y)
R(Y)
W(Y)

Serial Schedule of the above given schedule:


As we know that in Serial schedule a transaction only starts when the current running
transaction is finished. So the serial schedule of the above given schedule would look like
this:

41
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS, SERIALIZABILITY Class: 2nd CSE-B ,1ST SEM

T1 T2
----- ------
R(X)
W(X)
R(Y)
W(Y)
R(X)
W(X)
R(Y)
W(Y)

If we can prove that the given schedule is View Equivalent to its serial schedule then the
given schedule is called view Serializable.

Recoverability in DBMS

As discussed, a transaction may not execute completely due to hardware failure, system
crash or software issues. In that case, we have to roll back the failed transaction. But some
other transaction may also have used values produced by the failed transaction. So we have
to roll back those transactions as well.

Recoverable Schedules:
 Schedules in which transactions commit only after all transactions whose changes they
read commit are called recoverable schedules. In other words, if some transaction T j is
reading value updated or written by some other transaction T i, then the commit of
Tj must occur after the commit of T i.
Example 1:
 S1: R1(x), W1(x), R2(x), R1(y), R2(y),
W2(x), W1(y), C1, C2;
Given schedule follows order of Ti->Tj => C1->C2. Transaction T1 is executed before T2
hence there is no chances of conflict occur. R1(x) appears before W1(x) and transaction T1
is committed before T2 i.e. completion of first transaction performed first update on data
item x, hence given schedule is recoverable.

42
DATA BASE MANAGEMENT SYSTEMS

Subject: DBMS Faculty Name: R RajaSekhar

Topic: TRANSACTIONS, SERIALIZABILITY Class: 2nd CSE-B ,1ST SEM

Example 2: Consider the following schedule involving two transactions T 1 and T2.

T1 T2

R(A)

W(A)

W(A)

R(A)

commit

commit

This is a recoverable schedule since T 1 commits before T2, that makes the value read by
T2 correct.

43

You might also like