DBMS Unit 3
DBMS Unit 3
1. Objective
4. Teaching Methodology
6. Deliverables
Lecture -24
Introduction to Schema Refinement:
<DeptDetails>
Dept_ID Emp_ID Dept_Name
Dpt1 E001 Operations
Dpt2 E002 HR
Dpt3 E003 Finance
Therefore, the above relation had lossless decomposition i.e. no loss of information.
Lossy Decomposition
As the name suggests, when a relation is decomposed into two or more relational
schemas, the loss of information is unavoidable when the original relation is retrieved.
Let us see an example −
<EmpInfo>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
Now, you won’t be able to join the above tables, since Emp_ID isn’t part ofthe
DeptDetails relation.Therefore, the above relation has lossy decomposition. Redundancy
means having multiple copies of same data in the database.This problem arises when a
database is not normalized. Suppose a table of student details attributes are:student Id,
student name, college name, college rank, course opted.
As it can be observed that values of attribute college name, college rank, course is being
repeated which can lead to problems.
Problems caused due to redundancy are: Insertion anomaly, Deletion anomaly, and
Updation anomaly.
Lecture-25
1. Insertion Anomaly –
If a student detail has to be inserted whose course is not being decided yet then insertion
will not be possible till the time course is decided forstudent.
This problem happens when the insertion of a data record is not possiblewithout adding
some additional unrelated data to the record.
2. Deletion Anomaly –
If the details of students in this table are deleted then the details of college will also get
deleted which should not occur by common sense. This anomaly happens when deletion ofa
data record results in losing some unrelated information that was stored as part of the
record that wasdeleted from a table.
It is not possible to delete some information without loosing some otherinformation in the
table as well.
3. Updation Anomaly –
Suppose if the rank of the college changes then changes will have to be all over thedatabase
which will be time-consuming and computationallycostly.
Lecture-26
Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically
exists between the primary key and non-key attributewithin a table.
1. X → Y
The left side of FD is known as a determinant, the right side of the production is known as a
dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address. Here
Emp_Id attribute can uniquely identify the Emp_Name attribute of employee tablebecause
if we know the Emp_Id, we can tell that employeename associated with it.
Functional dependency can be written as:
1. Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional dependency
1. ID → Name,
2. Name → DOB
Inference Rule (IR):
1. If X ⊇ Y then X → Y
Example:
1. X = {a, b, c, d, e}
2. Y = {a, b, c}
2. Augmentation Rule (IR2)
The augmentation is also called as a partial dependency. In augmentation,if X determines Y,then
XZ determines YZ for any Z.
1. If X → Y then XZ → YZ
Example:
1. If X → Y and Y → Z then X → Z
4. Union Rule (IR4)
Union rule says, if X determines Y and X determines Z, then X must alsodetermine Y and Z.
1. If X → Y and X → Z then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
Decomposition rule is also known as project rule. It is the reverse of unionrule.
This Rule says, if X determines Y and Z, then X determines Y and Xdetermines Z separately.
1. If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
6. Pseudo transitive Rule (IR6)
In Pseudo transitive Rule, if X determines Y and YZ determines W, thenXZ determines W.
1. If X → Y and YZ → W then XZ → W
Proof:
1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1 by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)
Lecture-27
Closures of a set of functional dependencies:
A Closure is a set of FDs is a set of all possible FDs that can bederived from a given set ofFDs.
It is also referred as a Complete set of FDs. If F is used to donate the set of FDs for relation R,
then a closure of a set of FDs implied by F is denotedby F+. Let's consider the set F of
functional dependencies givenbelow:
F = {A -> B, B -> C, C -> D}
from F, it is possible to derive following dependencies.A -> A ...By using
Rule-4, Self-Determination.
A -> B ...Already given in F.
A -> C ...By using rule-3, Transitivity.A -> D
...By using rule-3, Transitivity.
Now, by applyiing Rule-6 Union, it is possible to derive A+ -> ABCD and it can be denoted
using A -> ABCD. All such type ofFDs derived from each FD of F form a closureof F. Steps to
determine F+example:
Determine each set of attributes X that appears as a left hand sideofsome FD in F.
Determine the set X+ of all attributes that are dependent on X, asgiven in above
example.
In other words, X+ represents a set of attributes that are functionallydetermined
by X based on F. And, X+ is called theClosure of X under F.
All such sets of X+, in combine, Form a closure of F.
Closure of an Attribute Set-
The set of all those attributes which can be functionally determinedfroman attribute set is
called as a closure of that attribute set.
Closure of attribute set {X} is denoted as {X}+.
Steps to Find Closure of an Attribute Set-
Add the attributes contained in the attribute set for which closure is beingcalculated to the result set.
Step-02:
Recursively add the attributes to the result set which can be functionallydetermined from the attributes
already contained in the result set.
Example-
Now, let us find the closure of some attributes and attribute sets-
Closure of attribute A-
A+ = { A }
= { A , B , C } ( Using A → BC )
= { A , B , C , D , E } ( Using BC → DE )
= { A , B , C , D , E , F } ( Using D → F )
= { A , B , C , D , E , F , G } ( Using CF → G )
Thus,
A+ = { A , B , C , D , E , F , G }
Closure of attribute D-
D+ = { D }
= { D , F } ( Using D → F )
We can not determine any other attribute using attributes D and Fcontained in the result
set.
Thus,
D+ = { D , F }
{ B , C }+= { B , C }
= { B , C , D , E } ( Using BC → DE )
= { B , C , D , E , F } ( Using D → F )
= { B , C , D , E , F , G } ( Using CF → G )
Thus,
{ B , C }+ = { B , C , D , E , F , G }
Finding the Keys Using Closure-
Super Key-
If the closure result of an attribute set contains all the attributes ofthe relation, then
that attribute set is called as a super key of thatrelation.
Thus, we can say-
“The closure of a super key is the entire relation schema.”
Example-
In the above example,
The closure of attribute A is the entire relation schema.
Thus, attribute A is a super key for that relation.
Candidate Key-
If there exists no subset of an attribute set whose closure contains allthe attributes of
the relation, then that attribute set is called as a candidate key of that relation.
Example-
In the above example,
No subset of attribute A contains all the attributes of the relation.
Thus, attribute A is also a candidate key for that relation.
Lecture:28
First Normal Form (1NF)
For a table to be in the First Normal Form, it should follow the following 4rules:
1. It should only have single(atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.
It's very simple, because all we have to do is break the values into atomicvalues.
Here is our updated table and it now satisfies the First Normal Form.
101 Akon OS
101 Akon CN
102 Bkon C
What is Dependency?
In this table, student_id is the primary key and will be unique for every row, hence we can
use student_id to fetch any row of data from this table
Even for a case, where student names are same, if we knowthe student_id
we can easily fetch the correct record.
Hence we can say a Primary Key for a table is the column or a group of columns(composite
key) which can uniquely identify each record in the table.
I can ask from branch name of student with student_id 10, and I can get it.Similarly, if I askfor
name of student with student_id 10 or 11, I will get it. So all I need is student_id and every
other column depends on it, or canbe fetched using it.
This is Dependency and we also call it Functional Dependency.What is Partial
Dependency?
Now that we know what dependency is, we are in a better state tounderstand what partial
dependency is.
For a simple table like Student, a single column like student_id canuniquely identfyall
the records in a table.
But this is not true all the time. So now let's extend our example to see ifmore than 1 column
together can act as a primary key.
Let's create another table for Subject, which will
have subject_id and subject_name fields and subject_id will be theprimary key.
subject_id subject_name
1 Java
2 C++
3 Php
Now we have a Student table with student information and anothertable Subject for
storing subject information.
Let's create another table Score, to store the marks obtained by students inthe respective
subjects. We will also be saving name of the teacher who teaches that subject along with
marks.
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the score table we are saving the student_id to know which student's marks are these
and subject_id to know for which subject the marks arefor.
Together, student_id + subject_id forms a Candidate Key(learn about Database
Keys) for this table, which can be the Primary key.
Confused, How this combination can be a primary key?
See, if I ask you to get me marks of student with student_id 10, can you get it from this
table? No, because you don't know for which subject. Andif I give you subject_id, you
would not know for which student. Hence weneed student_id + subject_id to uniquely
identify any row.
There can be many different solutions for this, but out objective is toremove teacher's
name from Score table.
The simplest solution is to remove columns teacher from Score table andadd it to the
Subject table. Hence, the Subject table will become:
1 10 1 70
2 10 2 75
3 11 1 80
Quick Recap
1. For a table to be in the Second Normal form, it should be in the FirstNormal form and it
should not have Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in the
table depends only on a part of the primary key and not on the complete primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute which is
causing partial dependency, and move it to some other table where it fits in well.
Third normal form:
Third Normal Form is an upgrade to Second Normal Form. When a table is in the Second
Normal Form and has no transitive dependency, then it isin the Third Normal Form.
Score Table
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the exam name and total
marks, so let's add 2 more columns to the Score table.
score_id student_id subject_id marks exam_name total_marks
Requirements for Third Normal Form For a table tobe in the third normal form,
1 Workshop 200
2 Mains 70
3 Practicals 30
Advantage of removing Transitive Dependency
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy thefollowing two
conditions:
1. It should be in the Third Normal Form.
2. And, for any dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means, thatfor a dependency A
→ B, A cannot be a non-prime attribute, if B isa
prime attribute.
Example
103 C# P.Chash
above:
One student can enrol for multiple subjects. For example, studentwith student_id
101, has opted for subjects - Java & C++
For each subject, a professor is assigned to the student.
And, there can be multiple professors teaching one subject like wehave for Java.
What do you think should be the Primary Key?
Well, in the table above student_id, subject together form the primary key,because using
student_id and subject, we can find all the columns of the table.
One more important point to note here is, one professor teaches only onesubject, but one
subject may have two different professors.
Hence, there is a dependency between subject and professor here,where
subjectdepends on the professor name.
This table satisfies the 1st Normal form because all the values are atomic,column names
areunique and all the values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as their is no PartialDependency.
And, there is no Transitive Dependency, hence the table also satisfiesthe 3rd
NormalForm.But this table is not in Boyce-Codd Normal Form.
To make this relation(table) satisfy BCNF, we will decompose this tableinto two tables,
student table and professor table.
Below we have the structure for both the tables.
Student Table
student_id p_id
101 1
101 2
and so on...
1 P.Java Java
2 P.Cpp C++
and so on...
And now, this relation satisfy Boyce-Codd Normal Form. In the nexttutorial we
willlearn about the Fourth Normal Form.
A more Generic Explanation
For a table to satisfy the Fourth Normal Form, it should satisfy thefollowing
twoconditions:
1. It should be in the Boyce-Codd Normal Form.
2. And, the table should not have any Multi-valued Dependency.
Let'stry to understand what multi-valued dependency is in the
nextsection.
Example
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
As you can see in the table above, student with s_id 1 has opted for two courses, Science and
Maths, and has two hobbies, Cricket and Hockey.
You must be thinking what problem this can lead to, right?
Well the two records for student with s_id 1, will give rise to two morerecords, as
shownbelow, because for one student, two hobbies exists, hence along with both the
courses, these hobbies should be specified.
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
And, in the table above, there is no relationship between the columns course
andhobby. They are independent of each other.
So there is multi-value dependency, which leads to un-necessary repetitionof data and
otheranomalies as well.
To make the above relation satify the 4th normal form, we can decomposethe table into 2
tables.
CourseOpted Table
s_id course
1 Science
1 Maths
2 C#
2 Php
And, Hobbies Table,
s_id hobby
1 Cricket
1 Hockey
2 Cricket
2 Hockey
Now this relation satisfies the fourth normal form.
A table can also have functional dependency along with multi-valued dependency.In that
case, the functionally dependent columns are movedin a separate table and the multi-
valued dependent columns are moved toseparate tables.
If you design your database carefully, you can easily avoid these issues.
1. Lecture -30
Transactions:
Lecture-31
States of Transactions
The various states of a transaction concept in DBMS are listed below:
Lecture-32 ACID
Properties
Lecture-33
Serial schedule-concurrent schedule:
Serial Schedules
Concurren
tSchedulesSerial
Schedules:
A schedule is said to be serial if and only if all the instructions of all the
transactions getexecutednon-preemptively as an unit. OR
Each serial schedule consists of a sequence of instructions from various transactions,
wheretheinstructions belonging to one single transaction appear together in that schedule.
Thus, for a set of n transactions, there exist n! different valid serial
schedules.if there exist three transactions then all possible serial
schedulesare:
here n = 3(no of
transactions)3! = 3*2*1 = 6
serial schedules
T1->T2-
>T3, T1-
>T3->T2,
T2->T1-
>T3, T2-
>T3->T1,
T3->T2-
>T1, T3-
>T2->T1
write-write,
write-read,
read-write.
Lecture-34
Serializable Schedules:
A schedule is said to be serializable if and only if the result of the given concurrentschedule is
equivalent to one of the possible serial schedule.
Definition:
Process of managing simultaneous operations on the database without having theminterfere
with one another
In a database transaction, the two main operations are READ and WRITE operations.So,there is
a need to manage these two operations in the concurrent execution of the transactions.\
Example:
T1 T2
Read_item(X)
X=X+N
X=X+10
Write_item(X)
In the above example ,transaction 1 changes the value of x but it gets overwritten by
theupdate done by transaction2 on x.therefore, the update done by transaction1 is lost.
t1 t2 t3 t4 t5 Begin_transaction 100
Begin_transaction
t6t7 Read(balx) 100
Read(balx) Balx=balx-
t8 Balx=balx+100 100
10 Write(balx)
Write(balx) 100
Commit
. 100
rollback 100
100
100
Problem avoided by preventing T1 from reading balx until after T2 commits or aborts
Occurs when transaction reads several values but 2nd transaction updates some of
themduringexecution of first
t1 Begin_transactio 100 50 25 0
Begin_transaction
t2 nSum=0 100 50 25 0
Read(balx)
t3 Read(balx) 100 50 25 100
Balx=balx-10
t4 sum=sum+bal 100 50 25 100
Write(balx)
t5 zread(balx) 90 50 25 150
Read(balz)
t6 sum=sum+bal 90 50 25 150
Balz=balz+10
t7 y 90 50 25 150
Write(balz) commit
t8 90 50 35 150
t9 90 50 35 150
t10 Read(balz) 90 50 35 185
t11 sum=sum+bal 90 50 35 185
zcommit
Problem avoided by preventing T2 from reading balx and balz until after T1 completed
updates
Lecture:35
All the operations and transactions should go in a order (or) scheduled way.
Schedule
Whenever multiple transactions are operating, schedule says that which operation has
to be performed under which transaction and at which time span is going to be
executed.
One transaction must wait for another to complete its operation, in which there is no
overlapping of execution time span(or) no interleaving of any operations is said to be
serial schedule.
The other transaction proceeds without waiting for the previous transaction to
complete in which there is overlapping of execution time span and interleaving of
operations is said to be non-serial schedule.
(b) If two transactions either read or write completely separate data items
(c) If one transaction writes a data item and another reads or writes same
dataitem
Conflict Serializability
For example,
item.
In the above example, the operation ’x’ is the same data item in two transaction.
It should satisfy the set of operations belong to different transactions
inwhich atleast one of the two operations is a write operation.
In this below transaction (a) and (b) are non-serial schedule.In order
toconvert non-serial to serial schedule we have to perform swapping.
In the above example (a) T7 started transaction at time stamp (t1) to (t3)
with read and write operations. Later by deviating transaction T8 at time
stamp (t4).
Now in (b) transaction T7 has started with read and write operations from
time stamp (t1) to (t3) later by deviation T8 transaction had started with
onlyread operation at time stamp t4 and t5.
By performing all the operations in (a) and (b) .Now in (c) transaction T7 has
started with read and write operations from time stamp (t1) to (t5) ,at time
stamp(t6) transaction T7 has successfully committed. Now the transaction
T8 has started and performed all operations and finally committed at time
stamp(t12).
So now the above transactions (a) and (b) are completely transformed
intoserial schedule by using swapping.
Lecture-36
View Serializability
If Ti reads data initially, after this Tj writes the same data, in the given schedule.
Thissequence must be followed in the transaction combination (read write operation).
Transaction T2 is reading A from initial database
If transaction Ti reads a data item that has been updated by the transaction Tj in
schedule S1,then in schedule S2 also,transaction Ti must read the same data item that
has been updated by the transaction Tj .The Write-read sequence must be same
SCHEDULE S1:
T1 T2 T3
W(A)
W(A)
R(A)
SCHEDULE S2:
T1 T2 T3
W(A)
R(A)
W(A)
Rule 3 −Final Write
If Ti writes data, after this Tj writes the data finally. This sequence must be
followed in the transaction combination (write-write operation).
For each data object A,the transaction that performs the final write on A in S1 mustalsoperform
final write on A in S2.
T1 T2 T1 T2
R(A) R(A)
W(A) W(A)
W(A) W(A)
These two schedules are not view equal as the final write operations in S1 is doneby T1
,while in S2 it is done by T2.
Given
Schedule:
T1 T2 T3
R(A)
W(A)
COMMIT R(A)
W(A)
COMMIT
W(A)
COMMIT
T1 T2 T3
R(A)
W(A)
COMMIT
W(A)
COMMIT R(A)
W(A)
COMMIT
Given schedule:
T1 T2
R(X)
W(X)
R(X)
W(X)
R(Y)
W(Y
)
R(Y)
W(Y)
T1 T2
R(X)
W(X)
R(Y)
W(Y)
R(X)
W(X)
R(Y)
W(Y)
Initial read Final read Updated read
T1 T2 T1 - > T2
Since,all these three condition are same in given non-serial scheduleandserial
schedule , so the given schedule is view serializable.
7. Keywords
Super Key
Candidate Key
Prime Attributes
Functional Dependency
1NF,2NF,3NF,BCNF,4NF,5NF
Transaction
Serializability
ACID properties
Schedule
WR,WW,RW conflicts
8. Sample Questions
Remember:
-----
At the end of this session, the facilitator (Teacher) shall randomly pick-up few studentsto
summarize the deliverables.
-------------------------