0% found this document useful (0 votes)
12 views28 pages

IT 220 Unit 4 Relational Database Design

The document covers the principles of relational database design, focusing on creating schemas that minimize redundancy and facilitate easy information retrieval. It discusses various concepts such as functional dependencies, normal forms, and the importance of normalization to eliminate anomalies in database management. Additionally, it outlines the processes for determining candidate keys, functional dependencies, and the implications of decomposition in database design.

Uploaded by

Binit Yadav
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views28 pages

IT 220 Unit 4 Relational Database Design

The document covers the principles of relational database design, focusing on creating schemas that minimize redundancy and facilitate easy information retrieval. It discusses various concepts such as functional dependencies, normal forms, and the importance of normalization to eliminate anomalies in database management. Additionally, it outlines the processes for determining candidate keys, functional dependencies, and the implications of decomposition in database design.

Uploaded by

Binit Yadav
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 28

7/30/2024

Unit 4: Relational Database


Design

Database Management System

Topics
• Informal Design Guidelines for Relational Schemas;
• Functional Dependencies
• Normal Forms Based on Primary Keys;
• First, Second and Third Normal Forms; Boyce-Codd Normal Form; Multivalued
Dependency and Fourth Normal Form;
• Properties of Relational Decomposition.

1
7/30/2024

Relational Database Design


the goal of relational database design is to generate a set of relation
schemas that allows us to store information without unnecessary
redundancy, yet also allows us to retrieve information easily.
accomplished by designing schemas that are in an appropriate
normal form.
To determine whether a relation schema is in one of the desirable
normal forms, we need information about the real-world enterprise
that we are modeling with the database

Pitfalls in Relational DB Design


A bad design may have several properties, including:
Repetition of information.

Inability to represent certain information.

Loss of information.

2
7/30/2024

Anomalies
Anomalies are problems that can occur in poorly planned, un-
normalised databases where all the data is stored in one table (a flat-
file database).
 Insert anomaly: insertion anomaly occurs in database when certain
attributes cannot be inserted without the presence of other.
 Delete anomaly: Certain information are loss due to the delete of
other values.
 Update anomaly: anomalies occur during the time of recording the
data in any one table causes a partial update or data redundant.

Table Design
Teach_id Teacher_name Department Course_name
T1010 Mr. R K Agrawal Science & Technology Physics
T1011 Mr. Shyam Krishna Mahat Management Economics
T1012 Mr. R K Agrawal Science & Technology Biology
T1013 Ms. Anjila Bhatta Information Technology Java Programming

 Insertion anomalies: If new teacher is added, but s/he is not assign to any
department or any course and if null entries are not allowed in database then
insertion anomalies occur.
 Deletion anomalies: If management department is deleted then records of
teacher as well as course name also deleted from database.
 Update anomalies: If department assign to R K Agrawal is an error, then it need
to updated at two places to make consistency.

3
7/30/2024

Functional Dependencies
Functional dependency is a relationship that exists
when one attribute uniquely determines another
attribute.
Functional dependency (FD) is a set of constraints
between two attributes in a relation. Functional
dependency says that if two tuples have same values
for attributes A1, A2,..., And, then those two tuples
must have to have same values for attributes B1, B2,
..., Bn.
If A is the determinant and B is the determined then we
say that A functionally determines B and graphically
represent this as A  B. The symbols AB, can also be
expressed as B is functionally determined by A.

Functional Dependencies
An important property of a functional dependency is Armstrong’s
axiom
In a relation, R, with three attributes (X, Y, Z) Armstrong’s axiom holds
true if the following conditions are satisfied:
Axiom of Transivity: If X->Y and Y->Z, then X->Z
Axiom of Reflexivity (Subset Property): If Y is a subset of X, then X->Y
Axiom of Augmentation: If X->Y, then XZ->YZ
To make life easier we can use some additional rules, derivable from
Armstrong's Axioms:
Union rule: if X->Y and Y->Z , then X->YZ holds.
Decomposition rule: if X->YZ holds, then X->Y and X->Z both hold.
Pseudotransitivity rule: if X->Y holds, and AY->Z holds, then XA->Z holds.

4
7/30/2024

FD Example
Instructor_id name phone
1010 Ram Krishna Karki 9851112130
1020 Subodh Satyal 9801112134
1030 Smiriti Shah 9841424344
1040 Ram Krishna Karki 9803214315

Instructor_id  name (instructor_id functionally determine name)

instructor_id  phone (instructor_id functionally determine phone)

Closure of a Set of Functional Dependencies


• Given a set F of functional dependencies, there are certain other functional
dependencies that are logically implied by F.
E.g. If A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the closure
of F.
We denote the closure of F by F+.
We can find all of F+ by applying Armstrong’s Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γ α → γ β (augmenta on)
if α → β, and β → γ, then α → γ (transi vity)

10

5
7/30/2024

Example of FD
R = (A, B, C, G, H, I)
Given F = { A → B ,A → C, CG → H, CG → I, B → H}
some members of F+
A → H : by transitivity from A → B and B → H
AG → I : by augmenting A → C with G, to get AG → CG and then transitivity with
CG → I
CG → HI :from CG → H and CG → I : “union rule” can be inferred from definition of
functional dependencies, or:
(1) augmentation of CG → I to infer CG → CGI, (2) augmentation of CG → H to infer
CGI → HI, and then (3) transitivity.

11

Procedure to Compute F+
F+ = F
repeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f
add the resulting functional dependencies to F+
for each pair of functional dependencies f1 and f2 in F+
if f1 and f2 can be combined using transitivity
add the resulting functional dependency to F+
until F+ does not change any further

12

6
7/30/2024

Closure of Attribute Sets


The set of attributes that are functionally dependent on the attribute
A is called Attribute Closure of A and it can be represented as A+.
R(A,B,C,D)
Given :A B, B DC B
Closure of Attribute A= A+ : Set of attributes which can be
determined by A.
A+= ABD
Hence Closure of A = ABD

13

An algorithm for computing +:


result := 
repeat
temp := result
for each functional dependency    in F do
if   result then
result := result  
until temp = result

14

7
7/30/2024

Algorithm to find the X+ or Algorithm to find


Closure of Functional Dependency
X+ = X;
repeat
oldX+ := X+
for each FD Y→Z in S do
if Y is subset of X+ then
X+ : = X+ ∪ Z;
until (X+ =old X+) /* If X+ did not change then leave loop*/

15

Find the closure of given:


Relation R: (A,B,C,D,E)
Given FD {AB C, BD, CE, DA} Find B+ :
{B}: B->B
{BD}: BD
{BDA} : DA
{BDAC}: ABC
{BDACE}: CE

Find (C,D)+
Find (B,C)+

16

8
7/30/2024

Home Work

• Relation R:(A,B,C,D), Given FD {A->B,B->D,C->B} Find A+, B+


• Relation R:(A,B,C,D,E,F), Given FD {A->C, C ->D, D->B, E->F} Find A+,
B+, E+, AE+,AB+
• Relation R:(A,B,C,D,E,F,G,H), Given FD {CH-> G, A->BC, B->CFH, E->A,
F->EG } Find A+, B+, D+, DE+,DA+,DB+, DF+

17

Keys
Candidate Key: Candidate Key is minimal set of
attributes of a relation which can be used to identify a
tuple uniquely.
Note: A candidate key may or may not be a primary key.
Super Key: Super Key is set of attributes of a relation
which can be used to identify a tuple uniquely.
Note: A Candidate key is always super key but not vice
versa.

18

9
7/30/2024

Findings Candidate Key


R (A, B, C, D, E, F)
Set of FD are given below:
A C
CD
DB
EF
Find all possible candidate key
List all the attributes which are not present in RHS of FD.
A, E
Now find out closure of AE (AE)+=AECDBF
All attributes are determined by AE, so AE is our candidate key.

19

Finding Candidate Keys and Super Keys of a


Relation using FD set
Staff (s_id,name,city,state)
FD: s_id name,s_idcity, s_idstate, citystate
Closure of attribute set
(s_id)+= (s_id,name, city, state)
(s_id,name)+= (s_id,name, city, state)
(s_id,city)+= (s_id,name, city, state)
(s_id,state)+= (s_id,name, city, state)
(name)+= (name)
(city)+= (city,state)
(s_id)+,(s_id,name)+,(s_id,city)+ ,(s_id,state)+ determine all the attributes of
relation set so all of these are super key.
Here minimal super key is s_id so it is candidate key.

20

10
7/30/2024

Decomposition
Decomposition in DBMS removes redundancy, anomalies and
inconsistencies from a database by dividing the table into multiple
tables.
Lossy Decomposition: when a relation is decomposed into two or more
relational schemas, the loss of information is unavoidable when the
original relation is retrieved.

Lossless Decomposition: Decomposition is lossless if it is feasible to


reconstruct relation R from decomposed tables using Joins. This is the
preferred choice. The information will not lose from the relation when
decomposed.

21

Lossy Decomposition Example

22

11
7/30/2024

Example of Lossless-Join Decomposition


• Decomposition of R = (A, B, C)
R1 = (A, B) R2 = (B, C)

A B C A B B C
 1 A  1 1 A
 2 B  2 2 B
r A,B(r) B,C(r)

A B C
A (r) B (r)
 1 A
 2 B

23

Normalization
Normalization is a technique of organizing the data in the database.
Systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics like Insertion, Update and
Deletion Anomalies.
Normalization is the process of removing anomalies from the database
table.
Normalization removes the undesirable properties by working through
a series of stages of called Normal forms.
Such Normal forms are: 1NF, 2NF, 3NF, BCNF, 4NF and 5NF
Normalization is used for mainly two purpose
Eliminating redundant (useless) data.
Ensuring data dependencies make sense i.e, data is logically stored.

24

12
7/30/2024

Functional Dependency:
• If an attribute's (A) value is known on the basis of another attribute
B, then A is said to be functionally dependent on B.
Symbolically, FD: B -> A.
• Example, Roll -> Name (i.e. if roll is known name can be known)
• If we know the value of primary key, then we can find out the another
attributes.
• Functionally dependencies are of following types:
• Trivial & Non-Trivial Functional Dependency
• Partial Dependency
• Transitivity Dependency

25

Trivial functional dependency:


• The dependency of an attribute on a set of attributes is known as trivial
functional dependency if the set of attributes includes that attribute.
• Symbolically: A->B is trivial functional dependency if B is a subset of A.
• The following dependencies are also trivial: A->A & B->B
• {Student_ld, Student_Name} -> Student_ld is a trivial functional
dependency as Student_ld is a subset of {Student_ld, Student_Name}.
• That makes sense because if we know the values of Student Id and
Student_Name then the value of Student_ld can be uniquely determined.
• Student -Id -> Student -Id & Student -Name -> Student -Name are trivial
dependencies too.

26

13
7/30/2024

Non trivial functional dependency


• Non trivial functional dependency:
• If a functional dependency X->Y holds true where Y is not a subset of X
then this dependency is called non trivial Functional dependency.
• For example:
An employee table with three attributes: emp_id, emp_name, emp_address. The
following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
• Completely non trivial FD:
• If a FD X->Y holds true where X Intersection Y is null, then this dependency
is said to be completely non trivial function dependency.

27

Partial Key Dependency:


It is a type of functional dependency where a non key attribute is
determined by only a part of composite primary key, but not the whole
composite key.
For example, FD: AB -> x;
B -> X
Here, x is partially dependent on AB.
Student (Roll, Name, Age, Address, Marks)
Given, Combination of {Roll. Name} is a composite primary key.
Name -> Age
Here, Roll. Name -> Age name -> Age
So, Age is Partially dependent on Roll, Name.

28

14
7/30/2024

Transitivity Dependency:
It is a type of functional dependency where a non key attribute
determines another non key attribute.

Example: R(A,B,C,D,E) And given that C -> D , then there exists


transitivity dependency since a non key attribute(C) determines
another non key attribute(D).

29

First Normal Form


A table is said to be in the 1NF when each cell of the table contains precisely one
value .
No two Rows of data must contain repeating group of information i.e each set of
column must have a unique value, such that multiple columns cannot be used to
fetch the same row.
How?
Each row in a table should be identified by primary key (a unique column value or group of
unique column values)
No rows of data should have repeating group of column values.
eid name job dep_id knowledge
1 Abhinav Designer 10 HTML, CSS, jQuery
2 Anjel Network Enginer 20 CCNA, CISA

 Given table is not in 1NF because knowledge has more than one value.
Solution : split table

30

15
7/30/2024

Cont..
eid Name Job Dep_id eid Knowledge

1 Abhinav Designer 10 1 HTML


1 CSS
2 Anjel Network Enginer 20
1 JQuery
2 CCNA
2 CISA

31

Second Normal Form


A table is said to be in the 2NF when it is in 1NF and every attribute in
the row is functionally dependent upon the whole key, not just part of
the key.
Consider the given table

32

16
7/30/2024

This situation could lead to the following problems:


Insertion: The department of a particular employee cannot
be recorded until the employee is assigned a project.
Updating: For a given employee, the employee code and
department are repeated several times. Hence if an
employee is transferred to another department, this change
will have to be recorded in every row.
Deletion: If an employee completes work on a project, the
employee’s record will be deleted. The information regarding
the department to which the employee will also be lost.

33

Second Normal Form


Given table satisfies the definition to 1NF.
Let check whether it satisfies 2NF or not.
For each values of Ecode, there is more than one value of hours. i.e. Hours is
not functionally dependent on Ecode. Similarly for ProjCode. However, for a
combination of Ecode and ProjCode values, there is one value of Hours.
Hence Hours is functionally dependent on whole key, Ecode+ProjCode. Dept
is functionally dependent on Ecode but not functionally dependent on
ProjCode. Thus Dept is functionally dependent on part of the key not on the
whole key. Therefore table Project is not in 2NF.
How?
Every non-key attributes are identified by the use of primary key
All subset of data, which applies to have multiple rows in a table must
be removed and placed in a new table. And this new table and the
parent table should be related by the use of foreign key.

34

17
7/30/2024

Second Normal Form


Guidelines for converting table to 2NF:
Find and remove attributes that are functionally dependent on only a
part of the key are not on the whole key. Place them in a different
table.
Group the remaining attributes.

35

Third Normal Form


A relation is said to be in 3NF when it is in 2NF and every non-prime
attribute of table must be dependent on primary key, or we can say
that, there should not be the case that a non-prime attribute is
determined by another non-prime attribute. So this transitive
functional dependency should be removed from the table
Consider the table Employee.

36

18
7/30/2024

Given situation could lead to the following problems:


Insertion: The department head of a new department that does not
have any employee at present cannot be inserted in the DeptHead
column.
Updating: For a given department, the code for a particular
department head is repeated several times. Hence, if a department
head moves to another department, the change will have to be made
consistently across the table.
Deletion: If the record of an employee is deleted, the information
regarding the head of the department will also be deleted.

37

Third Normal Form


This table satisfies the definition of 1NF and 2NF.
Lets check whether it satisfies 3NF or Not:
Attribute DeptHead is dependent on the attribute Dept also.
According to 3NF rule, all non-key attributes have to be
functionally dependent only on the primary key.
This table is not in 3NF since DeptHead is functionally
dependent on Dept, which is not a primary key.

38

19
7/30/2024

Guidelines for converting a table to 3NF:


Find and remove non-key attributes that are functionally dependent on attributes
that are not the primary key. Place them in a different table.
If there is any columns which are not related to primary key, then remove them
and put it in a separate table, relate both the table by means of foreign key i.e.;
there should not be any transitive dependency.
Group the remaining attributes.

39

Boyce-Codd Normal Form


The original definition of 3NF was inadequate in some situations. It was not
satisfactory for the tables:
If table have multiple candidate keys.
Where the multiple candidate keys are composite.
Where the multiple candidate keys are overlapped.
Note: Hence, a new normal form Boyce-Codd Normal Form was introduced. In
tables where the above 3 conditions do not apply, you can stop at 3NF, in such case,
3NF is same as BCNF.
How?
Meets all the requirement of 3NF
Any table is said to be in BCNF, if its candidate keys do not have any partial dependency on the
other attributes. i.e.; in any table with (x, y, z) columns, if (x, y)->z and z->x then it's a violation
of BCNF. If (x, y) are composite keys and (x, y)->z, then there should not be any reverse
dependency, directly or partially.

40

20
7/30/2024

Cont…
A relation is in BCNF if and only if every determinant is a candidate key.
Consider the following table :

This table has redundancies. If the name of an employee is changes,


the change will have to be made in every row of the table, otherwise
there will be inconsistencies.

41

Cont…
Ecode + ProjCode is the primary key. Name + ProjCode should be chosen
as primary key and hence is a candidate key.
Hours is functionally dependent on Ecode+ProjCode.
Hours is also functionally dependent on name+ProjCode.
Name is functionally dependent on Ecode.
Ecode is functionally dependent on Name.

You will notice this table has:


• Multiple candidate keys, i.e. Ecode+ProjCode and Name+ProjCode.
• The candidate keys are composite.
• The candidate keys overlap since the attribute ProjCode is common.

42

21
7/30/2024

Cont..
Ecode and Name are determinants since they are functionally dependent on each
other. However, they are not candidate keys by themselves. As per BCNF, the
determinants have to be candidate keys.
Guidelines for converting a table to BCNF:
Find and remove the overlapping candidate keys. Place the part of the candidate
key and the attribute it is functionally dependent on, in a different table.
Group the remaining attributes into a table

43

Fourth Normal Form (4NF)


• Multi-valued Dependency (MVD)
• Dependency between attributes (for example, A, B, and C) in a relation, such
that for each value of A there is a set of values for B and a set of values for C.
However, the set of values for B and C are independent of each other.

Employee Skill Language


1234 Cooking French
1234 Cooking German
1453 Carpentry Spanish

1453 Cooking Spanish


2345 Cooking Spanish
44

44

22
7/30/2024

Fourth Normal Form (4NF)


• Although BCNF removes anomalies due to functional
dependencies, another type of dependency called a multi-valued
dependency (MVD) can also cause data redundancy.
• A relation schema R is in 4NF with respect to a set of function
and multivalued dependencies if, for all multivalued
dependencies of the form A->>B, where A is a subset of R and B
is a subset of R, at least one of the following holds:
• A->>B is a trivial multivalued dependency.
• A is a superkey for schema R.

45

45

Fourth Normal Form (4NF)


• Fourth normal form eliminates independent many-to-one relationships
between columns.
• To be in Fourth Normal Form,
• a relation must first be in Boyce-Codd Normal Form.
• a given relation may not contain more than one multi-valued attribute.
Example (Not in 4NF)
Scheme  {Employee, Skill, Language )
Primary Key  {Employee, Skill, Language }
1. All columns are a part of the only candidate key, hence BCNF
2. Each employee can speak multiple languages
3. Each employee can have multiple skills Employee Skill Language

4. Thus violates 4NF 1234 Cooking French

1234 Cooking German

1453 Carpentry Spanish

1453 Cooking Spanish

2345 Cooking Spanish

46

23
7/30/2024

4NF Decomposition
Employee Skill Language

1234 Cooking French

1234 Cooking German


1453 Carpentry Spanish

1453 Cooking Spanish


2345 Cooking Spanish

Old Scheme  {Employee, Skill, ForeignLanguage}


New Scheme  {Employee, Skill}
New Scheme  {Employee, ForeignLanguage}

Employee Skill Employee Language

1234 Cooking 1234 French

1453 Carpentry 1234 German

1453 Cooking 1453 Spanish

2345 Cooking 2345 Spanish

47

Decomposition into 4NF Method


• Similar to BCNF decomposition algorithm
• Find a 4NF violation in R, say AB, where A is not a
superkey.
• If there is such a 4NF violation, break the schema for the
relation R that has the 4NF violation into two schemas.
1. R1 (A, B)
2. R2 (A, R-B)
• Find the FD’s and MVD’s that hold in R1 and R2.
• Recursively decompose R1 and R2 with respect to their
projected dependencies.

48

24
7/30/2024

4NF Decomposition Example


Drinkers(name, addr, phones, beersLiked)
FD: nameaddr
MVD’s: namephones
namebeersLiked
• Key is {name, phones, beersLiked}.
• All dependencies violate 4NF.

49

4NF Decomposition Example (cont.)


• Decompose using nameaddr:
1. Drinkers1(name, addr)
• In 4NF; only dependency is nameaddr.

2. Drinkers2(name, phones, beersLiked)


• Not in 4NF. MVD’s namephones and
namebeersLiked apply. No FD’s, so all three
attributes form the key.

50

25
7/30/2024

4NF Decomposition Example (cont.)

• Decompose Drinkers2
• Either MVD name ->-> phones or name ->-> beersLiked
tells us to decompose to:
• Drinkers3(name, phones)
• Drinkers4(name, beersLiked)

51

MVD Example
Drinkers(name, addr, phones, beersLiked)
• A drinker’s phones are independent of the beers they
like.
• namephones and namebeersLiked.
• Thus, each of a drinker’s phones appears with each of
the beers they like in all combinations.

52

26
7/30/2024

MVD Example (cont.)


• Tuples Implied by namephones
• If we have tuples:

Name Addr Phone BeersLiked


Ram A P1 B1
Ram A P2 B2
Ram A P2 B1
Ram A P1 b2

• Then these tuples must also be in the relation.

53

Example
Drinkers(name, addr, phones, beersLiked)
with MVD Name  phones. If Drinkers has the
two tuples:
Name Addr Phone BeersLiked
Ram A P1 B1
Ram A P2 B2

it must also have the same tuples with phones components


swapped:

Name Addr Phone BeersLiked


Ram A P2 B1
Ram A P1 B2

Note: we must check this condition for all pairs of tuples that
agree on name, not just one pair.

54

27
7/30/2024

Properties of Relational Decomposition.


• Relational decomposition is the process of breaking down a large table into
smaller, simpler tables in order to achieve a certain level of normalization. The
properties of relational decomposition are as follows:
1.Lossless: A decomposition is lossless if it is possible to reconstruct the original table
from the decomposed tables without losing any information.
2.Dependency Preserving: A decomposition is dependency preserving if the dependencies
that hold in the original table also hold in the decomposed tables.
3.Non-redundancy: A decomposition should not introduce any redundancy in the data. This
means that the decomposed tables should not contain any duplicate data that is not
present in the original table.
4.Minimality: A decomposition is minimal if it is the smallest set of tables that satisfies the
lossless and dependency preserving properties.
5.Simplicity: A decomposition is simple if it is easy to understand and use.
6.Irreversibility: A decomposition is irreversible if it is not possible to reconstruct the
original table from the decomposed tables without losing information or introducing
redundancy.
• It is important to mention that the goal of relational decomposition is to
achieve a certain level of normalization (1NF, 2NF, 3NF, BCNF) and to design
a database that is free from certain types of logical inconsistencies.

55

Denormalization in Databases
As the name suggests, denormalization is the opposite of normalization.

Normalization organize data to ensure integrity and eliminate


redundancies. Database denormalization means you deliberately put the
same data in several places, thus increasing redundancy.

The intentional introduction of redundancy in a table in order to improve


performance is called denormalization. The decision to de-normalize will
obviously result in a trade-off between performance and data integrity.
Denormalization also increases disk space utilization.

56

28

You might also like