Database
AC2040
1
Chapter 6: RELATIONAL DATABASE DESIGN
Lecturer: PhD. DO Thi Ngoc Diep
SCHOOL OF ELECTRICAL AND ELECTRONIC ENGINEERING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
2
Content
• 6.1. Introduction
• 6.2. Functional Dependencies
• 6.3. Normalization Forms
• 6.4. Relational Decomposition
3
6.1 Introduction
• A good Relational Design ?
• Another example
• SUPPLIERS(SID, SNAME, ADDR, ITEM, PRICE)
• 1 supplier provides many items, data about the person (SNAME, ADDR) is
repeated many times Redundancy.
• When editing data of 1 supplier but not editing on all records
Inconsistency Anomalies appear
• When adding/deleting a supplier, it may be necessary to add empty data
(ITEM, PRICE), or delete data of the item Inconsistency Anomalies
appear
4
6.1 Introduction
• Relational Database Design Choose good relational diagrams.
• Satisfy the basic properties of a database: no redundancy, support
easy information searching
• Good design:
• Include all necessary attributes/relationships
• Store minimum amount of information (avoid the repetition-of-
information problem)
• Ensure data consistency after database operations
5
6.1 Introduction
• Data normalization is the process of structuring a relational database to
minimize data redundancy and dependencies (based on keys and
functional dependencies)
• Purpose:
• Identify a set of relational schemas that allow information to be easily
retrieved while avoiding data redundancy.
• Use the decomposition of bad relational schemas into better schemas
• Avoid the loss decomposition (cannot reconstruct the original relation)
Approach: Design the relations in a suitable normal form
6
6.1 Introduction
• Theory on:
• Functional dependencies
• Multivalued dependencies
7
6.2. Functional dependencies
• Functional dependencies in a relation are constraints determined from the
semantics and internal relationships of the attributes.
• Require that the value for a certain set of attributes determines
uniquely the value for another set of attributes.
• A functional dependency is a generalization of the notion of a key.
• Functional dependencies are used as a metric to evaluate the quality of a
designed relational diagram.
8
6.2. Functional dependencies
• Given a relation r(R) on the schema R
• X, Y is the attribute set of R: X R and Y R
• The functional dependency X Y holds on R if and only if
• for any two tuples t1 and t2 of r agree on the attributes X, they also
agree on the attributes Y.
• If t1[X]=t2[X] then t1[Y]=t2[Y] for all tuples of r(R)
• The attribute set X determines the attribute set Y, or Y is functionally
dependent on X
• Trivial Functional Dependencies: X Y is trivial FD if Y X
A B
1 4 B A hold
1 5 A B does NOT hold
3 7
9
Example
Ssn → Fname
Ssn → Lname
Ssn → Date
...
Pnumber → Pname
Pnumber → Location
Pnumber → Dnum
...
{ Essn , Pno } → Hours
10
Example (2)
• A → BA → CA → DA → E
• B → AB → CB → DB → E
=> A set of functional dependencies
• C → AC → BC → DC → E
• ...
• AE →BC, BC →ADE, BD →ACE
• ...
11
Full and Partial Dependencies
• X Y is fully functional dependency if removing any attribute in X makes
this functional dependency no longer true.
• Example
• Work( Ssn , Pnumber , Position, Hours)
• {SSN, PNUMBER} HOURS
• is a fully FD because SSN HOURS or PNUMBER HOURS are not
correct
• {SSN, PNUMBER, POSITION} HOURS
• are partial FD because {SSN, PNUMBER} HOURS
12
Transitive dependency
• X Y is a transitive functional dependency if there is a set of attributes Z
that are not keys and there exists X Z and Z Y functional
dependencies.
• Example
• Pnumber Mgr_ssn
• is a transitive FD because Pnumber Dnum and Dnum Mgr_ssn
• Pnumber Pname
• is not a transitive FD because there does not exist a set of attributes X such
that Pnumber X and X Pname
13
Axioms for functional dependencies
• Let F be the set of functional dependencies of a relation
• Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• Armstrong’s Axioms
F1. Reflexive rule
If Y X (X U, U: set of relation attritbutes) then X Y
F2. Augmentation rule
If X Y, Z U then XZ YZ (notation XZ = X Z)
F3. Transitivity rule
If X Y, Y Z then X Z
14
Axioms for functional dependencies
• Additional rules inferred from Armstrong’s axioms:
• F4: Union rule
If X Y and X Z, then X YZ
• F5: Decomposition rule
If X YZ, then X Y and X Z
If X Y, Z Y then X Z
• F6: Pseudo-transitivity rule
If X Y and WY Z, then WX Z
15
Example
R(ABCDEFGHIJ)
F = { AB →E, AG →J, BE →I, E →G, GI →H }
Functional dependencies can be derived:
16
Closure of a Set of Functional Dependencies
• F+ contains the set of all functional dependencies logically implied by F
• F+ is the closure of F
• F+ can be computed by repeatedly applying Armstrong’s Axioms:
• Sound -- generate only functional dependencies that actually hold, and
• Complete -- generate all functional dependencies that hold.
• F+ can be very large!
• Two functional dependencies F and G are equivalent if F+ = G+
17
Closure of a Set of Functional Dependencies
• Compute the closure of a set of functional dependencies F:
repeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f
add the resulting functional dependencies to F +
for each pair of functional dependencies f1and f2 in F +
if f1 and f2 can be combined using transitivity
then add the resulting functional dependency to F +
until F + does not change any further
Closure of Attribute Sets
• X is a set of attributes, F is a set of functional dependencies.
• The closure of an attribute set X under F (denoted by X+) is the largest
set of attributes that are functionally determined by X, using the set F.
• X + = {A U | X → A F +}
• | X+ | : number of attributes
• X, Y are attributes of R , X Y holds on F + Y X +
19
Closure of Attribute Sets
Input: Set of functional dependencies F, Set of attributes X
Find the closure of the attribute set X (X+)
X+ = X;
stillChanging = true;
while ( stillChanging ){
stillChanging = false;
for each W Z in F {
if (W X+) AND NOT (Z X+){
X+ = X+ Z
stillChanging = true;
}
}
}
20
Example
Given R(A, B, C, D, E),
F = {A BC, CD E, B D, E A }
Calculate A + and B + :
A + := {A}
:= {A, B, C} {A} A + , A BC
:= {A, B, C, D} {B} A + , B D
:= {A, B, C, D, E} {C, D} A + , CD E
:= {A, B, C, D, E} On every FD of F, A+ remains unchanged => stop
21
Example (2)
Given R (A, B, C, D, E)
F = {A BC, CD E, B D, E A }
B + := {B}
:= {B, D} {B} B + , B D; stop because B+ does not change anymore
22
Keys
• Given a relation schema R consisting of a set of attributes U and a set of
functional dependencies F
• K is a candidate key of R if
• KU F+
• and with every K’ K then K’U F+
• If K is a candidate key then:
• K+ = U
• K is the smallest set of attributes satisfying this property
“A functional dependency is a generalization of the notion of a key”
23
Candidate key
• Identify candidate key
• Input: R (A1 , A2 , ..., An ), F
• Find the candidate keys
K = {A1 , A2 , ..., An } ;
for i = 1 to n do {
Kt = K – {Ai } ;
if Kt+ = U then K = Kt ;
}
24
Example
R (ABCDEF),
F = { BC →AD, D →E, CF →B }.
25
Uses of Attribute Closure
• Testing for superkey:
• To test if A is a superkey, we compute A+, and check if A+ contains all attributes
of R.
• Testing functional dependencies:
• To check if a functional dependency A B holds (or, in other words, is in F+),
just check if B A+.
• Computing closure of F
• For each A R, we find the closure A+, and for each S A+, we output a
functional dependency A S.
6.3. The Normalization Forms
• Normalization Theory
• Whether a particular relation R is in “good” form ?
• If it is not, decompose it into set of relations {R1, R2, ..., Rn} such that
• Each relation is in good form
• The decomposition is a lossless decomposition
• To determine whether the design needs further refinement? Is the
design good? => need to define normal forms
• Transform relations into normal forms decomposition into smaller
relations normalization.
• Each normal form ensures the prevention (minimization) of some
types of data redundancy or anomalies
• avoid complexities, eliminate duplicates, and organize data in a consistent way
• eliminate anomalies associated with insertion, deletion, and updating
27
6.3. The Normalization Forms
• First normal form – 1NF
• Second normal form – 2NF
• Third normal form – 3NF
• Boyce-Codd normal form – BCNF
• Forth normal form – 4NF
• Fifth normal form – 5NF
• ...
28
First normal form – 1NF
• A relation schema is in 1NF if the domains of all attributes are atomic.
• Domain is atomic if its elements are considered to be indivisible units
• Non-atomic values complicate storage and encourage redundant (repeated)
storage of data
• First normal form removes multivalued attributes, composite attributes,
repeating groups.
First normal form – 1NF
• Remove the composite attributes
EID Name, Address, Birthdate, Project
EID Name, Address, Birthdate, PName , StartDate
30
First normal form – 1NF
• Remove the multivalued attribute
Dnumber Dname , Dmgr_ssn , Dlocations
Dnumber Dname , Dmgr_ssn , Dlocations
31
Second normal form – 2NF
• Prime vs. Non-Prime Attributes
• Prime attribute: an attribute that is part of any candidate key of the relation.
• Non-prime attribute: an attribute that is not part of any candidate key.
• A relation R is in 2nd normal form if and only if :
• It is in 1NF
• No non-prime attribute is partially dependent on a key
• Normalization to 2NF Eliminate partial FDs on the key
• Identify non-prime attribute that are only partially dependent on the key
• Delete these attributes from the original table
• Create a new table with these attributes plus the attributes of the key they
depend on.
32
Second normal form – 2NF
Example 1:
1. Indentify the candidate key
EID → Name, Address, Birthdate
EID, PName → StartDate
=> candidate key = (EID, PName )
2. Identify non-prime attribute partially dependent on the key
Name, Address, Birthdate
33
Second normal form – 2NF
3. Decompose into 2NF :
EID → Name, Address, Birthdate
EID, PName → StartDate
Candidate key = (EID, PName )
EID → Name, Address, Birthdate
Candidate key = (EID) +
EID, PName → StartDate
Candidate key = (EID, PName)
34
Second normal form – 2NF
Example 2:
• Ssn , Pnumber → Hours
• Ssn → Ename
• Pnumber → Pname , Placement
• Candidate key: ( Ssn , Pnumber )
• Identify non-prime attribute partially dependent on the key: Ename ,
Pname , Placement
Ssn , Pnumber → Hours Ssn → Ename Pnumber → Pname , Placement
35
Third Normal Form – 3NF
• A relation R is in 3rd normal form if it is :
• in 2nd normal form
• and any non-prime attributes in R is not transitively dependent on the
candidate key
• No non-prime attribute depends on another non-prime attribute
36
Third Normal Form – 3NF
• Normalize to 3NF => Remove transitive dependencies
• Identify non-prime attributes that are transitively dependent on the
candidate key (which depend only on other non-prime attributes)
• Delete them from the original table
• Create a new table with these attributes plus the non-prime attributes they
depend on
37
Third Normal Form – 3NF
Example 3:
• Ssn → Ename , Bdate , Address, Dnumber
• Dnumber → Dname , Dmgr_ssn
• Candidate key: Ssn
• Non-prime attribute depends on another non-prime attribute:
• Dname, Dmgr_ssn
Ssn → Ename , Bdate , Address, Dnumber Dnumber → Dname , Dmgr_ssn
38
For example
Example 4:
• Property_id# → County_name , Lot#, Area, Price, Tax_rate
• County_name , Lot# → Property_id#, Area, Price, Tax_rate
• County_name → Tax_rate
• Area → Price
+ Candidate keys:
{Property_id# }, {County_name, Lot#}
+ Normalize to 2NF:
• County_name , Lot# → Property_id#, Area, Price • County_name → Tax_rate
• Property_id# → County_name , Lot#, Area, Price
• Area → Price
+ Normalize to 3NF
• County_name , Lot# → Property_id#, Area • Area → Price • County_name → Tax_rate
• Property_id# → County_name , Lot#, Area
39
Third Normal Form – 3NF
• “No non-prime attribute depends on another non-prime attribute”
• A relation schema R is in 3NF if for all A B in F+ at least one of the following
holds:
• A B is trivial
• A is a superkey for R
• B is prime-attribute
• Consider the schema R below, which is in 3NF
• R = (J, K, L )
• F = {JK L, L K }
• And an instance table:
• What is wrong with the table?
• Repetition of information
• Need to use null values (e.g., to represent the relationship l2, k2 where there is
no corresponding value for J)
40
Boyce-Codd NF
• A relation R is in Boyce-Codd normal form if for every dependency
A B in R, then A must be a superkey of R
• R has only non-trivial dependencies, where a key determine one or
more other attributes
• R is not in BCNF if there exists a dependency whose left-hand side is
not a key
• If a relation is in BCNF, it is in 3NF
• since in BCNF one of the first two conditions of 3NF must hold.
41
Boyce-Codd NF
• Example :
• R(ABC), F = {A → BC, B → C }
Candidate key : A
Non-prime attr: BC
+ R is not in 3NF
+ R is not in BCNF (B → C violates)
• R(ABCD), F = {A → BCD, BC → AD }
Candidate key: A, BC
Non-prime attr: D
+ R is not in 3NF
+ R is in BCNF (but redundancy caused by BC→A and A→BC)
42
Boyce-Codd NF
• Normalize to Boyce-Codd Norm
• Let A B be the FD that causes a violation of BCNF
• Move A, B to a separate relation with A is key
• The remain relation is (R-B+A).
• Example
• R = (J, K, L )
F = {JK L, L K }
Candidate Key: JK. L K violates
Decompose into R1(J, L); R2(L, K)
• R (ABC), F = {A → BC, B → C }
Candidate key : A, B → C violates
Decompose into R1(B,C), R2(A,B)
• R(A1, A2, A3, A4, A5)
F = {A1A2 → A3A4A5, A4 → A2}
Candidate Key: (A1, A2), A4 → A2 violates
Decompose into R1(A4, A2); R2 (A1, A4, A3, A5)
43
Comment
• The higher NF is always stronger than the previous:
• All relations in 2nd normal form are in 1st normal form.
• All relations in 3rd normal form are in 2nd normal form.
• All relations in Boyce-Codd normal form are in 3rd normal form.
• However, there are some relations that are in 3rd normal form
but not in BCNF.
44