0% found this document useful (0 votes)
12 views44 pages

En Database Principle-C6 Normalization Part1 TL

The document discusses relational database design, focusing on functional dependencies, normalization forms, and relational decomposition. It emphasizes the importance of minimizing data redundancy and ensuring data consistency through various normalization techniques, including 1NF, 2NF, and 3NF. The document also outlines the concepts of candidate keys and functional dependencies, providing examples to illustrate these principles.

Uploaded by

Bùi Duy Thái
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views44 pages

En Database Principle-C6 Normalization Part1 TL

The document discusses relational database design, focusing on functional dependencies, normalization forms, and relational decomposition. It emphasizes the importance of minimizing data redundancy and ensuring data consistency through various normalization techniques, including 1NF, 2NF, and 3NF. The document also outlines the concepts of candidate keys and functional dependencies, providing examples to illustrate these principles.

Uploaded by

Bùi Duy Thái
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

Database

AC2040

1
Chapter 6: RELATIONAL DATABASE DESIGN

Lecturer: PhD. DO Thi Ngoc Diep


SCHOOL OF ELECTRICAL AND ELECTRONIC ENGINEERING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

2
Content

• 6.1. Introduction
• 6.2. Functional Dependencies
• 6.3. Normalization Forms
• 6.4. Relational Decomposition

3
6.1 Introduction

• A good Relational Design ?

• Another example
• SUPPLIERS(SID, SNAME, ADDR, ITEM, PRICE)
• 1 supplier provides many items, data about the person (SNAME, ADDR) is
repeated many times  Redundancy.
• When editing data of 1 supplier but not editing on all records 
Inconsistency  Anomalies appear
• When adding/deleting a supplier, it may be necessary to add empty data
(ITEM, PRICE), or delete data of the item  Inconsistency  Anomalies
appear

4
6.1 Introduction

• Relational Database Design  Choose good relational diagrams.


• Satisfy the basic properties of a database: no redundancy, support
easy information searching
• Good design:
• Include all necessary attributes/relationships
• Store minimum amount of information (avoid the repetition-of-
information problem)
• Ensure data consistency after database operations

5
6.1 Introduction

• Data normalization is the process of structuring a relational database to


minimize data redundancy and dependencies (based on keys and
functional dependencies)
• Purpose:
• Identify a set of relational schemas that allow information to be easily
retrieved while avoiding data redundancy.
• Use the decomposition of bad relational schemas into better schemas
• Avoid the loss decomposition (cannot reconstruct the original relation)

Approach: Design the relations in a suitable normal form

6
6.1 Introduction

• Theory on:
• Functional dependencies
• Multivalued dependencies

7
6.2. Functional dependencies

• Functional dependencies in a relation are constraints determined from the


semantics and internal relationships of the attributes.
• Require that the value for a certain set of attributes determines
uniquely the value for another set of attributes.
• A functional dependency is a generalization of the notion of a key.

• Functional dependencies are used as a metric to evaluate the quality of a


designed relational diagram.

8
6.2. Functional dependencies

• Given a relation r(R) on the schema R


• X, Y is the attribute set of R: X  R and Y  R
• The functional dependency X  Y holds on R if and only if
• for any two tuples t1 and t2 of r agree on the attributes X, they also
agree on the attributes Y.
• If t1[X]=t2[X] then t1[Y]=t2[Y] for all tuples of r(R)
• The attribute set X determines the attribute set Y, or Y is functionally
dependent on X
• Trivial Functional Dependencies: X  Y is trivial FD if Y  X

A B
1 4 B  A hold
1 5 A  B does NOT hold
3 7

9
Example

Ssn → Fname
Ssn → Lname
Ssn → Date
...

Pnumber → Pname
Pnumber → Location
Pnumber → Dnum
...

{ Essn , Pno } → Hours

10
Example (2)

• A → BA → CA → DA → E
• B → AB → CB → DB → E
=> A set of functional dependencies
• C → AC → BC → DC → E
• ...
• AE →BC, BC →ADE, BD →ACE
• ...

11
Full and Partial Dependencies

• X  Y is fully functional dependency if removing any attribute in X makes


this functional dependency no longer true.

• Example
• Work( Ssn , Pnumber , Position, Hours)
• {SSN, PNUMBER}  HOURS
• is a fully FD because SSN  HOURS or PNUMBER  HOURS are not
correct
• {SSN, PNUMBER, POSITION}  HOURS
• are partial FD because {SSN, PNUMBER}  HOURS

12
Transitive dependency

• X  Y is a transitive functional dependency if there is a set of attributes Z


that are not keys and there exists X  Z and Z  Y functional
dependencies.

• Example

• Pnumber  Mgr_ssn
• is a transitive FD because Pnumber  Dnum and Dnum  Mgr_ssn
• Pnumber  Pname
• is not a transitive FD because there does not exist a set of attributes X such
that Pnumber  X and X  Pname

13
Axioms for functional dependencies

• Let F be the set of functional dependencies of a relation


• Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• Armstrong’s Axioms
F1. Reflexive rule
If Y X (X U, U: set of relation attritbutes) then X  Y
F2. Augmentation rule
If X  Y, Z U then XZ  YZ (notation XZ = X Z)
F3. Transitivity rule
If X  Y, Y  Z then X  Z

14
Axioms for functional dependencies

• Additional rules inferred from Armstrong’s axioms:


• F4: Union rule
If X  Y and X  Z, then X  YZ
• F5: Decomposition rule
If X  YZ, then X  Y and X  Z
If X  Y, Z Y then X  Z
• F6: Pseudo-transitivity rule
If X  Y and WY  Z, then WX  Z

15
Example

R(ABCDEFGHIJ)
F = { AB →E, AG →J, BE →I, E →G, GI →H }

Functional dependencies can be derived:

16
Closure of a Set of Functional Dependencies

• F+ contains the set of all functional dependencies logically implied by F


• F+ is the closure of F
• F+ can be computed by repeatedly applying Armstrong’s Axioms:
• Sound -- generate only functional dependencies that actually hold, and
• Complete -- generate all functional dependencies that hold.
• F+ can be very large!
• Two functional dependencies F and G are equivalent if F+ = G+

17
Closure of a Set of Functional Dependencies

• Compute the closure of a set of functional dependencies F:


repeat
for each functional dependency f in F+
apply reflexivity and augmentation rules on f
add the resulting functional dependencies to F +
for each pair of functional dependencies f1and f2 in F +
if f1 and f2 can be combined using transitivity
then add the resulting functional dependency to F +
until F + does not change any further
Closure of Attribute Sets

• X is a set of attributes, F is a set of functional dependencies.


• The closure of an attribute set X under F (denoted by X+) is the largest
set of attributes that are functionally determined by X, using the set F.
• X + = {A U | X → A F +}
• | X+ | : number of attributes

• X, Y are attributes of R , X  Y holds on F +  Y  X +

19
Closure of Attribute Sets

Input: Set of functional dependencies F, Set of attributes X


Find the closure of the attribute set X (X+)

X+ = X;
stillChanging = true;
while ( stillChanging ){
stillChanging = false;
for each W  Z in F {
if (W  X+) AND NOT (Z  X+){
X+ = X+  Z
stillChanging = true;
}
}
}

20
Example

Given R(A, B, C, D, E),


F = {A  BC, CD  E, B  D, E  A }
Calculate A + and B + :

A + := {A}
:= {A, B, C} {A}  A + , A  BC
:= {A, B, C, D} {B}  A + , B  D
:= {A, B, C, D, E} {C, D}  A + , CD  E
:= {A, B, C, D, E} On every FD of F, A+ remains unchanged => stop

21
Example (2)

Given R (A, B, C, D, E)
F = {A  BC, CD  E, B  D, E  A }

B + := {B}
:= {B, D} {B}  B + , B  D; stop because B+ does not change anymore

22
Keys

• Given a relation schema R consisting of a set of attributes U and a set of


functional dependencies F
• K is a candidate key of R if
• KU F+
• and with every K’ K then K’U F+
• If K is a candidate key then:
• K+ = U
• K is the smallest set of attributes satisfying this property

“A functional dependency is a generalization of the notion of a key”

23
Candidate key

• Identify candidate key


• Input: R (A1 , A2 , ..., An ), F
• Find the candidate keys

K = {A1 , A2 , ..., An } ;
for i = 1 to n do {
Kt = K – {Ai } ;
if Kt+ = U then K = Kt ;
}

24
Example

R (ABCDEF),
F = { BC →AD, D →E, CF →B }.

25
Uses of Attribute Closure

• Testing for superkey:


• To test if A is a superkey, we compute A+, and check if A+ contains all attributes
of R.
• Testing functional dependencies:
• To check if a functional dependency A  B holds (or, in other words, is in F+),
just check if B  A+.
• Computing closure of F
• For each A  R, we find the closure A+, and for each S  A+, we output a
functional dependency A  S.
6.3. The Normalization Forms

• Normalization Theory
• Whether a particular relation R is in “good” form ?
• If it is not, decompose it into set of relations {R1, R2, ..., Rn} such that
• Each relation is in good form
• The decomposition is a lossless decomposition
• To determine whether the design needs further refinement? Is the
design good? => need to define normal forms
• Transform relations into normal forms  decomposition into smaller
relations  normalization.
• Each normal form ensures the prevention (minimization) of some
types of data redundancy or anomalies
• avoid complexities, eliminate duplicates, and organize data in a consistent way
• eliminate anomalies associated with insertion, deletion, and updating

27
6.3. The Normalization Forms

• First normal form – 1NF


• Second normal form – 2NF
• Third normal form – 3NF
• Boyce-Codd normal form – BCNF
• Forth normal form – 4NF
• Fifth normal form – 5NF
• ...

28
First normal form – 1NF

• A relation schema is in 1NF if the domains of all attributes are atomic.


• Domain is atomic if its elements are considered to be indivisible units
• Non-atomic values complicate storage and encourage redundant (repeated)
storage of data
• First normal form removes multivalued attributes, composite attributes,
repeating groups.
First normal form – 1NF

• Remove the composite attributes

EID  Name, Address, Birthdate, Project

EID  Name, Address, Birthdate, PName , StartDate


30
First normal form – 1NF

• Remove the multivalued attribute

Dnumber  Dname , Dmgr_ssn , Dlocations

Dnumber  Dname , Dmgr_ssn , Dlocations

31
Second normal form – 2NF

• Prime vs. Non-Prime Attributes


• Prime attribute: an attribute that is part of any candidate key of the relation.
• Non-prime attribute: an attribute that is not part of any candidate key.
• A relation R is in 2nd normal form if and only if :
• It is in 1NF
• No non-prime attribute is partially dependent on a key
• Normalization to 2NF  Eliminate partial FDs on the key
• Identify non-prime attribute that are only partially dependent on the key
• Delete these attributes from the original table
• Create a new table with these attributes plus the attributes of the key they
depend on.

32
Second normal form – 2NF

Example 1:

1. Indentify the candidate key


EID → Name, Address, Birthdate
EID, PName → StartDate
=> candidate key = (EID, PName )
2. Identify non-prime attribute partially dependent on the key
Name, Address, Birthdate

33
Second normal form – 2NF

3. Decompose into 2NF :

EID → Name, Address, Birthdate


EID, PName → StartDate
Candidate key = (EID, PName )

EID → Name, Address, Birthdate


Candidate key = (EID) +
EID, PName → StartDate
Candidate key = (EID, PName)

34
Second normal form – 2NF

Example 2:

• Ssn , Pnumber → Hours


• Ssn → Ename
• Pnumber → Pname , Placement
• Candidate key: ( Ssn , Pnumber )
• Identify non-prime attribute partially dependent on the key: Ename ,
Pname , Placement

Ssn , Pnumber → Hours Ssn → Ename Pnumber → Pname , Placement

35
Third Normal Form – 3NF

• A relation R is in 3rd normal form if it is :


• in 2nd normal form
• and any non-prime attributes in R is not transitively dependent on the
candidate key
• No non-prime attribute depends on another non-prime attribute

36
Third Normal Form – 3NF

• Normalize to 3NF => Remove transitive dependencies


• Identify non-prime attributes that are transitively dependent on the
candidate key (which depend only on other non-prime attributes)
• Delete them from the original table
• Create a new table with these attributes plus the non-prime attributes they
depend on

37
Third Normal Form – 3NF

Example 3:

• Ssn → Ename , Bdate , Address, Dnumber


• Dnumber → Dname , Dmgr_ssn
• Candidate key: Ssn
• Non-prime attribute depends on another non-prime attribute:
• Dname, Dmgr_ssn

Ssn → Ename , Bdate , Address, Dnumber Dnumber → Dname , Dmgr_ssn

38
For example

Example 4:
• Property_id# → County_name , Lot#, Area, Price, Tax_rate
• County_name , Lot# → Property_id#, Area, Price, Tax_rate
• County_name → Tax_rate
• Area → Price

+ Candidate keys:
{Property_id# }, {County_name, Lot#}
+ Normalize to 2NF:

• County_name , Lot# → Property_id#, Area, Price • County_name → Tax_rate


• Property_id# → County_name , Lot#, Area, Price
• Area → Price

+ Normalize to 3NF

• County_name , Lot# → Property_id#, Area • Area → Price • County_name → Tax_rate


• Property_id# → County_name , Lot#, Area
39
Third Normal Form – 3NF

• “No non-prime attribute depends on another non-prime attribute”


• A relation schema R is in 3NF if for all A  B in F+ at least one of the following
holds:
• A  B is trivial
• A is a superkey for R
• B is prime-attribute
• Consider the schema R below, which is in 3NF
• R = (J, K, L )
• F = {JK  L, L  K }
• And an instance table:

• What is wrong with the table?


• Repetition of information
• Need to use null values (e.g., to represent the relationship l2, k2 where there is
no corresponding value for J)

40
Boyce-Codd NF

• A relation R is in Boyce-Codd normal form if for every dependency


A  B in R, then A must be a superkey of R
• R has only non-trivial dependencies, where a key determine one or
more other attributes
• R is not in BCNF if there exists a dependency whose left-hand side is
not a key
• If a relation is in BCNF, it is in 3NF
• since in BCNF one of the first two conditions of 3NF must hold.

41
Boyce-Codd NF

• Example :
• R(ABC), F = {A → BC, B → C }
Candidate key : A
Non-prime attr: BC
+ R is not in 3NF
+ R is not in BCNF (B → C violates)

• R(ABCD), F = {A → BCD, BC → AD }
Candidate key: A, BC
Non-prime attr: D
+ R is not in 3NF
+ R is in BCNF (but redundancy caused by BC→A and A→BC)

42
Boyce-Codd NF

• Normalize to Boyce-Codd Norm


• Let A  B be the FD that causes a violation of BCNF
• Move A, B to a separate relation with A is key
• The remain relation is (R-B+A).
• Example
• R = (J, K, L )
F = {JK  L, L  K }
Candidate Key: JK. L  K violates
Decompose into R1(J, L); R2(L, K)

• R (ABC), F = {A → BC, B → C }
Candidate key : A, B → C violates
Decompose into R1(B,C), R2(A,B)

• R(A1, A2, A3, A4, A5)


F = {A1A2 → A3A4A5, A4 → A2}
Candidate Key: (A1, A2), A4 → A2 violates
Decompose into R1(A4, A2); R2 (A1, A4, A3, A5)
43
Comment

• The higher NF is always stronger than the previous:


• All relations in 2nd normal form are in 1st normal form.
• All relations in 3rd normal form are in 2nd normal form.
• All relations in Boyce-Codd normal form are in 3rd normal form.
• However, there are some relations that are in 3rd normal form
but not in BCNF.

44

You might also like