0% found this document useful (0 votes)
15 views25 pages

6.data Normalization Process and The Normal Forms

The document discusses the process of data normalization in relational databases, outlining the importance of organizing attributes to avoid redundancies and anomalies such as insertion, deletion, and update anomalies. It details the three normal forms (1NF, 2NF, 3NF) and their definitions, emphasizing functional dependencies and the significance of candidate and primary keys. The document also highlights the need for normalization to ensure a well-structured database schema.

Uploaded by

itsme789yo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views25 pages

6.data Normalization Process and The Normal Forms

The document discusses the process of data normalization in relational databases, outlining the importance of organizing attributes to avoid redundancies and anomalies such as insertion, deletion, and update anomalies. It details the three normal forms (1NF, 2NF, 3NF) and their definitions, emphasizing functional dependencies and the significance of candidate and primary keys. The document also highlights the need for normalization to ensure a well-structured database schema.

Uploaded by

itsme789yo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

• Data Normalization process and the normal forms

• Introduction to data normalization

• 1st Normal Form(1st NF)

• 2nd Normal Form (2nd NF)

• 3rd Normal Form (3rd NF)


Introduction
• Conceptual Modeling is a subjective process
• Therefore, the schema after the logical database design
phases may not be very good. (contain redundancies)
• However, there are formalisms to ensure that the
schema is good.

• This process is called Normalization.


Normalization
• Relational database schema = set of relations
• Relations = set of attributes
• However we group the attributes to relations is very
important.
• Too many attributes in a relation
❑ Waste space
❑ Anomalies
• Decomposing the relation into too smaller set of relations.
❑ Loss-less join property
❑ Dependency preserving property.
❑ Example Lecturer (id,name,address,salary,department,building)
Anomalies
• Insertion anomaly
Inserting a new lecturer to the LECTURER table. Department
information is reported (ensure that correct department
information is inserted)
Note: Inserting a department with no employees (impossible
– because null values for id is not allowed)

• Deletion anomaly
Deleting the last lecturer from the department will lose
information about the department.
Anomalies
• Update Anomaly
Updating the department’s building needs to be done for all
lecturers working for that department.

• When redundancies exits , we should decompose the


relations to smaller relations.
Anomalies
• Decomposing the relation into too smaller relations
• Less-less join property : we might lose information if we
decompose relations…
• Dependency-preserving property: The set of dependencies in
S can be verified by a set of dependencies in R1 and R2.
• Example
S R1 R2
S P D S P P D
S1 P1 D1 S1 P1 P1 D1
S2 P2 D2 S2 P2 P2 D2
S3 P3 D3 S3 S3 P1 D3
Anomalies
• Joining them together, we get spurious
tuples…
S P D
S1 P1 D1
S1 P1 D3
S2 P2 D2
S3 P1 P1
S3 P1 D3
Normalization
• To avoid the above mentioned issues in the relational schema,
we can apply a formal process called Normalizations.

• Normalization is based on functional dependence.


Functional Dependency
• Functional dependencies (FDs) are used to specify formal
measures of the "goodness" of relational designs.
• FDs and keys are used to define normal forms for
relations.
• FDs are constraints that are derived from the meaning
and interrelationships of the data attributes.
• A set of attributes X functionally determines a set of
attributes Y if the value of X determines a unique value
for Y.
Functional Dependency
• X -> Y holds if whenever two tuples have the same value
for X, they must have the same value for Y.
• For any two tuples t1 and t2 in any relation instance r(R):
If t1[X]=t2[X], then t1[Y]=t2[Y]
• X -> Y in R specifies a constraint on all relation instances
r(R) Written as X -> Y;
• can be displayed graphically on a relation schema as in
Figures
• FDs are derived from the real-world constraints on the
attributes
Functional Dependency
• social security number determines employee name
SSN -> ENAME

• project number determines project name and location


PNUMBER -> {PNAME, PLOCATION}

• employee ssn and project number determines the hours per


week that the employee works on the project
{SSN, PNUMBER} -> HOURS
Functional Dependency
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes
in R (since we never have two distinct tuples with t1[K]=t2[K])
Keys
• A superkey of a relation schema R = {A1, A2, ...., An} is a set of
attributes S subset-of R with the property that no two tuples
t1 and t2 in any legal relation state r of R will have t1[S] =t2[S]

• A key K is a superkey with the additional property that


removal of any attribute from K will cause K not to be a
superkey any more.
Keys
• If a relation schema has more than one key, each is called a
candidate key. One of the candidate keys is arbitrarily
designated to be the primary key, and the others are called
secondary keys.
• A Prime attribute must be a member of some candidate key
• A Nonprime attribute is not a prime attribute—that is, it is
not a member of any [Link] key
1st Normal Form
• Disallows composite attributes, multivalued attributes, and
nested relations; attributes whose values for an individual
tuple are non-atomic.
• A relation R is in first normal form (1NF) if domains of all
attirbutes in the relations are atomic.
• Considered to be part of the definition of relation
1st Normal Form
2nd Normal Form
• Uses the concepts of FDs, primary key
• Prime attribute - attribute that is member of the primary
key K
• Full functional dependency - a FD Y -> Z where removal
of any attribute from Y means the FD does not hold any
more
{SSN, PNUMBER} -> HOURS is a full FD since neither SSN
-> HOURS nor PNUMBER -> HOURS hold
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a
partial dependency ) since SSN -> ENAME also holds
2nd Normal Form
• A relation schema R is in second normal form (2NF) if every
non-prime attribute A in R is fully functionally dependent on
the primary key

• R can be decomposed into 2NF relations via the process of


2NF normalization
2nd Normal Form
3rd Normal Form
• Transitive functional dependency
• a FD X -> Z that can be derived from two FDs X -> Y
and Y -> Z

- SSN -> DMGRSSN is a transitive FD since


SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
-SSN -> ENAME is non-transitive since there is no set of
attributes X where SSN -> X and X -> ENAME
3rd Normal Form
• A relation schema R is in third normal form (3NF) if it is in 2NF
and no non-prime attribute A in R is transitively dependent
on the primary key
• R can be decomposed into 3NF relations via the process of
3NF normalization
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this
a problem only if Y is not a candidate key. When Y is a
candidate key, there is no problem with the transitive
dependency .
E.g., Consider EMP (SSN, Emp#, Salary ).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key
• Normalization
• Redancadancy
• Anomalies
• Insert, delete and update anomalies
• Candidate key, primary key, prime attribute , non prime
attribute.
• 1st normal, 2nd normal form, 3 rd normal form.
• There are many Normal Forms proposed to reduce
redundancies.
• Some of the well-known ones are:
• 1st Normal Form
• 2nd Normal Form
• 3rd Normal Form
• Each key of a relation is called candidate key.
• A candidate key is chosen to be the primary key.
• An attribute key which is a member of a candidate key is
prime attribute.
• Discuss the attribute semantics as an informal measure of
goodness for a relation schema.
• What is insertion, deletion and modification anomalies.
• Why are they considered bad? Illustrate with example.
• What is functional dependency?
• Define the first, second and third normal forms with a
example.

You might also like