0% found this document useful (0 votes)
15 views29 pages

Chapter 4

Uploaded by

Mezgebe Abebe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views29 pages

Chapter 4

Uploaded by

Mezgebe Abebe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 29

Chapter 4

Logical DB design: Normalization


Normalization
 A relational database is merely a collection of data, organized in a
particular manner. The father of the relational database approach, Codd
created a series of rules called normal forms that help define that
organization.
 One of the best ways to determine what information should be stored in
a database is to clarify what questions will be asked of it and what data
would be included in the answers.
 Database normalization is a series of steps followed to obtain a database
design that allows for consistent storage and efficient access of data in a
relational database. These steps reduce data redundancy and the risk of
data becoming inconsistent.
 NORMALIZATION is the process of identifying the logical associations
between data items and designing a database that will represent such
associations but without suffering the update anomalies which are;
Insertion, Deletion and Modification Anomalies
Normalization … Cont’d
 Normalization may reduce system performance since data will be cross
referenced from many tables.
 Thus DB normalization is sometimes used to improve performance, at
the cost of reduced consistency guarantees.
 Normalization normally is considered as good if it is lossless
decomposition.
 Mnemonic for remembering the rationale for normalization could be the
following:
 No Repeating or Redundancy: no repeating fields in the table
 The Fields Depend Upon the Key: the table should solely depend on the key
 The Whole Key: no partial key dependency
 And Nothing But The Key: no inter data dependency
Normalization … Cont’d
 All the normalization rules will eventually remove the update
anomalies that may exist during data manipulation after the
implementation.
 Pitfalls of Normalization
 Requires data to see the problems
 May reduce performance of the system
 Is time consuming,
 Difficult to design and apply and
 Prone to human error
Normalization … Cont’d
 The underlying ideas in normalization are simple enough. Through
normalization we want to design for our relational database a set of
tables that;
1. Contain all the data necessary for the purposes that the
database is to serve,
2. Have as little redundancy as possible,
3. Accommodate multiple values for types of data that require
them,
4. Permit efficient updates of the data in the database, and
5. Avoid the danger of losing data unknowingly
 The type of problems that could occur in insufficiently normalized
table is called update anomalies which includes;
Normalization … Cont’d
1. Insertion anomalies
 An "insertion anomaly" is a failure to place information about a new database
entry into all the places in the database where information about that new
entry needs to be stored.
 In a properly normalized database, information about a new entry needs to be
inserted into only one place in the database.
 In an inadequately normalized database, information about a new entry may
need to be inserted into more than one place and, human fallibility being what
it is, some of the needed additional insertions may be missed.
2. Deletion anomalies
 A "deletion anomaly" is a failure to remove information about an existing
database entry when it is time to remove that entry.
 In a properly normalized database, information about an old, to-be-gotten-rid-
of entry needs to be deleted from only one place in the database.
Normalization … Cont’d
 In an inadequately normalized database, information about that old entry may
need to be deleted from more than one place, and, human fallibility being what
it is, some of the needed additional deletions may be missed.
3. Modification anomalies
 A modification of a database involves changing some value of the attribute of a
table. In a properly normalized database table, what ever information is
modified by the user, the change will be effected and used accordingly.
 The purpose of normalization is to reduce the chances for anomalies to occur in a
database.
Normalization … Cont’d
 Deletion Anomalies: If employee with ID 16 is deleted then ever
information about skill C++ and the type of skill is deleted from the
database. Then we will not have any information about C++ and its skill
type.
 Insertion Anomalies: What if we have a new employee with a skill called
Pascal? We can not decide weather Pascal is allowed as a value for skill
and we have no clue about the type of skill that Pascal should be
categorized as.
 Modification Anomalies: What if the address for Helico is changed from
Piazza to Mexico? We need to look for every occurrence of Helico and
change the value of School_Add from Piazza to Mexico, which is prone to
error.
 Database-management system can work only with the information that we
put explicitly into its tables for a given database and into its rules for
working with those tables, where such rules are appropriate and possible.
Functional Dependency (FD)
 Before moving to steps of normalization, it is important to have an
understanding of "functional dependency.“
FD- is a relationship among attributes
Data Dependency
 The logical association between data items that point the database
designer in the direction of a good database design are referred to as
determinant or dependent relationships.
 Two data items A and B are said to be in a determinant or dependent
relationship if certain values of data item B always appears with
certain values of data item A.
 If the data item A is the determinant data item and B the dependent
data item then the direction of the association is from A to B and not
vice versa.
Functional Dependency (FD) … Cont’d
 The essence of this idea is that if the existence of something, call it A, implies
that B must exist and have a certain value, then we say that "B is functionally
dependent on A."
 We also often express this idea by saying that "A determines B," or that "B is
a function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the statement,
"If A, then B.“
 It is important to note that the value B must be unique for a given value of A,
i.e., any given value of A must imply just one and only one value of B, in order
for the relationship to qualify for the name "function." (However, this does not
necessarily prevent different values of A from implying the same value of B.)
 X Y holds if whenever two tuples have the same value for X, they must
have the same value for Y
Functional Dependency (FD) … Cont’d
 The notation is: A B which is read as; B is functionally dependent on A
 In general, a functional dependency is a relationship among attributes. In
relational databases, we can have a determinant that governs one other
attribute or several other attributes.
 FDs are derived from the real-world constraints on the attributes

 Since the type of Wine served depends on the type of Dinner, we say
Wine is functionally dependent on Dinner. Dinner Wine
Since both Wine type and Fork type are determined by
the Dinner type, we say Wine is functionally dependent
on Dinner and Fork is functionally dependent on Dinner.
Dinner Wine
Dinner Fork
Partial Dependency
 If an attribute which is not a member of the primary key is dependent
on some part of the primary key (if we have composite primary key)
then that attribute is partially functionally dependent on the primary
key.
 Let {A,B} is the Primary Key and C is no key attribute.
Then if {A,B} C and B C
Then C is partially functionally dependent on {A,B}
Full Dependency
 If an attribute which is not a member of the primary key is not
dependent on some part of the primary key but the whole key (if we
have composite primary key) then that attribute is fully functionally
dependent on the primary key.
 Let {A,B} is the Primary Key and C is no key attribute
Transitive Dependency
 In mathematics and logic, a transitive relationship is a relationship of
the following form: "If A implies B, and if also B implies C, then A
implies C."
Steps of Normalization
 We have various levels or steps in normalization called Normal Forms.
 The level of complexity, strength of the rule and decomposition increases
as we move from one lower level Normal Form to the higher.
 A table in a relational database is said to be in a certain normal form if it
satisfies certain constraints.
 Normal form below represents a stronger condition than the previous one.
 Normalization towards a logical design consists of the following steps:
 Un Normalized Form: Identify all data elements
 First Normal Form: Find the key with which you can find all data
 Second Normal Form: Remove part-key dependencies. Make all data
dependent on the whole key.
 Third Normal Form: Remove non-key dependencies. Make all data
dependent on nothing but the key.
 For most practical purposes, databases are considered normalized if they
adhere to third normal form.
UNNORMALIZED FORM (UNF)
 A table that contains one or more repeating groups.
 A repeating group is a field or group of fields that hold multiple values
for a single occurrence of a field.

Repeating group= (Skill, SkillType, School, SchoolAdd, SkillLevel)


First Normal Form (1NF)
 Requires that all column values in a table are atomic (e.g., a number is
an atomic value, while a list or a set is not).
 We have tow ways of achieving this:
 1. Putting each repeating group into a separate table and
connecting them with a primary key-foreign key relationship
 2. Moving this repeating groups to a new row by repeating the
common attributes. If so then Find the key with which you can find
all data
 Definition of a table (relation) in 1NF if:
 There are no duplicated rows in the table. Unique identifier
 Each cell is single-valued (i.e., there are no repeating groups).
 Entries in a column (attribute, field) are of the same kind.
First Normal Form (1NF) … Cont’d
 FIRST NORMAL FORM (1NF): Remove all repeating groups.
 Distribute the multi-valued attributes into different rows and identify
a unique identifier for the relation so that is can be said is a relation in
relational database.
First Normal Form (1NF) … Cont’d
 Example 2: Consider the following UNF relation.

Here , Tele and fax fields are multi-valued


- To change in to 1NF relation, we need to split the table in to three
- The following tables are equivalent 1st Normal form of the above employee
table:
Second Normal form 2NF
 No partial dependency of a non key attribute on part of the primary
key.
 Any table that is in 1NF and has a single-attribute (i.e., a non-
composite) key is automatically also in 2NF.
 Definition of a table (relation) in 2NF
 It is in 1NF and
 If all non-key attributes are dependent on all of the key. i.e. no
partial dependency.
 Since a partial dependency occurs when a non-key attribute is
dependent on only a part of the (composite) key, the definition of
2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and
if it has no partial dependencies."
Second Normal form 2NF … Cont’d
 Example for 2NF:

 This schema is in its 1NF since we don’t have any repeating groups or
attributes with multi-valued property. To convert it to a 2NF we need to
remove all partial dependencies of non key attributes on part of the primary
key.
 {EmpID, ProjNo} EmpName, ProjName, ProjLoc, ProjFund,
ProjMangID
 But in addition to this we have the following dependencies
EmpID EmpName
ProjNo ProjName, ProjLoc, ProjFund, ProjMangID
Second Normal form 2NF … Cont’d
 As we can see some non key attributes are partially dependent on
some part of the primary key. Thus these collections of attributes
should be moved to a new relation.
Second Normal form 2NF … Cont’d
• Example 2: Normalize the following relation.

• The primary key for this table is the composite key (PatientId,
RelativeId).
Second Normal form 2NF … Cont’d
 So, to determine if it satisfies 2NF, you have to find out if all other
fields in it depend fully on both PatientId and RelativeId; that is,you
need to decide whether the following conditions are true:
 (PatientId, RelativeId) Relationship and
 (PatientId, RelativeId) Patient_tel.
 However, on the dependencies in the patient table, only the following
are true:
 (PatientId, RelativeId) Relationship and
 (PatientId) Patient_tel.

Therefore; based on the above dependency the normalized relation will be


divided into to tables.
Second Normal form 2NF … Cont’d
Third Normal Form (3NF )
 Eliminate Columns Not Dependent On Key - If attributes do not contribute to
a description of the key, remove them to a separate table.
 This level avoids update and delete anomalies.
 Definition of a Table (Relation) in 3NF
 It is in 2NF and
 There are no transitive dependencies between attributes.
 Example for (3NF): Assumption: Students of same batch (same year) live in
one building or dormitory

• This schema is in its 2NF since the primary key is a single attribute.
Third Normal Form (3NF ) … Cont’d
Third Normal Form (3NF ) … Cont’d
 Consider the following example:

Now, PK = empid
 We have functional dependencies:
 Empid → depid
 Depid → depname
 Or Depid → depbudjet
 Therefore, the above table is not is 3NF. To normalize it, we can use the
functional dependencies:
 Depid → depname
 Depid → depbudjet And
 Empid → depid
Third Normal Form (3NF ) … Cont’d
 So that the resulting tables are the following:

You might also like