Database Normalization Basics
Database Normalization Basics
Chapter 4
Normalization
1
Fundamentals of Databases 1
2
Fundamentals of Databases 1
Course Overview:
•Introduction
•ER modelling
•Relational database design
• Mapping ER model into relational DB model
• Design of the database scheme
•Normalization
•Referential Integrity: Constraints
•Implementation
• Creation of database, tables and fields
• Loading data into the database
•Structured Query Language (SQL)
• Simple queries
• Complex queries: Joins, views, nested queries
•Referential Integrity: Temporal Data and Triggers
•Physical data organization
•Indexing Structures
Fundamentals of Databases 1
Learning objectives
• You can analyze a given relation and identify what kind of anomalies it would allow or avoid
and which normal form it is.
• You can analyze a given relation to identify functional dependencies.
• You know and understand the 1st, 2nd, 3rd and BC normal forms.
• You can transform relations that are not in these normal forms into these normal forms.
• You know and understand the terms "repeating group", "atomic value range", "functional
dependency", "full functional dependency“, “partial functional dependency” and "transitive
functional dependency".
Reading
• El, chapter 14
• Co, Chapter 6
4
Fundamentals of Databases 1
5
Fundamentals of Databases 1
• Let's assume that a new article "sports bag" is to be added to the assortment. It
already has an article_ID (5) and a description (kid‘s sports bag). The article will be
available in Vienna and Berlin.
There are a couple of potential suppliers. Negotiations are still going on with potential
suppliers. How do you do add the article to the relation?
• Let's assume that the article „tent" is to be removed from the assortment. What
happens?
A relation that is not normalized (not in normal form) causes anomaly problems.
6
[email protected]
Fundamentals of Databases 1
• Insertion anomaly
Example: A new article is to be added - initially there is not yet a supplier.
The new article cannot be added because in this case part of the key would carry a
NULL value, which is not allowed.
A phantom supplier („dummy“) would have to be inserted as key.
• Deletion anomaly
Example: Let us assume that the article „tent" is to be removed from the assortment
and the tuple is to be deleted. When deleting the article „tent", the supplier C and
all the information belonging to the supplier are also deleted.
To keep the vendor, the other attributes would have to be set to NULL values. This is
not possible because, again, part of the key would carry NULL values.
7
[email protected]
Fundamentals of Databases 1
In addition, data should be stored in the relations in such a way that the data in the
database can be easily read and processed by the DBMS.
El, p.475: Normalization of data can be considered a process of analyzing the given
relation schemas based on their FDs and primary keys to achieve the desirable
properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion,
and update anomalies …... The normalization process is primarily achieved via
decomposition.
8
[email protected]
Fundamentals of Databases 1
There are a total of 6 normal forms: The 1st to 3rd N, the Boyce-Codd normal form as
refinement of the 3rd NF and the 4th and 5th NF. The NF build on each other. This
means: if a relation is in the 3rd NF, it is automatically also in the 2nd NF.
El, p.475: The normal form of a relation refers to the highest normal form condition
that it meets, and hence indicates the degree to which it has been normalized.
Of essential importance are the normal forms 1 to 3 and Boyce-Codd normal form: A
relational database design should only contain relations that correspond at least to
the 3rd NF / Boyce-Codd normal form. (Unless relations are kept unnormalized on
purpose.)
9
[email protected]
Fundamentals of Databases 1
Functional Dependency - FD
R {D1 x D2 x …. Dn}
R = {A, B, …. N} with A,B,…N attributes of R
Functional Dependency X Y
t1, t2 R
if t1[X] = t2[X] t1[Y] = t2[Y]
if t1[X] = t2[X] then follows t1[Y] = t2[Y]
10
[email protected]
Fundamentals of Databases 1
Functional Dependency - FD
R = {A1, A2, …. An}
X,Y R
Functional Dependency X Y if t1[X] = t2[X] then t1[Y] = t2[Y]
• In other words: With given attribute sets X and Y with X,Y subsets of the relation
R: Y is functionally dependent on X if the following holds true: The values of X
determine the values of Y.
11
[email protected]
Fundamentals of Databases 1
Functional Dependency - FD
{A} {B} ?
{B} {C} ?
{A} {C} ?
{C,D} {B} ?
1
{B,C} {D} ?
2
3
4
5
Kemper, p.180
FDs represent semantic constraints that need to hold true for all possible
instances of a relation.
FDs represent contraints for the database schema.
12
[email protected]
Fundamentals of Databases 1
Functional Dependency - FD
EL, p.473:
13
[email protected]
Fundamentals of Databases 1
Functional Dependency - FD
14
[email protected]
Fundamentals of Databases 1
Functional Dependency - FD
El, p.472:
If a constraint on R states that there cannot be more than one tuple with a given X-
value in any relation instance r(R)—that is, X is a candidate key of R—this implies
that X → Y for any subset of a ributes Y of R (because the key constraint implies
that no two tuples in any legal state r(R) will have the same value of X).
If X is a candidate key of R, then X → R.
• In other words: All attributes are functionally dependent on the primary key
15
[email protected]
Fundamentals of Databases 1
enrollment
18
[email protected]
Fundamentals of Databases 1
Fundamentals of Databases 1
Normalization Process
roomEquip
buildingID cYear roomID maxOcc catID equipment price
1 1980 1 25 B chair: 25; desk: 4; 100
projector: 1;
1 1980 2 50 A chair: 50; desk: 12; 200
projector: 1;
flip chart: 2
2 1960 1 10 C chair: 10; desk: 1; 50
3 2012 1 12 C chair: 12; desk: 1; 50
3 2012 3 100 A chair: 100; desk: 2; 200
projector: 3;
• 1st normal form does not mean that in a relation "by chance" all attribute
values are atomic. It requires that only atomic attribute values are possible.
21
[email protected]
Fundamentals of Databases 1
mariadb documentation
https://mariadb.com/kb/en/database-normalization-1st-normal-form/
• What this means is that data must be able to fit into a tabular format, where each
field contains one value. This is also the stage where the primary key is defined.
Some sources claim that defining the primary key is not necessary for a table to be in
first normal form, but usually it's done at this stage and is necessary before we can
progress to the next stage. Theoretical debates aside, you'll have to define your
primary keys at this point.
• Although not always seen as part of the definition of 1st normal form, the principle
of atomicity is usually applied at this stage as well. This means that all columns must
contain their smallest parts or be indivisible. A common example of this is where
someone creates a name field, rather than first name and surname fields. They
usually regret it later.
22
[email protected]
Fundamentals of Databases 1
• El, p. 477: First normal form (1NF) is now considered to be part of the formal
definition of a relation in the basic (flat) relational model; historically, it was
defined to disallow multivalued attributes, composite attributes, and their
combinations. It states that the domain of an attribute must include only atomic
(simple, indivisible) values and that the value of any attribute in a tuple must be
a single value from the domain of that attribute. Hence, 1NF disallows having a
set of values, a tuple of values, or a combination of both as an attribute value
for a single tuple.
• In other words, 1NF disallows relations within relations or relations as attribute
values within tuples. The only attribute values permitted by 1NF are single
atomic (or indivisible) values.
23
[email protected]
Fundamentals of Databases 1
In our example roomEquip we actually have a two-fold violation of the 1st normal
form:
roomEquip:
t1: {1,1, 2020, 25, A, 100, {chair:25, desk:4; projector: 1}}
• 2nd violation of 1st normal form: composite attribute within the repeating group
- each value of the repeating group holds two components / two attributes:
equipment and amount.
24
[email protected]
Fundamentals of Databases 1
Example:
roomEquip: {1,1, 2020, 25, A, 100, {chair:25, desk:4; projector: 1}}
2. Each value stands for its own. In this case the attribute is not functionally
dependent on the primary key.
26
Fundamentals of Databases 1
Key?
27
Fundamentals of Databases 1
A relation is in 2nd NF
• if it is in 1st NF and
• if all non-key attributes are fully functionally dependent on the entire key.
(There are no partial dependencies)
For relations with atomic key it holds that they are in the 2nd NF if they are in
the 1st NF.
28
Fundamentals of Databases 1
29
Fundamentals of Databases 1
• Non-key attributes:
• Which non-key attributes are fully functionally dependent on the entire key?
• Which non-key attributes are NOT fully functionally dependent on the entire
key?
30
[email protected]
Fundamentals of Databases 1
Relations in 2nd NF
31
Fundamentals of Databases 1
• The attribute "price" is determined by the attribute "catID" and the attribute
"catID" by the entire key {buildingID, roomID}. Thus, the attribute "price" is
transitively dependent on the key {buildingID, roomID}.
32
[email protected]
Fundamentals of Databases 1
33
Fundamentals of Databases 1
1 1 25 B B 100
1 2 50 A A 200
2 1 10 C C 50
3 3 100 A
34
Fundamentals of Databases 1
Relations in 3rd NF
35
Fundamentals of Databases 1
There is a difference between BCNF and 3rd NF only if there are multiple
candidate keys with overlapping attributes.
Example: candidate key 1 consisting of attributes {A,B} and candidate key 2
consisting of attributes {B,C}. Overlapping attribute in both key candidates: B
37
Fundamentals of Databases 1
Normalizing to BCNF:
Decomposition of the relation into two relations.
38
Fundamentals of Databases 1
Candidate keys?
FDs
Is the relation in 3rd normal form?
39
Fundamentals of Databases 1
Any professor can only examine one course. Each course is only examined by one
professor.
Candidate keys?
FDs
40
Fundamentals of Databases 1
When decomposing relations from 3rd to BCNF, one MUST do it in a way that
renders non-additive joins and does not generate spurious tuples.
When decomposing relations from 3rd to BCNF, it is not always possible to
preserve all FDs. Sometimes one needs to decide between arriving at BCNF
but losing FDs or staying with 3rd NF and preserving FDs.
41
Fundamentals of Databases 1
Any professor can only examine one course but each course can be examined by
multiple professors.
Candidate keys?
FDs
FDs cannot all be preserved but non-additive join property can be achieved
42
Fundamentals of Databases 1
Wrong solution:
Generation of spurious (incorrect) tuples (additive join property)
43
Fundamentals of Databases 1
Summary Normalization
NF StudID
1st NF all attrbutes have atomic values only, no repearting groups, no
composite attributes, table format, PK
2nd NF 1st NF plus no partial dependency of non-key attributes
3rd NF 2nd NF plus no transitive dependencies of non-key attributes
BCNF 3rd NF plus every determinant is a candidate key
Definition: An attribute (or group of attributes) of which others are fully functionally
dependent is called a determinant.
44
Fundamentals of Databases 1
Normalization Summary
45
Fundamentals of Databases 1
Denormalization: N1NF
1st normal form:
Often, an attribute of an entity is described by a set of different values - by a repeating
group:
• Room Equipment :
• {chairs, tables, projectorr, flipchart, ...}
• hobbies:
• {reading, swimming, biking, dancing, ...}
• Programming skills
• {Java, Javascript, Php, C++, ....}
• …
These attributes must be decomposed to comply with the 1st NF. An entity instance is
then represented in more than one tuple in several relations.
Since these relations must be reassembled (joined) for queries, this also represents a
performance disadvantage.
46
Fundamentals of Databases 1
N1NF
A set. A string object that can have zero or more values, each of which must be
chosen from the list of values 'value1', 'value2', ... A SET column can have a
maximum of 64 members. SET values are represented internally as integers.
• SET('value1','value2',...)
set (“swimming“,“singing“,“football“,“collecting stamps“)
47
Fundamentals of Databases 1