UNIT 3
RELATIONAL DATABASE DESIGN AND NORMALIZATION
ER and EER-to-Relational mapping –Update anomalies –Functional dependencies –
Inference rules –Minimal cover –Properties of relational decomposition –Normalization
(up to BCNF).
3.1 ER and EER-to-Relational mapping
What are the ER diagrams?
ER diagram is a visual representation of data based on the ER model, and it describes how
entities are related to each other in the database.
What are EER diagrams?
EER diagram is a visual representation of data, based on the EER model that is an extension of
the original entity-relationship (ER) model.
Entity
An entity is any singular, identifiable and separate object. It refers to individuals, organizations,
systems, bits of data or even distinct system components that are considered significant in and of
themselves. For example, People, Property, Organizations, Agreements and, etc. In the ER
diagram, the entity is represented by a rectangle
Weak Entity
A weak entity is an entity that depends on the existence of another entity. In more technical
terms it can be defined as an entity that cannot be identified by its own attributes.
Attributes
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB,
Age, Address, Mobile_No are the attributes that define entity type Student. In the ER diagram,
the attribute is represented by an oval.
Multi-valued Attribute
If an attribute can have more than one value it is called a multi-valued attribute.
Derived Attribute
An attribute-based on another attribute.
Relationships
A relationship is an association that describes the interaction between entities.
Recursive Relationship
1
If the same entity participates more than once in a relationship it is known as a recursive
relationship.
The following are the types of entities, attributes, and relationships.
Cardinality
Cardinality refers to the maximum number of times an instance in one entity can relate to
instances of another entity. There are three types of cardinalities.
one to one (1 to 1)
one to many(1 to N)
many to many (M to N)
Participation
Participation constraint specifies the existence of an entity when it is related to another entity in a
relationship type. There are two types. Partial and Total participation.
Total Participation − Each entity is involved in the relationship. Total participation is
represented by double lines.
Partial participation − Not all entities are involved in the relationship. Partial participation is
represented by single lines.
2
There are three steps in database development.
Requirements Analysis- Understand the data problem through user interviews, forms,
reports, observation and etc.
Component design stage- Create a data model that is a graphical representation of what will
be finally will be implemented.
Implementation stage- Actual development of the database which leads to database actually
being used in the real environment.
In the above stages after the first stage(Requirements analysis) you have to follow the database
design stages.
Conceptual design
Logical design
Physical design
In the Conceptual design, we identify all entities, we define attributes and their
properties and we define relationships between entities.
In the Logical design, we transform the conceptual design into relational, transform
entities as tables, transform entity attributes into table column, and normalization.
3
In the Physical design, we specify internal storage structure and paths, assign one or
more indexes and tune indexes.
At the conceptual design stage, we design the ER or EER diagrams.
Here are some rules for drawing ER and EER diagrams
Write all entities in the singular form
Write all relationships in a singular form.
Write all attributes in the singular form.
If you want to separate words use underscore mark.
Now, let’s see how to draw ER diagrams and EER diagrams.
Drawing ER and EER diagrams
Below points show how to go about creating an ER diagram.
1. Identify all the entities in the system. An entity should appear only once in a particular
diagram. Create rectangles for all entities and name them properly.
2. Identify relationships between entities. Connect them using a line and add a diamond in
the middle describing the relationship.
3. Add attributes for entities. Give meaningful attribute names so they can be understood
easily.
4. Mark the cardinalities and participation between the entities.
example of ER diagrams.
Now let’s see how to draw EER diagrams.
Here we just need to add a few things to above.
4
1. As in drawing ER diagrams first, we have to identify all entities.
After we found entities from the scenario you should check whether those entities have sub-
entities. If so you have to mark sub-entities in your diagram.
Dividing entities into sub-entities we called as specialization. And combining sub-entities to one
entity is called a generalization.
2. Then you have to identify relationships between entities and mark them.
3. Add attributes for entities. Give meaningful attribute names so they can be understood easily.
4. Mark the cardinalities and participation
If it is an EER diagram you have to add a few to your diagram.
Here also we have to check whether sub-entities totally depend on the main entity or not. And
you should mark it.
If all members in the superclass (main entity) participate in either one subclass it is known as
total participation. It marks by double lines.
5
If at least one member in the superclass does not participate in subclass it is known as partial
participation. It is denoted by one single line.
6
If all the members in the superclass participate for only one subclass it is known as disjoint and
denoted by “d”. If all the members in the superclass participate in more than one subclass it is
known as overlap and denoted by “o”.
Benefits of ER and EER diagrams.
Easy to understand and does not require a person to undergo extensive recently training to be
able to work with it and accurately.
Readily translatable to relational tables which can be used to quickly build databases
Can directly be used by database developers as the blueprint for implementing databases in
specific software application
It can be applied in other contexts such as describing the different relationships and operations
within an organization.
Now let’s move on. Our next topic is map ER and EER diagrams into relational schemas.
Mapping ER and EER diagrams into relational schemas
First I’ll tell you how to map the ER diagram into a relational schema.
Mapping ER diagrams into relational schemas
Follow the steps one by one to get it done.
1. Mapping strong entities.
2. Mapping weak entities.
3. Map binary one-to-one relations.
4. Map binary one-to-many relations
5. Map binary many-to-many relations.
6. Map multivalued attributes.
7. Map N-ary relations
Let’s go deep with the examples.
1. Mapping strong entities.
2. Mapping weak entities.
7
Above it shows an ER diagram with its relationships. You can see there are two strong entities
with relationships and a weak entity with a weak relationship.
When you going to make a relational schema first you have to identify all entities with their
attributes. You have to write attributes in brackets as shown below. Definitely you have to
underline the primary keys. In the above DEPENDENT is a weak entity. To make it strong go
through the weak relationship and identify the entity which connects with this. Then you have
written that entity’s primary key inside the weak entity bracket.
Then you have to map the primary key to where you took from as shown below.
3. Map binary one to one relations.
Let’s assume that the relationship between CUSTOMER and CARD is one to one.
There are three occasions where one to one relations take place according to the participation
constraints.
I. Both sides have partial participation.
8
When both sides have partial participation you can send any of the entity’s primary key to
others. At the same time, if there are attributes in the relationship between those two entities,
it is also sent to other entity as shown above.
So, now let us see how we write the relational schema.
Here you can see I have written CUSTID and STARTDATE inside the CARD table. Now
you have to map CUSTID from where it comes. That’s it.
II. One side has partial participation.
You can see between the relationship and CARD entity it has total participation.
When there is total participation definitely the primary of others comes to this. And also if there
are attributes in the relationship it also comes to total participation side.
Then you have to map them as below.
9
III. Both sides have total participation
If both sides have total participation you need to make a new relationship with a suitable
name and merge entities and the relationship.
Following it shows how we should write the relation.
Now let us see how to map one to many relations.
4. Map binary one-to-many relations
10
If it is one-to-many, always to the many side other entities' primary keys and the attributes in
the relationship go to the many side. No matter about participation. And then you have to
map the primary key.
5. Map binary many to many relations.
If it is many to many relations, you should always make a new relationship with the name of the
relationship between the entities.
And there you should write both primary keys of the entities and attributes in the relationship and
map them to the initials as shown below.
11
6. Map multivalued attributes.
If there are multivalued attributes you have to make a new relationship with a suitable name and
write the primary key of the entity which belongs to the multivalued attribute and also the name
of the multivalued attribute as shown below.
7. Map N-ary relations.
First, let us consider unary relationships.
We categorized them into two.
I. one-to-one and one to many relations.
12
If it is unary and one to one or one to many relations, you do not need to make a new relationship
you just want to add a new primary key to the current entity as shown below and map it to the
initial. For example, in the above diagram, the employee is supervised by the supervisor.
Therefore, we need to make a new primary key as SID and map it to EMPID. Because of SID
also an EMPID.
II. many-to-many relations.
If it is unary and many to many relations, you need to make a new relationship with a suitable
name. Then you have to give it a proper primary key and it should map to where it comes as
shown below.
13
Now let us see how to map relations with more than two entities.
If there are more than three entities for a relationship you have to make a new relation table and
put all primary keys of connected entities and all attributes in the relationship. And in the end,
you have to map them as shown below.
Mapping EER diagrams into relational schemas.
Let us go through the following diagram.
14
There are four ways to draw relational schema for an EER. You have to choose the most suitable
one. In the end, I'll give you the guidelines on how to choose the best and most suitable way.
First method
Here we write separate relations to all superclass entities and subclass entities. And here we have
to write the superclass entities' primary key to all subclass entities and then map them as shown
above. Note that we write only the attributes belongs to each entity.
Second method
Here we do not write the superclass entity but in each subclass entity, we write all attributes that
are in superclass entity.
Third method
15
Here we write only the superclass entity and write all the attributes which belong to subclass
entities. Specialty in here is that to identify that PERSON is an EMPLOYEE or STUDENT we
add a column as PERSONTYPE. After the table creates we can mark as a STUDENT or
EMPLOYEE.
Fourth method
Here instead of PERSONTYPE, we write STUDENT and EMPLOYEE both.
The reason for that is sometime PERSON will belong to both categories.
3.2 Anomalies
There are different types of anomalies which can occur in referencing and referenced relation
which can be discussed as:
Insertion anomaly: If a tuple is inserted in referencing relation and referencing attribute value is
not present in referenced attribute, it will not allow inserting in referencing relation. For
Example, if we try to insert a record in STUDENT_COURSE with STUD_NO =7, it will not
allow.
16
Deletion and Updating Anomaly: If a tuple is deleted or updated from referenced relation and
referenced attribute value is used by referencing attribute in referencing relation, it will not allow
deleting the tuple from referenced relation. For Example, if we try to delete a record from
STUDENT with STUD_NO =1, it will not allow. To avoid this, following can be used in query:
ON DELETE/UPDATE SET NULL: If a tuple is deleted or updated from referenced
relation and referenced attribute value is used by referencing attribute in referencing relation, it
will delete/update the tuple from referenced relation and set the value of referencing attribute to
NULL.
ON DELETE/UPDATE CASCADE: If a tuple is deleted or updated from referenced relation
and referenced attribute value is used by referencing attribute in referencing relation, it will
delete/update the tuple from referenced relation and referencing relation as well.
3.3 Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically exists
between the primary key and non-key attribute within a table.
X → Y
The left side of FD is known as a determinant; the right side of the production is known as a
dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of employee table because
if we know the Emp_Id, we can tell that employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
Types of Functional dependency
17
1. Trivial functional dependency
A → B has trivial functional dependency if B is a subset of A.
The following dependencies are also trivial like: A → A, B → B
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.
2. Non-trivial functional dependency
A → B has a non-trivial functional dependency if B is not a subset of A.
When A Intersection B is NULL, then A → B is called as complete non-trivial.
Example:
ID → Name,
Name → DOB
3.4 Inference Rule (IR):
The Armstrong's axioms are the basic inference rule.
Armstrong's axioms are used to conclude functional dependencies on a relational database.
The inference rule is a type of assertion. It can apply to a set of FD(functional dependency) to
derive other FD.
Using the inference rule, we can derive additional functional dependency from the initial set.
The Functional dependency has 6 types of inference rule:
1. Reflexive Rule (IR1)
In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y, then X →Y
Example:
X = {a, b, c, d, e}
Y = {a, b, c}
18
2. Augmentation Rule (IR2)
The augmentation is also called as a partial dependency. In augmentation, if X determines Y,
then XZ determines YZ for any Z.
If X → Y, then XZ → YZ
Example:
For R(ABCD), if A → B then AC → BC
3. Transitive Rule (IR3)
In the transitive rule, if X determines Y and Y determine Z, then X must also determine Z.
If X → Y and Y → Z, then X → Z
4. Union Rule (IR4)
Union rule says, if X determines Y and X determines Z, then X must also determine Y and Z.
If X → Y and X → Z, then X → YZ
Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5)
Decomposition rule is also known as project rule. It is the reverse of union rule.
This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
Proof:
1. X → YZ (given)
2. YZ → Y (using IR1 Rule)
3. X → Y (using IR3 on 1 and 2)
6. Pseudo transitive Rule (IR6)
In Pseudo Transitive Rule, if X determines Y and YZ determines W, then XZ determines W.
If X → Y and YZ → W, then XZ → W
19
Proof:
1. X → Y (given)
2. WY → Z (given)
3. WX → WY (using IR2 on 1 by augmenting with W)
4. WX → Z (using IR3 on 3 and 2)
3.5 Minimal Cover
A minimal cover of a set of functional dependencies (FD) E is a minimal set of dependencies F
that is equivalent to E.
The formal definition is: A set of FD F to be minimal if it satisfies the following conditions −
Every dependency in F has a single attribute for its right-hand side.
We cannot replace any dependency X->A in F with a dependency Y->A, where Y is a proper
subset of X, and still have a set of dependencies that is equivalent to F.
We cannot remove any dependency from F and still have a set of dependencies that are
equivalent to F.
Canonical cover is called minimal cover which is called the minimum set of FDs. A set of FD
FC is called canonical cover of F if each FD in FC is a −
Simple FD.
Left reduced FD.
Non-redundant FD.
Simple FD − X->Y is a simple FD if Y is a single attribute.
Left reduced FD − X->Y is a left reduced FD if there are no extraneous attributes in X.
{extraneous attributes: Let XA->Y then, A is a extraneous attribute if X_>Y}
Non-redundant FD − X->Y is a Non-redundant FD if it cannot be derived from F- {X->y}.
Example
Consider an example to find canonical cover of F.
The given functional dependencies are as follows −
A BC
B C
AB
20
AB C
Minimal cover: The minimal cover is the set of FDs which are equivalent to the given FDs.
Canonical cover: In canonical cover, the LHS (Left Hand Side) must be unique.
First of all, we will find the minimal cover and then the canonical cover.
First step − Convert RHS attribute into singleton attribute.
AB
AC
BC
AB
AB C
Second step − Remove the extra LHS attribute
Find the closure of A.
A+ = {A, B, C}
So, AB C can be converted into A -> C
AB
AC
BC
AB
AC
Third step − Remove the redundant FDs.
AB
BC
Now, we will convert the above set of FDs into canonical cover.
The canonical cover for the above set of FDs will be as follows −
A BC
BC
21
3.5 Properties of Relational Decomposition
When a relation in the relational model is not appropriate normal form then the decomposition of
a relation is required. In a database, breaking down the table into multiple tables termed as
decomposition. The properties of a relational decomposition are listed below:
1. Attribute Preservation:
Using functional dependencies, the algorithms decompose the universal relation schema R in a
set of relation schemas D = {R1, R2, …... Rn} relational database schema, where ‘D’ is called
the Decomposition of R.
The attributes in R will appear in at least one relation schema Ri in the decomposition, i.e., no
attribute is lost. This is called the Attribute Preservation condition of decomposition.
2. Dependency Preservation:
If each functional dependency X->Y specified in F appears directly in one of the relation
schemas Ri in the decomposition D or could be inferred from the dependencies that appear in
some Ri. This is the Dependency Preservation.
If a decomposition is not dependency preserving some dependency is lost in decomposition.
To check this condition, take the JOIN of 2 or more relations in the decomposition.
For example:
R = (A, B, C)
F = {A ->B, B->C}
Key = {A}
R is not in BCNF.
Decomposition R1 = (A, B), R2 = (B, C)
R1 and R2 are in BCNF, Lossless-join decomposition, Dependency preserving.
Each Functional Dependency specified in F either appears directly in one of the relations in
the decomposition.
It is not necessary that all dependencies from the relation R appear in some relation Ri.
It is sufficient that the union of the dependencies on all the relations Ri be equivalent to the
dependencies on R.
3. No redundancy:
Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy. If the relation has no proper decomposition, then it may lead to
problems like loss of information.
22
4. Non Additive Join Property / Lossless Join:
Another property of decomposition is that D should possess is the Non Additive Join Property,
which ensures that no spurious tuples are generated when a NATURAL JOIN operation is
applied to the relations resulting from the decomposition.
Lossless join property is a feature of decomposition supported by normalization. It is the ability
to ensure that any instance of the original relation can be identified from corresponding instances
in the smaller relations.
For example:
R: relation, F: set of functional dependencies on R,
X, Y: decomposition of R,
A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition for R if the
natural join of R1, R2, …, Rn produces exactly the relation R.
A decomposition is lossless if we can recover:
R (A, B, C) -> Decompose -> R1(A, B) R2(A, C) -> Recover -> R’ (A, B, C)
Thus, R’ = R
Decomposition is lossless if:
X intersection Y -> X, that is: all attributes common to both X and Y functionally determine
ALL the attributes in X.
X intersection Y -> Y, that is: all attributes common to both X and Y functionally determine
ALL the attributes in Y
If X Intersection Y forms a super key of either X or Y, the decomposition of R is a lossless
decomposition.
3.6 Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them using relationship.
The normal form is used to reduce redundancy from the database table.
Types of Normal Forms
There are the four types of normal forms:
23
Normal Form- Description
1NF- A relation is in 1NF if it contains an atomic value.
2NF- A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional
dependent on the primary key.
3NF- A relation will be in 3NF if it is in 2NF and no transition dependency exists.
4NF- A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
5NF- A relation is in 5NF if it is in 4NF and not contains any join dependency and joining
should be lossless.
First Normal Form (1NF)
A relation will be 1NF if it contains an atomic value.
It states that an attribute of a table cannot hold multiple values. It must hold only single-valued
attribute.
First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302
The decomposition of the EMPLOYEE table into 1NF has been shown below:
24
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
12 Sam 8589830302 Punjab
Second Normal Form (2NF)
In the 2NF, relational must be in 1NF.
In the second normal form, all non-key attributes are fully functional dependent on the primary
key
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Maths 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which
is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
25
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Maths
83 Computer
Third Normal Form (3NF)
A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
If there is no transitive dependency for non-prime attributes, then the relation must be in third
normal form.
A relation is in third normal form if it holds atleast one of the following conditions for every
non-trivial function dependency X → Y.
X is a super key.
Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Super key in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP} ....so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super
key(EMP_ID). It violates the rule of third normal form.
26
That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP>
table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
27