MODULE 5
RELATIONAL DATABASE
DESIGN
Mrs. Smita Abhijit Ganjare
Features of Good Relational Design
Data Representation: Data is organized in tables (relations), with rows (tuples) representing
records and columns (attributes) representing data fields.
Atomic Values: Each attribute in a table contains atomic values, meaning no multi-valued or
nested data is allowed in a single cell.
Unique Keys: Every table has a primary key to uniquely identify each record, ensuring no
duplicate rows.
Attribute Domain: Each attribute has a defined domain, specifying the valid data types and
constraints for the values it can hold.
Tuples as Rows: Rows in a table, called tuples, represent individual records or instances of
real-world entities or relationships.
Relation Schema: A table’s structure is defined by its schema, which specifies the table
name, attributes, and their domains.
Data Independence: The model ensures logical and physical data independence, allowing
changes in the database schema without affecting the application layer.
Pitfalls in Relational Database Design
Poor Data Modelling: One of the most critical aspects of designing a
relational database is data modeling. A well-designed data model can ensure
that the database is efficient and easy to use. However, a poorly designed
data model can lead to problems such as data redundancy, data
inconsistencies, and poor performance. It is essential to spend time
understanding the data requirements and designing a data model that meets
those requirements.
Lack of Normalization: Normalization is a process that helps eliminate data
redundancy and improve data consistency. When a database is not
normalized, it can lead to several issues, such as poor performance, data
inconsistencies, and difficulty in maintaining the database. It is crucial to
normalize the database to ensure that the data is organized efficiently and to
improve the database's overall performance.
Overuse of Indexes: Indexes can help improve query performance, but they can also
slow down the database if they are overused. It is essential to use indexes judiciously
and only on columns that are frequently used in queries. Overusing indexes can lead to
poor performance, increased disk space usage, and slower database backups.
Lack of Proper Data Types : Choosing the right data types for columns is crucial for
efficient database design. Using improper data types can lead to poor performance,
data truncation, and data inconsistencies. It is essential to choose the appropriate data
type for each column, based on the data requirements and the data's expected size.
Ignoring Referential Integrity : Referential integrity is a set of rules that ensure the
consistency and accuracy of data between related tables. Ignoring referential integrity
can lead to data inconsistencies, data corruption, and poor database performance. It is
essential to define relationships between tables and enforce referential integrity
constraints to maintain data consistency.
Lack of Backup and Recovery Plan : A backup and recovery plan is essential to ensure
the database's availability and recoverability in case of any data loss or corruption. A
lack of a backup and recovery plan can result in significant data loss and downtime,
leading to loss of revenue and customer trust. It is crucial to design and implement a
backup and recovery plan that includes regular backups, testing backups, and recovery
procedures.
Inadequate Performance Monitoring : Monitoring the database's performance is
essential to ensure that it is running efficiently and to identify any performance
bottlenecks. Inadequate performance monitoring can lead to poor performance, slow
response times, and customer dissatisfaction. It is crucial to monitor the database's
performance regularly and identify any performance issues promptly.
Conclusion on Pitfalls In Database Design
In conclusion, relational database design is a complex process
that requires careful planning and attention to detail. Avoiding
these common pitfalls can help ensure that the database is
efficient, scalable, and easy to use. By following best practices
in database design, database designers can create a robust and
reliable database that meets the organization's data
requirements.
Normalization in DBMS
Normalization is a systematic approach
to organize data in a database to
eliminate redundancy, avoid anomalies
and ensure data consistency. The
process involves breaking down large
tables into smaller, well-structured
ones and defining relationships
between them. This not only reduces
the chances of storing duplicate data
but also improves the overall efficiency
of the database.
Types of Normal Forms
1. First Normal Form (1NF) :
• A table is in First Normal Form if every value it stores is atomic.
• Atomicity states that a single cell cannot hold multiple values; it must hold only a
single attribute.
• The First Normal Form disallows the multi-valued attribute, composite attribute, and
their combinations.
• Now, you will understand the First Normal Form with the help of an example.
• Below is a students’ record table that has information about the student roll number,
student name, student course, and age of the student.
In the student record table, the course column holds two values for a single student, so the table does not follow the First Normal Form. Applying the First Normal Form to it yields the table below.
By applying the First Normal Form, you
achieve atomicity: every cell holds a
single value.
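The 1NF step above can be sketched in code. This is a minimal illustration; the column and data values are invented for the example:

```python
# Unnormalized student records: the "courses" cell holds multiple values.
unnormalized = [
    {"roll_no": 1, "name": "Asha", "courses": "DBMS, OS", "age": 20},
    {"roll_no": 2, "name": "Ravi", "courses": "DBMS", "age": 21},
]

# 1NF: split each multi-valued cell so every row holds one atomic course.
first_nf = [
    {"roll_no": r["roll_no"], "name": r["name"],
     "course": c.strip(), "age": r["age"]}
    for r in unnormalized
    for c in r["courses"].split(",")
]

for row in first_nf:
    print(row)
```

Each student with two courses now occupies two rows, and no cell contains more than one value.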
2. Second Normal Form (2NF)
The first condition for a table to be in the Second Normal Form is that it must be in the First
Normal Form. The table must also be free of partial dependencies. A partial dependency exists
when a non-prime attribute is functionally determined by a proper subset of a candidate key.
Now, understand the Second Normal Form with the help of an example.
Consider the table Location:
The Location table possesses a composite
primary key cust_id, storeid. The non-key
attribute is store_location. In this case,
store_location only depends on storeid,
which is a part of the primary key. Hence,
this table does not fulfill the second normal
form.
To bring the table to the Second Normal Form,
you need to split the table into two parts. This
will give you the tables below:
As you have removed the partial functional
dependency from the location table, the column
store_location entirely depends on the primary
key of that table, storeid.
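The 2NF split described above can be sketched as follows; the store IDs and locations are invented sample data:

```python
# Location table with composite key (cust_id, storeid); store_location
# depends only on storeid, a part of the key -- a partial dependency.
location = [
    {"cust_id": 1, "storeid": "S1", "store_location": "Pune"},
    {"cust_id": 2, "storeid": "S1", "store_location": "Pune"},
    {"cust_id": 3, "storeid": "S2", "store_location": "Mumbai"},
]

# Table 1: keeps only the composite key itself.
cust_store = [{"cust_id": r["cust_id"], "storeid": r["storeid"]}
              for r in location]

# Table 2: store_location now depends entirely on its own key, storeid.
store = {r["storeid"]: r["store_location"] for r in location}

print(cust_store)
print(store)
```

Note that the repeated "Pune" value collapses to a single entry in the new store table, removing the redundancy the partial dependency caused.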
3. Third Normal Form (3NF)
• The first condition for a table to be in the Third Normal Form is that it should be in
the Second Normal Form.
• The second condition is that there should be no transitive dependency for non-
prime attributes, which indicates that non-prime attributes (not part of the
candidate key) should not depend on other non-prime attributes in a table.
Therefore, a transitive dependency is a functional dependency in which A → C (A
determines C) indirectly because of A → B and B → C (where it is not the case that
B → A).
The Third Normal Form ensures the reduction of data duplication. It is also used to
achieve data integrity.
Below is a student table that has student ID, student name, subject ID, subject
name, and address of the student as its columns.
In the above student table, stu_id
determines subid, and subid determines
sub. Therefore, stu_id determines sub via
subid. This implies that the table
possesses a transitive functional
dependency and does not fulfill the third
normal form criteria.
Now, to change the table to the third
normal form, you need to divide the table
as shown below:
As you can see in both tables, all the non-
key attributes are now fully functionally
dependent only on the primary key. In the
first table, the columns name, subid, and
address depend only on stu_id. In the
second table, sub depends only on the
subid.
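The 3NF decomposition can be sketched in code; the student and subject values are invented for illustration:

```python
# Student table with a transitive dependency:
# stu_id -> subid and subid -> sub, so stu_id -> sub only indirectly.
student = [
    {"stu_id": 1, "name": "Asha", "subid": 10, "sub": "DBMS", "address": "Pune"},
    {"stu_id": 2, "name": "Ravi", "subid": 11, "sub": "OS", "address": "Nashik"},
]

# Table 1: every non-key attribute depends directly on stu_id.
student_3nf = [{k: r[k] for k in ("stu_id", "name", "subid", "address")}
               for r in student]

# Table 2: sub depends only on subid, removing the transitive dependency.
subject = {r["subid"]: r["sub"] for r in student}

print(student_3nf)
print(subject)
```

Subject names are now stored once per subid, so renaming a subject touches a single row.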
4. Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form is also known as 3.5NF. It is a stricter version of 3NF, developed by
Raymond F. Boyce and Edgar F. Codd to tackle certain types of anomalies that 3NF does not
resolve.
The first condition for a table to be in Boyce-Codd Normal Form is that it must be in the Third
Normal Form. Secondly, for every functional dependency X → Y in the table, the determinant X
must be a super key of that table.
Consider the subject table below:
The subject table follows these conditions:
• Each student can enroll in multiple subjects.
• Multiple professors can teach a particular subject.
• For each subject, it assigns a professor to the student.
In the above table, student_id and subject together form the primary key because by using
student_id and subject, you can determine all the table columns.
Another important point is that one professor teaches only one subject, but one subject may have
two professors.
This exhibits a dependency between the subject and the professor, i.e., the subject depends on the
professor's name.
The table is in 1st Normal form as all the column names are unique, all values are atomic, and all the
values stored in a particular column are of the same domain.
The table also satisfies the 2nd Normal Form, as there is no Partial Dependency.
There is no Transitive Dependency; hence, the table satisfies the 3rd Normal Form.
This table follows all the Normal forms except the Boyce Codd Normal Form.
As you can see, student_id and subject form the primary key, which means subject is a prime
attribute.
However, there exists yet another dependency:
professor → subject.
The table does not satisfy BCNF because the
determinant of this dependency, professor, is
not a super key: subject is a prime attribute,
while professor is a non-prime attribute.
To transform the table into BCNF, you divide
it into two parts. The first table holds
student_id together with a newly created
column, profid.
The second table holds the profid, subject, and
professor columns, and satisfies BCNF.
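A sketch of this BCNF decomposition in code; the professor names and the generated profid values are invented for the example:

```python
# Enrolment table: (stuid, subject) is the key, but professor -> subject
# also holds, and professor is not a super key -- a BCNF violation.
enrolment = [
    {"stuid": 1, "subject": "DBMS", "professor": "P. Rao"},
    {"stuid": 2, "subject": "DBMS", "professor": "K. Iyer"},
    {"stuid": 1, "subject": "OS", "professor": "M. Das"},
]

# Table 2: one row per professor; professor -> subject now holds
# with profid as the key of this table.
professors = {}
for r in enrolment:
    professors.setdefault(
        r["professor"],
        {"profid": len(professors) + 1, "subject": r["subject"]},
    )

# Table 1: links each student to a profid only.
prof_ids = {name: info["profid"] for name, info in professors.items()}
student_prof = [{"stuid": r["stuid"], "profid": prof_ids[r["professor"]]}
                for r in enrolment]

print(student_prof)
print(professors)
```

Every determinant in the two new tables (profid in one, the pair stuid/profid in the other) is now a key, which is exactly the BCNF condition.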
Normal Forms in DBMS
Advantages of Normalization
• Normalization eliminates data redundancy and ensures that each piece of data
is stored in only one place, reducing the risk of data inconsistency and making it
easier to maintain data accuracy.
• By breaking down data into smaller, more specific tables, normalization helps
ensure that each table stores only relevant data, which improves the overall
data integrity of the database.
• Normalization simplifies the process of updating data, as it only needs to be
changed in one place rather than in multiple places throughout the database.
• Normalization enables users to query the database using a variety of different
criteria, as the data is organized into smaller, more specific tables that can be
joined together as needed.
• Normalization can help ensure that data is consistent across different
applications that use the same database, making it easier to integrate different
applications and ensuring that all users have access to accurate and consistent
data.
Disadvantages of Normalization
• Normalization can result in increased performance overhead due to
the need for additional join operations and the potential for slower
query execution times.
• Normalization can result in the loss of data context, as data may be
split across multiple tables and require additional joins to retrieve.
• Proper implementation of normalization requires expert
knowledge of database design and the normalization process.
• Normalization can increase the complexity of a database design,
especially if the data model is not well understood or if the
normalization process is not carried out correctly.
Example of Normalization
Anomalies in Relational Model
Anomalies in the relational model refer to inconsistencies or
errors that can arise when working with relational databases,
specifically in the context of data insertion, deletion, and
modification. There are different types of anomalies that can
occur in referencing and referenced relations which can be
discussed as:
These anomalies can be categorized into three types:
• Insertion Anomalies
• Deletion Anomalies
• Update Anomalies.
Insertion Anomalies: These anomalies occur when certain data cannot be
inserted into a database without the presence of other, unrelated data. For
example, if student and course details are stored in a single table, a new
course cannot be recorded until at least one student enrolls in it.
Deletion anomalies: These anomalies occur when deleting a record from a
database and can result in the unintentional loss of data. For example, if a
database contains information about customers and orders, deleting a
customer record may also delete all the orders associated with that customer.
Update anomalies: These anomalies occur when modifying data in a
database and can result in inconsistencies or errors. For example, if a
database contains information about employees and their salaries, updating
an employee’s salary in one record but not in all related records could lead to
incorrect calculations and reporting.
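The deletion anomaly described above can be demonstrated with a small sketch; the customer and order values are invented:

```python
# A single combined table mixing customer and order facts.
orders = [
    {"cust_id": 1, "cust_name": "Asha", "order_id": 100},
    {"cust_id": 2, "cust_name": "Ravi", "order_id": 101},
]

# Deleting Ravi's only order also deletes every trace of Ravi as a customer:
# his name existed nowhere else in the database.
orders = [r for r in orders if r["order_id"] != 101]

known_customers = {r["cust_id"] for r in orders}
print(known_customers)
```

Had customers and orders been stored in separate, related tables, deleting the order row would have left the customer row intact.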
Removal of Anomalies
Anomalies in DBMS can be removed by applying normalization. Normalization
involves organizing data into tables and applying rules to ensure data is stored
in a consistent and efficient manner. By reducing data redundancy and ensuring
data integrity, normalization helps to eliminate anomalies and improve the
overall quality of the database.
According to E. F. Codd, the inventor of the relational database, the
goals of normalization include:
• It helps remove all repeated (redundant) data from the database.
• It helps remove undesirable deletion, insertion, and update anomalies.
• It helps establish proper and useful relationships between tables.
Key steps include:
1. First Normal Form (1NF): Ensures each column contains atomic
values and removes repeating groups.
2. Second Normal Form (2NF): Eliminates partial dependencies by
ensuring all non-key attributes are fully dependent on the primary
key.
3. Third Normal Form (3NF): Removes transitive dependencies by
ensuring non-key attributes depend only on the primary key.
By implementing these normalization steps, the database
becomes more structured, reducing the likelihood of insertion,
update, and deletion anomalies.
Decomposition In DBMS
Decomposition refers to the division of tables into multiple
tables to produce consistency in the data.
When we divide a table into multiple tables or divide a relation
into multiple relations, then this process is termed
Decomposition in DBMS.
We perform decomposition in DBMS when we want to process a
particular data set. It is performed in a database management
system when we need to ensure consistency and remove
anomalies and duplicate data present in the database.
When we perform decomposition in DBMS, we must try to
ensure that no information or data is lost.
Types of Decomposition
There are two types of Decomposition:
• Lossless Decomposition
• Lossy Decomposition
Lossless Decomposition
Lossless decomposition is the process in which the original relation R can be regained by
joining the multiple relations formed after decomposition. It is used to remove redundant
data from the database while retaining the useful information. Lossless decomposition
ensures the following:
• While regaining the original relation, no information is lost.
• If we perform a join operation on the sub-divided relations, we must get back the original
relation.
Example:
There is a relation R(A, B, C). We decompose this relation into two sub-relations, R1(A, B)
and R2. After performing the join operation on the sub-relations, we get back the same
original relation.
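A lossless decomposition can be verified in code. In this sketch the tuple values are invented, and R2 is taken as R2(B, C) with B acting as its key, which is what makes the join lossless:

```python
# Original relation R(A, B, C); B uniquely identifies each (B, C) pair here.
R = [("a1", "b1", "c1"), ("a2", "b2", "c2")]

R1 = sorted({(a, b) for a, b, c in R})   # R1(A, B)
R2 = sorted({(b, c) for a, b, c in R})   # R2(B, C)

# Natural join of R1 and R2 on the common attribute B.
joined = sorted((a, b, c) for (a, b) in R1 for (b2, c) in R2 if b == b2)

print(joined)
```

Because the shared attribute B is a key of R2, the join reproduces R exactly, with no tuples lost and none invented.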
Lossy Decomposition
As the name suggests, lossy decomposition means when we perform join
operation on the sub-relations it doesn't result to the same relation which was
decomposed. After the join operation, we always found some extraneous
tuples. These extra tuples genrates difficulty for the user to identify the
original tuples.
Example:
We have a relation R(A, B, C). We decompose it into sub-relations R1 and R2.
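A lossy decomposition can be demonstrated the same way. In this sketch the tuple values are invented and chosen so that the shared attribute B is a key of neither sub-relation:

```python
# Original relation R(A, B, C); b1 pairs with different A and C values,
# so B determines neither A nor C.
R = [("a1", "b1", "c1"), ("a2", "b1", "c2")]

R1 = sorted({(a, b) for a, b, c in R})   # R1(A, B)
R2 = sorted({(b, c) for a, b, c in R})   # R2(B, C)

# Natural join of R1 and R2 on the common attribute B.
joined = sorted((a, b, c) for (a, b) in R1 for (b2, c) in R2 if b == b2)

print(joined)  # four tuples: two original plus two spurious
```

The join yields four tuples instead of the original two; the spurious tuples ("a1", "b1", "c2") and ("a2", "b1", "c1") cannot be distinguished from the genuine ones, which is exactly the problem with a lossy decomposition.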
Properties of Decomposition
• Lossless: Every decomposition performed in a database
management system should be lossless: no information should be
lost when the sub-relations are joined to rebuild the original
relation. Lossless decomposition helps remove redundant data from the database.
• Dependency Preservation: Dependency preservation is an important
property in database management systems. It ensures that the functional
dependencies between attributes are maintained while performing
decomposition. It helps improve database efficiency and maintain
consistency and integrity.
• Lack of Data Redundancy: Data redundancy refers to
duplicate or repeated data. This property states that the
decomposition should not introduce redundant data. It helps us
get rid of unwanted data and focus only on the useful data or
information.
Example of Normalization
Step 1: First Normal Form (1NF)
Step 2: Second Normal Form (2NF)
Step 3: Third Normal Form (3NF)
In 3NF, the table must:
• Be in 2NF.
• Remove transitive dependencies, where non-key attributes depend on other non-key attributes.