SJ - Dbms Unit-2 Material
SJ - Dbms Unit-2 Material
UNIT-2
SYLLABUS:
Relational database model - logical view of data-keys -Integrity rules - relational set operators -
data dictionary - Codd's rules. Entity relationship model - ER diagram
Table: STUDENT
ROLL_NO NAME ADDRESS PHONE AGE BRANCH_CODE
4 SURESH DELHI 18 IT
Table: BRANCH
BRANCH_CODE BRANCH_NAME
CS COMPUTER SCIENCE
IT INFORMATION TECHNOLOGY
ELECTRONICS AND
ECE
COMMUNICATION ENGINEERING
CV CIVIL ENGINEERING
Explanation: BRANCH_CODE of STUDENT can only take the values which are present in
BRANCH_CODE of BRANCH which is called referential integrity constraint. The relation
which is referencing another relation is called REFERENCING RELATION (STUDENT in
this case) and the relation to which other relations refer is called REFERENCED RELATION
(BRANCH in this case).
Anomalies in the Relational Model
An anomaly is an irregularity or something which deviates from the expected or normal state.
When designing databases, we identify three types of anomalies: Insert, Update, and Delete.
Insertion Anomaly in Referencing Relation: We can’t insert a row in REFERENCING
RELATION if referencing attribute’s value is not present in the referenced attribute
value. e.g.; Insertion of a student with BRANCH_CODE ‘ME’ in STUDENT relation will
result in an error because ‘ME’ is not present in BRANCH_CODE of BRANCH.
Deletion/ Updation Anomaly in Referenced Relation:
Set operations form the foundation of SQL and enable us to combine, compare, and filter data
from multiple sources. These operations are indispensable for tasks ranging from data
integration and cleansing to advanced analytics and reporting.
Set operations in SQL are techniques for combining or comparing the results of two or more
SELECT statements. They act like mathematical set operations, letting us find the union,
intersection, or difference between the rows returned by our queries. This makes them
indispensable when analyzing data from multiple sources or perspectives.
Here's a quick overview of the core set operations:
UNION: Merges all unique rows from two or more SELECT statements, eliminating
duplicates.
UNION ALL: Merges all rows from two or more SELECT statements, keeping duplicates.
INTERSECT: Returns only the rows that appear in both SELECT statements.
EXCEPT: Returns rows from the first SELECT statement that don't appear in the second.
Set operators are specialized commands or symbols used to perform operations on the result
sets of multiple SELECT queries. They enable us to perform tasks like finding the union (all
rows), intersection (shared rows), and difference (unique rows) between different datasets.
Set operators are essential tools for data manipulation and analysis within SQL. They provide a
powerful means of querying and processing data from relational databases.
Set operators in SQL offer a practical implementation of these concepts within a database
environment. They enable us to perform set operations like union, intersection, and difference
directly on the result sets of SQL queries.
While relational algebra provides a formal foundation for database operations, set operators in
SQL offer a standardized, user-friendly interface for data manipulation tasks.
Understanding the relationship between set operators and relational algebra operations provides
insights into the theoretical underpinnings of SQL. It enables users to leverage these concepts
effectively in database querying and analysis tasks.
UNION
INTERSECT
EXCEPT (or MINUS in some dialects)
These operators mathematically correspond to the concepts of union, intersection, and set
difference.
The UNION operator
The UNION operator combines the results of two or more SELECT queries into a single result
set, removing duplicate rows by default.
For example, suppose we have two tables, employees and contractors, each with similar
columns such as contractors, department, and salary. For learning purposes, let’s consider these
two dummy tables:
John HR 55000
Employees
Contractors
We can combine the results from both tables using the following command:
John HR 55000
Notice that Carol, who appears in both tables, is only listed once in the result. If we wanted to
keep both instances of Carol (with her different salaries), we would use UNION ALL.
It's important to remember that the UNION operator does not remove NULL values. If a
column contains NULL values in one result set and non-NULL values in the corresponding
column of another result set, the NULL values will be retained in the final result set produced
by the UNION operator.
If we want to include NULL values in the result set and prevent their removal by the UNION
operator, we can use the UNION ALL operator instead. This operator combines the results of
multiple SELECT queries, including all rows from each result set, regardless of whether they
are duplicates or contain NULL values.
The INTERSECT operator returns only the rows that appear in both result sets. Think of it as
finding the people who belong to both groups.
Let’s use INTERSECT to query our tables above. For the sake of the example, let’s just query
the name and department columns:
name department
Carol Engineering
The INTERSECT operator handles NULL values based on standard comparison rules,
considering NULL values equal when comparing corresponding columns. It also results in an
empty set when dealing with empty result sets.
In other words, if a NULL value is present in one result set and the corresponding column in
the other result set contains a non-NULL value, the rows are not considered equal – they will
not be included in the intersection result.
Additionally, If one of the result sets provided to the INTERSECT operator is empty (i.e., it
contains no rows), the overall result of the INTERSECT operation will also be empty since
there are no common rows between an empty set and any other set.
The EXCEPT operator retrieves the rows present in the first result set but not in the second.
John HR 55000
The EXCEPT operator also follows standard comparison rules for handling NULL values. Its
behavior with empty result sets results in an empty set if the first result set is empty or includes
all rows from the first result set if the second result set is empty.
Set Operators: Performance and Optimization
The impact of set operators on query performance in SQL can vary depending on factors such
as the size of the datasets involved, the complexity of the queries, and the database
management system (DBMS) used.
Let's break down the key factors and strategies for optimization.
When working with large amounts of data, set operators can significantly impact query
performance because the size of the result sets that need to be combined, intersected, or
compared increases the processing time required to perform that operation.
Complex queries containing multiple subqueries, joins, or set operators may result in additional
processing overhead and impact query performance. Chained operations or nested set
operations could further exacerbate performance consequences.
Proper indexing of the columns involved in set operations can significantly improve query
performance. Indexes help the database engine quickly locate and retrieve the relevant rows,
reducing the need for full-table scans and improving query execution times.
To enhance the performance of queries involving set operators, database administrators, and
developers can utilize optimization techniques like query rewriting, query plan analysis, and
database schema optimization. Techniques like query caching and materialized views can also
be used to precompute and store the results of complex queries, reducing the computational
overhead of set operations.
The performance of set operations may vary depending on the underlying database engine and
its optimization capabilities. Different DBMSs may employ different optimization strategies
and algorithms for processing set operations, leading to variations in performance.
The availability of hardware resources such as CPU, memory, and disk I/O also influences the
performance of queries involving set operators. Adequate hardware resources can help mitigate
performance bottlenecks and ensure efficient query execution.
E.F Codd was a Computer Scientist who invented the Relational model for Database
management. Based on relational model, the Relational database was created. Codd proposed
13 rules popularly known as Codd's 12 rules to test DBMS's concept against his relational
model. Codd's rule actualy define what quality a DBMS requires in order to become a
Relational Database Management System(RDBMS). Till now, there is hardly any commercial
product that follows all the 13 Codd's rules. Even Oracle follows only eight and half(8.5) out
of 13. The Codd's 12 rules are as follows.
Rule zero
This rule states that for a system to qualify as an RDBMS, it must be able to manage database
entirely through the relational capabilities.
Each unique piece of data(atomic value) should be accesible by : Table Name + Primary
Key(Row) + Attribute(column).
Null has several meanings, it can mean missing data, not applicable or no value. It should be
handled consistently. Also, Primary key must not be null, ever. Expression on NULL must give
null.
Database dictionary(catalog) is the structure description of the complete Database and it must
be stored online. The Catalog must be governed by same rules as rest of the database. The same
query language should be used on catalog as used to query database.
One well structured language must be there to provide all manners of access to the data stored
in the database. Example: SQL, etc. If the database allows access to the data without the use of
this language, then that is a violation.
All the view that are theoretically updatable should be updatable by the system as well.
There must be Insert, Delete, Update operations at each level of relations. Set operation like
Union, Intersection and minus should also be supported.
Rule 8: Physical Data Independence
The physical storage of data should not matter to the system. If say, some file supporting table
is renamed or moved from one disk to another, it should not effect the application.
If there is change in the logical structure(table structures) of the database the user view of data
should not change. Say, if a table is split into two tables, a new view should give result as the
join of the two tables. This rule is most difficult to satisfy.
The database should be able to enforce its own integrity rather than using other programs. Key
and Check constraints, trigger etc, should be stored in Data Dictionary. This also
make RDBMS independent of front-end.
A database should work properly regardless of its distribution across a network. Even if a
database is geographically distributed, with data stored in pieces, the end user should get an
impression that it is stored at the same place. This lays the foundation of distributed database.
If low level access is allowed to a system it should not be able to subvert or bypass integrity
rules to change the data. This can be achieved by some sort of looking or encryption.
Introduction of ER Model
The Entity-Relationship Model (ER Model) is a conceptual model for designing a databases.
This model represents the logical structure of a database, including entities, their attributes
and relationships between them.
Entity: An objects that is stored as data such as Student, Course or Company.
Attribute: Properties that describes an entity such as StudentID, CourseName,
or EmployeeEmail.
Relationship: A connection between entities such as "a Student enrolls in a Course".
The graphical representation of this model is called an Entity-Relation Diagram (ERD).
ER Model in Database Design Process
We typically follow the below steps for designing a database for an application.
Gather the requirements (functional and data) by asking questions to the database users.
Create a logical or conceptual design of the database. This is where ER model plays a
role. It is the most used graphical representation of the conceptual design of a database.
After this, focus on Physical Database Design (like indexing) and external design (like
views)
Why Use ER Diagrams In DBMS?
ER diagrams represent the E-R model in a database, making them easy to convert into
relations (tables).
These diagrams serve the purpose of real-world modeling of objects which makes them
intently useful.
Unlike technical schemas, ER diagrams require no technical knowledge of the underlying
DBMS used.
They visually model data and its relationships, making complex systems easier to
understand.
Symbols Used in ER Model
ER Model is used to model the logical view of the system from a data perspective which
consists of these symbols:
Rectangles: Rectangles represent entities in the ER Model.
Ellipses: Ellipses represent attributes in the ER Model.
Diamond: Diamonds represent relationships among Entities.
Lines: Lines represent attributes to entities and entity sets with other relationship types.
Double Ellipse: Double ellipses represent multi-valued Attributes, such as a student's
multiple phone numbers
Double Rectangle: Represents weak entities, which depend on other entities for
identification.
What is an Entity?
An Entity represents a real-world object, concept or thing about which data is stored in a
database. It act as a building block of a database. Tables in relational database represent these
entities.
Example of entities:
Real-World Objects: Person, Car, Employee etc.
Concepts: Course, Event, Reservation etc.
Things: Product, Document, Device etc.
The entity type defines the structure of an entity, while individual instances of that type
represent specific entities.
What is an Entity Set?
An entity refers to an individual object of an entity type, and the collection of all entities of a
particular type is called an entity set. For example, E1 is an entity that belongs to the entity
type "Student," and the group of all students forms the entity set.
In the ER diagram below, the entity type is represented as:
Entity Set
We can represent the entity sets in an ER Diagram but we can't represent individual entities
because an entity is like a row in a table, and an ER diagram shows the structure and
relationships of data, not specific data entries (like rows and columns). An ER diagram is a
visual representation of the data model, not the actual data itself.
Types of Entity
There are two main types of entities:
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute that can uniquely identify each
instance of the entity. A Strong Entity does not depend on any other Entity in the Schema for
its identification. It has a primary key that ensures its uniqueness and is represented by a
rectangle in an ER diagram.
2. Weak Entity
A Weak Entity cannot be uniquely identified by its own attributes alone. It depends on a
strong entity to be identified. A weak entity is associated with an identifying entity (strong
entity), which helps in its identification. A weak entity are represented by a double rectangle.
The participation of weak entity types is always total. The relationship between the weak
entity type and its identifying strong entity type is called identifying relationship and it is
represented by a double diamond.
Example:
A company may store the information of dependents (Parents, Children, Spouse) of an
Employee. But the dependents can't exist without the employee. So dependent will be a Weak
Entity Type and Employee will be identifying entity type for dependent, which means it is
Strong Entity Type.
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute.
For example, Roll_No will be unique for each student. In ER diagram, the key attribute is
represented by an oval with an underline.
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For example,
the Address attribute of the student Entity type consists of Street, City, State, and Country. In
ER diagram, the composite attribute is represented by an oval comprising of ovals.
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example, Phone_No
(can be more than one for a given student). In ER diagram, a multivalued attribute is
represented by a double oval.
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. e.g.; Age (can be derived from DOB). In ER diagram, the derived attribute is
represented by a dashed oval.
The Complete Entity Type Student with its Attributes can be represented as:
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.
Relationship Set
Unary Relationship
2. Binary Relationship: When there are TWO entities set participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship
3. Ternary Relationship: When there are three entity sets participating in a relationship, the
relationship is called a ternary relationship.
4. N-ary Relationship: When there are n entities set participating in a relationship, the
relationship is called an n-ary relationship.
Cardinality in ER Model
The maximum number of times an entity of an entity set participates in a relationship set is
known as cardinality.
2. One-to-Many
In one-to-many mapping as well where each entity can be related to more than one entity. Let
us assume that one surgeon department can accommodate many doctors. So the Cardinality
will be 1 to M. It means one department has many Doctors.
In this case, each student is taking only 1 course but 1 course has been taken by many
students.
4. Many-to-Many
When entities in all entity sets can take part more than once in the relationship cardinality is
many to many. Let us assume that a student can take more than one course and one course
can be taken by many students. So the relationship will be many to many.
many to many cardinality
In this example, student S1 is enrolled in C1 and C3 and Course C3 is enrolled by S1, S3, and
S4. So it is many-to-many relationships.
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation: Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of students will be total. Total
participation is shown by a double line in the ER diagram.
2. Partial Participation: The entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled by any of the students, the participation in the
course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
Every student in the Student Entity set participates in a relationship but there exists a course
C4 that is not taking part in the relationship.
How to Draw an ER Diagram
1. Identify Entities: The very first step is to identify all the Entities. Represent these entities
in a Rectangle and label them accordingly.
2. Identify Relationships: The next step is to identify the relationship between them and
represent them accordingly using the Diamond shape. Ensure that relationships are not
directly connected to each other.
3. Add Attributes: Attach attributes to the entities by using ovals. Each entity can have
multiple attributes (such as name, age, etc.), which are connected to the respective entity.
4. Define Primary Keys: Assign primary keys to each entity. These are unique identifiers
that help distinguish each instance of the entity. Represent them with underlined attributes.
5. Remove Redundancies: Review the diagram and eliminate unnecessary or repetitive
entities and relationships.
6. Review for Clarity: Review the diagram make sure it is clear and effectively conveys the
relationships between the entities.