Unit 2 Data Modeling
Data Modeling
• Entity Relationship (ER) Model
• Keys
• Extended ER Model
• Relational Model
• Codd's Rules
Data Modelling
• Data modeling in a Database Management System (DBMS) refers to
the process of creating a conceptual representation of data and its
relationships within a database.
• It serves as a blueprint for how data will be structured, stored, and
accessed.
Data Modeling
Data Modeling
Data Modeling
• Entity Relationship (ER) Model
Data modeling
• A data model is a collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints.
• In simple terms, Data models define how the logical structure of a database
is modeled that is Data models define how data is connected to each other
and how they are processed and stored inside the system.
• Data Models are fundamental entities to introduce abstraction in a DBMS.
• Categories of Data Models
• High-level or conceptual data models provide concepts that are close to the
way many users perceive data.
• whereas low-level or physical data models provide concepts that describe
the details of how data is stored in the computer.
Entity Relationship (ER) Model
• ER model stands for an Entity-Relationship model.
• An Entity–relationship model (ER model) describes the structure of a
database with the help of a diagram, which is known as Entity
Relationship Diagram (ER Diagram).
• An ER model is a design or blueprint of a database that can later be
implemented as a database.
• It is a high-level data model. This model is used to define the data
elements and relationship for a specified system.
• It helps developers to design the conceptual design or you
can say the logical design of the system from a data
perspective.
• In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.
Why use ER Diagrams?
• Helps you to define terms related to entity relationship modeling
• Provide a preview of how all your tables should connect, what fields are
going to be on each table
• Helps to describe entities, attributes, relationships
• ER diagrams are translatable into relational tables which allows you to
build databases quickly
• ER diagrams can be used by database designers as a blueprint for
implementing data in specific software applications
• The database designer gains a better understanding of the information to
be contained in the database with the help of ERP diagram
• ERD Diagram allows you to communicate with the logical structure of the
database to users
Entity Relationship (ER) Model
• For example, Suppose we design a school database. In this
database, the student will be an entity with attributes like
address, name, id, age, etc.
• The address can be another entity with attributes like city,
street name, pin code, etc and there will be a relationship
between them.
Entity Relationship (ER) Model
• ER diagrams are created
based on three basic
concepts: entities, attributes,
and relationships.
ER Diagram Symbols
• Rectangles: This Entity Relationship Diagram symbol represents entity types
• Double Rectangles: Weak Entity Sets
• Ellipses : Symbol represent attributes
• Double Ellipses: Represent multi-valued attributes
• Dashed Ellipses: Derived Attributes
• Diamonds: This symbol represents relationship types
• Lines: It links attributes to entity types and entity types with other relationship
types
• Primary key: attributes are underlined
Component of ER Diagram
• Entity:
• An entity may be any object, class,
person or place. In the ER diagram, an
entity can be represented as rectangles.
• For example: In the following ER
diagram we have two entities Student
and College and these two entities have
many to one relationship as many
students study in a single college.
• Strong Entity :
Component of ER Diagram
• Strong entities are those entity
• Weak Entity: types that have a key attribute.
• An entity that depends on another • The primary key helps in
entity called a weak entity.
identifying each entity uniquely.
• The weak entity doesn't contain This can not accept null values.
any key attribute of its own.
• It is represented by a rectangle.
• The weak entity is represented by
a double rectangle.
• For example – a bank account
cannot be uniquely identified
without knowing the bank to which
the account belongs, so bank
account is a weak entity.
Component of ER Diagram
• Attribute
• An attribute describes the property of an entity. An attribute is
represented as Ellipses in an ER diagram.
• For example, id, age, contact number, name, etc. can be attributes of
a student.
• There are four types of attributes:
[Link] attribute
[Link] attribute
[Link] attribute
[Link] attribute
Component of ER Diagram
• Attribute
• 1. Key attribute:
• A key attribute can uniquely identify an
entity from an entity set.
• It represents a primary key.
• For example, student roll number can
uniquely identify a student from a set of
students. Key attribute is represented by
oval same as other attributes however
the text of key attribute is underlined.
Component of ER Diagram
• Attribute
• 2. Composite attribute:
• An attribute that is a combination of
other attributes is known as
composite attribute.
• An attribute that composed of many
other attributes is known as a
composite attribute.
• For example, In student entity, the
student address is a composite
attribute as an address is composed
of other attributes such as pin code,
state, country.
Component of ER Diagram
• Attribute
• 3. Multivalued attribute:
• An attribute that can hold multiple
values is known as multivalued
attribute/An attribute can have more
than one value.
• It is represented with double ovals in
an ER Diagram.
• For example – A student can have
more than one phone number so the
phone number attribute is
multivalued.
Component of ER Diagram
• Attribute
• 4. Derived attribute:
• A derived attribute is one whose value
is dynamic and derived from another
attribute.
• It is represented by dashed oval in an
ER Diagram.
• For example – Person age is a derived
attribute as it changes over time and
can be derived from another attribute
(Date of birth).
Component of ER Diagram
• Relationship
• A relationship is used to describe the
relation between entities. • 1. One to One Relationship
• A relationship is represented by • When a single instance of an
diamond shape in ER diagram. entity is associated with a single
instance of another entity then
• There are four types of it is called one to one
relationships/Cardinality relationship.
1. One to One • For example, a person has only
one passport and a passport is
2. One to Many given to one person.
3. Many to One
4. Many to Many
Component of ER Diagram
• 2. One to Many Relationship
• When only one instance of the entity • 3. Many-to-one relationship
on the left, and more than one • When more than one instance
instance of an entity on the right of the entity on the left, and
associates with the relationship then only one instance of an entity
this is known as a one-to-many on the right associates with the
relationship. relationship then it is known as a
• For example, Scientist can invent many-to-one relationship.
many inventions, but the invention is • For example – many students
done by the only specific scientist. can study in a single college but
• a student cannot study in many
colleges at the same time.
Component of ER Diagram
• 4. Many to Many Relationship
• When more than one instances of an entity is associated with more
than one instances of another entity then it is called many to many
relationship.
• For example, Employee can assign by many projects and project can
have many employees.
How to Create an Entity Relationship Diagram (ERD)
• In a university, a Student enrolls
in Courses. A student must be
assigned to at least one or more
Courses. Each course is taught by
a single Professor. To maintain
instruction quality, a Professor
can deliver only one course.
How to Create an Entity Relationship Diagram (ERD)
• Step 1) Entity Identification
• In a university, a Student
enrolls in Courses. A student • We have three entities
must be assigned to at least • Student
one or more Courses. Each
• Course
course is taught by a single
Professor. To maintain • Professor
instruction quality, a
Professor can deliver only
one course.
How to Create an Entity Relationship Diagram (ERD)
• Step 2) Relationship Identification
• In a university, a Student
enrolls in Courses. A student • We have the following two relationships
must be assigned to at least • The student is assigned a course
one or more Courses. Each • Professor delivers a course
course is taught by a single
Professor. To maintain
instruction quality, a
Professor can deliver only
one course.
How to Create an Entity Relationship Diagram (ERD)
• Step 3) Cardinality Identification
• In a university, a Student
enrolls in Courses. A student • For them problem statement we know that,
must be assigned to at least • A student can be assigned multiple courses
one or more Courses. Each • A Professor can deliver only one course
course is taught by a single
Professor. To maintain
instruction quality, a
Professor can deliver only
one course.
How to Create an Entity Relationship Diagram (ERD)
Entity Primary Key Attribute
Student Student_ID StudentName
ProfessorNam
Professor Employee_ID
e
• Step 4) Identify Attributes Course Course_ID CourseName
• You need to study the files, forms, reports, data currently maintained by the organization to
identify attributes. You can also conduct interviews with various stakeholders to identify entities.
Initially, it’s important to identify the attributes without mapping them to a particular entity.
• Once, you have a list of Attributes, you need to map them to the identified entities. Once the
mapping is done, identify the primary Keys. If a unique key is not readily available, create one.
How to Create an Entity Relationship Diagram (ERD)
Entity Primary Key Attribute
Student Student_ID StudentName
ProfessorNam
Professor Employee_ID
e
• Step 5) Create the ERD Diagram Course Course_ID CourseName
Sample ER Diagram for Student Database
ER to Table
Conversion
Extended ER Model
• EER is a high-level data model that incorporates the
extensions to the original ER model. Enhanced ERD are high
level models that represent the requirements and complexities
of complex database.
• In addition to ER model concepts EE-R includes −
• Subclasses and Super classes.
• Specialization and Generalization.
• Category or union type.
• Aggregation.
Extended ER Model
• Subclasses and Super class
• Super class is an entity that can be divided into further subtype.
• For example − consider Shape super class.
• Super class shape has sub groups: Triangle, Square and Circle.
• Sub classes are the group of entities with some unique attributes.
• Sub class inherits the properties and attributes from super class.
• For example, a "Car" (subclass) inherits attributes from "Vehicle"
(superclass).
Extended ER Model
• Generalization
• Generalization is like a bottom-up approach in which two
or more entities of lower level combine to form a higher
level entity if they have some attributes in common.
• Generalization is more like subclass and superclass
system, but the only difference is the approach.
Generalization uses the bottom-up approach.
• For example, Faculty and Student entities can be
generalized and
• create a higher level entity Person.
Extended ER Model
• Specialization
• Specialization is a top-down approach, and it is opposite to
Generalization.
• In specialization, one higher level entity can be broken down
into two lower level entities.
• Specialization is used to identify the subset of an entity set
that shares some distinguishing characteristics.
• Normally, the superclass is defined first, the subclass and its
related attributes are defined next, and relationship set are
then added.
• For example: In an Employee management system,
EMPLOYEE entity can be specialized as TESTER or DEVELOPER
based on what role they play in the company.
Extended ER Model
• Aggregation
• Aggregation is a process in which a single entity alone is not
able to make sense in a relationship so the relationship of
two entities acts as one entity.
• Aggregation can be defined as a procedure for combining
multiple entities into a single one. In database management,
it is a design system performed to model relationships
between a group of entities and another relationship. Its main
motive is treating these relationships as a single one.
• In real world, we know that a manager not only manages the employee working
under them but he has to manage the project as well. In such scenario if entity
“Manager” makes a “manages” relationship with either “Employee” or “Project”
entity alone then it will not make any sense because he has to manage both. In
these cases the relationship of two entities acts as one entity.
• In our example, the relationship “Works-On” between “Employee” & “Project”
acts as one entity that has a relationship “Manages” with the entity “Manager”.
Extended ER Model
• Category or Union
• Relationship of one super or sub class with more than
one super class.
• Owner is the subset of two super class: Vehicle and House.
ER Model: Keys
• Keys are very important part of Relational database model.
• They are used to establish and identify relationships between tables and
also to uniquely identify any record or row of data inside a table.
• A Key can be a single attribute or a group of attributes, where the
combination may act as a key.
ER Model: Keys
• Why are the Keys Required?
• It is used to uniquely identify any record or row of data from the table.
• A table in a database represents a collection of records. Tables generally extends
to thousands of records stored in them, unsorted and unorganized.
• Now to fetch any particular record from such dataset, you will have to apply some
conditions, but what if there is duplicate data present and every time you try to
fetch some data by applying certain condition, you get the wrong data. How
many trials before you get the right data?
• To avoid all this, Keys are defined to easily identify any row of data in a table.
ER Model : Types of Keys
ER Model : Types of Keys
• Primary Key
• It is the first key used to identify one and only one
instance of an entity uniquely. An entity can contain
multiple keys.
• In the EMPLOYEE table, ID can be the primary key
since it is unique for each employee. In the EMPLOYEE
table, we can even select License_Number and
Passport_Number as primary keys since they are also
unique.
• For each entity, the primary key selection is based on
requirements and developers.
Primary Key:
ER Model : Types of Keys
• 2. Candidate key
• A candidate key is an attribute or set of attributes that
can uniquely identify a tuple.
• Except for the primary key, the remaining attributes are
considered a candidate key. The candidate keys are as
strong as the primary key.
• For example: In the EMPLOYEE table, id is best suited for
the primary key. The rest of the attributes, like SSN,
Passport_Number, License_Number, etc., are considered a
candidate key.
• Primary key column value can not be null, Candidate key
column can have null value.
: Candidate key
ER Model : Types of Keys
• 3. Super Key
• Super key is an attribute set that can uniquely
identify a tuple. A super key is a superset of a
candidate key.
• The super key would be EMPLOYEE-ID (EMPLOYEE_ID,
EMPLOYEE-NAME), etc.
• It is the superset of all such attributes that can
uniquely identify the table. It is the subset or the part
of the Super key.
Super Key:
ER Model : Types of Keys
• 4. Foreign key
• Foreign keys are the column of the table used
to point to the primary key of another table.
• Every employee works in a specific
department in a company, and employee and
department are two different entities. So we
can't store the department's information in
the employee table. That's why we link these
two tables through the primary key of one
table.
• We add the primary key of the DEPARTMENT
table, Department_Id, as a new attribute in the
EMPLOYEE table.
• In the EMPLOYEE table, Department_Id is the
foreign key, and both the tables are related.
Foreign key
• 4. Foreign key
•.
ER Model : Types of Keys
• 5. Alternate key
• The candidate key which are not selected as primary
key are known as secondary keys or alternative keys.
• One key is chosen as the primary key from candidate
keys, and the remaining candidate key, if it exists, is
termed the alternate key.
In other words, the total number of the
alternate keys is the total number of candidate
keys minus the primary key.
• For example, employee relation has two attributes,
Employee_Id and PAN_No, that act as candidate keys.
In this relation, Employee_Id is chosen as the primary
key, so the other candidate key, PAN_No, acts as the
Alternate key.
ER Model : Types of Keys
• 5. Alternate key
•.
ER Model : Types of Keys
• 6. Composite key
• Whenever a primary key consists of more
than one attribute, it is known as a
composite key. This key is also known as
Concatenated Key.
• But the attributes which together form
the Composite key are not a key
independently or individually.
• For example, Score table which stores the
marks scored by a student in a particular
subject.
• In this table student_id and subject_id
together will form the primary key, hence it is
a composite key.
ER Model : Types of Keys
• 7. Artificial key
• The key created using arbitrarily assigned
data are known as artificial keys.
• These keys are created when a primary key is
large and complex and has no relationship
with many other relations.
• The data values of the artificial keys are
usually numbered in a serial order.
• For example, the primary key, which is
composed of Emp_ID, Emp_role, and Proj_ID,
is large in employee relations. So it would be
better to add a new virtual attribute to
identify each tuple in the relation uniquely.
Relational Model
Relational Model
• Relational Model was proposed by E.F. Codd
to model data in the form of relations or
tables.
• After designing the conceptual model of
Database using ER diagram, we need to
convert the conceptual model in the
relational model which can be implemented
using any RDBMS languages like Oracle SQL,
MySQL etc.
• Relational Model represents how data is
stored in Relational Databases. A relational
database stores data in the form of relations
(tables).
Relational Model
• Relational Model was proposed by E.F. Codd to model data in the
form of relations or tables.
• After designing the conceptual model of Database using ER diagram,
we need to convert the conceptual model in the relational model
which can be implemented using any RDBMS languages like Oracle
SQL, MySQL etc.
Conceptual
Relational model
model
ROL
ADDRES AG
L_N NAME PHONE
S E
O
Relational Model 1 RAM DELHI
945512345
1
18
• Consider a relation STUDENT with attributes 2 RAMESH
GURGA
ON
965243154
3
18
• ROLL_NO, NAME, ADDRESS, PHONE and AGE shown in3 Table. SUJIT ROHTAK
915625313
20
1
• IMPORTANT TERMINOLOGIES
4 SURESH DELHI 18
• Attribute: Attributes are the properties that define a relation.
e.g.; ROLL_NO, NAME
• Relation Schema: A relation schema represents name of the relation with its
attributes. e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE and AGE) is
relation schema for STUDENT. If a schema has more than 1 relation, it is called
Relational Schema.
• Tuple: Each row in the relation is known as tuple. The above relation contains
4 tuples, one of which is shown as:
945512345
1 RAM DELHI 18
1
ROLL ADDRES AG
NAME PHONE
Relational Model
_NO S E
1 RAM DELHI 9455123451 18
• Relation Instance: The set of tuples of a relation at a
particular instance of time is called as relation instance. 2 RAMESH
GURGAO
9652431543 18
Table shows the relation instance of STUDENT at a N
particular time. It can change whenever there is 3 SUJIT ROHTAK 9156253131 20
insertion, deletion or updation in the database.
• Degree: The number of attributes in the relation is 4 SURESH DELHI 18
known as degree of the relation. The STUDENT relation
defined above has degree 5.
ROLL_N
• Cardinality: The number of tuples in a relation is known O
as cardinality. The STUDENT relation defined above has
cardinality 4. 1
• Column: Column represents the set of values for a 2
particular attribute. The column ROLL_NO is extracted
from relation STUDENT. 3
• NULL Values: The value which is not known or 4
unavailable is called NULL value. It is represented by
blank space. e.g.; PHONE of STUDENT having ROLL_NO 4
is NULL.
Relational Model
ROLL ADDRES AG
NAME PHONE
_NO S E
• Constraints in Relational Model 1 RAM DELHI 9455123451 18
• While designing Relational Model, we define some conditions
which must hold for data present in database are called 2 RAMESH
GURGAO
N
9652431543 18
Constraints.
• These constraints are checked before performing any 3 SUJIT ROHTAK 9156253131 20
operation (insertion, deletion and updation) in database. If
there is a violation in any of constrains, operation will fail. 4 SURESH DELHI 18
• Domain Constraints: These are attribute level constraints. An attribute can only
take values which lie inside the domain range. e.g,; If a constrains AGE>0 is applied
on STUDENT relation, inserting negative value of AGE will result in failure.
• Key Integrity: Every relation in the database should have at least one set of
attributes which defines a tuple uniquely. Those set of attributes is called key. e.g.;
ROLL_NO in STUDENT is a key. No two students can have same roll number. So a
key has two properties:
• It should be unique for all tuples.
• It can’t have NULL values.
Relational • Constraints in Relational Model
• Referential Integrity: When one attribute of a relation can only take
Model values from other attribute of same relation or any other relation, it
is called referential integrity. Let us suppose we have 2 relations
STUDENT BRANCH
BRANCH_CO
ROLL_NO NAME ADDRESS PHONE AGE BRANCH_CO
DE BRANCH_NAME
DE
1 RAM DELHI 9455123451 18 CS
CS COMPUTER SCIENCE
RAMES
2 GURGAON 9652431543 18 CS IT INFORMATION TECHNOLOGY
H
3 SUJIT ROHTAK 9156253131 20 ECE ELECTRONICS AND COMMUNICATION
ECE
ENGINEERING
SURES
4 DELHI 18 IT CV CIVIL ENGINEERING
H
• BRANCH_CODE of STUDENT can only take the values which are present in
BRANCH_CODE of BRANCH which is called referential integrity constraint.
• The relation which is referencing to other relation is called REFERENCING
RELATION (STUDENT in this case) and the relation to which other relations refer is
called REFERENCED RELATION (BRANCH in this case).
• ANOMALIES
Relational • An anomaly is an irregularity, or something which deviates from the expected or normal
Model state. When designing databases,
anomalies: Insert, Update and Delete.
we identify three types of
• Insertion Anomaly in Referencing Relation:
• We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is not
REFERENCING present in referenced attribute value. e.g.; Insertion of a student with BRANCH_CODE
RELATION ‘ME’ in STUDENT relation will result in error because ‘ME’ is not present in
BRANCH_CODE of BRANCH.
(STUDENT in
this case) • Deletion/ Updation Anomaly in Referenced Relation:
• We can’t delete or update a row from REFERENCED RELATION if value of REFERENCED
ATTRIBUTE is used in value of REFERENCING ATTRIBUTE. e.g; if we try to delete tuple
from BRANCH having BRANCH_CODE ‘CS’, it will result in error because ‘CS’ is referenced
REFERENCED by BRANCH_CODE of STUDENT, but if we try to delete the row from BRANCH with
BRANCH_CODE CV, it will be deleted as the value is not been used by referencing
RELATION relation. It can be handled by following method:
(BRANCH in • ON DELETE CASCADE: It will delete the tuples from REFERENCING RELATION if value used
by REFERENCING ATTRIBUTE is deleted from REFERENCED RELATION. e.g;, if we delete a
this case) row from BRANCH with BRANCH_CODE ‘CS’, the rows in STUDENT relation with
BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be deleted.
• ON UPDATE CASCADE: It will update the REFERENCING ATTRIBUTE in REFERENCING
RELATION if attribute value used by REFERENCING ATTRIBUTE is updated in REFERENCED
RELATION. e.g;, if we update a row from BRANCH with BRANCH_CODE ‘CS’ to ‘CSE’, the
rows in STUDENT relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be
updated with BRANCH_CODE ‘CSE’.
Codd's Rules
Codd's Rules
• E.F. Codd was a Computer Scientist who
invented the Relational model for Database
management. Based on relational model,
the Relational database was created.
• Dr Edgar F. Codd, after his extensive
research on the Relational Model of
database systems, came up with twelve
rules of his own, which according to him, a
database must obey in order to be
regarded as a true relational database.
• These rules can be applied on any database
system that manages stored data using
only its relational capabilities.
• This is a foundation rule, which acts as a
base for all the other rules.
Codd's Rules
• Rule 0: The Foundation Rule
• The database must be in relational form. So that the system can handle the
database through its relational capabilities.
• Rule 1: Information Rule
• A database contains various information, and this information must be
stored in each cell of a table in the form of rows and columns.
• Rule 2: Guaranteed Access Rule
• Every single or precise data (atomic value) may be accessed logically from a
relational database using the combination of primary key value, table
name, and column name.
Codd's Rules
• Rule 3: Systematic Treatment of Null Values
• This rule defines the systematic treatment of Null values in database records.
The null value has various meanings in the database, like missing the data, no
value in a cell, inappropriate information, unknown data and the primary key
should not be null.
Codd's Rules
• Rule 4: Active/Dynamic Online Catalog based on the relational model
• It represents the entire logical structure of the descriptive database that
must be stored online and is known as a database dictionary. It authorizes
users to access the database and implement a similar query language to
access the database.
• Rule 5: Comprehensive Data Sub Language Rule
• The relational database supports various languages, and if we want to access
the database, the language must be the explicit, linear or well-defined syntax,
character strings and supports the comprehensive: data definition, view
definition, data manipulation, integrity constraints, and limit transaction
management operations.
• If the database allows access to the data without any language, it is
considered a violation of the database.
Codd's Rules
• Rule 6: View Updating Rule
• All views table can be theoretically updated and must be practically
updated by the database systems.
• Rule 7: Relational Level Operation (High-Level Insert, Update and delete)
Rule
• A database system should follow high-level relational operations such as
insert, update, and delete in each level or a single row.
• It also supports union, intersection and minus operation in the database
system.
Codd's Rules
• Rule 8: Physical Data Independence Rule
• All stored data in a database or an application must be physically independent to
access the database.
• Each data should not depend on other data or an application. If data is updated
or the physical structure of the database is changed, it will not show any effect on
external applications that are accessing the data from the database.
• Rule 9: Logical Data Independence Rule
• It is similar to physical data independence. It means, if any changes occurred to the
logical level (table structures), it should not affect the user's view (application).
• For example, suppose a table either split into two tables, or two table joins to
create a single table, these changes should not be impacted on the user view
application.
Codd's Rules
• Rule 10: Integrity Independence Rule
• The database should be able to enforce its own integrity rather than using
other programs.
• Key and Check constraints, trigger etc, should be stored in Data Dictionary.
• All entered values should not be changed or rely on any external factor or
application to maintain integrity. This also make RDBMS independent of
front-end.
Codd's Rules
• Rule 11: Distribution Independence Rule
• The distribution independence rule represents a database that must work
properly, even if it is stored in different locations and used by different end-
users.
• Suppose a user accesses the database through an application; in that case,
they should not be aware that another user uses particular data, and the data
they always get is only located on one site. This lays the foundation
of distributed database.
Codd's Rules
• Rule 12: Non Subversion Rule
• The non-subversion rule defines RDBMS as a SQL language to store and
manipulate the data in the database.
• If a system has a low-level or separate language other than SQL to access the
database system, it should not subvert or bypass integrity to transform data.
Thank You