0% found this document useful (0 votes)
18 views142 pages

1 DBMS

The document provides an overview of database systems and database management systems (DBMS), detailing their types (structured, unstructured, semi-structured) and advantages over traditional file systems, such as reduced redundancy and improved data integrity. It discusses the architecture of DBMS, including components like storage management and transaction management, and describes various data models, including relational, hierarchical, and object-oriented models. The document emphasizes the importance of data abstraction, schemas, and the role of data models in organizing and defining data relationships.

Uploaded by

sandip
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views142 pages

1 DBMS

The document provides an overview of database systems and database management systems (DBMS), detailing their types (structured, unstructured, semi-structured) and advantages over traditional file systems, such as reduced redundancy and improved data integrity. It discusses the architecture of DBMS, including components like storage management and transaction management, and describes various data models, including relational, hierarchical, and object-oriented models. The document emphasizes the importance of data abstraction, schemas, and the role of data models in organizing and defining data relationships.

Uploaded by

sandip
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 142

Database

System

Database DBMS

• Structured • SQL Server


• Unstructured • Oracle
• Semi Structured • MySql
• Db2
Introduction
• Database: Systematic Collection of interrelated data is database
• Structured – RDBMS – Pre Defined Structured : E.g. Bank, Business, University
,E-Commerce, Airline Data
• Unstructured – no Pre defined structured , Multimedia content: Digital
photos, audio, and video files are all unstructured.
• Semi Structured - emails, XML
• DBMS:
• Collection of interrelated data and set of programs to access the data is called
DBMS,
• is a collection of programs designed to define, manipulate, retrieve and
manage data in a database.
• DBMS contains information about a particular enterprise

• DBMS Goal: DBMS provides an environment that is both convenient and efficient
to use.
Drawbacks of using file systems to store data:
o Data redundancy and inconsistency : Multiple file o Concurrent access by multiple users:
formats, duplication of information in different files o Concurrent accessed needed for
o Difficulty in accessing data: Need to write a new program performance.
to carry out each new task o Uncontrolled concurrent accesses
can lead to inconsistencies.
o Data isolation — multiple files and formats
o E.g. two people reading a balance
o Integrity problems: and updating it at the same time
o Integrity constraints (e.g. account balance > 0) become o Security problems
part of program code .
o Hard to provide user access to some,
o Hard to add new constraints or change existing ones but not all, data
o Atomicity of updates:
o Failures may leave database in an inconsistent state with Database systems offer
partial updates carried out.
o E.g. transfer of funds from one account to another solutions to all the above
should either complete or not happen at all
problems
Advantages of DBMS over file system
• There are several advantages of Database management system over file system. Few of
them are as follows:
• No redundant data –by data normalization
• Data Consistency - data normalization takes care of it too , A DBMS schedules
concurrent access to the data in such a manner that only one user can access the
same data at a time which helps in maintaining consistency as well
• Integrity - The DBMS implies integrity constraints to get a high level of protection
against prohibited access to data.
• Secure – Each user has a different set of access , E.g. application programmers never
exposed to details of data representation and storage .
• Privacy – Limited access.
• Easy access to data : uses various powerful functions to store and retrieve data
efficiently, Uniform administration procedures for data
• Easy recovery
• Flexible - DBMS offers a variety of techniques to store & retrieve data, DBMS serves
as an efficient handler to balance the needs of multiple applications using the same
data
DBMS vs. Flat File
DBMS Flat File Management System
• Multi-user access • It does not support multi-user access
• Design to fulfill the need for small and • It is only limited to smaller system.
large businesses
• Remove redundancy and Integrity • Redundancy and Integrity issues
• Expensive. But in the long term Total Cost • It's cheaper
of Ownership is cheap
• Easy to implement complicated • No support for complicated transactions
transactions
• Data Isolation Problem Not exist • Data isolation problems

• Concurrent access by multiple users • Maintaining Concurrent access by


multiple users is very hard
Characteristics of DBMS
• Provides security and removes redundancy
• Self-describing nature of a database system
• Insulation between programs and data abstraction
• Support of multiple views of the data
• Sharing of data and multiuser transaction processing
• DBMS allows entities and relations among them to form tables.
• It follows the ACID concept ( Atomicity, Consistency, Isolation, and
Durability).
• DBMS supports multi-user environment that allows users to access
and manipulate data in parallel.
Database Engine/ Major Components of DBMS
• Storage manager
• Query processing and optimization Manager
• Transaction and Concurrency Manager
• Crash Recovery Manager
Storage Management
• Storage manager is a program module that provides the interface
between the low-level data stored in the database and the
application programs and queries submitted to the system.
• The storage manager is responsible to the following tasks:
• Interaction with the OS file manager
• Efficient storing, retrieving and updating of data
• Issues:
• Storage access
• File organization
• Indexing and hashing
Storage Management
o interface between the low-level data stored in the database and the application programs and queries
submitted to the system. The storage manager is responsible for the interaction with the file manager

o The storage manager components include

 Authorization and integrity manager,

 Transaction manager

 File manager

 Buffer manager
o The storage manager implements several data structures as part of the physical system implementation:

 Data files, which store the database itself.

 Data dictionary, which stores metadata about the structure of the database, in particular the schema
of the database

 Indices, which can provide fast access to data items


Transaction Management
• What if the system fails?
• What if more than one user is concurrently updating the same
data?
• A transaction is a collection of operations that performs a single
logical function in a database application
• Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g.,
power failures and operating system crashes) and transaction
failures.
• Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the database.
Database System Internals
View of Data and Data Abstraction
• A database system is a collection of interrelated data and a set of programs that
allow users to access and modify these data.
• A major purpose of a database system is to provide users with an abstract view
of the data.
• That is, the system hides certain details of how the data are stored and
maintained (Data abstraction)

• For the system to be usable, it must retrieve data efficiently. The need for
efficiency has led designers to use complex data structures to represent data in
the database.
• Since many database-system users are not computer trained, developers hide
the complexity from users through several levels of abstraction, to simplify
users’ interactions with the system:
Levels of Abstraction
An architecture for a database system

computer users see a set of


application programs
that hide details of the data types.

Programmers using a programming language work at this


level of abstraction

the database system hides


Database administrators,may be aware of certain details of
many of the lowest-level
the physical organization of the data
storage details from
database programmers.
Fig: the relationship among the three levels of abstraction
Levels of Abstraction
• Physical level:
• describes how a record is stored. Describes how the data are actually stored. The physical level
describes complex low-level data structures in detail. Record can be described as a block of
consecutive storage locations
• Logical level:
• describes what data are stored in the database, and what relationships exist among those data.
Describes data stored in database, and the relationships among the data.
type instructor = record
ID : string;
name : string;
dept_name : string;
salary : integer;
end;

• View level:
• application programs hide details of data types. Views can also hide information (such as an
employee’s salary) for security purposes.
• describes only part of the entire database.
• Many users of the database system do not need all this information; instead, they need to
access only a part of the database.
• The view level of abstraction exists to simplify their interaction with the system. The system may
provide many views for the same database.
Levels of Abstraction and Physical data independence
• Although implementation of the simple structures at the logical
level may involve complex physical-level structures, the user of the
logical level does not need to be aware of this complexity.
• This is referred to as physical data independence. Database
administrators, who must decide what information to keep in the
database, use the logical level of abstraction.
Instances and Schemas
• Similar to types and variables in programming languages. A
database schema corresponds to the variable declarations (along
with associated type definitions) in a program.
• Each variable has a particular value at a given instant. The values of
the variables in a program at a point in time correspond to an
instance of a database schema.
• Instance – the actual content of the database at a particular point in
time (Analogous to the value of a variable). Databases change over
time as information is inserted and deleted
• Schema - The overall design of the database is called the database
schema. Schemas are changed infrequently, if at all
Instances and Schemas
• Database systems have several schemas, partitioned according to
the levels of abstraction.
• Logical Schema –
• the overall logical structure of the database .
• describes the database design at the logical level
• Example: The database consists of information about a set of customers
and accounts in a bank and the relationship between them
• Analogous to type information of a variable in a program
• Physical schema
• the overall physical structure of the database
• describes the database design at the physical level
• Subschemas:
• A database may also have several schemas at the view level, sometimes
called subschemas, that describe different views of the database
Instances and Schemas
• Physical Data Independence
• the ability to modify the physical schema without changing the logical
schema
• Applications depend on the logical schema
• The physical schema is hidden beneath the logical schema, and can usually
be changed easily without affecting application programs.
• In general, the interfaces between the various levels and components
should be well defined so that changes in some parts do not seriously
influence others.
Data Models : Database Perspectives
• This data model is a conceptual representation of Data objects, the
associations between different data objects, and the rules
• Data Models ensure consistency in naming conventions, default
values, semantics, security
• Based on the Database design we can classify the Data model into
following Perspectives
• Conceptual Data Model/ Design
• Logical Model /Design
• Physical Model/ Design

• Data modeling is the process of creating a data model for the data to be stored in
a database.
Conceptual Data Models/ Perspective
• earlier stages of data modeling to organize and define concepts and
rules based on use-case requirements.
• This is often achieved via entity relationship diagrams (ERD) and/or
object-Relational models (ORM)
• technology- and application-independent.
• Define and communicate high-level relationships between
concepts/entities
Logical Data Models/ Perspective
• data model of a specific problem domain expressed independently of a
particular database management product or storage technology but in terms of
data structures such as relational tables and columns, object-oriented classes,
or XML tags
• A logical data model is constructed by taking the data descriptions depicted in a
conceptual data model and data’s structure.
• The nature of relationships between data is established and defined, and data
from different systems is normalized.
• For example, a logical data model would specify the nature of a data element,
i.e., account name (string), account number (integer).
Physical Data Models/ Perspective
• introduces the database-specific context . models should be built in relation to a
specific database management system (DBMS) as well as the specific
requirements of the processes
• It represents the tables, columns, data types, views, constraints, indices and
procedures within the database.
• The consideration of technological context means physical data models reflect
the needs of the technological environment as is, or as intended.
Conceptual Data Model

Logical Data Model

Physical Data Model


Data Models
• A collection of conceptual tools for describing Data , Data relationships, Data
semantics, Data constraints
• a way to describe the design of a database at the physical, logical, and view levels
• A Database model defines the logical design and structure of a database and defines
how data will be stored, accessed and updated in a database management system.
• Different categories of Data Models
• Relational model
• Entity-Relationship data model (mainly for database design)
• Object-based data models (Object-oriented and Object-relational)
• Semistructured data model (XML)
• Other older models:
• Network model
• Hierarchical model
Relational Database model
 In this model, data is organised in two-dimensional tables and the relationship is maintained by
storing a common field. This model was introduced by E.F Codd in 1970, and since then it has
been the most widely used database model,
 The basic structure : tables. All the information related to a particular type is stored in rows of
that table. Hence, tables are also known as relations in relational model.
 based on normalizing data in the rows and columns of the tables. In normalized relations, values
saved are atomic values.
 Each row in a relation contains a unique value.
 Each column in a relation contains values from a same domain.
 Relational model can represent as a table with columns and rows. Each row is known as a tuple.
Each table of the column has a name or attribute.
 Relational model stored in fixed structures and manipulated using SQL (Structured Query
Language).
 DB2 , Oracle, SQL Server and MS Access , MySQL etc are based on relational data model
Relational Model Columns
• The relational model is an
example of a record-based
Rows model.
• Record-based models are so
named because the database
is structured in fixed-format
records of several types.
• Each table contains records of
a particular type.
• Each record type defines a
fixed number of fields, or
attributes.
• The columns of the table
correspond to the attributes of
the record type
Hierarchical Database model
• In a Hierarchical database, model data is organized in a tree-like structure. Data is Stored Hierarchically (top
down or bottom up) format. Data is represented using a parent-child relationship.
• The hierarchy starts from the Root data, and expands like a tree, adding child nodes to the parent nodes.
• In hierarchical model, data is organized into tree-like structure with one one-to-many relationship between
two different types of data, for example, one department can have many courses, many professors and of-
course many students.
• In Hierarchical DBMS parent may have many children, but children have only one parent.
• E.g Data related to Registry in
Windows system
Network Model
• The network database model allows each child to have multiple parents.
• It helps you to address the need to model more complex relationships like as the
orders/parts many-to-many relationship.
• In this model, entities are organized in a graph which can be accessed through several
paths.
• This is an extension of the Hierarchical model. In
this model data is organized more like a graph,
and are allowed to have more than one parent
node

• This database model was used to map many-to-


many data relationships.
• This was the most widely used database model,
before Relational Model was introduced
Entity Relationship (ER) Model
• relationships are created by dividing object of interest into entity and its
characteristics into attributes. Different entities are related using
relationships
• Is used to model the Conceptual/Logical view of the system from data
perspective which consists of : entities, relationships, attributes.
• Entities: Entity is a real-world thing. It can be a person, place, or even a concept.
Example: Teachers, Students, Course, Building, Department, etc are some of the
entities of a School Management System
• Attributes: An entity contains describing property called attribute. This is the
characteristics of that attribute. Example: The entity teacher has the property like
teacher id, salary, age, etc.
• Relationship: tells how two attributes are related. Example: Teacher works for a
department.
• The ER model also has an associated diagrammatic representation, the ER
diagram, which can express the overall logical structure of a database graphically
Entity Relationship (ER) Model
Object-Oriented Data Model
• The real-world problems are more closely represented through the object-oriented data
model.
• In this model, both the data and relationship are present in a single structure known as an
object.
• We can store audio, video, images, etc in the database which was not possible in the
relational model(although you can store audio and video in relational database, it is
adviced not to store in the relational database).
• In this model, two are more objects are connected through links. We use this link to
relate one object to other objects.
Object-based data models/Object-Oriented Model
• In Object-oriented Model data stored in the form of objects. The structure which
is called classes which display data within it.
• It defines a database as a collection of objects which stores both data members
values and operations.
• An object database is a database management system in which information is
represented in the form of objects as used in object-oriented programming.
• Object databases are different from relational databases which are table-
oriented.
• Object-oriented database management systems (OODBMSs) also called ODBMS
(Object Database Management System) combine database capabilities with
object-oriented programming language capabilities
Object-Relational Model
• The object-relational data model combines features of the object-oriented data model
and relational data model
• As the name suggests it is a combination of both the relational model and the object-
oriented model.
• This model was built to fill the gap between object-oriented model and the relational
model.
• We can have many advanced features like we can make complex data types according to
our requirements using the existing data types.
• The problem with this model is that this can get complex and difficult to handle. So, proper
understanding of this model is required.
Semi-Structured Model
• Semi-structured model is an evolved form of the relational model. We cannot differentiate between
data and schema in this model.
• Example: Web-Based data sources which we can't differentiate between the schema and data of the
website.
• In this model, some entities may have missing attributes while others may have an extra attribute.
• Semi-structured data contains tags and elements (Metadata) which is used to group data and
describe how the data is stored
• This model gives flexibility in storing the data. It also gives flexibility to the attributes.
• Example: If we are storing any value in any attribute then that value can be either atomic value or a
collection of values. Similar entities are grouped together and organised in a hierarchy.
• Some Examples of Such data are :
• XML and other markup languages
• Web pages
• E-mails
Database Language
• A DBMS has appropriate languages and interfaces to express database queries and updates.
• Database languages can be used to read, store and update the data in the database.

DDL stands for Data Definition Language.


It is used to define database structure or pattern.
Operation : Create, Alter, Drop, Truncate, Rename

DML stands for Data Manipulation Language. It is used for


accessing and manipulating data in a database
Operation : Select, Insert, update, delete

DCL stands for Data Control Language. It is used to control


access to data stored in a database (Authorization).
Operation: Grant, Revoke, DENY

Transaction Control Language


TCL is used to run the changes made by the DML statement.
TCL can be grouped into a logical transaction.
Operation: Commit, Rollback
Database Language
• Database system provides a
• data-definition language (DDL) to specify the database schema and
• a data-manipulation language(DML) to express database queries and updates.
• In practice, the data-definition and data-manipulation languages are not two
separate languages; instead they simply form parts of a single database language, such
as the widely used SQL language.
• A query is a statement requesting the retrieval of information.
• The portion of a DML that involves information retrieval is called a query language.
Although technically incorrect, it is common practice to use the terms query language
and data-manipulation language synonymously.
Data Definition Language (DDL)
• Specification notation for defining the database schema
• specify a database schema by a set of definitions expressed by a special
language called a data-definition language (DDL). The DDL is also used to specify
additional properties of the data.
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
• DDL compiler generates a set of table templates stored in a data dictionary
• Data dictionary contains metadata (i.e., data about data)
• Database schema
• Integrity constraints
• Primary key (ID uniquely identifies instructors)
• Authorization
• Who can access what
Data Definition Language (DDL)
• We specify the storage structure and access methods used by the
database system by a set of statements in a special type of DDL called a
data storage and definition language.
• The data values stored in the database must satisfy certain consistency
constraints.
• For example, suppose the university requires that the account balance of a
department must never be negative.
• The DDL provides facilities to specify such constraints.
• Domain Constraints
• domain of possible values must be associated with every attribute (for example,
integer types, character types, date/time types).
• Domain constraints are the most elementary form of integrity constraint. They are
tested easily by the system whenever a new
• data item is entered into the database.
Data Definition Language (DDL)
• Referential Integrity.
• to ensure that a value that appears in one relation for a given set of attributes also
appears in a certain set of attributes in another relation (referential integrity).
• For example, the department listed for each course must be one that actually
exists.
• More precisely, the dept name value in a course record must appear in the dept
name attribute of some record of the department relation.
• Database modifications can cause violations of referential integrity.
• When a referential-integrity constraint is violated, the normal procedure is to reject
the action that caused the violation.
• Assertions.
• An assertion is any condition that the database must always satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions.
Data Definition Language (DDL)
• Authorization.
• We may want to differentiate among the users as far as the type of access
they are permitted on various data values in the database.
• These differentiations are expressed in terms of authorization, the most
common being: read authorization, which allows reading, but not
modification, of data; insert authorization, which allows insertion of new
data, but not modification of existing data; update authorization, which
allows modification, but not deletion, of data; and delete authorization,
which allows deletion of data.
• We may assign the user all, none, or a combination of these types of
authorization.
Data Manipulation Language (DML)
• Language for accessing and manipulating the data organized by the
appropriate data model
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database

• DML also known as query language


• A query is a statement requesting the retrieval of information. The portion of
• DML that involves information retrieval is called a query language
• Although technically incorrect, it is common practice to use the terms query
language and data-manipulation language synonymously.
• The query processor component of the database system translates DML
queries into sequences of actions at the physical level of the database system
Data Manipulation Language (DML)
• There are basically two types:
• Procedural DMLs require a user to specify what data are needed and
how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require
a user to specify what data are needed without specifying how to get
those data (E.g. SQL)
• Declarative DMLs are usually easier to learn and use than are
procedural DMLs.
• However, since a user does not have to specify how to get the data,
the database system has to figure out an efficient means of accessing
data
DCL
• DCL commands are used for access control and permission management for users in the
database. With them we can easily allow or deny some actions for users on the tables or
records (row level security).
• DCL commands are:
• GRANT – We can give certain permissions on the table (and other objects) for certain
users of database,
• DENY – bans certain permissions from users.
• REVOKE – with this command we can take back permission from users.
• For Example GRANT can be used to give privileges to user to do SELECT, INSERT, UPDATE
and DELETE on a specific table or more than one tables.
• With REVOKE command we can take back privilege to default or revoking specific
command like update or delete based on requirements.
DCL
• Example:
Grant SELECT,INSERT,UPDATE,DELETE on Employee To User1
Revoke INSERT On Employee To user1
Deny Update On Employee to user1
• GRANT in first case we gave privileges to user User1 to do SELECT, INSERT, UPDATE and
DELETE on the table called employees.
• REVOKE with this command we can take back privilege to default one, in this case, we
take back command INSERT on the table employees for user User1.
• DENY is a specific command. We can conclude that every user has a list of privilege which
is denied or granted so command DENY is there to explicitly ban you some privileges on
the database objects.:
ER Model
 ER Model is used to model the logical view of the system from data perspective
 The ER data mode was developed to facilitate database design by allowing specification of an
enterprise schema that represents the overall logical structure of a database.
 Models an enterprise as a collection of entities ,attributes and relationships
o Entity: a “thing” or “object” in the enterprise that is distinguishable from other
objects
 Described by a set of attributes
 Either a physical existence such as a particular person, office, house or car.
 Or a conceptual existence such as a school, a university, a company or a job.
o Relationship: an association among several entities
 The ER data model employs three basic concepts:
o entity sets,
o relationship sets,
o attributes.
ER Model
In ER diagram,
• Attributes are associated with an entity set.
• Attributes describe the properties of entities
in the entity set.
• Based on the values of certain attributes, an
entity can be identified uniquely.
ER Model

•A strong entity set is an entity set that contains


•Two strong entity sets “Student” and “Course” are
sufficient attributes to uniquely identify all its entities.
related to each other.
•In other words, a primary key exists for a strong entity
set.

• Double line between Student and relationship set signifies total participation.
• It suggests that each student must be enrolled in at least one course.
• Single line between Course and relationship set signifies partial participation.
• It suggests that there might exist some courses for which no enrollments are made.
ER Model
• A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify
its entities.
• In other words, a primary key does not exist for a weak entity set.
• it contains a partial key called as a discriminator.
• Discriminator is represented by underlining with a dashed line.
ER Model

Primary key of Apartment

= Primary key of Building + Its own discriminator


= Building number + Door number
ER diagram
• The ER diagram, can express the overall logical structure of a
database graphically.
• ER diagrams are a visual tool which is helpful to represent the ER
model
• ER model is a conceptual data modeling tool
Entity, Entity Type, Entity Set
• An Entity is an object of Entity Type and set of all entities is called as entity set. e.g.; E1 is an
entity having Entity Type Student and set of all students is called Entity Set. In ER diagram, Entity
Type is represented as:

 An entity is an object that exists and is distinguishable from other objects.


o Example: specific person, company, event, plant
 An entity set is a set of entities of the same type that share the same properties.
o Example: set of all persons, companies, trees, holidays
 An entity is represented by a set of attributes; i.e., descriptive properties possessed by all members of an entity set.
o Example: instructor = (ID, name, street, city, salary ) , course= (course_id, title, credits)
Attribute(s):
• Attributes are the properties which define the entity type
• For example, Roll_No, Name, DOB, Age, Address, Mobile_No are the attributes which defines entity type
Student. In ER diagram, attribute is represented by an oval.

3. Multivalued Attribute

1. Key Attribute: uniquely identifies each entity

4. Derived Attribute

2. Composite Attribute: composed of many other attribute


Relationship Type and Relationship Set:
• A relationship type represents the association between entity types

For example,‘Enrolled in’ is a relationship type


that exists between entity type Student and
Course.
In ER diagram, relationship type is represented
by a diamond and connecting the entities with
lines.

A set of relationships of same type is known as relationship set. The following relationship set depicts S1
is enrolled in C2, S2 is enrolled in C1 and S3 is enrolled in C3.
Degree of a relationship set:
• The number of different entity sets participating in a relationship set is called as degree of a relationship set.

• Unary Relationship

• Binary Relationship: Most relationships are binary.

• n-ary Relationship

When there are n entities set participating in a relation, the relationship is called as n-ary
relationship.
o Relationships between more than two entity sets are rare.
o Example: students work on research projects under the guidance of an instructor.
o relationship proj_guide is a ternary relationship between instructor, student, and
project
Instructor Student
Project
Ternary Relationship
• A Suppler can supplies many parts in a particular project
Mapping Cardinality:
• The number of times an entity of an entity set participates in a relationship set
is known as cardinality.
• Express the number of entities to which another entity can be associated via a
relationship set.
 We express cardinality constraints by drawing either a directed line (),
signifying “one,” or an undirected line (—), signifying “many,” between the
relationship set and the entity set.
• Cardinality can be of different types:
o One to one
o One to many
o Many to one
o Many to many
Mapping Cardinalities

One to one One to many

Note: Some elements in A and B may not be mapped to any


elements in the other set
Mapping Cardinalities

Many to one Many to many

Note: Some elements in A and B may not be mapped to any


elements in the other set
One to one

Using Sets, it can be represented as:


One to one

EID Ename age EID (fk) DID (fk) DID Dname Loc
1 A 20 1 1 1 Fin KTM
2 B 25 3 2 2 IT Lalit
3 C 20 2 3 3 HR BKT
4 A 31
Works Department
5 B 23
EID Ename age DID
Employee 1 A 20 1
The Primary key of relation table will be Either EID or DID (primary key of any tables) 2 B 25 3
Can I reduce the table? 3 C 20 2
Yes – Just put DID in the Employee Table or Put EID in Department Table
4 A 31 -
5 B 23 -
One to Many

IID (fk) SID (fk) SID Sname Cre.


IID Iname salary
1 A 2
1 A 20 1 1
2 B 3
2 B 25 1 2
3 C 2
3 C 20 2 3
4 A 3
4 A 31 3 4
5 B 5
5 B 23 4 5
6 A 3
4 6
Instructor Student
Advisor
What will be the Primary Key of Advisor ?  SID will be the primary Key (Primary key of Many Side)

Can we Reduce the Above Advisor Table ? Yes (If we combine IID to to Student Table Then we can remove
Advisor table )
One to Many

IID Iname salary SID Sname Cre. IID


1 A 20 1 A 2 1 This field is from Instructor
2 B 25 2 B 3 1
3 C 20 3 C 2 2
4 A 31 4 A 3 3
5 B 23 5 B 5 4
6 A 3 4
Instructor

Student Table after combining relationship table (advisor)


Many-to-One Relationships
• In a many-to-one relationship between an instructor and a student,
• an instructor is associated with at most one student via advisor,
• and a student is associated with several (including 0) instructors via advisor
Many-to-Many Relationship
• An instructor is associated with several (possibly 0) students via
advisor
• A student is associated with several (possibly 0) instructors via
advisor
Many to many

RollNO Name Age


RollNO CID
1 A 20
1 1 CID Name Credit
2 B 25
2 2 1 A 20
3 A 20
1 2 2 B 25
4 C 31
2 1 3 C 20
5 D 23
3 3 4 A 31

What will be the Primary key of Relation table ? campsite/combined key form RollNO and CID
(Non of the single column either CID or RollNO in Enrolled IN table are unique )
Can I reduce the table ? No Reduction of table
What is the minimum no of tables required to represent this E-R model into Relational Model?

A)2
B)3
C)4
D)5

T1=E1
T2=R1E2
T3=R2 ( Combining PK A and B )
More On ERD: ER- Notation
ER- Notation
Alternative ER Notations
Alternative ER Notations
Chen IDE1FX (Crows feet notation)
ERD for University Database
ERD for Order System
Schema Diagram for University
Database
• Example : In a university, a Student enrolls in Courses. A student
must be assigned to at least one or more Courses. Each course is
taught by a single Professor. To maintain instruction quality, a
Professor can deliver only one course.
Relational Model Concepts
A relational database is based on the relational model and uses a collection of tables to
represent both data and the relationships among those data.
It also includes a DML and DDL.
Most commercial relational database systems employ the SQL language
Relational Model Concepts
Attribute: Each column in a Table. Attributes are the properties which define a relation. e.g.,
Student_Rollno, NAME,etc.
Tables – In the Relational model the, relations are saved in the table format. It is stored along
with its entities. A table has two properties rows and columns. Rows represent records and
columns represent attributes.
Tuple –a single row of a table, which contains a single record.
Relation Schema: A relation schema represents the name of the relation with its attributes.
Degree: The total number of attributes which in the relation is called the degree of the relation.
Cardinality: Total number of rows present in the Table.
Column: The column represents the set of values for a specific attribute.
Relation instance – Relation instance is a finite set of tuples in the RDBMS system. Relation
instances never have duplicate tuples.
Relation key - Every row has one, two or multiple attributes, which is called relation key.
Attribute domain – Every attribute has some pre-defined value and scope which is known as
attribute domain
Relational model

•Basic Database Operation in Relational Model


• Insert is used to insert data into the relation
• Delete is used to delete tuples from the table.
• update allows you to change the values of some attributes in existing
tuples.
• Select allows you to choose a specific range of data.
Relational Model : Relational Integrity Constraints
• Relational Integrity constraints in DBMS are referred to conditions which
must be present for a valid relation.
• These Relational constraints in DBMS are derived from the rules in the mini-
world that the database represents.
• Mostly divided into three main categories are:
1.Domain Constraints
2.Key Constraints
3.Referential Integrity Constraints
Relational Model
• Domain Constraints
• Domain constraints can be violated if an attribute value is not appearing in the
corresponding domain or it is not of the appropriate data type.
• Domain constraints specify that within each tuple, and the value of each attribute
must be unique.
• This is specified as data types which include standard data types integers, real
numbers, characters, Booleans, variable length strings, etc.
• Create DOMAIN CustomerName CHECK (value not NULL)
• The example shown demonstrates creating a domain constraint such that
CustomerName is not NULL
Relational Model
• Key Constraints
• An attribute that can uniquely identify a tuple in a relation is called the key of
the table. The value of the attribute for different tuples in the relation has to
be unique
• In the given table, CustomerID is a key attribute of Customer Table.
• It is most likely to have a single key for one customer, CustomerID =1 is only
for the CustomerName =" Google".

CustomerID CustomerName Status


1 Google Active
2 Amazon Active
3 Apple Inactive
Relational model
• Referential Integrity Constraints
Referential Integrity constraints in DBMS are
based on the concept of Foreign Keys.
A foreign key is an important attribute of a
relation which should be referred to in other
relationships.
Referential integrity constraint state
happens where relation refers to a key
attribute of a different or same relation.
However, that key element must exist in the
table.

In example, we have 2 relations, Customer


and Billing.
Tuple for CustomerID =1 is referenced twice
in the relation Billing.
So we know CustomerName=Google has
billing amount $300
Relational model

Keys in DBMS
• Key is an attribute or set of attributes which helps you to identify a row(tuple) in a
relation(table).
• They allow you to find the relation between two tables.
• Keys help you uniquely identify a row in a table by a combination of one or more
columns in that table.
• Key is also helpful for finding unique record or row from the table.
Types of keys in DBMS
• Super Key
• Super key is a set of one or more than one keys that can be used to identify a record uniquely in a table.
• Example : Primary key, Unique key, Alternate key are subset of Super Keys.
• Candidate Key - minimal superkeys are called Candidate Key
• A Candidate Key is a set of one or more fields/columns that can identify a record uniquely in a table.
There can be multiple Candidate Keys in one table. Each Candidate Key can work as Primary Key.
• Example: In below diagram ID, RollNo and EnrollNo are Candidate Keys since all these three fields
can be work as Primary Key.
• Primary Key –
• Primary key is a set of one or more fields/columns of a table that uniquely identify a record in database
table. It can not accept null, duplicate values.
• Unique+Not Null
• In a one table only one primary key can exist (i.e no more than one primary key in a table)
Keys in DBMS
• Foreign Key
• Foreign Key is a field in database table that is Primary key in another table.
• It can accept multiple null, duplicate values.
• Example: We can have a DeptID column in the Employee table which is pointing to
DeptID column in a department table where it a primary key.
• Alternate key
• An Alternate key is a key that can be work as a primary key. Basically it is a candidate
key that currently is not primary key.
• Example: In below diagram RollNo and EnrollNo becomes Alternate Keys when we
define ID as Primary Key.
• Composite/Compound Key
• Composite Key is a combination of more than one fields/columns of a table. It can be
a Candidate key, Primary key.
Keys in DBMS

Minimal Super Key
Super Keys :
1. {Emp_ID}
2. {Adhar_no}
3. {Email_id}
4. {Emp_id, Adhar_no}
5. {Adhar_card, Email_id}
6. {Emp_id, Email_id}
7. {Emp_id,Adhar_No, Email_id}
8. {Emp_id, Name}
9. {Emp_id, Dept_id
10. {Emp_id,Name,Adhar_no,Email_id,Dept}
11. Etc ,……
Primary Key:
1. {Emp_ID}

Alternate Key Key:


1. {Adhar_no}
2. {Email_id}

Alternate Key=Candidate Key- Primary Key


Types of keys in DBMS: Super Key
• A superkey is a set of one or more attributes
that, taken collectively, allow us to identify
uniquely a tuple in the relation.
• For example, the ID attribute of the relation
instructor is sufficient to distinguish one
instructor tuple from another. Thus, ID is a
superkey.
• The name attribute of instructor, on the other
hand, is not a superkey, because several
instructors might have the same name.
• Superkey may contain extraneous attributes.
For example, the combination of ID and name
is a superkey for the relation instructor.
Types of keys in DBMS :candidate Key
• minimal superkeys are called candidate
keys.
• Super key with no redundant attributes
• It is possible that several distinct sets of
attributes could serve as candidate Key
• Suppose that a combination of name and
dept name is sufficient to distinguish
among members of the instructor
relation.
• Then, both {ID} and {name, dept name}
are candidate keys.
• Although the attributes ID and name
together can distinguish instructor tuples,
their combination, {ID, name}, does not
form a candidate key, since the attribute
ID alone is a candidate key.
• Candidate Key cant be null
Types of keys in DBMS: Primary Key
• Term primary key to denote a
candidate key that is chosen by the
database designer as the principal
means of identifying tuples within a
relation.
• A key (whether primary, candidate,
or super) is a property of the entire
relation, rather than of the
individual tuples
• Primary keys must be chosen with care.
As we noted, the name of a person is
obviously not sufficient, because there
may be many people with the same name
• The primary key should be chosen such
that its attribute values are never, or very
rarely, changed
• Primary Key must be unique and cant be
NULL
• Primary key must be given a value when
a new record is inserted in database
Types of keys in DBMS : Foreign Key
• A relation, say r1, may include among its attributes the
primary key of another relation, say r2. This attribute is
called a foreign key from r1, referencing r2. An attribute
(or set of attributes ) in one table that refers to the
primary key in the another table
• The relation r1 is also called the referencing relation of
the foreign key dependency, and r2 is called the
referenced relation of the foreign key.
• Purpose: to ensure referential integrity of the data
• For example, the attribute dept name in instructor is a
foreign key from instructor, referencing department, since
dept name is the primary key of department.
• Used to link two tables
Ans: C
Three tables of Three Entities
and 1 for multivalued attribute
Ans: a
COUNT(expression) does not
count NULL values
7. Which of the following is NOT a super Key in a relational
schema with attributes V, W,X,Y,Z and primary key VY
a) VXYZ
b) VWXZ Ans: B (V and Y both does not contains in VWXZ .
c) VWXY
d) VWXYZ
Functional Dependency
• Functional Dependency is when one attribute determines another attribute in a DBMS system
• Attribute B is functionally dependent upon attribute A (or collection of attributes) if a value of A determines
single value of B at any one time

Notation: A B
Read as: A determines B or B is functionally dependent on A
Here A is determinant and B is dependent attribute
• Functional Dependency plays a vital role to find the difference between good and bad database design

Sid Sname Case 2 In Case1 First and Second Ram is


Case 1 different
1 Ram 1 Ram
But in Case2 Ram is same
2 Ram 1 Ram Which can be determined by
Case 4 Determinant
Case 3
1 Ram In case 3 : Two values are different
1 Ram In case 4: Not valid (Poor Design)
2 Shyam 1 Shyam
Functional Dependency
Key Terms Description
Axiom Axioms is a set of inference rules used to infer all the
functional dependencies on a relational database.
Decomposition It is a rule that suggests if you have a table that appears to
contain two entities which are determined by the same
primary key then you should consider breaking them up into
two different tables.
Dependent It is displayed on the right side of the functional dependency
diagram.
Determinant It is displayed on the left side of the functional dependency
Diagram.
Union It suggests that if two tables are separate, and the PK is the
same, you should consider putting them together
Functional Dependency
Employee number Employee Name Salary City
1 Dana 50000 San Francisco
2 Francis 38000 London
3 Andrew 25000 Tokyo

In this example, if we know the value of Employee number, we can obtain Employee Name, city, salary,
etc.
By this, we can say that the city, Employee Name, and salary are functionally depended on
Employee number.
Which of the following
Which of the following A B C
X Y Z FD is not correct?
FD are satisfied 1 2 3 A) AB
1 4 3
A) XYZ, ZY B) BC
4 2 3
1 5 3 B) XZX, YZ, C) BCA
C) YZX, ZX 5 3 3
4 6 3 D) ACB
D) XZY, YX
3 2 2
Full Functional Dependency
For ABCD { D is fully FD on ABC } , D cannot dependent on any
subset of ABC
BCD Is not possible because
BC cannot determine D
CD C cannot determine D
A D A can not determine D

• Only ABC determine A


If proper subset of left-hand side determines the non unique attribute on right hand side then There is a partial
Dependency
E.g. ABCD
BCD
CD
A D
Here BC is a is a proper subset of ABC and BC is determining D so it’s a partial dependency
Full Functional Dependency
• Only relevant with composite Determinants
• If it is necessary to use all the attributes of the composite Determinant to identify
it’s object uniquely it is full functional dependency
• An attribute is fully functional dependent on another attribute, if it is Functionally
Dependent on that attribute and not on any of its proper subset.

Partial Functional Dependency


• Only relevant with composite Determinants
• If it isn’t necessary to use all the attributes of the composite Determinant
to identify it’s object uniquely
• Subset of Determinant can identify the object uniquely
Here Proficiency is depends upon the two
attributes StudnetID and Activity , which is
not depend upon only one attribute.
StudentID and Activity combined
Determines Proficiency
Types of Functional Dependencies
 Multivalued dependency
 Transitive dependency:
 Trivial functional dependency:
 Non-trivial functional dependency:
Multivalued dependency
• Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table
• For a dependency XY ,if for a single value of X , multiple value of Y exist ,
then the relation have multivalued dependency
• The relation should have atleast three attributes (XY, XZ)
• The attributes Y and Z should be independent for X Y ( or independent
for each other )

• Point to Remember : Whenever two independent attributes have 1 to


many (1: N ) relationship AB and AC are mixed in the same
relation , an MVD may arise
• MVD always leads to a bad database design
Ename ProjName DepName
Ram DBMS Sita It is quite Possible Because
Ram Java Khusi One Person may works in Many projects
And One person may have many Dependent
Ram DBMS Khusi
Ram Java Sita
• EnameProjName
• Ename DepName

If we Join this table with Same(Self Join) then how many rows it will create
??
It generates too much duplicate data
Fir first two rows there is
Ename ProjName DepName additional(redundant) two rows
Ram DBMS Sita Generated
Fr this case 4 redundant rows generated
Ram Java Sita
and we cant remove them from this
Ram DBMS Khusi relation design
Ram Java Khisi
It’s a problem
……
 Multivalued dependency
multi-valued dependency.
Car_model Maf_year Color E.G A person can have multiple phone
no and multiple email , Then we should
H001 2017 Metallic make two table one contains Name and
H001 2017 Green email , other Name with phone .
H005 2018 Metallic
H005 2018 Blue
H010 2015 Metallic
H033 2012 Gray
In this example, maf_year and color are independent of each other but dependent on
car_model. In this example, these two columns are said to be multivalue dependent on
car_model.
This dependence can be represented like this:
car_model  maf_year
car_model colour
Consider attributes A,B and C, Where AB and
Transitive dependency: BC FD are transitive also have the FD AC
‘C ‘ is transitively dependent on A through B
Occurs when there is Intermediate dependency
A transitive is a type of functional dependency which happens when it is
indirectly formed by two functional dependencies.

{Company}  {CEO} (if we know the compay, we know its CEO's name)
{CEO }  {Age} If we know the CEO, we know the Age
Therefore according to the rule of rule of transitive dependency:
{ Company}  {Age} should hold, that makes sense because if we know the
company name, we can know his age.

Note: You need to remember that transitive dependency can only occur in a relation of three or more
attributes.
Transitive dependency:
Trivial Functional dependency:
• The Trivial dependency is a set of attributes which are called a trivial if the set of attributes are
included in that attribute.
• So, X  Y is a trivial functional dependency if Y is a subset of X.
• This FD is always valid because Y is subset of X
• LHS Ո RHS is not Empty (null set)
• For example:

SID Sname
AS555 Harry
AS811 George
AS999 Kevin
Consider this table with two columns SID and Sname.
{SID, Sname}  SID is a trivial functional dependency as SID is a subset of {SID, Sname}.
Non trivial functional dependency in DBMS
• Functional dependency which also known as a nontrivial dependency occurs when AB holds
true where B is not a subset of A.
• In a relationship, if attribute B is not a subset of attribute A, then it is considered as a non-trivial
dependency.
• LHS Ո LHS is always Empty (null set)
• In case of Non trivial FD we must check the validity of the given dependency
• SIDSname,
• SIDPhoneNO,
• SIDSemester
Example:
(Company} {CEO} (if we know the Company, we knows the CEO name)
But CEO is not a subset of Company, and hence it's non-trivial functional dependency.
Properties of FD
• Reflexive : if Y is subset of X then XY ( trivial )
• E.g SIDSID (always valid, no need to check )
• Augmentation : if XY , then XZYZ
• SIDSname then
• SID PhoneNO  Sname PhoneNO
• Transitive : if XY and YZ then XZ
• E.g. SIDSname and Sname City then we can say that SIDCity
• Union : if XY and XZ then XYZ
• Decomposition: if XYZ then XY and XZ
• But XY then XZ, YZ is not possible

• Psedutransitivity : XY, WYZ then WXZ

• Composition: if XY, ZW then XZYW


• For the Given relation R(ABCD) FD are given as { AB, BC,
To Find the Candidate Key Compute
CD } .
Closure of Given Functional Dependency
A+ =BCDA
The Candidate key Will be : A B+ =BCD
Prime Attribute are : A C+ =CD
Non Prime Attribute : { B,C,D } D+ =C

(AB)+= ABCD
- it’s a Super Key Not a Primary Key

• For the Given relation R(ABCD) FD are given as { AB, BC,


CD, DA } .
To Find the Candidate Key Compute
Closure of Given Functional Dependency
The Candidate key Will be : {A,B,C,D} A+ =ABCD
Prime Attribute are : {A,B,C,D} B+ =ABCD
Non Prime Attribute : { Ø } C+ =ABCD
D+ =ABCD
8 For the Given relation R(ABCD) FD are given as { AB, BC, CD } . The Candidate key Will be
A. A only
B. AB
C. C
D. D

Ans: A

9. For the Given relation R(ABCD) FD are given as


{ AB, BC, CD,DA } . The Candidate key
Will be
A. A only
B. A,B,C,D
C. A or B only
D. A or B or C
E. None
Ans: B
Database Normalization
 Process of removing redundant data to improve storage efficiency, data integrity
and scalability
 Normalization is used to minimize the redundancy from a relation or set of
relations.
 It is also used to eliminate the undesirable characteristics like Insertion, Update
and Deletion Anomalies.
 Normalization divides the larger table into the smaller table and links them using
relationship.
Normalization is used for mainly two purposes,
 Eliminating redundant(useless) data.
 Ensuring data dependencies make sense i.e data is logically stored.
Problems Without Normalization
• If a table is not properly normalized and have data redundancy then it will not
only take extra memory space but will also make it difficult to handle and
update the database, without facing data loss.
• Insertion, Updation and Deletion Anomalies are very frequent if database is
not normalized. To understand these anomalies let us take an example of a
Student table.

rollno name Branch hod office_tel


401 Akon CSE Mr. X 53337
402 Bkon CSE Mr. X 53337
403 Ckon CSE Mr. X 53337
404 Dkon CSE Mr. X 53337
In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields branch,
hod(Head of Department) and office_tel is repeated for the students who are in the same branch in the college,
this is Data Redundancy.
Anomalies
Undesirable Side Effects That Occur If Relations are Not in
Proper Normal Form
• Insertion Anomalies
• Deletion Anomalies
• Update Anomalies
SID SNAME AGE SID SNAM CID CNam FID Fname Salary
1 Ram 20 E e
2 Varun 25 1 Ram C1 DBMS F1 John 30000
1 Ram 20 2 Ravi C2 JAVA F2 Bob 40000
Row Level duplicacy ((row 1 3 Nitin C1 DBMS F1 John 30000
and 3 is exactly same) 4 Amit C1 DBMS F1 John 30000
 can be removed by
setting primary key

Column Level duplicacy  Here no two rows


are exactly same but there is duplicay on
column (cid, Cname, FID, Fname, Salary)
 column level duplicacy creates anomalies
SID SNAM CID CNam FID Fname Salary
E e
1 Ram C1 DBMS F1 John 30000
2 Ravi C2 JAVA F2 Bob 40000
3 Nitin C1 DBMS F1 John 30000
4 Amit C1 DBMS F1 John 30000

Insertion Anomaly: If you want to add new course lets say C10 or Faculty F3 you cant
add because you cant set SID ( what will be the SID??) – you cant leave SID null because
it’s a primary key

Deletion Anomaly : Lets say given a query remove the database of roll no 1 . Here is no
problem
But if you delete a data of roll no 2 , what will happens ?? Then there is no information of
who was teaching roll no 2, what was the course enrolled by roll no )

Update anomaly: lets say update the name of roll no 4,  no Problem here
But if we change the salary of F1 from 30000 to 40000 then what will happens?? There is
only one faculty F1 but it updates all the rows in the database which takes huge time
If we divide the table into multiple table , the anomalies can be removed , which is called Normalization
SID Sname

CID Cname

FID Fname Salary


Normal Form Description
1NF A relation is in 1NF if it contains an atomic value.
(no multivalued attribute)
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
functional dependent on the primary key.
(No partial Dependency)
3NF A relation will be in 3NF if it is in 2NF and no transition dependency exists.
BCNF 3 NF+ LHS must be Candidate Key or Super Key
4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-
valued dependency.
E.G A person can have multiple phone no and multiple email , Then we should
make two table one contains Name and email , other Name with phone .
5NF A relation is in 5NF if it is in 4NF and not contains any join dependency and joining
should be lossless. ( lossless decomposition) – No lossy decomposition
Also calle Project-Join Normal Form (PJNC)
DKNF The constraints are verified by the domain and key constraints
Domain-Key Normal Form is the highest form of Normalization.
Normal Forms Types(1)
First Normal Form (1NF) : For a table to be in the First Normal Form, it
should follow the following 4 rules:
• It should only have single(atomic) valued attributes/columns.
• Values stored in a column should be of the same domain
• All the columns in a table should have unique names.
• And the order in which data is stored, does not matter.
• Second Normal Form (2NF)
• For a table to be in the Second Normal Form,
• It should be in the First Normal form.
• And, it should not have Partial Dependency.
• Third Normal Form (3NF)
• A table is said to be in the Third Normal Form when,
• It is in the Second Normal form.
• And, it doesn't have Transitive Dependency.
Normal Forms Types (2)
• Boyce and Codd Normal Form (BCNF)
• Boyce and Codd Normal Form is a higher version of the Third Normal form.
• This form deals with certain type of anomaly that is not handled by 3NF.
• A 3NF table which does not have multiple overlapping candidate keys is said
to be in BCNF.
• For a table to be in BCNF, following conditions must be satisfied:
• R must be in 3rd Normal Form
• and, for each functional dependency ( X → Y ), X should be a super Key.
• Fourth Normal Form (4NF)
• A table is said to be in the Fourth Normal Form when,
• It is in the Boyce-Codd Normal Form.
• And, it doesn't have Multi-Valued Dependency.
Domain Key Normal Form(DKNF)
• Which of the following is TRUE?
(A) Every relation in 2NF is also in BCNF
(B) A relation R is in 3NF if every non-prime attribute of R is fully functionally dependent
on every key of R
(C) Every relation in BCNF is also in 3NF
(D) No relation can be in both BCNF and 3NF
First Normal Form
Table should not contain any multivalued RollNO Name Course1 Course2
Attribute 1 Sai C C++
2 Harse Java Null
RollNO Name Course
3 Awatar C DBMS
1 Sai C/C++
Primary Key : RollNO
2 Harse Java
3 Awatar C/DBMS
This table is not in 1NF

RollNO Name RollNO Course


RollNO Name Course 1 Sai 1 C
1 Sai C 2 Harse 1 C++
1 Sai C++ 3 Awatar 2 Java
2 Harse Java Primary Key : RollNO 3 C
3 Awatar C 3 DBMS
3 Awatar DBMS Primary Key :
RollNO,Course
Primary Key:{ RollNo, Course} This table is in 1NF Foreign Key: RollNO
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
CustomerID StoreID Location • Here , CustomerID and
2NF= 1 1 Kathamndu StoreID Combined should
• 1NF+ All the non prime 1 3 Bhaktapur
be a Candidate key ( primary
Key)
attributes should be 2 1 Kathmandu • Location is non prime
fully functional 3 2 Lalitpur attribute , which should not
4 3 Bhaktapur be determined by other
dependent on non prime attribute
Candidate key • But here StoreID itself is
determining the Location
so the table is not in 2NF
StoreID Location
CustomerID StoreID
1 Kathmandu {CustomerID,StoreID}Location
1 1
2 Lalitpur
1 3 StoreIDLocation – It’s a Problem
3 Bhaktapur
2 1 (it’s a partial dependency )
3 2 Primary Key : StoreID

4 3
These tables are in second normal form
Primary Key : CustomerID+ StoreID
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can
teach more than one subject.
TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE


25 Chemistry 30
In the given table, non-prime attribute
25 Biology 30
TEACHER_AGE is dependent on TEACHER_ID
47 English 35
which is a proper subset of a candidate key. That's why
83 Math 38
it violates the rule for 2NF.
83 Computer 38

To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL TEACHER_SUBJECT
TEACHER_ID SUBJECT
TEACHER_ID TEACHER_AGE 25 Chemistry
25 30 25 Biology
47 35 47 English
83 38 83 Math
83 Computer
Third Normal Form
- 2NF+ There should not be no transitive dependency
RollNo State City
1 Bagmati Kathmandu
Here Candidate Key : RollNO
2 Lumbini Butwal
3 Bagmati Kathmandu
Prime Attribute: RollNO
4 Lumbini Butwal
Non Prime Attribute: State, City
5 Gandaki Pokhara
Given Functional Dependency:

RollNoState
StateCity
Then RollNO City ( here RollNO is determining City through State) – it’s a transitive
dependency (because it is not given in the FD it is derived from RollNoStateCity)

Her RollNo State is ok ,


but State City is not in proper form (non prime is determining non prime)
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial
function dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

EMPLOYEE_DETAIL
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
Super key in the table above:
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, E
MP_NAME, EMP_ZIP}....so on
444 Lan 60007 US Chicago
Candidate key: {EMP_ID}
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal Non-prime attributes: In the given table, all
attributes except EMP_ID are non-prime

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-
prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule
of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with
EMP_ZIP as a Primary key.
EMPLOYEE EMPLOYEE_ZIP
EMP_ID EMP_NAME EMP_ZIP EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010
201010 UP Noida
333 Stephan 02228
02228 US Boston
444 Lan 60007
60007 US Chicago
555 Katharine 06389
06389 UK Norwich
666 John 462007
462007 MP Bhopal
RollNO Name VoterID Age  BCNF is the advance version of 3NF. It is stricter
1 Ravi K0123 20 than 3NF.
2 Varun M034 21  A table is in BCNF if every functional dependency
3 Ravi K786 23 X → Y, X is the super key/candidate of the table.
4 Rahul D286 21  For BCNF, the table should be in 3NF, and for every
FD, LHS is super key.

Given a FD
C.K={RollNo, VoterID}
FD
RollNoName
RollNOVoterID
VoetrIDage
VoterIDROllNO

Is Given Table is in BCNF ?


Yes
• For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
• Example: Let's assume there is a company where employees work in more than one department.
• EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO


264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

• In the above table Functional dependencies are as follows:


• EMP_ID → EMP_COUNTRY
• EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP-ID, EMP-DEPT}


• The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
EMP_COUNTRY table:
To convert the given table into BCNF, we decompose it into
EMP_ID EMP_COUNTRY
three tables:
264 India
364 UK

EMP_DEPT table:
EMP_DEPT EMP_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549

EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 283
264 300
364 232
364 549
Fourth normal form (4NF)
 A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
 For a dependency A → B, if for a single value of A, multiple values of B exists, then the relation will be a
multi-valued dependency.
The given STUDENT table is in 3NF, but the COURSE and HOBBY are
STU_ID COURSE HOBBY
two independent entity. Hence, there is no relationship between COURSE
21 Computer Dancing
and HOBBY.
21 Math Singing
In the STUDENT relation, a student with STU_ID, 21 contains two
34 Chemistry Dancing courses, Computer and Math and two hobbies, Dancing and Singing.
74 Biology Cricket So there is a Multi-valued dependency on STU_ID, which leads to
59 Physics Hockey unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two
tables:

STUDENT_COURSE
STUDENT_HOBBY
STU_ID COURSE
STU_ID HOBBY
21 Computer
21 Dancing
21 Math
21 Singing
34 Chemistry
34 Dancing
74 Biology
74 Cricket
59 Physics
59 Hockey
Ename ProjName DepName
Ram DBMS Sita It is quite Possible Because
Ram Java Khusi One Person may works in Many projects
And One person may have many Dependent
Ram DBMS Khusi
Ram Java Sita
• EnameProjName
• Ename DepName
• The above table is in BCNF(Ename is Super Key in both Functional
Dependency) but not in 4NF because of multivalued dependency. So
Lets Divide above table into two tables

EMP_Proj
EMP_Dep
Ename ProjName
Ename DepName
Ram DBMS
Ram Sita
Ram Java
Ram Khusi
Lossless Decomposition
• If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
• The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
• The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.
Dependency Preserving
• In the dependency preservation, at least one decomposed table must satisfy every
dependency.
• If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
• For example, suppose there is a relation R (A, B, C, D) with functional dependency set
(ABC). The relational R is decomposed into R1(ABC) and R2(AD) which is
dependency preserving because FD ABC is a part of relation R1(ABC).
Find the value of C if the value of A=1
5NF (4NF+Lossless Join) Here It List the
Rows which has
For this case we have to join Two
tables
B B Common
Cross Product Natural Join= cross product+
A B C condition (equivalence of
Common attribute )
1 2 1
2 2 2 Here 1 2 2 and
2 2 1 are
3 3 2
inconsistency Select R2.C
Lets Divide into Two table from R2
Sporous tuples ? Natural Join R1
A B B C
It selects all the
1 2 2 1 rows which are
2 2 2 2 R1.B=R2.B
3 3 3 2 Select R2.C from R2
Natural Join R1 where
(AB) (BC) R1.A=1

Here for one value of A (1) there is two value of C (1 and 2) in the above table
it’s a lossy decomposition because there is inconsistency (in this case redundancy )
• For Lossless Join
• Common Attribute should be Candidate Key or Super Key of
either R1 or R2 or Both .
• In previous example we take B as common attribute but which is
not a candidate key
• If we Divide Table into
• R1(AB)
• R2(AC)
• Condition :
• R1 Union R2=R
• R1 Intersection R2 must not be Null (Empty )
• Common Attribute should be Candidate Key or Super Key of
either R1 or R2 or Both .
• After that we will get lossless join (where there will no spurious
tuples)
1. Consider the relation R(ABCDE) and its Functional dependencies: ABC, CE, BD, EA
The relation R is decomposed into R1(BCD) and R2(ACE).
Which of the following option is correct about decomposition
a) Decomposition is lossy and dependency preserving
b) Decomposition is lossless and dependency preserving
c) Decomposition is lossy and dependency not preserving
d) Decomposition is lossless and not dependency preserving

2. The Maximum number of super keys in the relation R(A,B,C,D) with


A as the Key is
a) 5
b) 7
c) 8
d) 6

Note : Maximum number of possible super keys for a table with


‘n’ attributes =2𝑛−1

You might also like