1 DBMS
1 DBMS
System
Database DBMS
• DBMS Goal: DBMS provides an environment that is both convenient and efficient
to use.
Drawbacks of using file systems to store data:
o Data redundancy and inconsistency : Multiple file o Concurrent access by multiple users:
formats, duplication of information in different files o Concurrent accessed needed for
o Difficulty in accessing data: Need to write a new program performance.
to carry out each new task o Uncontrolled concurrent accesses
can lead to inconsistencies.
o Data isolation — multiple files and formats
o E.g. two people reading a balance
o Integrity problems: and updating it at the same time
o Integrity constraints (e.g. account balance > 0) become o Security problems
part of program code .
o Hard to provide user access to some,
o Hard to add new constraints or change existing ones but not all, data
o Atomicity of updates:
o Failures may leave database in an inconsistent state with Database systems offer
partial updates carried out.
o E.g. transfer of funds from one account to another solutions to all the above
should either complete or not happen at all
problems
Advantages of DBMS over file system
• There are several advantages of Database management system over file system. Few of
them are as follows:
• No redundant data –by data normalization
• Data Consistency - data normalization takes care of it too , A DBMS schedules
concurrent access to the data in such a manner that only one user can access the
same data at a time which helps in maintaining consistency as well
• Integrity - The DBMS implies integrity constraints to get a high level of protection
against prohibited access to data.
• Secure – Each user has a different set of access , E.g. application programmers never
exposed to details of data representation and storage .
• Privacy – Limited access.
• Easy access to data : uses various powerful functions to store and retrieve data
efficiently, Uniform administration procedures for data
• Easy recovery
• Flexible - DBMS offers a variety of techniques to store & retrieve data, DBMS serves
as an efficient handler to balance the needs of multiple applications using the same
data
DBMS vs. Flat File
DBMS Flat File Management System
• Multi-user access • It does not support multi-user access
• Design to fulfill the need for small and • It is only limited to smaller system.
large businesses
• Remove redundancy and Integrity • Redundancy and Integrity issues
• Expensive. But in the long term Total Cost • It's cheaper
of Ownership is cheap
• Easy to implement complicated • No support for complicated transactions
transactions
• Data Isolation Problem Not exist • Data isolation problems
Transaction manager
File manager
Buffer manager
o The storage manager implements several data structures as part of the physical system implementation:
Data dictionary, which stores metadata about the structure of the database, in particular the schema
of the database
• For the system to be usable, it must retrieve data efficiently. The need for
efficiency has led designers to use complex data structures to represent data in
the database.
• Since many database-system users are not computer trained, developers hide
the complexity from users through several levels of abstraction, to simplify
users’ interactions with the system:
Levels of Abstraction
An architecture for a database system
• View level:
• application programs hide details of data types. Views can also hide information (such as an
employee’s salary) for security purposes.
• describes only part of the entire database.
• Many users of the database system do not need all this information; instead, they need to
access only a part of the database.
• The view level of abstraction exists to simplify their interaction with the system. The system may
provide many views for the same database.
Levels of Abstraction and Physical data independence
• Although implementation of the simple structures at the logical
level may involve complex physical-level structures, the user of the
logical level does not need to be aware of this complexity.
• This is referred to as physical data independence. Database
administrators, who must decide what information to keep in the
database, use the logical level of abstraction.
Instances and Schemas
• Similar to types and variables in programming languages. A
database schema corresponds to the variable declarations (along
with associated type definitions) in a program.
• Each variable has a particular value at a given instant. The values of
the variables in a program at a point in time correspond to an
instance of a database schema.
• Instance – the actual content of the database at a particular point in
time (Analogous to the value of a variable). Databases change over
time as information is inserted and deleted
• Schema - The overall design of the database is called the database
schema. Schemas are changed infrequently, if at all
Instances and Schemas
• Database systems have several schemas, partitioned according to
the levels of abstraction.
• Logical Schema –
• the overall logical structure of the database .
• describes the database design at the logical level
• Example: The database consists of information about a set of customers
and accounts in a bank and the relationship between them
• Analogous to type information of a variable in a program
• Physical schema
• the overall physical structure of the database
• describes the database design at the physical level
• Subschemas:
• A database may also have several schemas at the view level, sometimes
called subschemas, that describe different views of the database
Instances and Schemas
• Physical Data Independence
• the ability to modify the physical schema without changing the logical
schema
• Applications depend on the logical schema
• The physical schema is hidden beneath the logical schema, and can usually
be changed easily without affecting application programs.
• In general, the interfaces between the various levels and components
should be well defined so that changes in some parts do not seriously
influence others.
Data Models : Database Perspectives
• This data model is a conceptual representation of Data objects, the
associations between different data objects, and the rules
• Data Models ensure consistency in naming conventions, default
values, semantics, security
• Based on the Database design we can classify the Data model into
following Perspectives
• Conceptual Data Model/ Design
• Logical Model /Design
• Physical Model/ Design
• Data modeling is the process of creating a data model for the data to be stored in
a database.
Conceptual Data Models/ Perspective
• earlier stages of data modeling to organize and define concepts and
rules based on use-case requirements.
• This is often achieved via entity relationship diagrams (ERD) and/or
object-Relational models (ORM)
• technology- and application-independent.
• Define and communicate high-level relationships between
concepts/entities
Logical Data Models/ Perspective
• data model of a specific problem domain expressed independently of a
particular database management product or storage technology but in terms of
data structures such as relational tables and columns, object-oriented classes,
or XML tags
• A logical data model is constructed by taking the data descriptions depicted in a
conceptual data model and data’s structure.
• The nature of relationships between data is established and defined, and data
from different systems is normalized.
• For example, a logical data model would specify the nature of a data element,
i.e., account name (string), account number (integer).
Physical Data Models/ Perspective
• introduces the database-specific context . models should be built in relation to a
specific database management system (DBMS) as well as the specific
requirements of the processes
• It represents the tables, columns, data types, views, constraints, indices and
procedures within the database.
• The consideration of technological context means physical data models reflect
the needs of the technological environment as is, or as intended.
Conceptual Data Model
• Double line between Student and relationship set signifies total participation.
• It suggests that each student must be enrolled in at least one course.
• Single line between Course and relationship set signifies partial participation.
• It suggests that there might exist some courses for which no enrollments are made.
ER Model
• A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify
its entities.
• In other words, a primary key does not exist for a weak entity set.
• it contains a partial key called as a discriminator.
• Discriminator is represented by underlining with a dashed line.
ER Model
3. Multivalued Attribute
4. Derived Attribute
A set of relationships of same type is known as relationship set. The following relationship set depicts S1
is enrolled in C2, S2 is enrolled in C1 and S3 is enrolled in C3.
Degree of a relationship set:
• The number of different entity sets participating in a relationship set is called as degree of a relationship set.
• Unary Relationship
• n-ary Relationship
When there are n entities set participating in a relation, the relationship is called as n-ary
relationship.
o Relationships between more than two entity sets are rare.
o Example: students work on research projects under the guidance of an instructor.
o relationship proj_guide is a ternary relationship between instructor, student, and
project
Instructor Student
Project
Ternary Relationship
• A Suppler can supplies many parts in a particular project
Mapping Cardinality:
• The number of times an entity of an entity set participates in a relationship set
is known as cardinality.
• Express the number of entities to which another entity can be associated via a
relationship set.
We express cardinality constraints by drawing either a directed line (),
signifying “one,” or an undirected line (—), signifying “many,” between the
relationship set and the entity set.
• Cardinality can be of different types:
o One to one
o One to many
o Many to one
o Many to many
Mapping Cardinalities
EID Ename age EID (fk) DID (fk) DID Dname Loc
1 A 20 1 1 1 Fin KTM
2 B 25 3 2 2 IT Lalit
3 C 20 2 3 3 HR BKT
4 A 31
Works Department
5 B 23
EID Ename age DID
Employee 1 A 20 1
The Primary key of relation table will be Either EID or DID (primary key of any tables) 2 B 25 3
Can I reduce the table? 3 C 20 2
Yes – Just put DID in the Employee Table or Put EID in Department Table
4 A 31 -
5 B 23 -
One to Many
Can we Reduce the Above Advisor Table ? Yes (If we combine IID to to Student Table Then we can remove
Advisor table )
One to Many
What will be the Primary key of Relation table ? campsite/combined key form RollNO and CID
(Non of the single column either CID or RollNO in Enrolled IN table are unique )
Can I reduce the table ? No Reduction of table
What is the minimum no of tables required to represent this E-R model into Relational Model?
A)2
B)3
C)4
D)5
T1=E1
T2=R1E2
T3=R2 ( Combining PK A and B )
More On ERD: ER- Notation
ER- Notation
Alternative ER Notations
Alternative ER Notations
Chen IDE1FX (Crows feet notation)
ERD for University Database
ERD for Order System
Schema Diagram for University
Database
• Example : In a university, a Student enrolls in Courses. A student
must be assigned to at least one or more Courses. Each course is
taught by a single Professor. To maintain instruction quality, a
Professor can deliver only one course.
Relational Model Concepts
A relational database is based on the relational model and uses a collection of tables to
represent both data and the relationships among those data.
It also includes a DML and DDL.
Most commercial relational database systems employ the SQL language
Relational Model Concepts
Attribute: Each column in a Table. Attributes are the properties which define a relation. e.g.,
Student_Rollno, NAME,etc.
Tables – In the Relational model the, relations are saved in the table format. It is stored along
with its entities. A table has two properties rows and columns. Rows represent records and
columns represent attributes.
Tuple –a single row of a table, which contains a single record.
Relation Schema: A relation schema represents the name of the relation with its attributes.
Degree: The total number of attributes which in the relation is called the degree of the relation.
Cardinality: Total number of rows present in the Table.
Column: The column represents the set of values for a specific attribute.
Relation instance – Relation instance is a finite set of tuples in the RDBMS system. Relation
instances never have duplicate tuples.
Relation key - Every row has one, two or multiple attributes, which is called relation key.
Attribute domain – Every attribute has some pre-defined value and scope which is known as
attribute domain
Relational model
Notation: A B
Read as: A determines B or B is functionally dependent on A
Here A is determinant and B is dependent attribute
• Functional Dependency plays a vital role to find the difference between good and bad database design
In this example, if we know the value of Employee number, we can obtain Employee Name, city, salary,
etc.
By this, we can say that the city, Employee Name, and salary are functionally depended on
Employee number.
Which of the following
Which of the following A B C
X Y Z FD is not correct?
FD are satisfied 1 2 3 A) AB
1 4 3
A) XYZ, ZY B) BC
4 2 3
1 5 3 B) XZX, YZ, C) BCA
C) YZX, ZX 5 3 3
4 6 3 D) ACB
D) XZY, YX
3 2 2
Full Functional Dependency
For ABCD { D is fully FD on ABC } , D cannot dependent on any
subset of ABC
BCD Is not possible because
BC cannot determine D
CD C cannot determine D
A D A can not determine D
If we Join this table with Same(Self Join) then how many rows it will create
??
It generates too much duplicate data
Fir first two rows there is
Ename ProjName DepName additional(redundant) two rows
Ram DBMS Sita Generated
Fr this case 4 redundant rows generated
Ram Java Sita
and we cant remove them from this
Ram DBMS Khusi relation design
Ram Java Khisi
It’s a problem
……
Multivalued dependency
multi-valued dependency.
Car_model Maf_year Color E.G A person can have multiple phone
no and multiple email , Then we should
H001 2017 Metallic make two table one contains Name and
H001 2017 Green email , other Name with phone .
H005 2018 Metallic
H005 2018 Blue
H010 2015 Metallic
H033 2012 Gray
In this example, maf_year and color are independent of each other but dependent on
car_model. In this example, these two columns are said to be multivalue dependent on
car_model.
This dependence can be represented like this:
car_model maf_year
car_model colour
Consider attributes A,B and C, Where AB and
Transitive dependency: BC FD are transitive also have the FD AC
‘C ‘ is transitively dependent on A through B
Occurs when there is Intermediate dependency
A transitive is a type of functional dependency which happens when it is
indirectly formed by two functional dependencies.
{Company} {CEO} (if we know the compay, we know its CEO's name)
{CEO } {Age} If we know the CEO, we know the Age
Therefore according to the rule of rule of transitive dependency:
{ Company} {Age} should hold, that makes sense because if we know the
company name, we can know his age.
Note: You need to remember that transitive dependency can only occur in a relation of three or more
attributes.
Transitive dependency:
Trivial Functional dependency:
• The Trivial dependency is a set of attributes which are called a trivial if the set of attributes are
included in that attribute.
• So, X Y is a trivial functional dependency if Y is a subset of X.
• This FD is always valid because Y is subset of X
• LHS Ո RHS is not Empty (null set)
• For example:
SID Sname
AS555 Harry
AS811 George
AS999 Kevin
Consider this table with two columns SID and Sname.
{SID, Sname} SID is a trivial functional dependency as SID is a subset of {SID, Sname}.
Non trivial functional dependency in DBMS
• Functional dependency which also known as a nontrivial dependency occurs when AB holds
true where B is not a subset of A.
• In a relationship, if attribute B is not a subset of attribute A, then it is considered as a non-trivial
dependency.
• LHS Ո LHS is always Empty (null set)
• In case of Non trivial FD we must check the validity of the given dependency
• SIDSname,
• SIDPhoneNO,
• SIDSemester
Example:
(Company} {CEO} (if we know the Company, we knows the CEO name)
But CEO is not a subset of Company, and hence it's non-trivial functional dependency.
Properties of FD
• Reflexive : if Y is subset of X then XY ( trivial )
• E.g SIDSID (always valid, no need to check )
• Augmentation : if XY , then XZYZ
• SIDSname then
• SID PhoneNO Sname PhoneNO
• Transitive : if XY and YZ then XZ
• E.g. SIDSname and Sname City then we can say that SIDCity
• Union : if XY and XZ then XYZ
• Decomposition: if XYZ then XY and XZ
• But XY then XZ, YZ is not possible
(AB)+= ABCD
- it’s a Super Key Not a Primary Key
Ans: A
Insertion Anomaly: If you want to add new course lets say C10 or Faculty F3 you cant
add because you cant set SID ( what will be the SID??) – you cant leave SID null because
it’s a primary key
Deletion Anomaly : Lets say given a query remove the database of roll no 1 . Here is no
problem
But if you delete a data of roll no 2 , what will happens ?? Then there is no information of
who was teaching roll no 2, what was the course enrolled by roll no )
Update anomaly: lets say update the name of roll no 4, no Problem here
But if we change the salary of F1 from 30000 to 40000 then what will happens?? There is
only one faculty F1 but it updates all the rows in the database which takes huge time
If we divide the table into multiple table , the anomalies can be removed , which is called Normalization
SID Sname
CID Cname
The decomposition of the EMPLOYEE table into 1NF has been shown below:
4 3
These tables are in second normal form
Primary Key : CustomerID+ StoreID
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can
teach more than one subject.
TEACHER table
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL TEACHER_SUBJECT
TEACHER_ID SUBJECT
TEACHER_ID TEACHER_AGE 25 Chemistry
25 30 25 Biology
47 35 47 English
83 38 83 Math
83 Computer
Third Normal Form
- 2NF+ There should not be no transitive dependency
RollNo State City
1 Bagmati Kathmandu
Here Candidate Key : RollNO
2 Lumbini Butwal
3 Bagmati Kathmandu
Prime Attribute: RollNO
4 Lumbini Butwal
Non Prime Attribute: State, City
5 Gandaki Pokhara
Given Functional Dependency:
RollNoState
StateCity
Then RollNO City ( here RollNO is determining City through State) – it’s a transitive
dependency (because it is not given in the FD it is derived from RollNoStateCity)
EMPLOYEE_DETAIL
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
Super key in the table above:
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, E
MP_NAME, EMP_ZIP}....so on
444 Lan 60007 US Chicago
Candidate key: {EMP_ID}
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal Non-prime attributes: In the given table, all
attributes except EMP_ID are non-prime
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-
prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule
of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with
EMP_ZIP as a Primary key.
EMPLOYEE EMPLOYEE_ZIP
EMP_ID EMP_NAME EMP_ZIP EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010
201010 UP Noida
333 Stephan 02228
02228 US Boston
444 Lan 60007
60007 US Chicago
555 Katharine 06389
06389 UK Norwich
666 John 462007
462007 MP Bhopal
RollNO Name VoterID Age BCNF is the advance version of 3NF. It is stricter
1 Ravi K0123 20 than 3NF.
2 Varun M034 21 A table is in BCNF if every functional dependency
3 Ravi K786 23 X → Y, X is the super key/candidate of the table.
4 Rahul D286 21 For BCNF, the table should be in 3NF, and for every
FD, LHS is super key.
Given a FD
C.K={RollNo, VoterID}
FD
RollNoName
RollNOVoterID
VoetrIDage
VoterIDROllNO
EMP_DEPT table:
EMP_DEPT EMP_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 283
264 300
364 232
364 549
Fourth normal form (4NF)
A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
For a dependency A → B, if for a single value of A, multiple values of B exists, then the relation will be a
multi-valued dependency.
The given STUDENT table is in 3NF, but the COURSE and HOBBY are
STU_ID COURSE HOBBY
two independent entity. Hence, there is no relationship between COURSE
21 Computer Dancing
and HOBBY.
21 Math Singing
In the STUDENT relation, a student with STU_ID, 21 contains two
34 Chemistry Dancing courses, Computer and Math and two hobbies, Dancing and Singing.
74 Biology Cricket So there is a Multi-valued dependency on STU_ID, which leads to
59 Physics Hockey unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two
tables:
STUDENT_COURSE
STUDENT_HOBBY
STU_ID COURSE
STU_ID HOBBY
21 Computer
21 Dancing
21 Math
21 Singing
34 Chemistry
34 Dancing
74 Biology
74 Cricket
59 Physics
59 Hockey
Ename ProjName DepName
Ram DBMS Sita It is quite Possible Because
Ram Java Khusi One Person may works in Many projects
And One person may have many Dependent
Ram DBMS Khusi
Ram Java Sita
• EnameProjName
• Ename DepName
• The above table is in BCNF(Ename is Super Key in both Functional
Dependency) but not in 4NF because of multivalued dependency. So
Lets Divide above table into two tables
EMP_Proj
EMP_Dep
Ename ProjName
Ename DepName
Ram DBMS
Ram Sita
Ram Java
Ram Khusi
Lossless Decomposition
• If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
• The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
• The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.
Dependency Preserving
• In the dependency preservation, at least one decomposed table must satisfy every
dependency.
• If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
• For example, suppose there is a relation R (A, B, C, D) with functional dependency set
(ABC). The relational R is decomposed into R1(ABC) and R2(AD) which is
dependency preserving because FD ABC is a part of relation R1(ABC).
Find the value of C if the value of A=1
5NF (4NF+Lossless Join) Here It List the
Rows which has
For this case we have to join Two
tables
B B Common
Cross Product Natural Join= cross product+
A B C condition (equivalence of
Common attribute )
1 2 1
2 2 2 Here 1 2 2 and
2 2 1 are
3 3 2
inconsistency Select R2.C
Lets Divide into Two table from R2
Sporous tuples ? Natural Join R1
A B B C
It selects all the
1 2 2 1 rows which are
2 2 2 2 R1.B=R2.B
3 3 3 2 Select R2.C from R2
Natural Join R1 where
(AB) (BC) R1.A=1
Here for one value of A (1) there is two value of C (1 and 2) in the above table
it’s a lossy decomposition because there is inconsistency (in this case redundancy )
• For Lossless Join
• Common Attribute should be Candidate Key or Super Key of
either R1 or R2 or Both .
• In previous example we take B as common attribute but which is
not a candidate key
• If we Divide Table into
• R1(AB)
• R2(AC)
• Condition :
• R1 Union R2=R
• R1 Intersection R2 must not be Null (Empty )
• Common Attribute should be Candidate Key or Super Key of
either R1 or R2 or Both .
• After that we will get lossless join (where there will no spurious
tuples)
1. Consider the relation R(ABCDE) and its Functional dependencies: ABC, CE, BD, EA
The relation R is decomposed into R1(BCD) and R2(ACE).
Which of the following option is correct about decomposition
a) Decomposition is lossy and dependency preserving
b) Decomposition is lossless and dependency preserving
c) Decomposition is lossy and dependency not preserving
d) Decomposition is lossless and not dependency preserving