DBMS
Introduction
Data and Information
Definition of DBMS
characteristics
Data and Information
Are same ?
Data : Raw, Unprocessed facts that needs to be
processed
Can be symbol, word, character , graph , etc.
Useless until it is organised
E.g : 35, Ram , Banglore
Information : when unprocessed data is processed to
make a meaning ful context is known as information.
Data put into context
Some conclusion can be drawn from Information
DBMS : collection of related
data
Random collection
No related collection
Examples : Online banking system, library
management system
DBMS is a collection of programs that enables users to create and
maintain the database.
DBMS = Database Programs to manage data
base
Definition :
Database : A database is an organized collection of structured
information, or data, typically stored electronically in a computer
system
Database Management System (DBMS) is a software system that
is designed to manage and organize data in a structured manner.
It allows users to create, modify, and query a database, as well as
manage the security and access controls for that database.
Characteristics of DBMS
capable of handling huge amounts of data, voluminous
data, business data
Stores data in structured form
Self explaining nature : Metadata
Transactions :Atomicity
No Anomolies
integrity : data is correct and consistent in nature
ease of data access (queries)
multiuser : concurrent access
security : unauthorised users are not given access,
authentication
distributed database systems
Controls redundancy
Need of DBMS :
1. Data organisation and management :DBMSs allow
data to be stored in a structured manner, which helps
in easier retrieval and analysis.
2. Data Security and Privacy: DBMSs provide a robust
security framework. They offer authentication and
authorization features that control access to the
database
3. Cost effectiveness : DBMSs are cost-effective
compared to traditional file-based systems. They
reduce storage costs by eliminating redundancy and
optimizing data storage. They also reduce
development costs by providing tools for database
design, maintenance, and administration
Characteristics
Need
Advantages
DBMS Architectures types :
1.2-tier architecture – Client server
architecture
2.3- tier architecture
In a 2-tier architecture, the client and the server
communicate directly with each other, and the client has direct
access to the database. The client also includes the business
logic and validation rules which can be executed in the client
machine.
Client-side application:
User interfaces and application programs run generally on the
client-side
It is generally a presentation layer that runs on a client (PC,
Mobile, Tablet, etc)
This client application establishes a connection by sending a
request to the server-side up an application in order to establish
communication between the database
Server side application
The server side is responsible for query processing and
transaction management
In this, client-side directly communicates with the database base that
is present on the server-side and the server sends a response to the
request received from the client
Simply server-side represented data is stored on a Server.
Generally, APIS like ODBC, JDBC is used for this interaction i.e for
sending requests from the client-side application.
3 tier architecture of DBMS
In Tier- 3 architecture, another layer is introduced between the client and
server. 3 Tier Architecture means 3 layers arrived. First is the application
layer, which we call directly is client layer; the second is the business layer,
and the third is the database or data layer.
First of all, the client layer is the same, which means here, all are my users
who are normal users like we are also normal users; all of us, by opening the
railway's website or the application, our machine, is called a client machine.
Now there is an interface running that helps us to make connectivity with
the database; An interface is an application that can be in Java, Python, etc.
To support that language or application, there is the business layer, which
means that the query from the client machine is processed at the business
layer. This reduces the load on the data server.
The request made by us, i.e., the client machine, first goes to the application
layer. That application layer verifies it and processes it from the high-level
language to the low-level language and then passes that simplified query to
the database server. Then the server just gives back the data to the
application layer, and it returns the data to the client machine after
converting it into the high-level language from the low-level language.
The application layer also avoids the direct interaction of the client machine
with the database server, which increases the security of the database
ER diagram
ER diagram is a flowchart shows entities such as
people, objects or concepts relate to each other
within a system.
ER diagrams are standard and logical way of
visualizing data.
They represent the structure of data in the database.
Also known as ER models
It provides useful concepts that allow the database
designers to move from an informal description of
what users want from their database, to a more
detailed and precise description that can be
implemented in a database management system.
Thus, E-R modelling is an important technique for
Components of ER
Diagram
Entities Attributes Relationships
Entity
An Entity is an ‘object’ or a ‘thing’ in the real world
with an independent existence and that is
distinguishable from other objects.
Entities are the principal data objects about which
data is to be collected.
An entity may be an object with a physical
existence such as a person, car, house, employee or
city. Or, it may be an object with a conceptual
existence such as a company, an enterprise, a job
or an event of informational interest.
Each entity has attributes.
Entities in ER diagram are represented by a
Examples of entities :
Person: STUDENT, PATIENT, EMPLOYEE, DOCTOR, ENGINEER
Place: CITY, COUNTRY, STATE
Event: SEMINAR, SALE, RENEWAL, COMPETITION
Object: BUILDING, AUTOMOBILE, MACHINE, FUNITURE, ΤΟΥ
Concept: COURSE, ACCOUNT, TRAINING CENTRE, WORK
CENTRE
Attribute
An attribute is a piece of data that describes an
entity.
Are properties or characteristics of an entity
In short, attributes describes entity or gives more
information about entity.
Attributes are represented as ellipse in ERD.
Name Class
Roll_no
Student
ER diagram is a flowchart shows entities such as people, objects or
concepts relate to each other within a system.
They represent the structure of data in the database.
Also known as ER models
Relationship
A Relationship represents the association between entities in the system
ER diagram, the relationship type is represented by a diamond and
connecting the entities with lines.
Cardinality
One to one
One to many
Many to one
Many to many
One to one
This means that one entityA relates to only one
entity and one entity B relates to one entityA
One to many
In one to many entityA relates to many entity
and entity B will relate to only one entityA
Transactions in DBMS
A transaction can be defined as a group of tasks.
When the data of users is stored in a database, that
data needs to be accessed and modified from time to
time.
This task should be performed with a specified set of
rules and in a systematic way to maintain the
consistency and integrity of the data present in a
database.
In DBMS, this task is called a transaction.
It is similar to a bank transaction, where the user
requests to withdraw some amount of money from his
account.
Example :
A’s account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
States of transactions
Active − This is the state in which a transaction is being executed.
Thus, it is like the initial state of any given transaction.
Partially Committed − A transaction is in its partially committed
state whenever it executes the final operation.
Failed − In case any check made by a database recovery system
fails, then that transaction is in a failed state. Remember that a
failed transaction can not proceed further.
Aborted − In case any check fails, leading the transaction to a
failed state, the recovery manager then rolls all its write
operations back on the database so that it can bring the DB
(database) back to the original state (the state where it actually
was prior to the transaction execution). The transactions in this
state are known to be aborted. A DB recovery module can actually
select one of these two operations after the abortion of a
transaction –Re-start Kill the transaction
Committed − We can say that a transaction is committed in case
it actually executes all of its operations successfully. In such a
case, all of its effects are now established permanently on the DB
system.
Operations in transactions
Read :read operation is used to read the value from the
database and store it in a buffer in main memory.
Write : write operation is used to write the value back
to the database from the buffer.
Commit : is used to save the work done permanently in
the database
Rollback : it is used to undo the work done.
ACID properties
Atomicity
Consistency
Isolation
Durability
Transactions should possess several properties,
often called the ACID properties
Atomicity : A transaction is an atomic unit of
processing; it should either be performed in its entirety
or not performed at all
Consistency : the database should be in a consistent
state before and after the transaction . It refers to the
correctness of the database
.
Isolation: Isolation in database refers to the ability of
a database system to allow multiple transactions to
access the same data without interfering with each
other.
A transaction should appear as though is being
executed in isolation from other transactions, even
though many transactions are executing concurrently.
That is, the execution of a transaction should not be
interfered with by any other transactions executing
concurrently.
Durability : The term "durability" in relation to
DBMS refers to the idea that if an operation is
successfully finished, the database remains in the
disc forever. The database's resilience should allow it
to continue operating even if the system malfunctions
or crashes.
The recovery manager is in charge of guaranteeing
the database's long-term viability in the event that it
is lost. Every time we make a change, we must use
the COMMIT command to commit the values.
Back up and recovery
1.Backup: back up is simply a copy. If you loose
original data you can reconstruct it using a back
up. Backup refers to storing a copy of original
data which can be used in case of data loss.
Backup is considered one of the approaches to
data protection. Important data of the
organization needs to be kept in backup efficiently
for protecting valuable data. Backup can be
achieved by storing a copy of the original data
separately or in a database on storage devices.
2. Recovery: Recovery refers to restoring lost data
by following some processes. Even if the data was
backed up still lost so it can be recovered by
Concurrency control in DBMS
When several transactions execute concurrently
without any rules and protocols, various problems
arise that may harm the data integrity of several
databases.
These problems are known as concurrency control
problems.
Therefore several rules are designed, to maintain
consistency in the transactions while they are
executing concurrently which are known as
concurrency control protocols.
A transaction is a single reasonable unit of work
that can retrieve or may change the data of a
database. Executing each transaction individually
increases the waiting time for the other
transactions and the overall execution also gets
delayed. Hence, to increase the throughput and to
reduce the waiting time, transactions are executed
concurrently.
When several transactions execute simultaneously,
then there is a risk of violation of the data integrity
of several databases. Concurrency Control in DBMS
is a procedure of managing simultaneous
transactions ensuring their atomicity, isolation,
consistency and serializability.
Concurrent execution in DBMS
In a multi-user system, multiple users can access
and use the same database at one time, which is
known as the concurrent execution of the database.
It means that the same database is executed
simultaneously on a multi-user system by different
users
Example : 2 people trying to withdraw money from
the same account at the same time from different
ATMs
Two or more events happening at the same time
using the same resources.
Activities occurring simultaneously
Purpose of Concurrency control
To enforce isolation
To preserve database consistency
To resolve conflicts (rw, wr, ww)
Locking in DBMS
In this type of protocol, any transaction cannot read
or write data until it acquires an appropriate lock on
it.
locking : a procedure used to control concurrent
access to data when one transaction is accessing the
database. A lock may deny access to other
transaction to prevent incorrect results.
Types of locks
Shared lock/read only lock : if a transaction has
shared lock on the data item, it can read the item
but cannot update it.
Exclusive lock : if a transaction has exclusive lock
Concurrency control techniques
2 phase locking protocol
Growing phase: In the growing phase, a new lock on
the data item may be acquired by the transaction, but
none can be released.
Shrinking phase: In the shrinking phase, existing
lock held by the transaction may be released, but no
new locks can be acquired.
Deadlock
A system is said to be in deadlock, if there exists a set
of transaction such that every transaction in the set is
waiting for other transaction in the set to complete its
execution
A deadlock in a database management system (DBMS)
is a situation where two or more processes are blocked,
waiting for each other to release a resource that they
need to proceed with their execution. This creates a
circular wait, where each process is waiting for a
resource that is held by another process, leading to a
complete blockage of the system.
Deadlocks can occur in multi-user, multi-tasking
environments, where multiple processes are accessing
shared resources simultaneously
In a database, when a transaction waits indefinitely to
obtain a lock, then the DBMS should detect whether
the transaction is involved in a deadlock or not.
Deadlock prevention
In a database, when a transaction waits indefinitely
to obtain a lock, then the DBMS should detect
whether the transaction is involved in a deadlock or
not.
1. Advanced locking
2. Wait and die
3. Wound wait
4. Timeout based
Wait and Die
The algorithm makes use of timestamps to prevent
deadlocks from occurring in a multi-user database
environment. The concept is that transactions with
older timestamps are given priority to access a
resource, and transactions with newer timestamps
are forced to wait.
Serializibility
Serializable
Collection of transactions
Are they serializable
RDBMS
Primary key
• The primary key is a column or a set of columns
that uniquely identifies each record in the table.
• A table can have only ONE primary key
• The primary key must have unique values for each
row, no repetition is allowed, and values of primary
key must not change
• It cannot have NULL values
Customer ID Fname Lname City
24 Ajay Rathore Mumbai
25 Rohit Sinha Mumbai
26 Sakshi Sinha Bangalore
27 Akash Verma Bangalore
28 Ajay Rathore Chennai
Foreign key
• Foreign key is a column or a set of columns that is
used to link two tables together
• When we set a foreign key we intend to link two
tables or relations
• Foreign key of one table is primary key in another
table
• Foreign key of one table helps to connect with the
primary key of another table
• Foreign key links two tables
Custom Fname Lname City ID City ID City
er ID
1 Bangalore
24 Ajay Rathore 6
2 Chennai
25 Rohit Sinha 6
3 Delhi
26 Sakshi Sinha 1
4 Hyderabad
27 Akash Verma 1
5 Kolkata
28 Ajay Rathore 6
6 Mumbai
EF Codd rules
Dr. Edgar F Codd had proposed a set of rules
after vast research to define important
characteristics and capabilities of any
relational system.
Today, codd rules are used as yardstick for
what can be expected from a conventional
relational DBMS.
There are 12 Codd rules
Example : oracle, SQL
• The codd rules can be applied to any database
system that manages stored data using only its
relational capabilities.
• If the databases follows the rules, it is called a true
relational database (RDBMS)
Rule 0: Foundational rule
A relational database management system
must manage the database entirely through
its relational capabilities.
For any system to be known as a RDBMS, it
must be able to manage its database
according to relational capabilities.
The rule ensures that these types of
databases are truly relational.
Data must be in tabular format
Rule 1: Information rule
All information is represented logically by
values in tables(relations).
Rule 2: guaranteed access rule
Every single piece of data (atomic value) may
be accessed logically from a relational
database using the combination of primary
key value, table name, and column name.
Select * from customers where city =
‘Bangalore’;
Select * from customers where custID = 3;
Tablename + primary key(row)+ attribute
name(column)
Rule 3: systematic treatment of NULL
This rule defines the systematic treatment of null
values in database records.
The NULL values in a database must be given a
systematic and uniform treatment.
Null values have various meanings in the database,
like missing data, no value in the cell, or data is not
applicable. etc.
Primary key should not be null.
Rule 4: Active/ Dynamic online
catalog based on relational model
• The structure description of the entire
database must be stored in an online
catalog, known as data dictionary.
• Online catalog should be accessed by
authorized users only.
• Users can use the same query language to
access the catalog which they use to access
the database itself.
Rule 5: comprehensive data sub language
rule
• A relational database should support a
comprehensive data sub language (like SQL) to
perform all necessary data manipulation and
retrieval operations.
• Example : you are using a search engine to find
information on the internet. You enter a query in
natural language like asking a question and the
search engine underlying database language
processes the query and retrieves relevant results.
• The comprehensive language should support the
following :
• Data definition
• View definition
• Data manipulation
DDL commands
Create table tablename
(
Column 1 datatype,
Column 2 datatype,
Column 3 datatype,
……
);
Create table Persons
(
PersonID int primary key ,
Lastname varchar(255),
Firstnam varchar(255),
address varchar(255),
city varchar(255),
);
PersonI Lastnam Firstna address city
D e me
Alter command :
The ALTER TABLE statement is used to add, delete,
or modify columns in an existing table.
ALTER TABLE table_name
ADD column_name datatype;
ALTER TABLE Persons
ADD email_ID varchar(255) ;
Drop Command
The DROP TABLE statement is used to drop an
existing table in a database.
Drop table tablename;
Drop table Persons;
DML commands are used to insert, delete, and update data in the
database.
Views in DBMS
• Views – virtual tables as opposite to base tables that
actually occupy space in memory
• Views actually do not hold the actual data and just
have the definition of the original data.
• View can be based on single table or multiple tables
Rule 6: view updating rule
• states that a relational database management
system must support the ability to update the data
stored in the database through views
• This means that any changes made to the data in a
view should be automatically reflected in the
underlying base tables, and conversely, any changes
made to the base tables should be reflected in any
views that include that data.
• Views should be treated in the same way as tables
are treated in dbms.
Rule 7: high level insert, update and delete
• A successful database system must possess the
feature of facilitating high level insertions, updates
and deletions.
• It should also support union, intersection and minus
operations in th database system to yield set of
database records.
Rule 8: Physical data independence
• The data stored in the database must be
independent of the applications that access it.
• Any change in the physical structure of a database
must not have any impact on how the data is being
accessed by external applications.
• Example location change of the database
Rule 9: logical data independence
• This rule states that The logical data in a database
must be independent of its user’s view (application).
Any change in logical data must not affect the
applications using it. For example, if two tables are
merged or one is split into two different tables, there
should be no impact or change on the user
application. This is one of the most difficult rule to
apply.
Rule 10: integrity independence
• Integrity constraints specific to a particular
relational database must be definable in the
relational data sublanguage and storable in the
catalog, not in the application programs.
• A database must be independent of the application
that uses it. All its integrity constraints can be
independently modified without the need of any
change in the application. This rule makes a
database independent of the front-end application
and its interface
Integrity constraints
• Integrity constraints are the set of predefined rules
that are used to maintain the quality of information.
Integrity constraints ensure that the data insertion,
data updating, data deleting and other processes
have to be performed in such a way that the data
integrity is not affected.
• They act as guidelines ensuring that data in the
database remain accurate and consistent.
• Examples of integrity constraints:
• Not null constraint
• Primary key constraints
• Foreign key constriants
Rule 11: Distribution independence
• Distribution Independence Rule," states that a
relational database management system must
support the ability to distribute the data across
multiple physical locations, while maintaining the
integrity and consistency of the data.
• The distribution of data across multiple locations
should be invisible to users, and the database
system should handle the distribution transparently.
• The end user must not be able to see that the data
is distributed over various locations.
• User should always get the impression that the data
is located at one site only
Rule 12: the nonsubversion rule
• The rule states that the language that we build will
be high level language. In this we will be applying
certain constraints. When we compile and convert
it to low level language it should not happen that
some integrity constraints or rules are not
violated.
• Database should work in a proper and consistent
way
Normalisation
• Normalisation is the process of minimizing
redundancy from a relation.
• Normalisation is the process of organising data in
the database.
• Redundancy in a relation may cause insertion,
deletion and update anomalies
• The normalization process makes the database more
flexible and consistent
• Normalisation works by organising the data into
multiple tables, to minimize data redundancy.
• Normalisation optimizes the database design.
Data redundancy
• Repetition of same data at multiple places
• Repetition of data increases size of the database
• The other problems because of data redundancy
are :
•Insertion problems
•Deletion problems
•Update problems
Roll no Name Branch Hod Office_tel
Mr. 2336259
1 Raju CS
johnson
Mr.
2 Mina CS 2336259
johnson
Mr.
3 Renuka CS 2336259
johnson
Mr.
4 Dina CS 2336259
johnson
Normalisation works through a series of stages called Normal Forms.
The normal form that is applied to a particular table is said to be in that
normal form
1st normal form
• This is the very basic level of normalization.
• It is the first stage of the normalization process
• In 1st normal form each table cell should contain
atomic values.
• If a relation contains multivalued attributes it is
said to be not in 1st normal form
• Entries like X, Y or W, Z are called multivalues
• It establishes the base for further normalization
• further improve the correctness and efficiency of
database systems by imposing atomic values and
forbidding recurring groupings inside rows.
Teacher’s table
Subjects
Id Name
taught
Ramesh Math,
1289
Sawant Science
5678 Shruti Shah English
History,
1267 Nikhil Das Social
Studies
Pooja
3409 Economics
Sharma
Id Name Subjects
Ramesh
1289 Math
Sawant
Ramesh
1289 Science
Sawant
5678 Shruti Shah English
1267 Nikhil Das History
Social
1267 Nikhil Das
Studies
Pooja
3409 Economics
Sharma
2nd normal form
• The table should be in 1st normal form
• To be in the second normal form, a relation must be
in the first normal form and the relation must not
contain any partial dependency.
• A relation is in 2NF if it has no partial dependency,
i.e., no non-prime attribute (attributes that are not
part of any candidate key), which is dependent on
any proper subset of any candidate key in the table.
• A relation that is in First Normal Form and every
non-primary-key attribute is fully functionally
dependent on the primary key, then the relation is in
Second Normal Form (2NF).
Partial dependencies
• A partial dependency would occur whenever a non-
prime attribute depends functionally on a part of
the given candidate key. The 2NF (Second Normal
Form) eliminates partial dependency.
employee_I Task_No Employee_N Task_Name
D ame
C01 34 Mona App
Development
C02 58 Genine UX/UI
Designing
• Here, the prime key attributes are Employee_ID and
Task_No
• As stated above, the non-prime attributes are
Employee_Name and Task_Name. These must be
dependent functionally on the part of the candidate
key so as to be Partial Dependent.
• The Employee_Name can be determined using the
Employee_ID. It actually makes the relation Dependent
Partially.
• The Task_Name can be determined using the Task_No.
It makes the relation Dependent Partially.
Solution: We decompose the tables to remove Partial Dependency along
with the violation on the second normal form
Employee table
Employee_ID Task_No Employee_Na
me
C01 34 Mona
C02 58 Genine
Task_info
Task_No Task_Name
34 App
Development
58 UX/UI
Designing