Data Security For M.tech
Data Security For M.tech
DBMS Tutorial provides basic and advanced concepts of Database. Our DBMS Tutorial is designed for
beginners and professionals both.
Our DBMS Tutorial includes all topics of DBMS such as introduction, ER model, keys, relational model, join
operation, SQL, functional dependency, transaction, concurrency control, etc.
What is Database?
The database is a collection of inter-related data which is used to retrieve, insert and delete the data efficiently.
It is also used to organize the data in the form of a table, schema, views, and reports, etc.
For example: The college Database organizes the data about the admin, staff, students and faculty etc.
Using the database, you can easily retrieve, insert, and delete the information.
o Data Definition: It is used for creation, modification, and removal of definition that defines the
organization of data in the database.
o Data Updation: It is used for the insertion, modification, and deletion of the actual data in the
database.
o Data Retrieval: It is used to retrieve the data from the database which can be used by
applications for various purposes.
o User Administration: It is used for registering and monitoring users, maintain data integrity,
enforcing data security, dealing with concurrency control, monitoring performance and recovering
information corrupted by unexpected failure.
Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the information.
o It can provide a clear and logical view of the process that manipulates data.
o DBMS contains automatic backup and recovery procedures.
o It contains ACID properties which maintain data in a healthy state in case of failure.
o It can reduce the complex relationship between data.
o It is used to support manipulation and processing of data.
o It is used to provide security of data.
o It can view the database from different viewpoints according to the requirements of the user.
Advantages of DBMS
o Controls database redundancy: It can control data redundancy because it stores all the data in
one single database file and that recorded data is placed in the database.
o Data sharing: In DBMS, the authorized users of an organization can share the data among
multiple users.
o Easily Maintenance: It can be easily maintainable due to the centralized nature of the database
system.
o Reduce time: It reduces development time and maintenance need.
o Backup: It provides backup and recovery subsystems which create automatic backup of data
from hardware and software failures and restores the data if required.
o multiple user interface: It provides different types of user interfaces like graphical user
interfaces, application program interfaces
Disadvantages of DBMS
o Cost of Hardware and Software: It requires a high speed of data processor and large memory
size to run DBMS software.
o Size: It occupies a large space of disks and large memory to run them efficiently.
o Complexity: Database system creates additional complexity and requirements.
o Higher impact of failure: Failure is highly impacted the database because in most of the
organization, all the data stored in a single database and if the database is damaged due to
electric failure or database corruption then the data may be lost forever.
DBMS Index
DBMS Tutorial
o DBMS Tutorial
o DBMS vs File System
o DBMS Architecture
o Three schema Architecture
o Data model schema
o Data Independence
o DBMS Language
Data modeling
o ER model concept
o Notation for ER diagram
o Mapping constraints
o DBMS Keys
o DBMS Generalization
o DBMS Specialization
o DBMS Aggregation
o Convert ER into table
o Relationship of Higher Degree
Relational data Model
o Functional Dependency
o Inference Rule
o DBMS Normalization
o DBMS 1NF
o DBMS 2NF
o DBMS 3NF
o DBMS BCNF
o DBMS 4NF
o DBMS 5NF
o Relational Decomposition
o Multivalued Dependency
o Join Dependency
o Inclusion Dependence
Transaction Processing
o Transaction
o Transaction Property
o States of Transaction
o DBMS Schedule
o Testing of Serializability
o Conflict schedule
o View Serializability
o Recoverability of Schedule
o Failure Classification
o Log-Based Recovery
o DBMS Checkpoint
o Deadlock in DBMS
Concurrency Control
o Concurrency Control
o Lock based Protocol
o Time stamping Protocol
o Validation based Protocol
o Thomas Write Rule
o Multiple Granularity
o Recovery Concurrent Transaction
File organization
o File organization
o Sequential File Organization
o Heap File Organization
o Hash File Organization
o B+ File Organization
o DBMS ISAM
o Cluster File Organization
Indexing and B+ Tree
o Indexing in DBMS
o B+ Tree
Hashing
o Hashing
o Static Hashing
o Dynamic Hashing
RAID
o RAID
SQL Introduction
o SQL Introduction
o Characteristics of SQL
o Advantage of SQL
o SQl Datatype
o SQL Command
o SQL Operator
o SQL Table
o SQL SELECT Statement
o SQL INSERT Statement
o SQL Update Statement
o SQL DELETE Statement
o SQL View
o SQL Index
o SQL Sub Queries
o SQL Clauses
o SQL Aggregate Function
o SQL JOIN
o SQL Set Operation
Interview Questions
Audience
Our DBMS Tutorial is designed to help beginners and professionals.
Problem
We assure that you will not find any problem with this DBMS Tutorial. But if there is any mistake, please post
the problem in the contact form.
next → ← prev
Types of Databases
There are various types of databases used for storing different varieties of data:
1) Centralized Database
It is the type of database that stores data at a centralized database system. It comforts the users to access the
stored data from different locations through several applications. These applications contain the authentication
process to let users access data securely. An example of a Centralized database can be Central Library that
carries a central database of each library in a college/university.
o It has decreased the risk of data management, i.e., manipulation of data will not affect the core
data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.
o The size of the centralized database is large, which increases the response time for fetching the
data.
o It is not easy to update such an extensive database system.
o If any server failure occurs, entire data will be lost, which could be a huge loss.
2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among different database
systems of an organization. These database systems are connected via communication links. Such links help
the end-users to access the data easily. Examples of the Distributed database are Apache Cassandra, HBase,
Ignite, etc.
o Homogeneous DDB: Those database systems which execute on the same operating system
and use the same application process and carry the same hardware devices.
o Heterogeneous DDB: Those database systems which execute on different operating systems
under different application procedures, and carries different hardware devices.
3) Relational Database
This database is based on the relational data model, which stores data in the form of rows(tuple) and
columns(attributes), and together forms a table(relation). A relational database uses SQL for storing,
manipulating, as well as maintaining the data. E.F. Codd invented the database in 1970. Each table in the
database carries a key that makes the data unique from others. Examples of Relational databases are
MySQL, Microsoft SQL Server, Oracle, etc.
There are following four commonly known properties of a relational model known as ACID properties, where:
A means Atomicity: This ensures the data operation will complete either with success or with failure. It follows
the 'all or nothing' strategy. For example, a transaction will either be committed or will abort.
C means Consistency: If we perform any operation over the data, its value before and after the operation
should be preserved. For example, the account balance before and after the transaction should be correct, i.e.,
it should remain conserved.
I means Isolation: There can be concurrent users for accessing data at the same time from the database.
Thus, isolation between the data should remain isolated. For example, when multiple transactions occur at the
same time, one transaction effects should not be visible to the other transactions in the database.
D means Durability: It ensures that once it completes the operation and commits the data, data changes
should remain permanent.
4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets. It is not a
relational database as it stores data not only in tabular form but in several different ways. It came into existence
when the demand for building modern applications increased. Thus, NoSQL presented a wide variety of
database technologies in response to the demands. We can further divide a NoSQL database into the following
four types:
1. Key-value storage: It is the simplest type of database storage where it stores every single item
as a key (or attribute name) holding its value, together.
2. Document-oriented Database: A type of database used to store data as JSON-like document. It
helps developers in storing data by using the same document-model format as used in the
application code.
3. Graph Databases: It is used for storing vast amounts of data in a graph-like structure. Most
commonly, social networking websites use the graph database.
4. Wide-column stores: It is similar to the data represented in relational databases. Here, data is
stored in large columns together, instead of storing in rows.
o It enables good productivity in the application development as it is not required to store data in a
structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.
5) Cloud Database
A type of database where data is stored in a virtual environment and executes over the cloud computing
platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for accessing the
database. There are numerous cloud platforms, but the best options are:
6) Object-oriented Databases
The type of database that uses the object-based data model approach for storing data in the database system.
The data is represented and stored as objects which are similar to the objects used in the object-oriented
programming language.
7) Hierarchical Databases
It is the type of database that stores data in the form of parent-children relationship nodes. Here, it organizes
data in a tree-like structure.
Data get stored in the form of records that are connected via links. Each child record in the tree will contain only
one parent. On the other hand, each parent record can have multiple child records.
8) Network Databases
It is the database that typically follows the network data model. Here, the representation of data is in the form of
nodes connected via links between them. Unlike the hierarchical database, it allows each record to have
multiple children and parent nodes to form a generalized graph structure.
9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This database is basically
designed for a single user.
Advantage of Personal Database
All modern database management systems like SQL, MS SQL Server, IBM DB2, ORACLE, My-SQL, and
Microsoft Access are based on RDBMS.
It is called Relational Database Management System (RDBMS) because it is based on the relational model
introduced by E.F. Codd.
How it works
Data is represented in terms of tuples (rows) in RDBMS.
A relational database is the most commonly used database. It contains several tables, and each table has its
primary key.
Due to a collection of an organized set of tables, data can be accessed easily in RDBMS.
What is table/Relation?
Everything in a relational database is stored in the form of relations. The RDBMS database uses tables to store
data. A table is a collection of related data entries and contains rows and columns to store data. Each table
represents some real-world objects such as person, place, or event about which information is collected. The
organized collection of data into a relational table is known as the logical view of the database.
Properties of a Relation:
ID Name AGE C
O
U
R
S
E
1 Ajeet 24 B.Tech
2 aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
Properties of a row:
1 Ajeet 24 B.Tech
What is a column/attribute?
A column is a vertical entity in the table which contains all information associated with a specific field in a table.
For example, "name" is a column in the above table which contains all information about a student's name.
Properties of an Attribute:
Name
Ajeet
Aryan
Mahesh
Ratan
ID Name AGE C
O
U
R
S
E
B
.
T
1 Ajeet 24
e
c
h
Degree:
The total number of attributes that comprise a relation is known as the degree of the table.
For example, the student table has 4 attributes, and its degree is 4.
1 Ajeet 24 B.Tech
2 Aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
Cardinality:
The total number of tuples at any one time in a relation is known as the table's cardinality. The relation whose
cardinality is 0 is called an empty table.
For example, the student table has 5 rows, and its cardinality is 5.
1 Ajeet 24 B.Tech
2 aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
Domain:
The domain refers to the possible values each attribute can contain. It can be specified using standard data
types such as integers, floating numbers, etc. For example, An attribute entitled Marital_Status may be limited
to married or unmarried values.
NULL Values
The NULL value of the table specifies that the field has been left blank during record creation. It is different
from the value filled with zero or a field that contains space.
Data Integrity
There are the following categories of data integrity exist with each RDBMS:
Domain integrity: It enforces valid entries for a given column by restricting the type, the format, or the range of
values.
Referential integrity specifies that rows cannot be deleted, which are used by other records.
User-defined integrity: It enforces some specific business rules defined by users. These rules are different
from the entity, domain, or referential integrity.
Next
The main differences between DBMS and RDBMS are given below:
Example of RDBMS
Examples of DBMS are file
9) are mysql, postgre, sql
systems, xml etc.
server, oracle etc.
After observing the differences between DBMS and RDBMS, you can say that RDBMS is an extension of
DBMS. There are many software products in the market today who are compatible for both DBMS and
RDBMS. Means today a RDBMS application is DBMS application and vice-versa.
This article is based on the concept of ACID properties in DBMS that are
necessary for maintaining data consistency, integrity, and reliability while
performing transactions in the database. Let’s explore them.
A transaction is a single logical unit of work that accesses and possibly
modifies the contents of a database. Transactions access data using read-and-
write operations. To maintain consistency in a database, before and after the
transaction, certain properties are followed. These are called ACID properties.
Atomicity:
By this, we mean that either the entire transaction takes place at once or
doesn’t happen at all. There is no midway i.e. transactions do not occur
partially. Each transaction is considered as one unit and either runs to
completion or is not executed at all. It involves the following two operations.
— Abort : If a transaction aborts, changes made to the database are not visible.
— Commit : If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Aiming for a top All India Rank in the GATE CS/IT or GATE DA 2026
exam but unsure about your preparation level?
We've got you covered! Our GATE Courses for CSE & DA at GeeksforGeeks
are designed to give you the competitive edge you need. Get live classes from
GATE experts (Khaleel Sir, Chandan Jha Sir, Vijay Agarwal Sir, and many
others), practice problems, doubt-support, previous year questions,
quizzes, all India mock tests, and much more - all in one place.
Consider the following transaction T consisting of T1 and T2 : Transfer of 100
from account X to account Y .
If the transaction fails after completion of T1 but before completion of T2 .( say,
after write(X) but before write(Y) ), then the amount has been deducted
from X but not added to Y . This results in an inconsistent database state.
Therefore, the transaction must be executed in its entirety in order to ensure
the correctness of the database state.
Consistency:
This means that integrity constraints must be maintained so that the database
is consistent before and after the transaction. It refers to the correctness of a
database. Referring to the example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700 .
Total after T occurs = 400 + 300 = 700 .
Therefore, the database is consistent . Inconsistency occurs in
case T1 completes but T2 fails. As a result, T is incomplete.
Isolation:
This property ensures that multiple transactions can occur concurrently without
leading to the inconsistency of the database state. Transactions occur
independently without interference. Changes occurring in a particular
transaction will not be visible to any other transaction until that particular
change in that transaction is written to memory or has been committed. This
property ensures that the execution of transactions concurrently will result in a
state that is equivalent to a state achieved these were executed serially in some
order.
Let X = 500, Y = 500.
Consider two transactions T and T”.
Suppose T has been executed till Read (Y) and then T’’ starts. As a result,
interleaving of operations takes place due to which T’’ reads the correct value
of X but the incorrect value of Y and sum computed by
T’’: (X+Y = 50, 000+500=50, 500)
is thus not consistent with the sum at end of the transaction:
T: (X+Y = 50, 000 + 450 = 50, 450) .
This results in database inconsistency, due to a loss of 50 units. Hence,
transactions must take place in isolation and changes should be visible only
after they have been made to the main memory.
Durability:
This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk and
they persist even if a system failure occurs. These updates now become
permanent and are stored in non-volatile memory. The effects of the
transaction, thus, are never lost.
Some important points:
Property Responsibility for maintaining properties
Visit Course
Get ready to boost your rank and secure an exceptional GATE 2025
score with confidence!
Our GATE CS & IT Test Series 2025 offers 60 PYQs Quizzes, 60 Subject-
Wise Mock Tests, 4500+ PYQs and practice questions, and over 20 Full-
Length Mock Tests that ensure you’re well-prepared to tackle the toughest
questions and secure a top-rank in the GATE 2025 exam. Get personalized
insights with student rankings based on performance and benefit from expert-
designed tests created by industry pros and GATE CS toppers.
Plus, don’t miss out on these exclusive features:
--> All India Mock Test
--> Live GATE CSE Mentorship Classes
--> Live Doubt Solving Sessions
Join now and stay ahead in your GATE 2025 journey!
Implementation of Locking in DBMS
It consists of an array of disks in which multiple disks are connected to achieve different
goals.
RAID technology
There are 7 levels of RAID schemes. These schemas are as RAID 0, RAID 1, ...., RAID 6.
Example:
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
In this level, instead of placing just one block into a disk at a time, we can work with two or
more blocks placed it into a disk before moving on to the next one.
20 22 24 26
21 23 25 27
28 30 32 34
29 31 33 35
In this above figure, there is no duplication of data. Hence, a block once lost cannot be
recovered.
Pros of RAID 0:
o In this level, throughput is increased because multiple data requests probably not
on the same disk.
o This level full utilizes the disk space and provides high performance.
o It requires minimum 2 drives.
Cons of RAID 0:
o It doesn't contain any error detection mechanism.
o The RAID 0 is not a true RAID because it is not fault-tolerance.
o In this level, failure of either disk results in complete data loss in respective array.
RAID 1
This level is called mirroring of data as it copies the data from drive 1 to drive 2. It provides
100% redundancy in case of a failure.
Example:
A A B B
C C D D
E E F F
G G H H
Only half space of the drive is used to store the data. The other half of drive is just a mirror
to the already stored data.
Pros of RAID 1:
o The main advantage of RAID 1 is fault tolerance. In this level, if one disk fails,
then the other automatically takes over.
o In this level, the array will function even if any one of the drives fails.
Cons of RAID 1:
o In this level, one extra drive is required per drive for mirroring, so the expense is
higher.
RAID 2
o RAID 2 consists of bit-level striping using hamming code parity. In this level, each
data bit in a word is recorded on a separate disk and ECC code of data words is
stored on different set disks.
o Due to its high cost and complex structure, this level is not commercially used.
This same performance can be achieved by RAID 3 at a lower cost.
Pros of RAID 2:
o This level uses one designated drive to store parity.
o It uses the hamming code for error detection.
Cons of RAID 2:
o It requires an additional drive for error detection.
RAID 3
o RAID 3 consists of byte-level striping with dedicated parity. In this level, the parity
information is stored for each disk section and written to a dedicated parity drive.
o In case of drive failure, the parity drive is accessed, and data is reconstructed
from the remaining devices. Once the failed drive is replaced, the missing data
can be restored on the new drive.
o In this level, data can be transferred in bulk. Thus high-speed data transmission
is possible.
A B C P(A, B, C)
D E F P(D, E, F)
G H I P(G, H, I)
J K L P(J, K, L)
Pros of RAID 3:
o In this level, data is regenerated using parity drive.
o It contains high data transfer rates.
o In this level, data is accessed in parallel.
Cons of RAID 3:
o It required an additional drive for parity.
o It gives a slow performance for operating on small sized files.
RAID 4
o RAID 4 consists of block-level stripping with a parity disk. Instead of duplicating
data, the RAID 4 adopts a parity-based approach.
o This level allows recovery of at most 1 disk failure due to the way parity works. In
this level, if more than one disk fails, then there is no way to recover the data.
o Level 3 and level 4 both are required at least three disks to implement RAID.
A B C P0
D E F P1
G H I P2
J K L P3
In this level, parity can be calculated using an XOR function. If the data bits are 0,0,0,1 then
the parity bits is XOR(0,1,0,0) = 1. If the parity bits are 0,0,1,1 then the parity bit is
XOR(0,0,1,1)= 0. That means, even number of one results in parity 0 and an odd number of
one results in parity 1.
C1 C2 C3 C4 Parity
0 1 0 0 1
0 0 1 1 0
Suppose that in the above figure, C2 is lost due to some disk failure. Then using the values
of all the other columns and the parity bit, we can recompute the data bit stored in C2. This
level allows us to recover lost data.
RAID 5
o RAID 5 is a slight modification of the RAID 4 system. The only difference is that
in RAID 5, the parity rotates among the drives.
o It consists of block-level striping with DISTRIBUTED parity.
o Same as RAID 4, this level allows recovery of at most 1 disk failure. If more than
one disk fails, then there is no way for data recovery.
0 1 2 3 P0
5 6 7 P1 4
10 11 P2 8 9
15 P3 12 13 14
P4 16 17 18 19
This level was introduced to make the random write performance better.
Pros of RAID 5:
o This level is cost effective and provides high performance.
o In this level, parity is distributed across the disks in an array.
o It is used to make the random write performance better.
Cons of RAID 5:
o In this level, disk failure recovery takes longer time as parity has to be calculated
from all available drives.
o This level cannot survive in concurrent drive failure.
RAID 6
o This level is an extension of RAID 5. It contains block-level stripping with 2 parity
bits.
o In RAID 6, you can survive 2 concurrent disk failures. Suppose you are using
RAID 5, and RAID 1. When your disks fail, you need to replace the failed disk
because if simultaneously another disk fails then you won't be able to recover
any of the data, so in this case RAID 6 plays its part where you can survive two
concurrent disk failures before you run out of options.
A0 B0 Q0 P0
A1 Q1 P1 D1
Q2 P2 C2 D2
P3 B3 C3 Q3
Pros of RAID 6:
o This level performs RAID 0 to strip data and RAID 1 to mirror. In this level,
stripping is performed before mirroring.
o In this level, drives required should be multiple of 2.
Cons of RAID 6:
o It is not utilized 100% disk capability as half is used for mirroring.
o It contains very limited scalability.
ACID Property is the most important part of the database. ACID stands for Atomicity
Consistency Isolation Durability.
Atomicity:
This means that “all or nothing”. When an update occurs to a database either all or none of
the update will become available to anyone beyond the user. This update to the database is
called a transaction and it either commits or aborts.
Consistency:
It ensures that any changes to values in an instance are consistent with changes to other
values in the same instance.
Isolation:
Isolation is is needed when there are concurrent transactions. Concurrent transactions are
transactions that occur at the same time, such as shared multiple users accessing shared
objects.
An important concept to understanding isolation through transactions is serializability.
Transactions are serializable when the effect on the database is the same whether the
transactions are executed in serial order or in an interleaved fashion.
Durability:
Normalization:
Normalization is a technique which is used to organize the data in the database. It is a
systematic approach to remove the data redundancy. Normalization is mainly used for two
purpose,
To remove data redundancy.
Without normalization 3 anomalies occurred and it becomes difficult to handle and update
data. To understand these anomalies let’s take an Student table
ID Name Address Subject
201 Akshay Jaipur Maths
202 Charu Bombay Bio
203 Disha Banglore Physics
204 Eva Noida Maths
Updation Anamoly – To update address of a student who occurs twice or more than twice
in a table, we will have to updateAddress column in all the rows, else data will become
inconsistent.
Insertion Anamoly – Suppose for a new admission, we have a Student id, name and
address of a student but if student has not opted for any subject yet then we have to
insertNULL there, leading to Insertion Anamoly.
Deletion Anamoly – If id 401 has only one subject and temporarily he drops it, when we
delete that row, entire student record will be deleted along with id.
Normalization Form:
BCNF
EMPLOYEE table:
7272826385,
14 John UP
9064738238
7390372389,
12 Sam Punjab
8589830302
PauseNext
Mute
Duration 18:10
Loaded: 1.84%
Fullscreen
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
Method 1:
To remove the repeating values for a column, the EMPLOYEE table was converted to a flat
relation EMPLOYEE_1 table by repeating the pair (EMP_ID, EMP_NAME) for every entry in
the table. Now the new relation does not contain any non-atomic values so the table is said
to be normalized and is in First Normal Form.
Method 2:
Another method is to remove the attributes that violate 1NF and place it in a separate
relation along with primary key. So the unnormalized relation, EMPLOYEE table is
decomposed into two sub-relations EMP_DETAILS and EMP_PERFORMANCE
EMP_DETAILS
EMP_ID EMP_NAME
14 John
20 Harry
12 Sam
EMP_PERFORMANCE
14 7272826385 UP
14 9064738238 UP
20 8574783832 Bihar
12 7390372389 Punjab
12 8589830302 Punjab
The main idea of decomposing the relations is to keep the different types of information in
their separate relation as first normal form disallows multivalve attribute that are composite
in nature. In the EMP_DETAILS relation, the attribute (EMP_ID) acts as a primary key and
in the EMP_PERFORMANCE relation the attributes (EMP_ID, EMP_PHONE) act as a
primary key. Now it satisfies both the conditions for a relation to be in 1NF.
The third method of normalizing a unnormalized relation into 1NF will be explained with
following example where skills of an employee of some company are fixed. Suppose an
employee can have maximum of five skills.
EMP_SKILL relation
EMP_ID Skill
14 DBMS, C, C++
20 JAVA, C
Here the EMP_SKILL relation is not 1NF as the skill attribute contains a set of values. So
to remove this problem, we define multiple Skill columns as shown.
14 DBMS C C++ - -
20 JAVA C - - -
MS
12 DBMS HTML VB -
OFFICE
The above representation is in 1NF but this technique is not preferred as it may cause
problems such as:
o It would be difficult to query the relation. For Example, it would be difficult to
answer the queries like “which employee share a skill?”, “Which employees have
skill C?”
o Restriction of employee skills to 5. If employee with more skills appears, it would
be left unrecorded.
To sum up, all the three approaches are correct because they transform any unnormalized
table into a first normal form table. However, the second approach where table is
decomposed into relations is more efficient as minimizes the duplicacy of the data. So for a
relation to be in first normal form, each set of repeating groups should appear in its own
table and every relation should have a primary key.
Whereas the first normal form was concerned with the structure of the representation of
relation, the second normal form is concerned with the eliminating redundancy in these
relations.
o Insertion anomaly
o Deletion anomaly
o Updation anomaly
These anomalies have got their name from the relational operations they perform on a
relation.
4253 C 15 175
4154 IT 30 200
4256 C 50 175
So neither the Order_No nor B_Name can contain null values because it is against the
principle of entity integrity rule. So we cannot insert the information of a new book whose
order has not been placed yet because in that case, the attribute Order_No will contain null
value which is against the entity integrity rule i.e. primary key cannot null values. This is
shown in following figure:
4253 C 15 175
4154 IT 30 200
4256 C 50 175
Operating
NULL 30 300
System
Relations which exhibit such kind of undesirable property are said to suffer from insertion
anomaly.
Deletion anomaly: Suppose that an order whose order number is 4154 is cancelled due to
certain reasons. Therefore, we would have to delete this order information from the
ORDER_BOOK relation.
As we can see from the relation that this particular order contain information about the book
whose name is “IT”. So on deletion of this record from the relation would result in loss of
information about the “IT” book. This may lead to loss of vital information as it is the only
record which contains information about the book “IT”. But if we try to remove any other
record from the relation then it would cause no problem as it still contains information of the
book in other record.
For Example: Deleting record whose Order_No = 4154 as shown in following figure:
4253 C 15 175
4256 C 50 175
Relations which exhibit such kind of undesirable property are said to suffer from deletion
anomaly.
Updation Anomaly: Modifying some values in the relations may also prove cumbersome.
Suppose that if the price of the book C is modified to 190 then every tuple referring to this
book have to be updated and multiple updating always carries some risk of inconsistencies.
In the ORDER_BOOK relation, the updation seems to be very easy because it contains only
two tuples having B_price as 175. But if in case relation has thousands of tuples containing
a large number of redundant data, the updations may lead to inconsistency as humans are
prone to errors.
Relations which exhibit such kind of undesirable property are said to suffer from updation
anomaly.
The above considerations leads us to a conclusion that relations in 1NF have undesirable
data manipulation properties hence bringing relation to 1NF would not terminate logical
database design. Further transformations are needed to eliminate this kind of anomalies
form a set of original relations. So this bring us the concept of second normal form.
TEACHER table
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
To discuss the various anomalies, we will consider the STUDENT relation that holds
information about students and teachers.
In the above table, Stu_Id is the primary key which acts as the roll number of the student.
Since the STUDENT relation is composed of only one attribute which acts as a primary key
(Stu_Id) so it is in 2NF. But it suffers from the insertion, deletion and updation anomalies
which are explained as follows.
Insertion anomaly: Suppose that we want to insert a a new record with some information
about a new teacher who has not yet been assigned a personal student. But this insertion
record is not allowed because the primary key Stu_Id contains a nullvalue which is not
possible as it is against the entity integrity rule.
For Example: Suppose that we want to insert information about a new teacher ‘Mayank’
having Teach_Id = ‘206’ Teach_Qual = ‘MCA‘who has not yet been allotted any student.
This is not possible as Stu_Id will contain a null value.
Deletion anomaly: Suppose that a student whose Stu_Id = 1768 decides to leave the
college, so we would have to delete this tuple from the STUDENT relation.
As we can see from the relation that this particular student is the last student of the teacher
whose Teach_Id = ‘205’. Thus on deleting this tuple, the information about the teacher
would also be deleted. This may lead to vital information. This is the deletion anomaly.
There would be no deletion problem if the student who decides to leave the college is not
the last student of the particular teacher.
For Example: Deleting student record with Stu_Id = 2523 will not lead to deletion of teacher
information whose Teach_Id = ‘201’ because it is present elsewhere.
Stu_Id Stu_Name Teach_Id Teach_Name Teach_Qual
Updation Anomaly: The second normal form also suffers from updation anomaly.
For Example: The value of the qualifications of the teacher i.e. Teach_Qual whose
Teach_Id = ‘204’ is updated from MCA to Ph.D. This would be quite a big problem as the
updation in the tuple will have to be made where ever this information reoccurs. Although
this relation is having few tuples so it would be quite a big problem here but normally a
teacher, teaches many students. So in case of huge databases it will be a big problem and
may lead to inconsistencies as human are prone to errors.
The above considerations leads us to a conclusion that relation in 2NF have undesirable
data manipulation properties hence bringing a relation to 2NF would not terminate logical
database design. Further transformations are needed to eliminate these kinds of anomalies
from an original relation. So this brings us to a concept of the Third normal form.
If A → B and B → C then A → C
For example:
Student_details table
ID Name Subject DOB Address Mobile No. City
Address Table:
ID Address DOB Mobile No. City
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
To discuss the various anomalies, we will consider the STAFF relation that holds
information about the staff, the equipment key they have been allocated and the language
in which they are fluent.
STAFF Relation:
STAFF Relation:
Anurag PC English
Anurag PC French
Kapil PC English
Kapil PC French
Kapil PC Japanese
In the above relation, every STAFF has two independent sets of features associated with it.
The STAFF relation has a primary key composed of S_Name, Equipment and Language.
There is no transitive dependency, so the relation STAFF is in 3NF. But it stills suffers from
the insertion, deletion and updation anomalies which are explained as follows.
Insertion anomaly: Suppose an Anurag learns a new language Japanese then we will
have to insert two new records into the STAFF relation. Similarly, if the Equipment values
corresponding to staff Anurag changes in numbers from 2 to 5, then with introduction of new
language Japanese we will have to insert multiple records into the STAFF relation which
results in redundancy.
Deletion anomaly: Let us suppose that the name of the staff changed from Anurag to
Anuraj here sue to some reasons then multiple records need to be updated which may
result in inconsistency of data.
Updation Anomaly: Let us suppose that an staff Kapil has his equipment PC deallocated,
then all the information about his languages skills would be lost due to deletion of these
records which may result in loss of vital information.
All these anomalies are encountered due to the presence of multivalued dependency which
is removed in fourth normal form. The concept of multivalued dependency and 4NF will be
explained later.
This is a higher version of third normal form. This form deals with certain type of anamoly
that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate
keys is said to be in BCNF. For a table to be in BCNF, following conditions must be
satisfied:
R must be in 3rd Normal Form
1 2 4
2 2 3
3 2 3
4 3 4
Output Table :
For the above relation, σ(c>3)R will select the tuples which have c more than 3.
A B C
1 2 4
4 3 4
Note: The selection operator only selects the required tuples but does not
display them. For display, the data projection operator is used.
2. Projection(π) : While Selection operation works on rows , similarly
projection operation of relational algebra works on columns. It basically allows
you to pick specific columns from a given relational table based on the given
condition and ignoring all the other remaining columns.
Illustration : If we are interested in the specific columns from the relational
tables, we would prefer to use the Projection Operator.
It is used to project required column data from a relation.
Example: Consider Table 1. Suppose we want columns B and C from Relation R.
2 4
2 3
3 4
Note: By Default, projection removes duplicate data.
3. Union(U) : Union Operator is basically used to combine the results of two
queries into a single result. The only condition is that both queries must return
same number of columns with same data types.
Illustration : If in case we want a list of all the employee from two different
department . Then in that case we should use union operation to merge both
the list from two different department.
Union operation in relational algebra is the same as union operation in set
theory.
Example: FRENCH
Student_Name Roll_Number
Ram 01
Mohan 02
Vivek 13
Geeta 17
GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
Consider the following table of Students having different optional subjects in
their course.
π(Student_Name)FRENCH U π(Student_Name)GERMAN
Student_Name
Ram
Mohan
Vivek
Geeta
Shyam
Rohan
Note: The only constraint in the union of two relations is that both relations
must have the same set of Attributes.
4. Set Difference(-) : Set difference basically provides the rows that are
present in one table , but not in another tables.
Illustration : If you have two lists of employees, one from the IT Department
and one from the Marketing department & you want to know which employees
are in IT but not in Marketing , the set difference operator would used.
Set Difference in relational algebra is the same set difference operation as in set
theory.
Example: From the above table of FRENCH and GERMAN, Set Difference is used
as follows
π(Student_Name)FRENCH - π(Student_Name)GERMAN
Student_Name
Ram
Mohan
Note: The only constraint in the Set Difference between two relations is that
both relations must have the same set of Attributes.
5. Set Intersection(∩) : Set Intersection basically allows to fetches only those
rows of data that are common between two sets of relational tables.
Illustration : If we want to find the number of employees who work in both IT
and Marketing Department , then in that case we use Intersection Operator.
Set Intersection in relational algebra is the same set intersection operation in
set theory.
Example: From the above table of FRENCH and GERMAN, the Set Intersection is
used as follows:
π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN
Student_Name
Vivek
Geeta
Note: The only constraint in the Set Difference between two relations is that
both relations must have the same set of Attributes.
6. Rename(ρ) : Rename operator basically allows you to give a temporary
name to a specific relational table or to its columns. It is very useful when we
want to avoid ambiguity, especially in complex Queries.
Rename is a unary operation used for renaming attributes of a relation.
ρ(a/b)R will rename the attribute 'b' of the relation by 'a'.
7. Cross Product(X) : Cartesian product Operator combines every row of one
table with every row of another table , producing all the possible combination.
It’s mostly used as a precursor to more complex operation like joins.
Cross-product between two relations. Let’s say A and B, so the cross product
between A X B will result in all the attributes of A followed by each attribute of
B. Each record of A will pair with every record of B.
Example:
A
Name Age Sex
Ram 14 M
Sona 15 F
Kim 20 M
B
ID Course
1 DS
2 DBMS
AXB
Name Age Sex ID Course
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘ tuples.
Derived Operators
These are some of the derived operators, which are derived from the
1. Natural Join(⋈)
fundamental operators.
2. Conditional Join
1. Natural Join(⋈): Natural join is a binary operator. Natural join between two or
more relations will result in a set of all combinations of tuples where they have
an equal common attribute.
Example:
EMP
Name ID Dept_Name
A 120 IT
B 125 HR
C 110 Sales
D 111 IT
DEPT
Dept_Name Manager
Sales Y
Production Z
IT A
Natural join between EMP and DEPT with condition :
EMP ⋈ DEPT
EMP.Dept_Name = DEPT.Dept_Name
A 120 IT A
C 110 Sales Y
D 111 IT A
2. Conditional Join: Conditional join works similarly to natural join. In natural
join, by default condition is equal between common attributes while in
conditional join we can specify any condition such as greater than, less than, or
not equal.
Example:
R
ID Sex Marks
1 F 45
2 F 55
3 F 60
S
ID Sex Marks
10 M 20
11 M 22
12 M 59
Join between R and S with condition R.marks >= S.marks
R.ID R.Sex R.Marks S.ID S.Sex S.Marks
1 F 45 10 M 20
1 F 45 11 M 22
2 F 55 10 M 20
2 F 55 11 M 22
3 F 60 10 M 20
3 F 60 11 M 22
3 F 60 12 M 59
Relational Calculus
As Relational Algebra is a procedural query language, Relational Calculus is a
non-procedural query language. It basically deals with the end results. It always
tells me what to do but never tells me how to do it.
There are two types of Relational Calculus
1. Tuple Relational Calculus(TRC)
2. Domain Relational Calculus(DRC)
In-depth articles:
Basic-operators-in-relational-algebra
Extended Relational Algebra Operators
Following are the Previous Year’s Gate Questions
https://www.geeksforgeeks.org/gate-gate-cs-2012-question-50/
https://www.geeksforgeeks.org/gate-gate-cs-2012-question-43/
Conclusion
Relational algebra might sound theoretical, but it’s a very powerful tool for
understanding how databases work behind the scene. It fundamentally permits
us to break down the complex database queries into a smaller , more
manageable queries. Whether you’re filtering data, combining tables , or
renaming columns , these operators serve as building blocks for working with
relational databases effectively.
Learning relational algebra will provides you with high level of understanding
regarding how to optimize your database queries and structure data for better
performance and utilization of resources.
Frequently Asked Questions on Relational Algebra in
DBMS -FAQs
What is a relational database ?
A relational database is basically a type of database where data is stored in the
form of tables, which look similar to spreadsheets. Each table has rows and
columns: rows represent individual records ( like a person or an ordered data ) &
columns represent attributes ( like name, and address). Since they are
connected with each other through shared data, it makes it allowing for more
organized storage and easier retrieval of information
What is the relational model ?
Relational model is the theory behind how relational database are organized. In
this model, data is organized into tables, with each table representing a specific
type of information ( like customer or order ). The model also defines how these
tables relate to one another, typically through shared keys ( based on primary
key or candidate keys) or identifier.
What is the main difference between Selection Operation and
Projection Operation?
Selection Operation is basically used to filter out rows based on certain
condition, while on other hand the projection operations of Relational algebra
selects the specific columns from a table.
How is cartesian product different from a Join Operations?
Cartesian product is used to form all the possible combinations of rows from two
tables, while Join typically used to combine rows based on a related column (like
foreign key)
Why is Relational Algebra important?
Relational algebra helps in designing efficient database queries. It provides a
clear framework for understanding how data is retrieved and manipulated in
relational databases.
The Relational Model is a way of structuring data using relations, which are a
collection of tuples that have the same attributes. Relational Algebra is a
procedural query language that takes relations as input and returns relations as
output. It uses a set of operators to manipulate and retrieve data from these
relations. Here, we’ll explore the basic operators of Relational Algebra using the
STUDENT_SPORTS, EMPLOYEE, and STUDENT relations from Table 1, Table 2,
and Table 3, respectively.
Table 1: STUDENT_SPORTS
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
Table 2: EMPLOYEE
ADDRES
EMP_NO NAME PHONE AGE
S
RAMES
2 GURGAON 9652431543 18
H
1 RAM
2 RAMESH
3 SUJIT
4 SURESH
Note: If the resultant relation after projection has duplicate rows, it will be
removed. For Example ∏(ADDRESS)(STUDENT) will remove one duplicate row with
the value DELHI and return three rows.
3. Cross Product(X)
Cross product is used to join two relations. For every row of Relation1, each row
of Relation2 is concatenated. If Relation1 has m tuples and and Relation2 has n
tuples, cross product of Relation1 and Relation2 will have m X n tuples. Syntax:
Relation1 X Relation2
To apply Cross Product on STUDENT relation given in Table 1 and
STUDENT_SPORTS relation given in Table 2,
STUDENT X STUDENT_SPORTS
RESULT:
4. Union (U)
Union on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible (These two relations should have the same number of attributes
and corresponding attributes in two relations have the same domain). Union
operator when applied on two relations R1 and R2 will give a relation with tuples
that are either in R1 or in R2. The tuples which are in both R1 and R2 will appear
only once in the result relation. Syntax:
Relation1 U Relation2
Find the person who is either student or employees, we can use Union operators
like:
STUDENT U EMPLOYEE
RESULT:
RAMES
2 GURGAON 9652431543 18
H
RAMES
2 GURGAON 9652431543 18
H
6. Rename(ρ)
Rename operator is used to giving another name to a relation. Syntax:
ρ(Relation2, Relation1)
To rename STUDENT relation to STUDENT1, we can use rename operator like:
ρ(STUDENT1, STUDENT)
If you want to create a relation STUDENT_NAMES with ROLL_NO and NAME from
STUDENT, it can be done using rename operator as:
ρ(STUDENT_NAMES, ∏(ROLL_NO, NAME)(STUDENT))
Extended Relational Algebra Operators
Intersection (∩)
Division (÷)
Join Operations (⋈)
Natural Join
Theta Join
Equi Join
These operators provide more functionality for complex queries in relational
databases.
Conclusion
Relational Algebra provides a fundamental toolkit for querying and manipulating
relations in a database. It offers a set of operators that allow users to perform
basic operations like selection, projection, union, and more. Understanding
these operators is essential for working with relational databases and forms the
basis for more advanced SQL operations.
Basic Operators in Relational Algebra – FAQs
What is the difference between Relational Algebra and SQL?
Relational Algebra is a procedural query language that specifies how to retrieve
data, while SQL is a declarative language that specifies what data to retrieve.
SQL implementations are influenced by the principles of Relational Algebra.
Why is Relational Algebra important?
It provides a formal foundation for relational databases and is used for query
optimization, ensuring that database operations are performed efficiently.
Can Relational Algebra handle complex queries?
Yes, by combining basic operators, complex queries can be constructed to
handle intricate data retrieval and manipulation tasks.
The Relational Model is a way of structuring data using relations, which are a
collection of tuples that have the same attributes. Relational Algebra is a
procedural query language that takes relations as input and returns relations as
output. It uses a set of operators to manipulate and retrieve data from these
relations. Here, we’ll explore the basic operators of Relational Algebra using the
STUDENT_SPORTS, EMPLOYEE, and STUDENT relations from Table 1, Table 2,
and Table 3, respectively.
Table 1: STUDENT_SPORTS
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
Table 2: EMPLOYEE
ADDRES
EMP_NO NAME PHONE AGE
S
RAMES
2 GURGAON 9652431543 18
H
1 RAM
2 RAMESH
3 SUJIT
4 SURESH
Note: If the resultant relation after projection has duplicate rows, it will be
removed. For Example ∏(ADDRESS)(STUDENT) will remove one duplicate row with
the value DELHI and return three rows.
3. Cross Product(X)
Cross product is used to join two relations. For every row of Relation1, each row
of Relation2 is concatenated. If Relation1 has m tuples and and Relation2 has n
tuples, cross product of Relation1 and Relation2 will have m X n tuples. Syntax:
Relation1 X Relation2
To apply Cross Product on STUDENT relation given in Table 1 and
STUDENT_SPORTS relation given in Table 2,
STUDENT X STUDENT_SPORTS
RESULT:
4. Union (U)
Union on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible (These two relations should have the same number of attributes
and corresponding attributes in two relations have the same domain). Union
operator when applied on two relations R1 and R2 will give a relation with tuples
that are either in R1 or in R2. The tuples which are in both R1 and R2 will appear
only once in the result relation. Syntax:
Relation1 U Relation2
Find the person who is either student or employees, we can use Union operators
like:
STUDENT U EMPLOYEE
RESULT:
RAMES
2 GURGAON 9652431543 18
H
RAMES
2 GURGAON 9652431543 18
H
6. Rename(ρ)
Rename operator is used to giving another name to a relation. Syntax:
ρ(Relation2, Relation1)
To rename STUDENT relation to STUDENT1, we can use rename operator like:
ρ(STUDENT1, STUDENT)
If you want to create a relation STUDENT_NAMES with ROLL_NO and NAME from
STUDENT, it can be done using rename operator as:
ρ(STUDENT_NAMES, ∏(ROLL_NO, NAME)(STUDENT))
Extended Relational Algebra Operators
Intersection (∩)
Division (÷)
Join Operations (⋈)
Natural Join
Theta Join
Equi Join
These operators provide more functionality for complex queries in relational
databases.
Conclusion
Relational Algebra provides a fundamental toolkit for querying and manipulating
relations in a database. It offers a set of operators that allow users to perform
basic operations like selection, projection, union, and more. Understanding
these operators is essential for working with relational databases and forms the
basis for more advanced SQL operations.
Basic Operators in Relational Algebra – FAQs
What is the difference between Relational Algebra and SQL?
Relational Algebra is a procedural query language that specifies how to retrieve
data, while SQL is a declarative language that specifies what data to retrieve.
SQL implementations are influenced by the principles of Relational Algebra.
Why is Relational Algebra important?
It provides a formal foundation for relational databases and is used for query
optimization, ensuring that database operations are performed efficiently.
Can Relational Algebra handle complex queries?
Yes, by combining basic operators, complex queries can be constructed to
handle intricate data retrieval and manipulation tasks.
Both these tables are connected by one common key (column) i.e ROLL_NO.
We can perform a JOIN operation using the given SQL query:
Query:
SELECT s.roll_no, s.name, s.address, s.phone, s.age, sc.course_id
FROM Student s
JOIN StudentCourse sc ON s.roll_no = sc.roll_no;
Output
ROLL_N
NAME ADDRESS PHONE AGE COURSE_ID
O
Syntax
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Key Terms
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
FULL JOIN Example
This example demonstrates the use of a FULL JOIN, which combines the results
of both LEFT JOIN and RIGHT JOIN. The query retrieves all rows from
the Student and StudentCourse tables. If a record in one table does not have
a matching record in the other table, the result set will include that record
with NULL values for the missing fields
Query:
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
FULL JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output
NAME COURSE_ID
HARSH 1
NAME COURSE_ID
PRATIK 2
RIYANKA 2
DEEP 3
SAPTARHI 1
DHANRAJ NULL
ROHIT NULL
NIRAJ NULL
NULL 4
NULL 5
NULL 4
1 Ram 10
2 Jon 30
Employee
3 Bob 50
Department
Dept_id Dept_name
10 IT
30 HR
40 TIS
Problem: Find all Employees and their respective departments.
Solution Query: (Employee) ? (Department)
Emp_i
d Emp_name Dept_id Dept_id Dept_name
1 Ram 10 10 IT
2 Jon 30 30 HR
Conclusion
SQL joins are essential tools for anyone working with relational databases.
Understanding the different types of joins in SQL, like INNER JOIN, LEFT
OUTER JOIN, RIGHT JOIN, and FULL JOIN, allows us to combine and query
data effectively. With the examples and syntax covered here, we should feel
confident applying these SQL join types to our data to retrieve meaningful
observations and manage complex queries with ease. Use these SQL join
techniques to streamline our data handling and enhance our SQL skills.
FAQs
What are the 4 types of join SQL?
In SQL, the four main types of joins are: