0% found this document useful (0 votes)
38 views75 pages

Data Security For M.tech

The document provides a comprehensive overview of Database Management Systems (DBMS), covering fundamental concepts, types of databases, and their characteristics, advantages, and disadvantages. It explains various database types including centralized, distributed, relational, NoSQL, cloud, object-oriented, hierarchical, network, personal, operational, and enterprise databases. Additionally, it introduces the Relational Database Management System (RDBMS) and its historical context, emphasizing the organization of data in tables and the importance of ACID properties.

Uploaded by

Ritik chauhan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
38 views75 pages

Data Security For M.tech

The document provides a comprehensive overview of Database Management Systems (DBMS), covering fundamental concepts, types of databases, and their characteristics, advantages, and disadvantages. It explains various database types including centralized, distributed, relational, NoSQL, cloud, object-oriented, hierarchical, network, personal, operational, and enterprise databases. Additionally, it introduces the Relational Database Management System (RDBMS) and its historical context, emphasizing the organization of data in tables and the importance of ACID properties.

Uploaded by

Ritik chauhan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 75

CD532 Data Security

DBMS Tutorial provides basic and advanced concepts of Database. Our DBMS Tutorial is designed for
beginners and professionals both.

Database management system is software that is used to manage the database.

Our DBMS Tutorial includes all topics of DBMS such as introduction, ER model, keys, relational model, join
operation, SQL, functional dependency, transaction, concurrency control, etc.

What is Database?
The database is a collection of inter-related data which is used to retrieve, insert and delete the data efficiently.
It is also used to organize the data in the form of a table, schema, views, and reports, etc.

For example: The college Database organizes the data about the admin, staff, students and faculty etc.

Using the database, you can easily retrieve, insert, and delete the information.

Database Management System


o Database management system is software which is used to manage the database. For
example: MySQL, Oracle, etc are a very popular commercial database which is used in different
applications.
o DBMS provides an interface to perform various operations like database creation, storing data in
it, updating data, creating a table in the database and a lot more.
o It provides protection and security to the database. In the case of multiple users, it also maintains
data consistency.
DBMS allows users the following tasks:

o Data Definition: It is used for creation, modification, and removal of definition that defines the
organization of data in the database.
o Data Updation: It is used for the insertion, modification, and deletion of the actual data in the
database.
o Data Retrieval: It is used to retrieve the data from the database which can be used by
applications for various purposes.
o User Administration: It is used for registering and monitoring users, maintain data integrity,
enforcing data security, dealing with concurrency control, monitoring performance and recovering
information corrupted by unexpected failure.

Characteristics of DBMS
o It uses a digital repository established on a server to store and manage the information.
o It can provide a clear and logical view of the process that manipulates data.
o DBMS contains automatic backup and recovery procedures.
o It contains ACID properties which maintain data in a healthy state in case of failure.
o It can reduce the complex relationship between data.
o It is used to support manipulation and processing of data.
o It is used to provide security of data.
o It can view the database from different viewpoints according to the requirements of the user.

Advantages of DBMS
o Controls database redundancy: It can control data redundancy because it stores all the data in
one single database file and that recorded data is placed in the database.
o Data sharing: In DBMS, the authorized users of an organization can share the data among
multiple users.
o Easily Maintenance: It can be easily maintainable due to the centralized nature of the database
system.
o Reduce time: It reduces development time and maintenance need.
o Backup: It provides backup and recovery subsystems which create automatic backup of data
from hardware and software failures and restores the data if required.
o multiple user interface: It provides different types of user interfaces like graphical user
interfaces, application program interfaces

Disadvantages of DBMS
o Cost of Hardware and Software: It requires a high speed of data processor and large memory
size to run DBMS software.
o Size: It occupies a large space of disks and large memory to run them efficiently.
o Complexity: Database system creates additional complexity and requirements.
o Higher impact of failure: Failure is highly impacted the database because in most of the
organization, all the data stored in a single database and if the database is damaged due to
electric failure or database corruption then the data may be lost forever.

DBMS Index

DBMS Tutorial

o DBMS Tutorial
o DBMS vs File System
o DBMS Architecture
o Three schema Architecture
o Data model schema
o Data Independence
o DBMS Language
Data modeling

o ER model concept
o Notation for ER diagram
o Mapping constraints
o DBMS Keys
o DBMS Generalization
o DBMS Specialization
o DBMS Aggregation
o Convert ER into table
o Relationship of Higher Degree
Relational data Model

o Relational Model concept


o Relational Algebra
o Join Operation
o Integrity Constraints
o Relational Calculus
Normalization

o Functional Dependency
o Inference Rule
o DBMS Normalization
o DBMS 1NF
o DBMS 2NF
o DBMS 3NF
o DBMS BCNF
o DBMS 4NF
o DBMS 5NF
o Relational Decomposition
o Multivalued Dependency
o Join Dependency
o Inclusion Dependence

Transaction Processing

o Transaction
o Transaction Property
o States of Transaction
o DBMS Schedule
o Testing of Serializability
o Conflict schedule
o View Serializability
o Recoverability of Schedule
o Failure Classification
o Log-Based Recovery
o DBMS Checkpoint
o Deadlock in DBMS
Concurrency Control

o Concurrency Control
o Lock based Protocol
o Time stamping Protocol
o Validation based Protocol
o Thomas Write Rule
o Multiple Granularity
o Recovery Concurrent Transaction
File organization

o File organization
o Sequential File Organization
o Heap File Organization
o Hash File Organization
o B+ File Organization
o DBMS ISAM
o Cluster File Organization
Indexing and B+ Tree

o Indexing in DBMS
o B+ Tree
Hashing

o Hashing
o Static Hashing
o Dynamic Hashing

RAID

o RAID
SQL Introduction

o SQL Introduction
o Characteristics of SQL
o Advantage of SQL
o SQl Datatype
o SQL Command
o SQL Operator
o SQL Table
o SQL SELECT Statement
o SQL INSERT Statement
o SQL Update Statement
o SQL DELETE Statement
o SQL View
o SQL Index
o SQL Sub Queries
o SQL Clauses
o SQL Aggregate Function
o SQL JOIN
o SQL Set Operation
Interview Questions

o DBMS Interview Questions


o SQL Interview Questions
Prerequisite
Before learning DBMS Tutorial, you must have the basic knowledge of Basic Database.

Audience
Our DBMS Tutorial is designed to help beginners and professionals.

Problem
We assure that you will not find any problem with this DBMS Tutorial. But if there is any mistake, please post
the problem in the contact form.

next → ← prev

Types of Databases
There are various types of databases used for storing different varieties of data:

1) Centralized Database
It is the type of database that stores data at a centralized database system. It comforts the users to access the
stored data from different locations through several applications. These applications contain the authentication
process to let users access data securely. An example of a Centralized database can be Central Library that
carries a central database of each library in a college/university.

Advantages of Centralized Database

o It has decreased the risk of data management, i.e., manipulation of data will not affect the core
data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data standards.
o It is less costly because fewer vendors are required to handle the data sets.

Disadvantages of Centralized Database

o The size of the centralized database is large, which increases the response time for fetching the
data.
o It is not easy to update such an extensive database system.
o If any server failure occurs, entire data will be lost, which could be a huge loss.

2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed among different database
systems of an organization. These database systems are connected via communication links. Such links help
the end-users to access the data easily. Examples of the Distributed database are Apache Cassandra, HBase,
Ignite, etc.

We can further divide a distributed database system into:

o Homogeneous DDB: Those database systems which execute on the same operating system
and use the same application process and carry the same hardware devices.
o Heterogeneous DDB: Those database systems which execute on different operating systems
under different application procedures, and carries different hardware devices.

Advantages of Distributed Database


o Modular development is possible in a distributed database, i.e., the system can be expanded by
including new computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.

3) Relational Database
This database is based on the relational data model, which stores data in the form of rows(tuple) and
columns(attributes), and together forms a table(relation). A relational database uses SQL for storing,
manipulating, as well as maintaining the data. E.F. Codd invented the database in 1970. Each table in the
database carries a key that makes the data unique from others. Examples of Relational databases are
MySQL, Microsoft SQL Server, Oracle, etc.

Properties of Relational Database

There are following four commonly known properties of a relational model known as ACID properties, where:

A means Atomicity: This ensures the data operation will complete either with success or with failure. It follows
the 'all or nothing' strategy. For example, a transaction will either be committed or will abort.

C means Consistency: If we perform any operation over the data, its value before and after the operation
should be preserved. For example, the account balance before and after the transaction should be correct, i.e.,
it should remain conserved.

I means Isolation: There can be concurrent users for accessing data at the same time from the database.
Thus, isolation between the data should remain isolated. For example, when multiple transactions occur at the
same time, one transaction effects should not be visible to the other transactions in the database.

D means Durability: It ensures that once it completes the operation and commits the data, data changes
should remain permanent.

4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for storing a wide range of data sets. It is not a
relational database as it stores data not only in tabular form but in several different ways. It came into existence
when the demand for building modern applications increased. Thus, NoSQL presented a wide variety of
database technologies in response to the demands. We can further divide a NoSQL database into the following
four types:
1. Key-value storage: It is the simplest type of database storage where it stores every single item
as a key (or attribute name) holding its value, together.
2. Document-oriented Database: A type of database used to store data as JSON-like document. It
helps developers in storing data by using the same document-model format as used in the
application code.
3. Graph Databases: It is used for storing vast amounts of data in a graph-like structure. Most
commonly, social networking websites use the graph database.
4. Wide-column stores: It is similar to the data represented in relational databases. Here, data is
stored in large columns together, instead of storing in rows.

Advantages of NoSQL Database

o It enables good productivity in the application development as it is not required to store data in a
structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.

5) Cloud Database
A type of database where data is stored in a virtual environment and executes over the cloud computing
platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for accessing the
database. There are numerous cloud platforms, but the best options are:

o Amazon Web Services(AWS)


o Microsoft Azure
o Kamatera
o PhonixNAP
o ScienceSoft
o Google Cloud SQL, etc.

6) Object-oriented Databases
The type of database that uses the object-based data model approach for storing data in the database system.
The data is represented and stored as objects which are similar to the objects used in the object-oriented
programming language.

7) Hierarchical Databases
It is the type of database that stores data in the form of parent-children relationship nodes. Here, it organizes
data in a tree-like structure.

Data get stored in the form of records that are connected via links. Each child record in the tree will contain only
one parent. On the other hand, each parent record can have multiple child records.

8) Network Databases
It is the database that typically follows the network data model. Here, the representation of data is in the form of
nodes connected via links between them. Unlike the hierarchical database, it allows each record to have
multiple children and parent nodes to form a generalized graph structure.

9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This database is basically
designed for a single user.
Advantage of Personal Database

o It is simple and easy to handle.


o It occupies less storage space as it is small in size.

10) Operational Database


The type of database which creates and updates the database in real-time. It is basically designed for
executing and handling the daily data operations in several businesses. For example, An organization uses
operational databases for managing per day transactions.

11) Enterprise Database


Large organizations or enterprises use this database for managing a massive amount of data. It helps
organizations to increase and improve their efficiency. Such a database allows simultaneous access to users.

Advantages of Enterprise Database:

o Multi processes are supportable over the Enterprise database.


o It allows executing parallel queries on the system.

What is RDBMS (Relational Database Management


System)
RDBMS stands for Relational Database Management System.

All modern database management systems like SQL, MS SQL Server, IBM DB2, ORACLE, My-SQL, and
Microsoft Access are based on RDBMS.

It is called Relational Database Management System (RDBMS) because it is based on the relational model
introduced by E.F. Codd.

How it works
Data is represented in terms of tuples (rows) in RDBMS.

A relational database is the most commonly used database. It contains several tables, and each table has its
primary key.

Due to a collection of an organized set of tables, data can be accessed easily in RDBMS.

Brief History of RDBMS


From 1970 to 1972, E.F. Codd published a paper to propose using a relational database model.
RDBMS is originally based on E.F. Codd's relational model invention.

Following are the various terminologies of RDBMS:

What is table/Relation?
Everything in a relational database is stored in the form of relations. The RDBMS database uses tables to store
data. A table is a collection of related data entries and contains rows and columns to store data. Each table
represents some real-world objects such as person, place, or event about which information is collected. The
organized collection of data into a relational table is known as the logical view of the database.

Properties of a Relation:

o Each relation has a unique name by which it is identified in the database.


o Relation does not contain duplicate tuples.
o The tuples of a relation have no specific order.
o All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one value.
A table is the simplest example of data stored in RDBMS.

Let's see the example of the student table.

ID Name AGE C
O
U
R
S
E

1 Ajeet 24 B.Tech

2 aryan 20 C.A
3 Mahesh 21 BCA

4 Ratan 22 MCA

5 Vimal 26 BSC

What is a row or record?


A row of a table is also called a record or tuple. It contains the specific information of each entry in the table. It
is a horizontal entity in the table. For example, The above table contains 5 records.

Properties of a row:

o No two tuples are identical to each other in all their entries.


o All tuples of the relation have the same format and the same number of entries.
o The order of the tuple is irrelevant. They are identified by their content, not by their position.
Let's see one record/row in the table.

ID Name AGE COURSE

1 Ajeet 24 B.Tech

What is a column/attribute?
A column is a vertical entity in the table which contains all information associated with a specific field in a table.
For example, "name" is a column in the above table which contains all information about a student's name.

Properties of an Attribute:

o Every attribute of a relation must have a name.


o Null values are permitted for the attributes.
o Default values can be specified for an attribute automatically inserted if no other value is specified
for an attribute.
o Attributes that uniquely identify each tuple of a relation are the primary key.

Name

Ajeet

Aryan

Mahesh
Ratan

What is data item/Cells?


The smallest unit of data in the table is the individual data item. It is stored at the intersection of tuples and
attributes.

Properties of data items:

o Data items are atomic.


o The data items for an attribute should be drawn from the same domain.
In the below example, the data item in the student table consists of Ajeet, 24 and Btech, etc.

ID Name AGE C
O
U
R
S
E

B
.
T
1 Ajeet 24
e
c
h

Degree:

The total number of attributes that comprise a relation is known as the degree of the table.

For example, the student table has 4 attributes, and its degree is 4.

ID Name AGE COURSE

1 Ajeet 24 B.Tech

2 Aryan 20 C.A

3 Mahesh 21 BCA
4 Ratan 22 MCA

5 Vimal 26 BSC

Cardinality:

The total number of tuples at any one time in a relation is known as the table's cardinality. The relation whose
cardinality is 0 is called an empty table.

For example, the student table has 5 rows, and its cardinality is 5.

ID Name AGE COURSE

1 Ajeet 24 B.Tech

2 aryan 20 C.A

3 Mahesh 21 BCA

4 Ratan 22 MCA

5 Vimal 26 BSC

Domain:

The domain refers to the possible values each attribute can contain. It can be specified using standard data
types such as integers, floating numbers, etc. For example, An attribute entitled Marital_Status may be limited
to married or unmarried values.

NULL Values

The NULL value of the table specifies that the field has been left blank during record creation. It is different
from the value filled with zero or a field that contains space.

Data Integrity

There are the following categories of data integrity exist with each RDBMS:

Entity integrity: It specifies that there should be no duplicate rows in a table.

Domain integrity: It enforces valid entries for a given column by restricting the type, the format, or the range of
values.
Referential integrity specifies that rows cannot be deleted, which are used by other records.

User-defined integrity: It enforces some specific business rules defined by users. These rules are different
from the entity, domain, or referential integrity.

Next

Difference between DBMS and RDBMS


Although DBMS and RDBMS both are used to store information in physical database but there are some
remarkable differences between them.

The main differences between DBMS and RDBMS are given below:

No. DBMS RDBMS

DBMS applications store data RDBMS applications store data in a


1)
as file. tabular form.

In DBMS, data is generally In RDBMS, the tables have an identifier


2) stored in either a hierarchical called primary key and the data values are
form or a navigational form. stored in the form of tables.

Normalization is not present


3) Normalization is present in RDBMS.
in DBMS.

RDBMS defines the integrity


DBMS does not apply any
constraint for the purpose of ACID
4) security with regards to data
(Atomocity, Consistency, Isolation and
manipulation.
Durability) property.

in RDBMS, data values are stored in the


DBMS uses file system to
form of tables, so a relationship between
5) store data, so there will be no
these data values will be stored in the
relation between the tables.
form of a table as well.

RDBMS system supports a tabular


DBMS has to provide some
structure of the data and a relationship
6) uniform methods to access
between them to access the stored
the stored information.
information.

DBMS does not support


7) RDBMS supports distributed database.
distributed database.
DBMS is meant to be for
RDBMS is designed to handle large
small organization and deal
8) amount of data. it supports multiple
with small data. it
users.
supports single user.

Example of RDBMS
Examples of DBMS are file
9) are mysql, postgre, sql
systems, xml etc.
server, oracle etc.

After observing the differences between DBMS and RDBMS, you can say that RDBMS is an extension of
DBMS. There are many software products in the market today who are compatible for both DBMS and
RDBMS. Means today a RDBMS application is DBMS application and vice-versa.

ACID Properties in DBMS




This article is based on the concept of ACID properties in DBMS that are
necessary for maintaining data consistency, integrity, and reliability while
performing transactions in the database. Let’s explore them.
A transaction is a single logical unit of work that accesses and possibly
modifies the contents of a database. Transactions access data using read-and-
write operations. To maintain consistency in a database, before and after the
transaction, certain properties are followed. These are called ACID properties.
Atomicity:
By this, we mean that either the entire transaction takes place at once or
doesn’t happen at all. There is no midway i.e. transactions do not occur
partially. Each transaction is considered as one unit and either runs to
completion or is not executed at all. It involves the following two operations.
— Abort : If a transaction aborts, changes made to the database are not visible.
— Commit : If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Aiming for a top All India Rank in the GATE CS/IT or GATE DA 2026
exam but unsure about your preparation level?
We've got you covered! Our GATE Courses for CSE & DA at GeeksforGeeks
are designed to give you the competitive edge you need. Get live classes from
GATE experts (Khaleel Sir, Chandan Jha Sir, Vijay Agarwal Sir, and many
others), practice problems, doubt-support, previous year questions,
quizzes, all India mock tests, and much more - all in one place.
Consider the following transaction T consisting of T1 and T2 : Transfer of 100
from account X to account Y .
If the transaction fails after completion of T1 but before completion of T2 .( say,
after write(X) but before write(Y) ), then the amount has been deducted
from X but not added to Y . This results in an inconsistent database state.
Therefore, the transaction must be executed in its entirety in order to ensure
the correctness of the database state.
Consistency:
This means that integrity constraints must be maintained so that the database
is consistent before and after the transaction. It refers to the correctness of a
database. Referring to the example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700 .
Total after T occurs = 400 + 300 = 700 .
Therefore, the database is consistent . Inconsistency occurs in
case T1 completes but T2 fails. As a result, T is incomplete.
Isolation:
This property ensures that multiple transactions can occur concurrently without
leading to the inconsistency of the database state. Transactions occur
independently without interference. Changes occurring in a particular
transaction will not be visible to any other transaction until that particular
change in that transaction is written to memory or has been committed. This
property ensures that the execution of transactions concurrently will result in a
state that is equivalent to a state achieved these were executed serially in some
order.
Let X = 500, Y = 500.
Consider two transactions T and T”.

Suppose T has been executed till Read (Y) and then T’’ starts. As a result,
interleaving of operations takes place due to which T’’ reads the correct value
of X but the incorrect value of Y and sum computed by
T’’: (X+Y = 50, 000+500=50, 500)
is thus not consistent with the sum at end of the transaction:
T: (X+Y = 50, 000 + 450 = 50, 450) .
This results in database inconsistency, due to a loss of 50 units. Hence,
transactions must take place in isolation and changes should be visible only
after they have been made to the main memory.
Durability:
This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk and
they persist even if a system failure occurs. These updates now become
permanent and are stored in non-volatile memory. The effects of the
transaction, thus, are never lost.
Some important points:
Property Responsibility for maintaining properties

Atomicity Transaction Manager

Consistency Application programmer

Isolation Concurrency Control Manager

Durability Recovery Manager


The ACID properties, in totality, provide a mechanism to ensure the correctness
and consistency of a database in a way such that each transaction is a group of
operations that acts as a single unit, produces consistent results, acts in
isolation from other operations, and updates that it makes are durably stored.
ACID properties are the four key characteristics that define the reliability and
consistency of a transaction in a Database Management System (DBMS). The
acronym ACID stands for Atomicity, Consistency, Isolation, and Durability. Here
is a brief description of each of these properties:
1. Atomicity: Atomicity ensures that a transaction is treated as a single,
indivisible unit of work. Either all the operations within the transaction are
completed successfully, or none of them are. If any part of the transaction
fails, the entire transaction is rolled back to its original state, ensuring data
consistency and integrity.
2. Consistency: Consistency ensures that a transaction takes the database
from one consistent state to another consistent state. The database is in a
consistent state both before and after the transaction is executed.
Constraints, such as unique keys and foreign keys, must be maintained to
ensure data consistency.
3. Isolation: Isolation ensures that multiple transactions can execute
concurrently without interfering with each other. Each transaction must be
isolated from other transactions until it is completed. This isolation prevents
dirty reads, non-repeatable reads, and phantom reads.
4. Durability: Durability ensures that once a transaction is committed, its
changes are permanent and will survive any subsequent system failures. The
transaction’s changes are saved to the database permanently, and even if
the system crashes, the changes remain intact and can be recovered.
Overall, ACID properties provide a framework for ensuring data consistency,
integrity, and reliability in DBMS. They ensure that transactions are executed in
a reliable and consistent manner, even in the presence of system failures,
network issues, or other problems. These properties make DBMS a reliable and
efficient tool for managing data in modern organizations.
Advantages of ACID Properties in DBMS
1. Data Consistency: ACID properties ensure that the data remains consistent
and accurate after any transaction execution.
2. Data Integrity: ACID properties maintain the integrity of the data by
ensuring that any changes to the database are permanent and cannot be lost.
3. Concurrency Control: ACID properties help to manage multiple transactions
occurring concurrently by preventing interference between them.
4. Recovery: ACID properties ensure that in case of any failure or crash, the
system can recover the data up to the point of failure or crash.
Disadvantages of ACID Properties in DBMS
1. Performance: The ACID properties can cause a performance overhead in the
system, as they require additional processing to ensure data consistency and
integrity.
2. Scalability: The ACID properties may cause scalability issues in large
distributed systems where multiple transactions occur concurrently.
3. Complexity: Implementing the ACID properties can increase the complexity
of the system and require significant expertise and resources.
Overall, the advantages of ACID properties in DBMS outweigh the
disadvantages. They provide a reliable and consistent approach to data
management, ensuring data integrity, accuracy, and reliability. However, in
some cases, the overhead of implementing ACID properties can cause
performance and scalability issues. Therefore, it’s important to balance the
benefits of ACID properties against the specific needs and requirements of
the system.
Conclusion
In conclusion, ACID properties involve four properties i.e. Atomicity, Consistency,
Isolation and Durability that are responsible for data consistency, integrity and
reliability in DBMS. Atomicity ensures that a transaction is rolled back if any part
of it fails. Consistency ensures that the database remains consistent before and
after the transaction. Isolation ensures that one transaction is not affected by
the other. Durability ensures that the changes introduced by a particular
transaction persist even after a system failure.
ACID Properties in DBMS – FAQs
What is the importance of ACID properties?
ACID properties ensure data consistency, integrity and reliability in DBMS. They
manage multiple transactions that are occurring concurrently and help to
recover data if any system failures occur. Thus, they play an important role.
How can we achieve Atomicity in transactions?
To achieve atomicity, a transaction should be treated as a single unit. If at any
point of transaction, execution fails then the whole transaction should be rolled
back by undoing the changes made.
What are some drawbacks of ACID Properties in DBMS?
ACID properties cause performance overhead because of additional processing
required for maintaining data integrity and consistency. This also increases the
complexity of the system, so we require more significant expertise and
resources.

ACID Properties in DBMS

Visit Course

ACID Properties in DBMS

Transaction and Concurrency Control | ACID Pro...

Get ready to boost your rank and secure an exceptional GATE 2025
score with confidence!
Our GATE CS & IT Test Series 2025 offers 60 PYQs Quizzes, 60 Subject-
Wise Mock Tests, 4500+ PYQs and practice questions, and over 20 Full-
Length Mock Tests that ensure you’re well-prepared to tackle the toughest
questions and secure a top-rank in the GATE 2025 exam. Get personalized
insights with student rankings based on performance and benefit from expert-
designed tests created by industry pros and GATE CS toppers.
Plus, don’t miss out on these exclusive features:
--> All India Mock Test
--> Live GATE CSE Mentorship Classes
--> Live Doubt Solving Sessions
Join now and stay ahead in your GATE 2025 journey!
Implementation of Locking in DBMS


Locking protocols are used in database management systems as a means of


concurrency control. Multiple transactions may request a lock on a data item
simultaneously. Hence, we require a mechanism to manage the locking requests
made by transactions. Such a mechanism is called a Lock Manager. It relies on
the process of message passing where transactions and lock manager exchange
messages to handle the locking and unlocking of data items.
Data Structure in Lock Manager
The data structure required for the implementation of locking is called a Lock
table.
1. It is a hash table where the names of data items are used as a hashing index.
2. Each locked data item has a linked list associated with it.
3. Every node in the linked list represents the transaction requested for lock, the
mode of lock requested (mutual/exclusive), and the current status of the
request (granted/waiting).
4. Every new lock request for the data item will be added to the end of the
linked list as a new node.
5. Collisions in the hash table are handled by the technique of separate
chaining.
Consider the following example of a lock table:
Aiming for a top All India Rank in GATE CS & IT 2025 exam, but not sure
where you stand?
We’ve got you covered! FREE GATE CS & IT Test Series - 2025 is designed to
give you the edge you need. With previous year questions, subject-
wise and full-length mock tests, and All India Mock Test, you can get a real
feel of the exam. Plus, get our live mentorship classes with experts and
attend live doubt-solving sessions to clear all your queries.
Explanation: In the above figure, the locked data items present in the lock
table are 5, 47, 167, and 15. The transactions which have requested for lock
have been represented by a linked list shown below them using a downward
arrow. Each node in the linked list has the name of the transaction which has
requested the data item like T33, T1, T27, etc. The color of the node represents
the status i.e. whether the lock has been granted or waiting. Note that a
collision has occurred for data items 5 and 47. It has been resolved by separate
chaining where each data item belongs to a linked list. The data item is acting
as a header for a linked list containing the locking request.
Working as Lock Manager
 Initially, the lock table is empty as no data item is locked.
 Whenever the lock manager receives a lock request from a transaction T i on a
particular data item Qi following cases may arise:
o If Qi is not already locked, a linked list will be created and a lock will
be granted to the requesting transaction Ti.
o If the data item is already locked, a new node will be added at the
end of its linked list containing the information about the request
made by Ti.
 If the lock mode requested by Ti is compatible with the lock mode of the
transaction currently having the lock, T i will acquire the lock too and the
status will be changed to ‘granted’. Else, the status of T i’s safety will be
‘waiting’.
 If a transaction Ti wants to unlock the data item it is currently holding, it will
send an unlock request to the lock manager. The lock manager will delete T i’s
node from this linked list. The lock will be granted to the next transaction in
the list.
 Sometimes transaction Ti may have to be aborted. In such a case all the
waiting requests made by Ti will be deleted from the linked lists present in the
lock table. Once abortion is complete, locks held by T i will also be released.
Advantages of Locking
 Data Consistency: Locking can help ensure data consistency by preventing
multiple users from modifying the same data simultaneously. By controlling
access to shared resources, locking can help prevent data conflicts and
ensure that the database remains in a consistent state.
 Isolation: Locking can ensure that transactions are executed in isolation
from other transactions, preventing interference between transactions and
reducing the risk of data inconsistencies.
 Granularity: Locking can be implemented at different levels of granularity,
allowing for more precise control over shared resources. For example, row-
level locking can be used to lock individual rows in a table, while table-level
locking can be used to lock entire tables.
 Availability: Locking can help ensure the availability of shared resources by
preventing users from monopolizing resources or causing resource starvation.
Disadvantages of Locking
 Overhead: Locking requires additional overhead, such as acquiring and
releasing locks on shared resources. This overhead can lead to slower
performance and increased resource consumption, particularly in systems
with high levels of concurrency.
 Deadlocks: Deadlocks can occur when two or more transactions are waiting
for each other to release resources, causing a circular dependency that can
prevent any of the transactions from completing. Deadlocks can be difficult to
detect and resolve and can result in reduced throughput and increased
latency.
 Reduced Concurrency: Locking can limit the number of users or
applications accessing the database simultaneously. This can lead to reduced
concurrency and slower performance in systems with high levels of
concurrency.
 Complexity: Implementing locking can be complex, particularly in
distributed systems or in systems with complex transactional logic. This
complexity can lead to increased development and maintenance costs.

RAID (Redundant Array of Independent Disk)


RAID refers to redundancy array of the independent disk. It is a technology which is used to
connect multiple secondary storage devices for increased performance, data redundancy or
both. It gives you the ability to survive one or more drive failure depending upon the RAID
level used.

It consists of an array of disks in which multiple disks are connected to achieve different
goals.

RAID technology
There are 7 levels of RAID schemes. These schemas are as RAID 0, RAID 1, ...., RAID 6.

These levels contain the following characteristics:

o It contains a set of physical disk drives.


o In this technology, the operating system views these separate disks as a single
logical disk.
o In this technology, data is distributed across the physical drives of the array.
o Redundancy disk capacity is used to store parity information.
o In case of disk failure, the parity information can be helped to recover the data.

Standard RAID levels


RAID 0
o RAID level 0 provides data stripping, i.e., a data can place across multiple disks.
It is based on stripping that means if one disk fails then all data in the array is
lost.
o This level doesn't provide fault tolerance but increases the system performance.

Example:

Disk 0 Disk 1 Disk 2 Disk 3

20 21 22 23

24 25 26 27

28 29 30 31

32 33 34 35

In this figure, block 0, 1, 2, 3 form a stripe.

In this level, instead of placing just one block into a disk at a time, we can work with two or
more blocks placed it into a disk before moving on to the next one.

Disk 0 Disk 1 Disk 2 Disk 3

20 22 24 26
21 23 25 27

28 30 32 34

29 31 33 35

In this above figure, there is no duplication of data. Hence, a block once lost cannot be
recovered.

Pros of RAID 0:
o In this level, throughput is increased because multiple data requests probably not
on the same disk.
o This level full utilizes the disk space and provides high performance.
o It requires minimum 2 drives.

Cons of RAID 0:
o It doesn't contain any error detection mechanism.
o The RAID 0 is not a true RAID because it is not fault-tolerance.
o In this level, failure of either disk results in complete data loss in respective array.

RAID 1
This level is called mirroring of data as it copies the data from drive 1 to drive 2. It provides
100% redundancy in case of a failure.

Example:

Disk 0 Disk 1 Disk 2 Disk 3

A A B B

C C D D

E E F F
G G H H

Only half space of the drive is used to store the data. The other half of drive is just a mirror
to the already stored data.

Pros of RAID 1:
o The main advantage of RAID 1 is fault tolerance. In this level, if one disk fails,
then the other automatically takes over.
o In this level, the array will function even if any one of the drives fails.

Cons of RAID 1:
o In this level, one extra drive is required per drive for mirroring, so the expense is
higher.

RAID 2
o RAID 2 consists of bit-level striping using hamming code parity. In this level, each
data bit in a word is recorded on a separate disk and ECC code of data words is
stored on different set disks.
o Due to its high cost and complex structure, this level is not commercially used.
This same performance can be achieved by RAID 3 at a lower cost.

Pros of RAID 2:
o This level uses one designated drive to store parity.
o It uses the hamming code for error detection.

Cons of RAID 2:
o It requires an additional drive for error detection.

RAID 3
o RAID 3 consists of byte-level striping with dedicated parity. In this level, the parity
information is stored for each disk section and written to a dedicated parity drive.
o In case of drive failure, the parity drive is accessed, and data is reconstructed
from the remaining devices. Once the failed drive is replaced, the missing data
can be restored on the new drive.
o In this level, data can be transferred in bulk. Thus high-speed data transmission
is possible.

Disk 0 Disk 1 Disk 2 Disk 3

A B C P(A, B, C)

D E F P(D, E, F)

G H I P(G, H, I)

J K L P(J, K, L)

Pros of RAID 3:
o In this level, data is regenerated using parity drive.
o It contains high data transfer rates.
o In this level, data is accessed in parallel.

Cons of RAID 3:
o It required an additional drive for parity.
o It gives a slow performance for operating on small sized files.

RAID 4
o RAID 4 consists of block-level stripping with a parity disk. Instead of duplicating
data, the RAID 4 adopts a parity-based approach.
o This level allows recovery of at most 1 disk failure due to the way parity works. In
this level, if more than one disk fails, then there is no way to recover the data.
o Level 3 and level 4 both are required at least three disks to implement RAID.

Disk 0 Disk 1 Disk 2 Disk 3

A B C P0

D E F P1
G H I P2

J K L P3

In this figure, we can observe one disk dedicated to parity.

In this level, parity can be calculated using an XOR function. If the data bits are 0,0,0,1 then
the parity bits is XOR(0,1,0,0) = 1. If the parity bits are 0,0,1,1 then the parity bit is
XOR(0,0,1,1)= 0. That means, even number of one results in parity 0 and an odd number of
one results in parity 1.

C1 C2 C3 C4 Parity

0 1 0 0 1

0 0 1 1 0

Suppose that in the above figure, C2 is lost due to some disk failure. Then using the values
of all the other columns and the parity bit, we can recompute the data bit stored in C2. This
level allows us to recover lost data.

RAID 5
o RAID 5 is a slight modification of the RAID 4 system. The only difference is that
in RAID 5, the parity rotates among the drives.
o It consists of block-level striping with DISTRIBUTED parity.
o Same as RAID 4, this level allows recovery of at most 1 disk failure. If more than
one disk fails, then there is no way for data recovery.

Disk 0 Disk 1 Disk 2 Disk 3 Disk 4

0 1 2 3 P0

5 6 7 P1 4
10 11 P2 8 9

15 P3 12 13 14

P4 16 17 18 19

This figure shows that how parity bit rotates.

This level was introduced to make the random write performance better.

Pros of RAID 5:
o This level is cost effective and provides high performance.
o In this level, parity is distributed across the disks in an array.
o It is used to make the random write performance better.

Cons of RAID 5:
o In this level, disk failure recovery takes longer time as parity has to be calculated
from all available drives.
o This level cannot survive in concurrent drive failure.

RAID 6
o This level is an extension of RAID 5. It contains block-level stripping with 2 parity
bits.
o In RAID 6, you can survive 2 concurrent disk failures. Suppose you are using
RAID 5, and RAID 1. When your disks fail, you need to replace the failed disk
because if simultaneously another disk fails then you won't be able to recover
any of the data, so in this case RAID 6 plays its part where you can survive two
concurrent disk failures before you run out of options.

Disk 1 Disk 2 Disk 3 Disk 4

A0 B0 Q0 P0

A1 Q1 P1 D1
Q2 P2 C2 D2

P3 B3 C3 Q3

Pros of RAID 6:
o This level performs RAID 0 to strip data and RAID 1 to mirror. In this level,
stripping is performed before mirroring.
o In this level, drives required should be multiple of 2.

Cons of RAID 6:
o It is not utilized 100% disk capability as half is used for mirroring.
o It contains very limited scalability.

ACID Properties & Normalization


ACID Properties

ACID Property is the most important part of the database. ACID stands for Atomicity
Consistency Isolation Durability.
Atomicity:

This means that “all or nothing”. When an update occurs to a database either all or none of
the update will become available to anyone beyond the user. This update to the database is
called a transaction and it either commits or aborts.
Consistency:

It ensures that any changes to values in an instance are consistent with changes to other
values in the same instance.
Isolation:

Isolation is is needed when there are concurrent transactions. Concurrent transactions are
transactions that occur at the same time, such as shared multiple users accessing shared
objects.
An important concept to understanding isolation through transactions is serializability.
Transactions are serializable when the effect on the database is the same whether the
transactions are executed in serial order or in an interleaved fashion.
Durability:

Maintaining updates of committed transactions is important. These updates must never be


lost. The ACID property of durability addresses this need. Durability refers to the ability of
the system to recover committed transaction updates if either the system or the storage
media fails.

 Normalization:
Normalization is a technique which is used to organize the data in the database. It is a
systematic approach to remove the data redundancy. Normalization is mainly used for two
purpose,
 To remove data redundancy.

 Ensuring data dependencies is proper.

Without normalization 3 anomalies occurred and it becomes difficult to handle and update
data. To understand these anomalies let’s take an Student table
ID Name Address Subject
201 Akshay Jaipur Maths
202 Charu Bombay Bio
203 Disha Banglore Physics
204 Eva Noida Maths

 Updation Anamoly – To update address of a student who occurs twice or more than twice
in a table, we will have to updateAddress column in all the rows, else data will become
inconsistent.
 Insertion Anamoly – Suppose for a new admission, we have a Student id, name and
address of a student but if student has not opted for any subject yet then we have to
insertNULL there, leading to Insertion Anamoly.
 Deletion Anamoly – If id 401 has only one subject and temporarily he drops it, when we
delete that row, entire student record will be deleted along with id.

Get 100% Hike!


Master Most in Demand Skills Now!
By providing your contact details, you agree to our Terms of Use Privacy Policy

Normalization Form:

Normalization Rules are divided into 4 normal forms.


 First Normal Form

 Second Normal Form

 Third Normal Form

 BCNF

First Normal Form (1NF)


o A relation will be 1NF if it contains an atomic value.
o It states that an attribute of a table cannot hold multiple values. It must hold only
single-valued attribute.
o First normal form disallows the multi-valued attribute, composite attribute, and
their combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

7272826385,
14 John UP
9064738238

20 Harry 8574783832 Bihar

7390372389,
12 Sam Punjab
8589830302

The above EMPLOYEE table is an unnormalized relation as it contains multiple values


corresponding to EMP_PHONE attribute i.e. these values are non-atomic. So relations with
multi value entries are called unnormalized relations.
To overcome this problem, we have to eliminate the non atomic values of EMP_PHONE
attribute.

PauseNext
Mute

Current Time 0:00

Duration 18:10
Loaded: 1.84%

Fullscreen
The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

There are 3 ways to achieve first normal form.

Method 1:

To remove the repeating values for a column, the EMPLOYEE table was converted to a flat
relation EMPLOYEE_1 table by repeating the pair (EMP_ID, EMP_NAME) for every entry in
the table. Now the new relation does not contain any non-atomic values so the table is said
to be normalized and is in First Normal Form.

Method 2:
Another method is to remove the attributes that violate 1NF and place it in a separate
relation along with primary key. So the unnormalized relation, EMPLOYEE table is
decomposed into two sub-relations EMP_DETAILS and EMP_PERFORMANCE

EMP_DETAILS

EMP_ID EMP_NAME

14 John

20 Harry

12 Sam

EMP_PERFORMANCE

EMP_ID EMP_PHONE EMP_STATE

14 7272826385 UP

14 9064738238 UP

20 8574783832 Bihar

12 7390372389 Punjab

12 8589830302 Punjab

The main idea of decomposing the relations is to keep the different types of information in
their separate relation as first normal form disallows multivalve attribute that are composite
in nature. In the EMP_DETAILS relation, the attribute (EMP_ID) acts as a primary key and
in the EMP_PERFORMANCE relation the attributes (EMP_ID, EMP_PHONE) act as a
primary key. Now it satisfies both the conditions for a relation to be in 1NF.

The relation is decomposed according to the following rules:


o One relation consists of the primary key (EMP_ID) of the original relation (i.e.
EMPLOYEE) and non repeating attributes of the original relation (i.e.
EMP_NAME).
o The other relation consists of copy of the primary key of the original relation and
all the repeating attributes of the original relation.
Method 3:

The third method of normalizing a unnormalized relation into 1NF will be explained with
following example where skills of an employee of some company are fixed. Suppose an
employee can have maximum of five skills.

EMP_SKILL relation

EMP_ID Skill

14 DBMS, C, C++

20 JAVA, C

12 DBMS, HTML, VB, MS OFFICE

Here the EMP_SKILL relation is not 1NF as the skill attribute contains a set of values. So
to remove this problem, we define multiple Skill columns as shown.

The above relation is decomposed into 1NF in the following example.

EMP_ID Skil_1 Skill_2 Skill_3 Skill_4 Skill_5

14 DBMS C C++ - -

20 JAVA C - - -

MS
12 DBMS HTML VB -
OFFICE

The above representation is in 1NF but this technique is not preferred as it may cause
problems such as:
o It would be difficult to query the relation. For Example, it would be difficult to
answer the queries like “which employee share a skill?”, “Which employees have
skill C?”
o Restriction of employee skills to 5. If employee with more skills appears, it would
be left unrecorded.
To sum up, all the three approaches are correct because they transform any unnormalized
table into a first normal form table. However, the second approach where table is
decomposed into relations is more efficient as minimizes the duplicacy of the data. So for a
relation to be in first normal form, each set of repeating groups should appear in its own
table and every relation should have a primary key.

Anomalies in first normal form

Whereas the first normal form was concerned with the structure of the representation of
relation, the second normal form is concerned with the eliminating redundancy in these
relations.

The various anomalies can be divided as:

o Insertion anomaly
o Deletion anomaly
o Updation anomaly
These anomalies have got their name from the relational operations they perform on a
relation.

Let us take a following example of ORDER_BOOK relation:

Order_No B_Name Quantity Price

4253 C 15 175

4253 Database 20 225

4154 IT 30 200

4256 C 50 175

4186 Database 15 225


Insertion anomaly: Suppose that we want to insert information about a new book into the
ORDER_BOOK relation. But we cannot insert this information until some order is placed for
it because in this relation the primary key is composed of two attributes Order_No and
B_Name which are called composite keys.

So neither the Order_No nor B_Name can contain null values because it is against the
principle of entity integrity rule. So we cannot insert the information of a new book whose
order has not been placed yet because in that case, the attribute Order_No will contain null
value which is against the entity integrity rule i.e. primary key cannot null values. This is
shown in following figure:

Order_No B_Name Quantity Price

4253 C 15 175

4253 Database 20 225

4154 IT 30 200

4256 C 50 175

4186 Database 15 225

Operating
NULL 30 300
System

Relations which exhibit such kind of undesirable property are said to suffer from insertion
anomaly.

Deletion anomaly: Suppose that an order whose order number is 4154 is cancelled due to
certain reasons. Therefore, we would have to delete this order information from the
ORDER_BOOK relation.

As we can see from the relation that this particular order contain information about the book
whose name is “IT”. So on deletion of this record from the relation would result in loss of
information about the “IT” book. This may lead to loss of vital information as it is the only
record which contains information about the book “IT”. But if we try to remove any other
record from the relation then it would cause no problem as it still contains information of the
book in other record.
For Example: Deleting record whose Order_No = 4154 as shown in following figure:

Order_No B_Name Quantity Price

4253 C 15 175

4253 Database 20 225

4256 C 50 175

4186 Database 15 225

NULL Operating System 30 300

Relations which exhibit such kind of undesirable property are said to suffer from deletion
anomaly.

Updation Anomaly: Modifying some values in the relations may also prove cumbersome.
Suppose that if the price of the book C is modified to 190 then every tuple referring to this
book have to be updated and multiple updating always carries some risk of inconsistencies.

In the ORDER_BOOK relation, the updation seems to be very easy because it contains only
two tuples having B_price as 175. But if in case relation has thousands of tuples containing
a large number of redundant data, the updations may lead to inconsistency as humans are
prone to errors.

Relations which exhibit such kind of undesirable property are said to suffer from updation
anomaly.

The above considerations leads us to a conclusion that relations in 1NF have undesirable
data manipulation properties hence bringing relation to 1NF would not terminate logical
database design. Further transformations are needed to eliminate this kind of anomalies
form a set of original relations. So this bring us the concept of second normal form.

Second Normal Form (2NF)


o In the 2NF, relational must be in 1NF.
o In the second normal form, all non-key attributes are fully functional dependent
on the primary key
Example: Let's assume, a school can store the data of teachers and the subjects they
teach. In a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_A


GE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID


which is a proper subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

Backward Skip 10sPlay VideoForward Skip 10s


TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:
TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer

Anomalies in Second Normal Form


Even if the relation in 2NF, it still suffers from insertion, deletion and updation anomalies. So
before discussing the third normal form, we will explain these anomalies.

To discuss the various anomalies, we will consider the STUDENT relation that holds
information about students and teachers.

Stu_Id Stu_Name Teach_Id Teach_Name Teach_Qual

2523 Anurag 201 Mohan MCA

3712 Raju 202 Ravi M.Tech

4906 Raman 203 Mahima Ph.D

2716 Jyoti 204 Anjali MCA

1768 Meetali 205 Sonia M.Tech

In the above table, Stu_Id is the primary key which acts as the roll number of the student.
Since the STUDENT relation is composed of only one attribute which acts as a primary key
(Stu_Id) so it is in 2NF. But it suffers from the insertion, deletion and updation anomalies
which are explained as follows.

Insertion anomaly: Suppose that we want to insert a a new record with some information
about a new teacher who has not yet been assigned a personal student. But this insertion
record is not allowed because the primary key Stu_Id contains a nullvalue which is not
possible as it is against the entity integrity rule.

For Example: Suppose that we want to insert information about a new teacher ‘Mayank’
having Teach_Id = ‘206’ Teach_Qual = ‘MCA‘who has not yet been allotted any student.
This is not possible as Stu_Id will contain a null value.

Stu_Id Stu_Name Teach_Id Teach_Name Teach_Qual

2523 Anurag 201 Mohan MCA

3712 Raju 202 Ravi M.Tech

4906 Raman 203 Mahima Ph.D

2716 Jyoti 204 Anjali MCA

1768 Meetali 205 Sonia M.Tech

NULL NULL 206 Mayank MCA

Deletion anomaly: Suppose that a student whose Stu_Id = 1768 decides to leave the
college, so we would have to delete this tuple from the STUDENT relation.

As we can see from the relation that this particular student is the last student of the teacher
whose Teach_Id = ‘205’. Thus on deleting this tuple, the information about the teacher
would also be deleted. This may lead to vital information. This is the deletion anomaly.

There would be no deletion problem if the student who decides to leave the college is not
the last student of the particular teacher.

For Example: Deleting student record with Stu_Id = 2523 will not lead to deletion of teacher
information whose Teach_Id = ‘201’ because it is present elsewhere.
Stu_Id Stu_Name Teach_Id Teach_Name Teach_Qual

3712 Raju 202 Ravi M.Tech

4906 Raman 203 Mahima Ph.D

2716 Jyoti 204 Anjali MCA

1768 Meetali 205 Sonia M.Tech

NULL NULL 206 Mayank MCA

Updation Anomaly: The second normal form also suffers from updation anomaly.

For Example: The value of the qualifications of the teacher i.e. Teach_Qual whose
Teach_Id = ‘204’ is updated from MCA to Ph.D. This would be quite a big problem as the
updation in the tuple will have to be made where ever this information reoccurs. Although
this relation is having few tuples so it would be quite a big problem here but normally a
teacher, teaches many students. So in case of huge databases it will be a big problem and
may lead to inconsistencies as human are prone to errors.

The above considerations leads us to a conclusion that relation in 2NF have undesirable
data manipulation properties hence bringing a relation to 2NF would not terminate logical
database design. Further transformations are needed to eliminate these kinds of anomalies
from an original relation. So this brings us to a concept of the Third normal form.

Third Normal Form:


 A relation is in third normal form (3NF) if it is in second normal form and it contains no
transitive dependencies.

 Consider relation R containing attributes A, B and C. R(A, B, C)

 If A → B and B → C then A → C

 Transitive Dependency: Three attributes with the above dependencies.

For example:
Student_details table
ID Name Subject DOB Address Mobile No. City

New Student_detail table:


ID Name Subject

Address Table:
ID Address DOB Mobile No. City

The advantage of removing transtive dependency is,


 Amount of data duplication is reduced.

 Data integrity achieved.

Third Normal Form (3NF)


o A relation will be in 3NF if it is in 2NF and not contain any transitive partial
dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the data
integrity.
o If there is no transitive dependency for non-prime attributes, then the relation
must be in third normal form.
A relation is in third normal form if it holds atleast one of the following conditions for every
non-trivial function dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

To explain the 3NF, let us consider the example of Employee_Detail relation as


shown below.

Example: EMPLOYEE_DETAIL table

Example:

Backward Skip 10sPlay VideoForward Skip 10s


EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida


333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on


Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on


EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on
super key(EMP_ID). It violates the rule of third normal form.The reduction of 2NF relation
into 3NF consists of splitting the 2NF into appropriate relations such that every non-key
attribute are functionally dependent on the primary key not transitively or indirectly of the
respective relations.

That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389


666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

Anomalies in Third Normal Form


Even if the relation in 3NF, it still suffers from insertion, deletion and updation anomalies. So
before discussing the next higher normal form, we will explain these anomalies.

To discuss the various anomalies, we will consider the STAFF relation that holds
information about the staff, the equipment key they have been allocated and the language
in which they are fluent.

STAFF (@S_Name + @Equipment + @Language) where @symbol tells that it is a


primary key.

STAFF Relation:

S_Name Equipment Language

Anurag PC English Mainframe French

Kapil PC English French Japanese


In the above visualization it shows that it is not a relation. In order to represent it as a
relation in 1NF, we need to convert it to the form as shown in the following table.

STAFF Relation:

S_Name Equipment Language

Anurag PC English

Anurag PC French

Anurag Mainframe English

Anurag Mainframe French

Kapil PC English

Kapil PC French

Kapil PC Japanese

In the above relation, every STAFF has two independent sets of features associated with it.

The STAFF relation has a primary key composed of S_Name, Equipment and Language.
There is no transitive dependency, so the relation STAFF is in 3NF. But it stills suffers from
the insertion, deletion and updation anomalies which are explained as follows.

Insertion anomaly: Suppose an Anurag learns a new language Japanese then we will
have to insert two new records into the STAFF relation. Similarly, if the Equipment values
corresponding to staff Anurag changes in numbers from 2 to 5, then with introduction of new
language Japanese we will have to insert multiple records into the STAFF relation which
results in redundancy.

Deletion anomaly: Let us suppose that the name of the staff changed from Anurag to
Anuraj here sue to some reasons then multiple records need to be updated which may
result in inconsistency of data.
Updation Anomaly: Let us suppose that an staff Kapil has his equipment PC deallocated,
then all the information about his languages skills would be lost due to deletion of these
records which may result in loss of vital information.

All these anomalies are encountered due to the presence of multivalued dependency which
is removed in fourth normal form. The concept of multivalued dependency and 4NF will be
explained later.

Boyce and Codd Normal Form (BCNF):

This is a higher version of third normal form. This form deals with certain type of anamoly
that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate
keys is said to be in BCNF. For a table to be in BCNF, following conditions must be
satisfied:
 R must be in 3rd Normal Form

 and, for each functional dependency ( X -> Y ), X should be a super Key.

Introduction of Relational Algebra in DBMS




Relational Algebra is a procedural query language. Relational algebra mainly


provides a theoretical foundation for relational databases and SQL. The main
purpose of using Relational Algebra is to define operators that transform one or
more input relations into an output relation. Given that these operators accept
relations as input and produce relations as output, they can be combined and
used to express potentially complex queries that transform potentially many
input relations (whose data are stored in the database) into a single output
relation (the query results). As it is pure mathematics, English Keywords are not
used in Relational Algebra, and operators are represented using symbols.
What is Relational Algebra?
Relational algebra consists of a certain set of rules or operations that are widely
used to manipulate and query data from a relational database. It can be
facilitated by utilizing SQL language and helps users interact with database
tables based on querying data from the database more efficiently and
effectively.
Relational Algebra incorporates a collection of operations, such as filtering data
or combining data, that help us organize and manipulate data more efficiently.
This ” algebra ” is the foundation for most database queries, and it enables us to
extract the required information from the databases by using SQL query
language.
Fundamental Operators
Relational algebra consists of various operators that help us fetch and
manipulate data from relational tables in the database to perform certain
operations on relational data.
These are the basic/fundamental operators used in Relational
Algebra
1. Selection(σ)
2. Projection(π)
3. Union(U)
4. Set Difference(-)
5. Set Intersection(∩)
6. Rename(ρ)
7. Cartesian Product(X)
1. Selection(σ) : Selection Operation is basically used to filter out rows from a
given table based on certain given condition. It basically allows you to retrieve
only those rows that match the condition as per condition passed during SQL
Query.
It is used to select required tuples of the relations.
Illustration : If we want to get the details of all the work in the Department
“IT”, we would use the selection operator to filter out these based on the given
condition.
Example:
A B C

1 2 4

2 2 3

3 2 3

4 3 4
Output Table :
For the above relation, σ(c>3)R will select the tuples which have c more than 3.
A B C

1 2 4

4 3 4
Note: The selection operator only selects the required tuples but does not
display them. For display, the data projection operator is used.
2. Projection(π) : While Selection operation works on rows , similarly
projection operation of relational algebra works on columns. It basically allows
you to pick specific columns from a given relational table based on the given
condition and ignoring all the other remaining columns.
Illustration : If we are interested in the specific columns from the relational
tables, we would prefer to use the Projection Operator.
It is used to project required column data from a relation.
Example: Consider Table 1. Suppose we want columns B and C from Relation R.

π(B,C)R will show following columns.


B C

2 4

2 3

3 4
Note: By Default, projection removes duplicate data.
3. Union(U) : Union Operator is basically used to combine the results of two
queries into a single result. The only condition is that both queries must return
same number of columns with same data types.
Illustration : If in case we want a list of all the employee from two different
department . Then in that case we should use union operation to merge both
the list from two different department.
Union operation in relational algebra is the same as union operation in set
theory.

Example: FRENCH
Student_Name Roll_Number

Ram 01

Mohan 02

Vivek 13

Geeta 17

GERMAN
Student_Name Roll_Number

Vivek 13

Geeta 17

Shyam 21

Rohan 25
Consider the following table of Students having different optional subjects in
their course.

π(Student_Name)FRENCH U π(Student_Name)GERMAN
Student_Name

Ram

Mohan

Vivek

Geeta

Shyam

Rohan
Note: The only constraint in the union of two relations is that both relations
must have the same set of Attributes.
4. Set Difference(-) : Set difference basically provides the rows that are
present in one table , but not in another tables.
Illustration : If you have two lists of employees, one from the IT Department
and one from the Marketing department & you want to know which employees
are in IT but not in Marketing , the set difference operator would used.
Set Difference in relational algebra is the same set difference operation as in set
theory.
Example: From the above table of FRENCH and GERMAN, Set Difference is used
as follows

π(Student_Name)FRENCH - π(Student_Name)GERMAN
Student_Name

Ram

Mohan
Note: The only constraint in the Set Difference between two relations is that
both relations must have the same set of Attributes.
5. Set Intersection(∩) : Set Intersection basically allows to fetches only those
rows of data that are common between two sets of relational tables.
Illustration : If we want to find the number of employees who work in both IT
and Marketing Department , then in that case we use Intersection Operator.
Set Intersection in relational algebra is the same set intersection operation in
set theory.
Example: From the above table of FRENCH and GERMAN, the Set Intersection is
used as follows:

π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN
Student_Name

Vivek

Geeta
Note: The only constraint in the Set Difference between two relations is that
both relations must have the same set of Attributes.
6. Rename(ρ) : Rename operator basically allows you to give a temporary
name to a specific relational table or to its columns. It is very useful when we
want to avoid ambiguity, especially in complex Queries.
Rename is a unary operation used for renaming attributes of a relation.
ρ(a/b)R will rename the attribute 'b' of the relation by 'a'.
7. Cross Product(X) : Cartesian product Operator combines every row of one
table with every row of another table , producing all the possible combination.
It’s mostly used as a precursor to more complex operation like joins.
Cross-product between two relations. Let’s say A and B, so the cross product
between A X B will result in all the attributes of A followed by each attribute of
B. Each record of A will pair with every record of B.

Example:
A
Name Age Sex

Ram 14 M

Sona 15 F

Kim 20 M
B
ID Course

1 DS

2 DBMS
AXB
Name Age Sex ID Course

Ram 14 M 1 DS

Ram 14 M 2 DBMS

Sona 15 F 1 DS

Sona 15 F 2 DBMS

Kim 20 M 1 DS

Kim 20 M 2 DBMS
Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘ tuples.
Derived Operators
These are some of the derived operators, which are derived from the

1. Natural Join(⋈)
fundamental operators.
2. Conditional Join
1. Natural Join(⋈): Natural join is a binary operator. Natural join between two or
more relations will result in a set of all combinations of tuples where they have
an equal common attribute.

Example:

EMP
Name ID Dept_Name

A 120 IT

B 125 HR

C 110 Sales

D 111 IT

DEPT
Dept_Name Manager

Sales Y

Production Z

IT A
Natural join between EMP and DEPT with condition :

EMP ⋈ DEPT
EMP.Dept_Name = DEPT.Dept_Name

Name ID Dept_Name Manager

A 120 IT A

C 110 Sales Y

D 111 IT A
2. Conditional Join: Conditional join works similarly to natural join. In natural
join, by default condition is equal between common attributes while in
conditional join we can specify any condition such as greater than, less than, or
not equal.
Example:

R
ID Sex Marks

1 F 45

2 F 55

3 F 60

S
ID Sex Marks

10 M 20

11 M 22

12 M 59
Join between R and S with condition R.marks >= S.marks
R.ID R.Sex R.Marks S.ID S.Sex S.Marks

1 F 45 10 M 20

1 F 45 11 M 22

2 F 55 10 M 20

2 F 55 11 M 22

3 F 60 10 M 20

3 F 60 11 M 22

3 F 60 12 M 59

Relational Calculus
As Relational Algebra is a procedural query language, Relational Calculus is a
non-procedural query language. It basically deals with the end results. It always
tells me what to do but never tells me how to do it.
There are two types of Relational Calculus
1. Tuple Relational Calculus(TRC)
2. Domain Relational Calculus(DRC)
In-depth articles:
Basic-operators-in-relational-algebra
Extended Relational Algebra Operators
Following are the Previous Year’s Gate Questions
https://www.geeksforgeeks.org/gate-gate-cs-2012-question-50/
https://www.geeksforgeeks.org/gate-gate-cs-2012-question-43/
Conclusion
Relational algebra might sound theoretical, but it’s a very powerful tool for
understanding how databases work behind the scene. It fundamentally permits
us to break down the complex database queries into a smaller , more
manageable queries. Whether you’re filtering data, combining tables , or
renaming columns , these operators serve as building blocks for working with
relational databases effectively.
Learning relational algebra will provides you with high level of understanding
regarding how to optimize your database queries and structure data for better
performance and utilization of resources.
Frequently Asked Questions on Relational Algebra in
DBMS -FAQs
What is a relational database ?
A relational database is basically a type of database where data is stored in the
form of tables, which look similar to spreadsheets. Each table has rows and
columns: rows represent individual records ( like a person or an ordered data ) &
columns represent attributes ( like name, and address). Since they are
connected with each other through shared data, it makes it allowing for more
organized storage and easier retrieval of information
What is the relational model ?
Relational model is the theory behind how relational database are organized. In
this model, data is organized into tables, with each table representing a specific
type of information ( like customer or order ). The model also defines how these
tables relate to one another, typically through shared keys ( based on primary
key or candidate keys) or identifier.
What is the main difference between Selection Operation and
Projection Operation?
Selection Operation is basically used to filter out rows based on certain
condition, while on other hand the projection operations of Relational algebra
selects the specific columns from a table.
How is cartesian product different from a Join Operations?
Cartesian product is used to form all the possible combinations of rows from two
tables, while Join typically used to combine rows based on a related column (like
foreign key)
Why is Relational Algebra important?
Relational algebra helps in designing efficient database queries. It provides a
clear framework for understanding how data is retrieved and manipulated in
relational databases.

Basic Operators in Relational Algebra




The Relational Model is a way of structuring data using relations, which are a
collection of tuples that have the same attributes. Relational Algebra is a
procedural query language that takes relations as input and returns relations as
output. It uses a set of operators to manipulate and retrieve data from these
relations. Here, we’ll explore the basic operators of Relational Algebra using the
STUDENT_SPORTS, EMPLOYEE, and STUDENT relations from Table 1, Table 2,
and Table 3, respectively.

Table 1: STUDENT_SPORTS

ROLL_NO SPORTS

1 Badminton

2 Cricket

2 Badminton

4 Badminton
Table 2: EMPLOYEE

ADDRES
EMP_NO NAME PHONE AGE
S

1 RAM DELHI 9455123451 18


ADDRES
EMP_NO NAME PHONE AGE
S

5 NARESH HISAR 9782918192 22

6 SWETA RANCHI 9852617621 21

4 SURESH DELHI 9156768971 18


Table 3: STUDENT

ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

RAMES
2 GURGAON 9652431543 18
H

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

1. Selection operator (σ)


Selection operator is used to selecting tuples from a relation based on some
condition. Syntax:
σ (Cond)(Relation Name)
Extract students whose age is greater than 18 from STUDENT relation given in
Table 3
σ (AGE>18)(STUDENT)
[Note: SELECT operation does not show any result, the projection operator
must be called before the selection operator to generate or project the result.
So, the correct syntax to generate the result is: ∏(σ (AGE>18)(STUDENT))]
RESULT:

ROLL_NO NAME ADDRESS PHONE AGE

3 SUJIT ROHTAK 9156253131 20

2. Projection Operator (∏)


Projection operator is used to project particular columns from a
relation. Syntax:
∏(Column 1,Column 2….Column n)(Relation Name)
Extract ROLL_NO and NAME from STUDENT relation given in Table 3
∏(ROLL_NO,NAME)(STUDENT)
RESULT:
ROLL_NO NAME

1 RAM

2 RAMESH

3 SUJIT

4 SURESH
Note: If the resultant relation after projection has duplicate rows, it will be
removed. For Example ∏(ADDRESS)(STUDENT) will remove one duplicate row with
the value DELHI and return three rows.
3. Cross Product(X)
Cross product is used to join two relations. For every row of Relation1, each row
of Relation2 is concatenated. If Relation1 has m tuples and and Relation2 has n
tuples, cross product of Relation1 and Relation2 will have m X n tuples. Syntax:
Relation1 X Relation2
To apply Cross Product on STUDENT relation given in Table 1 and
STUDENT_SPORTS relation given in Table 2,
STUDENT X STUDENT_SPORTS
RESULT:

ROLL_NO NAME ADDRESS PHONE AGE ROLL_NO SPORTS

1 RAM DELHI 9455123451 18 1 Badminton

1 RAM DELHI 9455123451 18 2 Cricket

1 RAM DELHI 9455123451 18 2 Badminton

1 RAM DELHI 9455123451 18 4 Badminton

2 RAMESH GURGAON 9652431543 18 1 Badminton

2 RAMESH GURGAON 9652431543 18 2 Cricket

2 RAMESH GURGAON 9652431543 18 2 Badminton

2 RAMESH GURGAON 9652431543 18 4 Badminton

3 SUJIT ROHTAK 9156253131 20 1 Badminton


ROLL_NO NAME ADDRESS PHONE AGE ROLL_NO SPORTS

3 SUJIT ROHTAK 9156253131 20 2 Cricket

3 SUJIT ROHTAK 9156253131 20 2 Badminton

3 SUJIT ROHTAK 9156253131 20 4 Badminton

4 SURESH DELHI 9156768971 18 1 Badminton

4 SURESH DELHI 9156768971 18 2 Cricket

4 SURESH DELHI 9156768971 18 2 Badminton

4 SURESH DELHI 9156768971 18 4 Badminton

4. Union (U)
Union on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible (These two relations should have the same number of attributes
and corresponding attributes in two relations have the same domain). Union
operator when applied on two relations R1 and R2 will give a relation with tuples
that are either in R1 or in R2. The tuples which are in both R1 and R2 will appear
only once in the result relation. Syntax:
Relation1 U Relation2
Find the person who is either student or employees, we can use Union operators
like:
STUDENT U EMPLOYEE
RESULT:

ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

RAMES
2 GURGAON 9652431543 18
H

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

5 NARESH HISAR 9782918192 22


ROLL_NO NAME ADDRESS PHONE AGE

6 SWETA RANCHI 9852617621 21

5. Minus (-) or Set Difference


Minus on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible. Minus operator when applied on two relations as R1-R2 will give a
relation with tuples that are in R1 but not in R2. Syntax:
Relation1 - Relation2
Find the person who is a student but not an employee, we can use minus
operator like:
STUDENT - EMPLOYEE
RESULT:

ROLL_NO NAME ADDRESS PHONE AGE

RAMES
2 GURGAON 9652431543 18
H

3 SUJIT ROHTAK 9156253131 20

6. Rename(ρ)
Rename operator is used to giving another name to a relation. Syntax:
ρ(Relation2, Relation1)
To rename STUDENT relation to STUDENT1, we can use rename operator like:
ρ(STUDENT1, STUDENT)
If you want to create a relation STUDENT_NAMES with ROLL_NO and NAME from
STUDENT, it can be done using rename operator as:
ρ(STUDENT_NAMES, ∏(ROLL_NO, NAME)(STUDENT))
Extended Relational Algebra Operators
 Intersection (∩)
 Division (÷)
 Join Operations (⋈)
 Natural Join
 Theta Join
 Equi Join
These operators provide more functionality for complex queries in relational
databases.
Conclusion
Relational Algebra provides a fundamental toolkit for querying and manipulating
relations in a database. It offers a set of operators that allow users to perform
basic operations like selection, projection, union, and more. Understanding
these operators is essential for working with relational databases and forms the
basis for more advanced SQL operations.
Basic Operators in Relational Algebra – FAQs
What is the difference between Relational Algebra and SQL?
Relational Algebra is a procedural query language that specifies how to retrieve
data, while SQL is a declarative language that specifies what data to retrieve.
SQL implementations are influenced by the principles of Relational Algebra.
Why is Relational Algebra important?
It provides a formal foundation for relational databases and is used for query
optimization, ensuring that database operations are performed efficiently.
Can Relational Algebra handle complex queries?
Yes, by combining basic operators, complex queries can be constructed to
handle intricate data retrieval and manipulation tasks.

Basic Operators in Relational Algebra




The Relational Model is a way of structuring data using relations, which are a
collection of tuples that have the same attributes. Relational Algebra is a
procedural query language that takes relations as input and returns relations as
output. It uses a set of operators to manipulate and retrieve data from these
relations. Here, we’ll explore the basic operators of Relational Algebra using the
STUDENT_SPORTS, EMPLOYEE, and STUDENT relations from Table 1, Table 2,
and Table 3, respectively.
Table 1: STUDENT_SPORTS

ROLL_NO SPORTS

1 Badminton

2 Cricket

2 Badminton

4 Badminton
Table 2: EMPLOYEE

ADDRES
EMP_NO NAME PHONE AGE
S

1 RAM DELHI 9455123451 18

5 NARESH HISAR 9782918192 22


ADDRES
EMP_NO NAME PHONE AGE
S

6 SWETA RANCHI 9852617621 21

4 SURESH DELHI 9156768971 18


Table 3: STUDENT

ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

RAMES
2 GURGAON 9652431543 18
H

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

1. Selection operator (σ)


Selection operator is used to selecting tuples from a relation based on some
condition. Syntax:
σ (Cond)(Relation Name)
Extract students whose age is greater than 18 from STUDENT relation given in
Table 3
σ (AGE>18)(STUDENT)
[Note: SELECT operation does not show any result, the projection operator
must be called before the selection operator to generate or project the result.
So, the correct syntax to generate the result is: ∏(σ (AGE>18)(STUDENT))]
RESULT:

ROLL_NO NAME ADDRESS PHONE AGE

3 SUJIT ROHTAK 9156253131 20

2. Projection Operator (∏)


Projection operator is used to project particular columns from a
relation. Syntax:
∏(Column 1,Column 2….Column n)(Relation Name)
Extract ROLL_NO and NAME from STUDENT relation given in Table 3
∏(ROLL_NO,NAME)(STUDENT)
RESULT:
ROLL_NO NAME

1 RAM

2 RAMESH

3 SUJIT

4 SURESH
Note: If the resultant relation after projection has duplicate rows, it will be
removed. For Example ∏(ADDRESS)(STUDENT) will remove one duplicate row with
the value DELHI and return three rows.
3. Cross Product(X)
Cross product is used to join two relations. For every row of Relation1, each row
of Relation2 is concatenated. If Relation1 has m tuples and and Relation2 has n
tuples, cross product of Relation1 and Relation2 will have m X n tuples. Syntax:
Relation1 X Relation2
To apply Cross Product on STUDENT relation given in Table 1 and
STUDENT_SPORTS relation given in Table 2,
STUDENT X STUDENT_SPORTS
RESULT:

ROLL_NO NAME ADDRESS PHONE AGE ROLL_NO SPORTS

1 RAM DELHI 9455123451 18 1 Badminton

1 RAM DELHI 9455123451 18 2 Cricket

1 RAM DELHI 9455123451 18 2 Badminton

1 RAM DELHI 9455123451 18 4 Badminton

2 RAMESH GURGAON 9652431543 18 1 Badminton

2 RAMESH GURGAON 9652431543 18 2 Cricket

2 RAMESH GURGAON 9652431543 18 2 Badminton

2 RAMESH GURGAON 9652431543 18 4 Badminton

3 SUJIT ROHTAK 9156253131 20 1 Badminton


ROLL_NO NAME ADDRESS PHONE AGE ROLL_NO SPORTS

3 SUJIT ROHTAK 9156253131 20 2 Cricket

3 SUJIT ROHTAK 9156253131 20 2 Badminton

3 SUJIT ROHTAK 9156253131 20 4 Badminton

4 SURESH DELHI 9156768971 18 1 Badminton

4 SURESH DELHI 9156768971 18 2 Cricket

4 SURESH DELHI 9156768971 18 2 Badminton

4 SURESH DELHI 9156768971 18 4 Badminton

4. Union (U)
Union on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible (These two relations should have the same number of attributes
and corresponding attributes in two relations have the same domain). Union
operator when applied on two relations R1 and R2 will give a relation with tuples
that are either in R1 or in R2. The tuples which are in both R1 and R2 will appear
only once in the result relation. Syntax:
Relation1 U Relation2
Find the person who is either student or employees, we can use Union operators
like:
STUDENT U EMPLOYEE
RESULT:

ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

RAMES
2 GURGAON 9652431543 18
H

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

5 NARESH HISAR 9782918192 22


ROLL_NO NAME ADDRESS PHONE AGE

6 SWETA RANCHI 9852617621 21

5. Minus (-) or Set Difference


Minus on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible. Minus operator when applied on two relations as R1-R2 will give a
relation with tuples that are in R1 but not in R2. Syntax:
Relation1 - Relation2
Find the person who is a student but not an employee, we can use minus
operator like:
STUDENT - EMPLOYEE
RESULT:

ROLL_NO NAME ADDRESS PHONE AGE

RAMES
2 GURGAON 9652431543 18
H

3 SUJIT ROHTAK 9156253131 20

6. Rename(ρ)
Rename operator is used to giving another name to a relation. Syntax:
ρ(Relation2, Relation1)
To rename STUDENT relation to STUDENT1, we can use rename operator like:
ρ(STUDENT1, STUDENT)
If you want to create a relation STUDENT_NAMES with ROLL_NO and NAME from
STUDENT, it can be done using rename operator as:
ρ(STUDENT_NAMES, ∏(ROLL_NO, NAME)(STUDENT))
Extended Relational Algebra Operators
 Intersection (∩)
 Division (÷)
 Join Operations (⋈)
 Natural Join
 Theta Join
 Equi Join
These operators provide more functionality for complex queries in relational
databases.
Conclusion
Relational Algebra provides a fundamental toolkit for querying and manipulating
relations in a database. It offers a set of operators that allow users to perform
basic operations like selection, projection, union, and more. Understanding
these operators is essential for working with relational databases and forms the
basis for more advanced SQL operations.
Basic Operators in Relational Algebra – FAQs
What is the difference between Relational Algebra and SQL?
Relational Algebra is a procedural query language that specifies how to retrieve
data, while SQL is a declarative language that specifies what data to retrieve.
SQL implementations are influenced by the principles of Relational Algebra.
Why is Relational Algebra important?
It provides a formal foundation for relational databases and is used for query
optimization, ensuring that database operations are performed efficiently.
Can Relational Algebra handle complex queries?
Yes, by combining basic operators, complex queries can be constructed to
handle intricate data retrieval and manipulation tasks.

SQL Joins (Inner, Left, Right and Full Join)


Last Updated : 17 Dec, 2024



SQL joins are the foundation of database management systems, enabling


the combination of data from multiple tables based on relationships between
columns. Joins allow efficient data retrieval, which is essential for generating
meaningful observations and solving complex business queries.
Understanding SQL join types, such as INNER JOIN, LEFT JOIN, RIGHT
JOIN, FULL JOIN, and NATURAL JOIN, is critical for working with relational
databases.
In this article, we will cover the different types of SQL joins, including INNER
JOIN, LEFT OUTER JOIN, RIGHT JOIN, FULL JOIN, and NATURAL JOIN. Each
join type will be explained with examples, syntax, and practical use cases to
help us understand when and how to use these joins effectively.
What is SQL Join?
SQL JOIN clause is used to query and access data from multiple tables by
establishing logical relationships between them. It can access data from
multiple tables simultaneously using common key values shared across different
tables. We can use SQL JOIN with multiple tables. It can also be paired with
other clauses, the most popular use will be using JOIN with WHERE clause to
filter data retrieval.
Example of SQL JOINS
Consider the two tables, Student and StudentCourse, which share a common
column ROLL_NO. Using SQL JOINS, we can combine data from these tables based
on their relationship, allowing us to retrieve meaningful information like
student details along with their enrolled courses.
Student Table
StudentCourse Table

Both these tables are connected by one common key (column) i.e ROLL_NO.
We can perform a JOIN operation using the given SQL query:
Query:
SELECT s.roll_no, s.name, s.address, s.phone, s.age, sc.course_id
FROM Student s
JOIN StudentCourse sc ON s.roll_no = sc.roll_no;
Output
ROLL_N
NAME ADDRESS PHONE AGE COURSE_ID
O

1 HARSH DELHI XXXXXXXXXX 18 1

2 PRATIK BIHAR XXXXXXXXXX 19 2

3 RIYANKA SILGURI XXXXXXXXXX 20 2

4 DEEP RAMNAGAR XXXXXXXXXX 18 3

5 SAPTARHI KOLKATA XXXXXXXXXX 19 1

Types of JOIN in SQL


There are many types of Joins in SQL. Depending on the use case, we can use
different type of SQL JOIN clause. Below, we explain the most commonly used
join types with syntax and examples:
 INNER JOIN
 LEFT JOIN
 RIGHT JOIN
 FULL JOIN
 Natural Join
1. SQL INNER JOIN
The INNER JOIN keyword selects all rows from both the tables as long as the
condition is satisfied. This keyword will create the result-set by combining all
rows from both the tables where the condition satisfies i.e value of the
common field will be the same.
Syntax
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Key Terms
 table1: First table.
 table2: Second table
 matching_column: Column common to both the tables.
Note: We can also write JOIN instead of INNER JOIN. JOIN is same as INNER
JOIN.
INNER JOIN Example
Let’s look at the example of INNER JOIN clause, and understand it’s working.
This query will show the names and age of students enrolled in different
courses.
Query:
SELECT StudentCourse.COURSE_ID, Student.NAME, Student.AGE FROM Student
INNER JOIN StudentCourse
ON Student.ROLL_NO = StudentCourse.ROLL_NO;
Output

2. SQL LEFT JOIN


LEFT JOIN returns all the rows of the table on the left side of the join and
matches rows for the table on the right side of the join. For the rows for which
there is no matching row on the right side, the result-set will contain null.
LEFT JOIN is also known as LEFT OUTER JOIN.
Syntax
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Key Terms
 table1: First table.
 table2: Second table
 matching_column: Column common to both the tables.
Note: We can also use LEFT OUTER JOIN instead of LEFT JOIN, both are the
same.

LEFT JOIN Example


In this example, the LEFT JOIN retrieves all rows from the Student table and
the matching rows from the StudentCourse table based on the ROLL_NO column.
Query:
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
LEFT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output
3. SQL RIGHT JOIN
RIGHT JOIN returns all the rows of the table on the right side of the join and
matching rows for the table on the left side of the join. It is very similar to LEFT
JOIN for the rows for which there is no matching row on the left side, the result-
set will contain null. RIGHT JOIN is also known as RIGHT OUTER JOIN.
Syntax
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Key Terms
 table1: First table.
 table2: Second table
 matching_column: Column common to both the tables.
Note: We can also use RIGHT OUTER JOIN instead of RIGHT JOIN, both are the
same.
RIGHT JOIN Example
In this example, the RIGHT JOIN retrieves all rows from
the StudentCourse table and the matching rows from the Student table
based on the ROLL_NO column.
Query:
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
RIGHT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output
4. SQL FULL JOIN
FULL JOIN creates the result-set by combining results of both LEFT
JOIN and RIGHT JOIN. The result-set will contain all the rows from both tables.
For the rows for which there is no matching, the result-set will
contain NULL values.

Syntax
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Key Terms
 table1: First table.
 table2: Second table
 matching_column: Column common to both the tables.
FULL JOIN Example
This example demonstrates the use of a FULL JOIN, which combines the results
of both LEFT JOIN and RIGHT JOIN. The query retrieves all rows from
the Student and StudentCourse tables. If a record in one table does not have
a matching record in the other table, the result set will include that record
with NULL values for the missing fields
Query:
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
FULL JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output
NAME COURSE_ID

HARSH 1
NAME COURSE_ID

PRATIK 2

RIYANKA 2

DEEP 3

SAPTARHI 1

DHANRAJ NULL

ROHIT NULL

NIRAJ NULL

NULL 4

NULL 5

NULL 4

5. SQL Natural Join (?)


Natural join can join tables based on the common columns in the tables being
joined. A natural join returns all rows by matching values in common columns
having same name and data type of columns and that column should be
present in both tables.
 Both table must have at least one common column with same column name
and same data type.
 The two table are joined using Cross join.
 DBMS will look for a common column with same name and data type. Tuples
having exactly same values in common columns are kept in result.
Natural join Example
Look at the two tables below- Employee and Department
Employee

Emp_id Emp_name Dept_id

1 Ram 10

2 Jon 30
Employee

3 Bob 50

Department

Dept_id Dept_name

10 IT

30 HR

40 TIS
Problem: Find all Employees and their respective departments.
Solution Query: (Employee) ? (Department)
Emp_i
d Emp_name Dept_id Dept_id Dept_name

1 Ram 10 10 IT

2 Jon 30 30 HR

Employee data Department data

Conclusion
SQL joins are essential tools for anyone working with relational databases.
Understanding the different types of joins in SQL, like INNER JOIN, LEFT
OUTER JOIN, RIGHT JOIN, and FULL JOIN, allows us to combine and query
data effectively. With the examples and syntax covered here, we should feel
confident applying these SQL join types to our data to retrieve meaningful
observations and manage complex queries with ease. Use these SQL join
techniques to streamline our data handling and enhance our SQL skills.
FAQs
What are the 4 types of join SQL?
In SQL, the four main types of joins are:

You might also like