DBMS
o Explain the ACID properties of transactions and their
importance in database integrity.
o Differentiate between primary and foreign keys with
examples.
The PRIMARY KEY constraint in SQL is one of the most important
constraints used to ensure data integrity in a database table. A
primary key uniquely identifies each record in a table, preventing
duplicate or NULL values in the specified column(s). Understanding
how to properly implement and use the primary key constraint is
crucial for managing relational data effectively.
PRIMARY KEY in SQL
PRIMARY KEY in SQL is a column (or group of columns) that
uniquely identifies the records in that table. A primary key must
contain unique values and can not have any NULL value.
There can only be one primary key in a table, but that primary key
can consist of one or more columns. When there are two or more
columns in the primary key it is called a composite key.
A primary key automatically has a UNIQUE constraint defined on
it, and it ensures that there are no duplicate or NULL values in
that column.
SQL PRIMARY KEY Properties
1. No duplicate values are allowed, i.e. The column assigned as the
primary key should have UNIQUE values only.
2. NO NULL values are present in the Primary key column. Hence there
is a Mandatory value in the column having the Primary key.
3. Only one primary key per table exists although the Primary key may
have multiple columns.
4. No new row can be inserted with the already existing primary key.
5. Primary keys can be classified into two categories Simple primary
key that consists of one column and composite primary key that
consists of Multiple column.
6. Defined in CREATE TABLE or ALTER TABLE statement.
Syntax
There are two syntaxes to create/add primary key to a table:
Using CREATE TABLE Statement
Using ALTER TABLE Statement
SQL PRIMARY KEY with CREATE TABLE
SQL primary key syntax with CREATE TABLE statement is:
CREATE TABLE table_name (
column1 datatype constraint,
column2 datatype constraint,
...,
CONSTRAINT pk_constraint_name PRIMARY KEY (column1, column2,
...)
);
SQL PRIMARY KEY with ALTER TABLE
SQL primary key syntax with ALTER TABLE statement is
ALTER TABLE table_name
ADD CONSTRAINT constraint_name PRIMARY KEY (column1,
column2, ... column_n);
SQL PRIMARY KEY Examples
Let's look at some examples of the PRIMARY KEY Constraint
in SQL, and understand it's working.
Create PRIMARY KEY in SQL Example
In this example, we will create primary key in a new table using
CREATE TABLE statement.
Query
CREATE TABLE Persons (
PersonID int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
Verify SQL Primary key creation
To verify if the primary key has been successfully created, we will try
adding duplicate values in primary key column, and SQL should
return an error.
Query
INSERT INTO Persons VALUES
(1,"Thakur", "Aditya", 22),
(1, "Kumar", "Shubham", 21);
Output
Error: UNIQUE constraint failed: Persons.PersonID
Add PRIMARY KEY to a Table Example
In this example, we will add primary key to a already existing table
using ALTER TABLE command.
Let's consider previous table, and create it without primary key this
time.
CREATE TABLE Persons (
PersonID int,
LastName varchar(255) NOT NULL,
FirstName varchar(255), Age int);
This query will add primary key to 'Persons' table
ALTER TABLE Persons
ADD CONSTRAINT PK_Person PRIMARY KEY (PersonID);
Important Points About SQL PRIMARY KEY
A primary key is a column or a set of columns in a table that
uniquely identifies each row.
It ensures data integrity by preventing duplicate records and null
values.
A primary key can be defined on a single column (simple primary
key) or multiple columns (composite primary key).
Creating a primary key automatically creates a unique index on the
key column(s), improving query performance.
Establishing relationships between tables using SQL primary key
and foreign key improve database design, reduce data redundancy,
and improve data consistency.
Benefits of Using Primary Keys
Data Integrity: The primary key enforces data integrity by
ensuring each record is unique.
Efficient Querying: Since a primary key automatically creates an
index, querying for records by the primary key is faster.
Referential Integrity: Primary keys are used to establish
relationships between tables (via foreign keys), ensuring
consistency across related data.
Common Issues and Best Practices
Avoid NULL values: Always ensure that the columns involved in
the primary key do not accept NULL values.
Choose meaningful primary keys: If possible, choose a primary
key that naturally fits the data and serves as a meaningful identifier,
like an ID field.
Composite Keys: Be cautious when using composite keys. While
they are useful in some scenarios, they can make queries more
complex. If possible, use a simple key or generate an artificial
primary key (like an ID).
Changing Primary Keys: Once a primary key is established,
changing it can be difficult because of the interdependencies with
other tables (foreign key constraints). Always plan ahead when
designing your database schema.
Conclusion
The PRIMARY KEY constraint is a fundamental concept in relational
databases that ensures each record in a table is unique and
identifiable. By using the primary key effectively, you can maintain
data integrity, improve query performance, and establish
meaningful relationships between tables. Whether you are working
with simple or composite keys, it is important to understand the
properties and best practices to ensure a well-designed and efficient
database structure.
o Write an SQL query to join two tables based on specific
conditions.
SQL INNER JOIN is a powerful and frequently used operation
in relational databases. It allows us to combine two or more
tables based on a related column, returning only the records that
satisfy the join condition
This article will explore the fundamentals of INNER JOIN, its syntax,
practical examples, and the key differences between INNER
JOIN and other types of joins, such as OUTER JOIN.
What is SQL Inner Join?
The INNER JOIN clause in SQL is used to combine
records from two or more tables. The result contains only the
rows that have matching values in both tables based on a specific
condition. This makes INNER JOIN a valuable tool when we need to
work with related data across multiple tables in a database.
The key feature of an INNER JOIN is that it filters out rows from
the result where there is no matching data in both tables.
Essentially, it returns a "subset" of the data where the condition is
satisfied.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
Key Terms
columns: The specific columns we want to retrieve.
table1 and table2: The two tables we are joining.
column_name: The columns from both tables that we want to
match based on the join condition.
Example of SQL INNER JOIN
Consider two tables: professor and teacher. The professor table
contains data about professors, while the teacher table holds
information about the courses that these professors teach. The
common column between these tables is ID from the professor table
and prof_id from the teacher table.
professor table
Nam Sala
ID e ry
Roha 5700
1
n 0
Arya 4500
2
n 0
6000
3 Arpit
0
Hars 5000
4
h 0
5500
5 Tara
0
teacher Table
course_ prof_ course_na
id id me
1 1 English
1 3 Physics
2 4 Chemistry
Mathematic
2 5
s
Now, we will write a query to retrieve
the course_id, prof_id, professor's Name, and their Salary by
joining the professor and teacher tables using INNER JOIN. The
query joins the professor table and the teacher table based on the
condition that the ID from the professor table matches
the prof_id in the teacher table.
Query
SELECT teacher.course_id, teacher.prof_id, professor.Name,
professor.Salary
FROM professor INNER JOIN teacher ON professor.ID =
teacher.prof_id;
Output
course_ prof_ Nam Sala
id id e ry
Roha 5700
1 1
n 0
6000
1 3 Arpit
0
Hars 5000
2 4
h 0
5500
2 5 Tara
0
Explanation:
The output contains the details of professors and the courses
they teach. The INNER JOIN operation ensures that only the records
where a professor is assigned a course are included in the result.
The professor who does not teach a course (like Aryan, who is not
listed in the output) is excluded.
Difference Between INNER JOIN and OUTER JOIN
INNER JOIN
Returns records that have matching values in both tables.
Does not include records where there is no match between the
tables.
OUTER JOIN
Returns records even if there is no match in one of the tables.
Can be a LEFT OUTER JOIN, RIGHT OUTER JOIN, or FULL OUTER
JOIN, depending on whether we want to include unmatched records
from the left, right, or both tables.
Key Points About SQL INNER JOIN
1. Combines Data from Multiple Tables: INNER JOIN allows us
to combine data from multiple tables based on common columns,
making it possible to work with related data stored in different
tables.
2. Excludes Non-Matching Records: INNER JOIN only returns
records where there is a match in both tables based on the join
condition. If there is no match, the record will be excluded from the
result set.
3. Simplifies Complex Queries: INNER JOIN simplifies complex
queries by allowing you to work with multiple tables at once. It
reduces the need for multiple subqueries and makes database
management more efficient.
4. Widely Used in Relational Databases: INNER JOIN is widely
used for tasks such as managing customer orders, product
inventories, and many other relational datasets. It is essential for
performing operations on normalized data.
Conclusion
SQL INNER JOIN is an essential tool for combining related data
across multiple tables. By retrieving only the rows where a match is
found, it helps filter relevant information efficiently. Whether we're
managing employee data, courses, or customer orders,
mastering the use of INNER JOIN is key to
effective database querying and management. It simplifies complex
queries and enhances the performance of SQL operations by
minimizing unnecessary data retrieval.
o Optimize a slow-running SQL query by identifying and
correcting bottlenecks.
The optimized SQL queries not only enhance the performance but
also contribute to cost savings by reducing resource consumption.
Let us see the various ways in which you can optimize SQL queries
for faster performance.
1. Use Indexes
Indexes act like internal guides for the database to locate specific
information quickly. Identify frequently used columns in WHERE
clauses, JOIN conditions and ORDER BY clauses then create indexes
on those columns. However, creating too many indexes can slow
down adding and updating data so use them strategically.
The database needs to maintain the indexes in addition to the main
table data, which adds some overhead. So, it's important to strike a
balance and only create indexes on columns that will provide
significant search speed improvements.
2. Use WHERE Clause instead of HAVING Clause
Using the WHERE clause instead of HAVING can significantly
improve query performance. The WHERE clause filters data before
groups are formed, whereas HAVING applies filters after the
grouping process. As a result, queries with the WHERE clause
executes faster reducing the overall processing time and enhancing
efficiency.
For Example:
SELECT name FROM table_name WHERE age>=18; results in
displaying only those names whose age is greater than or equal to
18 whereas
SELECT age COUNT(A) AS Students FROM table_name GROUP BY
age HAVING COUNT(A)>1; results in first renames the row and then
displaying only those values which pass the condition
3. Avoid Queries inside a Loop
This is one of the best optimization techniques that you must follow.
Running queries inside the loop will slow down the execution time to
a great extent. In most cases, you will be able to insert and update
data in bulk which is a far better approach as compared to queries
inside a loop.
The iterative pattern which could be visible in loops such as FOR,
WHILE and DO-WHILE takes a lot of time to execute and thus the
performance and scalability are also affected. To avoid this, all the
queries can be made outside loops, and hence, the efficiency can be
improved.
4. Use Select instead of Select *
One of the best ways to enhance efficiency is to reduce the load on
the database. This can be done by limiting the amount of
information to be retrieved from each query. Running queries with
Select * will retrieve all the relevant information which is available in
the database table. It will retrieve all the unnecessary information
from the database which takes a lot of time and enhance the load
on the database.
Let's understand this better with the help of an example. Consider a
table name GeeksforGeeks which has columns names like Java,
Python, and DSA.
Select * from GeeksforGeeks; - Gives you the complete table as
an output whereas
Select condition from GeeksforGeeks; - Gives you only the
preferred(selected) value
So the better approach is to use a Select statement with defined
parameters to retrieve only necessary information. Using Select will
decrease the load on the database and enhances performance.
5. Add Explain to the Beginning of Queries
Explain keywords is used to describe how SQL queries are being
executed. This description includes how tables are joined, their
order, and many more. It is a beneficial query optimization tool that
further helps in knowing the step-by-step details of execution. Add
explain and check whether the changes you made have reduced the
runtime significantly or not. Running Explain query takes time so it
should only be done during the query optimization process.
6. Keep Wild cards at the End of Phrases
A wildcard is used to substitute one or more characters in a string. It
is used with the LIKE operator. LIKE operator is used with where
clause to search for a specified pattern. Pairing a leading wildcard
with the ending wildcard will check for all records matching between
the two wildcards. Let's understand this with the help of an
example.
Consider a table Employee which has 2 columns name and salary.
There are 2 different employees namely Rama and Balram.
Select name, salary From Employee Where name like ‘%Ram%’;
Select name, salary From Employee Where name like ‘Ram%’;
In both the cases, now when you search %Ram% you will get both
the results Rama and Balram, whereas Ram% will return just Rama.
Consider this when there are multiple records of how the efficiency
will be enhanced by using wild cards at the end of phrases.
7. Use Exist() instead of Count()
Both Exist() and Count() are used to search whether the table has a
specific record or not. But in most cases Exist() is much more
effective than Count(). As Exist() will run till it finds the first
matching entry whereas Count() will keep on running and provide all
the matching records. Hence this practice of SQL query optimization
saves a lot of time and computation power. EXISTS stop as the
logical test proves to be true whereas COUNT(*) must count each
and every row, even after it has passed the test.
8. Avoid Cartesian Products
Cartesian products occur when every row from one table is joined
with every row from another table, resulting in a massive dataset.
Accidental Cartesian products can severely impact query
performance. Always double-check JOIN conditions to
avoid unintended Cartesian products. Make sure you're joining the
tables based on the specific relationship you want to explore.
For Example
Incorrect JOIN (Cartesian product): SELECT * FROM Authors JOIN
Books; (This joins every author with every book)
Correct JOIN (retrieves books by author): SELECT Authors.name,
Books.title FROM Authors JOIN Books ON Authors.id =
Books.author_id; (This joins authors with their corresponding books
based on author ID).
9. Consider Denormalization
Denormalization involves strategically adding redundant data to a
database schema to improve query performance. It can reduce the
need for JOIN operations but should be balanced with considerations
for data integrity and maintenance overhead. JOIN operations are
used to combine data from multiple tables, which can be slow
especially for complex queries. Denormalization aims to reduce the
need for JOINs by copying some data from one table to another.
For Example:
Imagine tables for "Customers" and "Orders." Normally, you would
link them with a foreign key (e.g., customer ID) in the Orders table.
To speed up queries that retrieve customer information along with
their orders, you could denormalize by adding some customer
details (e.g., name, email) directly into the Orders table.
10. Optimize JOIN Operations
JOIN operations combine rows from two or more tables based on a
related column. Select the JOIN type that aligns with the data you
want to retrieve. For example, to find all customers and their
corresponding orders (even if a customer has no orders), use a LEFT
JOIN on the customer ID column. The JOIN operation works by
comparing values in specific columns from both tables (join
condition). Ensure these columns are indexed for faster lookups.
Having indexes on join columns significantly improves the speed of
the JOIN operation.
Conclusion
Following these best practices for SQL query optimization, you can
ensure that your database queries run efficiently and deliver results
quickly. This will not only improve the performance of your
applications but also enhance the user experience by minimizing
delays. Therefore the key benefits of SQL query optimization are
improved performance, faster results, and a better user experience.
o Explain the concepts of normalization and its different levels
(1NF, 2NF, 3NF).
Normalization is an important process in database design that helps
improve the database's efficiency, consistency, and accuracy. It
makes it easier to manage and maintain the data and ensures that
the database is adaptable to changing business needs.
Database normalization is the process of organizing the attributes of
the database to reduce or eliminate data redundancy (having the
same data but at different places).
Data redundancy unnecessarily increases the size of the database
as the same data is repeated in many places. Inconsistency
problems also arise during insert, delete, and update operations.
In the relational model, there exist standard methods to quantify
how efficient a databases is. These methods are called normal
forms and there are algorithms to covert a given database into
normal forms.
Normalization generally involves splitting a table into multiple ones
which must be linked each time a query is made requiring data from
the split tables.
Why do we need Normalization?
The primary objective for normalizing the relations is to eliminate
the below anomalies. Failure to reduce anomalies results in data
redundancy, which may threaten data integrity and cause additional
issues as the database increases. Normalization consists of a set of
procedures that assist you in developing an effective database
structure.
Insertion Anomalies: Insertion anomalies occur when it is not
possible to insert data into a database because the required fields
are missing or because the data is incomplete. For example, if a
database requires that every record has a primary key, but no value
is provided for a particular record, it cannot be inserted into
the database.
Deletion anomalies: Deletion anomalies occur when deleting a
record from a database and can result in the unintentional loss of
data. For example, if a database contains information about
customers and orders, deleting a customer record may also delete
all the orders associated with that customer.
Updation anomalies: Updation anomalies occur when modifying
data in a database and can result in inconsistencies or errors. For
example, if a database contains information about employees and
their salaries, updating an employee’s salary in one record but not
in all related records could lead to incorrect calculations and
reporting.
Normal
Condition/Description
Form
First
Normal - All attributes must be atomic (single-valued).-
Form No repeating groups or arrays allowed in a table.
(1NF)
Second - Must be in 1NF.- All non-primary key
Normal attributes must be fully functionally
Form dependent on the entire primary key (no partial
(2NF) dependency).
Third - Must be in 2NF.- No transitive dependency
Normal
Condition/Description
Form
for non-prime attributes.- For every functional
Normal
dependency X → Y, one of the following must hold:
Form
• X is a super key, or • Y is a prime attribute
(3NF)
(part of some candidate key).
Boyce-
Codd
- Must be in 3NF.- For every functional
Normal
dependency X → Y, X must be a super key.
Form
(BCNF)
Fourth
Normal - Must be in BCNF.- Must not contain any multi-
Form valued dependency (MVD).
(4NF)
Fifth
- Must be in 4NF.- Should not have any join
Normal
dependency that results in lossless
Form
decomposition.
(5NF)
o Describe the purpose and benefits of indexing in a database.
Indexing is a crucial technique used in databases to optimize data
retrieval operations. It improves query performance by minimizing
disk I/O operations, thus reducing the time it takes to locate and
access data. Essentially, indexing allows the database management
system (DBMS) to locate data more efficiently without having to
scan the entire dataset. Indexes are organized data structures that
allow quick searching based on key values. When an index is
created for a database table, it maintains a sorted order of key
values along with pointers to the actual data rows. This process
significantly reduces the number of disk accesses required to fulfill a
query.
Attributes of Indexing
Several Important attributes of indexing affect the performance and
efficiency of database operations:
1. Access Types: This refers to the type of access such as value-
based search, range access, etc.
2. Access Time: It refers to the time needed to find a particular
data element or set of elements.
3. Insertion Time: It refers to the time taken to find the
appropriate space and insert new data.
4. Deletion Time: Time taken to find an item and delete it as well
as update the index structure.
5. Space Overhead: It refers to the additional space required by
the index.
o Compare and contrast Structured Query Language (SQL) and
NoSQL databases. When would you choose one over the
other?
o Explain the concept of database transactions and how they
handle concurrency issues.
o Discuss different types of database views and their
advantages in data security and access control.
o What database schema you will use for Juspay's payment
processing system. Explain your data model choices and
how they optimize efficiency and scalability.
1. What is DBMS ? Mention advantages..
A DBMS is a system that allows users to create, modify, and query
databases while ensuring data integrity, security, and efficient data
access. Unlike traditional file systems, DBMS minimizes data
redundancy, prevents inconsistencies, and simplifies data
management with features like concurrent access and backup
mechanisms. It organizes data into tables, views, schemas, and
reports, providing a structured approach to data management.
Example:
A university database can store and manage student information,
faculty records, and administrative data, allowing seamless
retrieval, insertion, and deletion of information as required.
Key Features of DBMS
1. Data Modeling: Tools to create and modify data models, defining
the structure and relationships within the database.
2. Data Storage and Retrieval: Efficient mechanisms for storing
data and executing queries to retrieve it quickly.
3. Concurrency Control: Ensures multiple users can access the
database simultaneously without conflicts.
4. Data Integrity and Security: Enforces rules to maintain accurate
and secure data, including access controls and encryption.
5. Backup and Recovery: Protects data with regular backups and
enables recovery in case of system failures.
Types of DBMS
There are several types of Database Management Systems (DBMS),
each tailored to different data structures, scalability requirements,
and application needs. The most common types are as follows:
1. Relational Database Management System (RDBMS)
RDBMS organizes data into tables (relations) composed of rows and
columns. It uses primary keys to uniquely identify rows and foreign
keys to establish relationships between tables. Queries are written
in SQL (Structured Query Language), which allows for efficient
data manipulation and retrieval.
Examples: MySQL, Oracle, Microsoft SQL Server and Postgre SQL.
2. NoSQL DBMS
NoSQL systems are designed to handle large-scale data and provide
high performance for scenarios where relational models might be
restrictive. They store data in various non-relational formats, such as
key-value pairs, documents, graphs, or columns. These flexible data
models enable rapid scaling and are well-suited for unstructured or
semi-structured data.
Examples: MongoDB, Cassandra, DynamoDB and Redis.
3. Object-Oriented DBMS (OODBMS)
OODBMS integrates object-oriented programming concepts into the
database environment, allowing data to be stored as objects. This
approach supports complex data types and relationships, making it
ideal for applications requiring advanced data modeling and real-
world simulations.
Examples: ObjectDB, db4o.
Database Languages
Database languages are specialized sets of commands and
instructions used to define, manipulate, and control data within a
database. Each language type plays a distinct role in database
management, ensuring efficient storage, retrieval, and security of
data. The primary database languages include:
1. Data Definition Language (DDL)
DDL is the short name for Data Definition Language, which deals
with database schemas and descriptions, of how the data should
reside in the database.
CREATE: to create a database and its objects like (table, index,
views, store procedure, function, and triggers)
ALTER: alters the structure of the existing database
DROP: delete objects from the database
TRUNCATE: remove all records from a table, including all spaces
allocated for the records are removed
COMMENT: add comments to the data dictionary
RENAME: rename an object
2. Data Manipulation Language (DML)
DML focuses on manipulating the data stored in the database,
enabling users to retrieve, add, update, and delete data.
SELECT: retrieve data from a database
INSERT: insert data into a table
UPDATE: updates existing data within a table
DELETE: Delete all records from a database table
MERGE: UPSERT operation (insert or update)
CALL: call a PL/SQL or Java subprogram
EXPLAIN PLAN: interpretation of the data access path
LOCK TABLE: concurrency Control
3. Data Control Language (DCL)
DCL commands manage access permissions, ensuring data security
by controlling who can perform certain actions on the database.
GRANT: Provides specific privileges to a user (e.g., SELECT,
INSERT).
REVOKE: Removes previously granted permissions from a user.
4. Transaction Control Language (TCL)
TCL commands oversee transactional data to maintain consistency,
reliability, and atomicity.
ROLLBACK: Undoes changes made during a transaction.
COMMIT: Saves all changes made during a transaction.
SAVEPOINT: Sets a point within a transaction to which one can
later roll back.
5. Data Query Language (DQL)
DQL is a subset of DML, specifically focused on data retrieval.
SELECT: The primary DQL command, used to query data from the
database without altering its structure or contents.
Paradigm Shift from File System to DBMS
Before the advent of modern DBMS, data was managed using basic
file systems on hard drives. While this approach allowed users to
store, retrieve, and update files as needed, it came with numerous
challenges.
A typical example can be seen in a file-based university
management system, where data was stored in separate sections
such as Departments, Academics, Results, Accounts, and Hostels.
Certain information like student names and phone numbers was
repeated across multiple files, leading to the following issues:
1. Redundancy of data
When the same data exists in multiple places, any update must be
manually repeated everywhere. For instance, if a student changes
their phone number, it must be updated across all sections. Failure
to do so leads to unnecessary duplication and wasted storage.
2. Inconsistency of Data
Data is said to be inconsistent if multiple copies of the same data do
not match each other. If the Phone number is different in Accounts
Section and Academics Section, it will be inconsistent. Inconsistency
may be because of typing errors or not updating all copies of the
same data.
3. Complex Data Access
A user should know the exact location of the file to access data, so
the process is very cumbersome and tedious. If the user wants to
search the student hostel allotment number of a student from
10000 unsorted students’ records, how difficult it can be.
4. Lack of Security
File systems provided limited control over who could access certain
data. A student who gained access to a file with grades might easily
alter it without proper authorization, compromising data integrity.
5. No Concurrent Access
File systems were not designed for multiple users working at the
same time. If one user was editing a file, others had to wait, which
hindered collaboration and slowed down workflows.
6. No Backup and Recovery
File systems lacked built-in mechanisms for creating backups or
recovering data after a loss. If a file was accidentally deleted or
corrupted, there was no easy way to restore it, potentially causing
permanent data loss.
Advantages of DBMS
1. Data organization: A DBMS allows for the organization and
storage of data in a structured manner, making it easy to retrieve
and query the data as needed.
2. Data integrity: A DBMS provides mechanisms for enforcing data
integrity constraints, such as constraints on the values of data and
access controls that restrict who can access the data.
3. Concurrent access: A DBMS provides mechanisms for controlling
concurrent access to the database, to ensure that multiple users can
access the data without conflicting with each other.
4. Data security: A DBMS provides tools for managing the security of
the data, such as controlling access to the data and encrypting
sensitive data.
5. Backup and recovery: A DBMS provides mechanisms for backing
up and recovering the data in the event of a system failure.
6. Data sharing: A DBMS allows multiple users to access and share
the same data, which can be useful in a collaborative work
environment.
Disadvantages of DBMS
1. Complexity: DBMS can be complex to set up and maintain,
requiring specialized knowledge and skills.
2. Performance overhead: The use of a DBMS can add overhead to
the performance of an application, especially in cases where high
levels of concurrency are required.
3. Scalability: The use of a DBMS can limit the scalability of an
application, since it requires the use of locking and other
synchronization mechanisms to ensure data consistency.
4. Cost: The cost of purchasing, maintaining and upgrading a DBMS
can be high, especially for large or complex systems.
5. Limited Use Cases: Not all use cases are suitable for a DBMS,
some solutions don't need high reliability, consistency or security
and may be better served by other types of data storage.
Applications of DBMS
1. Enterprise Information: Sales, accounting, human resources,
Manufacturing, online retailers.
2. Banking and Finance Sector: Banks maintaining the customer
details, accounts, loans, banking transactions, credit card
transactions. Finance: Storing the information about sales and
holdings, purchasing of financial stocks and bonds.
3. University: Maintaining the information about student course
enrolled information, student grades, staff roles.
4. Airlines: Reservations and schedules.
5. Telecommunications: Prepaid, postpaid bills maintenance.
2. What is Database?
A Database is a structured and organized collection of data stored
electronically. It is designed to manage, update, and retrieve related
data quickly and efficiently.
Characteristics:
Data is logically organized into tables.
Each table contains rows (records) and columns (attributes).
Eliminates data inconsistency by maintaining a single source of
truth.
Example: A student database might include tables for Students,
Courses, Faculties, and Marks
3. What is a database system?
A Database System is a complete environment that brings
together different components to enable efficient data storage,
management, and access. It integrates software, hardware, data,
users, and rules to create a cohesive and functional data system.
A Database System is composed of the following elements:
1. Database:
o The central component containing the actual collection of
logically related data.
o Organized into tables, schemas, and relationships among
data.
2. DBMS (Database Management System):
o The software layer responsible for interacting with the
database.
o Provides functionalities like query processing, indexing,
concurrency control, security, and backup.
3. Users:
o People interacting with the system, each with distinct roles:
Database Administrators (DBAs): Responsible for
database configuration, tuning, and user management.
Developers: Build applications that interact with the
DBMS.
End Users: Use applications to retrieve or input data.
4. Application Programs:
o Software tools or user interfaces that communicate with the
DBMS.
o Provide functionalities such as form submissions, report
generation, or data dashboards.
Additional Components of a Database System:
Hardware:
o The physical layer including storage devices, processors,
memory, and networking infrastructure.
o Stores both the DBMS and the database files.
Software:
o The collection of programs including the DBMS, application
interfaces, and utility programs.
Data:
o The actual content stored in the database — organized,
processed, and queried by users.
Procedures:
o Defined methods, guidelines, and rules to be followed during
the design, usage, and maintenance of the database system.
✅ In summary:
Database System = Database + DBMS + Users + Application
Programs + Hardware + Procedures
This integrated setup ensures efficient data management, reliable
transactions, and scalable system operation.
4. What is RDBMS ? Properties..
RDBMS (Relational Database Management System) is a type
of DBMS that organizes data into one or more tables (relations) with
rows and columns.
Properties (Based on Codd's Rules):
1. Tabular Structure: Data is stored in tables (relations).
2. Keys: Uses Primary Keys and Foreign Keys to uniquely identify
records and establish relationships.
3. Data Integrity: Ensures correctness via constraints (PK, FK, CHECK,
UNIQUE).
4. Normalization: Reduces data redundancy and maintains data
dependency.
5. ACID Properties: Ensures safe and reliable transactions:
o Atomicity: All steps in a transaction are completed or none.
o Consistency: Keeps database in a valid state.
o Isolation: Transactions do not interfere with each other.
o Durability: Completed transactions survive system failures.
6. SQL Support: Provides a standard language (SQL) to interact with
the database.
Examples: MySQL, PostgreSQL, Oracle DB, SQL Server.
5. Types of database languages
Database languages are used to perform various operations like
defining the schema, updating data, controlling access, and
handling transactions.
1. DDL (Data Definition Language):
Used to define and modify the structure of database objects.
CREATE: To create tables, schemas
ALTER: To modify existing table structure
DROP: To delete tables or databases
2. DML (Data Manipulation Language):
Used for manipulating the data itself.
SELECT: Retrieve data
INSERT: Add new data
UPDATE: Modify existing data
DELETE: Remove data
3. DCL (Data Control Language):
Used to control access to the data.
GRANT: Give user access privileges
REVOKE: Remove access privileges
4. TCL (Transaction Control Language):
Used to manage transactions in the database.
COMMIT: Save changes
ROLLBACK: Undo changes
SAVEPOINT: Define breakpoints within transactions.
6. ACID properties (VVVVV IMP)
7. Difference between vertical and horizontal scaling
Feature Vertical Scaling Horizontal Scaling
Also
Scale-up Scale-out
Known As
Upgrade the
How It Add more servers
existing server
Works to the system
(RAM, CPU)
1 server → 3
8GB RAM → 16GB
Example servers working in
RAM
parallel
High (limited by Moderate (add
Cost
hardware) cheap machines)
Usually none (can
Downtime Often needed
scale live)
Limitation Physical hardware Scales to large
s limits traffic easily
8. What is sharding
Sharding is the process of breaking a large database into smaller
parts called shards. Each shard holds a subset of the data and is
stored on a separate server.
Why Shard?
To handle large volumes of data
Improve performance
Reduce server load
Example:
User table:
o Shard 1: Users A–M
o Shard 2: Users N–Z
Each server manages fewer users → faster queries!
Note: It’s used in horizontal scaling.
9. Keys in DBMS
Keys are special fields used to uniquely identify rows in a table.
Types of Keys:
1. Primary Key – Main key to uniquely identify a row (cannot be
NULL).
2. Candidate Key – Fields that can qualify as a Primary Key.
3. Super Key – Any set of fields that can uniquely identify a row (may
include extra fields).
4. Alternate Key – Candidate keys not selected as the primary.
5. Foreign Key – A field in one table that refers to the primary key of
another.
6. Composite Key – Made up of two or more fields to identify a
record.
Example:
In Students table: Roll No = Primary Key
In Marks table: Roll No = Foreign Key (refers to Students)
10. Types of relationship
In ER Diagrams, relationships connect different entities.
1. One-to-One (1:1):
o One record in table A matches one in table B.
o Example: One person has one Aadhaar.
2. One-to-Many (1:N):
o One record in A relates to many in B.
o Example: One teacher teaches many students.
3. Many-to-One (N:1):
o Many records in A relate to one in B.
o Example: Many students belong to one department.
4. Many-to-Many (M:N):
o Records in A can relate to many in B and vice versa.
o Example: Students enroll in many courses; courses have many
students
11. Data abstraction in DBMS, three levels of it
Data abstraction helps simplify how users interact with the database
by hiding complexity.
Levels:
1. Physical Level – "How data is stored"
o Low-level details (blocks, files, indexes)
o Example: Data is stored in binary format on disk.
2. Logical Level – "What data is stored"
o Table structure, relationships, keys
o Example: A table named Students with ID, Name, Age.
3. View Level – "What the user sees"
o Customized views for different users
o Example: Student sees marks, admin sees all data.
12. Indexing in DBMS
Indexing is like a "search shortcut" to speed up queries.
How it works:
Creates a data structure (like a B-Tree or Hash Map)
Helps DBMS find data faster without scanning every row
Types:
1. Primary Index – Built on primary key (unique)
2. Secondary Index – Built on non-key fields (e.g., names)
3. Clustered Index – Reorders actual data rows
4. Non-Clustered Index – Keeps data separate and uses pointers
Tradeoff: Fast reads, but slightly slower inserts/updates (due to
index maintenance)
13. What is DDL (Data Definition Language)
DDL is used to define, modify, and remove database structures
such as tables, schemas, views, etc.
It does not handle data, only structure.
Key Features:
DDL statements are auto-committed (changes cannot be rolled
back).
Mostly used by database administrators during design or
structural updates.
Common DDL Commands:
1. CREATE: Creates new database objects.
o Example: CREATE TABLE Students (ID INT, Name
VARCHAR(50));
2. ALTER: Modifies the structure of an existing object.
o Example: ALTER TABLE Students ADD Email VARCHAR(100);
3. DROP: Deletes an object permanently.
o Example: DROP TABLE Students;
4. TRUNCATE: Deletes all records in a table but keeps the structure.
o Example: TRUNCATE TABLE Students;
o Faster than DELETE but cannot be rolled back.
14. What is DML (Data Manipulation Language)
DML is used to perform operations on the data inside database
objects (like tables).
It allows users to insert, update, delete, and retrieve data.
Key Features:
DML statements can be rolled back if enclosed in transactions.
Commonly used by application developers and end-users.
Common DML Commands:
1. SELECT: Fetches records from tables.
o Example: SELECT * FROM Students;
2. INSERT: Adds new records into tables.
o Example: INSERT INTO Students VALUES (1, 'Aman',
'
[email protected]');
3. UPDATE: Modifies existing records.
o Example: UPDATE Students SET Name = 'Rahul' WHERE ID =
1;
4. DELETE: Removes existing records.
o Example: DELETE FROM Students WHERE ID = 1;
Note:
Use WHERE clause carefully with UPDATE and DELETE to avoid
modifying all records accidentally.
15. What is normalization ? Types of them ..
Normalization is the process of organizing data to:
Reduce redundancy
Improve integrity
Avoid anomalies (errors)
Types of Normal Forms:
1. 1NF (First Normal Form):
o Each cell must contain atomic (single) values.
o No repeating groups.
o ✅ Marks = 80 ❌ Marks = 80, 90
2. 2NF (Second Normal Form):
o Must be in 1NF
o No partial dependency (non-key column depends only on part
of a composite key)
3. 3NF (Third Normal Form):
o Must be in 2NF
o No transitive dependency (non-key depends on another non-
key)
4. BCNF (Boyce-Codd Normal Form):
o Stronger version of 3NF
o Every determinant is a candidate key
5. 4NF and 5NF:
o Deal with multivalued and join dependencies (used in complex
systems)
16. What is Denormalization?
Denormalization is the process of combining normalized tables into
larger tables by adding redundant data intentionally to improve the
read performance of a database.
Why Use Denormalization?
Joins are expensive in terms of time
Reporting and analytics often require faster access
Reduces the number of joins
Example:
Normalized Structure:
Orders (OrderID, CustomerID)
Customers (CustomerID, Name, Email)
Denormalized:
Orders (OrderID, CustomerID, Name, Email)
Now no need to join with Customers when querying order information.
Advantages:
Faster SELECT queries
Easier reporting
Disadvantages:
Redundant data
Higher storage requirement
Data inconsistency risk during updates
17. What is Functional Dependency?
Functional Dependency (FD) occurs when one attribute uniquely
determines another attribute within a relation.
Definition:
If A and B are attributes of a relation, A → B means:
For every value of A, there is exactly one value of B.
Notation:
A→B
A: Determinant
B: Dependent attribute
Example:
RollNo → StudentName
Each Roll Number is associated with exactly one Student Name.
Types of Functional Dependencies:
Trivial FD: X → Y, where Y ⊆ X
Non-Trivial FD: X → Y, where Y is not a subset of X
Transitive FD: If A → B and B → C, then A → C
FDs are used to identify candidate keys and form the basis for
normalization.
18. E-R Model (Entity-Relationship Model)
The ER model is used to visually represent the structure of a database
using entities, attributes, and relationships.
Key Components:
Entity: Real-world object (e.g., Student, Book)
Entity Set: Group of similar entities (e.g., all students)
Attributes: Properties of entities (e.g., StudentName, RollNo)
Primary Key: Uniquely identifies an entity (e.g., RollNo)
Relationship: Logical connection between entities (e.g., Student
"EnrolledIn" Course)
ER Diagram Symbols:
Rectangle → Entity
Ellipse → Attribute
Diamond → Relationship
Underline → Primary Key
Double Ellipse → Multivalued Attribute
Dashed Border → Derived Attribute
19. Conflict Serializability in DBMS
A schedule is said to be conflict serializable if it can be converted to
a serial schedule by swapping non-conflicting operations.
Conflict Conditions:
Two operations are in conflict if:
They belong to different transactions
They access the same data item
At least one of them is a write
Precedence Graph (Serialization Graph):
Nodes: Transactions
Edges: Direction from T1 → T2 if T1 precedes T2 in conflicting
operations
If the graph has no cycles → schedule is conflict serializable
Example:
If T1 writes A and then T2 reads A → T1 → T2 (edge)
20. Explain Normal Forms in DBMS
Normalization is the stepwise process of removing redundancy and
anomalies by organizing attributes into multiple related tables.
Types of Normal Forms:
1. First Normal Form (1NF)
o Atomic (indivisible) values only
o No repeating groups or arrays
o Example: Subjects = Math, Physics → ❌
2. Second Normal Form (2NF)
o Must be in 1NF
o No partial dependency (non-prime attributes depend only
on a part of composite key)
3. Third Normal Form (3NF)
o Must be in 2NF
o No transitive dependency
o i.e., Non-key attributes must not depend on other non-key
attributes
4. Boyce-Codd Normal Form (BCNF)
o Every determinant must be a candidate key
o Stricter than 3NF
5. Fourth Normal Form (4NF)
o Removes multivalued dependencies
6. Fifth Normal Form (5NF)
o Eliminates join dependency
o Ensures lossless join decomposition
21. What is CCP? (Concurrency Control Protocols)
Concurrency Control Protocols help manage multiple transactions
occurring at the same time without conflicting or violating the ACID
properties.
1. Lock-Based Protocols
Use shared (read) and exclusive (write) locks
Two-Phase Locking (2PL):
o Growing Phase: Transaction acquires all required locks
o Shrinking Phase: Transaction releases locks
o Ensures serializability
2. Timestamp-Based Protocols
Each transaction gets a unique timestamp
Operations are allowed based on timestamp order
Prevents cycles and deadlocks
3. Optimistic Concurrency Control
Transactions execute without locking
At commit time, check for conflicts
If conflict → abort and retry
22. Entity, Entity Type, Entity Set, Weak Entity Set
Entity: A real-world object with attributes (e.g., Employee)
Entity Type: A blueprint for similar entities (e.g., Employee(Name,
ID))
Entity Set: All instances of an entity type in the database
Weak Entity Set:
o Cannot be uniquely identified by its attributes
o Needs a foreign key from a related entity (called the
"owner")
o Requires a partial key
Example:
Employee(EmpID) → Strong Entity
Dependent(Name, Relation, EmpID) → Weak Entity
23. What are SQL Commands? Types of Them
SQL is divided into different categories based on the task performed:
1. DDL – Data Definition Language
Defines structure of database
Commands: CREATE, ALTER, DROP, TRUNCATE
2. DML – Data Manipulation Language
Used to manipulate data
Commands: SELECT, INSERT, UPDATE, DELETE
3. DCL – Data Control Language
Manages access and permissions
Commands: GRANT, REVOKE
4. TCL – Transaction Control Language
Controls transaction boundaries
Commands: COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION
5. DQL – Data Query Language
Fetches data
Command: SELECT
24. Nested Queries in SQL
Nested Query = Query written inside another query (also called
Subquery).
Where Used?
In WHERE, HAVING, FROM clauses
Types of Subqueries:
1. Single-row Subquery: Returns one value
2. Multi-row Subquery: Returns multiple values
3. Correlated Subquery: Depends on outer query
Example:
SELECT Name FROM Students
WHERE Marks > (SELECT AVG(Marks) FROM Students);
Correlated Example:
SELECT s1.Name FROM Students s1
WHERE EXISTS (
SELECT * FROM Students s2
WHERE s1.Class = s2.Class AND s2.Marks > 90);
25. What is JOIN? Explain Types of JOINs
A JOIN allows you to combine rows from two or more tables based on a
related column between them.
Types of Joins:
1. INNER JOIN:
o Returns rows when there is a match in both tables
2. SELECT * FROM A INNER JOIN B ON A.id = B.id;
3. LEFT JOIN (LEFT OUTER):
o Returns all rows from the left table and matched rows from the
right
4. SELECT * FROM A LEFT JOIN B ON A.id = B.id;
5. RIGHT JOIN (RIGHT OUTER):
o Returns all rows from the right table and matched rows from
the left
6. SELECT * FROM A RIGHT JOIN B ON A.id = B.id;
7. FULL OUTER JOIN:
o Returns all rows when there is a match in one of the tables
8. SELECT * FROM A FULL OUTER JOIN B ON A.id = B.id;
9. CROSS JOIN:
o Returns Cartesian product of two tables
10. SELECT * FROM A CROSS JOIN B;
11. SELF JOIN:
o Joins a table to itself using alias
12. SELECT A.Name, B.Name FROM Employees A
13. JOIN Employees B ON A.ManagerID = B.ID;
Let me know if you’d like practice questions or visual ER diagrams to
reinforce these concepts!
[...existing content remains unchanged here for brevity...]
26. Inner and Outer Join (Difference)
INNER JOIN:
Returns only the rows that have matching values in both tables.
If there is no match, the row is excluded from the result.
SELECT Orders.OrderID, Customers.Name
FROM Orders
INNER JOIN Customers ON Orders.CustomerID =
Customers.CustomerID;
OUTER JOIN:
Returns all matching rows, plus unmatched rows from one or both
tables.
NULLs are used where there is no match.
Types of OUTER JOIN:
1. LEFT OUTER JOIN: Returns all rows from the left table, and
matched rows from the right. If no match, NULLs.
SELECT Orders.OrderID, Customers.Name
FROM Orders
LEFT JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
2. RIGHT OUTER JOIN: Returns all rows from the right table, and
matched rows from the left. If no match, NULLs.
SELECT Orders.OrderID, Customers.Name
FROM Orders
RIGHT JOIN Customers ON Orders.CustomerID =
Customers.CustomerID;
3. FULL OUTER JOIN: Returns all rows when there is a match in one of
the tables. Where no match, NULLs are filled in.
SELECT Orders.OrderID, Customers.Name
FROM Orders
FULL OUTER JOIN Customers ON Orders.CustomerID =
Customers.CustomerID;
27. Practice SQL Queries from LeetCode
Some important and common SQL problems for practice:
1. Combine Two Tables
o Learn LEFT JOIN
o Easy starter problem
2. Second Highest Salary
o Use LIMIT, DISTINCT, or subqueries
o Good for understanding sorting and filtering
3. Nth Highest Salary
o Use DENSE_RANK() or nested subqueries
o Introduces window functions
4. Employees Earning More Than Their Managers
o Requires SELF JOIN
o Great for understanding aliasing and hierarchical data
5. Department Top Three Salaries
o Use DENSE_RANK() or ROW_NUMBER()
o Practice of partitioned ranking
✅ Practice link: https://leetcode.com/problemset/database/
28. Difference Between 2-Tier and 3-Tier Architecture
Feature 2-Tier Architecture 3-Tier Architecture
Client, Application Server
Layers Client and Database
(Middle Layer), and Database
Client handles Application server handles
Processing
business logic business logic
Scalability Limited High
Less secure, direct More secure due to
Security
DB access intermediate layer
Maintenanc Easy to maintain and update
Harder to manage
e modules separately
Standalone apps Web apps (like Flipkart, Banking
Example
(like MS Access) Systems)
29. Difference Between TRUNCATE and DELETE
Feature DELETE TRUNCATE
DML (can be rolled DDL (auto-committed,
Operation Type
back) cannot rollback)
Specific rows (using
Removes All rows
WHERE)
WHERE clause Supported Not Supported
Slower (row-by-row
Speed Faster (resets entire table)
deletion)
Transaction
Yes, can ROLLBACK No, cannot be rolled back
control
Does not fire triggers (in
Triggers Fires triggers
most DBs)
Auto-increment Yes (in some DBMS like
No
reset MySQL)
30. Intension vs Extension in DBMS
Term Intension Extension
Also Schema Instance
Term Intension Extension
Called
Structure/definition of the Actual data/records in the
Meaning
table table
Changes Rarely changes Frequently changes
Table Students (ID, Name, Rows like (101, 'Aman',
Example
Marks) 89)
Purpose Design-time concept Run-time data
31. Share Lock vs Exclusive Lock
Lock in DBMS is a mechanism to ensure concurrent transactions do
not result in data inconsistency.
1. Shared Lock (S-Lock):
Allows reading the data
Multiple shared locks can be held on the same data item
Write is NOT allowed while shared locks exist
Example:
T1 and T2 both can read the same row using shared locks
2. Exclusive Lock (X-Lock):
Allows reading and writing the data
Only ONE transaction can hold exclusive lock at a time
No other lock (shared or exclusive) allowed simultaneously
Example:
T3 holds an exclusive lock to update salary → all others must wait
Lock Compatibility Table:
Exclusive
Shared Lock
Lock
Shared ✅ Allowed ❌ Not Allowed
Exclusiv
❌ Not Allowed ❌ Not Allowed
e
These locks are used in protocols like Two-Phase Locking (2PL) to
ensure serializability in concurrent transactions.
Let me know if you'd like more practice queries, visuals for join
operations, or revision notes!
[...existing content remains unchanged here for brevity...]
32. Difference Between File System and DBMS
Definition
File System:
A file system is a method of storing and organizing files on a storage
device (like HDD or SSD). It manages how data is named, stored, and
retrieved. It organizes data into files and directories.
Example: NTFS (Windows), EXT4 (Linux)
DBMS (Database Management System):
A DBMS is a software system used to create, manage, and manipulate
structured data efficiently. It supports querying, transaction
management, indexing, and provides data security and integrity.
Example: Oracle, MySQL, PostgreSQL
Difference Table: File System vs DBMS
DBMS (Database
Feature File System
Management System)
Organizes data in
Organizes data as
Structure structured tables with
files in directories
relationships
High redundancy
Data Minimal redundancy using
(duplicate data may
Redundancy normalization
exist)
Backup & Provides automated tools
No built-in support
Recovery for backup and recovery
Query No query language; Supports SQL and
Processing manual access relational algebra for fast
DBMS (Database
Feature File System
Management System)
access
High consistency due to
Data Inconsistent due to
constraints and
Consistency duplication
normalization
Simple to use and More complex due to
Complexity
manage advanced features
High-level user access
Security Low-level security controls, roles, and
encryption
Less expensive (free More expensive (requires
Cost
and built-in) installation and licensing)
Data Supports logical and
Independenc Not supported physical data
e independence
Only one user or
Multiple users can access
User Access difficult concurrent
concurrently
access
Difficult to share due Easy sharing due to
Sharing
to scattered files centralized design
Data Shows file locations Hides internal details using
Abstraction and formats three levels of abstraction
Integrity Hard to implement Easily enforceable through
Constraints manually keys and constraints
Attributes to File name, path Table and column names
Access required used in SQL
C, COBOL, Manual file
Examples Oracle, MySQL, SQL Server
handling
Summary:
A File System is good for storing small or unstructured data with
simple requirements.
A DBMS is ideal for complex, structured data that requires integrity,
security, and multiple access.
Conclusion:
The File System is basic, lightweight, and best for small, standalone
applications. In contrast, DBMS provides comprehensive, scalable, and
secure data management for modern applications requiring
consistency and concurrent access.
Let me know if you'd like a diagram or quiz on this topic!