1. What is a Database Management System (DBMS)? Explain its purpose.
A Database Management System (DBMS) is software that manages databases. It allows
users to store, retrieve, update, and manage data efficiently.
Purpose of DBMS:
● Data Organization: Helps structure and organize large datasets.
● Data Integrity & Security: Ensures data consistency, accuracy, and security.
● Concurrency Management: Allows multiple users to access data simultaneously.
● Backup & Recovery: Provides mechanisms to recover data in case of failures.
Benefits of a DBMS:
● Data Consistency: Ensures that the same data is available for all users in the system.
● Reduced Data Redundancy: By centralizing data, a DBMS minimizes duplication of
information.
● Efficient Data Retrieval: The use of indexes and query optimization enables faster data
access.
● Centralized Control: A DBMS provides a centralized way to control and manage data,
reducing the complexities of handling data across multiple locations or applications.
● Data Scalability: DBMS can handle vast amounts of data and are scalable to meet
growing demands.
Why DBMS is Important:
1. Data Storage Efficiency: DBMS systems efficiently store large amounts of data and
make them easily accessible without compromising performance.
2. Data Integrity and Accuracy: By enforcing rules and constraints, DBMS systems
ensure that the data is valid, consistent, and accurate across all operations.
3. Multi-user Support: With its transaction management features, DBMS allows multiple
users to access and manipulate the database without conflicts, ensuring smooth and
synchronized operations.
4. Better Decision-Making: The structured storage and easy retrieval of data make it
possible for businesses to make better, data-driven decisions.
Example DBMS:
● Relational DBMS (RDBMS): MySQL, PostgreSQL, Oracle, SQL Server.
● NoSQL DBMS: MongoDB, Cassandra.
2. What are the advantages of using databases?
Advantages:
1. Data Integrity & Accuracy: Enforces constraints to ensure data
correctness(Consistency)
2. Data Security: Provides access control mechanisms,Encryption and Authentication:.
3. Reduces Data Redundancy: Avoids unnecessary data duplication.
4. Concurrency Control: Allows multiple users to access data efficiently and Locking
Mechanisms.
5. Scalability: Supports handling large volumes of data.
6. Backup & Recovery: Ensures data recovery in case of system failures.
7. Efficient Data Management and Retrieval : Structured Data Organization , Query
Optimization
8. Data Independence
9. Enhanced Decision-Making
10.Reduced Data En
11.try and Duplication Errors
12.Flexibility and Querying Power
13.Data Sharing
14.Cost Efficiency Over Time
3. What is SQL? Explain its purpose.
SQL (Structured Query Language) is a language used to communicate with databases.
SQL (Structured Query Language) is a standardized programming language used to manage
and manipulate relational databases. It is primarily used for querying, updating, and managing
data stored in relational database management systems (RDBMS)
Purpose of SQL:
● Data Manipulation: (INSERT, UPDATE, DELETE)
● Data Retrieval: (SELECT, JOIN, WHERE)
● Database Administration: (CREATE, DROP, ALTER)
● Data Control: (GRANT, REVOKE)
Explain relational databases.
A Relational Database (RDBMS) organizes data into structured tables with rows and columns,
following ACID properties (Atomicity, Consistency, Isolation, Durability).
Advantages of Relational Databases:
● Structured Data: Data is organized in a clear, consistent way that makes it easy to
retrieve and update.
● Data Integrity: Enforces rules to make sure data is accurate and consistent (e.g.,
through primary and foreign keys).
● Queries: You can easily perform queries to combine, filter, and sort data from different
tables.
5. What is a Primary Key and a Unique Key? Explain their differences.
Primary Key:
● A Primary Key is a column (or a set of columns) that uniquely identifies each record in a
table.
● Key characteristics of a Primary Key:
1. Uniqueness: The value in the primary key column must be unique for every
record in the table. No two rows can have the same primary key value.
2. Not Null: A primary key column cannot have NULL values. Every record must
have a valid value for the primary key.
3. One per Table: Each table can have only one primary key, but this primary key
can consist of multiple columns (this is called a composite primary key).
4. Purpose: The primary key is used to uniquely identify each row in the table and
establish relationships with other tables (through foreign keys).
Unique Key:
● A Unique Key is a constraint that ensures all values in a specific column (or set of
columns) are unique across all records in the table, but it allows NULL values (unlike a
Primary Key).
● Key characteristics of a Unique Key:
1. Uniqueness: Like the primary key, the unique key ensures that the values in the
column(s) are unique, meaning no two rows can have the same value in that
column.
2. Nullability: Unique keys can allow NULL values, but multiple NULL values are
allowed. This is because in SQL, NULL is considered a special value that is not
equal to anything, including another NULL.
3. Multiple Unique Keys per Table: A table can have multiple unique keys. For
example, you could have a unique constraint on both the Email column and the
PhoneNumber column.
4. Purpose: Unique keys are used to enforce uniqueness for a column that is not
the primary key. They provide a way to ensure no duplicates in certain fields.
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
email VARCHAR(100) UNIQUE
);
Here, emp_id is a Primary Key (unique, not null), and email is a Unique Key (unique, but can
be null).
6. What is a Foreign Key? Why is it needed?
A Foreign Key is a column (or a set of columns) in a child table that creates a link to the
primary key (or unique key) of a parent table. It establishes a relationship between two tables
in a relational database.
CREATE TABLE Departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100)
);
CREATE TABLE Employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES Departments(dept_id)
);
Here, dept_id in Employees references dept_id in Departments.
Why is a Foreign Key Needed?
A foreign key is important because it:
● Enforces Referential Integrity
● Maintains Data Consistency
● Represents Relationships Between Entities
● Enables Complex Queries
7. What are the different types of commands in SQL? Provide examples for
each.
DQL : DQL commands are used to query or retrieve data from a database
DDL : DDL commands are used to define or modify the structure of a database. This includes
creating, altering, or dropping tables, views, indexes, etc.
DML : DML commands are used to manipulate the data within the tables. These commands
allow you to insert, update, and delete records.
DCL : DCL commands are used to control access to data in a database. These commands
manage user permissions and access rights. (GRANT , REVOKE)
TCL : TCL commands are used to manage transactions in a database. These ensure data
integrity by controlling how changes to the data are committed or rolled back (COMMIT ,
ROLLBACK)
8. What are Data Types in SQL?
Data types define the type of data stored in a column.
Common SQL Data Types:
● Numeric: INT, BIGINT, DECIMAL(10,2), FLOAT
● String: VARCHAR(255), TEXT, CHAR(10)
● Date & Time: DATE, DATETIME, TIMESTAMP
● Boolean: BOOLEAN (TRUE or FALSE)
9. What do you mean by constraints? Explain the different types of
constraints in SQL.
Constraints are rules or restrictions applied to the columns in a database table to ensure data
integrity, accuracy, and consistency
10. What is the use of the WHERE clause and the HAVING clause?
WHERE Clause
● Purpose: The WHERE clause is used to filter rows before any groupings are made in the
query. It specifies conditions on individual rows in a table to select the rows that satisfy
the condition.
● When Used: The WHERE clause is used when you want to filter data before applying any
aggregate functions (like SUM, COUNT, AVG, etc.) or when you're not using aggregation at
all.
● Applies To: Individual rows of the table. It works on rows directly.
● Key Point: The WHERE clause is applied before any grouping or aggregation of data.
HAVING Clause
● Purpose: The HAVING clause is used to filter data after the data has been grouped. It is
typically used with aggregate functions like COUNT, SUM, AVG, etc., to filter groups of
rows rather than individual rows.
● When Used: The HAVING clause is used when you want to filter data after grouping the
rows using the GROUP BY clause. It allows you to apply conditions to aggregated data.
● Applies To: Groups of rows that have been created using the GROUP BY clause.
● Key Point: The HAVING clause is applied after the data is grouped and aggregated.
11. What is the difference between SQL DELETE and SQL TRUNCATE
commands?
12. What is the difference between CHAR, VARCHAR, and NVARCHAR in
SQL?
13. What is a view in SQL?
A view in SQL is a virtual table that is based on the result of a SELECT query. It doesn't store
data physically; instead, it dynamically pulls data from one or more tables whenever it is
accessed
Key Characteristics of Views:
● Virtual Table
● Simplifies Complex Queries
● Security
● Data Independence
● Read-Only or Updatable
Benefits of Using Views:
● Simplification
● Data Security
● Reusability
● Abstraction
Types of Views:
1. Simple Views:
○ A simple view is based on a single table and does not include aggregate
functions or GROUP BY.
2. Complex Views:
○ A complex view can involve multiple tables, joins, subqueries, and aggregate
functions.
Intermediate SQL Questions
1. What is Normalization? Explain the different types or forms of
normalization.
Normalization is the process of organizing data in a relational database to reduce redundancy
and avoid undesirable characteristics like data anomalies (insertion, update, and deletion
anomalies).
Why Normalize Data?
1. Eliminate Data Redundancy: It avoids storing the same data in multiple places.
2. Improve Data Integrity: By organizing data logically, normalization ensures that data is
consistent and accurate.
3. Reduce Anomalies: Helps in eliminating problems like update anomalies, insert
anomalies, and delete anomalies.
4. Efficient Queries: Helps in improving query performance by having smaller, more
specific tables that are easier to work with.
1. First Normal Form (1NF)
● Definition: A table is in First Normal Form if it meets the following criteria:
○ Each column contains atomic (indivisible) values (no repeating groups or
arrays).
○ Each column contains only one type of data (e.g., no mixed data types in a
single column).
○ Each row is unique and identifiable (i.e., no duplicate rows).
2. Second Normal Form (2NF)
● Definition: A table is in Second Normal Form if:
○ It is in First Normal Form (1NF).
○ No partial dependency exists, meaning every non-prime attribute (non-key
attribute) is fully dependent on the entire primary key (not just a part of the key
in the case of composite keys).
3. Third Normal Form (3NF)
● Definition: A table is in Third Normal Form if:
○ It is in Second Normal Form (2NF).
○ No transitive dependencies exist, meaning non-key attributes should not
depend on other non-key attributes.
4. Boyce-Codd Normal Form (BCNF)
● Definition: A table is in Boyce-Codd Normal Form (BCNF) if:
○ It is in Third Normal Form (3NF).
○ Every determinant is a candidate key. A determinant is an attribute (or set of
attributes) that uniquely determines another attribute.
5. Fourth Normal Form (4NF)
● Definition: A table is in Fourth Normal Form (4NF) if:
○ It is in Boyce-Codd Normal Form (BCNF).
○ There are no multi-valued dependencies. This means that no field in the table
should have more than one independent multi-valued attribute.
2. What is Denormalization? Why is it needed?
Denormalization is the process of introducing redundancy into a database by combining
tables or adding redundant data. This is essentially the reverse of normalization
Denormalization involves merging tables or duplicating data to improve performance
Why is Denormalization Needed? Advantage
● Improved Query Performance
● Reduced Complexity
● Faster Access to Data
● Optimized for Reporting and Analytics
● Avoiding Expensive Joins
Disadvantages of Denormalization
● Data Redundancy
● Increased Risk of Inconsistencies
● More Complex Updates
● Increased Maintenance
Example (Denormalization)
Instead of having separate Orders and Customers tables, we store customer details in
Orders table to improve query speed.
3. What is a Join in SQL? Explain the different types of joins in SQL.
A JOIN in SQL is an operation used to combine rows from two or more tables based on a
related column between them
Types of Joins:
1. INNER JOIN → The INNER JOIN returns only the rows that have matching values in
both tables. If there is no match, the row will not be included in the result.
2. LEFT JOIN → The LEFT JOIN returns all the rows from the left table (table1), and the
matching rows from the right table (table2). If there is no match, the result will still
include the row from the left table with NULL values for the columns from the right table
3. RIGHT JOIN → The RIGHT JOIN is the opposite of the LEFT JOIN. It returns all the
rows from the right table (table2), and the matching rows from the left table (table1). If
there is no match, the result will include the row from the right table with NULL values for
the left table’s columns.
4. FULL OUTER JOIN → The FULL JOIN returns all rows when there is a match in either
the left (table1) or right (table2) table. It returns NULL for non-matching rows from both
tables
5. CROSS → The CROSS JOIN produces a Cartesian product of the two tables. It
returns all possible combinations of rows between the tables. If Table1 has 3 rows and
Table2 has 4 rows, the result will contain 3 * 4 = 12 rows.
6. SELF JOIN→ A SELF JOIN is a join where a table is joined with itself. This can be
useful when a table contains hierarchical data or when you want to compare rows within
the same table.
4. What is a CROSS JOIN and a SELF JOIN in SQL?
CROSS JOIN
Definition: A CROSS JOIN returns the Cartesian product of two tables. This means it will
combine each row from the first table with each row from the second table, regardless of
whether they match in any way.
● No condition: Unlike other joins, a CROSS JOIN doesn’t require an ON condition
because it's based on every possible combination of rows.
● Result: If Table1 has n rows and Table2 has m rows, the result will have n * m rows.
2. SELF JOIN
Definition: A SELF JOIN is when a table is joined with itself. Essentially, it’s a way to compare
rows within the same table. It is helpful when the data within a table is hierarchical or when you
want to find relationships between rows in the same table.
● Alias usage: Since you're joining a table with itself, you’ll need to use aliases to
distinguish the two instances of the same table.
5. What is the execution order of SQL statements?
SQL execution order (logical order in which queries are processed):
1. FROM & JOIN
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
6. What is an Index in SQL? Why is it needed?
An Index in SQL is a database object that improves the speed of data retrieval operations on
a table.
Why is an Index Needed?
● Faster Data Retrieval
● Improved Query Performance:
● Efficient Sorting
● Faster Joins
● Uniqueness Enforcement
Types of Indexes in SQL
1. Single-Column Index:Created on a single column of a table.
2. Composite Index (Multi-Column Index):Created on two or more columns of a table
3. Unique Index:Ensures that all values in the indexed column(s) are unique
4. Clustered Index:A clustered index determines the physical order of data rows in the
table. A table can only have one clustered index because the data can only be stored
in one order. By default, a Primary Key constraint creates a clustered index on the
column.
5. Non-Clustered Index:A non-clustered index is a separate data structure that stores
pointers to the actual rows in the table. It allows for multiple indexes on a table, and each
non-clustered index has its own structure. It's typically used to improve performance
when querying non-primary columns
6. Full-Text Index:Used for full-text searching
Drawbacks of Using Indexes
● Increased Storage Requirements
● Slower Write Operations
● Complexity
7. What is the difference between Clustered and Non-Clustered Indexes?
Clustered Index:A clustered index determines the physical order of data rows in the table. A
table can only have one clustered index because the data can only be stored in one order. By
default, a Primary Key constraint creates a clustered index on the column.
Non-Clustered Index:A non-clustered index is a separate data structure that stores pointers
to the actual rows in the table. It allows for multiple indexes on a table, and each non-clustered
index has its own structure. It's typically used to improve performance when querying
non-primary columns
8. What do you mean by a Stored Procedure? Why is it needed?
A Stored Procedure in SQL is a precompiled collection of SQL statements that can be
executed as a single unit. It is essentially a set of SQL queries and logic (such as conditions,
loops, and exception handling) that are stored in the database and can be executed whenever
needed, without having to rewrite the SQL code each time.
Why is a Stored Procedure Needed? (ADVANTAGES)
● Reusability
● Improved Performance
● Better Security and Access Control
● Code Centralization and Maintainability
● Supports Complex Operations
● Error Handling and Transactions
Disadvantages of Stored Procedures
● Complexity
● Portability
● Debugging
9. What are Functions in SQL? When should you use a function over a
stored procedure?
In SQL, a function is a type of database object that performs a specific task and returns a
single value
● Functions return a single value or table.
● Stored Procedures do not return values but execute SQL logic.
When to Use a Function Over a Stored Procedure
● If You Need to Return a Value
● When You Need to Use the Result in a Query Expression
● If You Want to Perform a Simple Operation Without Changing Data
● If You Want to Avoid Side Effects (Data Modification)
● When You Need to Use the Function Within Other Functions or Procedures
● When You Need Transactions or Complex Logic
10. What is the difference between a Function and a Stored Procedure?
11. What are Triggers in SQL? How do you create them? Explain their
purposes.
A Trigger is an automatic action executed when an event (INSERT, UPDATE, DELETE) occurs.
12. What is SQL Injection? How can it be prevented?
SQL Injection is an attack where malicious SQL is executed via user input.
Preventing SQL Injection:
1. Use Prepared Statements
SELECT * FROM Users WHERE username = ? AND password = ?;
2. Escape User Input
3. Use ORM (e.g., Hibernate, Sequelize)
13. What is the difference between the BETWEEN and IN operators in SQL?
14. What are the various types of relationships in SQL?
1. One-to-One (1:1) → A person has one passport.
2. One-to-Many (1:M) → A department has many employees.
3. Many-to-Many (M:M) → A student enrolls in multiple courses.
15. What is the difference between COALESCE() and ISNULL()?
● ISNULL(col, value) → Works only in SQL Server.
● COALESCE(col1, col2, default_value) → Works in multiple databases.
16. What is a Subquery? What are its types?
A subquery is a query inside another query.
Types:
1. Correlated Subquery → Uses outer query values.
2. Non-Correlated Subquery → Executes independently.
SELECT * FROM Employees WHERE dept_id IN (SELECT dept_id FROM Departments
WHERE location = 'NYC');
Advanced SQL Questions
1. What is a transaction in SQL, and why is it important?
A transaction is a sequence of SQL operations that are executed as a single unit of work. It
follows the ACID properties (Atomicity, Consistency, Isolation, Durability) to ensure data
integrity.
2. How do you delete duplicate rows in SQL?
Using ROW_NUMBER() with DELETE:
3. What is a Common Table Expression (CTE) in SQL?
A CTE is a temporary result set used in complex queries to improve readability.
4. How can you improve the performance of SQL queries?
Performance Optimization Techniques:
1. Use Indexing (CREATE INDEX idx ON table(column);)
2. Avoid SELECT * (Use only required columns)
3. Use Proper Joins (Prefer INNER JOIN over OUTER JOIN if not needed)
4. Use EXISTS instead of IN
5. Partition Large Tables (PARTITION BY for faster retrieval)
6. Use Caching for frequent queries
7. Optimize WHERE Clause Filtering (Avoid LIKE '%text%' searches)
5. What is the difference between ROW_NUMBER(), RANK(), and
DENSE_RANK()?
6. What is a materialized view, and how does it differ from a normal view?
● View: A virtual table that does not store data permanently.
● Materialized View: A physical copy of a query result that is periodically refreshed.
7. How do you handle large datasets in SQL efficiently?
● Partitioning (PARTITION BY)
● Indexing (CREATE INDEX)
● Batch Processing (LIMIT, OFFSET)
● Query Optimization (Use EXPLAIN)
8. What is a window function in SQL?
A window function performs calculations across a set of rows related to the current row.
Example (Using SUM() as a window function):
9. How do you detect slow queries in SQL?
● Use EXPLAIN ANALYZE to check query execution plans.
● Check slow query logs (SHOW VARIABLES LIKE 'slow_query_log';)
● Use database profiling (SHOW PROFILE)
EXPLAIN ANALYZE SELECT * FROM Employees WHERE salary > 50000;
10. What are Recursive Queries in SQL?
A recursive query is a query that refers to itself, commonly used for hierarchical data (e.g.,
employee-manager relationships).
11. What is sharding in SQL?
Sharding is a database scaling technique that divides large databases into smaller, more
manageable pieces.
Example:
If we have millions of users, we can shard by region:
● Shard 1: Users from USA
● Shard 2: Users from Europe
● Shard 3: Users from Asia
12. What is the difference between EXISTS and IN in SQL?
13. What is Database Indexing, and how does it work?
An index speeds up data retrieval by creating a lookup table.
CREATE INDEX idx_name ON Employees(name);
14. What is the difference between NoSQL and SQL databases?
15. How do you optimize queries for real-time applications?
● Use Indexed Searches
● Avoid SELECT *, fetch only required columns
● Use Connection Pooling
● Use Caching Mechanisms (e.g., Redis, Memcached)