**What is SQL and why is it important in Data Engineering?
SQL (Structured Query Language) is used to communicate with relational databases. It
allows data engineers to extract, manipulate, and manage large volumes of structured data
efficiently.
**What is the difference between OLTP and OLAP?
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are both
types of database systems but used for different purposes:
OLTP is used for day-to-day transactions like ATM withdrawals, online shopping, etc.
➤ Fast, real-time, handles lots of small queries.
➤ Example: Booking a movie ticket.
OLAP is used for analyzing data and generating reports.
➤ Works with large volumes of data, slower, used for decision-making.
➤ Example: Sales trends report for the last year.
Explain the difference between a UNION and UNION ALL operator.
UNION: Combines results from two queries and removes duplicates.
➤ Think of it like merging two lists and keeping only unique values.
UNION ALL: Combines results from two queries and keeps all records, including duplicates.
➤ It merges everything as-is, even if some rows are repeated.
What is the difference between DELETE, TRUNCATE, and DROP?
🔹 DELETE: Delete Removes specific rows from a table (you can use WHERE). Delete Can be
undone (if inside a transaction). Table structure remains.
🔹 TRUNCATE: Removes all rows from a table. truncate faster than DELETE. Table structure
remains.
🔹 DROP: Drop Deletes the entire table (data + structure). Table is completely removed from
the database
What are the ACID properties in the context of database transactions?
ACID properties are a set of four key rules that ensure database transactions are reliable and
safe.
Atomicity means a transaction is treated as one complete unit. Either everything in the
transaction happens, or nothing happens at all.
Consistency ensures that the database always follows its rules. After a transaction, the
data must remain valid. If a rule says that a person’s age can’t be negative, then the
transaction won’t allow an incorrect value to be stored.
Isolation means that transactions happen independently. If two people are making
changes to the database at the same time, their actions won’t interfere with each other,
and the results will be as if the transactions happened one after the other.
Durability ensures that once a transaction is done, the changes are saved permanently—
even if the system crashes right after. The data stays safe and doesn't get lost.
In short, ACID helps keep databases accurate, stable, and reliable
Explain the different types of SQL Joins.
Joins are used to combine data from two or more tables based on a related column.
INNER JOIN: Returns only the rows where there is a match in both tables.
LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table, and the
matching rows from the right table. If there's no match in the right table, NULLs are
returned for the right table's columns.
RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table, and the
matching rows from the left table. If there's no match in the left table, NULLs are
returned for the left table's columns.
FULL OUTER JOIN: Returns all rows when there is a match in either the left or right
table. If no match, NULLs are returned for the non-matching side. (Not directly
supported in all databases like MySQL, often simulated with UNION).
CROSS JOIN: Returns the Cartesian product of the two tables, combining every row
from the first table with every row from the second table.
What is a Primary Key? What is a Foreign Key?
🔹 Primary Key (PK) – A Primary Key is a column (or set of columns) in a table that uniquely
identifies each row. It cannot be empty (NULL). It must be unique – no two rows can have
the same value.
Example: In a table of students, the student_id can be the primary key because every
student has a unique ID.
🔹 Foreign Key (FK) – A Foreign Key is a column in one table that links to the Primary Key of
another [Link] helps connect two tables [Link] ensures data matches across tables.
Example: In a table of orders, the customer_id can be a foreign key that refers to the id in
the customers table. It tells which customer made the order.
✅ In short:
Primary Key = Unique ID for each row in a table
Foreign Key = Link to a primary key in another table
What is Normalization? Why is it important?
Normalization is the process of organizing data in a database to reduce redundancy and
improve data integrity. It involves splitting tables and defining relationships.
1NF: No repeating groups
2NF: No partial dependency
3NF: No transitive dependency
Normalization ensures clean, efficient data structures but may increase query complexity.
What is Denormalization?
Denormalization is the opposite of normalization. It involves combining tables to reduce the
number of joins and improve read performance, especially in data warehousing. It can lead
to data redundancy.
Explain the difference between WHERE and HAVING clauses.
WHERE Clause: Used to filter individual rows before grouping. It operates on raw
data.
HAVING Clause: Used to filter groups after the GROUP BY clause has been applied. It
operates on the results of aggregate functions.
What is a subquery? When would you use a correlated subquery?
A subquery is a query inside another query.
It helps you get a value that you can use in the main query.
Correlated Subquery: A subquery that uses data from the outer query. Runs for
each [Link] it when you need to compare each row to related data in another table
or the same table — like comparing a student’s marks to others in the same class.
What are indexes in SQL, and why are they used?
In simple words, an index in SQL is like a book index — it helps the database find data
faster without scanning the whole table.
🔹 Why are Indexes Used?
To speed up searches and queries.
Makes things like SELECT, WHERE, JOIN, and ORDER BY run faster.
❗ But indexes take extra space and slow down inserts/updates a little.