Dbms Notes
Dbms Notes
com
1
DBMS
INDEX
[email protected]
S.NO TOPIC PAGE NO.
DBMS 1 – 46
1 DBMS Introduction 3–4
2 DBMS Architecture 5–6
3 Data Abstraction 7–8
4 Types of Data Models 9 – 10
5 ER Model 11 – 14
6 Relational Model 15 – 16
7 Types of Keys 17 – 18
8 Normalisation 19 – 23
9 Denormalization 24 – 25
10 Transactions & Concurrency Control 26 – 31
11 SQL Commands 32 – 35
12 Indexing, SQL Optimisation, Sharding 36 – 39
13 SQL Queries Practice 40 – 43
14 MCQs & Answer Key 43 – 46
Important topics
2
DBMS Introduction
[email protected]
1. What is a Database and a DBMS?
A Database Management System (DBMS) is software that enables users to create,
manipulate, and administer databases. Allows secure data storage and retrieval it quickly,
modify it, and add new data whenever needed.
Example: A Picture a college library: books are arranged shelf-by-shelf by subject, so you can
walk straight to the “C-Programming” rack instead of digging through every pile. The library
catalog system functions like a DBMS index; they make searching easy.
3
4. Popular Types of Databases with Easy Examples
[email protected]
Data stored in tables with
UPI transactions: Each
defined schemas, linked via
Relational Highly structured transfer is recorded
relationships (e.g., primary /
(SQL) tasks. atomically with debit/credit
foreign keys), ensuring ACID
as a single transaction.
properties.
Self Search:
Why might Flipkart use a key-value store for user sessions, while UPI requires a relational
database?
4
DBMS Architecture
[email protected]
1. What is the Architecture of DBMS?
The architecture of a database system describes where each software component resides
and how those components communicate. A clear understanding of this layout enables
informed decisions regarding performance, security, and scalability.
3. Distributed Databases
A distributed database stores segments of data on two or more geographically separated
servers, yet presents them as a single logical database.
Example: An educational trust operating campuses in Delhi, Mumbai, and Bengaluru maintains
each city’s attendance data on a local server to ensure rapid retrieval. The system
synchronizes records asynchronously (eventual consistency) or via distributed transactions
(strong consistency), depending on requirements.
4. CAP Theorem
In distributed systems, Partition Tolerance is mandatory; trade-off between C and A
Consistency (C) – Every node reflects identical data at any given moment.
Availability (A) – Every request receives a timely response, without guarantee that the data
are the most recent.
Partition Tolerance (P) – The system continues operating even when communication
between nodes is disrupted.
5
[email protected]
Two Properties
Scenario Explanation
being followed
Consistency +
Availability (e.g., Monetary transfers must never diverge, even
UPI-based mess-wallet blocking transactions if slight delays occur during network
payments during partitions to disturbances. Temporary unavailability may
prevent double- be tolerated.
spending).
Self Search:
What 2 properties among CAP are guaranteed in case of financial applications like
paytm/paypal, etc.?
6
Data Abstraction
[email protected]
1. What is Data Abstraction?
Data abstraction hides implementation details (e.g., storage formats, indexing) from users and
applications, exposing only relevant data structures and operations. By separating how data
is stored from what the data represents, a DBMS improves security, flexibility, and
maintainability.
Key Point: Each higher level is insulated from changes made at the level beneath it.
3. Data Independence
7
4. Essential Takeaways
• Physical (lowest) → Logical → View (highest abstraction) hierarchy; knowledge flows
[email protected]
upward, while changes should ideally flow downward.
• Achieving full data independence is challenging but even partial success greatly reduces
maintenance costs.
• In daily student life, the view level is what mobile apps and portals expose, the logical level
is what developers design, and the physical level remains hidden in the server room.
Self Search:
Explore the evolution of Instagram’s logical and physical data independence from a photo
sharing platform to stories, reels, and DM’s.
8
Types of Data Models
[email protected]
1. What are Data Models in DBMS?
A data model supplies the formal grammar for describing how facts are organised inside a
database. Choosing the correct model influences performance, flexibility, and ease of
maintenance.
• Network Model
The network model expands the tree by
allowing a record to have multiple
parents, thereby supporting many-to-
many links.
Example: A Student can be linked to
several Clubs, while each Club links
back to many Students and multiple
Events. Retrieving every student
attending a coding marathon involves
traversing these linked sets.
9
• Relational Model
The relational model stores data in two-dimensional tables
[email protected]
and enforces relationships with primary and foreign keys.
Queries use structured query language (SQL).
Example: The college database maintains tables STUDENT
(roll No, name, programme), COURSE (code, title,
credits), and MARKS (roll No, code, grade). Joins enable
the examination section to publish marksheets.
• Object-Oriented Model
This model persists classes exactly as defined in an
object-oriented language, keeping both attributes and
methods.
Example: In a robotics research database, each Robot
object stores arrays of Sensor objects. When
technicians fetch a robot record, they can immediately
get the object associated with it.
10
ER Model
[email protected]
1. Purpose of the ER Model
The ER model is a conceptual-level blueprint (not implementation-specific) that defines
entities, attributes, and relationships, bridging real-world scenarios to database design.
2. Terminologies
Rectangles → Entity sets (strong entity in rectangle and weak entity in double rectangle).
Weak entities use double rectangles with a dashed underline for partial keys (e.g., bed
Number in Occupancy) and a double diamond for identifying relationships (e.g., Hostel Room
→ Occupancy).
attribute
Ellipses → Attributes (Double ellipse → multivalued attribute (e.g., phone numbers), Dashed
ellipse → derived attribute (e.g., age), Underlined name → primary key).
11
Lines → Connections, cardinality marks appear on these lines.
[email protected]
Cardinality is marked as:
• 1:1: |──| (e.g., Student ↔ Library Card).
• 1: N: |──< (e.g., Department → Students).
• M: N: >──< (e.g., Students ↔ Courses).
Cardinality
Definition Example
Symbol
One entity may relate to many One Instructor teaches many Courses, but each
1:N
on the other side. course has exactly one instructor for a semester.
Participation
Total: Every entity must participate (e.g., every Library Card must be assigned to a student).
Partial: Partial participation means not all entities in the entity set participate in the relationship
(e.g., not every student joins a Club).
5. Weak Entities
A weak entity lacks a primary key of its own and depends on a strong owner entity plus a
partial key.
Example:
OWNER entity set: Hostel Room (room No).
WEAK entity set: Occupancy uses dashed underline for partial key (bed Number) and double
diamond for identifying relationship with Hostel Room.
Associated occupancy records should be automatically deleted when the owner entity is
deleted (cascading delete).
12
7. Specialization
Specialization divides a superclass into exclusive or overlapping subclasses. It may be total
[email protected]
(every entity must belong to a subclass) or partial (some entities may not belong to any subclass).
Example:
[STUDENT] [INSTRUCTOR]
\ /
[PERSON]
PERSON holds common attributes (name, address).
STUDENT has programme, roll No; INSTRUCTOR has department, salary.
Queries requiring all people (e.g., campus email list) consult only the PERSON, while
programme-specific queries use STUDENT alone.
- Generalization combines multiple lower-level entities into a higher-level entity by
collecting their common attributes (e.g., STUDENT and INSTRUCTOR into PERSON).
- Specialization pushes distinct features downward (“one becomes many”).
8. Aggregation
Aggregation means combining smaller parts into one big meaningful unit, just like making a
full project using different modules. In an ER diagram, aggregation is used when a relationship
itself needs to be connected to something else. We group the relationship + its entities and
treat them as one single unit.
Example:
Exam Cell wants to track which projects are submitted. But to submit a project, we need
Student, Project and a Faculty Guide. These 3 can be connected by a relationship:
WORKS_ON (Student, Project, Faculty Guide). Now, instead of linking all 3 separately to
Submission Status, we group them like this:
Wrap WORKS_ON and its entities in a dashed box labelled Project Submission, then link to
Submission Status.
13
i. User_Account
• Attributes: USER_ID (Primary Key), Firstname, Lastname, Credit_Card_No.
This is like your Netflix account. One account per user.
[email protected]
• Relationships:
- User_Account ─ (1: 5) ─ PROFILE with crow’s foot at PROFILE.
- Double line from PROFILE to HAS relationship.
ii. Profile
• Attribute: PROFILE_ID (Primary Key), PROFILE_NAME, CREATION_DATE
Each account can have up to 5 profiles (like for you, your dad, your sibling).
• Relationships:
- USER_ACCOUNT ─ (1: 5) ─ PROFILE (one account owns 1-5 profiles).
- Has preferences for GENRE (User Genre).
- Has QUEUE and Rental History with movies.
iii. Genre
• Attribute: GENRE_ID (Primary Key), GENRE_NAME
Types of movies (like action, comedy, horror, etc.).
• Relationships:
- PROFILE ─ (M: N) ─ GENRE through USER_PREFERENCE (associative entity with
weight/priority).
- Linked to MOVIE (Movie Genre).
iv. Movie
• Attributes: MOVIE_ID (Primary Key), MOVIENAME, YEAR, PRODUCER, AVERAGE
RATING, DURATION (minutes), STREAMING_QUALITY (HD/4K), LICENSE_EXPIRY (date)
Each movie has these details stored.
• Relationships:
- Appears in QUEUE and Rental History of a PROFILE.
- Linked with GENRE through Movie Genre.
- MOVIE ─ (M:N) ─ ACTORS via STARRED_BY (with ROLE attribute).
v. Queue (Relationship)
• Attribute: RANKING
A profile’s to-watch list with order (e.g., 1st movie to watch, 2nd…).
• Connects: PROFILE ↔ MOVIE (1 profile ↔ many movies).
vi. Rental History (Relationship)
• Attribute: RATING (1 to 5), WATCH_DATE, DEVICE_TYPE (mobile/TV)
What movies a profile has watched and rated.
• Connects: PROFILE ↔ MOVIE.
vii. Actors
• Attributes: ACTOR_ID (Primary Key), FIRSTNAME, LASTNAME
ACTOR ─ (1: N) ─ AWARD (optional participation).
• Connected to MOVIE with Starred_by.
viii. Starred by (Relationship)
• Attributes: CHARACTER_NAME (e.g., 'Tony Stark').
Shows which actors acted in which movies.
• Connects: ACTORS ↔ MOVIE.
14
Relational Model
[email protected]
1. Relational Model in DBMS
Every relational database-whether it runs your college ERP or the canteen app-follows the
same four-part “grammar.” If you understand these parts and the rules that protect them, you
can design tables that never lose marks, fees, or attendance records.
The whole table-many tuples The STUDENT table that contains every
Relation
under the same set of columns. student’s detail this year.
15
Sample Relation:
[email protected]
22104567 Parikshit Sharma BTech CSE 9876543210
This table represents the STUDENT relation in the database. Each row is a tuple, and each
column is an attribute with its own domain.
4. Integrity Constraints
• Key Constraint
Rule: Every table must have at least one candidate key — a column (or set of columns)
whose values uniquely identify each row and never repeat.
Example: Roll No is unique for every student; if your script tries to insert the same roll
number twice, the DBMS blocks it. Keys can be single-column (RollNo) or composite
(StudentID, CourseID).
• Entity Integrity
Rule: A primary-key column cannot contain NULL values.
Example: You cannot add a student record with an empty roll number; otherwise, nobody
could tell who that row belongs to.
• Referential Integrity
Rule: A foreign key must match an existing primary key value in the referenced table. If
NULL is allowed, it means the relationship is optional.
Example: The MARKS table has a foreign key column, Roll No, which references the
primary key (Roll No) in the STUDENT table. If you upload marks for Roll No 22109999 but
that student does not exist in STUDENT, the DBMS rejects the upload to maintain
referential integrity.
16
Types of Keys
[email protected]
1. What are keys?
A key is a set of one or more columns that lets the DBMS (and you) pick out a single, exact
row from a table. Without keys, duplicate data would creep in and your queries would return
the wrong student, the wrong mark, or even both.
Candidate A minimal super key: remove any { rollNo } or { email }. Both stand alone
Key column and it stops being unique. and are minimal.
The single candidate key the DBA College chooses rollNo as the primary
Primary Key
chooses as the official row identifier. key for STUDENT.
Mind Map
Remember: Every primary key is a candidate key, every candidate key is a super key, but not
the other way round.
17
3. Keys Examples in Detail
• Primary Key
[email protected]
Roll No (PK) name programme email
22104567 Ananya Sharma BTech CSE [email protected]
Example
Table: STUDENT
Roll No never repeats and is never NULL, so the DBMS can efficiently identify a unique
student.
• Composite Key
Roll No (FK) Course Code (FK) grade
22104567 CS101 8.5
Example
Table: MARKS
Roll No alone is not unique (a student has marks in many subjects).
Course Code alone is not unique (many students take CS101).
Together they are unique, forming a composite key.
• Foreign Key
If someone tries to insert roll No = 22109999 in MARKS but that Roll No is missing from
STUDENT, the foreign-key check fails and the insert is refused.
18
Normalisation
[email protected]
1. What is normalisation?
Normalisation decomposes large, unstructured tables into smaller, logically related tables to
minimize redundancy and dependency. This removes update anomalies, improves
consistency, and ensures data integrity.
Type of
Mathematical Relation Example
Dependency
19
3. Four Key Normal Forms
[email protected]
1NF enforces atomic values: Each cell
Every cell holds one atomic
contains one indivisible datum (e.g., no
1 NF (indivisible) value; no
lists/arrays). Store each course in a separate
repeating groups.
row, not “CS101, MA102” in one cell.
2 NF and no transitive FD (A
In a COURSE table, course Code → instructor
→ B → C). Every non-prime
3 NF ID and instructor ID → instructor Name create
attribute must depend only
a chain, move instructor details out.
on a super key.
Problems: multi-valued cells, hard to insert a new course, risky to update a single grade.
Ananya’s, Farah’s row stores multiple values in single cells-clearly not atomic.
20
• First Normal Form (1 NF)
Rule: every cell holds one value. Split repeating lists into individual rows.
[email protected]
Roll No Student Name programme Course Code grade
22104567 Ananya Sharma BTech CSE CS101 8.5
22104567 Ananya Sharma BTech CSE MA102 7.0
22104612 Rajat Mehta BTech ECE EC101 8.0
22104755 Farah Khan MBA MG201 7.5
22104755 Farah Khan MBA MG202 8.2
22104755 Farah Khan MBA MG203 7.9
Problem: Still redundant, the same student data repeats on every course row.
• Find the FDs
FD1: roll No → student Name, programme.
FD2: course Code → course Name, credits, instructor ID.
FD3: instructor ID → instructor Name.
FD4: (roll No, course Code) → grade.
Rule: remove columns that depend on only part of a composite key (roll No, course Code)
and create separate tables.
STUDENT
COURSE
21
INSTRUCTOR
[email protected]
I01 Dr Rao
I03 Dr Khan
I04 Dr Sen
RESULT
After creating the INSTRUCTOR table, every non-key column in each table depends on
the whole primary key.
INSTRUCTOR
I01 Dr Rao
I03 Dr Khan
22
Course Code Course Name credits Instructor ID
[email protected]
MA102 Calculus I 3 I02
23
Denormalisation
[email protected]
1. What is Denormalisation?
Denormalization intentionally introduces controlled redundancy by:
1. Combining normalized tables vertically (fewer tables)
2. Adding derived columns horizontally (pre-computed values)
3. Creating materialized views (persisted query results)
Always maintains normalized source of truth.
You accept some data repetition so that common queries avoid expensive joins.
Common motives: speed, simplified code, analytics convenience, or working around slow
networks.
• Pros of Denormalization
Fewer joins → faster SELECTs.
Simpler queries for BI dashboards.
Lower CPU cost on overloaded DB servers.
• Cons
Repeated data eats storage.
Extra columns can go stale (update anomalies).
More triggers or batch jobs are needed to keep copies in sync.
Rule of thumb: denormalise only after measuring that joins are your real bottleneck, not before.
24
4. Example - From BCNF to a Denormalised “Result View”
Recall the BCNF design from the previous class:
[email protected]
Table Key Columns
STUDENT Roll No PK
INSTRUCTOR Instructor ID PK
Daily requirement: the exam branch must list roll No, name, programme, course Name,
credits, instructor Name, grade for 5,000 students every hour during result season.
Ananya Intro to
22104567 BTech CSE CS101 4 Dr Rao 8.5
Sharma CS
… … … … … … … …
Redundant fields: student Name, programme, course Name, credits, instructor Name
Result: Hourly report now runs as a single-table scan.
Another Mini Example - Pre-computed Mess Balance
Normalised set
SWIPE (student ID, date, amount) - thousands of rows per month
STUDENT (student ID, name, programme)
Denormalized helper table:
MESS_BALANCE (student ID PK, month, total amount)
Result: The kiosk app reads MESS_BALANCE efficiently instead of summing every swipe in
real time.
25
Transactions & Concurrency Control
[email protected]
1. What exactly is a transaction?
A transaction is a small, complete job that the database treats as one package.
Think of it like sending a registered parcel: the courier either delivers the whole parcel or brings
it back; they never deliver just part of it.
For eg. When a teacher presses “Publish Result” the system must do three things:
i. Insert the new grade for every subject.
ii. Re-calculate each student’s CGPA.
iii. Set a flag that says “Result Published = YES”.
These three steps belong together. If power fails after step 1, steps 2 and 3 never run, so
the grade and CGPA would clash. The database therefore cancels step 1 as well, leaving
everything exactly as it was.
26
CONS:
• Extra overhead - logging + locking can slow heavy traffic.
• Complex to build - code & storage engine get bigger.
[email protected]
• Scalability limits - strict ACID across many servers is tough; needs fancy tech (e.g.,
distributed consensus).
3. Life-cycle of a transaction
ACTIVE - SQL statements are running.
PARTIALLY COMMITTED - last statement
finished; DB is flushing log records.
COMMITTED - success message sent to
user; job is permanent.
ABORTED - an error happened; DB
UNDOs every change and returns to the
original state.
27
• Strict schedule (gold standard)
No one may read or write a value changed by T1 until T1 has committed or aborted.
Guarantees both no dirty reads and no dirty writes; most real systems get this via Strict
[email protected]
Two-Phase Locking (S2PL).
• Non-recoverable schedule – why you should avoid it
T2 reads T1’s uncommitted value and then commits before T1.
If T1 aborts later, the database is stuck with bad data from T2; no clean rollback path.
Exam keyword: “dirty commit.”
“Serializable decides order, Recoverability decides commit timing, Strict says ‘hands off my
data till I’m done.’”
Example:
-- Set per-transaction isolation
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE grades SET score = 85 WHERE student_id = 101; COMMIT;
Stronger isolation ⇒ more safety but extra locking and slower writes.
Data always stays correct, but Only one student is served at a time to
Higher level (e.g.,
more waiting because locks avoid data conflicts, though this slows
Serializable)
are held longer down processing.
Advantages of Isolation
• More classmates can work together: fewer “table occupied” moments in the DB library.
• Choose your consistency: critical apps get stricter rules; casual ones get speed.
• Fewer weird errors: helps avoid dirty/non-repeatable/phantom reads.
• Flexible tuning: you can set the level per transaction if your DBMS allows.
Disadvantages of Isolation
• Extra overhead: DB wastes CPU time checking locks/versions.
• Lower concurrency at high levels: Serializable may line students up single-file.
• Not universal: Some cloud DBs skip Repeatable Read, so your code becomes less portable.
• Mental load: Developers must manage isolation levels per module carefully; otherwise,
inconsistencies may occur.
28
Isolation Problems
[email protected]
T1 changes fees but hasn’t
Senior updates Mess Bill = ₹0 (just
committed → T2 reads it. If
Dirty read testing). You peek before he presses
T1 rolls back, T2 sees fake
Undo. You think owe nothing and leave.
data.
6. Serializability
Serializability is a concept that ensures the results of executing multiple transactions
concurrently are the same as if they were executed one after the other, in some order. It helps
in achieving a consistent database state, even when transactions overlap in time.
Conflict-serializability quick test
• Draw a node for each transaction (T1, T2, …).
• Draw an arrow Tᵢ → Tⱼ when Tᵢ writes a row and Tⱼ later reads or writes the same row.
• If the graph has no loops, the schedule is safe.
Example
T1 writes Ananya’s grade.
T2 later reads that grade.
If the graph has an arrow from T1 to T2 and no path from T2 back to T1, there is no loop, so
the schedule is safe.
[email protected]
Shared (S) lock – multiple users can read, but no one can write.
(X) lock – only one transaction can both read and write; others are blocked.
• Timestamp ordering
Every transaction gets the current timestamp when it starts.
If two transactions fight for the same row, the older timestamp wins; the newer one waits
or rolls back. This works well when the system does many reads and only few writes, e.g.,
when students keep refreshing RESULT_FLAT while teachers rarely update grades.
• Optimistic control
Assume clashes are rare.
Transactions read without locks; at commit, the DB checks for conflicts. If a clash is found,
one transaction rolls back and restarts. Suitable for read-heavy workloads like statistical
reports, where data changes are infrequent.
30
Techniques for Database Recovery
These methods help bring the database back online with minimal data loss:
[email protected]
• Backup and Restore: Taking regular copies of the database.
Example: The college IT department takes daily backups of all student attendance records.
If the live system crashes, they can load yesterday's data.
31
SQL Commands
[email protected]
1. SQL Command Families
Original
Family Enhanced Definition Key Commands Added
Definition
Defines/modifies
DDL Shape the TRUNCATE, RENAME,
database schema with
(Data Definition) tables COMMENT
transactional safety
32
Operators (work row-by-row)
Added
Category Original Operators NULL Handling
[email protected]
Operators
<=> (NULL-safe
Comparison = <> > < equal), IS WHERE grade IS NOT NULL
DISTINCT FROM
Sample queries
Add bonus marks to a score
SELECT score + 5 AS new_score FROM EXAM WHERE rollNo = 22104567;
Find students who scored above 90 and are in the ‘CSE’ batch
SELECT rollNo, name FROM EXAM WHERE score > 90 AND programme = 'CSE';
Aggregate functions (work on many rows at once)
SUM(col) Total of a column What is the total mess bill this month?
Sample queries
Count registered students
SELECT COUNT (*) AS total_students FROM STUDENT;
Total sales in the canteen
SELECT SUM(amount) AS total_sales FROM CANTEEN_SALES WHERE saleDate >=
'2025-05-01' AND saleDate <= '2025-05-31'
Average CGPA for the IT department
SELECT AVG(CGPA) AS avg_IT_cgpa FROM STUDENT WHERE programme = 'IT';
Highest and lowest hostel fees paid
SELECT MAX(totalFee) AS max_fee, MIN(totalFee) AS min_fee FROM
HOSTEL_ACCOUNTS;
Use row-level operators when you need to filter rows (e.g., WHERE salary > 5000), perform
calculations inside a row (e.g., price × quantity AS lineTotal), or combine conditions (AND / OR).
Use aggregate functions when you need one summary number such as total students,
average marks by course (GROUP BY course Code), or the highest CGPA in the college.
33
3. SQL Clauses (the “sentence order”)
SELECT column_list
[email protected]
FROM table_name
WHERE filter_condition -- row filter
GROUP BY grouping_columns -- group rows
by values HAVING group_filter -- filter groups after grouping
ORDER BY column_list [ASC|DESC] -- sort results
LIMIT count OFFSET start -- limit output (MySQL/PostgreSQL)
Example: Average grade per course, only if avg ≥ 8, top 5 hardest first
SELECT courseCode, AVG(grade) AS avgGrade
FROM MARKS
WHERE semester = 6
GROUP BY courseCode
HAVING AVG(grade) >= 8
ORDER BY avgGrade ASC
LIMIT 5;
5. UNION in SQL
UNION combines two SELECTs with the same column list and removes duplicates (UNION
ALL keeps duplicates).
Example:
-- Girls or MBA students
SELECT rollNo FROM STUDENT WHERE gender = 'F'
UNION
SELECT rollNo FROM STUDENT WHERE programme = 'MBA';
34
6. Views in SQL
A view acts like a virtual table whose content comes from a query.
[email protected]
Example:
CREATE VIEW V_TOPPERS
AS
SELECT rollNo, name, CGPA
FROM STUDENT
WHERE CGPA >= 9.0;
35
Indexing, SQL Optimisation, Sharding
[email protected]
1. Need for optimising SQL queries
Slow queries make the whole site feel “laggy”. Good optimisation means pages open faster,
less load on the server and smaller cloud bills.
Select only needed The merit-list page needs roll No, name,
Avoid SELECT *
columns CGPA; it does not need mobile numbers.
Put the WHERE clause First filter to ‘students in CSE 2023 batch’,
Filter early, join later
before big joins then join with placements table.
Use indexes in Put indexed columns inside WHERE roll No = 22104567 uses PK index;
predicates WHERE or JOIN ON WHERE LOWER (name) cannot.
Examine the execution Let the DB tell you the slow If EXPLAIN shows “Seq Scan”, think about
plan step (EXPLAIN) adding or fixing an index.
4. Index types
36
5. How B-trees and B+-trees store an index
• B-tree: Keys in internal nodes guide searches; leaves contain keys and row pointers.
• B+-tree (used by most DBMSs): All keys are copied to leaves, and leaves are linked for
[email protected]
range scans. Leaves are linked left-to-right, so range scans (e.g., CGPA BETWEEN 8 AND
9, inclusive) are very fast.
Think of it like a library index: internal nodes act as guides, pointing to the exact drawer (leaf
node) containing the record.
B-tree: B+-tree:
[Internal] [Internal]
/ | \ / | \
[Leaf] [Leaf] [Leaf] [Leaf]→[Leaf]→[Leaf] (linked)
7. Sharding
Sharding divides a large database into smaller, independent pieces called shards, each
hosted separately. This boosts performance, scalability, and availability.
A university chain keeps Delhi campus records on Server-A, Mumbai campus on Server-
B. Student look-ups stay local; only a cross-campus query touches both servers.
Key Components
• Shards: Independent partitions of data (e.g., each department maintains its own student records).
• Shard Key: A column that determines where data goes. (e.g., student's Branch determines
their department's database.)
• Shared-Nothing Architecture: Each shard operates independently.
Sharding Methods
• Range-Based Sharding: Divides data by value ranges.
Example: Student Roll No. 1-1000 → Shard 1.
Pros: Simple. Cons: Can lead to uneven data.
• Hashed Sharding: Uses a formula on the shard key for even distribution.
Example: A hash function applied to the Admission ID determines the target server,
ensuring uniform data distribution.
Pros: Even data spread.
Cons: Requires rehashing (and data movement) when adding/removing shards, though
consistent hashing minimizes this impact.
• Directory Sharding: Uses a lookup table to map keys to shards.
Example: Course ID maps to specific department server.
Pros: Flexible.
Cons: This method may fail if the lookup table contains incorrect or outdated mappings.
37
• Geo Sharding: Partitions data by location.
Example: Delhi Campus students → Delhi Server for low-latency access, with failover to
Mumbai if needed.
[email protected]
Pros: Geo sharding provides faster access by routing users to geographically closer
servers, but may result in uneven data distribution if user density varies by location.
Optimizing Sharding
• Cardinality: More unique shard key values allow more shards.
• Frequency: Avoid popular shard keys to prevent overloaded shards ("hotspots").
• Monotonic Change: Avoid monotonic keys (e.g., sequential timestamps) as they can lead
to unbalanced shard distribution and write hotspots.
Poor Sharding Example: Sharding student feedback by "courses completed" (e.g., 11+
courses → Shard C) can overload Shard C as students progress.
Sharding is powerful for large datasets. Choosing an appropriate shard key is crucial to ensure
performance and balanced data distribution.
38
• Dynamic Data Masking (DDM): Masks data on-the-fly as it's accessed, without changing
the original data.
Example: A college helpdesk representative sees only the last four digits of a student's
[email protected]
bank account number when accessing their fee payment records.
• Deterministic Data Masking: Replaces the same original value with the same masked
value every time.
Example: "[email protected]" is always masked as "[email protected]"
across all test systems.
• Non-Deterministic Data Masking: Replaces the same original value with a different
masked value each time.
Example: A student's parent's contact number might be masked differently each time it's
pulled for various internal reports, making it harder to link.
Note: Avoid non-deterministic masking for columns referenced in foreign keys to maintain
relational integrity.
• Format-Preserving Masking: Masks data while keeping its original format.
Example: A student's university ID (e.g., AB1234567) is masked in a way that retains the
two-letter prefix and the seven-digit format
• Shuffling: Rearranges existing data within a column.
Example: Student exam scores are shuffled across different students in a research
dataset, making scores realistic but delinked from individuals.
• Redaction: Irreversibly removes or blacks out sensitive information from view, unlike
masking which keeps structure.
Example: Student's medical history details are completely blacked out before a general
academic advisor views their file.
• Nulling Out: Replaces sensitive data with SQL NULL values (i.e., empty or missing data).
Example: For a survey on student satisfaction, student names and addresses are removed
by replacing them with NULL values before sharing the data with external analysts.
Data masking is vital for protecting sensitive information. By using these techniques,
colleges can ensure data usability in non-production environments while maintaining
privacy and security.
39
SQL Queries Practice
[email protected]
1. SQL Queries Practice for Interviews
S. No Explanation Query
2 Return first names in UPPER-case. SQL: SELECT UPPER (given_name) FROM staff;
List every distinct division SQL: SELECT DISTINCT division FROM staff;
3
represented in the staff table.
Find where the letter B occurs in the SQL: SELECT INSTR(given_name, 'B') FROM staff
5
name Siddarth. WHERE given_name = 'Siddharth';
Trim spaces on the right of first SQL: SELECT RTRIM (given_name) FROM staff;
6
names.
Trim spaces on the left of division SQL: SELECT LTRIM (division) FROM staff;
7
names.
Show each unique division together SQL: SELECT division, LENGTH(division) FROM
8
with its length. staff GROUP BY division;
Display all staff details, ordered by SQL: SELECT * FROM staff ORDER BY
11
first name A→Z. given_name ASC;
Display all staff, ordered by first SQL: SELECT * FROM staff ORDER BY
12
name A→Z and division Z→A. given_name ASC, division DESC;
Pull the records for employees SQL: SELECT * FROM staff WHERE given_name IN
13
named Arjun or Rohan. ('Arjun', 'Rohan');
Return everyone except Arjun and SQL: SELECT * FROM staff WHERE given_name
14
Rohan. NOT IN ('Arjun', 'Rohan');
Show workers whose division starts SQL: SELECT * FROM staff WHERE division LIKE
15
with “Finan”. 'Finan%';
Get staff whose first name contains SQL: SELECT * FROM staff WHERE
16
the letter a anywhere. LOWER(given_name) LIKE '%a%';
40
Fetch staff whose first name ends SQL: SELECT * FROM staff WHERE
17
with a. LOWER(given_name) LIKE '%a';
[email protected]
Fetch staff whose six-letter first SQL: SELECT * FROM staff WHERE given_name
18
name ends with h. LIKE '_____h'; AND LENGTH(given_name) = 6;
Show employees with pay between SQL: SELECT * FROM staff WHERE pay
19
100 000 and 500 000 (inclusive). BETWEEN 100000 AND 500000;
List staff who joined in February SQL: SELECT * FROM staff WHERE
20 2014. YEAR(hire_date) = 2014 AND MONTH(hire_date) =
2;
Count how many people work in the SQL: SELECT division, COUNT(*) FROM staff
21
Finance division. WHERE division = 'Finance';
Return full names of staff earning 50 SQL: SELECT CONCAT(given_name, ' ',
22 000 – 100 000. family_name) FROM staff WHERE pay BETWEEN
50000 AND 100000;
Show each division with its head- SQL: SELECT division, COUNT(staff_id) AS
23 count, highest first. total_staff FROM staff GROUP BY division ORDER
BY total_staff DESC;
List employees who also appear as SQL: SELECT * FROM staff AS s INNER JOIN role
24 Supervisor in the role table. AS r ON s.staff_id = r.staff_ref_id WHERE
r.role_name = 'Supervisor';
Find role titles held by more than one SQL: SELECT role_name, COUNT(*) AS cnt FROM
25
person. role GROUP BY role_name HAVING cnt > 1;
Return only rows with odd staff_ids. SQL: SELECT * FROM staff WHERE MOD(staff_id,
26
2) != 0;
Return only rows with even staff_ids. SQL: SELECT * FROM staff WHERE
27
MOD(staff_id,2) = 0;
Clone the entire staff table into SQL: CREATE TABLE staff_copy LIKE staff;
28
staff_copy. INSERT INTO staff_copy SELECT * FROM staff;
Show rows common to both staff SQL: SELECT s.* FROM staff s INNER JOIN
29
and its copy (by id). staff_copy USING (staff_id);
Show staff records that are not in SQL: SELECT s.* FROM staff s LEFT JOIN
30 the copy. staff_copy USING (staff_id) WHERE
staff_copy.staff_id IS NULL;
Display today’s date and current SQL: SELECT CURDATE(); SELECT NOW();
31
timestamp.
Fetch the top 5 highest-paid SQL: SELECT * FROM staff ORDER BY pay DESC
32
employees. LIMIT 5;
Get the 5th-highest salary using SQL: SELECT * FROM staff ORDER BY pay DESC
33
LIMIT. LIMIT 1 OFFSET 4;
41
Find the 5th-highest salary without SQL: SELECT pay FROM staff s1 WHERE 4 =
LIMIT. (SELECT COUNT(DISTINCT s2.pay) FROM staff s2
34
WHERE s2.pay
[email protected]
>= s1.pay);
List employees who share the same SQL: SELECT s1.* FROM staff s1 JOIN staff s2 ON
35
salary with someone else. s1.pay = s2.pay AND s1.staff_id <> s2.staff_id;
Return the second-highest salary SQL: SELECT MAX (pay) FROM staff WHERE pay
36
overall. NOT IN (SELECT MAX (pay) FROM staff);
Show every row twice (duplicate SQL: SELECT * FROM (SELECT * FROM staff
37 output). UNION ALL SELECT * FROM staff) AS dup ORDER
BY staff_id;
List staff IDs that do not receive SQL: SELECT staff_id FROM staff WHERE staff_id
38
bonuses. NOT IN (SELECT staff_ref_id FROM incentive);
Select the first 50 % of rows (by id SQL: SELECT * FROM staff ORDER BY staff_id
39
order). LIMIT (SELECT COUNT(*) / 2 FROM staff);
Return divisions that have fewer SQL: SELECT division, COUNT (*) AS headcount
40 than four employees. FROM staff GROUP BY division HAVING
headcount<4;
Show every division with its SQL: SELECT division, COUNT(*) AS headcount
41
employee count. FROM staff GROUP BY division;
Fetch the very last row in the table SQL: SELECT * FROM staff WHERE staff_id =
42
(largest id). (SELECT MAX (staff_id) FROM staff);
Fetch the very first row in the table SQL: SELECT * FROM staff WHERE staff_id =
43
(smallest id). (SELECT MIN (staff_id) FROM staff);
Display the last five rows, ordered SQL: (SELECT * FROM staff ORDER BY staff_id
44
naturally. DESC LIMIT 5) ORDER BY staff_id;
For each division, list the SQL: SELECT s.division, s.given_name, s.pay
employee(s) with the top salary. FROM (SELECT division, MAX(pay) AS top_pay
45
FROM staff GROUP BY division) t JOIN staff s ON
t.division = s.division AND t.top_pay = s.pay;
Pull the three highest distinct SQL: SELECT DISTINCT pay FROM staff s1
salaries via a correlated sub-query. WHERE 3 >= (SELECT COUNT(DISTINCT pay)
46
FROM staff s2 WHERE s2.pay
> s1.pay) ORDER BY pay DESC;
Pull the three lowest distinct salaries SQL: SELECT DISTINCT pay FROM staff s1
via a correlated sub-query. WHERE 3 >= (SELECT COUNT(DISTINCT pay)
47
FROM staff s2 WHERE s2.pay
< s1.pay) ORDER BY pay;
42
Generic formula for the n-th highest SQL: SELECT DISTINCT pay FROM staff s1
salary. WHERE :n >= (SELECT COUNT(DISTINCT pay)
48
FROM staff s2 WHERE s1.pay<=s2.pay) ORDER
[email protected]
BY s1.pay DESC;
Show total payroll cost per division, SQL: SELECT division, SUM (pay) AS total_pay
49 highest first. FROM staff GROUP BY division ORDER BY
total_pay DESC;
MCQs:
1. Which benefit of a DBMS is 4. In a three-tier DBMS architecture,
illustrated when a fee clerk can view which component issues SQL
only the fee status, whereas the queries to the database server?
principal can also view marks? A. Mobile client app
A. Growth (Scalability) B. Application (middle) server
B. Security C. Web browser’s JavaScript runtime
C. Speed (Indexing) D. Load balancer
D. No Extra Copies
5. A personal inventory system built
2. Flipkart stores every customer’s entirely in Microsoft Access on a
shopping cart in a key-value single laptop exemplifies which
database primarily because key- architectural style?
value systems offer what A. Two-tier
advantage? B. Three-tier
A. Table-level referential integrity C. One-tier
B. Ultra-fast look-ups via a simple key- D. Distributed
value structure
6. According to the CAP-theorem
C. Built-in ACID transactions for
examples, UPI wallet payments
complex joins
favor which two guarantees during
D. Time-series data for analytics
a network partition?
3. The library's subject-based shelf A. Availability + Consistency
organization mirrors how DBMS B. Consistency + Partition Tolerance
______ accelerate searches by C. Availability + Partition Tolerance
maintaining sorted references to D. Durability + Consistency
data locations.
7. The hostel-warden portal, which
A. Full-table scans
displays only Roll No., Name, and
B. Clustered indexes (physically
Hostel Fee Status, operates at what
reorder data)
level of abstraction?
C. Non-clustered indexes (separate
A. Physical level
sorted references)
B. Logical level
D. Transaction logs
C. View level
D. File-system level
43
8. Which scenario best illustrates 13. A weak entity in an ER model MUST
physical data independence? have:
A. Splitting an Address column into A. A many-to-many relationship
[email protected]
Street, City, and Pin Code without B. Total participation in its identifying
updating the application code relationship
B. Moving attendance files from HDDs to C. Double-lined relationship diamond
faster SSDs and adding a B-tree D. Both B and C
index, with no change to SQL queries 14. A relationship where one A relates
C. Creating a new Faculty table linked to many B’s, and each B to exactly
to Students by Roll No. one A is classified as:
D. Granting the examination branch A. M : N
access to marks via an additional view B. 1 : N
C. 1 : 1
9. Altering the logical schema (e.g.,
D. M : 1
breaking address into three
columns) without rewriting other 15. Treating a relationship set and its
modules demonstrates: participating entities as one higher-
A. Physical independence level unit so it can participate in
B. Logical independence another relationship is called:
C. No independence A. Specialization
D. View-level caching B. Generalization
C. Aggregation
10. Which data model organises D. Composition
records strictly in a top-down
16. In the relational model, a single
parent-child tree where each child
column whose values come from
has exactly one parent?
one domain is called a(n):
A. Object-Oriented Model
A. Domain
B. Hierarchical Model
B. Attribute
C. Network Model
C. Tuple
D. Entity–Relationship Model
D. Relation
11. A student belonging to several 17. Which statement correctly
clubs, with each club linked back to distinguishes a schema from an
many students, illustrates the need instance?
for a specific data model. A. Schema changes with every
A. Relational insert/delete; instance is fixed
B. Hierarchical B. Schema is the permanent structure;
C. Network instance is the current rows
D. Time-Series C. Schema lists current primary-key
values; instance lists data types
12. During conceptual design,
D. Schema enforces referential integrity;
engineers draw entities like Student
instance enforces entity integrity
and Course connected by
ENROLLED_IN. Which data model 18. The integrity rule that forbids NULL
is being used? values in a primary-key column is
A. Entity–Relationship (ER) called:
B. Object-Oriented A. Key Constraint
C. Relational B. Referential Integrity
D. Key-Value C. Entity Integrity
D. Domain Constraint
44
19. A candidate key is defined as: 25. Denormalisation refers to:
A. Any unique attribute set, even with A. Splitting a wide table into smaller
extras ones
[email protected]
B. A minimal super key—remove one B. Combining normalized tables or
attribute, and uniqueness breaks including redundant data to reduce
C. The chosen official identifier of the costly joins
table C. Encrypting data so only DBAs can
D. A foreign-key reference to another view it
table D. Applying BCNF to all relations
20. Which rule is mandatory for a 26. Denormalisation is justified when
primary-key column (or set)? the workload is mainly:
A. Must reference another relation’s
A. Heavy on writes, light on reads
key
B. Write-once, read-never
B. Must allow NULLs for optional rows
C. Read-intensive with few updates
C. Must be unique and not NULL
D. CPU-bound on triggers
D. Must include more than one attribute
27. The chief drawback of
21. Which statement is true of super
keys, candidate keys, and primary denormalising a schema is:
keys? A. Higher risk of update anomalies due
A. Every composite key is a candidate to duplicated facts
key B. Loss of primary-key enforcement
B. Every candidate key is also a super C. Mandatory distributed transactions
key for every write
C. Every candidate key is minimal by D. Loss of the lossless-join property
definition among the remaining tables
D. A primary key may contain
28. Durability is achieved through:
duplicates if no foreign key uses it
A. Write-ahead logging (WAL)
22. Which schema violates 2NF? B. Two-phase locking
A. STUDENT(rollNo PK, name) C. Snapshot isolation
B. MARKS(rollNo PK, courseCode D. All of the above
PK, grade, studentName)
C. COURSE(code PK, title, credits) 29. The isolation level that blocks dirty
D. All comply reads but still allows non-
repeatable reads is:
23. In BCNF, for every functional A. Read Uncommitted
dependency X → Y, X must be: B. Read Committed
A. A non-minimal super key C. Repeatable Read
B. A foreign key D. Serializable
C. A candidate key
D. A subset of Y 30. Strict 2PL improves basic 2PL by:
A. Releasing locks immediately after
24. The ability to reconstruct the
use
original relation without data loss
B. Holding all locks until commit/abort
after decomposition is called the
C. Using timestamp ordering
_______ property.
A. Dependency preservation D. Eliminating the shrinking phase
B. Lossless (non-additive) join
C. Partial-dependence removal
D. Referential integrity
45
31. The SQL family used to grant or 34. Which index type physically orders
revoke privileges is: table rows by the key itself?
A. DML A. Secondary (non-clustered)
[email protected]
B. DDL B. Primary (clustered)
C. DCL C. Bitmap
D. TCL D. Hash
35. Adding more machines and
32. After grouping rows, which clause
partitioning data across them to
filters entire groups based on
handle the load is called:
aggregates?
A. Vertical scaling (scale-up)
A. WHERE
B. Horizontal scaling (scale-out)
B. HAVING
C. Query parallelism
C. ORDER BY
D. Snapshot isolation
D. LIMIT
36. In RBAC, permissions are first
33. Which aggregate counts all rows,
attached to:
including those with NULLs?
A. Individual users
A. COUNT (column_name)
B. Database triggers
B. COUNT (*)
C. Roles
C. AVG (column_name)
D. Stored procedures
D. SUM (column_name)
ANSWER KEY
1. (B) 2. (B) 3. (C) 4. (B) 5. (C)
6. (B) 7. (C) 8. (B) 9. (B) 10. (B)
11. (C) 12. (A) 13. (D) 14. (B) 15. (C)
16. (B) 17. (A) 18. (C) 19. (B) 20. (C)
21. (B) 22. (B) 23. (C) 24. (B) 25. (B)
26. (C) 27. (A) 28. (A) 29. (B) 30. (B)
31. (C) 32. (B) 33. (B) 34. (B) 35. (B)
36. (C)
46