📍 Section 1: The Structured Query Language (SQL)
1.1 Introduction to SQL
● SQL (Structured Query Language) is used to communicate with relational databases.
● Declarative language: you state what data you want, not how to get it.
● Example: SELECT name FROM students;
1.2 Basic SQL Operations
● SELECT, FROM, WHERE – retrieving data
● INSERT INTO students VALUES (1, 'John');
● UPDATE students SET name='Mike' WHERE id=1;
● DELETE FROM students WHERE id=1;
1.3 Filtering and Sorting
● WHERE age > 18
● ORDER BY name DESC
● LIMIT 5 OFFSET 10
1.4 Aggregation & Grouping
● SELECT COUNT(*) FROM students;
● GROUP BY class HAVING AVG(score) > 80
1.5 Joins in SQL
● INNER JOIN: only matching rows
● LEFT JOIN: all left + matched right
● Example:
SELECT s.name, m.marks FROM students s
JOIN marks m ON s.id = m.student_id;
1.6 Subqueries and Nested Queries
● Scalar:
SELECT name FROM students WHERE age = (SELECT MAX(age) FROM students);
● Correlated: executed per row
1.7 Set Operations
● UNION, INTERSECT, EXCEPT – combine query results
1.8 Views and Indexes
● CREATE VIEW top_students AS SELECT * FROM students WHERE marks >
90;
● CREATE INDEX idx_name ON students(name);
1.9 Transactions in SQL
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;
📍 Section 2: Storing and Indexing Data
2.1 Storage Basics
● Data in databases is stored in blocks called pages.
● Each page contains multiple rows (tuples).
● Pages are transferred between disk and main memory using a buffer manager.
Example:
Imagine a page as a folder with 100 paper forms (tuples). When the folder is full, a new one is
created.
2.2 Indexing Techniques
● Index = data structure for fast retrieval
● B+ Tree: Balanced tree used in most RDBMS
○ Internal nodes store keys
○ Leaf nodes store actual data pointers
● Example B+ Tree for student IDs:
[20 | 40]
/ | \
[5 10] [21 35] [42 50]
● Hash Index: Uses hash function on key; fast for equality lookups
○ Student ID 105 hashed to bucket 2
2.3 Clustered vs. Non-clustered Index
● Clustered: Index decides the physical order of records
● Non-clustered: Stores pointers to actual data
Example:
● Clustered index on student_id: rows are sorted on student_id
● Non-clustered index on name: index holds name and points to rows
2.4 Index Maintenance
● Insert/Delete operations in B+ Trees may require rebalancing
● Hash collisions are handled using chaining or open addressing
2.5 Bitmap Indexes
● For categorical fields with few values
● Example: Gender Index
M: 1 0 1 0
F: 0 1 0 1
2.6 Compression and Storage Formats
● Row-based: All attributes of one record stored together
● Column-based: All values of one attribute stored together
● RLE (Run-Length Encoding):
○ Data: AAAAABBB → A5B3
○ Useful when same value repeats consecutively
2.7 File Organization
● Heap File: Unordered, best for inserts
● Sorted File: Sorted by key, good for range queries
● Hashed File: Good for equality searches
📍 Section 3: Relational Data Processing
3.1 Relational Algebra and Query Execution
● σ (selection): Filters rows — σ(age > 25)(Students)
● π (projection): Chooses columns — π(name, dept)(Students)
● ⨝ (join): Combines tables
Execution Plan: Logical query → Relational algebra → Physical plan → Execution
3.2 Query Optimization
● Goal: minimize I/O and computation cost
● Use indexes, reduce intermediate data
● Join order matters: smaller tables joined first reduce size early
3.3 Join Algorithms
● Nested Loop Join:
for each row in R:
for each row in S:
if match: output
● Hash Join:
○ Build hash table on smaller input
○ Probe for matches in second input
● Sort-Merge Join:
○ Sort both inputs, then merge linearly
3.4 Buffer Management
● Manages which pages are kept in memory
● Replacement Policy: LRU (Least Recently Used)
● Pinning: lock critical pages in buffer
3.5 Parallel and Distributed Processing
● Partitioned Parallelism: Divide input data
● Pipelined Parallelism: Output of one operation fed into next
● Example: Big SQL query run across 4 servers, each handling a data slice
📍 Section 4: Transaction Processing
4.1 ACID Properties in Detail
● Atomicity: transaction is all-or-nothing
● Consistency: DB always moves from one valid state to another
● Isolation: concurrent transactions don’t interfere
● Durability: changes persist despite crashes
Example: Transferring ₹100 between two accounts — either both debit and credit happen or
neither
4.2 Concurrency Control Techniques
● Two-Phase Locking (2PL):
○ Growing phase: locks acquired
○ Shrinking phase: locks released
● Prevents non-serializable execution
4.3 Isolation Levels (with anomalies)
Isolation Level Dirty Read Non-repeatable Read Phantom Read
Read Uncommitted ✅ ✅ ✅
Read Committed ❌ ✅ ✅
Repeatable Read ❌ ❌ ✅
Serializable ❌ ❌ ❌
4.4 Write-Ahead Logging (WAL)
● Log changes before applying them
● REDO: for committed transactions
● UNDO: for uncommitted ones
4.5 Recovery Techniques
● ARIES (Analysis, Redo, Undo)
● Checkpoints reduce log scanning during recovery
4.6 Serializability
● Conflict Serializability: schedule can be converted to serial form by reordering
● Precedence Graph: Nodes = transactions; edge = dependency
○ Cycle ⇒ not serializable
📍 Section 5: Database Design
5.1 ER Modeling Example
● Entities: Student, Course
● Attributes: name, id, title
● Relationships: takes (Student ↔ Course)
5.2 Mapping to Relational Schema
● Each entity → table
● Each relationship → table with foreign keys
5.3 Normalization
● 1NF: atomic values
● 2NF: no partial dependency
● 3NF: no transitive dependency
● BCNF: stronger than 3NF
Example:
● Unnormalized:
Student(id, name, courses)
● 1NF:
Student(id, name), Enroll(student_id, course_id)
5.4 Denormalization
● Speeds up reads at cost of redundancy
● Example: merge student and enrollment tables
5.5 Schema Design Tips
● Use natural primary keys where possible
● Enforce constraints using CHECK, NOT NULL, UNIQUE
● Example: CHECK (salary > 0)
📍 Section 6: Beyond Relational Data
6.1 NoSQL Models
● Key-Value: simple, fast (e.g., Redis)
● Document: nested data (e.g., MongoDB)
● Column-Family: large-scale writes (e.g., Cassandra)
● Graph: relationships (e.g., Neo4j)
6.2 CAP Theorem
● Can only guarantee 2 of:
○ Consistency: all nodes same data
○ Availability: always responds
○ Partition Tolerance: survives network splits
6.3 Use Case Mapping
Model Best Use Case
Key-Value Caching sessions
Document Flexible schemas
Columnar Analytics on time-series
Graph Social networks, fraud
6.4 NoSQL Query Examples
// MongoDB
{ "age": { "$gt": 25 } }
// Neo4j
MATCH (a)-[:FRIEND]->(b) RETURN b.name;
6.5 Scaling
● Horizontal: add servers (scale out)
● Vertical: upgrade server (scale up)
● Sharding: split by key ranges (user_id < 1000 → node1)
6.6 Consistency Models
● Strong Consistency: read = latest write (RDBMS)
● Eventual Consistency: updates converge over time (Amazon S3)
● Causal Consistency: preserve cause-effect order (some NoSQLs)