0% found this document useful (0 votes)
26 views12 pages

SQL Basics and Indexing

Uploaded by

nthumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views12 pages

SQL Basics and Indexing

Uploaded by

nthumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

📍 Section 1: The Structured Query Language (SQL)

1.1 Introduction to SQL

● SQL (Structured Query Language) is used to communicate with relational databases.


● Declarative language: you state what data you want, not how to get it.
● Example: SELECT name FROM students;

1.2 Basic SQL Operations

● SELECT, FROM, WHERE – retrieving data


● INSERT INTO students VALUES (1, 'John');
● UPDATE students SET name='Mike' WHERE id=1;
● DELETE FROM students WHERE id=1;

1.3 Filtering and Sorting

● WHERE age > 18


● ORDER BY name DESC
● LIMIT 5 OFFSET 10

1.4 Aggregation & Grouping

● SELECT COUNT(*) FROM students;


● GROUP BY class HAVING AVG(score) > 80

1.5 Joins in SQL

● INNER JOIN: only matching rows


● LEFT JOIN: all left + matched right
● Example:

SELECT s.name, m.marks FROM students s


JOIN marks m ON s.id = m.student_id;

1.6 Subqueries and Nested Queries

● Scalar:

SELECT name FROM students WHERE age = (SELECT MAX(age) FROM students);

● Correlated: executed per row

1.7 Set Operations

● UNION, INTERSECT, EXCEPT – combine query results

1.8 Views and Indexes


● CREATE VIEW top_students AS SELECT * FROM students WHERE marks >
90;
● CREATE INDEX idx_name ON students(name);

1.9 Transactions in SQL

BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;

📍 Section 2: Storing and Indexing Data


2.1 Storage Basics

● Data in databases is stored in blocks called pages.

● Each page contains multiple rows (tuples).


● Pages are transferred between disk and main memory using a buffer manager.

Example:
Imagine a page as a folder with 100 paper forms (tuples). When the folder is full, a new one is
created.

2.2 Indexing Techniques

● Index = data structure for fast retrieval

● B+ Tree: Balanced tree used in most RDBMS

○ Internal nodes store keys

○ Leaf nodes store actual data pointers

● Example B+ Tree for student IDs:

[20 | 40]
/ | \
[5 10] [21 35] [42 50]

● Hash Index: Uses hash function on key; fast for equality lookups

○ Student ID 105 hashed to bucket 2

2.3 Clustered vs. Non-clustered Index

● Clustered: Index decides the physical order of records

● Non-clustered: Stores pointers to actual data

Example:

● Clustered index on student_id: rows are sorted on student_id

● Non-clustered index on name: index holds name and points to rows

2.4 Index Maintenance

● Insert/Delete operations in B+ Trees may require rebalancing

● Hash collisions are handled using chaining or open addressing


2.5 Bitmap Indexes

● For categorical fields with few values

● Example: Gender Index

M: 1 0 1 0
F: 0 1 0 1

2.6 Compression and Storage Formats

● Row-based: All attributes of one record stored together

● Column-based: All values of one attribute stored together

● RLE (Run-Length Encoding):

○ Data: AAAAABBB → A5B3

○ Useful when same value repeats consecutively

2.7 File Organization

● Heap File: Unordered, best for inserts

● Sorted File: Sorted by key, good for range queries

● Hashed File: Good for equality searches


📍 Section 3: Relational Data Processing
3.1 Relational Algebra and Query Execution

● σ (selection): Filters rows — σ(age > 25)(Students)

● π (projection): Chooses columns — π(name, dept)(Students)

● ⨝ (join): Combines tables

Execution Plan: Logical query → Relational algebra → Physical plan → Execution

3.2 Query Optimization

● Goal: minimize I/O and computation cost

● Use indexes, reduce intermediate data

● Join order matters: smaller tables joined first reduce size early
3.3 Join Algorithms

● Nested Loop Join:

for each row in R:


for each row in S:
if match: output

● Hash Join:

○ Build hash table on smaller input

○ Probe for matches in second input

● Sort-Merge Join:

○ Sort both inputs, then merge linearly

3.4 Buffer Management

● Manages which pages are kept in memory

● Replacement Policy: LRU (Least Recently Used)

● Pinning: lock critical pages in buffer

3.5 Parallel and Distributed Processing

● Partitioned Parallelism: Divide input data

● Pipelined Parallelism: Output of one operation fed into next

● Example: Big SQL query run across 4 servers, each handling a data slice
📍 Section 4: Transaction Processing
4.1 ACID Properties in Detail

● Atomicity: transaction is all-or-nothing

● Consistency: DB always moves from one valid state to another

● Isolation: concurrent transactions don’t interfere

● Durability: changes persist despite crashes

Example: Transferring ₹100 between two accounts — either both debit and credit happen or
neither

4.2 Concurrency Control Techniques

● Two-Phase Locking (2PL):

○ Growing phase: locks acquired

○ Shrinking phase: locks released

● Prevents non-serializable execution

4.3 Isolation Levels (with anomalies)


Isolation Level Dirty Read Non-repeatable Read Phantom Read

Read Uncommitted ✅ ✅ ✅

Read Committed ❌ ✅ ✅

Repeatable Read ❌ ❌ ✅

Serializable ❌ ❌ ❌

4.4 Write-Ahead Logging (WAL)

● Log changes before applying them

● REDO: for committed transactions

● UNDO: for uncommitted ones

4.5 Recovery Techniques

● ARIES (Analysis, Redo, Undo)

● Checkpoints reduce log scanning during recovery

4.6 Serializability

● Conflict Serializability: schedule can be converted to serial form by reordering

● Precedence Graph: Nodes = transactions; edge = dependency

○ Cycle ⇒ not serializable


📍 Section 5: Database Design
5.1 ER Modeling Example

● Entities: Student, Course

● Attributes: name, id, title

● Relationships: takes (Student ↔ Course)

5.2 Mapping to Relational Schema

● Each entity → table

● Each relationship → table with foreign keys

5.3 Normalization

● 1NF: atomic values

● 2NF: no partial dependency

● 3NF: no transitive dependency

● BCNF: stronger than 3NF

Example:

● Unnormalized:

Student(id, name, courses)

● 1NF:
Student(id, name), Enroll(student_id, course_id)

5.4 Denormalization

● Speeds up reads at cost of redundancy

● Example: merge student and enrollment tables

5.5 Schema Design Tips

● Use natural primary keys where possible

● Enforce constraints using CHECK, NOT NULL, UNIQUE

● Example: CHECK (salary > 0)


📍 Section 6: Beyond Relational Data
6.1 NoSQL Models

● Key-Value: simple, fast (e.g., Redis)

● Document: nested data (e.g., MongoDB)

● Column-Family: large-scale writes (e.g., Cassandra)

● Graph: relationships (e.g., Neo4j)

6.2 CAP Theorem

● Can only guarantee 2 of:

○ Consistency: all nodes same data

○ Availability: always responds

○ Partition Tolerance: survives network splits

6.3 Use Case Mapping

Model Best Use Case

Key-Value Caching sessions

Document Flexible schemas

Columnar Analytics on time-series

Graph Social networks, fraud

6.4 NoSQL Query Examples

// MongoDB
{ "age": { "$gt": 25 } }

// Neo4j
MATCH (a)-[:FRIEND]->(b) RETURN b.name;

6.5 Scaling

● Horizontal: add servers (scale out)


● Vertical: upgrade server (scale up)

● Sharding: split by key ranges (user_id < 1000 → node1)

6.6 Consistency Models

● Strong Consistency: read = latest write (RDBMS)

● Eventual Consistency: updates converge over time (Amazon S3)

● Causal Consistency: preserve cause-effect order (some NoSQLs)

You might also like