Name of student: George P Jose
Course and Branch: Bachelor of Technology
Semester: Five
Title of subject: INTRODUCTION TO DATABASE SYSTEM
Subject code: BTCSE 501
Current Mobile Number: +91 9895566322
Module 1
1. Salient Features of Database Systems:
• Data Integration: Combines data from various sources into a single, unified view,
eliminating redundancy and inconsistencies.
• Data Sharing: Allows multiple users to access and update data simultaneously,
promoting collaboration and information sharing.
• Data Security: Provides mechanisms to protect data from unauthorized access, ensuring
confidentiality and integrity.
2. DBMS vs. File System:
Feature DBMS File System
Data Structured, relational format Unstructured, flat files
Organization
Data Centralized control, enforcing data Decentralized control,
Management integrity limited data integrity
Data Security Robust security features Limited security
measures
Data Logical data independence (data Dependent on physical
Independence can change without affecting storage
applications)
3. DBA vs. Data Manager:
• DBA (Database Administrator): Responsible for overall database management,
including design, implementation, maintenance, and security.
• Data Manager: Focuses on data-related tasks within an organization, such as data
collection, analysis, and reporting.
4. Database Languages:
• DDL (Data Definition Language): Used to create, modify, and delete database
structures like tables, indexes, and views.
• DML (Data Manipulation Language): Used to insert, update, delete, and retrieve data
from the database.
• DCL (Data Control Language): Used to manage user access and privileges, ensuring
data security.
5. Data Model:
A data model represents the logical structure of data within a database.
• Hierarchical Model: Represents data as a tree structure, with a parent-child relationship
between nodes.
• Network Model: Similar to the hierarchical model but allows multiple parent-child
relationships, creating a more complex network structure.
• Relational Model: Represents data as a collection of tables, with rows representing
records and columns representing attributes.
6. Three-Tier Client-Server Architecture:
• Presentation Layer: Handles user interface and input/output.
• Application Layer: Processes business logic and interacts with the database.
• Data Layer: Stores and manages data.
Mapping between scheme levels is necessary to ensure data consistency and integrity
across the different layers.
7. ER Diagram:
An ER (Entity-Relationship) diagram is a graphical representation of data entities and their
relationships.
Constraints that can be added include:
• Cardinality: Specifies the number of instances of one entity that can be related to
instances of another entity (e.g., one-to-one, one-to-many, many-to-many).
• Participation: Indicates whether an entity must or can participate in a relationship (e.g.,
total participation, partial participation).
• Attributes: Describe the properties of entities and relationships.
• Keys: Identify unique instances of entities.
8. ER Diagram Conversion to Relational Schema:
9. ER Diagram for Car Insurance Company:
10. Weak Entity Sets:
Weak entity sets are dependent on other entity sets for their existence and cannot be
identified independently. They are typically represented by a double-lined rectangle in an ER
diagram.
Example: A weak entity set "Order" might depend on a strong entity set "Customer" for its
existence, as an order cannot exist without a customer.
11. Relational Model and Relational Algebra:
• Relational Model: Represents data as a collection of tables, with rows representing
records and columns representing attributes. Key concepts include:
o Relation: A set of tuples (rows) with the same attributes (columns).
o Attribute: A named property of a relation.
o Tuple: A single row in a relation.
o Domain: The set of possible values for an attribute.
o Key: A set of attributes that uniquely identifies a tuple.
• Relational Algebra: A set of operators used to manipulate relations and retrieve data
from a database. Common operators include:
o Select: Extracts tuples that satisfy a given condition.
o Project: Extracts specific attributes from a relation.
o Join: Combines tuples from two relations based on a common attribute.
o Union: Combines all tuples from two relations.
o Intersection: Finds tuples that exist in both relations.
o Difference: Finds tuples that exist in the first relation but not the second.
Module 2
1. What is RDBMS? Is there any order of tuples? Explain various operations
of relational algebra.
• RDBMS (Relational Database Management System): A type of database
management system that stores data in tables (relations) and allows for relationships
between tables. Examples include MySQL, PostgreSQL, and Oracle.
• Order of Tuples: In an RDBMS, tuples (rows) in a relation (table) do not have a
specific order. The order is determined only when data is retrieved using queries with
specific sorting instructions.
• Operations of Relational Algebra:
o Selection (σ): Selects rows that satisfy a given predicate.
o Projection (π): Selects specific columns from a table.
o Union (∪): Combines the results of two queries.
o Set Difference (−): Returns rows from the first query that are not in the
second query.
o Cartesian Product (×): Combines all rows from two tables.
o Rename (ρ): Renames the output relation.
o Join (⨝): Combines rows from two tables based on a related column.
2. Discuss the operations of relational algebra and the purpose of each.
• Selection (σ): Filters rows based on a condition.
• Projection (π): Reduces the number of columns in the result.
• Union (∪): Merges results from two queries, removing duplicates.
• Set Difference (−): Finds rows in one query that are not in another.
• Cartesian Product (×): Generates all possible combinations of rows from two tables.
• Rename (ρ): Changes the name of the output relation for clarity.
• Join (⨝): Combines rows from two tables based on a common attribute.
3. SQL Statements for Updates
• Update the year of the album with name ‘SUHANA RATH’ to 2018:
• UPDATE ALBUMS
• SET YEAR = 2018
• WHERE ALBUM-NAME = 'SUHANA RATH';
• Delete the album ‘YADON KI BAARISH’ along with all the songs in it:
• DELETE FROM SONGS
• WHERE ALBUM-ID IN (SELECT ALBUM-ID FROM ALBUMS WHERE ALBUM-NAME =
'YADON KI BAARISH');
•
• DELETE FROM ALBUMS
• WHERE ALBUM-NAME = 'YADON KI BAARISH';
4. Relational Algebra Expressions
For the relation ( R(A,B,C,D) ) where ( A ) is a key:
• ( \pi_{A,B}(\sigma_{A=2 \land B=3}®) )
• ( \pi_{A,B}(\sigma_{A=2}® \cap \sigma_{B=3}®) )
• ( \pi_{A,B}(\sigma_{A=2}®) \cap \pi_{A,B}(\sigma_{B=3}®) )
5. Differentiate between Primary Key, Candidate Key, and Super Key
• Primary Key: A unique identifier for a row in a table. Cannot be null.
• Candidate Key: A set of attributes that uniquely identify a row. A table can have
multiple candidate keys.
• Super Key: A set of attributes that uniquely identify a row. It can include additional
attributes not necessary for uniqueness.
6. Referential Integrity and Foreign Key
• Referential Integrity: Ensures that a foreign key value always points to an existing
row in another table.
• Implementation: Using foreign keys.
o Example: In a database with Orders and Customers tables, the Orders table
might have a CustomerID foreign key that references the CustomerID in the
Customers table.
7. Business Trips Database
• Foreign Keys:
o TRIP.Ssn references SALESPERSON.Ssn
o EXPENSE.TripId references TRIP.TripId
• Relational Algebra Expressions:
o Salespersons who traveled between Mumbai and Delhi with expenses >
50000:
o π_Name(σ_FromCity='Mumbai' ∧ ToCity='Delhi' ∧ Amount>50000
(SALESPERSON ⨝ TRIP ⨝ EXPENSE))
o Salesperson with greatest travel expenses:
o π_Name(σ_Amount=MAX(Amount) (SALESPERSON ⨝ TRIP ⨝ EXPENSE))
8. SQL Queries for Faculty and Courses
• 3-credit courses offered by ‘CS’ department:
• SELECT CNO, CNAME
• FROM COURSE
• WHERE CREDITS = 3 AND ODNO = (SELECT DNO FROM DEPARTMENT WHERE DNAME
= 'CS');
• Faculty teaching maximum 3 courses:
• SELECT NAME
• FROM FACULTY
• WHERE FNO IN (
• SELECT FNO
• FROM TEACHING
• GROUP BY FNO
• HAVING COUNT(CNO) <= 3
• );
• Departments with number of courses offered:
• SELECT DNAME, COUNT(CNO) AS CourseCount
• FROM DEPARTMENT LEFT JOIN COURSE ON DEPARTMENT.DNO = COURSE.ODNO
• GROUP BY DNAME
• HAVING COUNT(CNO) > 0
• ORDER BY CourseCount ASC;
9. Relational Algebra for Student Database
• Names of female students:
• π_NAME(σ_GENDER='Female'(STUDENT))
• Names of male students with advisor name:
• π_NAME, PNAME(σ_GENDER='Male'(STUDENT ⨝ ADVISOR=PROFID(PROFESSOR)))
• Students not enrolled in any course:
• π_ROLLNO, NAME(STUDENT) - π_ROLLNO, NAME(STUDENT ⨝ ENROLLMENT)
Module 3
1. Compare DML and DDL with Examples
DML (Data Manipulation Language):
• Purpose: Used to retrieve, insert, update, and delete data in a database.
• Example Statement:
• INSERT INTO Employees (EmployeeID, FirstName, LastName) VALUES (1,
'John', 'Doe');
DDL (Data Definition Language):
• Purpose: Used to define and modify database structures such as tables, indexes, and
schemas.
• Example Statement:
• CREATE TABLE Employees (
• EmployeeID INT PRIMARY KEY,
• FirstName VARCHAR(50),
• LastName VARCHAR(50)
• );
2. Nested Query in SQL
A nested query (or subquery) is a query within another SQL query. The inner query is
executed first, and its result is used by the outer query.
Example:
SELECT EmployeeID, FirstName, LastName
FROM Employees
WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE
DepartmentName = 'Sales');
In this example, the inner query finds the DepartmentID for the ‘Sales’ department, and the
outer query retrieves employees in that department.
3. Aggregate Functions in SQL
Aggregate functions perform calculations on a set of values and return a single value.
Common aggregate functions include:
• SUM()
• AVG()
• COUNT()
• MAX()
• MIN()
4. Types of Index in Index Sequential File
• Primary Index: Built on the primary key of a file.
• Secondary Index: Built on non-primary key fields.
• Dense Index: Contains an index record for every search key value in the data file.
• Sparse Index: Contains index records for only some of the search key values.
5. B-Tree vs. B±Tree
• B-Tree: All keys are stored in internal nodes and leaf nodes. Searching can occur at
any node.
• B±Tree: All keys are stored in leaf nodes, and internal nodes only store keys to guide
the search. Leaf nodes are linked for efficient sequential access.
6. Structure of Internal and Leaf Nodes in B±Tree
• Internal Node: Contains keys and pointers to child nodes.
• Leaf Node: Contains keys and pointers to data records or next leaf node.
Example Search: To search for a record with a specific key, start at the root and follow the
pointers based on the key comparisons until reaching the leaf node.
7. Block Accesses Calculation
Given:
• Record Size: 80 bytes
• Block Size: 512 bytes
• Block Pointer Size: 5 bytes
• Primary Key Size: 15 bytes
(i) No Index:
• Number of records per block = ( \frac{512}{80} \approx 6 )
• Number of blocks = ( \frac{10000}{6} \approx 1667 )
• Average block accesses = ( \frac{1667}{2} \approx 834 )
(ii) Multi-level Primary Index:
• First level index entries per block = ( \frac{512}{20} \approx 25 )
• Second level index entries per block = ( \frac{512}{20} \approx 25 )
• Number of first level index blocks = ( \frac{10000}{25} \approx 400 )
• Number of second level index blocks = ( \frac{400}{25} \approx 16 )
• Total block accesses = 1 (root) + 1 (second level) + 1 (first level) + 1 (data block) = 4
8. Dense Index vs. Sparse Index
• Dense Index: Contains an index entry for every search key value.
o Example: Index on a table with every employee’s ID.
• Sparse Index: Contains index entries for only some of the search key values.
o Example: Index on every 10th employee’s ID.
9. Hashing
Hashing is a technique to map data of arbitrary size to fixed-size values (hash values). It is
used for efficient data retrieval.
• Hash Function: Converts a key into a hash value.
• Hash Table: Stores data based on hash values.
• Collision Handling: Methods like chaining or open addressing to handle hash
collisions.
Module 4
1. Normalization and Normal Forms
Normalization is the process of organizing data in a database to reduce redundancy and
improve data integrity. It involves dividing large tables into smaller, related tables and
defining relationships between them. The goal is to ensure that each table contains data about
only one type of entity.
First Normal Form (1NF)
• Definition: A table is in 1NF if it contains only atomic (indivisible) values and each column
contains values of a single type.
• Example: A table with columns for student ID, student name, and courses taken, where each
course is listed in a separate row.
Second Normal Form (2NF)
• Definition: A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key.
• Example: If a table has a composite primary key (e.g., student ID and course ID), each non-
key attribute must depend on both parts of the key, not just one.
Third Normal Form (3NF)
• Definition: A table is in 3NF if it is in 2NF and all the attributes are functionally dependent
only on the primary key.
• Example: If a table has a primary key of student ID, no non-key attribute should depend on
another non-key attribute.
Boyce-Codd Normal Form (BCNF)
• Definition: A table is in BCNF if it is in 3NF and for every functional dependency (X → Y), X is
a super key.
• Example: This is a stricter version of 3NF. If a table has a functional dependency where a
non-super key determines another attribute, it must be decomposed.
2. Fully Functional Dependencies and Partial Functional Dependencies
• Fully Functional Dependency: An attribute is fully functionally dependent on a set
of attributes if it is dependent on the whole set and not just a part of it.
o Example: In a table with a composite key (A, B), if C is dependent on both A and B,
then C is fully functionally dependent on (A, B).
• Partial Functional Dependency: An attribute is partially dependent on a set of
attributes if it is dependent on only a part of the composite key.
o Example: In a table with a composite key (A, B), if C is dependent only on A, then C is
partially dependent on (A, B).
3. Transitive Dependency
• Definition: A transitive dependency occurs when a non-key attribute depends on another
non-key attribute.
o Example: If A → B and B → C, then A → C is a transitive dependency.
4. Functional Dependencies (FDs) and Closure of a Set of FDs
• Functional Dependency: A relationship between two attributes, typically between a
key and a non-key attribute.
o Example: If A → B, then B is functionally dependent on A.
• Closure of a Set of FDs: The closure of a set of FDs is the set of all attributes that can
be functionally determined by a given set of attributes.
o Example: If F = {A → B, B → C}, the closure of A (denoted as A+) is {A, B, C}.
5. Equivalent Sets of Functional Dependencies
• Definition: Two sets of functional dependencies are equivalent if they imply each other.
o Example: If F1 implies F2 and F2 implies F1, then F1 and F2 are equivalent.
6. Key for Relation R and Decomposition into 2NF and 3NF
Given the relation R = {A, B, C, D, E, F, G, H} and F = {A → DE, B → F, AB → C, C →
GH, G → H}:
• Key for R: The key is AB because AB can determine all other attributes.
• Decomposition into 2NF:
o R1 = {A, B, C}
o R2 = {A, D, E}
o R3 = {B, F}
o R4 = {C, G, H}
• Decomposition into 3NF: The above decomposition is already in 3NF.
7. Minimal Cover of Functional Dependencies
Given F = {A → BC, B → C, A → B, AB → C}:
• Minimal Cover:
o Remove extraneous attributes: {A → B, A → C, B → C}
8. Lossless Join Property
• Definition: A decomposition of a relation is lossless if the join of the decomposed relations
results in the original relation without any loss of information.
• Importance: Ensures that no data is lost during decomposition.
9. Checking Lossless Decomposition
Given R(A1, A2, A3, A4, A5) with FDs A1 → A2A4 and A4 → A5:
• Decomposition:
o R1(A1, A2, A3)
o R2(A1, A4)
o R3(A2, A4, A5)
• Check for Lossless Join:
o The decomposition is lossless if the common attributes form a super key in one of
the decomposed relations.
Module 5
1. State Diagram of a Transaction
A transaction typically goes through several states during its lifecycle. The main states are:
• Active: The transaction is currently executing.
• Partially Committed: The transaction has completed its final operation but is not yet
committed.
• Committed: The transaction has been successfully completed and its changes are now
permanent.
• Failed: The transaction has encountered an error and cannot proceed.
• Aborted: The transaction has been rolled back and its changes have been undone.
Desirable Properties of Transactions:
• Atomicity: Ensures that all operations within a transaction are completed; if not, the
transaction is aborted.
• Consistency: Ensures that a transaction transforms the database from one consistent state to
another.
• Isolation: Ensures that the operations of a transaction are isolated from other transactions.
• Durability: Ensures that the changes made by a committed transaction are permanent.
2. ACID Properties
The ACID properties are crucial for ensuring reliable transactions in a database system:
• Atomicity: Ensures that all operations within a transaction are completed successfully or
none at all.
• Consistency: Ensures that a transaction brings the database from one valid state to another.
• Isolation: Ensures that the execution of one transaction is isolated from others.
• Durability: Ensures that once a transaction is committed, it remains so, even in the event of a
system failure.
3. Concurrent Schedule Issue
In the given schedule:
• T1 issues a read lock on Y and reads Y.
• T2 issues a read lock on X and reads X.
• T1 issues a write lock on X.
• T2 issues a write lock on Y.
The problem here is a deadlock situation where T1 is waiting for T2 to release the lock on Y,
and T2 is waiting for T1 to release the lock on X.
Solution: One way to resolve this is by using a deadlock detection and resolution mechanism,
such as:
• Timeouts: Abort a transaction if it waits too long.
• Wait-Die and Wound-Wait Schemes: Use timestamps to decide which transaction should
wait and which should be aborted.
4. Serializability
Serializability is the concept that ensures the outcome of executing transactions concurrently
is the same as if the transactions were executed serially, one after the other.
Conflict Serializability: To check for conflict serializability, we can use a precedence graph
(or serialization graph). If the graph is acyclic, the schedule is conflict-serializable.
Example:
• T1: R(X), W(X)
• T2: R(X), W(Y)
Construct a precedence graph:
• T1 → T2 (because T1 writes X after T2 reads X)
• T2 → T1 (because T2 writes Y after T1 reads Y)
If there are no cycles, the schedule is conflict-serializable.
5. Precedence Graph Example
To illustrate conflict serializability using a precedence graph:
• Transactions: T1 and T2
• Operations:
o T1: R(A), W(B)
o T2: R(B), W(A)
Steps:
1. Draw nodes for T1 and T2.
2. Add directed edges based on conflicts:
o T1 → T2 (T1 reads A before T2 writes A)
o T2 → T1 (T2 reads B before T1 writes B)
If the graph has no cycles, the schedule is conflict-serializable.
6. Two-Phase Locking Protocol
Two-Phase Locking (2PL) ensures serializability by dividing the transaction execution into
two phases:
• Growing Phase: A transaction may obtain locks but not release any.
• Shrinking Phase: A transaction may release locks but not obtain any new ones.
This protocol guarantees serializability because it prevents cycles in the precedence graph.
7. Types of Locks
Common types of locks used in concurrency control include:
• Shared Lock (S): Allows multiple transactions to read a resource.
• Exclusive Lock (X): Allows only one transaction to write to a resource.
• Intention Locks (IS, IX): Used to indicate an intention to acquire a shared or exclusive lock on
a lower-level resource.
8. Need for Concurrency Control
Concurrency control is needed to ensure the consistency and isolation of transactions.
Problems that can occur when transactions run concurrently include:
• Lost Update: When two transactions read the same data and update it, leading to one
update being lost.
• Dirty Read: When a transaction reads data written by another uncommitted transaction.
• Non-repeatable Read: When a transaction reads the same data twice and gets different
results.
• Phantom Read: When a transaction reads a set of rows that satisfy a condition and another
transaction inserts or deletes rows that satisfy the condition.
Examples:
• Lost Update: T1 and T2 both read X=100. T1 sets X=200, T2 sets X=300. Final value should be
300, but it might be 200.
• Dirty Read: T1 writes X=200 but hasn’t committed. T2 reads X=200. If T1 rolls back, T2 has
read incorrect data.
• Non-repeatable Read: T1 reads X=100. T2 updates X=200. T1 reads X again and gets 200.
• Phantom Read: T1 reads all rows where age > 30. T2 inserts a new row with age 35. T1 reads
again and sees the new row.