Q1 Discuss relational algebra operations with one
example each. (12 marks)
Relational Algebra Operations
Relational Algebra is a procedural query language used to query data from
relational databases. It uses operators to perform queries, where each
operator takes one or more relations as input and produces a new relation
as output.
Relational Algebra operations are classified into:
1. Selection (σ)
Definition: Selects rows (tuples) from a relation that satisfy a given
condition.
Notation: σcondition(Relation)
Example:
Let Employee(EID, Name, Dept, Salary) be a relation.
Query: Find employees who work in the 'HR' department.
Relational Algebra:
σDept='HR'(Employee)
2. Projection (π)
Definition: Extracts specific columns (attributes) from a relation.
Notation: πattribute_list(Relation)
Example:
Query: Get the names of all employees.
Relational Algebra:
πName(Employee)
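The two operations above can be simulated in Python. This is an illustrative sketch (not part of the syllabus) using a hypothetical Employee relation held as a list of dicts; `select` and `project` are helper names chosen here, not standard library functions.

```python
# Hypothetical Employee relation (illustrative data, not from the notes)
Employee = [
    {"EID": 1, "Name": "Alice", "Dept": "HR", "Salary": 50000},
    {"EID": 2, "Name": "Bob",   "Dept": "IT", "Salary": 60000},
]

def select(relation, condition):
    # sigma: keep only the tuples (rows) satisfying the condition
    return [t for t in relation if condition(t)]

def project(relation, attributes):
    # pi: keep only the listed attributes, dropping duplicate rows
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

hr_rows = select(Employee, lambda t: t["Dept"] == "HR")  # sigma Dept='HR'(Employee)
names   = project(Employee, ["Name"])                    # pi Name(Employee)
```

Note how selection filters rows while projection filters columns, mirroring the σ and π definitions above.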
3. Union ( ∪ )
Definition: Combines tuples from two relations, removing duplicates.
Notation: R ∪ S
Both relations must be union-compatible (same attributes and domains).
Example:
Let A and B be relations with the same schema.
Query: List all employees from both branches.
Relational Algebra:
A ∪ B
4. Set Difference ( − )
Definition: Returns tuples that are in the first relation but not in the
second.
Notation: R − S
Example:
Query: Employees in relation A but not in B.
Relational Algebra:
A−B
5. Cartesian Product ( × )
Definition: Combines each tuple of the first relation with every tuple of
the second.
Notation: R × S
Example:
Let Employee and Department be two relations.
Query: Get all combinations of employees and departments.
Relational Algebra:
Employee × Department
6. Rename (ρ)
Definition: Renames the output relation or its attributes.
Notation: ρnew_name(attribute1, attribute2,...)(Relation)
Example:
Rename Employee relation as Staff.
Relational Algebra:
ρStaff(Employee)
7. Intersection ( ∩ )
Definition: Returns tuples that are common in both relations.
Notation: R ∩ S
Example:
Query: Employees present in both A and B.
Relational Algebra:
A∩B
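The three set operations above (∪, −, ∩) map directly onto Python set operators. A small sketch with two hypothetical union-compatible relations A and B, each a set of (Name, Dept) tuples:

```python
# Hypothetical union-compatible relations (illustrative data)
A = {("Alice", "HR"), ("Bob", "IT")}   # e.g. employees of Branch1
B = {("Bob", "IT"), ("Carol", "HR")}   # e.g. employees of Branch2

union        = A | B   # A ∪ B : all tuples, duplicates removed
difference   = A - B   # A − B : tuples in A but not in B
intersection = A & B   # A ∩ B : tuples common to both
```

Because relations are sets of tuples, duplicate elimination in the union comes for free.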
8. Join (⨝)
Definition: Combines related tuples from two relations based on a
common attribute.
Types: Natural Join, Theta Join, Equi Join
Example (Natural Join):
Let Employee(EID, Name, DeptID) and Department(DeptID, DeptName)
Query: Get employee names along with department names.
Relational Algebra:
Employee ⨝ Department
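The natural join above can be sketched in Python. This is an illustrative implementation over hypothetical Employee and Department data; `natural_join` is a helper name defined here, joining on all attributes the two schemas share:

```python
# Hypothetical relations sharing the DeptID attribute
Employee = [
    {"EID": 1, "Name": "Alice", "DeptID": 10},
    {"EID": 2, "Name": "Bob",   "DeptID": 20},
]
Department = [
    {"DeptID": 10, "DeptName": "HR"},
    {"DeptID": 20, "DeptName": "IT"},
]

def natural_join(r, s):
    # join on every attribute name common to both schemas
    # (sketch: assumes both relations are non-empty)
    common = set(r[0]) & set(s[0])
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})   # shared columns appear only once
    return out

joined = natural_join(Employee, Department)
```

Merging the two dicts with `{**t, **u}` collapses the duplicate DeptID column, just as a natural join does.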
Q2. Explain Tuple Relational Calculus (TRC) with suitable
example
Tuple Relational Calculus (TRC)
Definition:
Tuple Relational Calculus is a non-procedural query language used to
describe the desired information without specifying the sequence of
operations to retrieve it. It uses tuple variables to represent rows of a
relation.
General Syntax:
{ t | P(t) }
t is a tuple variable.
P(t) is a predicate (condition) that describes t.
Example:
Let’s consider a relation:
Employee(EID, Name, Dept, Salary)
Query:
Find the names of all employees who work in the 'HR' department.
TRC Expression:
{ t.Name | Employee(t) ∧ t.Dept = 'HR' }
Explanation:
t is a tuple from the relation Employee.
The condition t.Dept = 'HR' selects only those tuples where the
department is HR.
The result will contain the Name of those employees.
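A list comprehension is a close programmatic analogue of a TRC expression: it states what qualifies, not how to scan. A sketch with hypothetical Employee data mirroring { t.Name | Employee(t) ∧ t.Dept = 'HR' }:

```python
# Hypothetical Employee relation (illustrative data)
Employee = [
    {"EID": 1, "Name": "Alice", "Dept": "HR", "Salary": 50000},
    {"EID": 2, "Name": "Bob",   "Dept": "IT", "Salary": 60000},
    {"EID": 3, "Name": "Carol", "Dept": "HR", "Salary": 55000},
]

# { t.Name | Employee(t) ∧ t.Dept = 'HR' }
hr_names = [t["Name"] for t in Employee if t["Dept"] == "HR"]
```

The variable `t` plays the role of the tuple variable, and the `if` clause is the predicate P(t).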
Features:
It focuses on what to retrieve, not how.
Based on first-order predicate logic.
More declarative and expressive than relational algebra.
B) Describe expressive power of algebra and calculus
Expressive Power of Algebra and Calculus
Definition:
The expressive power of a query language refers to the range or types
of queries that can be expressed using that language.
1. Relational Algebra
A procedural query language.
Uses a set of operations (select, project, join, union, etc.) to
manipulate relations.
Specifies how to obtain the result step by step.
2. Relational Calculus
A non-procedural (declarative) query language.
Specifies what result is desired without specifying the procedure.
Includes:
o Tuple Relational Calculus (TRC)
o Domain Relational Calculus (DRC)
Comparison of Expressive Power
Feature        | Relational Algebra | Relational Calculus
Query Type     | Procedural         | Non-procedural
Syntax         | Operator-based     | Predicate logic
Approach       | How to retrieve    | What to retrieve
Expressiveness | Same               | Same
Important Point:
Relational Algebra and Relational Calculus are equivalent in
expressive power.
Any query that can be expressed in one can also be expressed in the
other.
Q3. What are set operators? Explain various set
operations that can be performed using a query
Set Operators in DBMS
Definition:
Set operators are used in relational algebra and SQL to perform
operations between two or more relations (tables) that have the same
number and type of attributes (i.e., they must be union-compatible).
Common Set Operations
1. UNION ( ∪ )
Combines all tuples from two relations and removes duplicates.
Both relations must be union-compatible.
Relational Algebra:
R ∪ S
SQL:
SELECT * FROM R
UNION
SELECT * FROM S;
Example: Get names of all employees from both Branch1 and Branch2.
2. INTERSECTION ( ∩ )
Returns only the tuples that are common to both relations.
Relational Algebra:
R∩S
SQL:
SELECT * FROM R
INTERSECT
SELECT * FROM S;
Example: Get employees who work in both ProjectA and ProjectB.
3. SET DIFFERENCE ( − )
Returns tuples that are in the first relation but not in the second.
Relational Algebra:
R−S
SQL (in most databases, written as EXCEPT):
SELECT * FROM R
EXCEPT
SELECT * FROM S;
Example: Get employees in DeptA but not in DeptB.
4. CARTESIAN PRODUCT ( × )
Returns the cross-product of two relations.
Each tuple in the first relation is paired with every tuple in the
second.
Relational Algebra:
R×S
Example: Pair each employee with every department (used in joins).
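The Cartesian product can be illustrated with `itertools.product`, which pairs every tuple of R with every tuple of S (hypothetical one-column relations used here for brevity):

```python
from itertools import product

# Hypothetical relations (illustrative data)
R = [("Alice",), ("Bob",)]   # employees
S = [("HR",), ("IT",)]       # departments

# R × S : every tuple of R concatenated with every tuple of S
cross = [r + s for r, s in product(R, S)]
```

The result always has |R| × |S| tuples, which is why a join is normally a Cartesian product followed by a selection.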
Conditions for Set Operations
Union-compatibility:
o Same number of columns
o Same data types and order
B) Explain in detail the select, project operations in
relational algebra with an example?
1. SELECT Operation (σ)
Definition:
The SELECT operation is used to filter rows (tuples) from a relation that
satisfy a given predicate (condition).
Notation:
σ<condition>(Relation)
Purpose:
To extract specific rows based on a condition.
It is horizontal filtering (filters rows).
Example:
Relation: Employee
| EID | Name    | Dept | Salary |
| 1   | Alice   | HR   | 50000  |
| 2   | Bob     | IT   | 60000  |
| 3   | Charlie | HR   | 55000  |
Query: Find all employees working in the HR department.
Relational Algebra:
σ<Dept = 'HR'>(Employee)
Result:
| EID | Name    | Dept | Salary |
| 1   | Alice   | HR   | 50000  |
| 3   | Charlie | HR   | 55000  |
2. PROJECT Operation (π)
Definition:
The PROJECT operation is used to retrieve specific columns
(attributes) from a relation.
Notation:
π<attribute_list>(Relation)
Purpose:
To extract specific columns only.
It is vertical filtering (filters columns).
Removes duplicate values automatically.
Example:
Query: Get the names of all employees.
Relational Algebra:
π<Name>(Employee)
Result:
Name
Alice
Bob
Charlie
Key Differences Between SELECT and PROJECT:
Feature            | SELECT (σ)         | PROJECT (π)
Operates On        | Rows (tuples)      | Columns (attributes)
Filters            | Based on condition | Based on attribute list
Removes Duplicates | No                 | Yes
Q4. Explain different Join operations in Relational
Algebra with suitable examples.
Definition:
A Join operation in relational algebra combines related tuples from two
relations based on a common attribute(s). It is a binary operation and
is used to extract meaningful data spread across multiple relations.
🔹 1. Theta Join (θ Join)
Notation:
R ⨝<θ> S
Definition:
Combines tuples from two relations R and S based on a general condition
θ (like =, ≠, <, >).
Example:
Relations:
Employee(EID, Name, DeptID)
Department(DeptID, DeptName)
Query: Get all employee and department pairs where Employee.DeptID =
Department.DeptID.
Relational Algebra:
Employee ⨝<Employee.DeptID = Department.DeptID> Department
🔹 2. Equi Join
Definition:
A Theta Join where the condition is based on equality (=) only.
Example:
Same as above:
Employee ⨝<Employee.DeptID = Department.DeptID> Department
Note: It results in duplicate attributes (both DeptID columns will be
present).
🔹 3. Natural Join (⋈)
Notation:
R⋈S
Definition:
A special type of Equi Join that automatically joins relations on all
attributes with the same name, and removes duplicate columns.
Example:
Employee ⋈ Department
Here, it joins on DeptID and removes one of the duplicate DeptID columns
in the result.
🔹 4. Outer Joins
Outer Joins preserve unmatched tuples from one or both relations by
filling with NULLs.
a) Left Outer Join (⟕)
Includes all tuples from the left relation (R), matched or not.
Example:
Employee ⟕ Department
b) Right Outer Join (⟖)
Includes all tuples from the right relation (S), matched or not.
Example:
Employee ⟖ Department
c) Full Outer Join (⟗)
Includes all tuples from both relations, matched or not.
Example:
Employee ⟗ Department
Note: Outer joins are not part of basic relational algebra but are
supported in extended relational algebra or SQL.
🔹 5. Semi Join
Notation:
R ▷ S (Left Semi Join)
S ◁ R (Right Semi Join)
Definition:
Returns only tuples from one relation (usually the left), where a
matching tuple exists in the other.
Example:
Get employees who work in a department.
Employee ▷ Department
🔹 6. Self Join
Definition:
A join of a relation with itself. Useful to compare rows within the same
relation.
Example:
Find pairs of employees who work in the same department.
E1(EID, Name, DeptID), E2(EID, Name, DeptID)
σ<E1.DeptID = E2.DeptID ∧ E1.EID ≠ E2.EID>(E1 × E2)
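The self-join formula above (a selection over E1 × E2) can be sketched directly as nested iteration over the same hypothetical relation:

```python
# Hypothetical Employee relation (illustrative data)
Employee = [
    {"EID": 1, "Name": "Alice", "DeptID": 10},
    {"EID": 2, "Name": "Bob",   "DeptID": 10},
    {"EID": 3, "Name": "Carol", "DeptID": 20},
]

# sigma E1.DeptID = E2.DeptID AND E1.EID != E2.EID over (E1 × E2)
# (using != as in the formula, so each pair appears in both orders)
pairs = [
    (e1["Name"], e2["Name"])
    for e1 in Employee
    for e2 in Employee
    if e1["DeptID"] == e2["DeptID"] and e1["EID"] != e2["EID"]
]
```

Replacing `!=` with `<` on the EIDs would keep only one ordering of each pair, a common refinement of this query.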
Q5. Write short notes on Domain Relational Calculus
(DRC) with suitable Example
✅ Definition:
Domain Relational Calculus (DRC) is a non-procedural query language
in relational database theory, where domain variables (variables that
take values from attribute domains) are used to specify queries.
Unlike relational algebra, which tells how to retrieve data, DRC specifies
what data is required.
✅ Syntax:
{ <x1, x2, ..., xn> | P(x1, x2, ..., xn) }
x1, x2, ..., xn are domain variables.
P(...) is a logical predicate (condition) involving variables and
logical operators (∧, ∨, ¬, ⇒).
The result is a set of tuples that satisfy the predicate P.
✅ Example Relation:
Employee(EID, Name, Dept, Salary)
| EID | Name  | Dept | Salary |
| 1   | Alice | HR   | 50000  |
| 2   | Bob   | IT   | 60000  |
| 3   | Carol | HR   | 55000  |
✅ Example Query:
Q: Retrieve names of employees who work in the 'HR' department.
Domain Relational Calculus Expression:
{ <N> | ∃E ∃D ∃S (Employee(E, N, D, S) ∧ D = 'HR') }
Explanation:
N is the domain variable for Name.
The tuple <E, N, D, S> represents an entry in the Employee relation.
The query finds all values of N such that there exists a tuple in
Employee where Dept = 'HR'.
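The existential quantifiers in the DRC expression correspond to tuple unpacking in Python: E, D and S are bound but not returned. A sketch over hypothetical Employee tuples:

```python
# Hypothetical Employee relation as (EID, Name, Dept, Salary) tuples
Employee = [
    (1, "Alice", "HR", 50000),
    (2, "Bob",   "IT", 60000),
    (3, "Carol", "HR", 55000),
]

# { <N> | ∃E ∃D ∃S ( Employee(E, N, D, S) ∧ D = 'HR' ) }
hr_names = {N for (E, N, D, S) in Employee if D == "HR"}
```

Only the domain variable N appears in the result; E, D and S exist merely to witness a matching tuple.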
✅ Components of DRC:
1. Domain Variables: Represent individual attribute values.
2. Logical Connectives: ∧ (AND), ∨ (OR), ¬ (NOT), ⇒ (IMPLIES)
3. Quantifiers:
o ∃ (Exists): There exists
o ∀ (For all): Universal quantifier
4. Predicate: Conditions using variables.
✅ Another Example:
Q: Get names and salaries of employees earning more than 55000.
DRC Expression:
{ <N, S> | ∃E ∃D (Employee(E, N, D, S) ∧ S > 55000) }
✅ Features of DRC:
Declarative: Describes what result is desired.
Based on first-order logic.
Can express all queries expressible in relational algebra.
Powerful but can be unsafe if queries return infinite results.
✅ Safe vs Unsafe Queries:
Safe Query: Always returns a finite set of results.
Unsafe Query: May return infinite results.
In practice, only safe queries are allowed in systems.
✅ Comparison with Tuple Relational Calculus (TRC):
Feature               | TRC                           | DRC
Uses                  | Tuple variables (entire rows) | Domain variables (individual values)
Output                | Tuples                        | Tuples of values
Example Variable Form | t.Name                        | <N>
Style                 | Closer to SQL                 | More mathematical
UNIT-3
Q1. Explain Pitfalls of RDBD
✅ Pitfalls of Relational Database Design
While designing a relational database, certain design mistakes—called
pitfalls—can lead to problems like data inconsistency, redundancy, and
performance issues. These typically occur due to poor normalization or
lack of proper structure.
🔹 1. Data Redundancy
Description: Same data is repeated in multiple places.
Problem: Increases storage usage and makes updates harder.
Example: Storing department name repeatedly with each employee
record.
🔹 2. Update Anomalies
Description: Difficulty in updating data accurately due to
redundancy.
Problem: Requires multiple updates; missing one causes
inconsistency.
Example: Changing department name in one row but not in others.
🔹 3. Insertion Anomalies
Description: Unable to insert data due to absence of other required
data.
Problem: Forces unnecessary data entry.
Example: Cannot insert a department unless an employee is also
added.
🔹 4. Deletion Anomalies
Description: Deleting a record unintentionally removes useful
information.
Problem: Loss of data that should have been retained.
Example: Deleting last employee of a department also removes
department info.
🔹 5. Poor Normalization
Description: Tables not designed following normalization rules
(1NF, 2NF, 3NF).
Problem: Leads to all the above anomalies and performance issues.
🔹 6. Loss of Data Integrity
Description: Violations in accuracy or consistency of data.
Problem: May result from improper constraints or foreign key
misuse.
B) Write about four types of NoSQL Databases with
example.
✅ Types of NoSQL Databases with Examples
NoSQL databases are designed for scalable, flexible, and high-
performance data storage, especially useful for handling unstructured
or semi-structured data.
There are four main types of NoSQL databases:
🔹 1. Document-Oriented Databases
Description:
Store data as documents (usually in JSON, BSON, or XML format).
Each document can have a different structure.
Ideal for storing complex, nested data.
Example:
🟢 MongoDB
{
  "name": "Alice",
  "age": 25,
  "skills": ["Python", "SQL"]
}
Use Case: Content management systems, user profiles, catalogs.
🔹 2. Key-Value Stores
Description:
Store data as key-value pairs.
Simple and fast for lookup operations.
Highly scalable.
Example:
🟢 Redis, Amazon DynamoDB
"user:1001" → {"name": "John", "email": "[email protected]"}
Use Case: Caching, session storage, user preferences.
🔹 3. Column-Family Stores
Description:
Store data in columns instead of rows (opposite of RDBMS).
Good for handling large volumes of data with high read/write
speeds.
Example:
🟢 Apache Cassandra, HBase
Row Key: user123
Name → Alice
Age → 25
City → Mumbai
Use Case: Real-time analytics, recommendation engines, time-series
data.
4. Graph Databases
Description:
Store data as nodes (entities) and edges (relationships).
Ideal for managing highly connected data.
Example:
🟢 Neo4j
(Alice) -[FRIEND]-> (Bob)
Use Case: Social networks, fraud detection, recommendation systems.
✅ Conclusion
Each type of NoSQL database is optimized for different use cases:
Document → Flexible structure
Key-Value → High performance
Column → Big data analytics
Graph → Relationship-heavy data
Choosing the right type depends on the application’s data structure and
performance needs.
Q2 Explain in details about the features of SQL and
NoSQL.
✅ Features of SQL vs NoSQL Databases
Feature         | SQL Databases                                    | NoSQL Databases
Data Model      | Relational (tables with rows and columns)        | Non-relational (Document, Key-Value, Column, Graph)
Schema          | Fixed, predefined schema                         | Dynamic or flexible schema
Query Language  | Structured Query Language (SQL)                  | No standard language (uses JSON-like or custom queries)
ACID Compliance | Strong ACID properties for reliable transactions | Follows BASE (eventually consistent)
Scalability     | Vertical scaling (upgrading hardware)            | Horizontal scaling (adding more servers)
Use Cases       | Suitable for structured data and complex queries | Suitable for big data, unstructured, and real-time apps
B) Differentiate SQL with NoSQL
✅ Difference Between SQL and NoSQL
Feature             | SQL (Relational Databases)                          | NoSQL (Non-Relational Databases)
Data Model          | Relational (tables with rows and columns)           | Non-relational (Document, Key-Value, Column, Graph)
Schema              | Fixed, predefined schema                            | Dynamic, flexible schema
Query Language      | SQL (Structured Query Language)                     | Varies (no standard, often JSON-like or API-based)
ACID Compliance     | Fully ACID compliant (Atomicity, Consistency, etc.) | BASE model (Basically Available, Soft state, Eventually consistent)
Scalability         | Vertical (scale-up by upgrading server hardware)    | Horizontal (scale-out by adding more nodes/servers)
Joins and Relations | Supports complex joins and relationships            | Limited or no join support; uses denormalized data
Best For            | Structured data, transactional systems              | Unstructured/semi-structured data, big data, real-time apps
Examples            | MySQL, PostgreSQL, Oracle, MS SQL Server            | MongoDB, Cassandra, Redis, Neo4j
✅ Conclusion:
SQL is ideal for structured data and complex queries with strict
consistency.
NoSQL is best for scalable, flexible, and high-performance
applications dealing with large or unstructured data.
Q3 Explain the following with examples. (a) Lossless Join
Decomposition. (b) Dependency Preserving
Decomposition.
(a) Lossless Join Decomposition (6 marks)
✅ Definition:
A Lossless Join Decomposition ensures that when a relation is
decomposed into two or more sub-relations, the natural join of
these sub-relations will result in exactly the original relation, with no
loss of information.
✅ Condition:
A decomposition of relation R into R1 and R2 is lossless if:
(R1 ∩ R2) → R1 OR (R1 ∩ R2) → R2
(i.e., the common attributes form a super key of at least one of the
decomposed relations)
✅ Example:
Let relation R(A, B, C) with functional dependency:
A→B
Decompose R into:
R1(A, B)
R2(A, C)
Common attribute = A, and since A → B, A is a super key of R1.
✅ Hence, decomposition is lossless.
✅ Verification by Natural Join:
Original R:
| A | B | C |
| 1 | x | p |
| 2 | y | q |
R1(A, B):
| A | B |
| 1 | x |
| 2 | y |
R2(A, C):
| A | C |
| 1 | p |
| 2 | q |
R1 ⨝ R2 = R ⇒ ✅ Lossless
✅ Conclusion:
Lossless join is essential to ensure that no information is lost when
relations are normalized and later rejoined.
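The lossless-join check on the example above can be verified mechanically: project R onto (A, B) and (A, C), rejoin on the common attribute A, and compare with the original. A sketch using the same sample data:

```python
# R(A, B, C) with the sample tuples from the example above
R  = {(1, "x", "p"), (2, "y", "q")}
R1 = {(a, b) for (a, b, c) in R}   # projection onto (A, B)
R2 = {(a, c) for (a, b, c) in R}   # projection onto (A, C)

# natural join of R1 and R2 on the common attribute A
rejoined = {(a1, b, c) for (a1, b) in R1 for (a2, c) in R2 if a1 == a2}

lossless = (rejoined == R)   # True: A -> B makes A a key of R1
```

If the common attribute were not a key of either projection, `rejoined` would contain spurious tuples and the comparison would fail.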
(b) Dependency Preserving Decomposition (6 marks)
✅ Definition:
A decomposition is dependency preserving if all functional
dependencies (FDs) from the original relation can be enforced by
simply enforcing FDs on individual decomposed relations, without
needing to recompute joins.
✅ Why It Matters?
Dependency preservation is important to ensure:
Data integrity
Efficient FD enforcement without costly joins
✅ Example:
Let R(A, B, C)
With functional dependencies:
F = { A → B, B → C }
Decompose R into:
R1(A, B)
R2(B, C)
Now, check:
In R1: A → B ✅
In R2: B → C ✅
Both FDs are preserved in the decomposition ⇒ ✅ Dependency
Preserving
✅ Counter Example:
Let F = { A → BC }
Decompose R into:
R1(A, B)
R2(A, C)
Now:
R1 has A → B ✅
R2 has A → C ✅
But A → BC is not directly available in either R1 or R2.
Hence, ❌ Not Dependency Preserving
Q4 Explain different types of Functional
Dependencies and its advantages with suitable
example.
✅ Functional Dependencies (FDs) in DBMS
🔹 Definition:
A Functional Dependency (FD) is a constraint between two sets
of attributes in a relation.
It is denoted as:
X→Y
Which means: If two tuples have the same value for attribute(s) X,
they must have the same value for Y.
✅ Types of Functional Dependencies
1. Trivial Functional Dependency
Definition: A functional dependency X → Y is trivial if Y is a
subset of X.
Example:
{A, B} → A (Trivial, since A ⊆ {A, B})
2. Non-Trivial Functional Dependency
Definition: A functional dependency X → Y is non-trivial if Y is
not a subset of X.
Example:
A → B (Non-trivial, as B is not part of A)
3. Fully Functional Dependency
Definition: An attribute Y is fully functionally dependent on X, if
it is dependent on all of X, not just a part of it.
Example:
In R(A, B, C),
{A, B} → C (Fully dependent if C is not dependent on A or B alone)
4. Partial Dependency
Definition: When a non-prime attribute is dependent on a part of
a candidate key (not the whole key).
Occurs in: 2NF Violation
Example:
If A and B form a candidate key, and:
A → C (C is partially dependent on A only, not on A+B)
5. Transitive Dependency
Definition: If A → B and B → C, then A → C is a transitive
dependency.
Occurs in: 3NF Violation
Example:
A → B, B → C ⇒ A → C
6. Multivalued Dependency (MVD)
Definition: If for a single value of A, there are multiple independent
values of B and C, then:
A →→ B
Occurs in: 4NF
Example:
In a student-course-hobby relation:
Student →→ Course
Student →→ Hobby
7. Join Dependency
Definition: A Join Dependency occurs when a relation can be
reconstructed by joining multiple projections.
Related to 5NF (Project-Join Normal Form).
✅ Advantages of Functional Dependencies
Helps in Normalization
Identify and remove redundancy by organizing data into proper
normal forms (1NF to 5NF).
Ensures Data Integrity
Avoids anomalies like update, delete, and insert anomalies.
Reduces Redundancy
Functional dependencies help avoid repeating data unnecessarily.
Optimizes Query Performance
Well-structured tables make query execution faster and more
efficient.
Helps in Schema Design
Identifies keys and candidate keys used for table decomposition and
design.
✅ Example Summary:
Assume a relation: Student(RollNo, Name, Dept, Advisor)
RollNo → Name, Dept, Advisor (Each roll number uniquely
identifies a student)
Dept → Advisor (Each department has one advisor)
Here:
RollNo → Name is a non-trivial FD
RollNo → RollNo is a trivial FD
If RollNo is the key, and Dept → Advisor, then RollNo → Advisor is a
transitive dependency
Q5 What is Normalization? Explain types of
Normalization in detail
✅ What is Normalization?
🔹 Definition:
Normalization is a process in relational database design used to
organize data efficiently by eliminating data redundancy and
ensuring data integrity.
It involves decomposing a large, complex table into smaller, well-
structured tables following certain normal forms.
✅ Objectives of Normalization:
Remove data redundancy
Avoid update, insert, and delete anomalies
Ensure data consistency
Simplify queries and maintenance
✅ Types of Normalization (Normal Forms)
🔹 1. First Normal Form (1NF)
Rule:
Each attribute must contain atomic (indivisible) values.
No repeating groups or arrays.
Example (Unnormalized):
| StudentID | Name  | Courses       |
| 101       | Alice | Math, Physics |
After 1NF:
| StudentID | Name  | Course  |
| 101       | Alice | Math    |
| 101       | Alice | Physics |
🔹 2. Second Normal Form (2NF)
Rule:
Must be in 1NF
No partial dependency (i.e., non-prime attribute should depend on
the whole candidate key)
Example:
If a table has a composite key (e.g., RollNo, CourseID) and:
RollNo → Name (Partial Dependency ❌)
Move RollNo → Name to a separate table to achieve 2NF.
🔹 3. Third Normal Form (3NF)
Rule:
Must be in 2NF
No transitive dependency (non-prime attribute should not depend
on another non-prime attribute)
Example:
If:
RollNo → Dept
Dept → HOD
⇒ RollNo → HOD (Transitive ❌)
Split into two relations to remove the transitive dependency.
🔹 4. Boyce-Codd Normal Form (BCNF)
Rule:
A stricter version of 3NF
For every FD X → Y, X must be a super key
Example:
If:
StudentID → Course
Course → Faculty
Even if in 3NF, if Course is not a super key, then it violates BCNF.
🔹 5. Fourth Normal Form (4NF)
Rule:
Must be in BCNF
No Multivalued Dependencies (MVDs)
Example:
If a student can take multiple Courses and have multiple Hobbies,
then:
Student →→ Course
Student →→ Hobby
Should be split into two separate relations.
🔹 6. Fifth Normal Form (5NF)
Rule:
Must be in 4NF
Deals with Join Dependencies
Data should be reconstructible from smaller joins without loss.
✅ Benefits of Normalization
Reduces data redundancy
Prevents data anomalies
Improves data integrity
Makes databases easier to maintain
Q6. Given a Relation R=(A,B,C) and Functional
Dependencies are F={ {A,B}→{C}, {C}→{A} }.
Determine all Candidate keys of R and the normal
form of R with proper explanation.
✅ Given:
Relation: R(A, B, C)
Functional Dependencies:
F={
(A, B) → C,
C→A
}
🔹 Step 1: Find Closure of Attribute Sets to Identify Candidate
Keys
▶ Try closure of {A, B}
We compute (A, B)+
Start with:
(A, B)+ = {A, B}
From FDs:
(A, B) → C ⇒ add C
Now: {A, B, C}
C → A ⇒ A already present
⇒ So, {A, B} is a super key
✅ (A, B)+ = {A, B, C} = all attributes of R
But is it minimal? Let's check if we can remove A or B.
▶ Try closure of {B}
(B)+ = {B} ⇒ No FD applies ⇒ Not a super key
▶ Try closure of {A}
(A)+ = {A} ⇒ No FD applies ⇒ Not a super key
▶ Try closure of {C}
(C)+ = {C}
FD: C → A ⇒ add A
Now: {C, A}
From (A, B) → C ⇒ need both A & B → can't proceed
So: (C)+ = {C, A}
⇒ Not all attributes, so not a super key
▶ Try closure of {B, C}
(B, C)+ = {B, C}
C → A ⇒ add A → {A, B, C} ✅
So, {B, C} is a super key
Is it minimal? Try removing B:
(C)+ = {C, A} — missing B ⇒ Not a super key
Try removing C:
(B)+ = {B} — not a super key
✅ So, {B, C} is a candidate key
✅ Candidate Keys of R:
{A, B}
{B, C}
✅ Both are minimal super keys ⇒ Candidate Keys
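The closures computed above follow the standard attribute-closure algorithm: start from X and repeatedly apply any FD whose left side is already contained in the result. A sketch (helper name `closure` chosen here) checked against this question's FDs:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # apply X -> Y whenever X ⊆ result and Y adds something new
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [({"A", "B"}, {"C"}), ({"C"}, {"A"})]
ab = closure({"A", "B"}, F)   # should cover all of R
bc = closure({"B", "C"}, F)   # should cover all of R
c  = closure({"C"}, F)        # should not
```

A set X is a super key exactly when its closure equals all attributes of R, and a candidate key when no proper subset of X has that property.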
🔹 Step 2: Determine the Normal Form
▶ Check for 1NF:
If all attributes are atomic → Yes
✅ R is in 1NF
▶ Check for 2NF:
Rule: No partial dependency (i.e., no non-prime attribute depends
on part of a candidate key)
Candidate keys = {A, B} and {B, C}
Prime attributes = A, B, C → All attributes are part of some
candidate key
No non-prime attributes ⇒ ✅ No partial dependency
✅ R is in 2NF
▶ Check for 3NF:
Rule: For every FD X → Y, at least one is true:
X → Y is trivial
X is a super key
Y is a prime attribute
Check each FD:
(A, B) → C
(A, B) is a candidate key ✅ — OK
C→A
C is not a candidate key
A is a prime attribute ✅ — OK
✅ So all FDs satisfy 3NF conditions
✅ R is in 3NF
▶ Check for BCNF:
Rule: For every FD X → Y, X must be a super key
(A, B) → C ✅ — (A, B) is a candidate key → OK
C → A ❌ — C is not a candidate key → Violates BCNF
❌ Not in BCNF
✅ Final Answer:
Candidate Keys: {A, B} and {B, C}
Highest Normal Form: 3NF
R is not in BCNF due to FD: C → A, where C is not a super key
Q7. What is BCNF? Explain how a relation can be
converted to BCNF
✅ (a) What is BCNF?
Boyce-Codd Normal Form (BCNF)
🔹 Definition:
A relation is in BCNF (Boyce-Codd Normal Form) if:
For every non-trivial functional dependency X → Y,
the determinant X is a super key.
It is a stronger version of 3NF.
BCNF removes all anomalies caused by dependencies where LHS
is not a super key.
🔹 Why BCNF?
Even if a relation is in 3NF, it may still have redundancy due to functional
dependencies where determinant is not a super key.
🔹 Steps to Convert a Relation to BCNF:
Given: Relation R and set of functional dependencies F.
✅ Step 1:
Check all functional dependencies in F.
✅ Step 2:
If there exists a dependency X → Y where:
X is not a super key,
→ then BCNF is violated.
✅ Step 3:
Decompose R into two relations:
R1 = X ∪ Y
R2 = R – (Y – X)
Repeat the process for both until all resulting relations are in BCNF.
🔹 Example:
Let relation: R(A, B, C)
Functional dependencies:
A→B
B→A
Candidate Keys = {A, C} and {B, C}
Here:
A → B: A is not a super key ⇒ violates BCNF
B → A: B is not a super key ⇒ violates BCNF
✅ Decompose:
1. Use A → B, create:
o R1(A, B)
o R2(A, C)
Now both R1 and R2 can be checked separately for BCNF.
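The decomposition steps above can be sketched as one round of BCNF splitting: find an FD X → Y where X is not a super key, then form R1 = X ∪ Y and R2 = R − (Y − X). This illustrative code (helper names chosen here) reproduces the example's split using A → B:

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose_once(R, fds):
    """One BCNF splitting step; returns None if no FD violates BCNF."""
    for X, Y in fds:
        if not closure(X, fds) >= R:   # X is not a super key of R
            R1 = X | Y                 # R1 = X ∪ Y
            R2 = R - (Y - X)           # R2 = R − (Y − X)
            return R1, R2
    return None

R = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"A"})]
r1, r2 = bcnf_decompose_once(R, fds)   # split on A -> B
```

In a full algorithm this step repeats on each resulting relation (with its projected FDs) until every relation is in BCNF.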
B) Compare 4NF and 5NF
✅ (b) Comparison of 4NF and 5NF
Feature            | 4NF (Fourth Normal Form)                                      | 5NF (Fifth Normal Form)
Definition         | In 4NF if it is in BCNF and has no multi-valued dependencies (MVDs) | In 5NF if it is in 4NF and has no join dependencies
Dependency Removed | Multi-Valued Dependency (MVD)                                 | Join Dependency (JD)
Anomaly Addressed  | Redundancy due to independent multi-valued facts              | Redundancy due to complex joins involving multiple tables
Complexity         | Easier to detect and resolve using MVD rules                  | More complex; usually occurs in advanced database designs
Decomposition      | Decompose to remove independent multi-valued facts            | Decompose to ensure lossless joins of all projections
Example Use Case   | A student can take multiple courses and have multiple hobbies | A product may be supplied by many vendors in different locations
✅ Example for 4NF:
Relation: Student(Name, Course, Hobby)
A student can have multiple courses and multiple hobbies
independently.
MVDs:
Name →→ Course,
Name →→ Hobby
✅ Needs decomposition into:
Student_Course(Name, Course)
Student_Hobby(Name, Hobby)
✅ Example for 5NF:
Relation: Supply(Supplier, Part, Project)
A supplier may supply a part for many projects.
Decompose into three relations:
o Supplier_Part
o Part_Project
o Supplier_Project
To remove redundancy and preserve join dependencies.
UNIT-4
Q1) Explain Briefly about Transaction Management and
Concurrency Control
🔹 1. Transaction Management
✅ Definition:
A transaction is a sequence of operations performed as a single logical
unit of work in a database. It must either complete entirely or have no
effect at all.
✅ ACID Properties:
To ensure correctness and reliability, transactions follow ACID properties:
1. Atomicity – All operations in a transaction are completed or none at
all.
2. Consistency – The database moves from one valid state to another.
3. Isolation – Concurrent transactions are isolated from each other.
4. Durability – Once a transaction is committed, changes are
permanent.
🔹 2. Concurrency Control
✅ Definition:
Concurrency control is the process of managing simultaneous
operations on the database to ensure data consistency and isolation.
✅ Need for Concurrency Control:
When multiple transactions execute at the same time, problems may arise
such as:
Lost Update
Dirty Read
Non-repeatable Read
Phantom Read
✅ Techniques for Concurrency Control:
1. Lock-Based Protocols:
o Use shared and exclusive locks to control access to data.
o Example: Two-Phase Locking (2PL)
2. Timestamp Ordering:
o Assigns timestamps to each transaction and orders them
based on that.
3. Optimistic Concurrency Control:
o Transactions execute without restrictions, but are validated
before commit.
(b)Explain the ACID property of Transaction with
example.
ACID stands for Atomicity, Consistency, Isolation, and Durability
— the four key properties that ensure reliable transaction
processing in a database.
🔹 1. Atomicity
Ensures that a transaction is all-or-nothing.
If any part of the transaction fails, the entire transaction is
rolled back.
Example:
In a banking system, transferring ₹1000 from A to B:
Debit A’s account
Credit B’s account
✅ If debit succeeds but credit fails, both operations are
undone.
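The all-or-nothing behaviour can be sketched in plain Python with an undo snapshot (a hypothetical in-memory account dict, not a real database API):

```python
# Sketch: atomic transfer with rollback via an undo snapshot (hypothetical data)
def transfer(accounts, src, dst, amount):
    snapshot = dict(accounts)          # begin: keep a copy for undo
    try:
        accounts[src] -= amount        # debit
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount        # credit
    except Exception:
        accounts.clear()               # rollback: restore the snapshot
        accounts.update(snapshot)
        return False
    return True                        # commit

accounts = {"A": 5000, "B": 1000}
ok  = transfer(accounts, "A", "B", 1000)     # succeeds
bad = transfer(accounts, "A", "B", 999999)   # fails and is rolled back
```

Real DBMSs achieve the same effect with write-ahead logging rather than in-memory copies, but the contract is identical: a failed transaction leaves no partial effect.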
🔹 2. Consistency
Ensures the database moves from one valid state to another.
All integrity constraints must be maintained.
Example:
A rule says total bank balance = ₹1,00,000.
After any transaction, this rule must still hold.
🔹 3. Isolation
Ensures that concurrent transactions do not interfere with
each other.
Final result must be the same as if transactions were
executed serially.
Example:
If two users transfer money at the same time, their operations
are logically isolated, preventing conflicts like lost updates.
🔹 4. Durability
Once a transaction is committed, its changes are
permanent, even in case of system failure.
Example:
After money is transferred and transaction is committed; a power
failure occurs — the changes remain saved in the database.
Q2 Write brief notes on the following a) Multiple
Granularity
b) Time based Protocol
✅ (a) Multiple Granularity – (6 Marks)
🔹 Definition:
Multiple Granularity is a locking scheme in databases that allows
transactions to lock data items at various levels of granularity —
such as a database, table, page, record, or field — depending on
the need.
🔹 Granularity Levels (From coarse to fine):
Database → Table → Page → Tuple (Row) → Attribute (Column)
🔹 Lock Types Used:
1. IS (Intent Shared)
2. IX (Intent Exclusive)
3. S (Shared)
4. X (Exclusive)
5. SIX (Shared and Intent Exclusive)
6. Null (no lock)
🔹 Why It Is Needed:
To improve efficiency and concurrency.
Reduces overhead by locking larger units when detailed
locks are not needed.
Allows finer locks when necessary for precision.
🔹 Example:
If a transaction wants to read a few rows in a table:
It may acquire IS lock on the table,
And S lock on specific rows.
Another transaction wanting to update the table:
Acquires IX lock on the table,
And X lock on the rows being updated.
This ensures controlled concurrent access.
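Whether two of these modes can coexist on the same node is decided by the standard multiple-granularity compatibility matrix; the sketch below encodes it (the `compatible` helper is illustrative):

```python
# Standard lock compatibility matrix for multiple granularity locking.
# Rows = mode already held on a node, columns = mode being requested.
MODES = ["IS", "IX", "S", "SIX", "X"]
MATRIX = [
    # IS     IX     S      SIX    X
    [True,  True,  True,  True,  False],  # IS
    [True,  True,  False, False, False],  # IX
    [True,  False, True,  False, False],  # S
    [True,  False, False, False, False],  # SIX
    [False, False, False, False, False],  # X
]

def compatible(held, requested):
    return MATRIX[MODES.index(held)][MODES.index(requested)]

# A reader's IS on the table coexists with a writer's IX on the same
# table; the conflict surfaces only at the rows they both touch.
print(compatible("IS", "IX"))  # True
print(compatible("S", "X"))    # False
```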
✅ (b) Time-Based Protocol
🔹 Definition:
A Time-Based Protocol (also called Timestamp Ordering Protocol)
is a concurrency control method where each transaction is
assigned a unique timestamp and transactions are ordered by
these timestamps to avoid conflicts.
🔹 Key Concepts:
Each transaction T is given a timestamp TS(T) when it starts.
Each data item Q maintains:
o read_TS(Q) = highest timestamp of any transaction
that read Q.
o write_TS(Q) = highest timestamp of any transaction
that wrote to Q.
🔹 Rules:
If a younger transaction (with a higher timestamp) reads or writes an item last updated by an older transaction, the operation is allowed.
If an older transaction tries to read an item already written by a younger transaction, or to write an item already read or written by a younger one, it is rolled back (to avoid inconsistency).
🔹 Advantages:
Deadlock-free: Since no locks are used.
Ensures conflict-serializable schedules.
🔹 Example:
Assume:
T1 has timestamp 10
T2 has timestamp 20
If T2 wants to write to a data item already written by T1, it is allowed.
But if T1 (older) tries to write after T2, it violates timestamp order ⇒ T1 is aborted.
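These rules can be sketched as the standard read/write timestamp tests (the `Item` class and helper names are illustrative):

```python
# Sketch of basic timestamp-ordering checks: each data item carries
# read_TS/write_TS, and an operation either proceeds or aborts.
class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def try_read(item, ts):
    if ts < item.write_ts:            # item already overwritten by a younger txn
        return "abort"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def try_write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:  # would invalidate a younger txn
        return "abort"
    item.write_ts = ts
    return "ok"

q = Item()
print(try_write(q, 20))   # T2 (TS = 20) writes first -> ok
print(try_write(q, 10))   # T1 (TS = 10) writes after T2 -> abort
```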
Q3. Define Serializability. Explain in detail about the
types and Illustrate with the help of suitable example.
🔹 Definition:
Serializability is the highest level of isolation in concurrency control.
It ensures that the outcome of executing concurrent transactions is
equivalent to some serial execution (i.e., one after another) —
preserving consistency and avoiding conflicts.
✅ Why Serializability?
In real-world databases, transactions run concurrently to improve
performance.
Without control, concurrent execution may cause inconsistencies
like lost updates, dirty reads, etc.
Serializability guarantees correctness by ensuring concurrency
doesn’t violate transaction logic.
✅ Types of Serializability
🔸 1. Conflict Serializability
✅ Definition:
A schedule is conflict serializable if it can be transformed into a
serial schedule by swapping non-conflicting operations.
✅ Conflicting Operations:
Two operations conflict if:
They belong to different transactions, and
They access the same data item, and
At least one of them is a write operation.
✅ Example:
Transactions:
T1: R(A), W(A)
T2: R(A), W(A)
Schedule S1 (interleaved):
R1(A), R2(A), W1(A), W2(A)
This is not conflict serializable: T1 and T2 read and write the same data item in interleaved fashion, so the conflicting operations cannot be swapped to reach a serial order.
🔸 2. View Serializability
✅ Definition:
A schedule is view serializable if it is view-equivalent to a serial
schedule, meaning:
The initial reads, final writes, and reads-from relationships all match.
It is more general than conflict serializability but harder to test.
✅ View equivalence conditions:
Same initial reads
Same read-from relationships
Same final writes
🔸 3. Commutative (or Semantic) Serializability
Some operations may be commutative (e.g., addition), allowing
more flexibility.
Not based on read/write conflicts alone.
Rarely used in practice due to complexity.
✅ Example – Conflict Serializability
Consider:
T1:
R1(A), W1(A)
T2:
R2(A), W2(A)
Schedule S:
R1(A), R2(A), W1(A), W2(A)
Now let's check conflicts:
R1(A) & W2(A) → conflict
R2(A) & W1(A) → conflict
W1(A) & W2(A) → conflict
Cannot swap all conflicting operations to get a serial form → ❌ Not
conflict serializable.
But a schedule like:
R1(A), W1(A), R2(A), W2(A)
✅ Can be transformed to serial T1 → T2 ⇒ Conflict Serializable
✅ How to Check Serializability?
🔸 Precedence (Serialization) Graph:
1. Nodes = transactions
2. Edges = from Ti to Tj if Ti's operation conflicts with Tj’s later
operation
3. If the graph has no cycles, the schedule is conflict serializable
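The check above can be sketched in Python: build the precedence graph from a schedule of (transaction, operation, item) triples and look for a cycle with DFS (the encoding and helper names are illustrative):

```python
# Sketch: build a precedence graph from a schedule and test conflict
# serializability by checking the graph for cycles.
def precedence_graph(schedule):
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op1, op2):
                edges.add((ti, tj))    # Ti's op conflicts with Tj's later op
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()
    def dfs(n):
        if n in visiting:
            return True                # back edge -> cycle
        if n in done:
            return False
        visiting.add(n)
        if any(dfs(m) for m in graph.get(n, ())):
            return True
        visiting.discard(n)
        done.add(n)
        return False
    return any(dfs(n) for n in list(graph))

s = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(has_cycle(precedence_graph(s)))   # True -> not conflict serializable
```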
Q4 Draw Transaction State diagram and describe each
state that a transaction goes through during its
execution.
🔹 Definition:
A transaction in a database system goes through various states from
start to completion.
Understanding these states helps in managing commit, rollback,
crashes, and concurrency effectively.
✅ Transaction States and Diagram
🎯 Diagram: Active → Partially Committed → Committed; Active / Partially Committed → Failed → Aborted
✅ Explanation of Each State
🔸 1. Active
The transaction starts and begins executing its operations like read,
write, etc.
It remains active until:
o It finishes execution normally
o Or an error/failure occurs
Example: Deducting ₹1000 from account A (still running)
🔸 2. Partially Committed
The transaction has executed its final operation and is ready to
commit, but changes may not yet be permanent.
Still vulnerable to failure at this point.
Example: All operations done, preparing to save permanently.
🔸 3. Committed
All operations are successfully completed.
All changes are permanently recorded in the database.
Transaction cannot be undone.
Example: ₹1000 deducted and ₹1000 added to another account —
permanently saved.
🔸 4. Failed
If any error or crash occurs during execution or just before commit,
the transaction enters failed state.
All changes made must be undone.
Example: System crash during money transfer.
🔸 5. Aborted
All changes made by the transaction are rolled back.
The database is restored to the state before the transaction
began.
Sometimes, the transaction may be restarted.
✅ Summary Table
State                  Description
Active                 Transaction is executing operations
Partially Committed    All operations done; waiting to commit
Committed              Transaction successfully completed and saved
Failed                 Error occurred; transaction cannot continue
Aborted                Changes undone; transaction rolled back
Q5. What is deadlock? When does it occur? How is it
detected in database system
🔹 Definition:
A deadlock is a situation in a database system where two or more
transactions are waiting for each other to release locks, but none
of them ever proceeds.
As a result, all involved transactions remain stuck indefinitely.
🔹 Example:
T1 locks data item A and needs B
T2 locks data item B and needs A
Now both transactions are waiting on each other, creating a deadlock.
✅ When Does Deadlock Occur?
Deadlocks occur in a database system when the following four
conditions hold simultaneously (Coffman Conditions):
Condition             Description
1. Mutual Exclusion   At least one resource (data item) is held in a non-shareable mode.
2. Hold and Wait      A transaction holds at least one lock and is waiting for more locks.
3. No Preemption      Resources cannot be forcibly taken from a transaction; it must release them voluntarily.
4. Circular Wait      There exists a cycle in the wait-for graph (T1 waits for T2, T2 waits for T1, etc.).
✅ When all four conditions are true, deadlock can occur.
✅ Deadlock Detection in Database Systems
🔹 1. Wait-For Graph (WFG) Method
A directed graph is used where:
o Nodes = Transactions (T1, T2, T3, …)
o Edges = T1 → T2 if T1 is waiting for a lock held by T2
✅ How It Works:
The system periodically builds the wait-for graph.
If there is a cycle in the graph → Deadlock exists
🔸 Example:
Let:
T1 waits for T2
T2 waits for T3
T3 waits for T1
Graph:
T1 → T2 → T3 → T1
✅ Cycle detected ⇒ Deadlock Present
🔹 2. Detection Algorithm Steps
1. Maintain a lock table with info on held and requested locks.
2. Construct the wait-for graph from this table.
3. Use cycle detection algorithms (like DFS) to find cycles.
4. If a cycle exists, deadlock is detected.
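The detection steps can be sketched as follows, assuming a simplified lock table that records which transactions hold and which are waiting for each item (all names are illustrative):

```python
# Sketch: derive a wait-for graph from a lock table and detect deadlock
# as a cycle; a real DBMS would run such a check periodically.
def wait_for_graph(lock_table):
    # lock_table: item -> {"held_by": [...], "waiting": [...]}
    edges = set()
    for info in lock_table.values():
        for waiter in info["waiting"]:
            for holder in info["held_by"]:
                if waiter != holder:
                    edges.add((waiter, holder))   # waiter waits for holder
    return edges

def deadlocked(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()
    def dfs(n):
        if n in visiting:
            return True                # cycle in the wait-for graph
        if n in done:
            return False
        visiting.add(n)
        if any(dfs(m) for m in graph.get(n, ())):
            return True
        visiting.discard(n)
        done.add(n)
        return False
    return any(dfs(n) for n in list(graph))

locks = {
    "A": {"held_by": ["T1"], "waiting": ["T2"]},   # T2 waits for T1
    "B": {"held_by": ["T2"], "waiting": ["T1"]},   # T1 waits for T2
}
print(deadlocked(wait_for_graph(locks)))   # True -> deadlock present
```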
🔹 3. Deadlock Resolution
Once deadlock is detected:
Choose a victim transaction to abort (based on priority, cost, etc.)
Rollback that transaction
Release its locks
Allow other transactions to proceed
✅ Deadlock vs Starvation
Deadlock                         Starvation
All transactions wait forever    Some transactions wait indefinitely while others proceed
Caused by circular wait          Caused by unfair scheduling
Needs detection/resolution       Needs priority-based scheduling
Q6 Explain Lock based protocol for Concurrency Control.
🔹 Definition:
Lock-based protocols are a type of concurrency control mechanism
in DBMS that use locks to manage simultaneous access to data items
by multiple transactions, ensuring consistency and isolation.
🔹 Types of Locks:
1. Shared Lock (S-lock):
o Used for read operations.
o Multiple transactions can hold S-lock on the same item.
2. Exclusive Lock (X-lock):
o Used for write operations.
o Only one transaction can hold X-lock on a data item.
o No other transaction can read or write the item during this
time.
🔹 Two-Phase Locking Protocol (2PL):
A widely used lock-based protocol.
✅ Phases:
1. Growing Phase:
o Transaction may acquire locks.
o No locks can be released.
2. Shrinking Phase:
o Transaction may release locks.
o No new locks can be acquired.
⚠️ Once a transaction starts releasing locks, it cannot acquire any new locks.
🔹 Example:
T1:
Lock-S(A)
Read(A)
Lock-X(A) (upgrade from shared to exclusive)
Write(A)
Unlock(A)
T2:
Waits until T1 releases the lock on A to proceed
✅ Ensures serializability
🔹 Strict Two-Phase Locking (Strict 2PL):
A variation where all exclusive locks are held until
commit/abort.
Prevents cascading rollbacks.
✅ Advantages:
Ensures conflict-serializable schedules.
Strict 2PL ensures recoverable schedules.
✅ Disadvantages:
May cause deadlocks.
Overhead in managing locks.
B) Discuss in details about 2PL. List and Explain types of
2PL
🔹 Definition:
Two-Phase Locking (2PL) is a concurrency control protocol used in
DBMS to ensure serializability of transactions.
It divides the execution of a transaction into two distinct phases:
✅ Phases of 2PL:
1. Growing Phase
o The transaction acquires all the required locks (shared or
exclusive).
o No lock is released in this phase.
2. Shrinking Phase
o After acquiring all locks, the transaction starts releasing
them.
o No new locks can be acquired.
🔸 Why Use 2PL?
It guarantees conflict serializability, which ensures correct
execution of concurrent transactions.
✅ Types of Two-Phase Locking (2PL):
🔹 1. Basic 2PL
Follows the basic rule:
o All locks acquired in growing phase
o All locks released in shrinking phase
Does not prevent cascading rollbacks
🔹 2. Conservative 2PL (Static 2PL)
All required locks are requested at the beginning of the
transaction.
If all locks cannot be granted, the transaction waits (does not
proceed).
Avoids deadlocks, but may cause delays or reduced
concurrency.
🔹 3. Strict 2PL
Exclusive (write) locks are held until the transaction commits
or aborts.
Prevents cascading rollbacks.
Ensures recoverability of transactions.
🔹 4. Rigorous 2PL
All locks (both shared and exclusive) are held until
commit/abort.
Strongest form; ensures strictest isolation.
✅ Example:
Transaction T1:
Lock-S(A)
Read A
Lock-X(B)
Write B
Unlock A, Unlock B
This follows growing phase (acquire locks), then shrinking phase
(release locks).
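The two-phase rule itself is easy to enforce mechanically; this sketch (the `TwoPhaseTxn` class is illustrative, not a full lock manager) rejects any lock request made after the first unlock:

```python
# Sketch of the two-phase rule: a transaction refuses new lock requests
# once it has released any lock (growing phase -> shrinking phase).
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True          # entering the shrinking phase
        self.locks.discard(item)

t1 = TwoPhaseTxn()
t1.lock("A"); t1.lock("B")   # growing phase
t1.unlock("A")               # shrinking phase begins
try:
    t1.lock("C")             # disallowed under 2PL
except RuntimeError as e:
    print(e)                 # 2PL violation: lock after unlock
```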
Q7 Discuss in detail about Conflict Serializability with
the help of suitable example.
🔹 Definition:
A schedule is said to be Conflict Serializable if it can be transformed
into a serial schedule (i.e., one transaction after another) by swapping
non-conflicting operations.
Conflict serializability ensures that concurrent execution of transactions is
correct and safe, giving the same result as some serial execution.
✅ Conflicting Operations:
Two operations conflict if:
1. They belong to different transactions,
2. They access the same data item, and
3. At least one of them is a write operation.
✅ Types of Conflicts:
Operation 1   Operation 2   Conflict?   Example
Read          Read          No          Read A, Read A
Read          Write         Yes         Read A, Write A
Write         Read          Yes         Write A, Read A
Write         Write         Yes         Write A, Write A
✅ Example: Conflict Serializable Schedule
Let’s take two transactions:
T1:
R1(A)
W1(A)
T2:
R2(A)
W2(A)
🔸 Schedule S:
R1(A), R2(A), W1(A), W2(A)
🔸 Step 1: Build Precedence Graph (Serialization Graph)
Nodes: T1, T2
Edges:
o R1(A) before W2(A) ⇒ T1 → T2
o R2(A) before W1(A) ⇒ T2 → T1
So, Graph:
T1 → T2
T2 → T1
⚠️Cycle exists ⇒ ❌ Not conflict serializable
✅ Now consider another schedule:
R1(A), W1(A), R2(A), W2(A)
Conflicts:
R1(A) before R2(A) – No conflict
W1(A) before R2(A) ⇒ T1 → T2
W1(A) before W2(A) ⇒ T1 → T2
No cycle in the graph (T1 → T2 only) ✅
✔️This schedule is conflict serializable, equivalent to serial order T1 →
T2
B) Write a short note on Validation Based Protocol
🔹 Definition:
The Validation-Based Protocol, also known as Optimistic
Concurrency Control (OCC), is a concurrency control technique in which
transactions execute without any locking, assuming that conflicts are
rare.
Instead of using locks, it uses a validation phase before commit to
ensure correctness.
✅ Phases of Validation-Based Protocol:
Each transaction goes through three phases:
🔸 1. Read Phase
The transaction reads data items from the database and stores
updates in local variables (workspace).
No changes are made to the actual database yet.
🔸 2. Validation Phase
Before committing, the system validates the transaction to ensure
no conflicts with other concurrent transactions.
It checks whether the transaction could have executed serially with
respect to other committed transactions.
🔸 3. Write Phase
If validation is successful, all changes are written to the
database.
If validation fails, the transaction is aborted and restarted.
✅ Validation Check Criteria (Basic Validation Rule):
Let Ti be the current transaction and Tj be a committed transaction:
If Tj completes before Ti's read phase ends, no conflict
If Tj's write set overlaps with Ti’s read set, and their times
overlap, then conflict → Ti is aborted
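A minimal sketch of this basic validation test (the `validate` signature and the (finish_time, write_set) encoding of committed transactions are illustrative):

```python
# Sketch of the basic optimistic validation test: Ti passes if every
# committed Tj either finished before Ti started reading, or Tj's write
# set does not intersect Ti's read set.
def validate(ti_read_set, ti_start, committed):
    # committed: list of (finish_time, write_set) for earlier transactions
    for finish_time, write_set in committed:
        if finish_time <= ti_start:
            continue                    # Tj finished before Ti began: safe
        if write_set & ti_read_set:
            return False                # overlap in time and data: abort Ti
    return True

committed = [(5, {"B"}), (12, {"A"})]   # (finish timestamp, write set)
print(validate({"A"}, ti_start=10, committed=committed))   # False -> abort
print(validate({"C"}, ti_start=10, committed=committed))   # True  -> commit
```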
✅ Advantages:
No locking → better concurrency
Deadlock-free
Useful for systems where conflicts are rare, like read-mostly
databases
✅ Disadvantages:
May cause more aborts if many transactions conflict
Overhead in validation phase
✅ Example:
T1 and T2 both read item A
T1 finishes first and writes A
Before T2 commits, it validates and sees that T1 wrote A, which T2 read
⇒ Conflict → T2 is aborted
UNIT-5
Q1. What is Data storage? Explain Magnetic Disk and its
types.
🔹 Definition:
Data storage refers to the method of saving and retrieving digital
data on physical or electronic media. It is an essential component of
computer systems, including databases, to persistently store data for
processing, backup, and retrieval.
Data can be stored on various devices like:
Magnetic Disks (e.g., Hard Disks)
Solid-State Drives (SSD)
Optical Disks (CD/DVD)
Flash memory (USB drives)
✅ Magnetic Disk – Explanation
🔹 Definition:
A magnetic disk is a non-volatile secondary storage device that
stores data by magnetizing spots on a rotating platter.
It is the most commonly used storage medium for databases and file
systems.
✅ Structure of a Magnetic Disk:
1. Platter: A circular disk coated with magnetic material.
2. Spindle: Rotates the platters.
3. Read/Write Head: Positioned on an arm that moves across the
disk surface.
4. Tracks: Concentric circles on the platter where data is stored.
5. Sectors: Each track is divided into sectors (smallest storage unit).
6. Cylinders: Set of tracks vertically aligned across platters.
✅ How Data Is Stored:
Data is written to the disk using a magnetic head that changes the
magnetic polarity of the disk surface.
The disk spins at a constant speed (measured in RPM), and the head
reads or writes data as the sectors pass under it.
✅ Types of Magnetic Disks:
🔸 1. Hard Disk Drive (HDD)
Most common type of magnetic disk.
Consists of multiple platters rotating at high speed (5400–7200 RPM
or more).
Large storage capacity (up to several TB).
Used in desktops, servers, and database systems.
Advantages:
High capacity
Cost-effective per GB
Disadvantages:
Slower than SSDs
Mechanical parts → prone to wear
🔸 2. Floppy Disk
Older, portable magnetic disk with small storage (1.44 MB).
Now obsolete.
Used in early personal computers.
Advantages: Portable and easy to use (in older systems)
Disadvantages: Very low capacity and slow
🔸 3. Zip Disk
Developed by Iomega; used magnetic disk technology.
Higher capacity than floppy disks (100MB to 750MB).
Rarely used today.
🔸 4. Magnetic Tape (for completeness)
Though not a disk, it's a magnetic storage medium.
Used for sequential access and data backup.
Slow but cost-effective for archival storage.
✅ Advantages of Magnetic Disks:
Random access capability
Large capacity
Non-volatile
Inexpensive for long-term storage
✅ Disadvantages:
Mechanical parts → can fail
Slower than flash-based storage (SSDs)
Susceptible to magnetic interference
Q2. Explain different types of file organizations with the
operations performed on files.
🔹 What is File Organization?
File organization refers to the way records are stored and arranged in a
file on secondary storage (like disk).
The method affects the efficiency of data access, insertion, deletion, and
updating.
✅ Types of File Organization
🔸 1. Sequential File Organization
Records are stored one after the other in sorted order (usually
by key).
Access is done sequentially from the beginning.
Operations:
Insertion: Costly (needs reshuffling to maintain order)
Search: Slow unless reading sequentially
Deletion/Update: Involves shifting records or marking them
Example: Payroll processing, where records are accessed in order.
🔸 2. Heap (Unordered) File Organization
Records are stored in the order they arrive.
No sorting or specific order is maintained.
Operations:
Insertion: Fast (just add at the end)
Search: Slow (need to scan entire file)
Deletion: Mark record as deleted
Update: Overwrite or mark and insert new record
Use Case: Temporary data, logs, etc.
🔸 3. Hashed File Organization
Uses a hash function on the key to compute the location
(bucket) where the record should be stored.
Operations:
Insertion: Apply hash → store in the bucket
Search: Fast, O(1) average
Deletion/Update: Easy using hash key
Handles collisions using techniques like chaining
Use Case: Fast lookup, like student ID or product code.
🔸 4. Indexed File Organization
Uses an index (similar to a book index) that holds keys and pointers
to records.
Types:
Single-level index
Multi-level index
Dense or sparse index
Operations:
Search: Fast via index
Insertion: Add record and update index
Deletion/Update: Modify both data and index
Use Case: Libraries, search engines, databases
🔸 5. Clustered File Organization
Records with the same or related keys are stored physically
close on disk.
Reduces access time for grouped data.
Operations:
Search and access for related records is optimized
Use Case: Situations needing multi-key retrieval or joins.
✅ File Operations in DBMS
Operation   Description
Create      Creates a new file and allocates space
Read        Fetches data from the file
Write       Adds or updates records
Update      Modifies the content of existing records
Delete      Removes a record or marks it deleted
Search      Locates specific records by key or value
Q3. Explain physical storage media with example.
🔹 Definition:
Physical storage media refers to the actual hardware devices where
data is stored permanently or temporarily in a computer system or
database. It includes all electronic, magnetic, and optical devices
that hold data physically.
✅ Categories of Physical Storage Media
Storage media can be classified based on speed, volatility, and
durability into the following types:
🔸 1. Primary Storage (Main Memory)
Also known as volatile memory
Temporarily holds data that is currently being used by the system
Data is lost when power is off
✅ Examples:
RAM (Random Access Memory): Fast access, used by active
programs
Cache memory: Small, very fast memory close to the CPU
✅ Characteristics:
Very fast but expensive
Limited capacity
Volatile (data not retained on power-off)
🔸 2. Secondary Storage
Non-volatile storage used to store data permanently
Used for large-scale data like files, databases, OS
✅ Examples:
Hard Disk Drives (HDD)
Solid-State Drives (SSD)
✅ Characteristics:
Larger capacity than RAM
Slower than primary memory
Cheaper and persistent
🔸 3. Tertiary Storage
Used for backup and archival purposes
Typically removable and very large capacity, but slow access
✅ Examples:
Magnetic tapes
Optical disks (CDs, DVDs)
Cloud/External backup systems
✅ Characteristics:
Inexpensive per GB
High latency (slow access)
Mostly used for long-term storage
🔸 4. Off-line Storage
Storage media that is not always connected to the computer
Must be manually inserted or connected
✅ Examples:
USB flash drives
External hard drives
Removable magnetic tapes
✅ Characteristics:
Portable
Good for data transfer between systems
Can be lost or damaged physically
✅ Comparison Table:
Storage Type              Volatile   Speed         Cost       Capacity    Usage
Primary (RAM)             Yes        Very Fast     High       Low         Temporary program data
Secondary (HDD/SSD)       No         Moderate      Moderate   High       Databases, files
Tertiary (Tape/DVD)       No         Slow          Low        Very High   Backup/archive
Off-line (USB, Ext HDD)   No         Medium–High   Varies     Varies      Portability, backup
✅ Example Scenario in DBMS:
A student record being processed is loaded into RAM
The database file containing all students is stored on the hard
disk
Periodic backups of student records are saved on magnetic tape
External USB is used to export a report file for transfer
Q4 Explain in detail about Extensible Hashing scheme
🔹 Definition:
Extensible Hashing is a dynamic hashing technique used in
databases to handle the problem of bucket overflow efficiently. It
automatically grows or shrinks the hash structure depending on the
number of records, unlike static hashing which uses fixed-size hash tables.
It is widely used in dynamic file systems and indexing to ensure fast
and scalable access.
✅ Key Concepts:
🔸 1. Hash Function (h(k)):
Maps a key k to a binary value.
Only a prefix of the hash value is used (number of bits is controlled
by a global depth d).
🔸 2. Directory:
An array of pointers (like a table) that points to buckets.
Directory size = 2^d where d = global depth.
🔸 3. Bucket:
Stores actual data entries (records).
Each bucket has a local depth (ld) — number of hash bits used to
identify it.
✅ How Insertion Works:
1. Apply the hash function to get a binary key.
2. Use the first d bits to index into the directory.
3. Place the record in the pointed bucket.
4. If the bucket is full:
o If local depth < global depth: Split the bucket
o If local depth == global depth: Double the directory size
and increase global depth
🔸 Example:
Let’s say:
d = 2 ⇒ Directory size = 4 entries: 00, 01, 10, 11
Hash of key 25 = 11001... ⇒ Use first 2 bits → 11
Insert into bucket at index 11
If bucket overflows:
Split bucket and redistribute based on ld + 1 bits
Update pointers in directory
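A compact, runnable sketch of the scheme (bucket capacity 2, directory indexed by the low-order global-depth bits of Python's built-in hash; all class and method names are illustrative):

```python
# Sketch of extensible hashing: the directory doubles only when a full
# bucket's local depth equals the global depth; otherwise only the
# overflowing bucket is split.
class Bucket:
    def __init__(self, depth):
        self.depth = depth             # local depth
        self.items = {}

class ExtHash:
    def __init__(self):
        self.global_depth = 1
        self.dir = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key, value):
        b = self.dir[self._index(key)]
        if key in b.items or len(b.items) < 2:
            b.items[key] = value
            return
        if b.depth == self.global_depth:   # must double the directory
            self.dir += self.dir
            self.global_depth += 1
        self._split(b)
        self.insert(key, value)            # retry after the split

    def _split(self, b):
        b.depth += 1
        new = Bucket(b.depth)
        old_items, b.items = b.items, {}
        high_bit = 1 << (b.depth - 1)
        for i, ptr in enumerate(self.dir):  # repoint half of the entries
            if ptr is b and i & high_bit:
                self.dir[i] = new
        for k, v in old_items.items():      # redistribute on ld bits
            self.dir[self._index(k)].items[k] = v

    def get(self, key):
        return self.dir[self._index(key)].items.get(key)

h = ExtHash()
for k in range(8):
    h.insert(k, f"rec{k}")
print(h.global_depth, h.get(5))   # 2 rec5
```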
✅ Advantages of Extensible Hashing:
Grows/shrinks dynamically as needed
Reduces overflow chains seen in static hashing
Maintains constant-time access in average cases
Efficient use of space and time
✅ Disadvantages:
More complex implementation
Directory doubling may consume more space temporarily
(b) Compare Static Hashing and Dynamic Hashing
Feature                     Static Hashing                                           Dynamic Hashing
1. Hash Table Size          Fixed size; determined at the time of creation           Grows and shrinks dynamically based on data
2. Flexibility              Less flexible; overflows lead to overflow chains         More flexible; adjusts to the number of records
3. Overflow Handling        Uses overflow buckets or chaining                        Uses directory and bucket splitting (e.g., extensible hashing)
4. Performance              May degrade as overflow increases                        Maintains consistent performance even as data grows
5. Storage Utilization      May waste space if too large; collisions if too small   Efficient storage use due to dynamic resizing
6. Directory                No directory; only fixed buckets                         Uses a directory to manage the growing hash structure
7. Ease of Implementation   Simple to implement                                      More complex due to directory management
Q5. Explain in detail about Internal and External
Hashing Techniques
Aspect              Internal Hashing                                         External Hashing
Definition          Hashing used when data is stored in main memory (RAM)    Hashing used when data is stored on disk (secondary storage)
Usage               Mainly for in-memory data structures like hash tables    Used in database systems, files, and large datasets
Storage Medium      Works entirely in primary memory                         Involves disk blocks/buckets on external storage
Bucket Size         Typically small, fixed-size arrays                       Buckets may span entire disk blocks and vary dynamically
Overflow Handling   Uses open addressing or chaining                         Uses overflow blocks, chaining, or directory doubling
Performance         Very fast due to memory speed                            Slower due to disk I/O latency
Examples            Hash table in a program storing keys                     Static Hashing, Extensible Hashing, Linear Hashing in DBMS
Flexibility         Less flexible; requires a predefined size                More flexible; supports dynamic file growth
B) What is Hash based Indexing. Explain with an
example.
🔹 Definition:
Hash-based indexing is a database indexing technique that uses a
hash function to map search key values to a location or bucket in the
index.
This method provides efficient access to data for equality-based
searches (e.g., WHERE ID = 101).
✅ How It Works:
1. A hash function h(key) is applied to the search key.
2. The result of the hash function determines the bucket where the
record is stored.
3. The bucket contains either the actual data or a pointer to the
data.
✅ Structure:
Hash Table: An array of buckets.
Buckets: Contain the records or pointers to the records.
Handles collisions using overflow buckets or chaining.
✅ Example:
Let’s say we are storing student records with Student_ID as the key.
Student_ID   Name    Branch
101          Alice   CSE
104          Bob     ECE
108          Clara   ME
Assume a simple hash function:
h(Student_ID) = Student_ID % 5
Then:
h(101) = 101 % 5 = 1 → Bucket 1 → stores record for ID 101
h(104) = 104 % 5 = 4 → Bucket 4 → stores record for ID 104
h(108) = 108 % 5 = 3 → Bucket 3 → stores record for ID 108
To find Student_ID = 104, we:
1. Apply the hash → h(104) = 4
2. Go to Bucket 4
3. Fetch the record quickly → O(1) access time
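The lookup above can be sketched directly (bucket list and helper names are illustrative; collisions are handled by chaining within a bucket):

```python
# Sketch of a hash index with h(id) = id % 5 and chaining for collisions,
# mirroring the student-record example above.
NUM_BUCKETS = 5
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(record):
    buckets[record["Student_ID"] % NUM_BUCKETS].append(record)

def lookup(student_id):
    for record in buckets[student_id % NUM_BUCKETS]:  # scan one bucket only
        if record["Student_ID"] == student_id:
            return record
    return None

for r in [{"Student_ID": 101, "Name": "Alice", "Branch": "CSE"},
          {"Student_ID": 104, "Name": "Bob", "Branch": "ECE"},
          {"Student_ID": 108, "Name": "Clara", "Branch": "ME"}]:
    insert(r)

print(lookup(104)["Name"])   # Bob
```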
✅ Advantages:
Very fast for exact match queries (e.g., WHERE ID = 104)
Simple implementation
Constant-time (average-case) access
✅ Disadvantages:
Not suitable for range queries (e.g., ID > 100)
Collisions may reduce efficiency
Difficult to maintain with large or growing datasets unless using
dynamic hashing
Q6. Explain different types of Indexing.
✅ (a) Different Types of Indexing in DBMS
Indexing is a technique used in databases to speed up data retrieval by
creating pointers to data records based on key fields. Different types of
indexing are used depending on data access patterns and requirements.
✅ 1. Primary Index (Clustering Index)
Created on a primary key or unique field.
Records in the data file are physically ordered based on the index
key.
One index entry per block.
Example: Index on Student_ID in a sorted student table.
✅ 2. Secondary Index (Non-clustering Index)
Created on non-primary key fields.
Records are not sorted based on this field.
May have multiple entries pointing to same data block.
Example: Index on Department in the student table.
✅ 3. Dense Index
Contains one index entry for every search key in the data file.
More space needed, but faster lookup.
Example: Every student name is indexed individually.
✅ 4. Sparse Index
Contains entries for only some records (usually one per block).
Requires less space but extra lookup steps.
Example: Index contains only the first key of each data block.
✅ 5. Multilevel Index
When the primary or secondary index becomes large, another
index is built on top of it.
Forms a hierarchy of indexes, improving search performance.
Example: B-tree index with multiple levels.
✅ 6. Clustered and Non-Clustered Index
Clustered Index: Records are physically ordered to match the
index.
Non-Clustered Index: Index has pointers to records stored in
random order.
(b)Difference between B and B+ Indexing.
Feature               B-Tree Indexing                                        B+ Tree Indexing
Data Storage          Keys and data are stored in internal and leaf nodes    Only leaf nodes store the actual data/records
Leaf Node Structure   Leaf nodes are not linked                              Leaf nodes are linked using pointers (linked list)
Access Type           Suitable for both exact match and range queries        Optimized for range and sorted queries
Traversal Time        May need to traverse internal and leaf nodes           Always traverses down to the leaf nodes only
Search Efficiency     Slightly slower due to search at multiple levels       Faster due to a uniform search path to the data
Redundancy            No duplication of keys                                 Internal nodes store keys only, not actual data
Used In               Older indexing systems                                 Modern DBMSs such as MySQL and Oracle (for indexing)