0% found this document useful (0 votes)
28 views73 pages

Dbms See Laqs

The document discusses relational algebra operations, tuple relational calculus, and set operations in relational databases. It explains various relational algebra operations like selection, projection, union, and joins, along with examples. Additionally, it covers the expressive power of relational algebra and calculus, domain relational calculus, and pitfalls in relational database design.

Uploaded by

Nawaz Khadar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
28 views73 pages

Dbms See Laqs

The document discusses relational algebra operations, tuple relational calculus, and set operations in relational databases. It explains various relational algebra operations like selection, projection, union, and joins, along with examples. Additionally, it covers the expressive power of relational algebra and calculus, domain relational calculus, and pitfalls in relational database design.

Uploaded by

Nawaz Khadar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Q1 Discuss relational algebra operations with one

example each.(12 marks)

Relational Algebra Operations

Relational Algebra is a procedural query language used to query data from


relational databases. It uses operators to perform queries, where each
operator takes one or more relations as input and produces a new relation
as output.

Relational Algebra operations are classified into:

1. Selection (σ)

Definition: Selects rows (tuples) from a relation that satisfy a given


condition.
Notation: σcondition(Relation)

Example:
Let Employee(EID, Name, Dept, Salary) be a relation.
Query: Find employees who work in the 'HR' department.
Relational Algebra:
σDept='HR'(Employee)

2. Projection (π)

Definition: Extracts specific columns (attributes) from a relation.


Notation: πattribute_list(Relation)

Example:
Query: Get the names of all employees.
Relational Algebra:
πName(Employee)

3. Union ( ∪ )

Definition: Combines tuples from two relations, removing duplicates.

Notation: R ∪ S
Both relations must be union-compatible (same attributes and domains).

Example:
Let A and B be relations with the same schema.
Query: List all employees from both branches.

A∪B
Relational Algebra:

4. Set Difference ( − )

Definition: Returns tuples that are in the first relation but not in the
second.
Notation: R − S

Example:
Query: Employees in relation A but not in B.
Relational Algebra:
A−B

5. Cartesian Product ( × )

Definition: Combines each tuple of the first relation with every tuple of
the second.
Notation: R × S

Example:
Let Employee and Department be two relations.
Query: Get all combinations of employees and departments.
Relational Algebra:
Employee × Department

6. Rename (ρ)

Definition: Renames the output relation or its attributes.


Notation: ρnew_name(attribute1, attribute2,...)(Relation)

Example:
Rename Employee relation as Staff.
Relational Algebra:
ρStaff(Employee)

7. Intersection ( ∩ )

Definition: Returns tuples that are common in both relations.


Notation: R ∩ S
Example:
Query: Employees present in both A and B.
Relational Algebra:
A∩B

8. Join (⨝)

Definition: Combines related tuples from two relations based on a


common attribute.
Types: Natural Join, Theta Join, Equi Join

Example (Natural Join):


Let Employee(EID, Name, DeptID) and Department(DeptID, DeptName)
Query: Get employee names along with department names.

Employee ⨝ Department
Relational Algebra:

Q2. Explain Tuple Relational Calculus (TRC) with suitable


example

Tuple Relational Calculus (TRC)

Definition:
Tuple Relational Calculus is a non-procedural query language used to
describe the desired information without specifying the sequence of
operations to retrieve it. It uses tuple variables to represent rows of a
relation.

General Syntax:

{ t | P(t) }

 t is a tuple variable.

 P(t) is a predicate (condition) that describes t.

Example:
Let’s consider a relation:
Employee(EID, Name, Dept, Salary)

Query:
Find the names of all employees who work in the 'HR' department.

TRC Expression:

{ t.Name | Employee(t) ∧ t.Dept = 'HR' }

Explanation:

 t is a tuple from the relation Employee.

 The condition t.Dept = 'HR' selects only those tuples where the
department is HR.

 The result will contain the Name of those employees.

Features:

 It focuses on what to retrieve, not how.

 Based on first-order predicate logic.

 More declarative and expressive than relational algebra.

B) Describe expressive power of algebra and calculus

Expressive Power of Algebra and Calculus

Definition:
The expressive power of a query language refers to the range or types
of queries that can be expressed using that language.

1. Relational Algebra

 A procedural query language.

 Uses a set of operations (select, project, join, union, etc.) to


manipulate relations.

 Specifies how to obtain the result step by step.


2. Relational Calculus

 A non-procedural (declarative) query language.

 Specifies what result is desired without specifying the procedure.

 Includes:

o Tuple Relational Calculus (TRC)

o Domain Relational Calculus (DRC)

Comparison of Expressive Power

Relational Relational
Feature
Algebra Calculus

Query Type Procedural Non-procedural

Syntax Operator-based Predicate logic

Approach How to retrieve What to retrieve

Expressivene
Same Same
ss

Important Point:

Relational Algebra and Relational Calculus are equivalent in


expressive power.
Any query that can be expressed in one can also be expressed in the
other.

Q3. What are set operators? Explain various set


operations that can be performed using a query

Set Operators in DBMS

Definition:
Set operators are used in relational algebra and SQL to perform
operations between two or more relations (tables) that have the same
number and type of attributes (i.e., they must be union-compatible).
Common Set Operations

1. UNION ( ∪ )

 Combines all tuples from two relations and removes duplicates.

 Both relations must be union-compatible.

R∪S
Relational Algebra:

SQL:

SELECT * FROM R

UNION

SELECT * FROM S;

Example: Get names of all employees from both Branch1 and Branch2.

2. INTERSECTION ( ∩ )

 Returns only the tuples that are common to both relations.

Relational Algebra:
R∩S

SQL:

SELECT * FROM R

INTERSECT

SELECT * FROM S;

Example: Get employees who work in both ProjectA and ProjectB.

3. SET DIFFERENCE ( − )

 Returns tuples that are in the first relation but not in the second.

Relational Algebra:
R−S

SQL (in most databases, written as EXCEPT):

SELECT * FROM R

EXCEPT
SELECT * FROM S;

Example: Get employees in DeptA but not in DeptB.

4. CARTESIAN PRODUCT ( × )

 Returns the cross-product of two relations.

 Each tuple in the first relation is paired with every tuple in the
second.

Relational Algebra:
R×S

Example: Pair each employee with every department (used in joins).

Conditions for Set Operations

 Union-compatibility:

o Same number of columns

o Same data types and order

B) Explain in detail the select, project operations in


relational algebra with an example?

1. SELECT Operation (σ)

Definition:

The SELECT operation is used to filter rows (tuples) from a relation that
satisfy a given predicate (condition).

Notation:

σ<condition>(Relation)

Purpose:

 To extract specific rows based on a condition.

 It is horizontal filtering (filters rows).

Example:
Relation: Employee

EI Dep Salar
Name
D t y

5000
1 Alice HR
0

6000
2 Bob IT
0

Charli 5500
3 HR
e 0

Query: Find all employees working in the HR department.

Relational Algebra:

σ<Dept = 'HR'>(Employee)

Result:

EI Dep Salar
Name
D t y

5000
1 Alice HR
0

Charli 5500
3 HR
e 0

2. PROJECT Operation (π)

Definition:

The PROJECT operation is used to retrieve specific columns


(attributes) from a relation.

Notation:

π<attribute_list>(Relation)

Purpose:

 To extract specific columns only.

 It is vertical filtering (filters columns).

 Removes duplicate values automatically.

Example:
Query: Get the names of all employees.

Relational Algebra:

π<Name>(Employee)

Result:

Name

Alice

Bob

Charli
e

Key Differences Between SELECT and PROJECT:

Feature SELECT (σ) PROJECT (π)

Columns
Operates On Rows (tuples)
(attributes)

Based on Based on attribute


Filters
condition list

Removes
No Yes
Duplicates

Q4. Explain different Join operations in Relational


Algebra with suitable examples.

Definition:
A Join operation in relational algebra combines related tuples from two
relations based on a common attribute(s). It is a binary operation and
is used to extract meaningful data spread across multiple relations.

🔹 1. Theta Join (θ Join)

Notation:

R ⨝<θ> S
Definition:
Combines tuples from two relations R and S based on a general condition
θ (like =, ≠, <, >).

Example:
Relations:

 Employee(EID, Name, DeptID)

 Department(DeptID, DeptName)

Query: Get all employee and department pairs where Employee.DeptID =


Department.DeptID.

Relational Algebra:

Employee ⨝<Employee.DeptID = Department.DeptID> Department

🔹 2. Equi Join

Definition:
A Theta Join where the condition is based on equality (=) only.

Example:
Same as above:

Employee ⨝<Employee.DeptID = Department.DeptID> Department

Note: It results in duplicate attributes (both DeptID columns will be


present).

🔹 3. Natural Join (⋈)

Notation:

R⋈S

Definition:
A special type of Equi Join that automatically joins relations on all
attributes with the same name, and removes duplicate columns.

Example:

Employee ⋈ Department

Here, it joins on DeptID and removes one of the duplicate DeptID columns
in the result.
🔹 4. Outer Joins

Outer Joins preserve unmatched tuples from one or both relations by


filling with NULLs.

a) Left Outer Join (⟕)

Includes all tuples from the left relation (R), matched or not.

Example:

Employee ⟕ Department

b) Right Outer Join (⟖)

Includes all tuples from the right relation (S), matched or not.

Example:

Employee ⟖ Department

c) Full Outer Join (⟗)

Includes all tuples from both relations, matched or not.

Example:

Employee ⟗ Department

Note: Outer joins are not part of basic relational algebra but are
supported in extended relational algebra or SQL.

🔹 5. Semi Join

Notation:

R ▷ S (Left Semi Join)

S ◁ R (Right Semi Join)

Definition:
Returns only tuples from one relation (usually the left), where a
matching tuple exists in the other.

Example:
Get employees who work in a department.

Employee ▷ Department

🔹 6. Self Join
Definition:
A join of a relation with itself. Useful to compare rows within the same
relation.

Example:
Find pairs of employees who work in the same department.

E1(EID, Name, DeptID), E2(EID, Name, DeptID)

σ<E1.DeptID = E2.DeptID ∧ E1.EID ≠ E2.EID>(E1 × E2)

Q5. Write short notes on Domain Relational Calculus


(DRC) with suitable Example

✅ Definition:

Domain Relational Calculus (DRC) is a non-procedural query language


in relational database theory, where domain variables (variables that
take values from attribute domains) are used to specify queries.

Unlike relational algebra, which tells how to retrieve data, DRC specifies
what data is required.

✅ Syntax:

{ <x1, x2, ..., xn> | P(x1, x2, ..., xn) }

 x1, x2, ..., xn are domain variables.

logical operators (∧, ∨, ¬, ⇒).


 P(...) is a logical predicate (condition) involving variables and

 The result is a set of tuples that satisfy the predicate P.

✅ Example Relation:

Employee(EID, Name, Dept, Salary)

EI Nam Dep Salar


D e t y

5000
1 Alice HR
0

6000
2 Bob IT
0
EI Nam Dep Salar
D e t y

5500
3 Carol HR
0

✅ Example Query:

Q: Retrieve names of employees who work in the 'HR' department.

Domain Relational Calculus Expression:

{ <N> | ∃E ∃D ∃S (Employee(E, N, D, S) ∧ D = 'HR') }

Explanation:

 N is the domain variable for Name.

 The tuple <E, N, D, S> represents an entry in the Employee relation.

 The query finds all values of N such that there exists a tuple in
Employee where Dept = 'HR'.

✅ Components of DRC:

1. Domain Variables: Represent individual attribute values.

2. Logical Connectives: ∧ (AND), ∨ (OR), ¬ (NOT), ⇒ (IMPLIES)

3. Quantifiers:

o ∃ (Exists): There exists

o ∀ (For all): Universal quantifier

4. Predicate: Conditions using variables.

✅ Another Example:

Q: Get names and salaries of employees earning more than 55000.

DRC Expression:

{ <N, S> | ∃E ∃D (Employee(E, N, D, S) ∧ S > 55000) }

✅ Features of DRC:

 Declarative: Describes what result is desired.


 Based on first-order logic.

 Can express all queries expressible in relational algebra.

 Powerful but can be unsafe if queries return infinite results.

✅ Safe vs Unsafe Queries:

 Safe Query: Always returns a finite set of results.

 Unsafe Query: May return infinite results.

 In practice, only safe queries are allowed in systems.

✅ Comparison with Tuple Relational Calculus (TRC):

Feature TRC DRC

Tuple variables (entire Domain variables (individual


Uses
rows) values)

Output Tuples Tuples of values

Example Variable
t.Name <N>
Form

Style Closer to SQL More mathematical


UNIT-3

Q1. Explain Pitfalls of RDBD

✅ Pitfalls of Relational Database Design

While designing a relational database, certain design mistakes—called


pitfalls—can lead to problems like data inconsistency, redundancy, and
performance issues. These typically occur due to poor normalization or
lack of proper structure.

🔹 1. Data Redundancy

 Description: Same data is repeated in multiple places.

 Problem: Increases storage usage and makes updates harder.

 Example: Storing department name repeatedly with each employee


record.

🔹 2. Update Anomalies

 Description: Difficulty in updating data accurately due to


redundancy.

 Problem: Requires multiple updates; missing one causes


inconsistency.

 Example: Changing department name in one row but not in others.

🔹 3. Insertion Anomalies
 Description: Unable to insert data due to absence of other required
data.

 Problem: Forces unnecessary data entry.

 Example: Cannot insert a department unless an employee is also


added.

🔹 4. Deletion Anomalies

 Description: Deleting a record unintentionally removes useful


information.

 Problem: Loss of data that should have been retained.

 Example: Deleting last employee of a department also removes


department info.

🔹 5. Poor Normalization

 Description: Tables not designed following normalization rules


(1NF, 2NF, 3NF).

 Problem: Leads to all the above anomalies and performance issues.

🔹 6. Loss of Data Integrity

 Description: Violations in accuracy or consistency of data.

 Problem: May result from improper constraints or foreign key


misuse.

B) Write about four types of NoSQL Databases with


example.

✅ Types of NoSQL Databases with Examples

NoSQL databases are designed for scalable, flexible, and high-


performance data storage, especially useful for handling unstructured
or semi-structured data.

There are four main types of NoSQL databases:


🔹 1. Document-Oriented Databases

Description:

 Store data as documents (usually in JSON, BSON, or XML format).

 Each document can have a different structure.

 Ideal for storing complex, nested data.

Example:
🟢 MongoDB

"name": "Alice",

"age": 25,

"skills": ["Python", "SQL"]

Use Case: Content management systems, user profiles, catalogs.

🔹 2. Key-Value Stores

Description:

 Store data as key-value pairs.

 Simple and fast for lookup operations.

 Highly scalable.

Example:
🟢 Redis, Amazon DynamoDB

"user:1001" → {"name": "John", "email": "[email protected]"}

Use Case: Caching, session storage, user preferences.

🔹 3. Column-Family Stores

Description:

 Store data in columns instead of rows (opposite of RDBMS).

 Good for handling large volumes of data with high read/write


speeds.
Example:
🟢 Apache Cassandra, HBase

Row Key: user123

Name → Alice

Age → 25

City → Mumbai

Use Case: Real-time analytics, recommendation engines, time-series


data.

4. Graph Databases

Description:

 Store data as nodes (entities) and edges (relationships).

 Ideal for managing highly connected data.

Example:
🟢 Neo4j

(Alice) -[FRIEND]-> (Bob)

Use Case: Social networks, fraud detection, recommendation systems.

✅ Conclusion

Each type of NoSQL database is optimized for different use cases:

 Document → Flexible structure

 Key-Value → High performance

 Column → Big data analytics

 Graph → Relationship-heavy data

Choosing the right type depends on the application’s data structure and
performance needs.
Q2 Explain in details about the features of SQL and
NoSQL.

✅ Features of SQL vs NoSQL Databases

Feature SQL Databases NoSQL Databases

Relational (Tables with rows Non-relational (Document, Key-


Data Model
and columns) Value, Column, Graph)

Schema Fixed, predefined schema Dynamic or flexible schema

Query Structured Query Language No standard language (uses


Language (SQL) JSON-like or custom queries)

ACID Strong ACID properties for Follows BASE (Eventually


Compliance reliable transactions Consistent)

Vertical scaling (upgrading Horizontal scaling (adding more


Scalability
hardware) servers)

Suitable for big data,


Suitable for structured data
Use Cases unstructured, and real-time
and complex queries
apps

B) Differentiate SQL with NoSQL

✅ Difference Between SQL and NoSQL

SQL (Relational NoSQL (Non-Relational


Feature
Databases) Databases)

Relational (tables with Non-relational (Document, Key-


Data Model
rows and columns) Value, Column, Graph)

Fixed, predefined
Schema Dynamic, flexible schema
schema

Query SQL (Structured Query Varies (No standard, often JSON-


Language Language) like or API-based)

ACID Fully ACID compliant BASE model (Basically Available,


Compliance (Atomicity, Consistency,
SQL (Relational NoSQL (Non-Relational
Feature
Databases) Databases)

etc.) Soft state, Eventually consistent)

Vertical (scale-up by
Horizontal (scale-out by adding
Scalability upgrading server
more nodes/servers)
hardware)

Joins and Supports complex joins Limited or no join support; uses


Relations and relationships denormalized data

Structured data, Unstructured/semi-structured


Best For
transactional systems data, big data, real-time apps

MySQL, PostgreSQL, MongoDB, Cassandra, Redis,


Examples
Oracle, MS SQL Server Neo4j

✅ Conclusion:

 SQL is ideal for structured data and complex queries with strict
consistency.

 NoSQL is best for scalable, flexible, and high-performance


applications dealing with large or unstructured data.

Q3 Explain the following with examples. (a) Lossless Join


Decomposition. (b) Dependency Preserving
Decomposition.

(a) Lossless Join Decomposition (6 marks)

✅ Definition:

A Lossless Join Decomposition ensures that when a relation is


decomposed into two or more sub-relations, the natural join of
these sub-relations will result in exactly the original relation, with no
loss of information.

✅ Condition:

A decomposition of relation R into R1 and R2 is lossless if:


(R1 ∩ R2) → R1 OR (R1 ∩ R2) → R2

(i.e., the common attributes form a super key of at least one of the
decomposed relations)

✅ Example:

Let relation R(A, B, C) with functional dependency:

A→B

Decompose R into:

 R1(A, B)

 R2(A, C)

Common attribute = A, and since A → B, A is a super key of R1.


✅ Hence, decomposition is lossless.

✅ Verification by Natural Join:

Original R:

A BC

1xp

2yq

R1(A, B):

AB

1x

2y

R2(A, C):

AC

1p

2q

R1 ⨝ R2 = R ⇒ ✅ Lossless

✅ Conclusion:
Lossless join is essential to ensure that no information is lost when
relations are normalized and later rejoined.

(b) Dependency Preserving Decomposition (6 marks)

✅ Definition:

A decomposition is dependency preserving if all functional


dependencies (FDs) from the original relation can be enforced by
simply enforcing FDs on individual decomposed relations, without
needing to recompute joins.

✅ Why It Matters?

Dependency preservation is important to ensure:

 Data integrity

 Efficient FD enforcement without costly joins

✅ Example:

Let R(A, B, C)
With functional dependencies:

F = { A → B, B → C }

Decompose R into:

 R1(A, B)

 R2(B, C)

Now, check:

 In R1: A → B ✅

 In R2: B → C ✅

 Both FDs are preserved in the decomposition ⇒ ✅ Dependency


Preserving

✅ Counter Example:

Let F = { A → BC }
Decompose R into:
 R1(A, B)

 R2(A, C)

Now:

 R1 has A → B ✅

 R2 has A → C ✅
But A → BC is not directly available in either R1 or R2.
Hence, ❌ Not Dependency Preserving

Q4 Explain different types of Functional


Dependencies and its advantages with suitable
example.

 ✅ Functional Dependencies (FDs) in DBMS

 🔹 Definition:

 A Functional Dependency (FD) is a constraint between two sets


of attributes in a relation.
It is denoted as:

 X→Y

 Which means: If two tuples have the same value for attribute(s) X,
they must have the same value for Y.

 ✅ Types of Functional Dependencies

 1. Trivial Functional Dependency

 Definition: A functional dependency X → Y is trivial if Y is a


subset of X.

 Example:

 {A, B} → A (Trivial, since A ⊆ {A, B})


 2. Non-Trivial Functional Dependency

 Definition: A functional dependency X → Y is non-trivial if Y is


not a subset of X.

 Example:

 A → B (Non-trivial, as B is not part of A)

 3. Fully Functional Dependency

 Definition: An attribute Y is fully functionally dependent on X, if


it is dependent on all of X, not just a part of it.

Example:
In R(A, B, C),

 {A, B} → C (Fully dependent if C is not dependent on A or B alone)

 4. Partial Dependency

 Definition: When a non-prime attribute is dependent on a part of


a candidate key (not the whole key).

 Occurs in: 2NF Violation

 Example:
If A and B form a candidate key, and:

 A → C (C is partially dependent on A only, not on A+B)

 5. Transitive Dependency

 Definition: If A → B and B → C, then A → C is a transitive


dependency.

 Occurs in: 3NF Violation

 Example:

 A → B, B → C ⇒ A → C

 6. Multivalued Dependency (MVD)

 Definition: If for a single value of A, there are multiple independent


values of B and C, then:

 A →→ B

 Occurs in: 4NF

 Example:
In a student-course-hobby relation:

 Student →→ Course

 Student →→ Hobby

 7. Join Dependency

 Definition: A Join Dependency occurs when a relation can be


reconstructed by joining multiple projections.

 Related to 5NF (Project-Join Normal Form).

 ✅ Advantages of Functional Dependencies

 Helps in Normalization

 Identify and remove redundancy by organizing data into proper


normal forms (1NF to 5NF).

 Ensures Data Integrity

 Avoids anomalies like update, delete, and insert anomalies.

 Reduces Redundancy

 Functional dependencies help avoid repeating data unnecessarily.

 Optimizes Query Performance

 Well-structured tables make query execution faster and more


efficient.

 Helps in Schema Design


 Identifies keys and candidate keys used for table decomposition and
design.

 ✅ Example Summary:

 Assume a relation: Student(RollNo, Name, Dept, Advisor)

 RollNo → Name, Dept, Advisor (Each roll number uniquely


identifies a student)

 Dept → Advisor (Each department has one advisor)

 Here:

 RollNo → Name is a non-trivial FD

 RollNo → RollNo is a trivial FD

 If RollNo is the key, and Dept → Advisor, then RollNo → Advisor is a


transitive dependency

Q5 What is Normalization? Explain types of


Normalization in detail

 ✅ What is Normalization?

 🔹 Definition:

 Normalization is a process in relational database design used to


organize data efficiently by eliminating data redundancy and
ensuring data integrity.
It involves decomposing a large, complex table into smaller, well-
structured tables following certain normal forms.

 ✅ Objectives of Normalization:

 Remove data redundancy

 Avoid update, insert, and delete anomalies

 Ensure data consistency


 Simplify queries and maintenance

 ✅ Types of Normalization (Normal Forms)

 🔹 1. First Normal Form (1NF)

 Rule:

 Each attribute must contain atomic (indivisible) values.

 No repeating groups or arrays.

 Example (Unnormalized):

 Student  Nam
 Courses
ID e

 Math,
 101  Alice
Physics

 After 1NF:

 Student  Nam  Cours


ID e e

 101  Alice  Math

 Physic
 101  Alice
s

 🔹 2. Second Normal Form (2NF)

 Rule:

 Must be in 1NF

 No partial dependency (i.e., non-prime attribute should depend on


the whole candidate key)

 Example:
If a table has a composite key (e.g., RollNo, CourseID) and:

 RollNo → Name (Partial Dependency ❌)

 Move RollNo → Name to a separate table to achieve 2NF.


 🔹 3. Third Normal Form (3NF)

 Rule:

 Must be in 2NF

 No transitive dependency (non-prime attribute should not depend


on another non-prime attribute)

 Example:
If:

 RollNo → Dept

 Dept → HOD

 ⇒ RollNo → HOD (Transitive ❌)

 Split into two relations to remove the transitive dependency.

 🔹 4. Boyce-Codd Normal Form (BCNF)

 Rule:

 A stricter version of 3NF

 For every FD X → Y, X must be a super key

 Example:
If:

 StudentID → Course

 Course → Faculty

 Even if in 3NF, if Course is not a super key, then it violates BCNF.

 🔹 5. Fourth Normal Form (4NF)

 Rule:

 Must be in BCNF

 No Multivalued Dependencies (MVDs)


 Example:
If a student can take multiple Courses and have multiple Hobbies,
then:

 Student →→ Course

 Student →→ Hobby

 Should be split into two separate relations.

 🔹 6. Fifth Normal Form (5NF)

 Rule:

 Must be in 4NF

 Deals with Join Dependencies

 Data should be reconstructible from smaller joins without loss.

 ✅ Benefits of Normalization

 Reduces data redundancy

 Prevents data anomalies

 Improves data integrity

 Makes databases easier to maintain

Q6. Given a Relation R=(A,B,C) and Functional


Dependencies are F={ {A,B}→{C}, {C}→{A} }.
Determine all Candidate keys of R and the normal
form of R with proper explanation.

 ✅ Given:

 Relation: R(A, B, C)

 Functional Dependencies:
F={
(A, B) → C,
C→A
}

 🔹 Step 1: Find Closure of Attribute Sets to Identify Candidate


Keys

 ▶ Try closure of {A, B}

 We compute (A, B)+

 Start with:
(A, B)+ = {A, B}
From FDs:

 (A, B) → C ⇒ add C

 Now: {A, B, C}

 C → A ⇒ A already present

⇒ So, {A, B} is a super key


 ✅ (A, B)+ = {A, B, C} = all attributes of R

 But is it minimal? Let's check if we can remove A or B.

 ▶ Try closure of {B}

 (B)+ = {B} ⇒ No FD applies ⇒ Not a super key

 ▶ Try closure of {A}

 (A)+ = {A} ⇒ No FD applies ⇒ Not a super key

 ▶ Try closure of {C}

FD: C → A ⇒ add A
 (C)+ = {C}

From (A, B) → C ⇒ need both A & B → can't proceed


Now: {C, A}

So: (C)+ = {C, A}

 ⇒ Not all attributes, so not a super key


 ▶ Try closure of {B, C}

C → A ⇒ add A → {A, B, C} ✅
 (B, C)+ = {B, C}

 So, {B, C} is a super key

 Is it minimal? Try removing B:

 (C)+ = {C, A} — missing B ⇒ Not a super key


Try removing C:

 (B)+ = {B} — not a super key


✅ So, {B, C} is a candidate key

 ✅ Candidate Keys of R:

 {A, B}

 {B, C}

 ✅ Both are minimal super keys ⇒ Candidate Keys

 🔹 Step 2: Determine the Normal Form

 ▶ Check for 1NF:

 If all attributes are atomic → Yes


✅ R is in 1NF

 ▶ Check for 2NF:

 Rule: No partial dependency (i.e., no non-prime attribute depends


on part of a candidate key)

 Candidate keys = {A, B} and {B, C}

 Prime attributes = A, B, C → All attributes are part of some


candidate key
 No non-prime attributes ⇒ ✅ No partial dependency

 ✅ R is in 2NF

 ▶ Check for 3NF:

 Rule: For every FD X → Y, at least one is true:

 X → Y is trivial

 X is a super key

 Y is a prime attribute

 Check each FD:

 (A, B) → C

 (A, B) is a candidate key ✅ — OK

 C→A

 C is not a candidate key

 A is a prime attribute ✅ — OK

 ✅ So all FDs satisfy 3NF conditions

 ✅ R is in 3NF

 ▶ Check for BCNF:

 Rule: For every FD X → Y, X must be a super key

 (A, B) → C ✅ — (A, B) is a candidate key → OK

 C → A ❌ — C is not a candidate key → Violates BCNF

 ❌ Not in BCNF

 ✅ Final Answer:

 Candidate Keys: {A, B} and {B, C}

 Highest Normal Form: 3NF

 R is not in BCNF due to FD: C → A, where C is not a super key


Q7. What is BCNF? Explain how a relation van be
converted to BCNF

Here is a 6-mark answer for:

✅ (a) What is BCNF?

Boyce-Codd Normal Form (BCNF)

🔹 Definition:

A relation is in BCNF (Boyce-Codd Normal Form) if:

For every non-trivial functional dependency X → Y,


the determinant X is a super key.

 It is a stronger version of 3NF.

 BCNF removes all anomalies caused by dependencies where LHS


is not a super key.

🔹 Why BCNF?

Even if a relation is in 3NF, it may still have redundancy due to functional


dependencies where determinant is not a super key.

🔹 Steps to Convert a Relation to BCNF:

Given: Relation R and set of functional dependencies F.

✅ Step 1:

Check all functional dependencies in F.

✅ Step 2:

If there exists a dependency X → Y where:

 X is not a super key,


→ then BCNF is violated.
✅ Step 3:

Decompose R into two relations:

 R1 = X ∪ Y

 R2 = R – (Y – X)

Repeat the process for both until all resulting relations are in BCNF.

🔹 Example:

Let relation: R(A, B, C)


Functional dependencies:

 A→B

 B→A

Candidate Key = {A, C} or {B, C}

Here:

 A → B: A is not a super key ⇒ violates BCNF

 B → A: B is not a super key ⇒ violates BCNF

✅ Decompose:

1. Use A → B, create:

o R1(A, B)

o R2(A, C)

Now both R1 and R2 can be checked separately for BCNF.

B) Compare 4NF and 5NF

Here is a clear comparison between 4NF and 5NF — suitable for 6


marks:

✅ (b) Comparison of 4NF and 5NF


5NF (Fifth Normal
Feature 4NF (Fourth Normal Form)
Form)

A relation is in 4NF if it is in
A relation is in 5NF if it is
BCNF and has no multi-
Definition in 4NF and has no join
valued dependencies
dependencies.
(MVDs).

Type of
Multi-Valued Dependency
Dependency Join Dependency (JD)
(MVD)
Removed

Redundancy due to Redundancy due to


Anomaly
independent multi-valued complex joins involving
Addressed
facts multiple tables

More complex; usually


Easier to detect and resolve
Complexity occurs in advanced
using MVD rules
database designs

Decompose to remove Decompose to ensure


Decomposition independent multi-valued lossless joins of all
facts projections

A product may be
A student can take multiple
Example Use supplied by many
courses and have multiple
Case vendors in different
hobbies
locations

✅ Example for 4NF:

Relation: Student(Name, Course, Hobby)

 A student can have multiple courses and multiple hobbies


independently.

 MVDs:
Name →→ Course,
Name →→ Hobby
✅ Needs decomposition into:

 Student_Course(Name, Course)

 Student_Hobby(Name, Hobby)

✅ Example for 5NF:


Relation: Supply(Supplier, Part, Project)

 A supplier may supply a part for many projects.

 Decompose into three relations:

o Supplier_Part

o Part_Project

o Supplier_Project
To remove redundancy and preserve join dependencies.

UNIT-4

Q1) Explain Briefly about Transaction Management and


Concurrency Control
🔹 1. Transaction Management

✅ Definition:

A transaction is a sequence of operations performed as a single logical


unit of work in a database. It must either complete entirely or have no
effect at all.

✅ ACID Properties:

To ensure correctness and reliability, transactions follow ACID properties:

1. Atomicity – All operations in a transaction are completed or none at


all.

2. Consistency – The database moves from one valid state to another.

3. Isolation – Concurrent transactions are isolated from each other.

4. Durability – Once a transaction is committed, changes are


permanent.

🔹 2. Concurrency Control

✅ Definition:

Concurrency control is the process of managing simultaneous


operations on the database to ensure data consistency and isolation.

✅ Need for Concurrency Control:

When multiple transactions execute at the same time, problems may arise
such as:

 Lost Update

 Dirty Read

 Non-repeatable Read

 Phantom Read

✅ Techniques for Concurrency Control:

1. Lock-Based Protocols:

o Use shared and exclusive locks to control access to data.

o Example: Two-Phase Locking (2PL)

2. Timestamp Ordering:
o Assigns timestamps to each transaction and orders them
based on that.

3. Optimistic Concurrency Control:

o Transactions execute without restrictions, but are validated


before commit.

(b)Explain the ACID property of Transaction with


example.

ACID stands for Atomicity, Consistency, Isolation, and Durability


— the four key properties that ensure reliable transaction
processing in a database.

🔹 1. Atomicity

 Ensures that a transaction is all-or-nothing.

 If any part of the transaction fails, the entire transaction is


rolled back.

Example:
In a banking system, transferring ₹1000 from A to B:

 Debit A’s account

 Credit B’s account


✅ If debit succeeds but credit fails, both operations are
undone.

🔹 2. Consistency

 Ensures the database moves from one valid state to another.

 All integrity constraints must be maintained.

Example:
A rule says total bank balance = ₹1,00,000.
After any transaction, this rule must still hold.

🔹 3. Isolation
 Ensures that concurrent transactions do not interfere with
each other.

 Final result must be the same as if transactions were


executed serially.

Example:
If two users transfer money at the same time, their operations
are logically isolated, preventing conflicts like lost updates.

🔹 4. Durability

 Once a transaction is committed, its changes are


permanent, even in case of system failure.

Example:
After money is transferred and transaction is committed; a power
failure occurs — the changes remain saved in the database.

Q2 Write brief notes on the following a) Multiple


Granularity
b) Time based Protocol

✅ (a) Multiple Granularity – (6 Marks)

🔹 Definition:

Multiple Granularity is a locking scheme in databases that allows


transactions to lock data items at various levels of granularity —
such as a database, table, page, record, or field — depending on
the need.

🔹 Granularity Levels (From coarse to fine):

Database → Table → Page → Tuple (Row) → Attribute (Column)

🔹 Lock Types Used:

1. IS (Intent Shared)

2. IX (Intent Exclusive)
3. S (Shared)

4. X (Exclusive)

5. SIX (Shared and Intent Exclusive)

6. Null (no lock)

🔹 Why It Is Needed:

 To improve efficiency and concurrency.

 Reduces overhead by locking larger units when detailed


locks are not needed.

 Allows finer locks when necessary for precision.

🔹 Example:

If a transaction wants to read a few rows in a table:

 It may acquire IS lock on the table,

 And S lock on specific rows.

Another transaction wanting to update the table:

 Acquires IX lock on the table,

 And X lock on the rows being updated.

This ensures controlled concurrent access.

✅ (b) Time-Based Protocol

🔹 Definition:

A Time-Based Protocol (also called Timestamp Ordering Protocol)


is a concurrency control method where each transaction is
assigned a unique timestamp and transactions are ordered by
these timestamps to avoid conflicts.
🔹 Key Concepts:

 Each transaction T is given a timestamp TS(T) when it starts.

 Each data item Q maintains:

o read_TS(Q) = highest timestamp of any transaction


that read Q.

o write_TS(Q) = highest timestamp of any transaction


that wrote to Q.

🔹 Rules:

 If a younger transaction (with higher timestamp) tries to


read/write an item updated by an older transaction, it is
allowed.

 If an older transaction tries to overwrite a value written by a


younger transaction, it is rolled back (to avoid
inconsistency).

🔹 Advantages:

 Deadlock-free: Since no locks are used.

 Maintains strict serializability.

🔹 Example:

Assume:

 T1 has timestamp 10

 T2 has timestamp 20

If T2 wants to write to a data item already written by T1, it's fine.

order ⇒ T1 is aborted.
But if T1 (older) tries to write after T2, it violates timestamp

Q3. Define Serializability. Explain in detail about the


types and Illustrate with the help of suitable example.

🔹 Definition:
Serializability is the highest level of isolation in concurrency control.
It ensures that the outcome of executing concurrent transactions is
equivalent to some serial execution (i.e., one after another) —
preserving consistency and avoiding conflicts.

✅ Why Serializability?

 In real-world databases, transactions run concurrently to improve


performance.

 Without control, concurrent execution may cause inconsistencies


like lost updates, dirty reads, etc.

 Serializability guarantees correctness by ensuring concurrency


doesn’t violate transaction logic.

✅ Types of Serializability

🔸 1. Conflict Serializability

✅ Definition:

A schedule is conflict serializable if it can be transformed into a


serial schedule by swapping non-conflicting operations.

✅ Conflicting Operations:

Two operations conflict if:

 They belong to different transactions, and

 They access the same data item, and

 At least one of them is a write operation.

✅ Example:

Transactions:

 T1: R(A), W(A)

 T2: R(A), W(A)

Schedule S1:
T1: R(A) → W(A)

T2: R(A) → W(A)

This is not conflict serializable — T1 and T2 write on same data in


interleaved fashion.

🔸 2. View Serializability

✅ Definition:

A schedule is view serializable if it is view-equivalent to a serial


schedule, meaning:

 Initial reads, final writes, and reads from same transactions match.

It is more general than conflict serializability but harder to test.

✅ View equivalence conditions:

 Same initial reads

 Same read-from relationships

 Same final writes

🔸 3. Commutative (or Semantic) Serializability

 Some operations may be commutative (e.g., addition), allowing


more flexibility.

 Not based on read/write conflicts alone.

 Rarely used in practice due to complexity.

✅ Example – Conflict Serializability

Consider:

T1:

R1(A), W1(A)

T2:

R2(A), W2(A)
Schedule S:

R1(A), R2(A), W1(A), W2(A)

Now let's check conflicts:

 R1(A) & W2(A) → conflict

 R2(A) & W1(A) → conflict

 W1(A) & W2(A) → conflict

Cannot swap all conflicting operations to get a serial form → ❌ Not


conflict serializable.

But a schedule like:

R1(A), W1(A), R2(A), W2(A)

✅ Can be transformed to serial T1 → T2 ⇒ Conflict Serializable

✅ How to Check Serializability?

🔸 Precedence (Serialization) Graph:

1. Nodes = transactions

2. Edges = from Ti to Tj if Ti's operation conflicts with Tj’s later


operation

3. If the graph has no cycles, the schedule is conflict serializable

Q4 Draw Transaction State diagram and describe each


state that a transaction goes through during its
execution.

🔹 Definition:

A transaction in a database system goes through various states from


start to completion.
Understanding these states helps in managing commit, rollback,
crashes, and concurrency effectively.
✅ Transaction States and Diagram

🎯 Diagram:

✅ Explanation of Each State

🔸 1. Active

 The transaction starts and begins executing its operations like read,
write, etc.

 It remains active until:

o It finishes execution normally

o Or an error/failure occurs

Example: Deducting ₹1000 from account A (still running)

🔸 2. Partially Committed

 The transaction has executed its final operation and is ready to


commit, but changes may not yet be permanent.

 Still vulnerable to failure at this point.

Example: All operations done, preparing to save permanently.

🔸 3. Committed
 All operations are successfully completed.

 All changes are permanently recorded in the database.

 Transaction cannot be undone.

Example: ₹1000 deducted and ₹1000 added to another account —


permanently saved.

🔸 4. Failed

 If any error or crash occurs during execution or just before commit,


the transaction enters failed state.

 All changes made must be undone.

Example: System crash during money transfer.

🔸 5. Aborted

 All changes made by the transaction are rolled back.

 The database is restored to the state before the transaction


began.

 Sometimes, the transaction may be restarted.

✅ Summary Table

State Description

Active Transaction is executing operations

Partially
All operations done; waiting to commit
Committed

Transaction successfully completed and


Committed
saved

Failed Error occurred; transaction cannot


State Description

continue

Changes undone; transaction rolled


Aborted
back

Q5. What is deadlock? When does it occur? How is it


detected in database system

🔹 Definition:

A deadlock is a situation in a database system where two or more


transactions are waiting for each other to release locks, but none
of them ever proceeds.
As a result, all involved transactions remain stuck indefinitely.

🔹 Example:

 T1 locks data item A and needs B

 T2 locks data item B and needs A

Now both transactions are waiting on each other, creating a deadlock.

✅ When Does Deadlock Occur?

Deadlocks occur in a database system when the following four


conditions hold simultaneously (Coffman Conditions):

Condition Description

1. Mutual At least one resource (data item) is held in a non-


Exclusion shareable mode.

2. Hold and A transaction holds at least one lock and is waiting for
wait more locks.

3. No Pre- Resources cannot be forcibly taken from a transaction;


emption it must release them voluntarily.

4. Circular There exists a cycle in the wait-for graph (T1 waits for T2,
Wait T2 waits for T1, etc.)
✅ When all four conditions are true, deadlock can occur.

✅ Deadlock Detection in Database Systems

🔹 1. Wait-For Graph (WFG) Method

 A directed graph is used where:

o Nodes = Transactions (T1, T2, T3, …)

o Edges = T1 → T2 if T1 is waiting for a lock held by T2

✅ How It Works:

 The system periodically builds the wait-for graph.

 If there is a cycle in the graph → Deadlock exists

🔸 Example:

Let:

 T1 waits for T2

 T2 waits for T3

 T3 waits for T1

Graph:

T1 → T2 → T3 → T1

✅ Cycle detected ⇒ Deadlock Present

🔹 2. Detection Algorithm Steps

1. Maintain a lock table with info on held and requested locks.

2. Construct the wait-for graph from this table.

3. Use cycle detection algorithms (like DFS) to find cycles.

4. If a cycle exists, deadlock is detected.

🔹 3. Deadlock Resolution

Once deadlock is detected:


 Choose a victim transaction to abort (based on priority, cost, etc.)

 Rollback that transaction

 Release its locks

 Allow other transactions to proceed

✅ Deadlock vs Starvation

Deadlock Starvation

All transactions wait Some transactions wait indefinitely while


forever others proceed

Caused by circular wait Caused by unfair scheduling

Needs
Needs priority-based scheduling
detection/resolution

Q6 Explain Lock based protocol for Concurrency Control.

🔹 Definition:

Lock-based protocols are a type of concurrency control mechanism


in DBMS that use locks to manage simultaneous access to data items
by multiple transactions, ensuring consistency and isolation.

🔹 Types of Locks:

1. Shared Lock (S-lock):


o Used for read operations.

o Multiple transactions can hold S-lock on the same item.

2. Exclusive Lock (X-lock):

o Used for write operations.

o Only one transaction can hold X-lock on a data item.

o No other transaction can read or write the item during this


time.

🔹 Two-Phase Locking Protocol (2PL):

A widely used lock-based protocol.

✅ Phases:

1. Growing Phase:

o Transaction may acquire locks.

o No locks can be released.

2. Shrinking Phase:

o Transaction may release locks.

o No new locks can be acquired.

⚠️Once a transaction starts releasing locks, it cannot obtain any new


locks.

🔹 Example:

T1:

 Lock-S(A)

 Read(A)

 Lock-X(A)

 Write(A)

 Unlock(A)

T2:

 Waits until T1 releases the lock on A to proceed


✅ Ensures serializability

🔹 Strict Two-Phase Locking (Strict 2PL):

 A variation where all exclusive locks are held until


commit/abort.

 Prevents cascading rollbacks.

✅ Advantages:

 Ensures conflict-serializable schedules.

 Strict 2PL ensures recoverable schedules.

✅ Disadvantages:

 May cause deadlocks.

 Overhead in managing locks.

B) Discuss in details about 2PL. List and Explain types of


2PL

🔹 Definition:

Two-Phase Locking (2PL) is a concurrency control protocol used in


DBMS to ensure serializability of transactions.
It divides the execution of a transaction into two distinct phases:

✅ Phases of 2PL:

1. Growing Phase
o The transaction acquires all the required locks (shared or
exclusive).

o No lock is released in this phase.

2. Shrinking Phase

o After acquiring all locks, the transaction starts releasing


them.

o No new locks can be acquired.

🔸 Why Use 2PL?

 It guarantees conflict serializability, which ensures correct


execution of concurrent transactions.

✅ Types of Two-Phase Locking (2PL):

🔹 1. Basic 2PL

 Follows the basic rule:

o All locks acquired in growing phase

o All locks released in shrinking phase

 Does not prevent cascading rollbacks

🔹 2. Conservative 2PL (Static 2PL)

 All required locks are requested at the beginning of the


transaction.

 If all locks cannot be granted, the transaction waits (does not


proceed).

 Avoids deadlocks, but may cause delays or reduced


concurrency.

🔹 3. Strict 2PL

 Exclusive (write) locks are held until the transaction commits


or aborts.
 Prevents cascading rollbacks.

 Ensures recoverability of transactions.

🔹 4. Rigorous 2PL

 All locks (both shared and exclusive) are held until


commit/abort.

 Strongest form; ensures strictest isolation.

✅ Example:

Transaction T1:

 Lock-S(A)

 Read A

 Lock-X(B)

 Write B

 Unlock A, Unlock B

This follows growing phase (acquire locks), then shrinking phase


(release locks).

Q7 Discuss in detail about Conflict Serializability with


the help of suitable example.

🔹 Definition:
A schedule is said to be Conflict Serializable if it can be transformed
into a serial schedule (i.e., one transaction after another) by swapping
non-conflicting operations.

Conflict serializability ensures that concurrent execution of transactions is


correct and safe, giving the same result as some serial execution.

✅ Conflicting Operations:

Two operations conflict if:

1. They belong to different transactions,

2. They access the same data item, and

3. At least one of them is a write operation.

✅ Types of Conflicts:

Operatio Operatio Conflict


Example
n1 n2 ?

Read A, Read
Read Read No
A

Read A, Write
Read Write Yes
A

Write A, Read
Write Read Yes
A

Write A,
Write Write Yes
Write A

✅ Example: Conflict Serializable Schedule

Let’s take two transactions:

T1:

 R1(A)

 W1(A)

T2:

 R2(A)
 W2(A)

🔸 Schedule S:

R1(A), R2(A), W1(A), W2(A)

🔸 Step 1: Build Precedence Graph (Serialization Graph)

 Nodes: T1, T2

 Edges:

o R1(A) before W2(A) ⇒ T1 → T2

o R2(A) before W1(A) ⇒ T2 → T1

So, Graph:

T1 → T2

T2 → T1

⚠️Cycle exists ⇒ ❌ Not conflict serializable

✅ Now consider another schedule:

R1(A), W1(A), R2(A), W2(A)

Conflicts:

 R1(A) before R2(A) – No conflict

 W1(A) before R2(A) ⇒ T1 → T2

 W1(A) before W2(A) ⇒ T1 → T2

No cycle in the graph (T1 → T2 only) ✅

✔️This schedule is conflict serializable, equivalent to serial order T1 →


T2

B) Write a short note on Validation Based Protocol

🔹 Definition:
The Validation-Based Protocol, also known as Optimistic
Concurrency Control (OCC), is a concurrency control technique in which
transactions execute without any locking, assuming that conflicts are
rare.
Instead of using locks, it uses a validation phase before commit to
ensure correctness.

✅ Phases of Validation-Based Protocol:

Each transaction goes through three phases:

🔸 1. Read Phase

 The transaction reads data items from the database and stores
updates in local variables (workspace).

 No changes are made to the actual database yet.

🔸 2. Validation Phase

 Before committing, the system validates the transaction to ensure


no conflicts with other concurrent transactions.

 It checks whether the transaction could have executed serially with


respect to other committed transactions.

🔸 3. Write Phase

 If validation is successful, all changes are written to the


database.

 If validation fails, the transaction is aborted and restarted.

✅ Validation Check Criteria (Basic Validation Rule):

Let Ti be the current transaction and Tj be a committed transaction:

 If Tj completes before Ti's read phase ends, no conflict

 If Tj's write set overlaps with Ti’s read set, and their times
overlap, then conflict → Ti is aborted
✅ Advantages:

 No locking → better concurrency

 Deadlock-free

 Useful for systems where conflicts are rare, like read-mostly


databases

✅ Disadvantages:

 May cause more aborts if many transactions conflict

 Overhead in validation phase

✅ Example:

 T1 and T2 both read item A

 T1 finishes first and writes A

⇒ Conflict → T2 is aborted
 Before T2 commits, it validates — sees T1 wrote A, which T2 read

UNIT-5

Q1. What is Data storage? Explain Magnetic Disk and its


types.

🔹 Definition:

Data storage refers to the method of saving and retrieving digital


data on physical or electronic media. It is an essential component of
computer systems, including databases, to persistently store data for
processing, backup, and retrieval.

Data can be stored on various devices like:

 Magnetic Disks (e.g., Hard Disks)

 Solid-State Drives (SSD)

 Optical Disks (CD/DVD)

 Flash memory (USB drives)

✅ Magnetic Disk – Explanation

🔹 Definition:

A magnetic disk is a non-volatile secondary storage device that


stores data by magnetizing spots on a rotating platter.
It is the most commonly used storage medium for databases and file
systems.

✅ Structure of a Magnetic Disk:

1. Platter: A circular disk coated with magnetic material.

2. Spindle: Rotates the platters.

3. Read/Write Head: Positioned on an arm that moves across the


disk surface.

4. Tracks: Concentric circles on the platter where data is stored.

5. Sectors: Each track is divided into sectors (smallest storage unit).

6. Cylinders: Set of tracks vertically aligned across platters.

✅ How Data Is Stored:

 Data is written to the disk using a magnetic head that changes the
magnetic polarity of the disk surface.
 The disk spins at a constant speed (measured in RPM), and the head
reads or writes data as the sectors pass under it.

✅ Types of Magnetic Disks:

🔸 1. Hard Disk Drive (HDD)

 Most common type of magnetic disk.

 Consists of multiple platters rotating at high speed (5400–7200 RPM


or more).

 Large storage capacity (up to several TB).

 Used in desktops, servers, and database systems.

Advantages:

 High capacity

 Cost-effective per GB

Disadvantages:

 Slower than SSDs

 Mechanical parts → prone to wear

🔸 2. Floppy Disk

 Older, portable magnetic disk with small storage (1.44 MB).

 Now obsolete.

 Used in early personal computers.

Advantages: Portable and easy to use (in older systems)


Disadvantages: Very low capacity and slow

🔸 3. Zip Disk

 Developed by Iomega; used magnetic disk technology.

 Higher capacity than floppy disks (100MB to 750MB).

 Rarely used today.


🔸 4. Magnetic Tape (for completeness)

 Though not a disk, it's a magnetic storage medium.

 Used for sequential access and data backup.

 Slow but cost-effective for archival storage.

✅ Advantages of Magnetic Disks:

 Random access capability

 Large capacity

 Non-volatile

 Inexpensive for long-term storage

✅ Disadvantages:

 Mechanical parts → can fail

 Slower than flash-based storage (SSDs)

 Susceptible to magnetic interference

Q2. Explain different types of file organizations with the


operations performed on files.

🔹 What is File Organization?

File organization refers to the way records are stored and arranged in a
file on secondary storage (like disk).
The method affects the efficiency of data access, insertion, deletion, and
updating.

✅ Types of File Organization

🔸 1. Sequential File Organization


 Records are stored one after the other in sorted order (usually
by key).

 Access is done sequentially from the beginning.

Operations:

 Insertion: Costly (needs reshuffling to maintain order)

 Search: Slow unless reading sequentially

 Deletion/Update: Involves shifting records or marking them

Example: Payroll processing, where records are accessed in order.

🔸 2. Heap (Unordered) File Organization

 Records are stored in the order they arrive.

 No sorting or specific order is maintained.

Operations:

 Insertion: Fast (just add at the end)

 Search: Slow (need to scan entire file)

 Deletion: Mark record as deleted

 Update: Overwrite or mark and insert new record

Use Case: Temporary data, logs, etc.

🔸 3. Hashed File Organization

 Uses a hash function on the key to compute the location


(bucket) where the record should be stored.

Operations:

 Insertion: Apply hash → store in the bucket

 Search: Fast, O(1) average

 Deletion/Update: Easy using hash key

 Handles collisions using techniques like chaining

Use Case: Fast lookup, like student ID or product code.

🔸 4. Indexed File Organization


 Uses an index (similar to a book index) that holds keys and pointers
to records.

Types:

 Single-level index

 Multi-level index

 Dense or sparse index

Operations:

 Search: Fast via index

 Insertion: Add record and update index

 Deletion/Update: Modify both data and index

Use Case: Libraries, search engines, databases

🔸 5. Clustered File Organization

 Records with the same or related keys are stored physically


close on disk.

 Reduces access time for grouped data.

Operations:

 Search and access for related records is optimized

Use Case: Situations needing multi-key retrieval or joins.

✅ File Operations in DBMS

Operati
Description
on

Creates a new file and allocates


Create
space

Read Fetches data from the file

Write Adds or updates records

Modifies the content of existing


Update
records
Operati
Description
on

Removes a record or marks it


Delete
deleted

Locates specific records by key or


Search
value

Q3. Explain physical storage media with example.

🔹 Definition:

Physical storage media refers to the actual hardware devices where


data is stored permanently or temporarily in a computer system or
database. It includes all electronic, magnetic, and optical devices
that hold data physically.

✅ Categories of Physical Storage Media

Storage media can be classified based on speed, volatility, and


durability into the following types:

🔸 1. Primary Storage (Main Memory)

 Also known as volatile memory

 Temporarily holds data that is currently being used by the system

 Data is lost when power is off

✅ Examples:

 RAM (Random Access Memory): Fast access, used by active


programs

 Cache memory: Small, very fast memory close to the CPU

✅ Characteristics:

 Very fast but expensive

 Limited capacity

 Volatile (data not retained on power-off)


🔸 2. Secondary Storage

 Non-volatile storage used to store data permanently

 Used for large-scale data like files, databases, OS

✅ Examples:

 Hard Disk Drives (HDD)

 Solid-State Drives (SSD)

✅ Characteristics:

 Larger capacity than RAM

 Slower than primary memory

 Cheaper and persistent

🔸 3. Tertiary Storage

 Used for backup and archival purposes

 Typically removable and very large capacity, but slow access

✅ Examples:

 Magnetic tapes

 Optical disks (CDs, DVDs)

 Cloud/External backup systems

✅ Characteristics:

 Inexpensive per GB

 High latency (slow access)

 Mostly used for long-term storage

🔸 4. Off-line Storage

 Storage media that is not always connected to the computer

 Must be manually inserted or connected

✅ Examples:

 USB flash drives

 External hard drives


 Removable magnetic tapes

✅ Characteristics:

 Portable

 Good for data transfer between systems

 Can be lost or damaged physically

✅ Comparison Table:

Volatil
Storage Type Speed Cost Capacity Usage
e

Very Temporary program


Primary (RAM) Yes High Low
Fast data

Secondary Moderat Moderat


No High Databases, files
(HDD/SSD) e e

Tertiary
No Slow Low Very High Backup/archive
(Tape/DVD)

Off-line (USB, Ext Medium–


No Varies Varies Portability, backup
HDD) High

✅ Example Scenario in DBMS:

 A student record being processed is loaded into RAM

 The database file containing all students is stored on the hard


disk

 Periodic backups of student records are saved on magnetic tape

 External USB is used to export a report file for transfer

Q4 Explain in detail about Extensible Hashing scheme

🔹 Definition:
Extensible Hashing is a dynamic hashing technique used in
databases to handle the problem of bucket overflow efficiently. It
automatically grows or shrinks the hash structure depending on the
number of records, unlike static hashing which uses fixed-size hash tables.

It is widely used in dynamic file systems and indexing to ensure fast


and scalable access.

✅ Key Concepts:

🔸 1. Hash Function (h(k)):

 Maps a key k to a binary value.

 Only a prefix of the hash value is used (number of bits is controlled


by a global depth d).

🔸 2. Directory:

 An array of pointers (like a table) that points to buckets.

 Directory size = 2^d where d = global depth.

🔸 3. Bucket:

 Stores actual data entries (records).

 Each bucket has a local depth (ld) — number of hash bits used to
identify it.

✅ How Insertion Works:

1. Apply the hash function to get a binary key.

2. Use the first d bits to index into the directory.

3. Place the record in the pointed bucket.

4. If the bucket is full:

o If local depth < global depth: Split the bucket

o If local depth == global depth: Double the directory size


and increase global depth
🔸 Example:

Let’s say:

 d = 2 ⇒ Directory size = 4 entries: 00, 01, 10, 11

 Hash of key 25 = 11001... ⇒ Use first 2 bits → 11

 Insert into bucket at index 11

If bucket overflows:

 Split bucket and redistribute based on ld + 1 bits

 Update pointers in directory

✅ Advantages of Extensible Hashing:

 Grows/shrinks dynamically as needed

 Reduces overflow chains seen in static hashing

 Maintains constant-time access in average cases

 Efficient use of space and time

✅ Disadvantages:

 More complex implementation

 Directory doubling may consume more space temporarily

(b) Compare Static Hashing and Dynamic Hashing


Feature Static Hashing Dynamic Hashing

1. Hash Table Fixed size; determined at Grows and shrinks


Size the time of creation dynamically based on data

Less flexible — overflows More flexible — adjusts to


2. Flexibility
lead to overflow chains number of records

Uses directory and bucket


3. Overflow Uses overflow buckets or
splitting (e.g., extensible
Handling chaining
hashing)

Maintains consistent
May degrade as overflow
4. Performance performance even as data
increases
grows

May waste space if too


5. Storage Efficient storage use due to
large; causes collisions if
Utilization dynamic resizing
too small

No directory — only fixed Uses directory to manage


6. Directory
buckets growing hash structure

7. Ease of More complex due to


Simple to implement
Implementation directory management

Q5. Explain in detail about Internal and External


Hashing Techniques
Aspect Internal Hashing External Hashing

Hashing used when data is Hashing used when data is


Definition stored in main memory stored on disk (secondary
(RAM) storage)

Mainly for in-memory


Used in database systems,
Usage data structures like hash
files, and large datasets
tables

Storage Works entirely in primary Involves disk blocks/buckets


Medium memory on external storage

Typically small and fixed- Buckets may span entire disk


Bucket Size
size arrays blocks and vary dynamically

Uses overflow blocks,


Overflow Uses open addressing or
chaining, or directory
Handling chaining
doubling

Performanc Very fast due to memory


Slower due to disk I/O latency
e speed

Static Hashing, Extensible


Hash table in a program
Examples Hashing, Linear Hashing in
storing keys
DBMS

Less flexible — requires More flexible — supports


Flexibility
predefined size dynamic file growth

B) What is Hash based Indexing. Explain with an


example.

🔹 Definition:

Hash-based indexing is a database indexing technique that uses a


hash function to map search key values to a location or bucket in the
index.
This method provides efficient access to data for equality-based
searches (e.g., WHERE ID = 101).
✅ How It Works:

1. A hash function h(key) is applied to the search key.

2. The result of the hash function determines the bucket where the
record is stored.

3. The bucket contains either the actual data or a pointer to the


data.

✅ Structure:

 Hash Table: An array of buckets.

 Buckets: Contain the records or pointers to the records.

 Handles collisions using overflow buckets or chaining.

✅ Example:

Let’s say we are storing student records with Student_ID as the key.

Student_ Nam Branc


ID e h

101 Alice CSE

104 Bob ECE

108 Clara ME

Assume a simple hash function:

h(Student_ID) = Student_ID % 5

Then:

 h(101) = 101 % 5 = 1 → Bucket 1 → stores record for ID 101

 h(104) = 104 % 5 = 4 → Bucket 4 → stores record for ID 104

 h(108) = 108 % 5 = 3 → Bucket 3 → stores record for ID 108

To find Student_ID = 104, we:

1. Apply the hash → h(104) = 4

2. Go to Bucket 4

3. Fetch the record quickly → O(1) access time


✅ Advantages:

 Very fast for exact match queries (e.g., WHERE ID = 104)

 Simple implementation

 Constant-time (average-case) access

✅ Disadvantages:

 Not suitable for range queries (e.g., ID > 100)

 Collisions may reduce efficiency

 Difficult to maintain with large or growing datasets unless using


dynamic hashing

Q6. Explain different types of Indexing.

✅ (a) Different Types of Indexing in DBMS

Indexing is a technique used in databases to speed up data retrieval by


creating pointers to data records based on key fields. Different types of
indexing are used depending on data access patterns and requirements.

✅ 1. Primary Index (Clustering Index)

 Created on a primary key or unique field.

 Records in the data file are physically ordered based on the index
key.

 One index entry per block.

Example: Index on Student_ID in a sorted student table.

✅ 2. Secondary Index (Non-clustering Index)

 Created on non-primary key fields.


 Records are not sorted based on this field.

 May have multiple entries pointing to same data block.

Example: Index on Department in the student table.

✅ 3. Dense Index

 Contains one index entry for every search key in the data file.

 More space needed, but faster lookup.

Example: Every student name is indexed individually.

✅ 4. Sparse Index

 Contains entries for only some records (usually one per block).

 Requires less space but extra lookup steps.

Example: Index contains only the first key of each data block.

✅ 5. Multilevel Index

 When the primary or secondary index becomes large, another


index is built on top of it.

 Forms a hierarchy of indexes, improving search performance.

Example: B-tree index with multiple levels.

✅ 6. Clustered and Non-Clustered Index

 Clustered Index: Records are physically ordered to match the


index.

 Non-Clustered Index: Index has pointers to records stored in


random order.

(b)Difference between B and B+ Indexing.


Feature B-Tree Indexing B+ Tree Indexing

Keys and data are stored in Only leaf nodes store the
Data Storage
internal and leaf nodes actual data/records

Leaf Node Leaf nodes are linked using


Leaf nodes are not linked
Structure pointers (linked list)

Suitable for both exact


Optimized for range and
Access Type match and range
sorted queries
queries

Traversal May need to traverse Always traverses leaf nodes


Time internal and leaf nodes only

Search Slightly slower due to Faster due to uniform search


Efficiency search at multiple levels path to data

Internal nodes store keys


Redundancy No duplication of keys
only, not actual data

Modern DBMS systems like


Used In Older indexing systems
MySQL, Oracle (for indexing)

You might also like